1. Basics
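Emoji are special characters in the Unicode character set (UTF-8 is only one of its encodings). Most basic emoji are assigned to two ranges in plane 1 of the Unicode code space, U+1F300 to U+1F6FF and U+1F900 to U+1FAFF, so in Java each of them occupies two char values (a UTF-16 surrogate pair).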
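To make the "two chars" point concrete, here is a minimal sketch that builds a string from one plane 1 code point and inspects it. The class name CodeUnitDemo and the helper fromCodePoint are illustrative names chosen for this example, not part of the original code.

```java
// Sketch: how a single plane 1 code point maps to Java chars (UTF-16 code units).
final class CodeUnitDemo {

    /** Builds a String that contains exactly one Unicode code point. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String wave = fromCodePoint(0x1F44B); // WAVING HAND SIGN, plane 1
        // Number of UTF-16 code units (Java chars) in the string.
        System.out.println("chars: " + wave.length());
        // Number of Unicode code points in the string.
        System.out.println("code points: " + wave.codePointCount(0, wave.length()));
        // The two chars are a high surrogate followed by a low surrogate.
        System.out.println("high surrogate first: " + Character.isHighSurrogate(wave.charAt(0)));
        System.out.println("low surrogate second: " + Character.isLowSurrogate(wave.charAt(1)));
    }
}
```

Characters in the Basic Multilingual Plane, such as U+27A1, still fit in a single char, which is why the single-char symbols need separate handling later on.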
- Skin-tone modifiers: most emoji that depict people are yellow by default, so five code points were later added as modifiers: U+1F3FB, U+1F3FC, U+1F3FD, U+1F3FE and U+1F3FF. Appending a modifier to an existing emoji produces a new style:
  U+1F44B (👋) + U+1F3FD = 👋🏽
- Variation selectors and combining marks: an ordinary character followed by one or more variation selectors or combining marks forms an emoji:
  U+25C0 + U+FE0F = ◀️
  U+27A1 + U+FE0F = ➡️
  1 + U+FE0F + U+20E3 = 1️⃣
- Flags: each flag is a pair of regional indicator symbols from the range U+1F1E6 to U+1F1FF, so in terms of storage it is the same as two ordinary emoji taken from that range:
  U+1F1E8 + U+1F1F3 = 🇨🇳
- Zero-width joiner (ZWJ): several base emoji joined by the zero-width joiner (U+200D) form one complex emoji:
  emoji + U+200D + emoji = one combined emoji
  emoji + U+200D + emoji + U+200D + emoji = one combined emoji
  emoji + U+200D + emoji + U+200D + emoji + U+200D + emoji = one combined emoji
- Tag sequences: a base emoji followed by several tag characters (U+E0020 to U+E007E) and terminated by the cancel tag (U+E007F) forms one complex emoji:
  U+1F3F4 (🏴) + U+E0067 + U+E0062 + U+E0065 + U+E006E + U+E0067 + U+E007F = 🏴󠁧󠁢󠁥󠁮󠁧󠁿
- Special symbols: these occupy a single char, and some of them are treated as emoji in certain environments.
Unicode only defines the mapping from code points to emoji. It does not prescribe the glyphs, so each emoji font is free to draw them in its own way.
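The sequence types described in this section can be assembled directly from their code points, which makes it easy to see how many chars each one occupies. The sketch below is illustrative only: the class EmojiSequenceDemo and its helper of are hypothetical names, and the family sequence (man, woman, boy) is simply one well-known ZWJ sequence picked for the example.

```java
// Sketch: build each kind of emoji sequence from code points and measure it.
final class EmojiSequenceDemo {

    /** Joins code points into one String via the String(int[], int, int) constructor. */
    static String of(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String skin = of(0x1F44B, 0x1F3FD);        // waving hand + skin-tone modifier
        String keycap = of('1', 0xFE0F, 0x20E3);   // digit + variation selector + keycap mark
        String flag = of(0x1F1E8, 0x1F1F3);        // regional indicators C + N
        String family = of(0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F466); // joined with ZWJ
        String england = of(0x1F3F4, 0xE0067, 0xE0062, 0xE0065,
                0xE006E, 0xE0067, 0xE007F);        // black flag + tags "gbeng" + cancel tag

        for (String s : new String[]{skin, keycap, flag, family, england}) {
            System.out.println(s.codePointCount(0, s.length())
                    + " code points, " + s.length() + " chars");
        }
    }
}
```

Every code point above U+FFFF contributes two chars, while U+FE0F, U+20E3 and U+200D contribute one each, so a detector that walks the string char by char has to account for both widths.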
2. Solution
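- Apart from some emoji that are plain special symbols, every emoji takes at least two chars, so the check starts with the type of the second char. Character.UnicodeBlock.of and Character.getType are used to determine the type of each char.
- Once the second char shows that the current two chars are an emoji:
  1) check whether modifiers follow;
  2) handle, in turn, flags, skin-tone modifiers, emoji tag sequences, the zero-width joiner, and runs of variation selectors or combining marks; otherwise treat the pair as an ordinary emoji.
- Handle single-char special symbols. Some of these are emoji and some are not; for now all of them are simply treated as ordinary emoji.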
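The second-char test described above can be sketched on its own as follows. This is a simplified illustration, not the full implementation: SecondCharDemo, looksLikeEmojiStart and isOtherSymbol are hypothetical names, and only three of the Unicode blocks are checked here to keep the example short.

```java
// Sketch: classify a position by the Unicode block of the NEXT char,
// and classify single chars by their general category.
final class SecondCharDemo {

    /** True if the char at index + 1 suggests that an emoji starts at index. */
    static boolean looksLikeEmojiStart(CharSequence text, int index) {
        if (index + 1 >= text.length()) {
            return false;
        }
        Character.UnicodeBlock block = Character.UnicodeBlock.of(text.charAt(index + 1));
        // A low surrogate means the current char opens a surrogate pair;
        // the other two blocks hold U+FE0F and U+20E3 respectively.
        return Character.UnicodeBlock.LOW_SURROGATES.equals(block)
                || Character.UnicodeBlock.VARIATION_SELECTORS.equals(block)
                || Character.UnicodeBlock.COMBINING_MARKS_FOR_SYMBOLS.equals(block);
    }

    /** True for single-char symbols whose general category is So (Other Symbol). */
    static boolean isOtherSymbol(char c) {
        return Character.getType(c) == Character.OTHER_SYMBOL;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmojiStart("\uD83D\uDC4B", 0)); // surrogate pair
        System.out.println(looksLikeEmojiStart("\u27A1\uFE0F", 0)); // arrow + variation selector
        System.out.println(looksLikeEmojiStart("ab", 0));           // plain text
        System.out.println(isOtherSymbol('\u2600'));                // BLACK SUN WITH RAYS
    }
}
```

Note that the low-surrogate test accepts any supplementary-plane character, not only emoji, so this check trades precision for simplicity.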
3. Complete code
package com.zpf.tool;

import java.util.List;

public class EmojiUtil {

    /** Regional indicator symbols U+1F1E6..U+1F1FF; two of them form a flag. */
    public static boolean isEmojiNationalFlag(int codePoint) {
        return codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF;
    }

    // String str = new String(new int[]{0x1F44B, 0x1F3FD}, 0, 2);
    /** Skin-tone modifiers U+1F3FB..U+1F3FF. */
    public static boolean isEmojiSkinColor(int codePoint) {
        return codePoint >= 0x1F3FB && codePoint <= 0x1F3FF;
    }

    // String str = new String(new int[]{0x1F3F4, 0xE0067, 0xE0062, 0xE0065, 0xE006E, 0xE0067, 0xE007F}, 0, 7);
    /** Cancel tag U+E007F, which terminates a tag sequence. */
    public static boolean isEmojiTagEnd(int codePoint) {
        return codePoint == 0xE007F;
    }

    /** Tag characters U+E0020..U+E007E, which spell out a tag sequence. */
    public static boolean isEmojiTagSpec(int codePoint) {
        return codePoint >= 0xE0020 && codePoint <= 0xE007E;
    }

    /** True for blocks that hold variation selectors or combining marks. */
    public static boolean isEmojiDecorateBlock(Character.UnicodeBlock block) {
        if (block == null) {
            return false;
        }
        return block.equals(Character.UnicodeBlock.VARIATION_SELECTORS)
                || block.equals(Character.UnicodeBlock.VARIATION_SELECTORS_SUPPLEMENT)
                || block.equals(Character.UnicodeBlock.COMBINING_HALF_MARKS)
                || block.equals(Character.UnicodeBlock.COMBINING_MARKS_FOR_SYMBOLS)
                || block.equals(Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS)
                || block.equals(Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS_SUPPLEMENT);
    }

    /**
     * Splits data into emoji and remaining text.
     *
     * @param data         the text to scan
     * @param removeResult receives the text with emoji removed; may be null
     * @param emojiList    receives the emoji that were found; may be null
     */
    public static void pickAllEmoji(CharSequence data, StringBuilder removeResult, List<String> emojiList) {
        if (removeResult == null && emojiList == null) {
            return;
        }
        if (removeResult != null) {
            removeResult.delete(0, removeResult.length());
        }
        if (emojiList != null) {
            emojiList.clear();
        }
        if (data == null || data.length() == 0) {
            return;
        }
        StringBuilder emojiBuilder = new StringBuilder();
        int i = 0;
        int j;
        Character.UnicodeBlock block;
        while (i < data.length()) {
            if (i + 1 < data.length()) {
                // Decide by the type of the second char.
                block = Character.UnicodeBlock.of(data.charAt(i + 1));
                if (isEmojiDecorateBlock(block) || Character.UnicodeBlock.LOW_SURROGATES.equals(block)) {
                    if (i + 2 >= data.length()) {
                        // The pair ends the input; it is flushed after the loop.
                        emojiBuilder.append(data, i, i + 2);
                        break;
                    }
                    j = handleNationalFlag(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    j = handleHumanSkin(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    j = handleTagSequence(data, i, emojiBuilder, emojiList);
                    if (i != j) {
                        i = j;
                        continue;
                    }
                    // Ordinary emoji: keep it, then look for a ZWJ or trailing marks.
                    emojiBuilder.append(data, i, i + 2);
                    i = handleNextChar(data, i + 2, emojiBuilder, emojiList);
                    continue;
                }
            }
            recordEmoji(emojiBuilder, emojiList);
            int type = Character.getType(data.charAt(i));
            if (type == (int) Character.OTHER_SYMBOL) { // special symbols are always treated as emoji
                if (emojiList != null) {
                    emojiList.add(String.valueOf(data.charAt(i)));
                }
            } else if (removeResult != null) {
                removeResult.append(data.charAt(i));
            }
            i++;
        }
        recordEmoji(emojiBuilder, emojiList);
    }

    /** Consumes a ZWJ or a run of variation selectors / combining marks that starts at i. */
    private static int handleNextChar(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i >= data.length()) {
            return i;
        }
        char nextChar = data.charAt(i);
        if (nextChar == '\u200D') { // zero-width joiner: keep building the same emoji
            emojiBuilder.append(nextChar);
            return i + 1;
        }
        int j = i;
        Character.UnicodeBlock block;
        while (j < data.length()) {
            nextChar = data.charAt(j);
            block = Character.UnicodeBlock.of(nextChar);
            if (isEmojiDecorateBlock(block)) {
                emojiBuilder.append(nextChar);
                j++;
            } else {
                break;
            }
        }
        if (i != j) {
            // At least one mark was consumed, so the emoji is complete.
            recordEmoji(emojiBuilder, emojiList);
        }
        // Otherwise the emoji stays pending in emojiBuilder until the caller flushes it.
        return j;
    }
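
    /** Handles a flag (two regional indicators) that starts at i; returns the next index. */
    private static int handleNationalFlag(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        int codePoint = Character.codePointAt(data, i);
        if (isEmojiNationalFlag(codePoint)) { // flag handling
            recordEmoji(emojiBuilder, emojiList); // flush any pending emoji first
            if (i + 3 < data.length()) {
                codePoint = Character.codePointAt(data, i + 2);
                if (isEmojiNationalFlag(codePoint)) {
                    emojiBuilder.append(data, i, i + 4);
                    recordEmoji(emojiBuilder, emojiList);
                    // Both indicators (4 chars) are consumed. Returning here avoids
                    // also adding 2 below, which would skip the following 2 chars.
                    return i + 4;
                }
            }
            // Unpaired regional indicator: skip it; it is added to neither output.
            i = i + 2;
        }
        return i;
    }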

    /** Handles an emoji at i that is followed by a skin-tone modifier; returns the next index. */
    private static int handleHumanSkin(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i + 3 >= data.length()) {
            return i;
        }
        int codePoint = Character.codePointAt(data, i + 2);
        if (isEmojiSkinColor(codePoint)) { // skin-tone modifier
            emojiBuilder.append(data, i, i + 4);
            recordEmoji(emojiBuilder, emojiList);
            i = i + 4;
        }
        return i;
    }

    /** Handles a tag sequence whose base emoji starts at i; returns the next index. */
    private static int handleTagSequence(CharSequence data, int i, StringBuilder emojiBuilder, List<String> emojiList) {
        if (i + 3 >= data.length()) {
            return i;
        }
        int codePoint = Character.codePointAt(data, i + 2);
        if (isEmojiTagSpec(codePoint)) {
            emojiBuilder.append(data, i, i + 4);
            i = i + 4;
            while (i < data.length()) {
                codePoint = Character.codePointAt(data, i);
                if (isEmojiTagSpec(codePoint)) {
                    emojiBuilder.append(data, i, i + 2);
                    i = i + 2;
                } else if (isEmojiTagEnd(codePoint)) {
                    emojiBuilder.append(data, i, i + 2);
                    recordEmoji(emojiBuilder, emojiList);
                    i = i + 2;
                    break;
                } else { // malformed sequence: no cancel tag
                    break;
                }
            }
            // Empty after a successful record; otherwise this drops the unterminated sequence.
            emojiBuilder.delete(0, emojiBuilder.length());
        } else if (isEmojiTagEnd(codePoint)) {
            emojiBuilder.append(data, i, i + 4);
            recordEmoji(emojiBuilder, emojiList);
            i = i + 4;
        }
        return i;
    }

    /** Moves the pending emoji, if any, into emojiList and clears the builder. */
    private static void recordEmoji(StringBuilder builder, List<String> emojiList) {
        if (builder != null && builder.length() > 0) {
            if (emojiList != null) {
                emojiList.add(builder.toString());
            }
            builder.delete(0, builder.length());
        }
    }
}
2024.03.13