Instead of blacklisting some elements, how about creating a whitelist of the characters you do wish to keep? This way you don’t need to worry about every new emoji being added.
String characterFilter = "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]";
String emotionless = aString.replaceAll(characterFilter,"");
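If you filter many strings, it may be worth compiling the pattern once, because String.replaceAll compiles its regex on every call. Below is a minimal sketch of that idea. The class name WhitelistFilter and the method strip are hypothetical names chosen for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names WhitelistFilter and strip are illustrative only.
final class WhitelistFilter {

    // Same whitelist as above, compiled once instead of on every replaceAll call.
    private static final Pattern NOT_WHITELISTED = Pattern.compile(
            "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]");

    private WhitelistFilter() {}

    // Removes every code point that falls outside the whitelisted categories.
    static String strip(String input) {
        return NOT_WHITELISTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("[" + strip("hello world 🔥") + "]");
        System.out.println("[" + strip("Привет, мир! 😀") + "]");
    }
}
```

Behaviour is the same as the one-liner above. Only the cost of compiling the regex moves out of the per-string path.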
So:
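- [\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s] is a character class listing everything you want to keep: letters (\\p{L}), marks (\\p{M}), numbers (\\p{N}), punctuation (\\p{P}), separators such as spaces (\\p{Z}), invisible format characters (\\p{Cf}), surrogates (\\p{Cs}), and whitespace such as tabs and newlines (\\s). \\p{L} covers letters from every script, such as Latin, Cyrillic, or Kanji. Java's regex engine treats a valid surrogate pair as a single code point. As a result, a character above U+FFFF is judged by its own category (an emoji is usually \\p{So}, "other symbol"), and \\p{Cs} only keeps unpaired surrogates.
- The ^ at the start of the character class negates it. The pattern therefore matches every character that is not in the whitelist, and replaceAll removes each match.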
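To see which whitelist entry keeps a particular character, you can test each piece of the character class against it separately. Below is a small sketch that does this. It needs Java 9+ for List.of. WhitelistCategory and entryFor are illustrative names, and a null result means the filter above would remove that character.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: reports which whitelist entry (if any) keeps a code point.
final class WhitelistCategory {

    // The individual pieces of the whitelist character class, in the same order.
    private static final List<String> ENTRIES = List.of(
            "\\p{L}", "\\p{M}", "\\p{N}", "\\p{P}", "\\p{Z}", "\\p{Cf}", "\\p{Cs}", "\\s");

    // Returns the first entry that matches the code point, or null if none does.
    static String entryFor(int codePoint) {
        String ch = new String(Character.toChars(codePoint));
        for (String entry : ENTRIES) {
            if (Pattern.matches(entry, ch)) {
                return entry;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String sample = "aЖ皆_# 。\n🔥";
        // Iterate by code point so the emoji is inspected as one character.
        sample.codePoints().forEach(cp ->
                System.out.printf("U+%04X -> %s%n", cp, entryFor(cp)));
    }
}
```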
Example:
String str = "hello world _# 皆さん、こんにちは！　私はジョンと申します。🔥";
System.out.print(str.replaceAll("[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]",""));
// Output:
// "hello world _# 皆さん、こんにちは！　私はジョンと申します。"
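The whitelist keeps or drops whole Unicode categories, so some edge cases may be unexpected. Currency, math, and other symbols (such as $, + and ©) are removed along with the emoji. Some invisible characters that emoji are built from are kept, however. VARIATION SELECTOR-16 (U+FE0F) is a mark (\\p{M}), and ZERO WIDTH JOINER (U+200D) is a format character (\\p{Cf}). The sketch below demonstrates this using a hypothetical WhitelistCaveats class, which prints the leftovers as code points so they are visible.

```java
// Illustrates what the whitelist keeps and drops beyond plain emoji.
final class WhitelistCaveats {

    // Same filter string as in the answer above.
    private static final String FILTER = "[^\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{Cf}\\p{Cs}\\s]";

    static String strip(String input) {
        return input.replaceAll(FILTER, "");
    }

    // Lists each remaining code point so invisible leftovers become visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Currency ($), math (+) and other symbols (©) are not whitelisted.
        System.out.println("[" + strip("Price: $5 + tax \u00A9") + "]");
        // HEAVY BLACK HEART (So) followed by VARIATION SELECTOR-16 (Mn): the selector survives.
        System.out.println(describe(strip("\u2764\uFE0F")));
        // MAN + ZERO WIDTH JOINER (Cf) + WOMAN: the joiner survives.
        System.out.println(describe(strip("\uD83D\uDC68\u200D\uD83D\uDC69")));
    }
}
```

Whether these leftovers matter depends on your use case. They are usually invisible, but they still count toward String.length().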
If you need more information, check out the Java documentation for regexes.