How to determine whether a string is English or Arabic?

Here is a simple check that I just tried:

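    public static boolean isProbablyArabic(String s) {
        // Walk the string by code point rather than by char,
        // so surrogate pairs are handled correctly.
        for (int i = 0; i < s.length();) {
            int c = s.codePointAt(i);
            // Any code point in U+0600..U+06E0 counts as Arabic.
            if (c >= 0x0600 && c <= 0x06E0)
                return true;
            i += Character.charCount(c);
        }
        return false;
    }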

It reports the text as Arabic if and only if at least one code point in the Arabic range is found, so a mostly English string containing a single Arabic character also counts as Arabic. You can refine this logic to better suit your needs.

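The range U+0600 to U+06E0 covers most of the Unicode Arabic block, which spans U+0600 to U+06FF (see the Unicode tables). It leaves out the last part of the block, including the extended Arabic-Indic digits at U+06F0 to U+06F9.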
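One possible refinement, shown only as a rough sketch: instead of returning on the first hit, count the letters of each script and compare. The version below uses the standard `Character.UnicodeScript` API (Java 7 and later) rather than a hard-coded range. The class name `ScriptGuess`, the method name `isMostlyArabic`, and the "more Arabic letters than Latin letters" rule are my own assumptions for illustration, not part of the original check.

```java
public class ScriptGuess {
    // Hypothetical helper: true when Arabic-script letters outnumber
    // Latin-script letters. Digits, spaces and punctuation are ignored.
    public static boolean isMostlyArabic(String s) {
        int arabic = 0;
        int latin = 0;
        for (int i = 0; i < s.length();) {
            int c = s.codePointAt(i);
            if (Character.isLetter(c)) {
                Character.UnicodeScript script = Character.UnicodeScript.of(c);
                if (script == Character.UnicodeScript.ARABIC) {
                    arabic++;
                } else if (script == Character.UnicodeScript.LATIN) {
                    latin++;
                }
            }
            i += Character.charCount(c);
        }
        // Assumed rule: a simple majority; ties and empty input return false.
        return arabic > latin;
    }
}
```

Note that this identifies the script, not the language: Persian and Urdu text is also written in Arabic script, and French or German text is Latin script just like English.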
