Using JavaScript to check whether a string contains Japanese characters (including kanji)

Question

Check whether the following works for you. I found a website that seems to list all the Unicode characters that might be used in Japanese text.

The corresponding regex (for a single character) would be:

/[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]/
  -------------_____________-------------_____________-------------_____________
   Punctuation   Hiragana     Katakana    Full-width       CJK      CJK Ext. A
                                            Roman/      (Common &      (Rare)    
                                          Half-width    Uncommon)
                                           Katakana
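
As a minimal sketch of how this character class might be used, the snippet below wraps it in two helpers. The names containsJapanese and isAllJapanese are hypothetical and only for illustration; the first checks for at least one matching character, the second requires every character to match.

```javascript
// Single-character class from the answer above.
const JAPANESE_CHAR =
  /[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]/;

// Same class, anchored and repeated, so the whole string must match.
const ALL_JAPANESE =
  /^[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]+$/;

// Hypothetical helper: true if at least one character is in the ranges.
function containsJapanese(text) {
  return JAPANESE_CHAR.test(text);
}

// Hypothetical helper: true only if every character is in the ranges.
function isAllJapanese(text) {
  return ALL_JAPANESE.test(text);
}
```

Note that the CJK ranges are shared with Chinese, so a string made up only of Chinese characters would also pass both checks.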

The ranges are (as quoted from the site):

3000 - 303f: Japanese-style punctuation
3040 - 309f: Hiragana
30a0 - 30ff: Katakana
ff00 - ff9f: Full-width Roman characters and half-width Katakana
4e00 - 9faf: CJK unified ideographs – Common and uncommon Kanji
3400 - 4dbf: CJK unified ideographs Extension A – Rare Kanji

I have changed the ranges a bit:

I have narrowed the range for full-width Roman characters and half-width Katakana from ff00 - ffef to ff00 - ff9f. The code points from ffa0 - ffdc contain half-width Hangul characters, which are not what you want. You may want to re-add the code points from ffe0 - ffef, but they are mostly half-width punctuation or full-width currency symbols.
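
To illustrate the effect of the narrowed range, here is a small sketch comparing three variants of that one sub-range. The variable names are made up for this example, and the sample characters (half-width Hangul kiyeok U+FFA1, half-width Katakana ka U+FF76, full-width yen sign U+FFE5) are just convenient test points.

```javascript
// Narrowed range used in the regex above.
const narrow = /[\uff00-\uff9f]/;

// Original range from the site, which also covers half-width Hangul.
const wide = /[\uff00-\uffef]/;

// Narrowed range with ffe0 - ffef re-added, still skipping Hangul.
const withSymbols = /[\uff00-\uff9f\uffe0-\uffef]/;

const halfWidthHangul = "\uffa1";   // HALFWIDTH HANGUL LETTER KIYEOK
const halfWidthKatakana = "\uff76"; // HALFWIDTH KATAKANA LETTER KA
const fullWidthYen = "\uffe5";      // FULLWIDTH YEN SIGN

console.log(narrow.test(halfWidthHangul));      // false
console.log(wide.test(halfWidthHangul));        // true
console.log(narrow.test(halfWidthKatakana));    // true
console.log(narrow.test(fullWidthYen));         // false
console.log(withSymbols.test(fullWidthYen));    // true
console.log(withSymbols.test(halfWidthHangul)); // false
```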

You can check the site and remove any range that you don’t want or that you are sure will not appear in your input.
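
One way to make ranges easy to drop is to build the character class from a named table. This is only a sketch: the RANGES object and the buildJapaneseRegex function are hypothetical names introduced here, and the code points are the ones listed above.

```javascript
// Code point ranges from the list above, keyed by illustrative names.
const RANGES = {
  punctuation: ["3000", "303f"],
  hiragana: ["3040", "309f"],
  katakana: ["30a0", "30ff"],
  fullWidthAndHalfWidth: ["ff00", "ff9f"],
  kanji: ["4e00", "9faf"],
  kanjiExtensionA: ["3400", "4dbf"],
};

// Hypothetical helper: builds a single-character regex from the chosen ranges.
function buildJapaneseRegex(names) {
  const body = names
    .map((name) => {
      if (!(name in RANGES)) {
        throw new Error(`Unknown range: ${name}`);
      }
      const [start, end] = RANGES[name];
      // Produces text such as \u3040-\u309f inside the character class.
      return `\\u${start}-\\u${end}`;
    })
    .join("");
  return new RegExp(`[${body}]`);
}

// Example: match kana only, leaving out kanji and punctuation.
const kanaOnly = buildJapaneseRegex(["hiragana", "katakana"]);
console.log(kanaOnly.test("ひらがな")); // true
console.log(kanaOnly.test("漢字"));     // false
```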
