regex – detect cjk characters in a sentence using ruby! – Education Career Blog

$str = "This is a string containing 中文 characters. Some more characters - 中华人民共和国 ";

How do I detect whether there are Chinese characters in this string? I have no idea how to do it. Any clues?


Regular expressions are not the way to go here. Instead, use code similar to the following (disclaimer: I’m not a Ruby programmer):

# coding: utf-8
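str = "This is a string containing 中文 characters. Some more characters - 中华人民共和国 "

# Walk the string one character at a time and compare each code point
# against the CJK Unified Ideographs block (U+4E00..U+9FFF).
str.each_char do |c|
  if c.ord.between?(0x4E00, 0x9FFF)
    # Found a Chinese character; process it somehow.
    puts c
  end
end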

You’re essentially checking whether each character falls in the Unicode range of common Chinese characters (U+4E00 to U+9FFF). This is not the complete range of hanzi (Chinese characters). If you need to detect rare or historic characters, you’ll simply have to add the ranges listed here to the boolean check.
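To make the "add more ranges" step concrete, here is a minimal sketch of how the check could be extended. The names HAN_RANGES, han_char?, han_chars and contains_han? are made up for illustration, and the extra ranges are the Unicode blocks I know of for Extension A, Extension B and the compatibility ideographs, not necessarily the same list the answer refers to. Treat it as a starting point rather than a complete hanzi detector.

```ruby
# coding: utf-8

# Hypothetical list of code point ranges to treat as Chinese characters.
# Only the first range comes from the answer above; the others are
# assumptions based on the Unicode block definitions.
HAN_RANGES = [
  0x4E00..0x9FFF,    # CJK Unified Ideographs (the common range)
  0x3400..0x4DBF,    # CJK Unified Ideographs Extension A
  0x20000..0x2A6DF,  # CJK Unified Ideographs Extension B
  0xF900..0xFAFF     # CJK Compatibility Ideographs
].freeze

# True if the single character c falls inside any of the ranges.
def han_char?(c)
  code = c.ord
  HAN_RANGES.any? { |range| range.cover?(code) }
end

# Returns an array of the matching characters, in order of appearance.
def han_chars(str)
  str.each_char.select { |c| han_char?(c) }
end

# True as soon as one matching character is found.
def contains_han?(str)
  str.each_char.any? { |c| han_char?(c) }
end
```

With these helpers, contains_han?(str) answers the original yes/no question, and han_chars(str) gives you the characters themselves if you need to process them. Adding another block is just one more line in HAN_RANGES.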
