Get CSV Data from Clipboard (pasted from Excel) that contains accented characters

Excel puts the text on the clipboard in Unicode. The reason you get a square when you read the string as ANSI is that your system's ANSI code page has no representation for that character. Read it as Unicode instead. If you're going to be dealing with localization issues, …
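A minimal Python sketch of the point above. It assumes the clipboard contents have already been read as a Unicode str (how you fetch them is platform-specific and not shown) and that the data is tab-separated, which is what Excel typically places in its plain-text clipboard format. parse_clipboard_text and ansi_roundtrip are hypothetical helpers; the second only simulates what forcing the text through an ANSI code page does, with cp1252 (Western European) standing in for "your system's ANSI code page".

```python
import csv
import io


def parse_clipboard_text(text: str, delimiter: str = "\t") -> list[list[str]]:
    """Split clipboard text (already a Unicode str) into rows of cells."""
    # csv handles quoted cells and the \r\n line endings used on Windows
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def ansi_roundtrip(text: str, codepage: str = "cp1252") -> str:
    """Simulate forcing Unicode text through an ANSI code page.

    Characters the code page cannot represent are replaced with '?',
    the same kind of loss that shows up as boxes or question marks.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


clipboard_text = "Name\tCity\r\nZoë\tSão Paulo\r\nŐrs\tŁódź\r\n"
print(parse_clipboard_text(clipboard_text))  # accents intact
print(ansi_roundtrip("Őrs Łódź"))            # ?rs ?ód?
```

Keeping the text as str from the clipboard through parsing avoids the lossy step entirely; the round-trip function exists only to show where the damage comes from.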

Replacing diacritics in JavaScript

In modern browsers and Node.js you can use Unicode normalization to decompose those characters, followed by a filtering regex:

    str.normalize('NFKD').replace(/[^\w]/g, '')

If you want to allow characters such as whitespace, dots, dashes, etc., extend the regex to allow them. Put the dash last in the character class; otherwise ".-_" is read as a range from "." to "_", which also lets through digits, uppercase letters and punctuation such as ":", "@" and "[":

    str.normalize('NFKD').replace(/[^\w\s._\/-]/g, '')

Example:

    var str = "áàâäãéèëêíìïîóòöôõúùüûñçăşţ";
    var asciiStr = str.normalize('NFKD').replace(/[^\w]/g, '');
    console.info(str, asciiStr);

NOTES: This method does …
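The same NFKD-then-filter technique in Python, as a sketch. JavaScript's \w matches only ASCII [A-Za-z0-9_], while Python's \w is Unicode-aware, so the class is spelled out explicitly to get the same behaviour. strip_diacritics and its keep parameter are hypothetical names; keep plays the role of the extended regex above.

```python
import re
import unicodedata


def strip_diacritics(text: str, keep: str = "") -> str:
    """Decompose with NFKD, then drop everything except ASCII word chars and `keep`."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Explicit ASCII class mirrors JavaScript's \w; Python's \w would also keep 'ø'.
    # re.escape makes '-', '.', ' ' etc. literal inside the character class.
    pattern = "[^A-Za-z0-9_" + re.escape(keep) + "]"
    return re.sub(pattern, "", decomposed)


print(strip_diacritics("áàâäãéèëêíìïîóòöôõúùüûñçăşţ"))
print(strip_diacritics("Crème brûlée-à-la", keep=" -"))
```

One caveat of this approach: letters with no decomposition, such as ø or ß, are removed rather than transliterated, so "Søren" becomes "Sren".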

Test if string contains only letters (a-z + é ü ö ê å ø etc.)

I don't know the actual reason for doing this, but if you want to use it as a pre-check for, say, login names or user nicknames, I'd suggest you enter the allowed characters yourself rather than accepting every 'alpha' character you'll find in Unicode, because you probably won't see any visual difference in the following …
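A minimal Python sketch of the whitelist advice. ALLOWED_LETTERS and is_allowed_name are hypothetical; the accented letters are simply the ones from the question title, and you would substitute your own set. Input is normalized to NFC first so that "é" typed as "e" plus a combining accent matches the precomposed entry. The look-alike check is an illustration of why a broad "is it a letter?" test can be too permissive.

```python
import string
import unicodedata

# Hypothetical whitelist: ASCII letters plus the accented letters named in the
# question title (é ü ö ê å ø), in both cases. Adjust to your own needs.
ALLOWED_LETTERS = frozenset(string.ascii_letters + "éüöêåøÉÜÖÊÅØ")


def is_allowed_name(name: str) -> bool:
    """True if name is non-empty and every character is in ALLOWED_LETTERS."""
    # NFC turns 'e' + U+0301 into the single precomposed 'é'
    normalized = unicodedata.normalize("NFC", name)
    return bool(normalized) and all(ch in ALLOWED_LETTERS for ch in normalized)


# This string uses CYRILLIC SMALL LETTER O (U+043E) in place of 'o': it looks
# the same and str.isalpha() accepts it, but the whitelist rejects it.
lookalike = "S\u043eren"
print(lookalike.isalpha(), is_allowed_name(lookalike))
print(is_allowed_name("Søren"), is_allowed_name("Jose\u0301"))
```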

Java string searching ignoring accents

Make use of java.text.Normalizer and a shot of regex to get rid of the diacritics:

    import java.text.Normalizer;

    public static String removeDiacriticalMarks(String string) {
        // NFD splits "á" into "a" + U+0301; the regex then drops the combining marks
        return Normalizer.normalize(string, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

Which you can use as follows:

    String value = "Joáo";
    String comparisonMaterial = removeDiacriticalMarks(value); // Joao
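The same approach in Python, extended into the accent-insensitive search the title asks about. remove_diacritical_marks mirrors the Java method, using the Combining Diacritical Marks block (U+0300..U+036F) that \p{InCombiningDiacriticalMarks} refers to; contains_ignoring_accents is a hypothetical helper that strips both sides before a plain substring test. It is a sketch: it does not fold case, and marks outside that block are kept.

```python
import re
import unicodedata

# U+0300..U+036F is the Combining Diacritical Marks block, the same block
# that Java's \p{InCombiningDiacriticalMarks} matches.
_COMBINING_MARKS = re.compile(r"[\u0300-\u036f]+")


def remove_diacritical_marks(text: str) -> str:
    """NFD-decompose, then delete the combining marks (like the Java version)."""
    return _COMBINING_MARKS.sub("", unicodedata.normalize("NFD", text))


def contains_ignoring_accents(haystack: str, needle: str) -> bool:
    """Substring search where 'Sao' matches 'São' and vice versa."""
    return remove_diacritical_marks(needle) in remove_diacritical_marks(haystack)


print(remove_diacritical_marks("Joáo"))
print(contains_ignoring_accents("Visiting São Paulo", "Sao Paulo"))
```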

How to remove accents from values in columns?

The pandas method is to use the vectorised str.normalize combined with str.encode and str.decode:

    In [60]: df['Country'].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
    Out[60]:
    0    Aland Islands
    1    Aland Islands
    2    Albania
    3    Albania
    4    Albania
    Name: Country, dtype: object

So to do this for all str dtypes (use the builtin object here; the np.object alias was removed in NumPy 1.24):

    In [64]: cols = df.select_dtypes(include=[object]).columns
        ...: df[cols] = df[cols].apply(lambda x: x.str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8'))
        ...: df
    …
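If pandas isn't available, the same chain can be applied element by element with the standard library. This sketch assumes the table is a list of dicts (for example, as produced by csv.DictReader); to_ascii and strip_accents_in_rows are hypothetical names, and the isinstance check stands in for select_dtypes by leaving non-string values alone.

```python
import unicodedata


def to_ascii(value: str) -> str:
    """Element-wise version of
    .str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')."""
    return (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", errors="ignore")  # drops the now-separate combining marks
        .decode("utf-8")
    )


def strip_accents_in_rows(rows: list[dict]) -> list[dict]:
    """Apply to_ascii to every str value; numbers, None, etc. pass through unchanged."""
    return [
        {key: to_ascii(val) if isinstance(val, str) else val for key, val in row.items()}
        for row in rows
    ]


rows = [
    {"Country": "Åland Islands", "Id": 0},
    {"Country": "Albania", "Id": 1},
]
print(strip_accents_in_rows(rows))
```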

How to protect against diacritics such as Zalgo text

"Is there even a limit?!" Not intrinsically in Unicode. There is the concept of a "Stream-Safe" format in UAX #15 that sets a limit of 30 combiners... Unicode strings in general are not guaranteed to be Stream-Safe, but this could certainly be taken as a sign that the Unicode Consortium doesn't intend to standardise new characters that would …
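As an illustration of the Stream-Safe idea, here is a Python sketch that caps runs of combining characters. It approximates UAX #15's "non-starter" with unicodedata.combining(ch) != 0 (a non-zero canonical combining class). Unlike the Stream-Safe Text Process, which inserts U+034F COMBINING GRAPHEME JOINER to break an over-long run, this sketch simply drops the excess marks. limit_combining_marks and max_run are hypothetical; for display purposes you would probably choose a limit far below 30.

```python
import unicodedata


def limit_combining_marks(text: str, max_run: int = 30) -> str:
    """Keep at most max_run consecutive combining characters after each base character."""
    kept = []
    run = 0
    # Decompose first so precomposed letters such as 'é' count their accent too
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):  # non-zero canonical combining class
            run += 1
            if run > max_run:
                continue  # drop marks beyond the limit
        else:
            run = 0
        kept.append(ch)
    # Recompose so ordinary accented letters come back in their usual form
    return unicodedata.normalize("NFC", "".join(kept))


print(limit_combining_marks("a" + "\u0301" * 10, max_run=1))
print(limit_combining_marks("café"))
```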

Remove accents from String

java.text.Normalizer is available on Android (on recent versions, at least), so you can use it. For reference, here is how to use Normalizer:

    import java.text.Normalizer;

    string = Normalizer.normalize(string, Normalizer.Form.NFD);
    string = string.replaceAll("[^\\p{ASCII}]", "");
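The [^\p{ASCII}] step is broader than removing accents: it deletes every character outside U+0000..U+007F, including letters that have no decomposition (ß, ø) and symbols such as €. A Python sketch of the same two steps makes the difference visible; ascii_only is a hypothetical name. If you only want to drop the accents, target the combining marks instead, as the \p{InCombiningDiacriticalMarks} regex does.

```python
import re
import unicodedata


def ascii_only(text: str) -> str:
    """Python version of the two Android lines: NFD, then delete all non-ASCII."""
    decomposed = unicodedata.normalize("NFD", text)
    # [^\x00-\x7f] is the Python spelling of Java's [^\p{ASCII}]
    return re.sub(r"[^\x00-\x7f]", "", decomposed)


print(ascii_only("Crème Brûlée"))  # accents removed
print(ascii_only("Straße €5"))     # ß and € disappear entirely
```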
