String.maketrans for English and Persian numbers

See the unidecode library, which transliterates Unicode text into its closest ASCII representation. It is very useful when numbers are entered in different scripts.

In Python 2:

>>> from unidecode import unidecode
>>> a = unidecode(u"۰۱۲۳۴۵۶۷۸۹")
>>> a
'0123456789'
>>> unidecode(a)
'0123456789'

In Python 3:

>>> from unidecode import unidecode
>>> a = unidecode("۰۱۲۳۴۵۶۷۸۹")
>>> a
'0123456789'
>>> …
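Since the title asks about maketrans, a standard-library alternative to unidecode can be sketched as follows. This is a minimal Python 3 sketch, assuming only the ten Persian digits (U+06F0 to U+06F9) need mapping; the names PERSIAN_TO_ASCII and normalize_digits are illustrative and do not come from the original answer.

```python
# Hypothetical helper names; only str.maketrans and str.translate are standard.
# Build the ten Persian digits from their code points (U+06F0 .. U+06F9).
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
ASCII_DIGITS = "0123456789"

# Translation table mapping each Persian digit to its ASCII counterpart.
PERSIAN_TO_ASCII = str.maketrans(PERSIAN_DIGITS, ASCII_DIGITS)


def normalize_digits(text):
    """Replace Persian digits with ASCII digits, leaving other characters alone."""
    return text.translate(PERSIAN_TO_ASCII)
```

Unlike unidecode, this touches only the digits listed in the table, so any other non-ASCII text passes through unchanged.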

Unicode Encode Error when writing pandas df to csv

Your DataFrame contains Unicode values. Files store bytes, so all Unicode text has to be encoded into bytes before it can be written to a file. You have to specify an encoding, such as utf-8. For example:

df.to_csv('path', header=True, index=False, encoding='utf-8')

If you don't specify an encoding, then the encoding used by df.to_csv …
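The point about files storing bytes can be checked with a small round trip. This is a sketch under assumptions: pandas is installed, and the sample data, the temporary path and the helper name write_utf8_csv are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def write_utf8_csv(df, path):
    """Write df to path, encoding all text as UTF-8 (hypothetical helper)."""
    df.to_csv(path, header=True, index=False, encoding="utf-8")


# Illustrative data containing non-ASCII characters.
df = pd.DataFrame({"name": ["caf\u00e9", "na\u00efve"]})
csv_path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_utf8_csv(df, csv_path)

# Read the file back as raw bytes to see what was actually stored.
with open(csv_path, "rb") as fh:
    raw = fh.read()
```

Reading the file in binary mode shows that the accented characters were stored as multi-byte UTF-8 sequences.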

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('

I got the same error, but in my case I was subtracting a dict key from a dict value. I fixed it by subtracting the dict value for the corresponding key from the other dict value instead.

cosine_sim = cosine_similarity(e_b - e_a, w - e_c)

Here I got the error because e_b, e_a and e_c are the embedding vectors for the words a, b and c respectively. I didn't know that 'w' is a string, …
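The fix described above, looking up the vector for a key before subtracting, might look like the sketch below. It assumes NumPy is installed and that the embeddings live in a dict mapping words to arrays; the toy vectors, the cosine_similarity definition and the name analogy_scores are all hypothetical, since the original answer does not show them.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (assumed definition)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy embeddings, invented for illustration only.
word_to_vec = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([1.0, 1.0]),
    "c": np.array([0.0, 1.0]),
    "d": np.array([0.0, 2.0]),
}
e_a, e_b, e_c = word_to_vec["a"], word_to_vec["b"], word_to_vec["c"]


def analogy_scores(word_to_vec, e_a, e_b, e_c):
    """Score every word w by comparing e_b - e_a with e_w - e_c."""
    scores = {}
    for w, e_w in word_to_vec.items():
        # w is the dict key (a str); e_w is the vector stored under that key.
        # Subtracting from w itself is what triggers the ufunc TypeError.
        diff = e_w - e_c
        if not np.any(diff):
            continue  # skip the zero vector to avoid dividing by zero
        scores[w] = cosine_similarity(e_b - e_a, diff)
    return scores


scores = analogy_scores(word_to_vec, e_a, e_b, e_c)
```

Iterating over items() rather than keys makes it harder to confuse the word with its vector.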

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128) [duplicate]

You need to encode Unicode explicitly before writing to a file; otherwise Python does it for you with the default ASCII codec. Pick an encoding and stick with it:

f.write(printinfo.encode('utf8') + '\n')

or use io.open() to create a file object that will encode for you as you write to the file:

import io
f = io.open(filename, …
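The io.open() approach can be sketched end to end as follows. The answer is cut off before its arguments, so the mode, the encoding argument, the temporary path and the helper name write_line_utf8 are assumptions for illustration, not the answer's own code.

```python
import io
import os
import tempfile


def write_line_utf8(path, text):
    """Write one line of text, letting the file object encode it as UTF-8."""
    # newline="\n" keeps the line ending identical on every platform.
    with io.open(path, "w", encoding="utf8", newline="\n") as f:
        f.write(text + "\n")


# Illustrative file and content (not from the original answer).
path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_line_utf8(path, "caf\u00e9")

# Inspect the stored bytes to confirm the encoding that was applied.
with open(path, "rb") as fh:
    written = fh.read()
```

Because the file object owns the encoding, the calling code only ever handles text and never calls encode() itself.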

UnicodeDecodeError: ('utf-8' codec) while reading a csv file [duplicate]

Known encoding

If you know the encoding of the file you want to read in, you can use

pd.read_csv('filename.txt', encoding='encoding')

These are the possible encodings: https://docs.python.org/3/library/codecs.html#standard-encodings

Unknown encoding

If you do not know the encoding, you can try chardet; however, this is not guaranteed to work. It is more of a guess.

import …
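For the unknown-encoding case, a standard-library fallback is to try a short list of candidate encodings in order. This is a different technique from chardet, not a reconstruction of the truncated snippet; the function name decode_with_fallback and the candidate list are assumptions. A successful decode only shows that the bytes are valid in that encoding, not that the guess is right.

```python
def decode_with_fallback(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_name) for the first candidate that decodes raw.

    Hypothetical helper: strict encodings go first, because latin-1 accepts
    any byte sequence and would otherwise always win.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")
```

The returned encoding name could then be passed on, for example as pd.read_csv('filename.txt', encoding=name).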

Removing unicode \u2026 like characters in a string in python2.7 [duplicate]

Python 2.x

>>> s
'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
>>> print(s.decode('unicode_escape').encode('ascii','ignore'))
This is some  text that has to be cleaned! it's annoying!

Python 3.x

>>> s = "This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!"
>>> s.encode('ascii', 'ignore')
b"This is some  text that has to be …
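In Python 3 the encode step returns bytes, so getting a cleaned str back needs one more decode. A minimal sketch, with strip_non_ascii as a hypothetical helper name:

```python
def strip_non_ascii(text):
    """Drop every character that has no ASCII representation."""
    # encode(..., "ignore") silently discards non-ASCII characters and
    # returns bytes; decode("ascii") turns the result back into str.
    return text.encode("ascii", "ignore").decode("ascii")


s = "This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!"
cleaned = strip_non_ascii(s)
```

Note that \u0027 is the ordinary ASCII apostrophe, so it survives, while the spaces on either side of the removed \u03c0 remain as a double space.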

Why does ENcoding a string result in a DEcoding error (UnicodeDecodeError)?

"你好".encode('utf-8')

encode converts a unicode object to a string object. But here you have invoked it on a string object (because you don't have the u prefix). So Python has to convert the string to a unicode object first. So it does the equivalent of

"你好".decode().encode('utf-8')

But the decode fails because the string isn't valid ASCII. …
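The answer describes Python 2 behaviour. In Python 3 the two directions are separate types, which makes the failing step easy to reproduce explicitly. The sketch below mimics the implicit ASCII decode by hand; decode_as_ascii is a hypothetical helper, not something Python calls internally.

```python
# Text (str) encodes to bytes; bytes decode back to text.
text = "\u4f60\u597d"  # the two characters from the example above
encoded = text.encode("utf-8")


def decode_as_ascii(raw):
    """Try the ASCII decode that Python 2 performed implicitly."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        # UTF-8 bytes above 0x7F are not valid ASCII, so this branch runs.
        return "failed: " + exc.reason


result = decode_as_ascii(encoded)
```

Decoding the same bytes with the encoding that produced them, encoded.decode("utf-8"), returns the original text without error.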
