utf-8 – Page 39 – Tarik Billa

Is it possible to force Excel to recognize UTF-8 CSV files automatically?

September 11, 2022 by Tarik

Alex is correct, but as you have to export to CSV, you can give the users this advice when opening the CSV files:

1. Save the exported file as a CSV
2. Open Excel
3. Import the data using Data –> Import External Data –> Import Data
4. Select the file type of “csv” and browse to your file
5. In the …

Why does modern Perl avoid UTF-8 by default?

September 11, 2022 by Tarik

𝙎𝙞𝙢𝙥𝙡𝙚𝙨𝙩 ℞: 𝟕 𝘿𝙞𝙨𝙘𝙧𝙚𝙩𝙚 𝙍𝙚𝙘𝙤𝙢𝙢𝙚𝙣𝙙𝙖𝙩𝙞𝙤𝙣𝙨

Set your PERL_UNICODE envariable to AS. This makes all Perl scripts decode @ARGV as UTF‑8 strings, and sets the encoding of all three of stdin, stdout, and stderr to UTF‑8. Both these are global effects, not lexical ones. At the top of your source file (program, module, library, doohickey), prominently …

Best way to convert text files between character sets?

September 11, 2022 by Tarik

Stand-alone utility approach

    iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt

    -f ENCODING  the encoding of the input
    -t ENCODING  the encoding of the output

You don’t have to specify either of these arguments. They will default to your current locale, which is usually UTF-8.
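For cases where a script is more convenient than a command-line tool, the same conversion can be sketched in Python. This is a minimal illustration, not a replacement for iconv: the function name convert_encoding and its default encodings are assumptions chosen to mirror the command above.

```python
def convert_encoding(src, dst, from_enc="iso-8859-1", to_enc="utf-8"):
    """Re-encode a text file, roughly like `iconv -f from_enc -t to_enc src > dst`.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    # newline="" turns off newline translation, so line endings pass through unchanged.
    with open(src, "r", encoding=from_enc, newline="") as fin, \
         open(dst, "w", encoding=to_enc, newline="") as fout:
        for line in fin:
            fout.write(line)
```

Calling convert_encoding("in.txt", "out.txt") would read in.txt as ISO-8859-1 and write out.txt as UTF-8. Unlike iconv, this sketch does not fall back to the current locale; both encodings are always explicit.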

UTF-8, UTF-16, and UTF-32

September 11, 2022 by Tarik

UTF-8 has an advantage in the case where ASCII characters represent the majority of characters in a block of text, because UTF-8 encodes these into 8 bits (like ASCII). It is also advantageous in that a UTF-8 file containing only ASCII characters has the same encoding as an ASCII file. UTF-16 is better where ASCII …
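The size trade-off can be checked directly by encoding sample strings and counting bytes. The sketch below is only an illustration; the helper name encoded_sizes and the sample strings are assumptions, and the little-endian codec variants are used so that no BOM is counted.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of text under each UTF encoding."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

ascii_text = "hello"
cjk_text = "\u65e5\u672c\u8a9e"  # three CJK characters from the Basic Multilingual Plane

print(encoded_sizes(ascii_text))  # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes(cjk_text))    # {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}

# An ASCII-only string has identical bytes in ASCII and UTF-8.
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))  # True
```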

Excel to CSV with UTF8 encoding [closed]

September 10, 2022 by Tarik

A simple workaround is to use Google Spreadsheet. Paste (values only if you have complex formulas) or import the sheet then download CSV. I just tried a few characters and it works rather well. NOTE: Google Sheets does have limitations when importing. See here. NOTE: Be careful of sensitive data with Google Sheets. EDIT: Another …

What is the difference between UTF-8 and Unicode?

September 9, 2022 by Tarik

To expand on the answers others have given: We’ve got lots of languages with lots of characters that computers should ideally display. Unicode assigns each character a unique number, or code point. Computers deal with such numbers as bytes… skipping a bit of history here and ignoring memory addressing issues, 8-bit computers would treat an …
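The distinction between a code point (the number Unicode assigns) and its UTF-8 bytes (one way of storing that number) can be seen in a few lines of Python. The helper name describe is an assumption made for this sketch.

```python
def describe(ch):
    """Return (code point label, UTF-8 bytes as hex) for a single character."""
    code_point = f"U+{ord(ch):04X}"       # the number Unicode assigns to the character
    utf8_hex = ch.encode("utf-8").hex(" ")  # the bytes UTF-8 uses to store that number
    return code_point, utf8_hex

print(describe("A"))       # ('U+0041', '41')
print(describe("\u00e9"))  # ('U+00E9', 'c3 a9')
print(describe("\u20ac"))  # ('U+20AC', 'e2 82 ac')
```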

Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence

September 8, 2022 by Tarik

Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

    >>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
    >>> json_string
    b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
    >>> print(json_string.decode())
    "ברי צקלה"

If you are writing to a file, just use json.dump() and leave it to the file object to encode:

    with open('filename', 'w', encoding='utf8') as json_file:
        json.dump("ברי צקלה", json_file, …
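A self-contained version of both approaches might look like the following. The function names dumps_utf8 and dump_to_file are hypothetical wrappers introduced for this sketch; only json.dumps(), json.dump(), and the ensure_ascii parameter come from the standard library.

```python
import json

def dumps_utf8(obj):
    """Serialize obj to JSON and return UTF-8 bytes without \\u escapes."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")

def dump_to_file(obj, path):
    """Write obj as JSON to path; the file object performs the UTF-8 encoding."""
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(obj, json_file, ensure_ascii=False)

text = "ברי צקלה"
print(json.dumps(text))           # default: non-ASCII is written as \uXXXX escapes
print(dumps_utf8(text).decode())  # "ברי צקלה"
```

Both forms are valid JSON and decode to the same string; ensure_ascii=False only changes how the non-ASCII characters are written out.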

What’s the difference between UTF-8 and UTF-8 without BOM?

September 6, 2022 by Tarik

The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to more reliably guess a file as being encoded in UTF-8. Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is …
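In Python, the three BOM bytes are available as codecs.BOM_UTF8, and the "utf-8-sig" codec writes the BOM on encode and skips it on decode. The sketch below shows the difference; the helper name strip_utf8_bom is an assumption for illustration.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

with_bom = "hi".encode("utf-8-sig")  # BOM followed by the text
without_bom = "hi".encode("utf-8")   # text only

print(with_bom)                      # b'\xef\xbb\xbfhi'
print(without_bom)                   # b'hi'
print(strip_utf8_bom(with_bom) == without_bom)  # True

# Decoding BOM-prefixed bytes as plain "utf-8" keeps U+FEFF as the first character;
# decoding as "utf-8-sig" drops it.
print(repr(with_bom.decode("utf-8")))      # '\ufeffhi'
print(repr(with_bom.decode("utf-8-sig")))  # 'hi'
```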

What’s the difference between utf8_general_ci and utf8_unicode_ci?

August 30, 2022 by Tarik

For those people still arriving at this question in 2020 or later, there are newer options that may be better than both of these. For example, utf8_unicode_520_ci. All these collations are for the UTF-8 character encoding. The differences are in how text is sorted and compared. _unicode_ci and _general_ci are two different sets of rules …

UTF-8 all the way through

August 29, 2022 by Tarik

Data Storage: Specify the utf8mb4 character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use utf8mb4 encoding if a utf8mb4_* collation is specified (without any explicit character set). In older versions of MySQL (< 5.5.3), you’ll …