mojibake – Tarik Billa

How to replace ï¿½ in a string

September 8, 2023 by Tarik

That’s the Unicode Replacement Character, \uFFFD. Something like this should work: String strImport = "For some reason my �double quotes� were lost."; strImport = strImport.replaceAll("\uFFFD", "\"");
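To see where the ï¿½ in the title comes from, here is a minimal Python sketch (not part of the original answer). It assumes the usual cause: a decoder substituted U+FFFD for bytes it could not decode, and the resulting text was later re-read as Latin-1. The sample byte string is made up for illustration.

```python
# Hypothetical input: CP-1252 curly quotes (0x93, 0x94) are not valid UTF-8.
raw = b"my \x93double quotes\x94 were lost"

# A lenient UTF-8 decoder substitutes U+FFFD for each undecodable byte.
decoded = raw.decode("utf-8", errors="replace")

# U+FFFD is three bytes in UTF-8; read back as Latin-1 they show up as 'ï¿½'.
as_latin1 = "\ufffd".encode("utf-8").decode("latin-1")

# Same idea as the Java replaceAll: swap each U+FFFD for a plain double quote.
repaired = decoded.replace("\ufffd", '"')
```

Note that the replacement is a guess: once a byte has become U+FFFD, the original character cannot be recovered from the string alone.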

Facebook JSON badly encoded

April 24, 2023 by Tarik

I can indeed confirm that the Facebook download data is incorrectly encoded; a Mojibake. The original data is UTF-8 encoded but was decoded as Latin-1 instead. I’ll make sure to file a bug report. What this means is that any non-ASCII character in the string data was encoded twice. First to UTF-8, and then the … Read more
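The double encoding described above can be undone by reversing the wrong step. This is a minimal Python sketch, assuming the text really was UTF-8 decoded as Latin-1; the function name `fix_double_encoded` and the sample string are illustrative only.

```python
def fix_double_encoded(text: str) -> str:
    """Undo 'UTF-8 bytes decoded as Latin-1' mojibake.

    Encoding as Latin-1 recovers the original bytes, which are then
    decoded as the UTF-8 they were all along.
    """
    return text.encode("latin-1").decode("utf-8")


broken = "caf\u00c3\u00a9"  # displays as 'cafÃ©'
fixed = fix_double_encoded(broken)  # 'café'
```

If the input contains characters outside Latin-1, or was never double encoded, the call raises `UnicodeEncodeError` or `UnicodeDecodeError`, so it is safer to apply it only to data known to be affected.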

How to convert these strange characters? (Ã«, Ã, Ã¬, Ã¹, Ã)

March 29, 2023 by Tarik

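These are UTF-8 encoded characters. Use utf8_decode() to convert them to normal ISO-8859-1 characters. Note that utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding() with a target of 'ISO-8859-1' and a source of 'UTF-8' performs the same conversion.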

Getting â€™ instead of an apostrophe (') in PHP

March 6, 2023 by Tarik

To convert to HTML entities: <?php echo mb_convert_encoding( file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'), "HTML-ENTITIES", "UTF-8" ); ?> See docs for mb_convert_encoding for more encoding options.
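For comparison, a rough Python analogue of the same idea (not the PHP call itself): every non-ASCII character is replaced by a numeric character reference, so the output no longer depends on the page's declared encoding. The helper name `to_numeric_entities` is hypothetical, and the exact entity format may differ from what PHP emits.

```python
def to_numeric_entities(text: str) -> str:
    """Replace each non-ASCII character with a decimal reference (&#NNNN;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = to_numeric_entities("It\u2019s")  # 'It&#8217;s'
```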

How do I remove ï»¿ from the beginning of a file?

November 13, 2022 by Tarik

Three words for you: Byte Order Mark (BOM). That’s the representation of the UTF-8 BOM in ISO-8859-1. You have to tell your editor not to use BOMs, or use a different editor to strip them out. To automate the BOM’s removal you can use awk as shown in this question. As another answer says, the … Read more
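As an alternative to the awk approach, the BOM can also be stripped in a few lines of Python. This is a sketch only; the function name `strip_utf8_bom` is made up for illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# The three BOM bytes read as ISO-8859-1 are exactly the 'ï»¿' from the title.
bom_as_latin1 = codecs.BOM_UTF8.decode("latin-1")

# When decoding to text, the 'utf-8-sig' codec drops the BOM automatically.
text = b"\xef\xbb\xbfhello".decode("utf-8-sig")
```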

"â€™" showing on page instead of " ' "

November 11, 2022 by Tarik

So what’s the problem? It’s a ’ (RIGHT SINGLE QUOTATION MARK – U+2019) character which is being decoded as CP-1252 instead of UTF-8. If you check the encodings table, you will see that in UTF-8 this character is composed of the bytes 0xE2, 0x80 and 0x99. If you check the CP-1252 code page layout, then you’ll … Read more
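The byte-level explanation can be checked directly. The following Python sketch reproduces the mojibake and then reverses it, assuming the text was UTF-8 misread as CP-1252; the variable names are illustrative.

```python
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# In UTF-8 the character is the three bytes E2 80 99.
utf8_bytes = quote.encode("utf-8")

# Decoding those bytes as CP-1252 yields three separate characters: 'â€™'.
mojibake = utf8_bytes.decode("cp1252")

# Reversing the wrong step restores the original character.
restored = mojibake.encode("cp1252").decode("utf-8")
```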