How to replace � in a string
That’s the Unicode Replacement Character, \uFFFD. Something like this should work: String strImport = "For some reason my �double quotes� were lost."; strImport = strImport.replaceAll("\uFFFD", "\"");
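For comparison, here is a minimal Python sketch of the same replacement. The helper name `restore_quotes` is an assumption made for illustration. U+FFFD only marks bytes that could not be decoded, so putting a double quote in its place is a guess about what was lost.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, rendered as the "unknown character" glyph


def restore_quotes(text: str) -> str:
    """Replace every U+FFFD with a plain double quote (a guess at the lost character)."""
    return text.replace(REPLACEMENT_CHAR, '"')


sample = "For some reason my \ufffddouble quotes\ufffd were lost."
print(restore_quotes(sample))
```

If the original bytes are still available, decoding them again with the correct encoding is usually a better fix than patching the replacement characters afterwards.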
I can indeed confirm that the Facebook download data is incorrectly encoded: it is mojibake. The original data is UTF-8 encoded but was decoded as Latin-1 instead. I’ll make sure to file a bug report. What this means is that any non-ASCII character in the string data was encoded twice. First to UTF-8, and then the …
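The "UTF-8 decoded as Latin-1" damage described above can be reproduced and reversed in a few lines. This is a sketch that assumes the damage is exactly that one wrong decode. The function name `fix_mojibake` and the sample word are made up for illustration.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as Latin-1'.

    Encoding back to Latin-1 recovers the original bytes, which are then
    decoded as the UTF-8 they always were.
    """
    return text.encode("latin-1").decode("utf-8")


original = "caf\u00e9"                                 # "café"
broken = original.encode("utf-8").decode("latin-1")    # the wrong decode: "cafÃ©"

assert broken == "caf\u00c3\u00a9"
assert fix_mojibake(broken) == original
```

Text that was not damaged this way may raise `UnicodeEncodeError` or `UnicodeDecodeError`, so it is worth catching those errors and leaving such strings untouched.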
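These are UTF-8 encoded characters. Use utf8_decode() to convert them to normal ISO-8859-1 characters. (Note that utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding() with "ISO-8859-1" as the target and "UTF-8" as the source does the same job.)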
To convert to HTML entities: <?php echo mb_convert_encoding( file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'), "HTML-ENTITIES", "UTF-8" ); ?> See docs for mb_convert_encoding for more encoding options.
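The two conversions above (UTF-8 to ISO-8859-1, and UTF-8 to HTML entities) can be sketched in Python for comparison. The sample bytes are an assumption. The `xmlcharrefreplace` error handler emits numeric character references, which may differ in form from what PHP produces.

```python
# UTF-8 bytes as they might arrive from a file or HTTP response (sample data).
utf8_bytes = "caf\u00e9 \u2019".encode("utf-8")

# Decode once with the correct encoding to get real text.
text = utf8_bytes.decode("utf-8")

# UTF-8 -> ISO-8859-1: characters outside Latin-1 (here U+2019) become "?".
latin1_bytes = text.encode("latin-1", errors="replace")

# UTF-8 -> ASCII with numeric character references for everything non-ASCII.
entities = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(latin1_bytes)
print(entities)
```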
Three words for you: Byte Order Mark (BOM). That’s the representation of the UTF-8 BOM (bytes 0xEF, 0xBB, 0xBF) in ISO-8859-1, where it shows up as ï»¿. You have to tell your editor not to use BOMs, or use a different editor to strip them out. To automate the BOM’s removal you can use awk as shown in this question. As another answer says, the …
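As an alternative to awk, BOM removal can also be sketched in Python. The helper name `strip_utf8_bom` and the sample bytes are assumptions for illustration. `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = codecs.BOM_UTF8 + b"hello"

# Decoded as ISO-8859-1, the BOM turns into three visible characters.
as_latin1 = raw.decode("latin-1")

# Either strip the bytes explicitly, or let the utf-8-sig codec skip the BOM.
clean_bytes = strip_utf8_bom(raw)
clean_text = raw.decode("utf-8-sig")
```

The `utf-8-sig` codec also accepts input without a BOM, so it is a reasonable default when reading files of unknown origin.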
So what’s the problem? It’s a ’ (RIGHT SINGLE QUOTATION MARK, U+2019) character which is being decoded as CP-1252 instead of UTF-8. If you check the encodings table, you’ll see that in UTF-8 this character is composed of the bytes 0xE2, 0x80 and 0x99. If you check the CP-1252 code page layout, then you’ll …
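The byte-level claim above is easy to check. The following Python sketch encodes U+2019 as UTF-8, decodes those bytes as CP-1252 to reproduce the garbled text, and then reverses the mistake. The variable names are illustrative only.

```python
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# In UTF-8 the character is three bytes: 0xE2 0x80 0x99.
utf8_bytes = quote.encode("utf-8")

# Decoding those bytes as CP-1252 yields three separate characters:
# U+00E2 (a with circumflex), U+20AC (euro sign), U+2122 (trade mark sign).
garbled = utf8_bytes.decode("cp1252")

# Reversing the wrong decode recovers the original character.
repaired = garbled.encode("cp1252").decode("utf-8")
```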