Should we HTML-encode special characters before storing them in the database?

Don’t HTML-encode your data before storing it. Store it in as pure a form as possible. HTML encoding is needed only because you are going to display the data on an HTML page, so do the encoding while processing the data to create that page. For example, suppose you decide you’re also …
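As an illustration of "store raw, encode on output", here is a minimal Java sketch. It assumes an in-memory list as a stand-in for the database. The names save, load, renderItem and escapeHtml are hypothetical helpers, not part of any library, and escapeHtml covers only the five characters that matter in element content and quoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: an in-memory list stands in for the database, and all method
// names here are hypothetical helpers, not a library API.
final class StoreRawEscapeOnOutput {

    // Values are kept exactly as entered; no HTML encoding at storage time.
    private static final List<String> storage = new ArrayList<>();

    // "Stores" a raw value and returns its row index.
    static int save(String rawValue) {
        storage.add(rawValue);
        return storage.size() - 1;
    }

    static String load(int index) {
        return storage.get(index);
    }

    // Minimal escaping for text placed in element content or quoted attribute values.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Encoding happens here, while the page is being built from stored data.
    static String renderItem(int index) {
        return "<li>" + escapeHtml(load(index)) + "</li>";
    }

    public static void main(String[] args) {
        int row = save("Tom & Jerry <3");
        System.out.println(load(row));       // Tom & Jerry <3
        System.out.println(renderItem(row)); // <li>Tom &amp; Jerry &lt;3</li>
    }
}
```

The stored value stays exactly as entered; escaping is applied only inside renderItem, at page-generation time.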

Guessing the encoding of text represented as byte[] in Java

The following method solves the problem using juniversalchardet, which is a Java port of Mozilla’s encoding detection library:

    // Requires juniversalchardet (package org.mozilla.universalchardet) on the classpath.
    public static String guessEncoding(byte[] bytes) {
        final String DEFAULT_ENCODING = "UTF-8";
        org.mozilla.universalchardet.UniversalDetector detector =
                new org.mozilla.universalchardet.UniversalDetector(null);
        // Feed all the bytes, then signal end of input so the detector can decide.
        detector.handleData(bytes, 0, bytes.length);
        detector.dataEnd();
        String encoding = detector.getDetectedCharset();
        detector.reset();
        // getDetectedCharset() returns null when no charset could be determined.
        if (encoding == null) {
            encoding = DEFAULT_ENCODING;
        }
        return encoding;
    }

The …
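If adding juniversalchardet is not an option, a much cruder heuristic can be built from the JDK alone. This is a different technique from the library’s statistical detection: the sketch assumes the input is either UTF-8 or ISO-8859-1, accepts UTF-8 only if the bytes decode without errors, and otherwise falls back to ISO-8859-1. The class name Utf8OrLatin1Guesser is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical, stdlib-only heuristic: NOT the juniversalchardet algorithm.
// Assumes the bytes are either UTF-8 or ISO-8859-1.
final class Utf8OrLatin1Guesser {

    static String guessEncoding(byte[] bytes) {
        // A strict decoder throws instead of silently inserting U+FFFD.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            utf8.decode(ByteBuffer.wrap(bytes));
            return "UTF-8";
        } catch (CharacterCodingException e) {
            // ISO-8859-1 maps every byte to a character, so it never fails.
            return "ISO-8859-1";
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guessEncoding(utf8));   // UTF-8
        System.out.println(guessEncoding(latin1)); // ISO-8859-1
    }
}
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII bytes decode identically in both encodings.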

Are character set names case-sensitive in HTTP?

[Here is the result of my research.] RFC 2616, section 3.4, says the following:

"HTTP character sets are identified by case-insensitive tokens. The complete set of tokens is defined by the IANA Character Set registry [19]."

    charset = token

The IANA Character Set registry is now maintained here. At the very top of this document …
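In practice this means comparing charset values without regard to case. A minimal Java sketch follows; charsetOf and sameCharsetToken are hypothetical helpers, and the Content-Type parsing is deliberately simplified (it splits on semicolons and strips one pair of double quotes, so a quoted value containing a semicolon would be mishandled). The JDK’s own Charset.forName also resolves names regardless of case.

```java
import java.nio.charset.Charset;

// Hypothetical helpers illustrating case-insensitive charset handling.
final class CharsetParam {

    // Extracts the charset parameter from a Content-Type value, or null if absent.
    // Simplified parser: split on ';' and strip one pair of surrounding quotes.
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            int eq = p.indexOf('=');
            // Parameter names are matched case-insensitively ("Charset", "CHARSET", ...).
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = p.substring(eq + 1).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    // Charset tokens are case-insensitive, so "utf-8" and "UTF-8" name the same set.
    static boolean sameCharsetToken(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String cs = charsetOf("text/html; Charset=utf-8");
        System.out.println(cs);                            // utf-8
        System.out.println(sameCharsetToken(cs, "UTF-8")); // true
        System.out.println(Charset.forName(cs).name());    // UTF-8
    }
}
```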

Get/set file encoding with JavaScript’s FileReader

If your HTML page is in UTF-8 and your file is in ISO-8859-1, this works:

    reader.readAsText(file, 'ISO-8859-1');

I don’t have any Windows-1251 file, so I was not able to test it, but it looks like 'CP1251' is supported (by Google Chrome at least), so:

    reader.readAsText(file, 'CP1251');

If none of this works, …
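The same idea, naming the file’s encoding explicitly instead of relying on a default, looks like this on the Java side. This is a parallel sketch, not a FileReader API: readText is a hypothetical helper that decodes the file’s bytes with the named charset. A Windows-1251 file would be read by passing "windows-1251", assuming the JRE ships that charset (it is not one every Java platform is required to support).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: read a file using an explicitly named charset.
final class ReadWithCharset {

    // Reads the whole file and decodes it with the given charset, not the platform default.
    static String readText(Path file, String charsetName) throws IOException {
        return new String(Files.readAllBytes(file), Charset.forName(charsetName));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("latin1-", ".txt");
        Files.write(tmp, new byte[] {'c', 'a', 'f', (byte) 0xE9}); // "café" in ISO-8859-1
        System.out.println(readText(tmp, "ISO-8859-1").equals("caf\u00e9")); // true
        // Decoding the same bytes as UTF-8 yields a replacement character instead of é.
        System.out.println(readText(tmp, "UTF-8").endsWith("\uFFFD"));       // true
        Files.delete(tmp);
    }
}
```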
