I want to allow any valid UTF-8 characters in usernames and passwords.
Abandon all hope. Basic Authentication and Unicode don’t mix.
There is no standard(*) for how to encode non-ASCII characters into a Basic Authentication username:password token before base64ing it. Consequently every browser does something different:
- Opera uses UTF-8;
- IE uses the system’s default codepage (which you have no way of knowing, other than it’s never UTF-8), and silently mangles characters that don’t fit into to it using the Windows ‘guess a random character that looks a bit like the one you wanted or maybe just not’ secret recipe;
- Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably… except when doing XMLHttpRequests, in which case it uses UTF-8;
- Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.
*: some people interpret the standard to say that either:
- it should be always ISO-8859-1, due to that being the default encoding for including raw 8-bit characters directly included in headers;
- it should be encoded using RFC2047 rules, somehow.
But neither of these proposals are on topic for inclusion in a base64-encoded auth token, and the RFC2047 reference in the HTTP spec really doesn’t work at all since all the places it might potentially be used are explicitly disallowed by the ‘atom context’ rules of RFC2047 itself, even if HTTP headers honoured the rules and extensions of the RFC822 family, which they don’t.
In summary: ugh. There is little-to-no hope of this ever being fixed in the standard or in the browsers other than Opera. It’s just one more factor driving people away from HTTP Basic Authentication in favour of non-standard and less-accessible cookie-based authentication schemes. Shame really.