An InputStream reads raw octet (8-bit) data. In Java, the byte type is roughly equivalent to the char type in C, except that a Java byte is always signed. In C, this type can be used to represent character data or binary data. In Java, the char type is more similar to the C wchar_t type.
An InputStreamReader then transforms data from some encoding into UTF-16. If “a你们” is encoded as UTF-8 on disk, it will be the byte sequence 61 E4 BD A0 E4 BB AC. When you pass the InputStream to an InputStreamReader with the UTF-8 encoding, it will be read as the char sequence 0061 4F60 4EEC.
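As a minimal sketch of that byte-to-char step, the snippet below wraps the seven UTF-8 bytes in a ByteArrayInputStream (standing in for a file on disk) and reads them through an InputStreamReader. The class name DecodeDemo and the helper decodeToHex are made up for illustration; only the java.io and java.nio.charset classes are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decodes raw bytes as UTF-8 and returns each
    // resulting UTF-16 code unit as four hex digits, separated by spaces.
    static String decodeToHex(byte[] raw) {
        StringBuilder hex = new StringBuilder();
        InputStream in = new ByteArrayInputStream(raw);
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) { // read() returns one char, or -1 at end
                if (hex.length() > 0) {
                    hex.append(' ');
                }
                hex.append(String.format("%04X", c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of "a你们" as they would appear on disk.
        byte[] onDisk = {0x61, (byte) 0xE4, (byte) 0xBD, (byte) 0xA0,
                         (byte) 0xE4, (byte) 0xBB, (byte) 0xAC};
        System.out.println(decodeToHex(onDisk)); // prints 0061 4F60 4EEC
    }
}
```

Seven bytes go in and three chars come out, which is the whole point of the reader: it hides the variable-length encoding from the code that consumes the text.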
The character encoding API in Java contains the algorithms to perform this transformation. Oracle documents the list of encodings supported by its JRE. The ICU project is a good place to start if you want to understand the internals of how this works in practice.
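If you want to use that API directly rather than through a reader, something like the following should work. It is only a sketch: CharsetApiDemo and strictDecode are illustrative names, and the choice to report malformed input (instead of silently replacing it) is an assumption about what you would want, not something the API forces on you.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetApiDemo {
    // Hypothetical helper: decodes with the given charset and fails loudly
    // on bytes that are not valid in that encoding.
    static String strictDecode(byte[] raw, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not valid " + charset.name(), e);
        }
    }

    public static void main(String[] args) {
        // "a你们" written with Unicode escapes so the source file encoding does not matter.
        String text = "a\u4F60\u4EEC";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(strictDecode(utf8, StandardCharsets.UTF_8).equals(text));
        // Which encodings exist depends on the runtime; this asks the running JRE.
        System.out.println(Charset.availableCharsets().size() + " charsets available");
    }
}
```

Charset.availableCharsets() is a quick way to check what a particular runtime supports without consulting a list.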
As Alexander Pogrebnyak points out, you should almost always provide the encoding explicitly. byte-to-char methods that do not specify an encoding rely on the JRE default, which (before Java 18 made UTF-8 the default) depends on the operating system and user settings.
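To illustrate the difference, here is a small sketch contrasting an explicit charset with the platform default. ExplicitCharsetDemo and its two helpers are hypothetical names; the output of the default-based call is deliberately not shown because it varies by machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Always interprets the bytes as UTF-8, regardless of where the code runs.
    static String decodeExplicit(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Uses whatever Charset.defaultCharset() happens to be on this JVM.
    static String decodeWithDefault(byte[] raw) {
        return new String(raw);
    }

    public static void main(String[] args) {
        byte[] utf8 = {0x61, (byte) 0xE4, (byte) 0xBD, (byte) 0xA0,
                       (byte) 0xE4, (byte) 0xBB, (byte) 0xAC};

        System.out.println("default charset: " + Charset.defaultCharset());
        // The two results match only if the default charset is UTF-8.
        System.out.println("same result: "
                + decodeExplicit(utf8).equals(decodeWithDefault(utf8)));
    }
}
```

The same applies to InputStreamReader, String.getBytes, and similar APIs: prefer the overloads that take a Charset.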