How to print UTF-8 strings to std::cout on Windows?

At last, I’ve got it working. This answer combines input from Miles Budnek, Paul, and mkluwe with some research of my own. First, let me start with code that will work on Windows 10. After that, I’ll walk you through the code and explain why it won’t work out of the box on Windows 7.

#include <string>
#include <iostream>
#include <Windows.h>
#include <cstdio>

int main() {
    // Set console code page to UTF-8 so the console knows how to interpret string data
    SetConsoleOutputCP(CP_UTF8);

    // Enable buffering to prevent VS from chopping up UTF-8 byte sequences
    setvbuf(stdout, nullptr, _IOFBF, 1000);

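    // Note: this assumes C++17 or earlier. In C++20, u8 literals are based on
    // char8_t and no longer convert to std::string.
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";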
    std::cout << test << std::endl;
}

The code starts by setting the code page, as suggested by Miles Budnek. This tells the console to interpret the byte stream it receives as UTF-8, not as some variation of ANSI.
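One thing to keep in mind: as far as I know, the output code page is a property of the console rather than of your process, so the change can still be in effect after your program exits. If that matters to you, you could save the previous value with GetConsoleOutputCP() and restore it when you are done. Here is a minimal sketch of that idea. The class name ConsoleUtf8Guard and its active() member are my own invention, and the #ifdef blocks are only there so the snippet still compiles on other platforms, where it does nothing.

```cpp
#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical helper (not part of any library): switches the console output
// code page to UTF-8 on construction and restores the previous one on destruction.
class ConsoleUtf8Guard {
public:
    ConsoleUtf8Guard() {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();              // returns 0 on failure
        changed_ = SetConsoleOutputCP(CP_UTF8) != 0;   // nonzero means success
#endif
    }

    ~ConsoleUtf8Guard() {
#ifdef _WIN32
        // Only restore if the switch succeeded and the old value was valid
        if (changed_ && previous_ != 0) {
            SetConsoleOutputCP(previous_);
        }
#endif
    }

    // Copying would restore the code page twice, so forbid it
    ConsoleUtf8Guard(const ConsoleUtf8Guard&) = delete;
    ConsoleUtf8Guard& operator=(const ConsoleUtf8Guard&) = delete;

    // True if the code page was actually switched to UTF-8.
    // Always false on non-Windows platforms.
    bool active() const { return changed_; }

private:
    unsigned int previous_ = 0;
    bool changed_ = false;
};
```

To use it, declare a ConsoleUtf8Guard as the first local variable in main(), in place of the bare SetConsoleOutputCP(CP_UTF8) call. Its destructor runs when main() returns.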

Next, there is a problem in the STL code that comes with Visual Studio. std::cout prints its data to a stream buffer of type std::basic_filebuf. When that buffer receives a string (via std::basic_streambuf::sputn()), it won’t pass it on to the underlying file as a whole. Instead, it will pass each byte separately. As explained by mkluwe, if the console receives a UTF-8 byte sequence as individual bytes, it won’t interpret them as a single code point. Instead, it will treat them as multiple characters. Each byte of a multi-byte UTF-8 sequence is invalid on its own, so you’ll see �’s instead. There is a related bug report for Visual Studio, but it was closed as By Design. The workaround is to enable buffering for the stream. As an added bonus, that will give you better performance. However, you may now need to flush the stream regularly, as I do with std::endl, or your output may not show.
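To see why a lone byte cannot be displayed, it helps to look at how UTF-8 marks its bytes: a lead byte announces how long the sequence is, and every following byte is a continuation byte of the form 10xxxxxx. The sketch below illustrates that. Both function names are made up for this example, and the check is deliberately simplified: it does not reject overlong encodings or lead bytes beyond 0xF4.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by lead byte `b`.
// Returns 0 if `b` is a continuation byte or cannot start a sequence.
// Simplified: overlong and out-of-range lead bytes are not rejected.
std::size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx: starts a 2-byte sequence
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx: starts a 3-byte sequence
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx: starts a 4-byte sequence
    return 0;                          // 10xxxxxx or other: not a lead byte
}

// Counts code points in a UTF-8 string by counting every byte
// that is not a continuation byte.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

Take α, which is encoded as the two bytes 0xCE 0xB1. utf8_sequence_length(0xCE) is 2, while utf8_sequence_length(0xB1) is 0: the second byte means nothing without the first. That is the situation the console is in when it receives the bytes one write at a time.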

Lastly, the Windows console supports both raster fonts and TrueType fonts. As pointed out by Paul, raster fonts will simply ignore the console’s code page. So non-ASCII Unicode characters will only work if the console is set to a TrueType font. Up until Windows 7, the default is a raster font, so the user will have to change it manually. Luckily, Windows 10 changes the default font to Consolas, so this part of the problem should solve itself with time.
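If you want your program to warn the user about a raster font instead of silently printing garbage, you could query the current console font. The following is a rough sketch, not something I have tested on every Windows version. It assumes that GetCurrentConsoleFontEx() reports the font's pitch and family in the FontFamily member and that the TrueType bit there is TMPF_TRUETYPE (0x04). The two function names are hypothetical, and on non-Windows platforms the check simply returns false.

```cpp
#ifdef _WIN32
#include <Windows.h>
#endif

// Tests the TrueType bit of a pitch-and-family value.
// 0x04 is the value of TMPF_TRUETYPE in the Windows headers.
bool family_is_truetype(unsigned int pitch_and_family) {
    return (pitch_and_family & 0x04u) != 0;
}

// Returns true only if the console attached to stdout reports a TrueType font.
// Returns false for a raster font, for redirected output (no console),
// if the query fails, and on non-Windows platforms.
bool console_uses_truetype_font() {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (out == INVALID_HANDLE_VALUE || out == nullptr) {
        return false;
    }
    CONSOLE_FONT_INFOEX info = {};
    info.cbSize = sizeof(info);  // must be set before the call
    if (!GetCurrentConsoleFontEx(out, FALSE, &info)) {
        return false;
    }
    return family_is_truetype(info.FontFamily);
#else
    return false;
#endif
}
```

You could call console_uses_truetype_font() at startup and, if it returns false, print an ASCII-only hint asking the user to switch the console font to something like Consolas or Lucida Console. Because the function also returns false when the output is redirected to a file, treat the result as a hint, not as proof of a raster font.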
