What’s the difference between an “encoding,” a “character set,” and a “code page”?

A ‘character set’ is just what it says: a properly-specified list of distinct characters. An ‘encoding’ is a mapping between a character set (typically Unicode today) and a (usually byte-based) technical representation of the characters. UTF-8 is an encoding, but not a character set. It is an encoding of the Unicode character set(*). The confusion …
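To make the distinction concrete, here is a minimal sketch of the UTF-8 mapping itself: it takes a code point (a number identifying one member of the Unicode character set) and produces the bytes that represent it. The function name `encode_utf8` is made up for this illustration and is not taken from the original answer.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Returns an empty string for values that are not valid Unicode scalar
// values (surrogates, or anything above U+10FFFF).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate range
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, the character é is code point U+00E9 in the character set, while its UTF-8 encoding is the two bytes 0xC3 0xA9. A different encoding of the same character set, such as UTF-16, would represent the same code point with different bytes.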

How do I correct the character encoding of a file?

Follow these steps with Notepad++:
1- Copy the original text.
2- In Notepad++, open a new file, then go to Encoding -> pick the encoding you think the original text uses. Also try the “ANSI” encoding, as certain programs sometimes read Unicode files as ANSI.
3- Paste.
4- Then convert to Unicode by going …
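What the steps above do by hand is reinterpret the bytes under a guessed source encoding and then re-save them as Unicode. A rough programmatic sketch of the same idea follows, assuming the source bytes are ISO-8859-1 (Latin-1). Note that “ANSI” in Notepad++ means the system code page, often Windows-1252, which differs from Latin-1 in the 0x80 to 0x9F range, so this is only an approximation of that case. The function name `latin1_to_utf8` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: treat every input byte as an ISO-8859-1 character
// and re-encode it as UTF-8. In Latin-1, the byte value equals the Unicode
// code point, so no lookup table is needed.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two
    for (unsigned char b : in) {
        if (b < 0x80) {
            // ASCII range is identical in both encodings.
            out += static_cast<char>(b);
        } else {
            // U+0080..U+00FF take two bytes in UTF-8: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

If the guess about the source encoding is wrong, the output will still be valid UTF-8 but will show the wrong characters, which is why the manual procedure suggests trying more than one encoding.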

“Text was truncated or one or more characters had no match in the target code page” when importing from an Excel file

I assume you’re trying to import this using an Excel Source in the SSIS dialog? If so, the problem is probably that SSIS samples some number of rows at the beginning of your spreadsheet when it creates the Excel source. If on the [ShortDescription] column it doesn’t notice anything too large, it will default to …

How do you properly use WideCharToMultiByte?

Here are a couple of functions (based on Brian Bondy’s example) that use WideCharToMultiByte and MultiByteToWideChar to convert between std::wstring and std::string, using UTF-8 so that no data is lost.

    // Convert a wide Unicode string to a UTF-8 string
    std::string utf8_encode(const std::wstring &wstr)
    {
        if (wstr.empty()) return std::string();
        int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), …
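For readers without access to the Windows API, the following is a portable sketch of the conversion that a UTF-16 to UTF-8 call performs conceptually. It is not the code from the answer above and does not call WideCharToMultiByte. It assumes the input is UTF-16 held in a `std::u16string`, and it replaces unpaired surrogates with U+FFFD, which is a choice made for this illustration. The name `utf16_to_utf8` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: convert UTF-16 code units to UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        const bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        if (is_high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp <= 0x7F) {
            out += static_cast<char>(cp);
        } else if (cp <= 0x7FF) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0xFFFF) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

On Windows, `wchar_t` strings are UTF-16, so this mirrors the data flow of the wide-to-UTF-8 direction. On platforms where `wchar_t` is 32 bits wide, a `std::wstring` would need a different conversion.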
