Parsing xml string to an xml document fails if the string begins with section

Question

My first thought was that the problem was the encoding, since XML parsed from a .NET string is always Unicode (UTF-16).
It seems, though, that XDocument's parsing is quite forgiving in this respect.

The problem is actually related to the UTF-8 preamble, or byte order mark (BOM), a three-byte signature optionally present at the start of a UTF-8 stream. These three bytes hint at the encoding used in the stream.

You can determine the preamble of an encoding by calling the GetPreamble method on an instance of the System.Text.Encoding class.
For example:

// returns { 0xEF, 0xBB, 0xBF }
byte[] preamble = Encoding.UTF8.GetPreamble();
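
To see why the preamble matters, here is a minimal sketch of how it can end up inside a string. The sample XML, the BomDemo class and its helper names are made up for illustration. Encoding.UTF8.GetString decodes the three preamble bytes into the character U+FEFF rather than dropping them, and XDocument.Parse then typically rejects that character with an XmlException (usually reported at line 1, position 1):

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class BomDemo
{
    // Hypothetical sample data: UTF-8 XML bytes with the preamble in front.
    public static byte[] SampleFileBytes()
    {
        byte[] body = Encoding.UTF8.GetBytes("<root><item>1</item></root>");
        return Encoding.UTF8.GetPreamble().Concat(body).ToArray();
    }

    // Returns true if the text parses, false if the parser rejects it.
    public static bool TryParse(string text)
    {
        try
        {
            XDocument.Parse(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    static void Main()
    {
        // GetString decodes the preamble too, leaving U+FEFF as the first char.
        string text = Encoding.UTF8.GetString(SampleFileBytes());

        Console.WriteLine("Starts with U+FEFF: " + (text[0] == '\uFEFF'));
        Console.WriteLine("Parses with BOM:    " + TryParse(text));
        Console.WriteLine("Parses without BOM: " + TryParse(text.Substring(1)));
    }
}
```

The check compares the first char directly instead of calling text.StartsWith("\uFEFF"), because culture-sensitive string comparison treats U+FEFF as ignorable and would report a match on any string.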

The preamble should be handled correctly by XmlTextReader, so simply load your XDocument from an XmlTextReader:

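// Requires: using System.IO; using System.Xml; using System.Xml.Linq;
XDocument xml;
// fileContent is the raw byte[] of the XML, possibly starting with the preamble
using (var xmlStream = new MemoryStream(fileContent))
using (var xmlReader = new XmlTextReader(xmlStream))
{
    // The reader works on the bytes, so it can consume the preamble itself
    xml = XDocument.Load(xmlReader);
}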
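
If you only have the XML as a string, or would rather not use XmlTextReader, two alternatives are sketched below. Both are illustrations rather than the one correct fix, and the BomSafeXml class and its method names are hypothetical. The first strips a leading U+FEFF before calling XDocument.Parse; the second reads the bytes through a StreamReader, which skips the preamble when byte order mark detection is enabled:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

static class BomSafeXml
{
    // Hypothetical helper: parse XML text that may start with U+FEFF.
    public static XDocument ParseText(string text)
    {
        if (text.Length > 0 && text[0] == '\uFEFF')
        {
            text = text.Substring(1); // drop the decoded preamble
        }
        return XDocument.Parse(text);
    }

    // Hypothetical helper: load XML from raw bytes via a StreamReader.
    public static XDocument LoadBytes(byte[] fileContent)
    {
        using (var stream = new MemoryStream(fileContent))
        // 'true' enables byte order mark detection, so the preamble is skipped.
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return XDocument.Load(reader);
        }
    }
}

class Program
{
    static void Main()
    {
        // Made-up sample: UTF-8 XML bytes with the preamble in front.
        byte[] body = Encoding.UTF8.GetBytes("<root><item>1</item></root>");
        byte[] fileContent = Encoding.UTF8.GetPreamble().Concat(body).ToArray();

        XDocument fromText = BomSafeXml.ParseText(Encoding.UTF8.GetString(fileContent));
        XDocument fromBytes = BomSafeXml.LoadBytes(fileContent);

        Console.WriteLine(fromText.Root.Name);
        Console.WriteLine(fromBytes.Root.Name);
    }
}
```

Loading from the bytes is usually the safer choice when they are available, since the text is then decoded only once and the preamble never reaches the string at all.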
