My first thought was that the encoding is Unicode when parsing XML from a .NET string type.
It seems, though, that XDocument’s parsing is quite forgiving in this respect.
The problem is actually related to the UTF-8 preamble, also called the byte order mark (BOM), which is a three-byte signature optionally present at the start of a UTF-8 stream. These three bytes are a hint as to the encoding used in the stream.
You can determine the preamble of an encoding by calling the GetPreamble method on an instance of the System.Text.Encoding class.
For example:
// returns { 0xEF, 0xBB, 0xBF }
byte[] preamble = Encoding.UTF8.GetPreamble();
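To check whether a given byte array actually begins with that signature, the preamble can be compared against the first bytes of the data. This is only a minimal sketch: the class name PreambleCheck and the helper StartsWithUtf8Preamble are hypothetical names chosen for illustration, not part of the framework.

```csharp
using System;
using System.Linq;
using System.Text;

public static class PreambleCheck
{
    // Hypothetical helper: returns true when 'data' begins with the
    // UTF-8 preamble bytes reported by Encoding.UTF8.GetPreamble().
    public static bool StartsWithUtf8Preamble(byte[] data)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        return data.Length >= preamble.Length
            && data.Take(preamble.Length).SequenceEqual(preamble);
    }
}
```

Calling PreambleCheck.StartsWithUtf8Preamble on the file's bytes is a quick way to confirm that a BOM is what is tripping up the parser.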
The preamble should be handled correctly by XmlTextReader, so simply load your XDocument from an XmlTextReader:
XDocument xml;
using (var xmlStream = new MemoryStream(fileContent))
using (var xmlReader = new XmlTextReader(xmlStream))
{
    // The reader consumes the BOM while detecting the encoding.
    xml = XDocument.Load(xmlReader);
}
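For reference, the whole scenario can be put together as a small self-contained program. This is a sketch under assumptions: the sample XML, the class name BomDemo, and the helpers WithBom and LoadFromBytes are made up for illustration, and the failing XDocument.Parse call reflects my understanding that the decoded string still carries the BOM as a leading U+FEFF character.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class BomDemo
{
    // Hypothetical helper: encodes 'xml' as UTF-8 and prepends the preamble,
    // imitating a file that was saved with a BOM.
    public static byte[] WithBom(string xml) =>
        Encoding.UTF8.GetPreamble().Concat(Encoding.UTF8.GetBytes(xml)).ToArray();

    // Same approach as the snippet above, wrapped in a method.
    public static XDocument LoadFromBytes(byte[] fileContent)
    {
        using (var xmlStream = new MemoryStream(fileContent))
        using (var xmlReader = new XmlTextReader(xmlStream))
        {
            return XDocument.Load(xmlReader);
        }
    }

    public static void Main()
    {
        byte[] fileContent = WithBom("<root><item>1</item></root>");

        // Decoding the bytes to a string does not strip the BOM.
        string text = Encoding.UTF8.GetString(fileContent);
        Console.WriteLine("Starts with U+FEFF: " + (text[0] == '\uFEFF'));

        try
        {
            // Assumption: parsing the string fails because of the leading BOM character.
            XDocument.Parse(text);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Parse failed: " + ex.Message);
        }

        // Loading from the raw bytes through a reader works.
        XDocument xml = LoadFromBytes(fileContent);
        Console.WriteLine("Root element: " + xml.Root.Name);
    }
}
```

The point of the design is to hand the parser the raw bytes rather than a decoded string, so that encoding detection (and BOM handling) stays with the XML reader.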