I wouldn’t offer this answer except that you worked so hard to document it and it’s been upvoted with no answer after a month. So, here goes. Your only choices appear to be to change the data or change the tool.
Probably, I am clearly doing something wrong and missing the obvious. Could someone please explain to me what I am doing wrong here?
When the tool is broken and the vendor doesn’t care, it’s mistake to keep trying. It’s time to switch. You put a lot of effort into researching exactly how it’s broken and demonstrating it violates not only the RFC but the tool’s own prior version. How much more evidence do you need?
CSV is a boat anchor too. If you have the option, you’re better off using an ordinary delimited file format. For lots of applications, tab-delimited is good. The best delimiter IMO is ‘\’ because that character has no place in English text. (On the other hand it won’t work for data containing Windows pathnames.)
CSV has two problems as an exchange format. First, it’s not all that standard; different applications recognize different versions, whatever the RFC may say. Second (and related) is that it doesn’t constitute a regular language in CS terms, which is why it can’t be parsed as a regular expression. Compare with ^([^\t]*\t)*[\t]*$
for a tab-delimited line. The practical implication of the complexity of CSV’s definition is (see above) the relative dearth of tools to handle them and their tendency to be incompatible, particularly during the wee hours.
If you give CSV and DTS the boot, you have good options, one of which is bcp.exe
. It’s very fast, and safe because Microsoft hasn’t been tempted to update it for years. I don’t know much about DTS, but in case you have to use it for automation, IIRC there is a way to invoke external utilities. Beware though, that bcp.exe
does not return error status to the shell dependably.
If you’re determined to use DTS and to stick with CSV, then really your best remaining option is to write a view that prepares the data appropriately for it. I would, if backed into that corner, create a schema called, say, “DTS2012CSV”, so that I could write select * from DTS2012CSV.tablename
, giving anyone who cares a fighting chance to understand it (because you’ll document it, won’t you, in comments in the view text?). If need be, others can copy its technique for other broken extracts.
HTH.