TStringList splitting bugs

Not much… One is that these bugs show up rarely with test data, but not so rarely in the real world.

All it takes is one case. Test data is not random data: one user with one failure case should submit the data and voilà, we’ve got a test case. If no one can provide test data, maybe there’s no bug or failure?

There’s no standard specification for CSV.

That one sure helps with the confusion. Without a standard specification, how do you prove something is wrong? If this is left to one’s own intuition, you can get into all kinds of trouble. Here are some examples from my own happy interaction with government-issued software. My application was supposed to export data in CSV format, and the government application was supposed to import it. Here’s what got us into a lot of trouble several years in a row:

  • How do you represent empty data? Since there’s no CSV standard, one year my friendly gov decided anything goes, including nothing (two consecutive commas). Next they decided only consecutive commas are OK; that is, Field,"",Field is not valid and it should be Field,,Field. Had a lot of fun explaining to my customers that the gov app changed validation rules from one week to the next…
  • Do you export ZERO integer data? This was probably a bigger abuse, but my “gov app” decided to validate that also. At one time it was mandatory to include the 0, then it was mandatory NOT to include the 0. That is, at one time Field,0,Field was valid, and next Field,,Field was the only valid way…
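To make the two points above concrete, here is a minimal sketch of an exporter where both policies are switches. It is written in Python rather than Delphi purely for illustration, and the function name `export_row` and the flags `empty_as_quotes` and `omit_zero` are made up for this example; they are not part of any real importer's specification.

```python
def export_row(values, empty_as_quotes=False, omit_zero=False):
    """Render one CSV row under a given (hypothetical) importer policy."""
    fields = []
    for value in values:
        if value is None or value == "":
            # Empty field: either "" or nothing at all between the commas.
            fields.append('""' if empty_as_quotes else "")
        elif omit_zero and value == 0:
            # Some importers insist that a zero is left out entirely.
            fields.append("")
        else:
            text = str(value)
            # Quote only when needed, doubling any embedded quotes.
            if any(ch in text for ch in ',"\n'):
                text = '"' + text.replace('"', '""') + '"'
            fields.append(text)
    return ",".join(fields)


print(export_row(["Field", "", "Field"]))                        # Field,,Field
print(export_row(["Field", "", "Field"], empty_as_quotes=True))  # Field,"",Field
print(export_row(["Field", 0, "Field"]))                         # Field,0,Field
print(export_row(["Field", 0, "Field"], omit_zero=True))         # Field,,Field
```

Keeping these rules as parameters instead of hard-coding them means that when the importer changes its mind, only a flag has to change.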

And here’s another test case where (my) intuition failed:

1997, Ford, E350, "Super, luxurious truck"

Please note the space between , and "Super, and the very lucky comma that follows "Super. The parser employed by TStrings only sees the quote char if it immediately follows the delimiter. That string is parsed as:

[1997]
[ Ford]
[ E350]
[ "Super]
[ luxurious truck"]

Intuitively I’d expect:

[1997]
[ Ford]
[ E350]
[Super, luxurious truck]

But guess what, Excel does it the same way Delphi does it…
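For what it's worth, the same rule is easy to observe outside Delphi and Excel. The sketch below uses Python's standard `csv` module, not TStrings, so it says nothing about Delphi's implementation; it only shows another widely used parser treating a quote as special solely at the start of a field.

```python
import csv

line = '1997, Ford, E350, "Super, luxurious truck"'

# Default settings: the quote comes after a space, so it is ordinary data
# and the comma inside the "quoted" text still splits the field.
strict = next(csv.reader([line]))

# skipinitialspace=True drops the spaces after each delimiter, so the quote
# becomes the first character of the field and is honoured.
lenient = next(csv.reader([line], skipinitialspace=True))

print(strict)   # ['1997', ' Ford', ' E350', ' "Super', ' luxurious truck"']
print(lenient)  # ['1997', 'Ford', 'E350', 'Super, luxurious truck']
```

Note that the lenient variant also strips the leading space from the unquoted fields, which may or may not be what the receiving side wants.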

Conclusion

  • TStrings.CommaText is fairly good and nicely implemented. At least the Delphi 2010 version I looked at is quite efficient (it avoids multiple string allocations and uses a PChar to “walk” the parsed string), and it works about the same as Excel’s parser does.
  • In the real world you’ll need to exchange data with other software, written using other libraries (or no libraries at all), where people might have misinterpreted some of the (missing?) rules of CSV. You’ll have to adapt, and it’ll probably not be a case of right-or-wrong but a case of “my clients need to import this crap”. If that happens, you’ll have to write your own parser, one that adapts to the requirements of the 3rd party app you’re dealing with. Until that happens, you can safely use TStrings. And when it does happen, it might not be TStrings’ fault!
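If you do end up writing your own parser, it does not have to be big. Below is a minimal single-pass sketch, in Python rather than Delphi and not derived from the TStrings source. The function `split_csv_line` and its `skip_space_before_quote` switch are hypothetical names chosen for this example. By default a quote only counts at the very start of a field; with the switch on, spaces between the delimiter and an opening quote are ignored.

```python
def split_csv_line(line, delimiter=",", quote='"', skip_space_before_quote=False):
    """Split one line into fields with a single left-to-right walk."""
    fields = []
    i, n = 0, len(line)
    while True:
        if skip_space_before_quote:
            # Peek past leading spaces; jump ahead only if a quote follows.
            j = i
            while j < n and line[j] == " ":
                j += 1
            if j < n and line[j] == quote:
                i = j
        if i < n and line[i] == quote:
            # Quoted field: read to the closing quote; "" is a literal quote.
            i += 1
            chars = []
            while i < n:
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        chars.append(quote)
                        i += 2
                        continue
                    i += 1
                    break
                chars.append(line[i])
                i += 1
            # Text between the closing quote and the delimiter is kept as-is.
            end = line.find(delimiter, i)
            if end == -1:
                end = n
            chars.append(line[i:end])
            fields.append("".join(chars))
        else:
            # Unquoted field: everything up to the next delimiter, untrimmed.
            end = line.find(delimiter, i)
            if end == -1:
                end = n
            fields.append(line[i:end])
        i = end
        if i >= n:
            break
        i += 1  # step over the delimiter
    return fields


row = '1997, Ford, E350, "Super, luxurious truck"'
print(split_csv_line(row))
print(split_csv_line(row, skip_space_before_quote=True))
```

The first call splits the truck description in two, as described above; the second keeps it as one field and leaves the leading spaces of the unquoted fields alone. Each quirk of the 3rd party app becomes one more switch of this kind.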
