January 25th, 2006

XML Follies

Things I learned about XML today:

  • It would be easier to work with if I were using someone else's parser, instead of the undocumented home-grown one that I'm stuck with for the moment.
  • UTF-16 is all well and good, but even if Excel reads an XML file in that format, it will damned well write it back out in UTF-8.
  • Exactly why Excel complains about denormalized data remains a mystery, but by deleting the attribute of a tag that was nested two levels out, you can get it to quit complaining and do what you want.
  • Word will also write out the file in UTF-8 format and will thoughtfully strip out the CR/LF line breaks. There's nothing like a 59,000-character line.
  • Unicode isn't more trouble than it's worth, but it would be nice if Microsoft's support for it were a bit better.
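
For the UTF-8 and missing-line-break complaints, here is a minimal Python sketch of one way to clean up after the fact. It assumes the file Excel or Word wrote is UTF-8 and that a line break between adjacent tags is harmless for the data in question. The function name `restore_utf16` is made up for illustration; it is not something either program provides.

```python
import re

def restore_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 XML as UTF-16 and put line breaks back between tags.

    Hypothetical helper: assumes UTF-8 input and that whitespace between
    adjacent tags does not matter to whatever reads the file next.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    text = data.decode("utf-8-sig")

    # Make the XML declaration agree with the new encoding, if it names one.
    text = re.sub(
        r'(<\?xml[^>]*encoding=["\'])[^"\']*(["\'])',
        r'\g<1>UTF-16\g<2>',
        text,
        count=1,
    )

    # Break the single giant line wherever one tag directly follows another.
    # Naive: this would also touch "><" inside comments or CDATA sections.
    text = text.replace("><", ">\r\n<")

    # Python's "utf-16" codec writes a byte order mark followed by the text.
    return text.encode("utf-16")
```

Something like `open("out.xml", "wb").write(restore_utf16(open("in.xml", "rb").read()))` would apply it to a file; both file names are placeholders.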