Qwicap version 1.4b9 was released on May 31. It includes the minor enhancements of the unannounced beta 7 and 8 releases (logging in Qwicap.reportException, and more informative exception messages from the MutableMarkup class, respectively). The work on beta 9 is of a whole 'nother order, hence this announcement.
Last Friday, after lunch, I was looking forward to a quiet afternoon, and was reading my usual collection of computing-related web sites when I arrived at Elliotte Rusty Harold's blog, Cafe au Lait. As usual there was a lot of material there, and, also as usual, I wasn't even going to try to deal with it all; I was just there for the quote of the day, which turned out to be about character set support in various web application development systems. And it included a harmless-looking little link attached to the word "Unicode" (just like that one, in fact). I clicked the innocent word and that was the end of my peaceful Friday afternoon. The link was to Joel Spolsky's 2003 article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). If you haven't read it, consider it recommended. If you're in a hurry, just read the bit at the end about the problems faced by web browsers when they need to interpret the bytes of a web page. That gives you the flavor of many of the problems posed by character set handling. And they're fairly obvious problems, if you happen to think about them. Unfortunately, I hadn't.
I wasn't exactly unaware of character set issues (I sorted out EBCDIC to ASCII translation issues back in the days of the IBM 3033 and the original IBM PC - issues that had IBM's support engineers embarrassed and stumped), and I've been a big fan of Unicode, but while reading that article I quickly realized that I'd failed to grasp just how significant character set issues are, or can be, to everyday software development. Specifically, in the case of Qwicap, I realized that I'd ignored character set issues when I wrote Qwicap's XML engine. I knew that, having emerged from Reader objects, the characters in memory were Unicode (sort of) and I'd just left my concern at that, without thinking about the encoding of the source materials that went into those Reader objects. I'd also neglected an issue associated with transmitting web pages to clients: They were always sent as UTF-8, regardless of what the document markup called for. (In a way, that's OK because a character set specification in the HTTP "Content-Type" header, which Qwicap was in the habit of supplying, takes precedence over any specification in the markup, but it's both rude and confusing to ignore the document author's wishes as expressed by their markup.)
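To make the two oversights concrete, here is a minimal sketch in plain Java. It is not Qwicap's actual code: the class CharsetSketch and its methods decode and contentType are hypothetical names invented for illustration. It shows that a Reader only yields the right characters if it is told the encoding of the bytes it wraps, and that the charset named in the "Content-Type" header should be the one actually used for the page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; none of these names come from Qwicap.
public class CharsetSketch {

    /** Decodes bytes through a Reader built with an explicit charset, not the platform default. */
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Builds a Content-Type header value naming the charset the page is encoded in. */
    static String contentType(Charset charset) {
        return "text/html; charset=" + charset.name();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Right charset: the text survives the round trip.
        System.out.println(decode(latin1Bytes, StandardCharsets.ISO_8859_1).equals(original));

        // Wrong charset: the lone 0xE9 byte is malformed UTF-8, so the text is damaged.
        System.out.println(decode(latin1Bytes, StandardCharsets.UTF_8).equals(original));

        // Header that agrees with the encoding the document asked for.
        System.out.println(contentType(StandardCharsets.ISO_8859_1));
    }
}
```

The same four bytes come out intact or damaged depending only on the charset handed to the InputStreamReader, which is why "the characters in memory are Unicode" says nothing about whether they are the right characters.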
So, seeing all of those basic oversights at once was quite a kick in the teeth (self-inflicted, no less). And, of course, one of the great things about open source development is that you have the opportunity to make big, embarrassing mistakes in public. On the bright side, if I had known how difficult it was going to be to develop Qwicap when I started, I might have come to my senses and found something less painful to occupy my time, like hitting myself in the head with blunt objects. So, since I still think Qwicap is a good idea, and I yet cling to the hope that it'll find a niche for itself beyond the bounds of my worthy employer, The University of Texas at Austin (and for all I know, perhaps it already has), maybe it's just as well that I didn't fully appreciate what I was getting myself into when I first had this idea.
Anyway, with the release of version 1.4b9, Qwicap is now character-set aware, and I can take some comfort from the facts that (1) nobody but me ever noticed this problem in Qwicap, and (2) there are vastly more popular web application development schemes that still neglect character set issues. In the latter case, though they were absurdly late in coming, the fixes in 1.4b9 do become one more feather in Qwicap's hat. Of course, I still feel plenty stupid.