Monday, December 11, 2006

Qwicap 1.4a42 Released

Qwicap 1.4a42 has been released. (Download it from SourceForge.) This version has been in the works for almost six months, off and on. It was bedeviled by two problems: (1) a thread synchronization bug that could manifest when a user submitted new form data before the processing of their previous hit was complete, and (2) a misunderstanding between myself, a colleague who administers some of our servers, some obscure error reports from the JVM, and some Google searches, that led us to conclude (incorrectly) that Qwicap absolutely had to have a thread pool in order to operate reliably over long periods of time.

With regard to the thread synchronization bug: Ouch. I could not reproduce it when running the server and the web browser on the same box (my normal method of working), even when the box was a multi-processor machine. If a colleague, Kevin Wood, hadn't noticed this bug and then invested serious effort in exploring it, I don't know when it would have been found. Just to keep it interesting, the fixes to Qwicap 1.4a37 that he provided to me didn't work in his test setup when I supplied him with builds from the official codebase that incorporated all of his changes. (Those builds worked fine on my machine, naturally.) The builds he produced from the older codebase worked fine in his test setup, and mine. Even after we both sat down and satisfied ourselves that I hadn't mangled, or omitted, any of his modifications, the problem persisted in my builds, when he ran the tests on his box. In the end, he set up a box for me that was configured essentially the same as his machine, and I did my own testing. Eventually, I identified one last race condition that even he'd missed. (I throw no stones here; if I hadn't missed it in the first place, he couldn't have missed it in the second.)

With regard to the perceived need for a thread pool, I put a lot of work into building, instrumenting and testing a thread pool based on the mistaken belief that our production servers were encountering a memory-related resource exhaustion problem tied to the total number of threads created over the lifetime of any given Qwicap application. In the end, it turned out that the problem was confined to the amount of memory consumed by the stacks of the threads running at any given time, when running under Java 1.5, which dramatically increased the default stack size relative to Java 1.4 (it seems to have gone from 256K to a megabyte or more). Initially, configuring our servers to run Tomcat in a 64-bit address space solved the problem, and later configuring them to use only 256K as the default stack size provided an alternate solution.
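To make the arithmetic concrete, here is a back-of-the-envelope sketch of how much memory thread stacks alone can reserve at the two stack sizes mentioned above. The class name, method name, and the figure of 1000 live threads are assumptions made up for illustration; they are not measurements from our servers.

```java
public class StackBudget {
    /** Worst-case memory reserved for thread stacks, in megabytes (integer division). */
    static long stackMegabytes(int liveThreads, int stackKilobytes) {
        return (long) liveThreads * stackKilobytes / 1024;
    }

    public static void main(String[] args) {
        int liveThreads = 1000; // hypothetical load, not a figure from any real server
        System.out.println("256K stacks: " + stackMegabytes(liveThreads, 256) + " MB");
        System.out.println("1M stacks:   " + stackMegabytes(liveThreads, 1024) + " MB");
    }
}
```

A fourfold jump like that is easy to absorb in a 64-bit address space and much harder in a 32-bit one. One common way to lower the default on HotSpot JVMs is the `-Xss` launch option, for example `-Xss256k`; whether that matches the exact setting used on the servers described above is an assumption.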

So, strictly speaking, Qwicap doesn't turn out to need a thread pool to operate reliably, but I implemented one before that became clear to us, and therefore it now has a thread pool. There's no point in ripping it out, because (a) some people will actually want it, (b) a lot of nice status data gathering machinery became tied to it, (c) it can be configured such that it doesn't hang onto any threads, if you wish, and therefore won't get in the way of people who don't want pooling, and (d) the configuration parameters for the pool allow the size of the Qwicap thread stacks to be controlled, which goes to the heart of the real problem.
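Points (c) and (d) can be sketched with the standard java.util.concurrent classes. This is not Qwicap's pool or its configuration syntax; the class names, the "worker-" thread names, and the limit of 50 threads are all hypothetical. It only shows the general idea: a pool with a core size of zero retains no idle threads, and a thread factory can request a specific stack size for each thread it creates (the JVM treats that size as a hint and may ignore it).

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SmallStackPool {
    /** Creates threads that request a given stack size (a hint the JVM may ignore). */
    static final class SmallStackFactory implements ThreadFactory {
        private final long stackBytes;
        private final AtomicInteger created = new AtomicInteger();

        SmallStackFactory(long stackBytes) {
            this.stackBytes = stackBytes;
        }

        public Thread newThread(Runnable task) {
            String name = "worker-" + created.incrementAndGet();
            return new Thread(null, task, name, stackBytes);
        }
    }

    /** Core size 0 and keep-alive 0: idle threads are not retained. */
    static ThreadPoolExecutor newPool(ThreadFactory factory, int maxThreads) {
        return new ThreadPoolExecutor(
                0, maxThreads, 0L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(), factory);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newPool(new SmallStackFactory(256 * 1024), 50);
        pool.execute(new Runnable() {
            public void run() {
                System.out.println("handled by " + Thread.currentThread().getName());
            }
        });
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Raising the core size or the keep-alive time turns the same object into a conventional pool, which is why one set of parameters can serve both the people who want pooling and the people who don't.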

Other noteworthy changes in Qwicap 1.4a42 include:

  • The concept of "blocking listeners" has been added. Such listeners, registered using the new Qwicap.addBlockingListener method, are invoked just before and after Qwicap blocks to wait for user activity. This applies to the prompt methods, and the redirect method, and will apply to any future blocking methods. Thus a "blocking listener" can perform actions like releasing and reacquiring limited resources like database connections, or log application activity, transparently from the perspective of a web application's code.
  • There is now a "service data recorder" (SDR), for lack of a better term, built into Qwicap. The SDR accumulates data in hourly chunks (up to 72 of them). Each hour's data includes information on the thread pool (total size, number of active and inactive threads, number of threads created and allowed to die, number of threads unavailable due to the max. pool size limit), user sessions (number completed and expired, and the minimum, average and maximum duration), hits (total number, hits directed to invalid pages, and the minimum, average and maximum response time), and RAM usage (amounts free, used, and total). For the time being, the SDR reports are available only in a human-readable format, because I am hesitant to commit to what data will be gathered, how it will be represented, etc.
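The "blocking listener" idea from the first item above can be sketched roughly as follows. Everything here is a stand-in: the interface shape, the method names, and promptStandIn are hypothetical and are not Qwicap's actual API (only the Qwicap.addBlockingListener method name comes from this release). A Semaphore plays the part of a limited resource such as a database connection.

```java
import java.util.concurrent.Semaphore;

public class BlockingListenerSketch {
    /** Hypothetical callback shape; Qwicap's real listener interface may differ. */
    interface BlockingListener {
        void beforeBlocking();
        void afterBlocking();
    }

    /** Gives a scarce permit back while the app waits for the user, then reclaims it. */
    static final class PermitReleasingListener implements BlockingListener {
        private final Semaphore permits;

        PermitReleasingListener(Semaphore permits) {
            this.permits = permits;
        }

        public void beforeBlocking() {
            permits.release();
        }

        public void afterBlocking() {
            permits.acquireUninterruptibly();
        }
    }

    /** Stand-in for a framework method that blocks while waiting for user input. */
    static int promptStandIn(BlockingListener listener, Semaphore permits) {
        listener.beforeBlocking();
        // A real framework would block here until the user's next hit arrives.
        int availableWhileBlocked = permits.availablePermits();
        listener.afterBlocking();
        return availableWhileBlocked;
    }

    public static void main(String[] args) {
        Semaphore permits = new Semaphore(1);
        permits.acquireUninterruptibly(); // the application holds the only permit
        int free = promptStandIn(new PermitReleasingListener(permits), permits);
        System.out.println("permits free while blocked: " + free);
    }
}
```

The application code around the prompt never mentions the permit, which is the sense in which the listener works transparently.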

At this point, my main goal for Qwicap is to finalize the 1.4 release. I'm prepared to declare it feature-complete now. (Therefore, I should have released it as a "beta", rather than an "alpha".) While it doesn't include all of the features I'd originally intended it to have, it's the most significant revision to Qwicap thus far, and at some point you just have to draw a line across such an effort, declare it complete, and add the missing features to the list for the next major revision. Furthermore, the current release version, 1.3.3, is almost a year old at this point, and lacks a number of bug fixes and important improvements that are present in 1.4. (Backporting the fixes and improvements to common code became an unsupportable burden some time ago, with the never-quite-ready-for-release version 1.3.4.) The thought of people putting up with 1.3-isms when they should be benefiting from 1.4-isms is really getting on my nerves. So, expect the 1.4 release in the not-too-distant future, bug reports permitting.
