Now, the locking scheme that I implemented wasn't completely bulletproof -- you could, for instance, obtain a pointer to an object along with the matching lock, then release the lock while holding on to the cached pointer. I could have written a lot of code to prevent this.
Or I could just look at the programmer sadly and say, "If you did this, I cannot help you."
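To make that loophole concrete, here is a minimal sketch in Python. It is not the actual code: `Document`, `hold()`, and the `text` field are hypothetical names invented for illustration, and a Python reference stands in for the cached pointer.

```python
import threading
from contextlib import contextmanager


class Document:
    """Hypothetical stand-in for a lockable object."""

    def __init__(self, text):
        self.text = text
        self._lock = threading.Lock()

    @contextmanager
    def hold(self):
        """Hand out the object only while its lock is held."""
        with self._lock:
            yield self


doc = Document("draft")

# Intended use: touch the object only inside the locked region.
with doc.hold() as d:
    d.text = "edited under lock"

# The loophole: keep the reference after the lock has been released.
with doc.hold() as d:
    cached = d

lock_still_held = doc._lock.locked()      # the lock was released on exit
cached.text = "edited with no lock held"  # nothing stops this write
```

Closing the hole would mean wrapping every access in a proxy that checks the lock, which is the "lot of code" mentioned above.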
My locking scheme works well, but -- really unfortunately -- it hasn't quite managed to make it to the tip of the code line. I'm hoping that it will shortly escape into the wild. (Actually, it's available in a patch to an older version. But it was a 700+ file merge to bring all of those changes forward to the current release and that requires an unfortunate amount of testing before it should be set free.)
One of the problems with the old locking scheme was that it was not simple at all. In our continuing efforts to "fix" it, we piled multiple locks onto the same poor, overburdened object. There was the read/write lock. There was the cache lock, which was intended to keep documents from being purged from memory. There was the loading lock, indicating that we were trying to load a document into memory. And there was the horrid (and originally my idea) intent-to-write lock, which was really intended to manage long-lived reservations of a resource for a running batch process such as consolidation.
Eep. So when I rewrote locking, I threw out all of the locks except two. The first was a simple read/write lock (which, after reading the literature some more, I am considering reducing to an even simpler write lock just to see how it works). The second was the intent-to-write lock, which I converted from a mutex to a check-out type lock, since we already used check-out to reserve resources for long periods of time.
And after I bashed all of the code to fit, things worked pretty well.
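For readers who want a picture of the slimmed-down design, here is a rough sketch in Python. It rests on an assumption about what "check-out type lock" means: the long-lived reservation is recorded as ordinary data on the resource, and the read/write lock is held only briefly while that record changes. The names `ReadWriteLock`, `Resource`, `check_out`, and `check_in` are all hypothetical, and the lock makes no attempt at fairness.

```python
import threading


class ReadWriteLock:
    """Minimal read/write lock: many readers or one writer, no fairness."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Resource:
    """Hypothetical resource with one short-lived lock and one reservation."""

    def __init__(self, name):
        self.name = name
        self.rw = ReadWriteLock()
        self.checked_out_by = None  # long-lived reservation, stored as data

    def check_out(self, owner):
        """Reserve the resource; the write lock is held only for the update."""
        self.rw.acquire_write()
        try:
            if self.checked_out_by not in (None, owner):
                return False  # someone else already holds the reservation
            self.checked_out_by = owner
            return True
        finally:
            self.rw.release_write()

    def check_in(self, owner):
        """Release the reservation if this owner holds it."""
        self.rw.acquire_write()
        try:
            if self.checked_out_by == owner:
                self.checked_out_by = None
        finally:
            self.rw.release_write()


ledger = Resource("ledger")
first = ledger.check_out("consolidation")  # True: nobody had it
second = ledger.check_out("someone-else")  # False: already reserved
ledger.check_in("consolidation")
third = ledger.check_out("someone-else")   # True: free again
```

The point of the split is that a batch job such as consolidation can hold its reservation for hours without any thread sitting on a mutex for that long.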
Unfortunately, today I am debugging some of the old locking code, which seems to be giving us a problem in our consolidation. And I find that I am attempting to lock a mutex whose resource was somehow deleted out from under me, between the time that I got the pointer to the mutex and the time that I actually tried to take the lock.
I could probably fix this.
All I have to do is rewrite the locking scheme.
*thud* *thud* *thud*