Taking things to the next level
When I was testing my application, I would make many calls to the system and watch for the web page to update, control the queues, and watch the stats build up. However, one thing I was not paying attention to was the number of simultaneous calls and the call volume itself. I was only making a call or two at a time, so I was not fully testing my application. When I put it into production for the first time, it was also the first time I ever saw what happens when multi-threaded software runs into an irresolvable conflict over its locks, better known as a deadlock. I was quite familiar with the segmentation fault, as I had encountered many of those along the way when I was working on my own modules, but I was surprised to also see a rise in the number of inexplicable ones that happened only some of the time.
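To make the deadlock described above concrete, here is a minimal sketch in C using POSIX threads. It is not Asterisk code: the names `queue_lock`, `stats_lock`, `lock_pair`, and `run_workers` are hypothetical and exist only for illustration. Two workers ask for the same pair of locks in opposite orders, which is the classic recipe for a deadlock under load, and the sketch shows one common way to avoid it by always acquiring the locks in a single global order.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical locks, named for illustration only; not Asterisk's real ones. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int calls_counted = 0;

/* Lock two mutexes in one global order (lowest address first), so two
 * threads can never each hold one lock while waiting for the other. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

/* Asks for queue_lock first, then stats_lock. */
static void *queue_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        lock_pair(&queue_lock, &stats_lock);
        calls_counted++;
        unlock_pair(&queue_lock, &stats_lock);
    }
    return NULL;
}

/* Asks for the same locks in the opposite order. With plain
 * pthread_mutex_lock calls made in this order, the two workers
 * could each grab one lock and wait forever for the other. */
static void *stats_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        lock_pair(&stats_lock, &queue_lock);
        calls_counted++;
        unlock_pair(&stats_lock, &queue_lock);
    }
    return NULL;
}

/* Runs both workers to completion and returns the final counter,
 * or -1 if a thread could not be started. */
int run_workers(void)
{
    pthread_t a, b;
    calls_counted = 0;
    if (pthread_create(&a, NULL, queue_worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, stats_worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return calls_counted;
}
```

A call or two at a time would rarely expose the opposite-order version of this bug, because the two threads seldom overlap. It takes sustained concurrent load for both to reach the critical section at the same instant, which matches why the problem only showed up in production.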
A segmentation fault occurs when an application makes an invalid memory access, for example by freeing the same memory more than once or by accessing memory addresses that are out of bounds or do not exist. You will run into them a lot in C programming, since you have lower-level access to the operating system and there is nothing to protect you from making errors besides your own discipline. I don’t give up easily, which you could consider a curse or a blessing, so when I started to encounter some problems, I was prepared to get to the bottom of them. I spent countless hours studying the output from the GNU debugger and trying to simulate the traffic that caused my problems. After a little trial and error, I found success! I managed to duplicate the crash in my test lab using a load generator. I even managed to figure out where the problem was and fixed it! That was a great feeling that lasted right up until later that afternoon, when I learned there was another new problem with similar symptoms somewhere else in the code.
I managed to slowly back out the features in my application that increased the likelihood of a deadlock or segmentation fault, but I could not completely eliminate all the problems. I eventually discovered that the app_queue module was causing most of my grief, which was not the best news considering that was the module I was using the most in my call-center application. Some of the changes I wanted to make were too intrusive for inclusion in the main code distribution, so I ended up using my own copy of the code so I could continue to update the rest of Asterisk. This kept things stable, but only stable enough to seek another solution.
By this time I had written a fairly large number of features into Asterisk and was really starting to have some big ideas for new functionality. I created a new concept called “function variables” allowing modules to expose an interface that could be expanded from the Dialplan (if you read the rest of this book, that idea may sound familiar). I was still wrestling with the queue problems, so I got together with another Asterisk community member and started brainstorming on a new ACD queue module for Asterisk called mod_icd.
ICD stood for Intelligent Call Distribution, a play on the acronym ACD, meaning Automatic Call Distribution. We had identified all the shortcomings of the app_queue module with regard to functionality, and we had a common interest in making a more stable module that would not cause countless crashes and deadlocks. We had a working prototype and a lot of work to do. We used state machines and higher-level memory management abstractions with data pools and several other inspiring concepts that we felt were lacking in the standard Asterisk. The problem was, I think we over-engineered the module, almost as if we were trying to edge out the entire Asterisk core, which was of course not really possible for what was only a loadable module within that core.
We never quite finished mod_icd. It was late 2004 and my opportunities with call-center solutions lay smashed on the rocks, washed away by the unforgiving seas of segmentation faults and deadlocks. We started focusing more on other telephony services that did not involve queuing. I developed a new offering of toll-free termination and fax-to-e-mail services. Using several new features I added to the mainline Asterisk and some of my less-popular modules that were not approved, I built a cluster of seven Asterisk boxes and connected them to a large telecom circuit. This deployment of Asterisk was not problem-free but, on the bright side, if some of the machines crashed, there were more to take their place while we restarted them.