Diagnostic Tests
Learn about thread dumps, waiting threads, and blocked threads.
Thread dumps
The thread dumps from the front-end application servers revealed the same pattern on every DRP. A few threads were busy making calls to the back end, and most of the others were waiting for an available connection so they could call the back end. The waiting threads were all blocked on a resource pool that had no timeout. If the back end stopped responding, the threads making calls would never return, and the blocked threads would never get their chance to make their calls. In short, every request-handling thread, all 3,000 of them, was tied up doing nothing. That explained the low CPU usage we had observed: all 100 DRPs were idle, waiting forever for an answer that would never come.
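The failure mode above can be reproduced in miniature. The following Python sketch is an illustration under stated assumptions, not the servers' actual code. It uses a hypothetical `ConnectionPool` built on `threading.BoundedSemaphore`. Passing `checkout_timeout=None` reproduces the unbounded wait described above, and passing a number bounds it. A background thread holds the pool's only connection during a back-end call that never returns, which is simulated with an `Event` that is never set.

```python
import threading
from contextlib import contextmanager


class PoolTimeout(Exception):
    """Raised when no connection becomes free within the checkout timeout."""


class ConnectionPool:
    """Hypothetical fixed-size pool; 'connections' are just integer IDs."""

    def __init__(self, size, checkout_timeout=None):
        self._slots = threading.BoundedSemaphore(size)
        self._free = list(range(size))
        self._lock = threading.Lock()
        # None reproduces the failure mode in the text: callers wait forever.
        self._timeout = checkout_timeout

    @contextmanager
    def connection(self):
        # acquire(timeout=None) blocks indefinitely; a number bounds the wait.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolTimeout(f"no connection free after {self._timeout}s")
        with self._lock:
            conn = self._free.pop()
        try:
            yield conn
        finally:
            with self._lock:
                self._free.append(conn)
            self._slots.release()


def hung_backend_call(pool, started, never_set):
    """Simulated back-end call that holds a connection and never returns."""
    with pool.connection():
        started.set()
        never_set.wait()  # the back end never answers


pool = ConnectionPool(size=1, checkout_timeout=0.2)
started, never_set = threading.Event(), threading.Event()
threading.Thread(
    target=hung_backend_call, args=(pool, started, never_set), daemon=True
).start()
started.wait()  # make sure the only connection is now checked out

outcome = "ok"
try:
    with pool.connection():  # a new request-handling thread wants to call out
        pass
except PoolTimeout as exc:
    outcome = f"failed fast: {exc}"
print(outcome)
```

With `checkout_timeout=None`, the second request would block forever, just like the request-handling threads in the dumps. A bounded wait turns a silent hang into an error that the caller can handle, log, or report.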
Attention swung to the order management system. Thread dumps from that system revealed that some of its 450 threads were occupied making calls to an external integration point, as shown in the following figure. As you have probably guessed, all of its other threads were blocked waiting to make calls to that same integration point. That external system handles scheduling for home delivery. We immediately paged its operations team. The system is managed by a different group that does not have 24/7 support staff. Instead, they pass a pager around on rotation.
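Reading hundreds or thousands of stack traces by eye is slow, so it can help to tally them. The sketch below assumes HotSpot-format dumps, such as those printed by `jstack` or `kill -3`, in which each thread has a `java.lang.Thread.State:` line followed by `at ...` frames. It counts threads per state and per first non-JDK frame, which is a rough heuristic for "where the application is stuck." The JDK prefix list and the `SAMPLE` excerpt, including its `com.example` class names, are hypothetical.

```python
import re
from collections import Counter

# HotSpot dumps print one "java.lang.Thread.State:" line per thread,
# followed by stack frames of the form "at package.Class.method(File:line)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)")
FRAME_RE = re.compile(r"^\s*at ([^\s(]+)")
# Assumption: frames from these packages are JDK internals, not app code.
JDK_PREFIXES = ("java.", "javax.", "sun.", "jdk.")


def summarize_dump(text):
    """Count threads per state and per first application (non-JDK) frame."""
    states, app_frames = Counter(), Counter()
    looking = False  # True until this thread's first application frame is seen
    for line in text.splitlines():
        state = STATE_RE.match(line)
        if state:
            states[state.group(1)] += 1
            looking = True
            continue
        frame = FRAME_RE.match(line)
        if looking and frame and not frame.group(1).startswith(JDK_PREFIXES):
            app_frames[frame.group(1)] += 1
            looking = False
    return states, app_frames


# Hypothetical, abbreviated excerpt; class names are invented for illustration.
SAMPLE = """\
"exec-1" #31 daemon prio=5 runnable
   java.lang.Thread.State: RUNNABLE
\tat java.net.SocketInputStream.socketRead0(Native Method)
\tat com.example.BackendClient.call(BackendClient.java:88)

"exec-2" #32 daemon prio=5 waiting on condition
   java.lang.Thread.State: WAITING (parking)
\tat sun.misc.Unsafe.park(Native Method)
\tat java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
\tat com.example.ConnectionPool.checkout(ConnectionPool.java:42)

"exec-3" #33 daemon prio=5 waiting on condition
   java.lang.Thread.State: WAITING (parking)
\tat sun.misc.Unsafe.park(Native Method)
\tat com.example.ConnectionPool.checkout(ConnectionPool.java:42)
"""

states, app_frames = summarize_dump(SAMPLE)
print(states.most_common())
print(app_frames.most_common())
```

On a real dump showing the pattern described above, a few threads would show up in the outbound call and the large majority would pile up in the pool's checkout method.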