Shortly after the session

The incident had started about 20 minutes before Daniel called me. The operations center had escalated to the on-site team. David, the operations manager, had decided to bring me in as well. Too much was on the line for our client to worry about interrupting a vacation day. Besides, I had told them not to hesitate to call me if I was needed. We knew a few things at this point, twenty minutes into the incident:

  • Session counts were very high, higher than the day before.

  • Network bandwidth usage was high but not hitting a limit.

  • Application server page latency (response time) was high.

  • Web, application, and database CPU usage were really low.

  • Search servers, our usual culprit, were responding well. System stats looked healthy.

  • Request-handling threads were almost all busy. Many of them had been working on their requests for more than five seconds.

In fact, the page latency wasn’t just high. Because requests were timing out, it was effectively infinite. The statistics showed us only the average of requests that were completed.

Response time

Response time is always a lagging indicator: we can only measure the response time of requests that have finished. Requests that hang or time out never contribute a measurement, so the reported average understates how bad things really are.
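A minimal sketch of this bias, using made-up latency samples: requests that timed out never report a duration, so a dashboard averaging only completed requests can look deceptively calm while users are stuck waiting forever.

```python
# Hypothetical latency samples in seconds. None marks a request that
# timed out: it never completed, so it never reported a latency.
samples = [0.2, 0.3, 6.1, 5.8, None, None, None]

# A naive dashboard averages only the requests that finished.
completed = [s for s in samples if s is not None]
avg_completed = sum(completed) / len(completed)

print(f"avg over completed requests: {avg_completed:.2f}s")   # looks finite
print(f"requests with no measurement: {samples.count(None)}") # invisible pain
```

Here nearly half the traffic is effectively at infinite latency, yet the average over completed requests stays at a merely "high" number. Counting timeouts separately (or tracking percentiles over all attempts) gives a truer picture.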
