Estimation of Processing Time of an API
Learn to estimate the processing time of an API.
In the previous lesson, we learned that the response time is the sum of latency and processing time, as given in the following equation:
$Time_{response} = Time_{latency} + Time_{processing}$
Let's start by estimating the processing time of an API.
Processing time
The processing time of a server is the time the server takes to process a request and prepare a response. It is one of the main factors that affect response time, so estimating it is an important part of estimating the total response time of a service.
The illustration below shows a high-level view of what constitutes processing time in an API. The server interacts with the database to execute queries for data retrieval, which might also involve file handling. Processing time includes the round trip from the API gateway to downstream services, the request execution time, and the response preparation time.
There is no rule of thumb to calculate the exact processing time. It depends on several things, such as the services involved, the components within those services, and the underlying technologies (both hardware and software). Usually, processing involves analyzing a query and fetching the data from the server’s memory or the corresponding database. The processing time primarily depends on the three factors listed below:
The type of request
The application server’s time to handle a request
Database query execution time
The processing time also depends on the specifications of the machine that processes the user’s request. Many servers are available, with different specifications to support different requirements. We’ll consider a typical server from Amazon Web Services (AWS) with the following specifications:
Server Specifications
Component | Specification |
Sockets | 2 |
Processor | Intel Xeon E5-2686 v4 |
RAM | 240 GB |
Cores | 36 cores (72 hardware threads) |
Cache (L3) | 45 MB |
Storage | 15 TB |
Request processing estimation
In this section, we’ll estimate the time a server takes to handle a request, depending on the type of request. Requests fall mainly into two types, depending on whether the CPU or the memory is the bottleneck:
CPU bound: These are requests where the CPU acts as a limiting factor.
Memory bound: These are requests where the memory acts as a limiting factor.
Let's say that each CPU-bound request takes 200 milliseconds (ms), and each memory-bound request takes 50 ms to complete. The requests per second (RPS) for each are calculated using the following formulas.
The following terms are used in this calculation:
...
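As a rough illustration of this kind of estimate, the sketch below computes RPS for both request types on the server specified above. It is a minimal back-of-the-envelope model, not necessarily the lesson's exact formulas. It assumes that each hardware thread serves one CPU-bound request at a time, and that available RAM caps the number of memory-bound requests in flight. The function names and the 300 MB of memory per in-flight request are hypothetical values chosen for illustration. The table's 240 GB is treated as 240,000 MB.

```python
# Back-of-the-envelope RPS estimate (illustrative sketch, not a benchmark).

HARDWARE_THREADS = 72        # from the server specification table
RAM_MB = 240_000             # 240 GB from the table, expressed in MB
CPU_TASK_SECONDS = 0.200     # each CPU-bound request takes 200 ms
MEMORY_TASK_SECONDS = 0.050  # each memory-bound request takes 50 ms
WORKER_MEMORY_MB = 300       # ASSUMPTION: memory held by one in-flight request


def cpu_bound_rps(hardware_threads: int, task_seconds: float) -> float:
    """Assume each hardware thread serves one request at a time."""
    return hardware_threads / task_seconds


def memory_bound_rps(ram_mb: float, worker_memory_mb: float,
                     task_seconds: float) -> float:
    """Assume RAM limits how many requests can be in flight at once."""
    concurrent_requests = ram_mb / worker_memory_mb
    return concurrent_requests / task_seconds


if __name__ == "__main__":
    cpu_rps = cpu_bound_rps(HARDWARE_THREADS, CPU_TASK_SECONDS)
    mem_rps = memory_bound_rps(RAM_MB, WORKER_MEMORY_MB, MEMORY_TASK_SECONDS)
    print(f"CPU-bound RPS:    {cpu_rps:.0f}")
    print(f"Memory-bound RPS: {mem_rps:.0f}")
```

With these inputs, the sketch gives roughly 360 RPS for CPU-bound requests and, under the assumed 300 MB per request, roughly 16,000 RPS for memory-bound requests. Changing the assumed per-request memory changes the second figure proportionally.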