...
Zoom API Design Evaluation and Latency Budget
Analyze the non-functional requirements and estimate the response time of the Zoom meeting API.
Introduction
Modeling a complex service is a time-consuming process that may require many rounds of fine-tuning. In this lesson, we’ll discuss how we can achieve the non-functional requirements, especially real-time communication, and estimate the response time of our proposed Zoom meeting API.
Non-functional requirements
Let's discuss the non-functional requirements for our Zoom API one by one:
Availability and reliability
We ensure the availability of our services by dividing servers according to their roles. For example, the meeting service handles requests to create and update meetings, add participants, and so on, while the media controller handles client requests for managing meeting sessions. This role-based separation keeps the workflows independent: if one service goes down, the other can keep running, which makes the system resilient to complete outages. Additionally, services and data are replicated across different geographical regions to avoid single points of failure (SPOFs). We also use API monitoring and circuit breakers to detect and contain failures as quickly as possible. For efficient resource management, we limit the number of concurrent meeting requests based on the account type. For free users, we also cap the maximum meeting duration to avoid overloading the servers.
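To make the circuit breaker idea concrete, here is a minimal sketch of one that could wrap calls from one service to another. The class name, the thresholds, and the injectable clock are assumptions made for illustration and are not part of the proposed Zoom API.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, retries after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value, tune per service
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so the logic is testable
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of sending more traffic to an unhealthy service.
                raise CircuitOpenError("downstream service marked unhealthy")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # A successful call closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

With a breaker like this around its calls to the media controller, the meeting service could reject session requests immediately while the controller recovers, rather than letting slow failures pile up.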
Security
We use TLS 1.3 for regular communication and to exchange the AES keys used for multimedia transmission. After the key is shared successfully, the connection is upgraded to WebSockets for AES-encrypted data transfer. We implement authentication and authorization using a login mechanism, along with OAuth and OpenID Connect with PKCE flows for third-party interactions (see the authorization framework). Connecting to the media router requires an access token. Guest (unregistered) participants can also join using an access token, which is issued only after the host accepts their join request.
Scalability
Locally distributed media routers make scaling services easier. We also decouple media routers from media controllers, which allows us to deploy multiple media routers in an area under a single controller, making this a cost-effective solution. Stateless communication between the conferencing service and the media controller allows efficient resource management during workload peaks.
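One way to picture a single controller managing several routers in an area is the sketch below, in which the controller assigns a new session to the router with the most spare capacity. The `MediaRouter` fields, the stream-count capacity metric, and the least-loaded policy are all assumptions for illustration; the lesson does not prescribe a selection strategy.

```python
from dataclasses import dataclass


@dataclass
class MediaRouter:
    router_id: str
    region: str
    capacity: int            # assumed metric: maximum concurrent streams
    active_streams: int = 0

    @property
    def free_slots(self) -> int:
        return self.capacity - self.active_streams


def pick_router(routers, region):
    """Return the router in `region` with the most free slots, or None if all are full."""
    candidates = [r for r in routers if r.region == region and r.free_slots > 0]
    if not candidates:
        # Signal that the controller should bring up another router in this region.
        return None
    return max(candidates, key=lambda r: r.free_slots)
```

Because the routers hold the media connections and the controller only makes placement decisions like this one, capacity in an area can grow by adding routers without adding controllers.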
Point to Ponder
What determines the maximum number of users a service like Zoom can handle in a single meeting?
Optimization and tradeoffs
The stateful nature of WebSockets can be a scalability issue for our service, but it is unavoidable given the two-way, real-time nature of the communication. We can still scale the service by increasing the number of regional media servers. This is an expensive solution, but it is the tradeoff we accept in exchange for real-time communication.
Additionally, because we’ve learned from a previous lesson ...