Back-of-the-envelope Calculations for the Model Training

Learn the essentials of back-of-the-envelope calculations to estimate the training and deployment resources required for GenAI applications.

One of the most critical steps when designing and implementing generative AI (GenAI) systems is resource estimation. Whether training a new model or deploying an inference system for millions of users, having a solid understanding of the required resources—such as computational power, storage, and bandwidth—can save significant time and cost. However, detailed calculations can be time-consuming and require access to precise technical details. This is where back-of-the-envelope calculations (BOTECs) come into play.

Back-of-the-envelope calculations (BOTECs) are quick, rough estimations that provide a high-level understanding of the resources a system might require. These calculations aren’t about pinpoint accuracy but about getting a reliable ballpark figure to guide decision-making. For instance, we might want to estimate how many GPUs we’ll need before training a large language model, or how long the training will take. Similarly, for deployment, we can use BOTECs to predict the number of inference servers required to handle user requests.

Resource estimation in GenAI systems

In this lesson, we’ll explore the basics of BOTECs for training and deploying GenAI systems, starting with the training time and resource estimation.

Why do BOTECs matter in GenAI?

In the context of GenAI, BOTECs are invaluable for managing resources and optimizing performance. These systems operate at an immense scale—measured in billions of parameters, terabytes of data, and thousands of GPUs—making accurate estimations crucial. Even minor inaccuracies in initial assessments can compound into significant resource waste or system inefficiencies as the system scales. Using BOTECs, developers can make informed decisions early in the design process, ensuring smoother execution and better optimization.

BOTECs are also valuable in technical interviews, particularly for roles in AI system design and machine learning engineering. Interviewers often assess candidates’ ability to think critically and estimate system requirements without relying on exact tools or data. Quickly and logically breaking down complex problems—such as estimating GPU usage, training time, or storage needs—exhibits technical knowledge and practical problem-solving skills. Mastering BOTECs can help candidates stand out by showcasing their ability to approach real-world challenges precisely and confidently.

In this chapter, we will focus on two types of BOTECs:

  1. Resource estimation for training a model

  2. Resource estimation for deploying the model

Let’s start with the model training estimations and devise some mathematical formulations.

Training estimations

In the training phase of building a generative AI system around a large model such as Llama, we want to determine how long it will take to train on a given amount of data. The Llama 2-70B model took around 1.7 million GPU-hours to train on NVIDIA A100 GPUs. Estimating this beforehand allows us to plan resources accordingly and build systems within a realistic time frame. For example, training Llama 2-70B on a single GPU would take roughly 195 years, while a cluster of 10,000 GPUs would finish in about a week. There is also the cost aspect: Llama 2-70B reportedly cost around $3.9 million to train. Once we estimate the GPU-hours required to train a model, we can also predict the cost with some confidence.
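To see where these figures come from, here is a quick sketch of the arithmetic in Python. The per-GPU-hour price is an assumed value chosen for illustration; real cloud and on-premises rates vary widely.

```python
# Back-of-the-envelope arithmetic for the Llama 2-70B figures above.
gpu_hours = 1.7e6             # reported GPU-hours on NVIDIA A100s
price_per_gpu_hour = 2.30     # assumed $/GPU-hour (illustrative)

years_on_one_gpu = gpu_hours / 24 / 365        # calendar years with a single GPU
days_on_10k_gpus = gpu_hours / 10_000 / 24     # calendar days with a 10,000-GPU cluster
estimated_cost = gpu_hours * price_per_gpu_hour

print(f"1 GPU: ~{years_on_one_gpu:.0f} years")
print(f"10,000 GPUs: ~{days_on_10k_gpus:.1f} days")
print(f"Estimated cost: ~${estimated_cost / 1e6:.1f}M")
```

Running this reproduces the ballpark figures quoted above: roughly two centuries on a single GPU, about a week on a 10,000-GPU cluster, and a cost of around $3.9 million.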

As we know, training any model usually means passing our data through the model and updating its internal weights. Both of these steps involve mathematical operations such as multiplications and additions. To estimate how long it takes to train a model, we need answers to two questions:

  1. How many calculations will be performed?

  2. How many calculations can be done in a certain time (e.g., one second)?

The calculations we refer to here are FLOPs (floating-point operations). A floating-point number is simply a number with a decimal point. Without diving into the specifics of a model’s architecture, we cannot know exactly how many operations will be performed during one forward/backward pass, so we need to make some informed guesses.
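Putting the two questions together, training time is roughly the total number of floating-point operations divided by how many operations per second the hardware can actually sustain. A widely used rule of thumb (adopted here as an assumption, not derived in this lesson) is that training a transformer costs about 6 FLOPs per parameter per training token, covering both the forward and backward passes. The sketch below applies this to a Llama 2-70B-scale model; the cluster size and utilization are assumed values for illustration.

```python
# Rough training-time estimate: total FLOPs / sustained FLOPs per second.
params = 70e9                  # model parameters (Llama 2-70B)
tokens = 2e12                  # training tokens (Llama 2 used roughly 2T tokens)
total_flops = 6 * params * tokens     # rule of thumb: ~6 FLOPs per parameter per token

peak_flops_per_gpu = 312e12    # NVIDIA A100 BF16 tensor-core peak (~312 TFLOPS)
utilization = 0.4              # assumed fraction of peak throughput actually achieved
num_gpus = 2048                # assumed cluster size

sustained_flops = num_gpus * peak_flops_per_gpu * utilization   # FLOPs per second
training_seconds = total_flops / sustained_flops
gpu_hours = num_gpus * training_seconds / 3600

print(f"~{training_seconds / 86400:.0f} days on {num_gpus} GPUs")
print(f"~{gpu_hours / 1e6:.1f} million GPU-hours")
```

With these assumptions, the estimate comes out to roughly 1.9 million GPU-hours, in the same ballpark as the reported ~1.7 million GPU-hours, which is exactly the level of agreement a BOTEC aims for.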

The BOTEC calculations and equations presented in this course are intended as a foundational blueprint for understanding resource estimation in GenAI systems. While they provide valuable insights and a framework for analysis, they should not be taken as precise references for real-world System Design. The actual resource requirements of GenAI systems can vary significantly depending on numerous factors beyond the scope of these simplified examples. However, the principles and techniques discussed here will be a solid reference for designing and optimizing GenAI systems throughout this course.

Let’s look at an example of how we can estimate these FLOPs:

A simple neural network

The network is configured with two input nodes, three hidden layer nodes, and one output node.
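To make the counting concrete before writing out the algebra, here is a minimal sketch that runs one forward pass through this 2-3-1 network and tallies the operations as it goes. The weight and bias values are arbitrary, and we assume a single shared bias per layer, matching the equations that follow.

```python
import numpy as np

# Illustrative 2-3-1 network; the numeric values don't matter, only the counts.
x = np.array([0.5, -1.2])        # 2 inputs
W1 = np.ones((2, 3)) * 0.3       # input -> hidden weights (2 x 3 = 6)
b1 = 0.1                         # shared bias for the hidden layer
W2 = np.ones((3, 1)) * 0.7       # hidden -> output weights (3 x 1 = 3)
b2 = -0.2                        # bias for the output layer

num_params = W1.size + W2.size + 2     # 6 + 3 + 2 = 11
flops = 0

# Hidden layer: each of the 3 nodes does 2 multiplications + 2 additions = 4 FLOPs.
h = x @ W1 + b1
flops += 3 * 4

# Output layer: 3 multiplications + 3 additions (including the bias) = 6 FLOPs.
y = h @ W2 + b2
flops += 6

print(f"parameters = {num_params}, forward-pass FLOPs = {flops}")   # 11 and 18
```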

The total number of parameters in the network is:

$$\underbrace{(2 \times 3)}_{\text{input}\rightarrow\text{hidden weights}} + \underbrace{(3 \times 1)}_{\text{hidden}\rightarrow\text{output weights}} + \underbrace{2}_{\text{one bias per layer}} = 11$$

For one feedforward pass through the network, we will need the following calculations:

$$h_j = x_1 w_{1j} + x_2 w_{2j} + b_1, \qquad j = 1, 2, 3$$

$$y = h_1 u_1 + h_2 u_2 + h_3 u_3 + b_2$$

Where $b_1$ is the bias value of the first layer and $b_2$ is the bias of the output layer. We can see that the first few calculations (one per hidden node) require $4 + 4 + 4$ ...