System design is an important component of any ML interview. Being able to efficiently solve open-ended machine learning problems is a key skill that can set you apart from other engineers and increase the level of seniority at which you’re hired.

This course helps you build that skill, and goes over some of the most popularly asked interview problems at big tech companies. You’ll walk step-by-step through solving these problems, focusing in particular on how to design machine learning systems rather than just answering trivia-style questions. 

Once you’re done with the course, you’ll be able to not just ace the machine learning interview at any tech company, but impress them with your ability to think about systems at a high level. If you have a machine learning or system design interview coming up, you’ll find the course tremendously valuable.

Grokking the Machine Learning Interview

As we work on a machine learning-based system, our goal is generally to improve our metrics (engagement rate, etc.) while ensuring that we meet the capacity and performance requirements. 

Major performance and capacity discussions come in during the following two phases of building a machine learning system:

1. *Training time*: How much training data and capacity is needed to build our predictor?

2. *Evaluation time*: Which [service level agreements (SLAs)](https://en.wikipedia.org/wiki/Service-level_agreement) do we have to meet while serving the model, and what are the capacity needs?

We need to weigh performance and capacity alongside optimization for the ML task at hand. That is, we measure the complexity of the ML system at training and evaluation time, and use it when deciding on the architecture of the ML system as well as when selecting the ML modeling technique.


# Complexity considerations for an ML system

Machine learning algorithms have three different types of complexity:

- **Training complexity**
   
   The training complexity of a machine learning algorithm is the time taken by it to train the model for a given task.

 
- **Evaluation complexity**
    
  The evaluation complexity of a machine learning algorithm is the time taken by it to evaluate the input at testing time.
   

- **Sample complexity**
   
  The sample complexity of a machine learning algorithm is the total number of training samples required to learn a target function successfully.

   > 📝 Sample complexity changes if the model capacity changes. For example, a deep neural network needs considerably more training examples than decision trees or linear regression.








# Comparison of training and evaluation complexities


You can see how the training and evaluation complexities can be used to evaluate which model will be best for a given task and resources.

Assume that:

- $n$ is the number of training samples
- $f$ is the number of features
- $n_{trees}$ is the number of trees (for tree-based algorithms)
- $n_{l_i}$ is the number of neurons in the $i^{th}$ layer of a neural network
- $e$ is the number of epochs
- $d$ is the max depth of the tree

The training and prediction complexity can be approximated in terms of asymptotic analysis as follows:

| **Algorithm** | **Training time** | **Evaluation time** |
| ------------- |:-------------:| ----- |
| Linear/Logistic Regression (Batch) | $O(nfe)$ | $O(f)$ |
| Neural Network | $O(ne(fn_{l_1}+n_{l_1}n_{l_2}+...))$ with backpropagation (varies per implementation) | $O(fn_{l_1}+n_{l_1}n_{l_2}+...)$ |
| Multiple Additive Regression Trees (MART) | $O(ndfn_{trees})$ | $O(fdn_{trees})$ |
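
To get a feel for how these expressions compare, a few of the table's entries can be turned into rough operation counts. The sketch below is only illustrative: the function names are hypothetical, constant factors are ignored, and the counts follow the table's formulas as written rather than any particular library's implementation.

```python
from typing import Sequence


def linear_train_ops(n: int, f: int, e: int) -> int:
    """O(nfe): every epoch touches every feature of every sample."""
    return n * f * e


def linear_eval_ops(f: int) -> int:
    """O(f): one multiply-add per feature."""
    return f


def nn_eval_ops(f: int, layer_sizes: Sequence[int]) -> int:
    """O(f*n_l1 + n_l1*n_l2 + ...): one multiply-add per weight."""
    widths = [f, *layer_sizes]
    return sum(a * b for a, b in zip(widths, widths[1:]))


def mart_train_ops(n: int, d: int, f: int, n_trees: int) -> int:
    """O(n*d*f*n_trees), as given in the table."""
    return n * d * f * n_trees


def mart_eval_ops(f: int, d: int, n_trees: int) -> int:
    """O(f*d*n_trees), as given in the table."""
    return f * d * n_trees


# Assumed example sizes, chosen only for illustration.
f = 100
print("linear eval:", linear_eval_ops(f))
print("neural network eval:", nn_eval_ops(f, [256, 64, 1]))
print("MART eval:", mart_eval_ops(f, d=6, n_trees=100))
```

With these assumed sizes, the linear model needs about two orders of magnitude fewer operations per prediction than the other two, which is the gap the analysis relies on.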

## Analysis

- The evaluation complexity of _linear regression_ is the same as that of a single-layer neural network. Linear regression is the best choice if we want to save time on training and evaluation.
  Let’s assume the model evaluates one example in 5 $\mu$s. For 100k examples, it would take 100k x 5 $\mu$s = 500 ms of execution time on a single machine.

  For example, for the ad prediction system, the service level agreement (SLA) says that we need to select the relevant ads from the pool of ads within 300 ms. Given this constraint, we need a fast algorithm. Here linear regression would serve the purpose.

- A relatively *deep neural network* takes a lot more time in both training and evaluation. Its need for training data is also high. However, its ability to learn complex tasks, such as image segmentation and language understanding, is much higher, and it gives more accurate predictions than other models. Therefore, a deep neural network is a viable choice if it is well suited to the task at hand and capacity isn't a problem.

- *MART* is a tree-based algorithm that has a greater computation cost than linear models, but it is much faster than a deep neural network. Tree-based algorithms are able to generalize well using a moderately-sized training dataset. Therefore, if our training data is limited to a few million examples and capacity/performance is critical, they will be a good choice.
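
The latency arithmetic in the first bullet can be sketched as a small budget check. This is a minimal sketch with hypothetical helper names; it assumes the per-example cost is constant and ignores feature fetching, network, and queuing time. The 5 $\mu$s cost and the 300 ms limit are taken from two separate examples in the text and are combined here purely for illustration.

```python
def batch_latency_ms(num_examples: int, per_example_us: float) -> float:
    """Total time to score num_examples sequentially on one machine."""
    return num_examples * per_example_us / 1000.0


def max_examples_within(sla_ms: float, per_example_us: float) -> int:
    """Largest number of examples one machine can score inside the SLA."""
    return int(sla_ms * 1000.0 // per_example_us)


latency = batch_latency_ms(100_000, per_example_us=5.0)
print(f"100k examples at 5 us each: {latency} ms")  # 500.0 ms

# How many candidates fit into a 300 ms budget at the same per-example cost?
print("examples that fit in 300 ms:", max_examples_within(300.0, 5.0))
```

A check like this makes the trade-off concrete: a slower model per example either shrinks the number of candidates that can be scored within the SLA or requires spreading the work across more machines.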


# Performance and capacity considerations in a large-scale system
Consider a search system (e.g., Google, Bing) that gets the query "computer science", which matches 100 million web pages. The ML-based system wants to respond with the most relevant web pages for the searcher while meeting the system's constraints. These constraints are generally referred to as service level agreements (SLAs). There can be many SLAs around availability and fault tolerance, but for our discussion of designing ML systems, _performance_ and _capacity_ are the most important to think about. A **performance**-based SLA ensures that we return results within a given time frame (e.g., 500 ms) for 99% of queries. **Capacity** refers to the load that our system can handle, e.g., the system can support 1000 QPS (queries per second).
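
As a rough illustration of how these two SLAs could be checked against measurements, the sketch below computes a 99th-percentile latency with the nearest-rank method and compares it, together with a measured QPS figure, to the example limits from the text (500 ms, 1000 QPS). The function names and the sample data are assumptions made for this example; real monitoring systems typically use their own percentile estimators.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_performance_sla(latencies_ms: Sequence[float],
                          limit_ms: float = 500.0,
                          pct: float = 99.0) -> bool:
    """True if pct% of the queries finished within limit_ms."""
    return percentile_nearest_rank(latencies_ms, pct) <= limit_ms


def meets_capacity_sla(measured_qps: float, required_qps: float = 1000.0) -> bool:
    """True if the system sustains at least the required query rate."""
    return measured_qps >= required_qps


# Made-up measurements: 99 fast queries and one slow outlier.
sample = [100.0] * 99 + [900.0]
print("performance SLA met:", meets_performance_sla(sample))
print("capacity SLA met:", meets_capacity_sla(measured_qps=1200.0))
```

Note that a single slow query out of 100 does not violate a 99% SLA, while two would.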

If we evaluate every document using a relatively fast model, such as a tree-based model or linear regression, and each evaluation takes 1 $\mu$s, our simple model would still take 100 s (100 million x 1 $\mu$s) to run over the 100 million documents that matched the query "computer science".

This is where distributed systems come in handy; we will distribute the load of a single query among multiple shards, e.g., we can divide the load among 1000 machines and can still execute our fast model on 100 million documents in 100ms (100s/1000).
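
The sharding arithmetic above can be sketched as follows. This is an idealized estimate under stated assumptions: documents are split evenly, shards run fully in parallel, and the cost of fanning out the query and merging results is ignored. The helper names are hypothetical.

```python
import math


def single_machine_seconds(num_docs: int, per_doc_us: float) -> float:
    """Time to score every document sequentially on one machine."""
    return num_docs * per_doc_us / 1_000_000


def sharded_latency_ms(num_docs: int, per_doc_us: float, num_shards: int) -> float:
    """Latency when documents are split evenly across parallel shards."""
    docs_per_shard = math.ceil(num_docs / num_shards)
    return docs_per_shard * per_doc_us / 1000


def shards_needed(num_docs: int, per_doc_us: float, budget_ms: float) -> int:
    """Smallest shard count whose per-shard work fits in budget_ms."""
    docs_per_shard_budget = int(budget_ms * 1000 // per_doc_us)
    if docs_per_shard_budget < 1:
        raise ValueError("budget is too small to score even one document")
    return math.ceil(num_docs / docs_per_shard_budget)


docs = 100_000_000
print(single_machine_seconds(docs, 1.0), "s on one machine")      # 100.0 s
print(sharded_latency_ms(docs, 1.0, 1000), "ms on 1000 shards")   # 100.0 ms
print(shards_needed(docs, 1.0, budget_ms=500.0), "shards for a 500 ms budget")
```

In practice the real shard count would need headroom beyond this estimate, since network hops, stragglers, and result merging all consume part of the latency budget.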


This lesson provides a quick introduction to performance and capacity considerations and discusses why they matter when designing a solution to a machine learning problem.
