Preliminary Machine Learning Concepts


Learn about neural network architecture, its types, and the key concepts of transformers. Get an understanding of how these concepts apply in GenAI systems.

Mastering the core principles of neural networks and their variants is crucial for designing large-scale GenAI systems capable of tasks like text, image, speech, and video generation. In this lesson, we will explore the following foundational concepts:

  • Neural networks

  • Convolutional neural networks (CNNs)

  • Recurrent neural networks (RNNs)

  • Transformer networks

  • Attention mechanisms

These machine learning concepts are the backbone of modern GenAI systems, allowing machines to learn patterns, generate creative outputs, and scale efficiently. By understanding these concepts, we can better design and optimize the complex System Designs required for real-world GenAI applications.

Let’s describe each of the above concepts, starting with neural networks.

Neural network architecture

Neural networks are computational models inspired by the human brain. They are designed to recognize patterns and make predictions by processing data through interconnected layers (discussed below) of nodes (also known as neurons). Neural network architecture refers to the structure and organization of a neural network, including the arrangement of its layers, nodes, and connections. It defines how data flows through the network and how the network learns and makes predictions or decisions.

Let’s discuss the essential components of a neural network.

Components of a neural network

Here are the key components of neural network architecture, though we will focus on only a few in this discussion:

  • Neurons: The basic processing unit of a neural network is a neuron, or node. Each neuron takes a feature vector such as $(x_1, x_2, \ldots, x_m)$, multiplies it by the corresponding weights $(w_1, w_2, \ldots, w_m)$, adds a bias, sums everything, and passes the result through an activation function ($\sigma$) to produce nonlinearity, i.e., the ability to capture relationships in data, such as curves or feature interactions, that a straight line cannot describe, which is essential for tasks like image recognition and natural language processing. The mathematical formulation is as follows (see the sketch after the variable definitions below):

$$y = \sigma\left(\sum_{i=1}^{m} w_i x_i + b\right)$$

Where:

  • $x_i$ is the $i$-th input feature
  • $w_i$ is the weight on the $i$-th connection
  • $b$ is the bias
  • $\sigma$ is the activation function
  • $y$ is the neuron's output
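To make this concrete, below is a minimal sketch of a single neuron's forward pass in Python with NumPy. The feature values, weights, and bias are made-up illustrative numbers, and sigmoid is used as the activation function:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Made-up illustrative values for a neuron with three inputs.
x = np.array([0.5, -1.2, 3.0])   # feature vector (x1, x2, x3)
w = np.array([0.8, 0.1, -0.4])   # weights (w1, w2, w3)
b = 0.25                         # bias

z = np.dot(w, x) + b             # weighted sum plus bias
y = sigmoid(z)                   # apply the activation function
print(y)                         # the neuron's output, between 0 and 1
```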

  • Activation functions ($\sigma$): An activation function is a mathematical function applied to the output of a neural network node to introduce nonlinearity into the network, enabling it to learn complex patterns. Common activation functions include sigmoid, ReLU (rectified linear unit), and softmax; a short sketch of each appears below.
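As a quick illustration of these three functions, here is a small NumPy sketch (the input values are arbitrary examples):

```python
import numpy as np

def sigmoid(z):
    # Maps each value into (0, 1); common for binary classification outputs.
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Rectified linear unit: keeps positive values, zeroes out negatives.
    return np.maximum(0.0, z)

def softmax(z):
    # Converts a vector of scores into a probability distribution.
    exp_z = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exp_z / exp_z.sum()

z = np.array([-1.0, 0.0, 2.0])   # arbitrary example inputs
print(sigmoid(z))                # approx. [0.269, 0.5, 0.881]
print(relu(z))                   # [0.0, 0.0, 2.0]
print(softmax(z))                # approx. [0.042, 0.114, 0.844]; sums to 1
```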

  • Weights and bias: A weight represents the strength of the connection (the link along which information flows) between two neurons. A bias, on the other hand, allows the model to shift the activation function, that is, to add a constant value to the neuron's weighted sum before the activation is applied, improving the model's ability to fit the data. The sketch below illustrates this shifting effect.
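The shifting effect of the bias can be seen in a brief sketch (the bias values here are made up): with the same weighted input, changing the bias moves the sigmoid output up or down the curve.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weighted_sum = 0.0                 # keep the weighted input fixed
for b in (-2.0, 0.0, 2.0):         # made-up bias values
    print(b, sigmoid(weighted_sum + b))
# Output rises from ~0.12 to 0.5 to ~0.88 as the bias shifts the activation.
```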

The architecture of a simple neural network is provided below:

The architecture of a simple neural network
  • Layers: A layer is a collection of interconnected neurons that process information together at the same computation stage. Neurons in a neural network are organized into layers (input, hidden, output) to process and transform data. The input layer receives the raw data or features $(x_1, x_2, \ldots, x_m)$ ...