What are the parameters in ChatGPT-3?

Just like in any version of the Generative Pre-trained Transformer (GPT), the parameters in ChatGPT-3 are the learned variables that encode the knowledge acquired during training and enable the model to make relevant predictions.

Neural networks

A neural network consists of several interconnected neurons (just like a human brain) that make predictions by applying weights and biases to the input data.

The weights and biases are denoted by W and B in the following figure, respectively. These are called parameters, and they are optimized during backpropagation to improve the accuracy of the output.

This neural network is not fully connected

In neural networks, the value of every neuron in the hidden and output layers is calculated from the weights and biases of the preceding layer.

For example, the value of neuron E is calculated as a weighted sum of the outputs of the neurons feeding into it, plus a bias term.

After this calculation, an activation function is applied to the neuron's value to decide whether the neuron should be activated for further calculations.
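The weighted-sum-plus-activation step described above can be sketched in Python. The neuron names, input values, weights, and bias below are made up for illustration; in a real network these values are learned during training:

```python
import math

def neuron_value(inputs, weights, bias):
    """Weighted sum of the inputs, plus the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def sigmoid(z):
    """Activation function: squashes any value into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical outputs of three neurons (call them A, B, C) feeding into E
inputs = [0.5, -1.0, 0.25]
weights = [0.8, 0.2, -0.5]   # W values (learned during training)
bias = 0.1                   # B value (learned during training)

e = neuron_value(inputs, weights, bias)  # 0.4 - 0.2 - 0.125 + 0.1 = 0.175
activated = sigmoid(e)                   # activation applied to E's value
```

The activation function here is a sigmoid for simplicity; modern networks often use other activations such as ReLU or GELU.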

ChatGPT-3 is also based on a similar framework but uses a deep neural network (a network with multiple hidden layers).

Types of parameters

Here are a few parameters in ChatGPT-3:

  1. Weights: They are the most fundamental parameters in ChatGPT-3, which the model learns from the training data. For instance, if ChatGPT-3 frequently sees the word "cat" followed by the word "meowed," it will assign a higher weight to this word pair. The next time, the model is more likely to predict the word "meowed" after "cat."

  2. Bias: This parameter acts as an adjusting factor that shifts a neuron's output toward more accurate predictions. In the figure above, a single bias is applied layerwise, although in general each neuron can have its own bias.

  3. Learning rate: Strictly speaking, this is a hyperparameter rather than a learned parameter. It controls how much the weights and biases are adjusted in order to reduce the error and make the results more accurate. This adjustment happens during backpropagation.
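The roles of weights, bias, and learning rate come together in the gradient-descent update performed during backpropagation. The following sketch fits a single weight and bias to toy data; all values are illustrative and have nothing to do with ChatGPT-3's actual training:

```python
# Gradient descent for a single linear neuron y = w*x + b,
# showing how the learning rate scales each parameter update.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy (x, y) pairs following y = 2x
w, b = 0.0, 0.0          # parameters, learned from the data
learning_rate = 0.05     # hyperparameter, chosen by hand

for _ in range(2000):
    # Gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    # The learning rate controls how far each update moves the parameters
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# After training, w is approximately 2.0 and b is approximately 0.0
```

A learning rate that is too large makes the updates overshoot and diverge; one that is too small makes training needlessly slow.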

Conclusion

ChatGPT-3 uses the transformer architecture, which relies on an attention mechanism (a mechanism that enables the model to weigh the relevant parts of the input according to context) to understand the context of every word, enabling accurate predictions.

For example, "Someone went to buy a goat" is a different statement from "A goat went to buy someone." The transformer architecture understands the context of every word and learns its parameters to differentiate between these two sentences. ChatGPT-3 has around 175 billion such parameters.
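The attention idea can be sketched in a few lines of Python. The word vectors below are hypothetical 2-dimensional stand-ins; real transformer attention uses learned query, key, and value projections over vectors with thousands of dimensions:

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores of one query against all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Hypothetical vector for the word currently being processed
query = [1.0, 0.0]
# Hypothetical vectors for three context words
keys = [[1.0, 0.2], [0.1, 1.0], [0.9, -0.1]]

weights = attention_weights(query, keys)
# Each weight says how strongly the model attends to that context word;
# the first context word matches the query best, so it gets the largest weight.
```

Because the weights depend on how well each context word matches the current word, the model can treat "goat" differently depending on whether it is the buyer or the thing being bought.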

Copyright ©2024 Educative, Inc. All rights reserved