Hardware Constraints for Transformer Models
Learn about the different hardware constraints in transformer models.
Transformer models could not exist without optimized hardware. Memory and disk management remain critical design components, but computing power is also a prerequisite: it would be nearly impossible to train the original Transformer described earlier in the course without GPUs. GPUs are at the center of the battle for efficient transformer models.
This appendix lesson goes over the importance of GPUs in three steps:
The architecture and scale of transformers.
CPUs vs. GPUs.
Using GPUs in PyTorch, as an example of how optimized frameworks take advantage of them (a minimal sketch follows this list).
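As a quick preview of the third step, the sketch below shows the basic PyTorch pattern for running a computation on a GPU when one is available. It is a minimal, hypothetical example, not code from this course; the tensor sizes are arbitrary.

```python
import torch

# Use a GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on that device; models can be moved with .to(device).
x = torch.randn(512, 512, device=device)

# The matrix multiplication runs on the GPU if 'device' is "cuda".
y = x @ x
print(device, y.shape)
```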
The architecture and scale of transformers
A hint about hardware-driven design appeared in Chapter 3 of this course:
"However, we would only get one point of view at a time by analyzing the sequence with one
block. Furthermore, it would take quite some calculation time to find other perspectives."
A better way is to divide the d_model = 512 dimensions of each word into eight heads of d_k = 64 dimensions each. We can then run the eight "heads" in parallel to speed up training and obtain eight different representation subspaces of how each word relates to another.
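The following sketch illustrates this split, assuming the dimensions of the original Transformer (d_model = 512, 8 heads of 64 dimensions); the batch size and sequence length are arbitrary placeholders. It only shows the reshaping into parallel heads, not a full attention implementation.

```python
import torch

d_model = 512                # model dimension of each token
num_heads = 8                # number of attention heads
d_k = d_model // num_heads   # 64 dimensions per head

batch_size, seq_len = 1, 10
x = torch.randn(batch_size, seq_len, d_model)  # a batch of token representations

# Split the 512 dimensions into 8 heads of 64 dimensions each, then move the
# head axis forward so all heads can be processed in parallel:
# (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_k)
heads = x.view(batch_size, seq_len, num_heads, d_k).transpose(1, 2)

print(heads.shape)  # torch.Size([1, 8, 10, 64])
```

Because the eight heads become a separate tensor axis, a GPU can compute attention for all of them at once instead of one subspace at a time.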