In the previous lesson, we chose a model similar to Mochi 1 (https://huggingface.co/genmo/mochi-1-preview) for the text-to-video generation system and presented the training process and the required resources. In this lesson, we focus on the deployment infrastructure for such a model. We estimate the required resources, then cover design considerations and a detailed System Design.

Let’s start with the model size estimation:

Text-to-video model size estimation

We are considering a model similar to Mochi 1, which has approximately 10 billion parameters. At FP32 floating-point precision, each parameter occupies 4 bytes, so the model size becomes:
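The arithmetic behind this estimate can be sketched in a few lines of Python. This is a minimal illustration, not part of the lesson's own material: the helper name `model_size_gb` is hypothetical, and it assumes decimal gigabytes (1 GB = 10^9 bytes) and counts only the raw weights, ignoring activations and any runtime overhead.

```python
def model_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the raw weight size in gigabytes (assuming 1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 10e9  # approximately 10 billion parameters, as stated in the text
BYTES_FP32 = 4     # FP32 stores each parameter in 32 bits, i.e., 4 bytes

fp32_size_gb = model_size_gb(NUM_PARAMS, BYTES_FP32)
print(f"FP32 model size: {fp32_size_gb:.0f} GB")  # prints "FP32 model size: 40 GB"
```

The same helper can be reused for other precisions by changing `bytes_per_param`; for instance, a 2-byte format such as FP16 would halve the figure.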
