Enhancing AI with Audio/Video-to-Text Generation

Learn about the Files API and how to use it to send audio and video in prompts.

We'll cover the following...

The Files API
Audio-to-text
Video-to-text

The Files API

The Gemini Files API allows us to store and access media files (text, images, audio, and video) for use with the model’s generation capabilities. It is particularly useful when the prompt data exceeds the 20 MB size limit of a standard prompt input, or when we want to provide multimedia content for multimodal prompting. The Files API lets us store up to 20 GB of files per project, with each file capped at 2 GB. Files are kept for 48 hours and can be accessed only with the API key that was used to upload them. The service is free in all regions where the Gemini API is available.
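The limits above can be turned into a small pre-flight check before any upload is attempted. The following is a minimal sketch using only the Python standard library; it does not call the Gemini SDK. The helper names `choose_route` and `expires_at` are hypothetical, the use of binary units (1 MB = 1024² bytes) is an assumption, and sending small payloads inline is a simplification, since the Files API can also be used for smaller multimedia files.

```python
from datetime import datetime, timedelta, timezone

# Limits as quoted in the lesson text. Binary units are an assumption;
# the service may count bytes differently.
MB = 1024 ** 2
GB = 1024 ** 3
INLINE_PROMPT_LIMIT = 20 * MB   # standard prompt input limit
PER_FILE_LIMIT = 2 * GB         # maximum size of one stored file
PER_PROJECT_LIMIT = 20 * GB     # total storage per project
RETENTION = timedelta(hours=48) # how long an uploaded file is kept


def choose_route(size_bytes: int, project_used_bytes: int = 0) -> str:
    """Hypothetical helper: decide how a payload could be sent.

    Returns 'inline' if it fits in a standard prompt, 'files_api' if it
    needs to be uploaded first, or 'rejected' if it breaks a storage limit.
    """
    if size_bytes < 0 or project_used_bytes < 0:
        raise ValueError("sizes must be non-negative")
    if size_bytes <= INLINE_PROMPT_LIMIT:
        return "inline"
    if size_bytes > PER_FILE_LIMIT:
        return "rejected"
    if project_used_bytes + size_bytes > PER_PROJECT_LIMIT:
        return "rejected"
    return "files_api"


def expires_at(uploaded_at: datetime) -> datetime:
    """Hypothetical helper: when a file uploaded at `uploaded_at` is removed."""
    return uploaded_at + RETENTION


if __name__ == "__main__":
    print(choose_route(5 * MB))     # small clip fits in the prompt
    print(choose_route(500 * MB))   # larger video goes through the Files API
    print(expires_at(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

A check like this is useful mainly for failing early: a 3 GB recording, for example, would be rejected locally rather than after a long upload attempt.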
