Enhancing AI with Audio/Video-to-Text Generation
Learn about the Files API and how it can be used to send audio and video in prompts.
The Files API
The Gemini Files API lets us store and access media files (text, images, audio, and video) for use with the model's generation capabilities. It is particularly useful when the prompt data exceeds the 20 MB size limit of a standard prompt, or when we want to provide multimedia content for multimodal prompting. The Files API lets us store up to 20 GB of files per project, with each file capped at 2 GB. Files are kept for 48 hours and, during that time, can be accessed with the API key that was used to upload them. The service is free in all regions where the Gemini API is available.
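The size limits above can be turned into a small pre-upload check. The following is a minimal sketch, not part of any SDK: the function name `choose_upload_route` and the constants are hypothetical, and it assumes "MB" and "GB" mean binary units (1 MB = 1024 × 1024 bytes). It covers only the size criterion; we may still prefer the Files API for small multimedia files.

```python
MB = 1024 * 1024          # assumption: binary megabytes
GB = 1024 * MB            # assumption: binary gigabytes

INLINE_PROMPT_LIMIT = 20 * MB   # standard prompt size limit stated in the text
MAX_FILE_SIZE = 2 * GB          # per-file cap stated in the text


def choose_upload_route(size_bytes: int) -> str:
    """Return "inline" or "files_api" for a payload of the given size.

    Hypothetical helper: raises ValueError if the payload is larger than
    the per-file cap, since the Files API would not accept it either.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > MAX_FILE_SIZE:
        raise ValueError("file exceeds the 2 GB per-file cap")
    if size_bytes > INLINE_PROMPT_LIMIT:
        return "files_api"
    return "inline"
```

In practice, we could pass `os.path.getsize(path)` to this helper before deciding how to send a file.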
Supported audio formats
Gemini supports the following MIME types for audio files:
- WAV: audio/wav
- MP3: audio/mp3
- AIFF: audio/aiff
- AAC: audio/aac
- OGG Vorbis: audio/ogg
- FLAC: audio/flac
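When uploading audio, it can help to resolve the MIME type from the file name up front. The sketch below uses only the MIME types from the list above; the extension-to-type mapping and the helper name `audio_mime_type` are assumptions for illustration, since a file extension does not guarantee the actual encoding.

```python
from pathlib import Path
from typing import Optional

# MIME types from the list above, keyed by an assumed file extension.
AUDIO_MIME_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mp3",
    ".aiff": "audio/aiff",
    ".aac": "audio/aac",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
}


def audio_mime_type(path: str) -> Optional[str]:
    """Return the supported MIME type for an audio file, or None.

    The lookup is case-insensitive and based on the extension only.
    """
    return AUDIO_MIME_TYPES.get(Path(path).suffix.lower())
```

A result of `None` signals that the file should be converted to a supported format before it is sent to the model.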
Supported video formats
Gemini supports the following MIME types for video files:
- video/mp4
- video/mpeg
- video/mov
- video/avi
- video/x-flv
- video/mpg
- video/webm
- video/wmv
- video/3gpp
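To tie the formats to the Files API, here is a rough sketch of validating a video's MIME type and then uploading it for use in a prompt. It assumes the `google-generativeai` Python package is installed and that its `upload_file`, `get_file`, and `GenerativeModel` calls behave as in that SDK's published examples; the model name `gemini-1.5-flash`, the prompt text, and the helper names are placeholders chosen for illustration, not values from this lesson.

```python
import time

# MIME types from the list above.
SUPPORTED_VIDEO_MIME_TYPES = frozenset({
    "video/mp4", "video/mpeg", "video/mov", "video/avi", "video/x-flv",
    "video/mpg", "video/webm", "video/wmv", "video/3gpp",
})


def require_supported_video(mime_type: str) -> str:
    """Return the normalized MIME type, or raise ValueError if unsupported."""
    normalized = mime_type.strip().lower()
    if normalized not in SUPPORTED_VIDEO_MIME_TYPES:
        raise ValueError(f"unsupported video MIME type: {mime_type!r}")
    return normalized


def summarize_video(path: str, mime_type: str, api_key: str,
                    model_name: str = "gemini-1.5-flash") -> str:
    """Upload a video through the Files API and ask the model about it.

    Assumption: the google-generativeai SDK is available. The import is
    local so the validation helper above works without the package.
    """
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    video = genai.upload_file(path=path,
                              mime_type=require_supported_video(mime_type))

    # Uploaded videos may need processing before they can be used.
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)
    if video.state.name != "ACTIVE":
        raise RuntimeError(f"upload did not become usable: {video.state.name}")

    model = genai.GenerativeModel(model_name)
    response = model.generate_content([video, "Summarize this video."])
    return response.text
```

Validating before uploading avoids spending time and quota on a file the model cannot read. Because uploaded files are kept for only 48 hours, a long-running application would need to upload the file again after that window.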