Text and Image-to-Text Generation

Learn how to generate textual content with multimodal prompts using real-world examples with Gemini.

We have covered text generation from text prompts and from image prompts separately and seen how Gemini can be used creatively in various applications. Now it’s time to combine them into a single multimodal prompt. We’ll generate text from multiple input formats:

  • Image file: Visual data representing an image.

  • Text file: Structured text-based information.

  • Simple text: Unstructured text-based prompt.
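As a rough sketch of how these three inputs might be combined into one request, the snippet below builds a list of prompt parts and passes it to the model. It assumes the `google-generativeai` Python package and an API key stored in a `GOOGLE_API_KEY` environment variable. The helper names `build_prompt_parts` and `generate_text`, and the file names in the usage comment, are hypothetical and chosen only for illustration.

```python
import mimetypes
import os
from pathlib import Path


def build_prompt_parts(instruction, text_file, image_file):
    """Combine simple text, a text file, and an image file into one list of parts."""
    image_path = Path(image_file)
    # Guess the image MIME type from the file extension; fall back to PNG.
    mime_type = mimetypes.guess_type(image_path.name)[0] or "image/png"
    return [
        instruction,                                   # simple text prompt
        Path(text_file).read_text(encoding="utf-8"),   # contents of the text file
        {"mime_type": mime_type, "data": image_path.read_bytes()},  # image bytes
    ]


def generate_text(parts, model_name="gemini-1.5-flash"):
    """Send the combined parts to Gemini and return the generated text."""
    # Imported here so build_prompt_parts works even without the SDK installed.
    import google.generativeai as genai

    # Assumption: the API key is available as an environment variable.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    return model.generate_content(parts).text


# Example usage (hypothetical file names):
# parts = build_prompt_parts("Summarize these inputs.", "notes.txt", "photo.png")
# print(generate_text(parts))
```

Keeping the part-building step separate from the API call makes it possible to inspect exactly what will be sent before any request is made.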

Text and image-to-text generation using Gemini

Let’s understand this through a use case:

Itinerary generation: Gemini plans your day

A well-known tour company wants to plan tours for different age groups and tour types. Instead of manually scanning the map and picking suitable places for each age group, the company wants to use GenAI to plan the tours.

We’ll utilize the gemini-1.5-flash model for planning the tour because it is best ...