What Is Google Gemini?
Learn about the key features and functionalities of Google Gemini.
We'll cover the following
Google Gemini refers to two things: a family of large language models and a chatbot powered by a Gemini LLM. There are two major versions, 1.0 and its successor, 1.5. Let’s break it down.
Gemini AI model
Launched in December 2023, the Gemini AI models are Google’s next step forward after
Gemini models are designed to be multimodal, meaning they can handle a variety of input formats beyond just text. Text is a natural starting point, but Gemini can also work with images, charts, PDFs, videos, and audio. For videos, it breaks them down into a sequence of frames. This allows these frames, or any images, to seamlessly interweave with text or audio inputs. Gemini can also directly process audio without requiring any text conversion beforehand. For outputs, Gemini models can generate text and images natively. Native image generation forgoes the need to have an intermediate natural language description before an image can be generated.
Gemini is offered in four different versions:
Gemini Ultra is the most powerful and feature-rich version of Gemini. It excels at complex tasks and can handle massive amounts of data and context. It is ideal for research labs and large companies working on cutting-edge AI applications. Access to Gemini Ultra is limited at the time of writing, and approvals are at Google’s discretion.
Gemini Pro is the best all-rounder version of Gemini. It offers a context window of up to 2 million tokens and has been crushing benchmarks. It can handle tasks such as generating different text, code, and images. The latest version of Gemini 1.5 Pro has scored better in most benchmarks than the Gemini 1.0 Ultra.
Gemini Flash is a lightweight and cost-efficient model that offers a good mix of speed and efficiency. It offers performance similar to Gemini Pro but at a much lower cost. It offers a context window of up to 1 million tokens.
Gemini Nano is the most lightweight and efficient version of Gemini. While it is not as powerful as the other versions, it can still perform many tasks well. This makes it a good choice for enabling AI applications on mobile devices or devices with lower processing power.
The Google Pixel 8 Pro and Samsung S24 Series use Google’s Gemini Nano model on-device for quick and efficient processing.
Gemini chatbot
The Gemini chabot was formally known as Bard. Google announced Bard in February 2023 as a GenAI chatbot powered by LaMDA. Since then, the chatbot has received multiple updates and features. The chatbot was switched to use the newer PaLM model before finally switching to the Gemini model and merging under the same Gemini brand. The chatbot uses a tuned version of a Gemini’s 1.0 Pro model by default. The chatbot allows for image upload and voice input as well. A paid subscription, named Gemini Advanced, provides access to the Gemini 1.5 Pro with a 1 million token context window and the ability to upload files.
Gemini 1.5
Gemini is still a work in progress. New features and updates drop regularly. In February 2024, Gemini 1.5 was revealed. This was a generational leap forward and allowed Gemini to be more capable than ever. Compared to earlier versions, Gemini 1.5 achieves similar performance while requiring less training data and being more efficient to run. The context window of Gemini was increased from 32K tokens to 1 million tokens. In May 2024, Gemini 1.5 was updated again, with Gemini 1.5 Pro now offering a context window of 2 million tokens. To put that into perspective, Gemini 1.5 Pro can process up to 2 hours of video, 22 hours of audio, 60,000 lines of code, or 1.4 million words of text.
Even more impressive is that Gemini 1.5 was tested during research with a context window of up to 10 million tokens and showed promising results. The future truly is brimming with possibilities!
Gemini and the Google ecosystem
While OpenAI’s ChatGPT broke records for user adoption, Gemini has also been growing rapidly. Google’s ecosystem of applications and services stands to gain the most from AI integrations. Google has been releasing smart features in its workspace suite of applications, and Android has also been getting smarter thanks to Gemini.
Let’s look at a few key experiences powered by Gemini:
that is The Gemini app has been released for both iOS and Android. The Gemini mobile app is well-integrated with Google apps such as YouTube and Maps. For Android devices with the application installed, Gemini is now the app opened by default using the “Hey Google” key phrase.
Gemini Nano powers the summary transcription features in the Google Recorder app and enables more contextual auto-replies called “Smart Replies” in a few applications. Google’s screen reading accessibility feature now uses Gemini to visualize and understand the images on the screen for users with low vision.
Google announced Gemini Nano with multimodality during Google’s I/O event in 2024. The highlight feature was real-time “Scam Detection.” Gemini would be able to listen to phone calls and alert you if it detects that the caller might be trying to scam you.