Google Gemini: Everything You Need to Know About the New Generative AI Platform

Google is making headlines with Gemini, its latest suite of generative AI models, applications, and services.

So, what exactly is Google Gemini? How can you use it? And how does it compare to other AI platforms?

To help you stay updated with the latest on Gemini, we’ve created this comprehensive guide. We’ll keep it current as new models, features, and news about Google’s plans for Gemini emerge.

What is Gemini?

Gemini is Google’s much-anticipated next-generation family of generative AI models, developed by Google’s AI research teams at DeepMind and Google Research. It comes in four versions:

  1. Gemini Ultra: The most powerful model in the Gemini lineup.
  2. Gemini Pro: A lighter version of Ultra.
  3. Gemini Flash: A faster, streamlined version of Pro.
  4. Gemini Nano: Two smaller models, Nano-1 and the more advanced Nano-2, designed to run offline on mobile devices.

All Gemini models are designed to be multimodal, meaning they can work with and analyze more than just text. Google says these models were trained on a mix of public, proprietary, and licensed data, including audio, images, videos, codebases, and text in various languages.

This sets Gemini apart from models like Google’s LaMDA, which was trained only on text data and can only understand and generate text, whereas Gemini models can handle much more.

It’s important to note that using public data to train AI models can be ethically and legally complex. Google has an AI indemnification policy to protect certain Google Cloud customers from lawsuits, but there are exceptions. Be cautious, especially if you plan to use Gemini for commercial purposes.

What’s the Difference Between Gemini Apps and Gemini Models?

Google has made it a bit confusing by not clearly distinguishing between Gemini models and Gemini apps (formerly known as Bard).

The Gemini apps are interfaces that connect to various Gemini models, like Gemini Ultra and Gemini Pro, and provide chatbot-like experiences. Think of them as the front end for Google’s generative AI, similar to OpenAI’s ChatGPT and Anthropic’s Claude apps.

On the web, you can access Gemini at gemini.google.com. On Android, the Gemini app replaces the Google Assistant app. On iOS, the Google and Google Search apps serve as Gemini clients.

These apps can accept text, images, and voice commands, as well as files like PDFs (and soon videos), either uploaded or imported from Google Drive. They can also generate images. Conversations with Gemini apps on mobile carry over to the web if you’re signed in to the same Google Account.

Gemini features are also being integrated into other Google apps and services like Gmail and Google Docs. To use most of these features, you’ll need the Google One AI Premium Plan, which costs $20 a month. This plan provides access to Gemini in Google Workspace apps like Docs, Slides, Sheets, and Meet. It also includes Gemini Advanced, which brings Gemini Ultra to the apps and supports analyzing and answering questions about uploaded files.

Gemini Advanced Features

Gemini Advanced users get additional features, like trip planning in Google Search, which creates custom travel itineraries based on various factors like flight times, meal preferences, and local attractions. In Gmail, Gemini can write emails and summarize message threads. In Docs, it helps with writing and brainstorming. In Slides, it generates slides and custom images. In Sheets, it organizes data and creates tables and formulas.

Gemini also extends to Google Drive, where it can summarize files and provide quick facts about a project. In Meet, it translates captions into different languages.

Gemini in Chrome and Other Google Products

Gemini has recently been added to Google’s Chrome browser as an AI writing tool. You can use it to write new content or rewrite existing text, and it will consider the webpage you’re on to make recommendations.

You’ll also find Gemini in Google’s database products, cloud security tools, app development platforms like Firebase and Project IDX, and apps like Google TV, Google Photos, and the NotebookLM note-taking assistant.

Code Assist and Security

Google’s suite of AI-powered tools for code completion and generation, known as Code Assist (formerly Duet AI for Developers), uses Gemini for heavy computational tasks. Google’s security products, like Gemini in Threat Intelligence, also rely on Gemini to analyze large portions of potentially malicious code and perform natural language searches for ongoing threats.

Custom Chatbots and Voice Chats

Announced at Google I/O 2024, Gemini Advanced users will soon be able to create custom chatbots called Gems. These can be generated from natural language descriptions and shared with others or kept private. Gems will eventually integrate with Google services like Calendar, Tasks, Keep, and YouTube Music.

A new feature called Gemini Live will allow users to have in-depth voice chats with Gemini. Users can interrupt Gemini to ask clarifying questions, and it will adapt to their speech patterns in real time. Gemini will also be able to see and respond to users’ surroundings via photos or videos captured by their smartphones.

What Can Gemini Models Do?

Because Gemini models are multimodal, they can perform a variety of tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities are already available, and Google promises more in the future.
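For developers, here’s a rough idea of what a multimodal request looks like through the Gemini API’s Python SDK (google-generativeai). Treat it as a sketch: the model identifier, file name, and API key below are placeholders rather than anything Google prescribes for this task.

    # Sketch: caption a local image with a Gemini model via the
    # google-generativeai SDK. Model name and file are placeholders.
    import google.generativeai as genai
    import PIL.Image

    genai.configure(api_key="YOUR_API_KEY")            # placeholder key
    model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model ID

    image = PIL.Image.open("photo.jpg")                # hypothetical local file
    response = model.generate_content(
        ["Write a one-sentence caption for this image.", image]
    )
    print(response.text)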

However, it’s worth noting that Google has had some missteps, like the underwhelming launch of Bard and a misleading video about Gemini’s capabilities. Also, like other generative AI technologies, Gemini has issues with biases and inaccuracies.

Assuming Google’s recent claims are accurate, here’s what the different tiers of Gemini can do:

Gemini Ultra

Gemini Ultra can help with tasks like solving physics homework, identifying relevant scientific papers, and generating updated charts from multiple sources. It supports image generation, but this feature isn’t fully available yet. Ultra is accessible through Vertex AI and AI Studio, but you’ll need the AI Premium Plan to use it in Gemini apps.

Gemini Pro

Gemini Pro is an improvement over LaMDA in reasoning, planning, and understanding. The latest version, Gemini 1.5 Pro, can process large amounts of data, including up to 1.4 million words, two hours of video, or 22 hours of audio. It’s available on Vertex AI and AI Studio and supports code execution to reduce bugs in generated code.
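To get a feel for that long context window, here’s a minimal sketch of asking Gemini 1.5 Pro to summarize a lengthy recording, assuming the Gemini API’s File API in the google-generativeai SDK; the file name and model identifier are placeholders.

    # Sketch: upload a long recording and ask Gemini 1.5 Pro to summarize it.
    # Assumes the Gemini API's File API; names below are placeholders.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")            # placeholder key
    recording = genai.upload_file(path="meeting.mp3")  # hypothetical file

    model = genai.GenerativeModel("gemini-1.5-pro")    # assumed model ID
    response = model.generate_content(
        [recording, "Summarize the key decisions made in this meeting."]
    )
    print(response.text)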

Gemini Flash

Gemini Flash is a smaller, more efficient version of Pro, designed for less demanding tasks like summarization, chat apps, and data extraction. It will be available on Vertex AI and AI Studio by mid-July 2024.

Gemini Nano

Gemini Nano is a compact version of the Gemini models, designed to run on mobile devices. It powers features like Summarize in Recorder and Smart Reply in Gboard on the Pixel 8 Pro, Pixel 8, and Samsung Galaxy S24. Nano also drives Magic Compose in Google Messages and will soon alert users to potential scams during calls and create aural descriptions for low-vision and blind users.

Is Gemini Better Than OpenAI’s GPT-4?

Google claims that Gemini outperforms current state-of-the-art models on many benchmarks. However, OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet also perform exceptionally well, and the AI industry is rapidly evolving.

How Much Do Gemini Models Cost?

Gemini models are available through Google’s Gemini API with free options that have usage limits. Otherwise, they are pay-as-you-go. Here’s the pricing as of June 2024:

  • Gemini 1.0 Pro: 50 cents per 1 million input tokens, $1.50 per 1 million output tokens.
  • Gemini 1.5 Pro: $3.05 per 1 million input tokens (for prompts up to 128,000 tokens) or $7 per 1 million input tokens (for longer prompts); $10.50 per 1 million output tokens (for prompts up to 128,000 tokens) or $21 per 1 million output tokens (for longer prompts).
  • Gemini 1.5 Flash: 35 cents per 1 million input tokens (for prompts up to 128,000 tokens) or 70 cents per 1 million input tokens (for longer prompts); $1.05 per 1 million output tokens (for prompts up to 128,000 tokens) or $2.10 per 1 million output tokens (for longer prompts).

Tokens are small chunks of raw data, like the syllables in a word; 1 million tokens equals roughly 700,000 words. “Input” refers to tokens fed into the model, while “output” refers to tokens the model generates.
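As a back-of-the-envelope example using the June 2024 rates above, here’s a hypothetical cost estimate for a single Gemini 1.5 Pro request that stays under the 128,000-token threshold (the token counts are made up for illustration):

    # Rough cost estimate at the listed Gemini 1.5 Pro rates
    # (prompts under 128,000 tokens). Token counts are hypothetical.
    INPUT_RATE = 3.05 / 1_000_000    # dollars per input token
    OUTPUT_RATE = 10.50 / 1_000_000  # dollars per output token

    input_tokens = 10_000   # e.g., a long document plus instructions
    output_tokens = 1_000   # e.g., a one-page summary

    cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
    print(f"Estimated cost: ${cost:.4f}")  # about $0.04 for this request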

Pricing for Ultra hasn’t been announced yet, and Nano is still in early access.

Is Gemini Coming to the iPhone?

It might be! Apple and Google are reportedly discussing using Gemini for features in an upcoming iOS update. Apple is also in talks with OpenAI and is developing its own generative AI capabilities. Apple SVP Craig Federighi confirmed plans to work with third-party models like Gemini but didn’t provide additional details.

This post was originally published on Feb. 16, 2024, and has been updated to include new information about Gemini and Google’s plans for it.