Google DeepMind has unveiled its latest artificial intelligence (AI) model called Gemini.
The model was built to be multimodal, meaning it can generalize across and seamlessly understand, operate on and combine different kinds of information, including text, code, audio, images and video.
The goal was to build a new generation of AI that helps people understand and interact with the world like a capable helper or assistant, Google said. The company added that Gemini can run on everything from data centers to mobile devices, and that developers and enterprises will be able to build and scale their own applications with it.
There will be three versions of Gemini 1.0:
- Gemini Ultra — The largest and most capable AI model, built for highly complex tasks.
- Gemini Pro — An AI model that scales across a wide range of tasks (see the developer sketch after this list).
- Gemini Nano — An AI model for on-device tasks.
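To make the developer angle concrete, here is a minimal sketch of what calling the Pro tier might look like through Google's google-generativeai Python SDK. The API key placeholder and the prompt are illustrative assumptions, not details from the announcement.

```python
# Minimal sketch: calling the Gemini Pro model through the
# google-generativeai SDK (pip install google-generativeai).
# The API key placeholder and prompt are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key obtained from Google AI Studio

# "gemini-pro" is the text-oriented Pro model exposed to developers.
model = genai.GenerativeModel("gemini-pro")

response = model.generate_content(
    "Summarize the key ideas behind multimodal AI models in three sentences."
)
print(response.text)
```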
How is it different?
According to Google, Gemini is natively multimodal: it was pre-trained from the start on different modalities and then fine-tuned with additional multimodal data to sharpen how it understands and reasons about that input. Previously, creating multimodal models typically meant training separate components for each modality and then stitching them together.
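To give a feel for that architectural difference, the toy sketch below contrasts the older approach of stitching separately trained encoders together with a single backbone that consumes every modality as one token sequence. It is a conceptual illustration under assumed layer choices and sizes, not a reflection of Gemini's actual design.

```python
# Toy contrast between a "stitched together" multimodal model and a natively
# multimodal one. Conceptual sketch only; nothing here mirrors Gemini itself.
import torch
import torch.nn as nn

D = 64  # shared hidden size, chosen arbitrarily for the sketch

# Approach 1: separate unimodal encoders joined by a late-fusion layer.
text_encoder = nn.Embedding(1000, D)      # stand-in text encoder
image_encoder = nn.Linear(16 * 16, D)     # stand-in image patch encoder
fusion_head = nn.Linear(2 * D, D)         # stitches the two representations together

text_tokens = torch.randint(0, 1000, (1, 8))   # fake token ids
image_patch = torch.randn(1, 16 * 16)          # fake flattened image patch
stitched = fusion_head(torch.cat(
    [text_encoder(text_tokens).mean(dim=1),    # pooled text features
     image_encoder(image_patch)], dim=-1))

# Approach 2: natively multimodal - project each modality into one shared token
# sequence and let a single transformer attend across all of it from the start.
to_text_tokens = nn.Embedding(1000, D)
to_image_tokens = nn.Linear(16 * 16, D)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True),
    num_layers=2,
)

mixed_sequence = torch.cat(
    [to_text_tokens(text_tokens),                  # 8 text tokens
     to_image_tokens(image_patch).unsqueeze(1)],   # 1 image token
    dim=1,
)
joint = backbone(mixed_sequence)  # one model attends over both modalities jointly

print(stitched.shape, joint.shape)
```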
The model's reasoning capabilities help it make sense of complex written and visual information, allowing it to uncover knowledge that is hard to discern amid vast amounts of data, such as by extracting insights from hundreds of thousands of documents and digesting that information at digital speeds.
Additionally, Gemini 1.0 was trained to recognize and understand text, images, audio and more at the same time. Google said this gives the model an edge in grasping nuanced information and answering questions about complicated topics, and it also helps the model explain its reasoning in subjects like math and physics.
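As an illustration of what that kind of multimodal prompt can look like in practice, the sketch below sends an image together with a text question to the gemini-pro-vision model through the same SDK; the file name and the question are assumptions made for the example.

```python
# Sketch: mixing an image and a text instruction in a single prompt to the
# multimodal "gemini-pro-vision" model. The file name and question are
# illustrative assumptions; Pillow is required for image loading.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-pro-vision")
image = PIL.Image.open("physics_worksheet.jpg")  # e.g. a photo of handwritten work

# A single prompt can combine modalities: the image plus a text instruction.
response = model.generate_content(
    [image, "Check the reasoning in this worked physics problem and explain any mistakes."]
)
print(response.text)
```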