Information Technology

Hallucination rates for AI models

15 February 2025
Source: Vectara

Sometimes artificial intelligence (AI) models produce outputs that are not grounded in their training data, are decoded incorrectly by the transformer or do not follow any identifiable pattern. When a large language model (LLM), such as one underpinning a generative AI platform, delivers outputs that are nonsensical or inaccurate, the result is considered an AI hallucination.

These unrealistic outputs can be attributed to errors in encoding and decoding, high model complexity, and other factors. To help users guard against erroneous model output, generative AI technology developer Vectara identified the top 15 AI LLMs with the lowest hallucination rates. In the evaluation, each LLM summarized 1,000 short documents, and a hallucination-detection model then reported the percentage of summaries that were factually inconsistent with their source documents.
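To illustrate how such an evaluation can be scored, here is a minimal sketch, not Vectara's actual pipeline: `summarize` stands in for the LLM under test and `is_factually_consistent` for a hypothetical hallucination-detection model; both are assumptions for demonstration only.

```python
# Minimal sketch of a hallucination-rate evaluation (hypothetical, not Vectara's method).
# `summarize` stands in for the LLM under test; `is_factually_consistent`
# stands in for a hallucination-detection model. Both are placeholders.

from typing import Callable, Sequence


def hallucination_rate(
    documents: Sequence[str],
    summarize: Callable[[str], str],
    is_factually_consistent: Callable[[str, str], bool],
) -> float:
    """Return the percentage of summaries judged inconsistent with their source."""
    inconsistent = 0
    for doc in documents:
        summary = summarize(doc)                       # output of the LLM under test
        if not is_factually_consistent(doc, summary):  # detector model's verdict
            inconsistent += 1
    return 100.0 * inconsistent / len(documents)


if __name__ == "__main__":
    # Toy usage with trivial placeholder functions; replace with real model calls.
    docs = ["The bridge opened in 1932.", "The plant produces 500 units per day."]
    rate = hallucination_rate(
        docs,
        summarize=lambda d: d,                        # identity "summary" for the demo
        is_factually_consistent=lambda d, s: s in d,  # trivial substring check
    )
    print(f"Hallucination rate: {rate:.1f}%")
```

A lower percentage indicates that the model's summaries more faithfully reflect the source documents.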

Smaller or more specialized models, such as Zhipu AI GLM-4-9B-Chat, OpenAI-o1-mini and OpenAI-4o-mini, have some of the lowest hallucination rates, as does Intel's Neural-Chat 7B. Among foundational models, Google's Gemini 2.0 slightly outperforms OpenAI GPT-4, with a hallucination rate just 0.2% lower.

To contact the author of this article, email GlobalSpecEditors@globalspec.com

