Google Gemini
When Google rolled out Gemini Advanced, its much-anticipated premium chatbot, to the UK two weeks ago, the initial reviews were glowing. The company claimed ‘really positive feedback’ from the 100 leading AI experts who got a sneak preview. As with OpenAI’s ChatGPT Plus, Gemini Advanced is a paid-for service that offers more features and capabilities than the free version available to all users.
The company even claimed that a blind ‘Pepsi challenge’-style user test named the Google offering “the most preferred chatbot”. But much of the praise centres on Gemini Advanced, not the free version, which runs on the simpler Gemini Pro model. And this matters, because the free version is what most users will experience.
Gemini’s predecessor was notorious for hallucination and misinformation. SEC Newgate’s digital team has worked with countless clients to clean up outrageous claims made by Bard about their businesses and their senior teams. So we were very keen to put Gemini to the test, especially as we had started to see a difference in Bard’s output during the transition phase.
Firstly, the good news – when it comes to accuracy and hallucination, some of the more outlandish claims are gone. The biographies Gemini produces for our senior team are much more closely aligned with company website biographies and other credible online sources. But the problem remains that when online information about an individual is sparse, Gemini has a tendency to fill in the gaps.
Google has clearly put a lot of thought into ensuring that its chatbot no longer makes serious allegations about individuals (and its “double-check” feature helps a user see where the chatbot is unable to back up its claims). But it now tends to imply that an issue exists, then refuses to answer further.
For example, a search for any controversy surrounding a client saw Gemini make a vague claim about “a legal battle surrounding [the individual’s] private life”, but when prompted for more information it replied: “I am unable to share details about specific legal proceedings related to individuals' private lives. This is to protect the privacy of those involved and to avoid potentially harmful speculation or misinterpretation.”
Whilst this solves the most pressing issue – ridiculous, invented claims that damage the reputations of people and organisations – it now acts like an office gossip, insinuating scandal but refusing to elaborate, which actually makes fact-checking its claims harder.
Vectara’s Hallucination Leaderboard, which ranks large language models by their propensity to hallucinate, puts Gemini Pro’s hallucination rate at 4.8% – significantly down from the eye-watering 27% recorded for Bard, but still above GPT-4, which sits at 3%.
And whilst this closing of the gap is to be welcomed, the two models hallucinate differently: GPT-4’s inaccuracies tend to concern someone’s employment history or family life, whereas Gemini picks up where Bard left off and jumps straight into answering questions about controversies and scandal (the sort of questions ChatGPT and Copilot answer by telling you they haven’t found anything).
In summary, we’re still worried about the reputational risk our clients face from Gemini. If I were you, I’d be checking out what it says about you and your business as soon as you finish reading this article…