If you ask ChatGPT which number is larger, 9.11 or 9.9, it will answer incorrectly. In fact, it’s not the only AI model that gets this simple math question wrong.
Claude also answers incorrectly, but Gemini, Le Chat (Mistral’s chatbot), Copilot, and Llama 3.1 (405B) get it right, and some of them even explain their answer to this trick question perfectly.
We’re talking about the most advanced chatbots on the market, from companies that have invested enormous amounts of money, manpower, and resources into training these generative AI models. And yet, in many cases, they prove once again something I never get tired of saying:
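The comparison itself is simple arithmetic: 9.9 is the same as 9.90, and 90 hundredths are more than 11 hundredths, so 9.9 is larger. The short Python sketch below contrasts that decimal comparison with a hypothetical version-number reading of the same values (as in software releases, where 9.11 comes after 9.9). That is one plausible source of confusion, not a confirmed explanation of why the chatbots fail, and the `as_version` helper is purely illustrative.

```python
# Decimal comparison: 9.9 == 9.90, and 0.90 > 0.11, so 9.9 is larger.
print(max(9.11, 9.9))  # 9.9


# Hypothetical version-number reading: split on the dot and compare each
# part as an integer. Here 11 > 9, so "9.11" ranks above "9.9".
def as_version(s):
    return tuple(int(part) for part in s.split("."))


print(max("9.11", "9.9", key=as_version))  # 9.11
```

The same two strings produce opposite answers depending on whether they are read as numbers or as version labels, which is why the question works as a trick.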
Chatbots screw up—a lot.
They do it all the time, with basic math problems like this one and with other questions. By now, we’re used to AI chatbots failing (hello, glue pizza), and while they can be helpful, you always have to check their answers. Programmers know this all too well: around half of the answers ChatGPT gives to programming questions are wrong.
The companies behind these AI models clearly state that their chatbots’ answers can be wrong. These stochastic parrots respond to probabilistic patterns and have no idea what they’re saying. Developers have refined these models, and in many cases, almost surprisingly, they respond with complete accuracy.
However, this uncertainty (“Is ChatGPT getting it right, or is it making it all up?”) raises an important question: if we can’t fully trust an AI chatbot, how can we trust an AI-based search engine?
This is what OpenAI is now proposing with its SearchGPT search engine. Its creators have classified this tool as a “prototype”—they even include it in the URL of the announcement page—and only a few users can access it.
There is at least one crucial point here: SearchGPT, like Google’s search engine, includes attribution and links to the sources of the results. This feature is essential for the search engine’s credibility and is also necessary for search engines in general. Perplexity, the first major independent AI-based search engine, seems to have inspired SearchGPT.
While ChatGPT and Copilot have been able to search the web for some time, this is the first time OpenAI has created a product specifically designed as a search engine. It seemed inevitable, especially given that more users, like me, are searching directly on ChatGPT or other AI chatbots.
Many of these searches are questions that expect a direct answer. In recent years, Google has tried to anticipate our desires, knowing that this type of search is becoming increasingly common.
For example, if we search for a pizza recipe (without glue, of course), Google shows it to us right away and then provides additional links. Suddenly, the search engine became a question-and-answer engine, but one based on media content and the people who created it. The answers aren’t generated probabilistically, as they are in text-generative AI models.
The question, of course, is how much of SearchGPT is a generative chatbot and how much is a traditional search engine that takes advantage of Internet content by displaying it as results, which is what Google has been doing.
SearchGPT’s reliability and potential success depend on the answer to that question. Google introduced AI Overviews at its Google I/O event a few months ago, but the rollout was cautious and poorly executed. It soon became clear that these answers could be an absolute disaster (pizza with cheese glued on was the most talked-about faux pas), and Google had to apologize a few days later.
But of course, if SearchGPT is to be successful, it can’t just be “another Google.” It needs to leverage the strengths of ChatGPT, but accurately and, even harder, efficiently. Generative AI models pollute and consume a lot of energy and water; they’ll be a waste if they’re not fundamentally better than traditional search engines.
A search engine that is just taking its first steps faces many challenges, and one above all: not screwing up as much as Google and its alternatives.
This article was written by Javier Pastor and originally published in Spanish on Xataka.
Image | Xataka On with Bing Image Creator