Gen AI’s Accuracy Problems Aren’t Going Away Anytime Soon, Researchers Say


Generative AI chatbots are known to make a lot of mistakes. Let’s hope you didn’t follow Google’s AI suggestion to add glue to your pizza recipe or eat a rock or two a day for your health. 

These errors are known as hallucinations: essentially, things the model makes up. Will this technology get better? Even researchers who study AI aren’t optimistic that’ll happen soon.

That’s one of the findings of a report released this month by the Association for the Advancement of Artificial Intelligence, put together by a panel of two dozen artificial intelligence experts. The group also surveyed more than 400 of the association’s members. 

In contrast to the hype you may see about developers being just years (or months, depending on who you ask) away from perfecting AI, this panel of academics and industry experts seems more guarded about how quickly these tools will advance. That caution goes beyond getting facts right and avoiding bizarre mistakes: the reliability of AI tools needs to increase dramatically if developers are going to produce a model that can meet or surpass human intelligence, commonly known as artificial general intelligence. Researchers seem to believe improvements at that scale are unlikely to happen soon.

“We tend to be a little bit cautious and not believe something until it actually works,” Vincent Conitzer, a professor of computer science at Carnegie Mellon University and one of the panelists, told me.

Artificial intelligence has developed rapidly in recent years

The report’s goal, AAAI president Francesca Rossi wrote in its introduction, is to support research in artificial intelligence that produces technology that helps people. Issues of trust and reliability are serious, not just in providing accurate information but in avoiding bias and ensuring a future AI doesn’t cause severe unintended consequences. “We all need to work together to advance AI in a responsible way, to make sure that technological progress supports the progress of humanity and is aligned to human values,” she wrote. 

The acceleration of AI, especially since OpenAI launched ChatGPT in 2022, has been remarkable, Conitzer said. “In some ways that’s been stunning, and many of these techniques work much better than most of us ever thought that they would,” he said.

There are some areas of AI research where “the hype does have merit,” John Thickstun, assistant professor of computer science at Cornell University, told me. That’s especially true in math or science, where users can check a model’s results. 

“This technology is amazing,” Thickstun said. “I’ve been working in this field for over a decade, and it’s shocked me how good it’s become and how fast it’s become good.”

Despite those improvements, there are still significant issues that merit research and consideration, experts said.

Will chatbots start to get their facts straight?

Despite some progress in improving the trustworthiness of the information that comes from generative AI models, much more work needs to be done. A recent report from the Columbia Journalism Review found that chatbots were unlikely to decline to answer questions they couldn’t answer accurately, that they were confident about the wrong information they provided, and that they made up (and provided fabricated links to) sources to back up those wrong assertions. 

Improving reliability and accuracy “is arguably the biggest area of AI research today,” the AAAI report said.

Researchers noted three main ways to boost the accuracy of AI systems: fine-tuning, such as reinforcement learning with human feedback; retrieval-augmented generation, in which the system gathers specific documents and pulls its answer from those; and chain-of-thought prompting, which breaks a question down into smaller steps that the AI model can check for hallucinations.
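To make one of those techniques concrete, here’s a minimal sketch of the retrieval-augmented generation pattern in Python. Everything in it is illustrative: the toy document list and the naive keyword-overlap retriever are assumptions standing in for a real corpus and embedding-based search, and the assembled prompt would be sent to an actual model API rather than printed.

```python
# A minimal sketch of retrieval-augmented generation (RAG).
# Assumptions: DOCUMENTS is a toy stand-in for a real corpus, and
# retrieval uses naive keyword overlap instead of the vector
# embeddings a production system would use.

DOCUMENTS = [
    "Retrieval-augmented generation grounds answers in retrieved text.",
    "Chain-of-thought prompting breaks a question into smaller steps.",
    "Fine-tuning adjusts a model's weights using curated feedback.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by words shared with the question; keep the top k."""
    terms = set(question.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(terms & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt telling the model to answer only from the passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the passages below. If they don't contain the "
        f"answer, say you don't know.\n\nPassages:\n{context}\n\n"
        f"Question: {question}"
    )

if __name__ == "__main__":
    question = "What does retrieval-augmented generation do?"
    prompt = build_prompt(question, retrieve(question, DOCUMENTS))
    print(prompt)  # in a real system, this prompt goes to the model
```

The point of the pattern is that the model’s answer is anchored to documents the system actually retrieved, which makes fabricated sources easier to catch, the very failure mode the Columbia Journalism Review report flagged.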

Will those things make your chatbot responses more accurate soon? Not likely: “Factuality is far from solved,” the report said. About 60% of those surveyed indicated doubts that factuality or trustworthiness concerns would be solved soon. 

In the generative AI industry, there has been optimism that scaling up existing models will make them more accurate and reduce hallucinations. 

“I think that hope was always a little bit overly optimistic,” Thickstun said. “Over the last couple of years, I haven’t seen any evidence that really accurate, highly factual language models are around the corner.”

Despite the fallibility of large language models such as Anthropic’s Claude or Meta’s Llama, users can mistakenly assume they’re more accurate because they present answers with confidence, Conitzer said. 

“If we see somebody responding confidently or words that sound confident, we take it that the person really knows what they’re talking about,” he said. “An AI system, it might just claim to be very confident about something that’s completely nonsense.”

Lessons for the AI user

Awareness of generative AI’s limitations is vital to using it properly. Thickstun’s advice for users of models such as ChatGPT and Google’s Gemini is simple: “You have to check the results.”

General large language models do a poor job of consistently retrieving factual information, he said. If you ask one for something, you should probably follow up by looking up the answer in a search engine (and not relying on the AI summary of the search results). By the time you’ve done that, you might have been better off just searching in the first place.

Thickstun said the way he uses AI models most is to automate tasks that he could do anyway and whose accuracy he can check, such as formatting tables of information or writing code. “The broader principle is that I find these models are most useful for automating work that you already know how to do,” he said.

Read more: 5 Ways to Stay Smart When Using Gen AI, Explained by Computer Science Professors

Is artificial general intelligence around the corner?

One priority of the AI development industry is an apparent race to create what’s often called artificial general intelligence, or AGI. This is a model generally capable of human-level thought or better. 

The report’s survey found strong opinions on the race for AGI. Notably, more than three-quarters (76%) of respondents said scaling up current AI techniques such as large language models was unlikely to produce AGI. In other words, a significant majority of researchers doubt the current march toward AGI will work.

A similarly large majority (82%) believe systems capable of artificial general intelligence should be publicly owned if they’re developed by private entities. That aligns with concerns about the ethics and potential downsides of creating a system that can outthink humans. Most researchers (70%) said they oppose halting AGI research until safety and control systems are developed. “These answers seem to suggest a preference for continued exploration of the topic, within some safeguards,” the report said.

The conversation around AGI is complicated, Thickstun said. In some sense, we’ve already created systems that have a form of general intelligence. Large language models such as OpenAI’s ChatGPT are capable of doing a variety of human activities, in contrast to older AI models that could do only one thing, such as play chess. The question is whether such a model can do many things consistently at a human level.

“I think we’re very far away from this,” Thickstun said.

He said these models lack a built-in concept of truth and the ability to handle truly open-ended creative tasks. “I don’t see the path to making them operate robustly in a human environment using the current technology,” he said. “I think there are many research advances in the way of getting there.”

Conitzer said the definition of what exactly constitutes AGI is tricky: often, people mean something that can do most tasks better than a human, but some say it’s just something capable of doing a range of tasks. “A stricter definition is something that would really make us completely redundant,” he said. 

While researchers are skeptical that AGI is around the corner, Conitzer cautioned that AI researchers didn’t necessarily expect the dramatic technological improvement we’ve all seen in the past few years. 

“We did not see coming how quickly things have changed recently,” he said, “and so you might wonder whether we’re going to see it coming if it continues to go faster.”




