Massive Turing test shows we can only just tell AIs apart from humans

A test taken by more than 1.5 million people shows that the latest generation of artificial intelligences is almost indistinguishable from humans, at least in a brief conversation.

People can tell artificial intelligences apart from humans only around 60 per cent of the time, according to a test taken by more than 1.5 million people. The results raise questions about whether the new generation of AIs should have to identify themselves in conversation, say researchers.

Computer scientist Alan Turing first proposed a test for machine intelligence in 1950. In its original form, a person talks via text with both another person and a machine and has to guess which is which — if they can’t differentiate, then the machine has passed the Turing test.

Daniel Jannai at AI21 Labs in Tel Aviv, Israel, and his colleagues devised an online game inspired by the Turing test. In their version, a player swaps messages with either a large language model, such as OpenAI’s GPT-4, or another player. The player then has 2 minutes to work out who or what they are interacting with.

To help the AIs avoid giving themselves away when asked, Jannai and his team used three different language models, sometimes switching between them within a single conversation, and gave the AIs random prompts to act as people with specific intentions and objectives. The team also incorporated a typing delay for all players to mask the rapid computation time of the models.
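The article doesn't say how the typing delay was calculated, so the following is only a minimal sketch of one way such a delay could work: wait in proportion to the length of the reply, with some random variation so the timing doesn't look mechanical. The function name, the typing speed and the jitter range are all hypothetical choices made for illustration, not details of the researchers' game.

```python
import random


def typing_delay_seconds(message, chars_per_second=6.0, jitter=0.3, rng=None):
    """Return a human-like delay before a reply is shown.

    Hypothetical illustration: the typing speed and jitter values are
    assumptions, not figures from the AI21 Labs game.
    """
    rng = rng or random.Random()
    # Base delay: how long a person typing at this speed would need.
    base = len(message) / chars_per_second
    # Scale by a random factor so identical messages don't take identical time.
    return base * rng.uniform(1.0 - jitter, 1.0 + jitter)


if __name__ == "__main__":
    reply = "haha no way, where are you from?"
    print(f"Hold the reply for {typing_delay_seconds(reply):.1f} seconds")
```

In a sketch like this, the same delay would be applied whether the reply came from a model or a person, so that timing alone gives nothing away.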

The results of 1.5 million conversations show that people could correctly tell whether they were dealing with another human or an AI only 68 per cent of the time. In the conversations with AIs alone, this shrank to 60 per cent, meaning that players mistook an AI for a human in 40 per cent of those conversations. That success rate is not much better than a coin flip.
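The arithmetic behind these figures can be checked in a few lines. The sketch below uses the two accuracy figures reported above. The share of conversations that involved an AI isn't given in the article, so the 50 per cent used here is purely an assumption, and the human-conversation accuracy it yields is illustrative rather than a reported result. The function names are likewise made up for this example.

```python
def misidentification_rate(accuracy):
    """Share of conversations judged wrongly, given the share judged correctly."""
    return 1.0 - accuracy


def implied_human_accuracy(overall, ai_accuracy, ai_share):
    """Solve overall = ai_share * ai_accuracy + (1 - ai_share) * human_accuracy."""
    if not 0.0 <= ai_share < 1.0:
        raise ValueError("ai_share must be at least 0 and less than 1")
    return (overall - ai_share * ai_accuracy) / (1.0 - ai_share)


OVERALL_ACCURACY = 0.68   # reported in the article
AI_ACCURACY = 0.60        # reported in the article
ASSUMED_AI_SHARE = 0.5    # assumption: not reported in the article

ai_error = misidentification_rate(AI_ACCURACY)
margin_over_chance = AI_ACCURACY - 0.5
human_accuracy = implied_human_accuracy(
    OVERALL_ACCURACY, AI_ACCURACY, ASSUMED_AI_SHARE
)

print(f"AI mistaken for a human: {ai_error:.0%}")
print(f"Margin over a coin flip: {margin_over_chance:.0%}")
print(f"Implied accuracy with human partners (assumed split): {human_accuracy:.0%}")
```

Changing `ASSUMED_AI_SHARE` shows how sensitive the implied human-conversation accuracy is to the unreported mix of AI and human partners.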

Jannai and his team were surprised at the ways in which people managed to detect AIs or signal to each other that they were human, such as using local slang or asking the AIs to do things they are forbidden to do, like providing instructions for making a bomb.

Nishanth Sastry at the University of Surrey, UK, says that as the tests only lasted 2 minutes, longer interactions with AIs in the real world might make it easier to identify them.

There are also long-standing questions as to how useful the Turing test is for assessing machine intelligence, he says. “When you ask, ‘Is an AI entity intelligent?’, it’s ill-defined. You can ask, ‘Is an AI good at maths?’ Or ‘Is an AI good at finding solutions to conflicts when they happen in the workplace?’ Those are slightly better defined, and there might be more concrete answers. In that sense, I find the Turing test less helpful.”

But even if some people aren’t fooled by AIs, enough are for us to be concerned, says Jannai. “Given that, at least in some cases, people can’t tell the difference, what interactions do we want, and should we want, people to experience when they’re interacting with AI bots?” he says. “Should people be informed about whether they’re talking to a person or not, or is it OK for a company to let people talk to customer service bots without acknowledging the fact they’re robots?”
