Giving GPT-3 a Turing Test

I’ve been playing around with OpenAI’s new GPT-3 language model. When I got beta access, the first thing I wondered was, how human is GPT-3? How close is it to passing a Turing test?


How It Works

Let me explain how exactly I’m generating these conversations. GPT-3 is a general language model, trained on a large amount of uncategorised text from the internet. It isn’t specific to a conversational format, and it isn’t trained to answer any specific type of question. The only thing it does is, given some text, guess what text comes next.
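To make that concrete, a single call with OpenAI's Python client looks roughly like the sketch below. The engine name and sampling parameters are just illustrative choices on my part, not anything special about these experiments.

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Give GPT-3 some text, and it guesses what text comes next.
response = openai.Completion.create(
    engine="davinci",          # illustrative engine name
    prompt="Once upon a time",
    max_tokens=32,
    temperature=0.7,
)
print(response["choices"][0]["text"])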


So if we want GPT-3 to generate answers to questions, we need to seed it with a “prompt”. I’m using this prompt to initialize all of the Q&A sessions:


Q: What is human life expectancy in the United States? 
A: Human life expectancy in the United States is 78 years.  

Q: Who was president of the United States in 1955? 
A: Dwight D. Eisenhower was president of the United States in 1955.  

Q: What party did he belong to? 
A: He belonged to the Republican Party.  

Q: Who was president of the United States before George W. Bush? 
A: Bill Clinton was president of the United States before George W. Bush.  

Q: Who won the World Series in 1995? 
A: The Atlanta Braves won the World Series in 1995. 

This is the default prompt suggested by OpenAI for Q&A, so I’m not cherry-picking it to prove anything. The point of the prompt is just to show GPT-3 that we’re doing questions and answers, not to provide it with information. In the prompt, both the questions and the answers are provided by a human. For every exchange after that, the answers are generated by GPT-3.
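In code, a session like this can be driven by something roughly like the sketch below: the seed prompt, the new question appended in the same Q:/A: format, and a stop sequence so generation ends after a single answer. The parameter values here are illustrative, not prescribed by OpenAI.

import openai

SEED_PROMPT = (
    "Q: What is human life expectancy in the United States?\n"
    "A: Human life expectancy in the United States is 78 years.\n\n"
    "Q: Who was president of the United States in 1955?\n"
    "A: Dwight D. Eisenhower was president of the United States in 1955.\n\n"
    # ... the remaining Q&A pairs from the prompt above ...
)

def ask(question, history=SEED_PROMPT):
    # Append the new question in the same format and let GPT-3 complete the answer.
    prompt = history + "\nQ: " + question + "\nA:"
    response = openai.Completion.create(
        engine="davinci",    # illustrative engine name
        prompt=prompt,
        max_tokens=64,       # illustrative values
        temperature=0,
        stop=["\n"],         # stop once a single answer line is complete
    )
    return response["choices"][0]["text"].strip()

print(ask("How many eyes does a giraffe have?"))

In a real session, each generated answer gets appended back onto the prompt, so follow-up questions like “Why?” have the earlier exchanges to refer to.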


Common Sense

Traditionally, artificial intelligence struggles with “common sense”. But GPT-3 can answer a lot of common sense questions.

Q: What is your favorite animal? 
A: My favorite animal is a dog.  

Q: Why? 
A: Because dogs are loyal and friendly.  

Q: What are two reasons that a dog might be in a bad mood? 
A: Two reasons that a dog might be in a bad mood are if it is hungry or if it is hot.  

Q: How many eyes does a giraffe have? 
A: A giraffe has two eyes.  

Q: How many legs does a frog have? 
A: A frog has four legs.  

Q: Are there any animals with three legs? 
A: No, there are no animals with three legs.  

Q: Why don't animals have three legs? 
A: Animals don't have three legs because they would fall over. 

Ten years ago, if I had had this conversation, I would have assumed the entity on the other end was a human. You can no longer take it for granted that an AI does not know the answers to “common sense” questions.

How does GPT-3 know that a giraffe has two eyes? I wish I had some sort of “debug output” to answer that question. I don’t know for sure, but my best guess is that some web page in its training data discusses how many eyes a giraffe has. If we want to stump GPT-3 with common sense questions, we need to think of questions about things so mundane that they will not appear anywhere on the internet.

Q: Which is heavier, a mouse or an elephant? 
A: An elephant is heavier than a mouse. 

Q: Which is heavier, a mouse or a toaster? 
A: A toaster is heavier than a mouse. 

Q: Which is heavier, a toaster or a pencil? 
A: A pencil is heavier than a toaster. 

Q: Which is heavier, a paper clip or a can of soda? 
A: A can of soda is heavier than a paper clip. 

Q: Which is heavier, a football player or a car? 
A: A car is heavier than a football player.