OpenAI Releases API to a 175-Billion-Parameter NLP Monster
The Godzilla of all AIs just took its first step into the open, and if you didn’t feel the earthquake, you and your business surely will soon.
Sound like hyperbole?
OpenAI has spent years guzzling up the entire internet, from Wikipedia to pages linked from Reddit and any other text (including thousands of books) its team could get its hands on, to refine a natural language processing (NLP) tool that now has an almost inconceivable 175 billion parameters.
To put that in context, Microsoft was boasting only in February about its 17-billion-parameter “Turing Natural Language Generation” model. (Think of parameters, crudely, as your car’s engine size.)
Today, the US organisation opened up access to what looks to be an epochal NLP tool for businesses, under what will initially be a free, two-month private beta. (OpenAI says it is still determining longer-term pricing.)
Users tapping into the OpenAI API can use it to create hugely powerful chatbots, write news articles, automate “substantive elements of litigation” and even, having been trained on thousands of open-source GitHub repositories, generate “useful and context-aware code suggestions”.
The first commercial release (dubbed simply “the API”) from the Microsoft-backed AI company co-founded by Elon Musk is genuinely industry-shaking: it can write news stories from the faintest snippet of text, and offer summarisation, sentiment analysis, translation* and more.
OpenAI API: What’s Under the Bonnet?
The release comes 13 days after the OpenAI team detailed, in an academic paper, its training of “GPT-3”, a new “autoregressive language model”.
GPT-3 was trained on a filtered version of the “Common Crawl” dataset (45TB of compressed plaintext before filtering, 570GB after), an expanded version of the WebText dataset, two internet-based books corpora (Books1 and Books2) and English-language Wikipedia.
(For a detailed look at GPT-3’s training, some of the issues the team ran into, and some AI-generated poems to boot, see the paper here [pdf].)
The team noted: “For all tasks… without any gradient updates or fine-tuning… GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks*, as well as several tasks that require on-the-fly reasoning or domain adaptation.”
*A cloze test asks the reader to supply words that have been removed from a passage, as a measure of their ability to comprehend text.
“Finally”, a team of 30 researchers wrote in the paper, “we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans.”
“Given any text prompt, the API will return a text completion, attempting to match the pattern you gave it. You can ‘program’ it by showing it just a few examples of what you’d like it to do”, OpenAI said today.
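To make “programming” by example a little more concrete, here is a minimal Python sketch of how such a few-shot prompt might be assembled before being sent off for completion. The helper name `build_few_shot_prompt`, the `=>` separator and the English-to-French pairs are hypothetical choices for illustration, not a format OpenAI requires; the sketch only builds the text a user would submit and does not call the API itself.

```python
from __future__ import annotations


def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: an instruction, worked examples, then an open query.

    Hypothetical helper: the "=>" separator and line layout are illustrative
    assumptions, not a format required by the OpenAI API.
    """
    lines = [instruction]
    # Each worked example shows the model the pattern it should continue.
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # Leave the final answer blank so the model's completion fills it in.
    lines.append(f"{query} =>")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Translate English to French:",
        [("sea otter", "loutre de mer"), ("cheese", "fromage")],
        "peppermint",
    )
    print(prompt)
```

The point of the design is that no model weights change: the examples live entirely in the prompt, and swapping them out “reprograms” the task.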
Scary or Exciting?
In February 2019, the organisation declined to release the full version of an earlier iteration of this tool (GPT-2), fearing it could be used to automate fake-news generation with alarming ease, among other malicious applications.
OpenAI said today that it will monitor use and terminate API access for harmful use cases such as “harassment, spam, radicalization, or astroturfing”.
OpenAI admitted that “we also know we can’t anticipate all of the possible consequences of this technology”, acknowledging that this is one of the reasons it is launching in private beta rather than general availability.
Will your business be signing up for the beta? CIOs, CTOs, are you interested in commercial applications for tools like this? We’d like to hear from you, as well as others testing out the OpenAI API: ed dot targett at cbronline dot com
*With regard to translation, it wasn’t immediately clear how many languages are supported: a demonstration only showcases English-to-French. We’ll update this when we have an answer.