By John P. Desmond, AI Trends Editor
OpenAI announced last June that users could request access to the GPT-3 API, a machine learning toolset, to help the company explore the strengths and limits of the new technology. Since then, some experience has been accumulating.
GPT-3 comes from OpenAI, the venture founded in 2015 with $1B from investors including Elon Musk. It is the third generation of the company’s large language model, with a capacity about two orders of magnitude, or roughly 100 times, greater than that of its predecessor, GPT-2. GPT-3 has a capacity of 175 billion machine learning parameters. That is ten times larger than the next-largest language model, Microsoft’s Turing Natural Language Generator (NLG), according to Wikipedia.
Some researchers have warned about the potential harmful effects of GPT-3. Gary Marcus, author, entrepreneur and New York University psychology professor, published an account with Ernest Davis in MIT Technology Review last August, with the headline: “GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about.” He cited especially a lack of comprehension, and complained that OpenAI had not allowed his team research access to study the model.
Some are gaining access. One of them is Sahar Mor, an AI and machine learning engineer, and the founder of Stealth Co. in San Francisco. According to a recent account in AnalyticsIndiaMag, Mor learned about AI technology not at a university but as a member of Israeli intelligence’s Unit 8200.
“I was one of the first engineers within the AI community to get access to OpenAI’s GPT-3 model,” Mor stated. He used the technology to build AirPaper, an automated document extraction API, launched last September.
The website entices potential customers with “reduce your operational workload” and “No more manual data entry. Extracts what’s important and removes your humans-in-the-loop.”
The first 100 pages are free, then it moves to a subscription basis. “Send any document, either a PDF or an image, and get structured data,” Mor stated.
To gain access, Mor emailed OpenAI’s CTO with a short background about himself and the app he had in mind. Part of the approval process involved writing about what he had learned of the model’s shortcomings and potential ways to mitigate them. Once the application is submitted, one has to wait. “The current waiting times can be forever,” he said, with developers who applied in late June still waiting for a response in mid-March.
Development started with OpenAI’s Playground tool, which he used to iterate and to validate whether his problem could be solved with GPT-3. “This tinkering is key in developing the needed intuition for crafting successful prompts,” Mor stated. He saw an opportunity for OpenAI to better automate this stage, which he suggested and which was implemented several months later with the company’s instruct-model series.
Next, satisfied with a prompt template, he integrated it into his code. He preprocessed every document, turning its OCR output into a “GPT-3 digestible prompt” which he used to query the API. After more testing and optimizing of parameters, he deployed the app.
Asked what challenges he faced while training large language models, Mor cited “a lack of data relevant for the task at hand,” namely document processing. A number of commercial companies offer document intelligence APIs, but these are not available as open source software. Mor is now building a dataset he calls DocumNet, describing it as “an ImageNet equivalent for documents.”
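The preprocessing step Mor describes might look something like the following Python sketch. It is an illustration only, not AirPaper’s actual code: the function names, the prompt wording and the character budget are all assumptions made for this example, and the call to the GPT-3 API itself is left out.

```python
from typing import Sequence

# Hypothetical budget for the document text. A real limit would be
# counted in model tokens, not characters.
MAX_DOCUMENT_CHARS = 4000


def normalize_ocr_text(ocr_text: str) -> str:
    """Collapse runs of whitespace inside each line and drop empty lines."""
    lines = (" ".join(line.split()) for line in ocr_text.splitlines())
    return "\n".join(line for line in lines if line)


def build_extraction_prompt(
    ocr_text: str,
    fields: Sequence[str],
    max_chars: int = MAX_DOCUMENT_CHARS,
) -> str:
    """Fill an illustrative prompt template with cleaned OCR text.

    The template lists the fields to extract, then the document, and ends
    with the first field name so the model continues by filling in values.
    """
    if not fields:
        raise ValueError("at least one field name is required")
    document = normalize_ocr_text(ocr_text)[:max_chars]
    field_lines = "\n".join(f"- {name}" for name in fields)
    return (
        "Extract the following fields from the document.\n"
        f"Fields:\n{field_lines}\n\n"
        f"Document:\n{document}\n\n"
        "Extracted fields:\n"
        f"{fields[0]}:"
    )
```

A caller would pass the raw OCR text of a scanned invoice and a list such as ["invoice_number", "total"], then send the returned string to the language model as the prompt. Ending the prompt on the first field name is one common way to steer a completion-style model toward structured output; the template Mor settled on is not described in the account.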
Multimodal Capabilities Combining Natural Language, Images Coming
In January, OpenAI released DALL-E, an AI program that creates images from text descriptions. It uses a 12-billion parameter version of the GPT-3 transformer model to integrate natural language inputs and generate corresponding images, according to Wikipedia. OpenAI also recently released CLIP, a neural network that learns visual concepts from natural language supervision.
Asked if he sees these AI “fusion models” or multimodal systems combining text and images as the future of AI research, Mor stated, “Definitely.” He cited an example of a deep learning model for early-stage detection of cancer based on images, that is limited in its performance when not combined with text in a patient’s charts from electronic health records.
“The main reason multimodal systems aren’t common in AI research is due to their shortcoming of picking up on biases in datasets. This can be solved with more data, which is becoming increasingly more available,” Mor stated. Also, multimodal applications are not limited to vision plus language, but could extend to vision plus language plus audio, he suggested.
Asked if he believes GPT-3 should be regulated in the future, Mor said yes, but that it is tricky. OpenAI is self-regulating, showing that it acknowledges the harmful potential of its technology. “And if that’s the case, can we trust a commercial company to self-regulate in the absence of an educated regulator? What happens once such a company faces a trade-off between ethics and revenues?” Mor wondered.
How SEO Expert in Australia Gained GPT-3 Access
A search engine optimization expert in Australia also recently gained access to GPT-3, and wrote about the experience in the blog for his company, Digitally Up.
Founder Ashar Jamil got interested in GPT-3 when he read an article in The Guardian that the newspaper said was written by a robot. “I was excited to use GPT-3 in ways that can help the people in the SEO industry,” stated Jamil, whose company offers digital marketing and social media services.
He completed the OpenAI waitlist access form, describing the purpose and details of his project, and waited. After a week, getting impatient, he decided to ramp up his effort. He purchased a “fancy domain” for his intended project, designed a demo landing page with a small animation, tweeted about the project with a video and tagged OpenAI’s chairman.
“After only 10 minutes, I received a reply from him asking me for my email. And boom, I got access,” Jamil stated.
A somewhat different approach to investigating GPT-3 was recently tried by researchers with Stanford University’s Human-Centered AI lab, with an account published at HAI. A group of academics in computer science, linguistics and philosophy was convened in a “Chatham House Rule” workshop, in which none of the participants can be identified by name, the theory being that this can lead to a freer discussion.
The participants worked to address two questions: what are the technical capabilities and limitations of large language models? And, what are the societal effects of widespread use of large language models?
Among the discussion points:
Because GPT-3 has a large set of capabilities “including text summarization, chatbots, search and code generation,” it is difficult to characterize all its possible uses and misuses.
Additionally, “It’s unclear what effect highly capable models will have on the labor market. This raises the question of when (or what) jobs could (or should) be automated by large language models,” stated the summary from HAI.
Another comment: “Some participants said that GPT-3 lacked intentions, goals, and the ability to understand cause and effect—all hallmarks of human cognition.”
Also, “GPT-3 can exhibit undesirable behavior, including known racial, gender, and religious biases,” the summary stated. Some discussion ensued on how to respond to this. Finally, “Participants agreed there is no silver bullet and further cross-disciplinary research is needed on what values we should imbue these models with and how to accomplish this.”
All agreed there is an urgent need to set norms and guidelines around the use of large language models like GPT-3.