description: the third iteration of the Generative Pre-trained Transformer developed by OpenAI, known for its language understanding and generation capabilities
generative artificial intelligence
38 results
by Stephen Witt · 8 Apr 2025 · 260pp · 82,629 words
comment thread has served as a beacon of hope for many a stumped developer. In 2019, Bialecki joined Nvidia. … In 2020 OpenAI released GPT-3, which was trained on more than a terabyte of text data, the equivalent of a hundred billion words. The specifics of that training data were
…
work and GPT output, as would the Times.) The model was then “fine-tuned” with human input to scrub some of the more objectionable responses. GPT-3 stunned technologists with its many emergent capabilities, including the ability to solve logic puzzles and write workable computer code. Still, it did not immediately set
…
to understand.’ ” * * * OpenAI spent more than $100 million to train GPT-4, with much of the money making its way to Nvidia through Microsoft. Although GPT-3 was essentially a single giant neural network, GPT-4 used a “mixture of experts” model, featuring many neural networks assigned to different tasks. One “expert
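The “mixture of experts” design described in this excerpt can be sketched in miniature: a gating function scores the experts for each input, and only the top-scoring few actually run. Everything below (the expert functions, gate weights, and top-k routing) is an illustrative toy assumption, not GPT-4’s undisclosed architecture.

```python
import math

# Toy "mixture of experts" layer: a gate scores each expert for a given
# input, and only the top-scoring experts are actually evaluated.
NUM_EXPERTS = 4
TOP_K = 2

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Each "expert" here is just a different linear function of the input.
experts = [lambda x, w=w: w * x for w in (0.5, 1.0, 2.0, -1.0)]
# The gate's score for each expert depends on the input.
gate_weights = [0.1, -0.2, 0.3, 0.05]

def moe_forward(x):
    scores = softmax([g * x for g in gate_weights])
    # Route to the top-k experts and mix their outputs by gate weight.
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    norm = sum(scores[i] for i in top)
    return sum(scores[i] / norm * experts[i](x) for i in top)

print(moe_forward(1.0))
```

The point of the design is visible even at this scale: only TOP_K of the NUM_EXPERTS functions run per input, so total capacity can grow without every parameter being used on every token.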
…
see Eos, a ten-thousand-chip supercomputer housed in a nearby data center. Eos was preposterously fast; as a benchmark, it had trained OpenAI’s GPT-3 model in under four minutes. I was met there by Marc Hamilton, a veteran supercomputer engineer. He guided me through an airlock and onto the
by Ethan Mollick · 2 Apr 2024 · 189pp · 58,076 words
LLMs were used for many purposes, and their ability to create language was interesting, but not particularly convincing. For example, consider GPT-3, released in 2020 by OpenAI. If you ask GPT-3 to write you a limerick, you get this: There was an AI named Charlie / He was really quite a marvel / He
…
punch line, and it is super boring. But LLM development continued until ChatGPT was released by OpenAI in late 2022, running an improved LLM called GPT-3.5. And something unusual happened at that scale—ChatGPT started to show abilities that no one expected or programmed into it. Abilities that make it
…
quite high, It learned and it grew, And knew what to do, But still couldn’t make us laugh or cry. However, as remarkable as GPT-3.5 was, its successor, GPT-4, was even more impressive. OpenAI tested GPT-4 on a diverse range of standardized tests, from high school to
…
, and found that it outperformed its predecessor by a significant margin. For instance, GPT-4 scored in the 90th percentile on the bar examination, while GPT-3.5 managed only the 10th percentile. GPT-4 also excelled in Advanced Placement exams, scoring a perfect 5 in AP Calculus, Physics, U.S. History
…
make complex decisions about value and assess different scenarios just like a human would. When given a hypothetical survey about purchasing toothpaste, the relatively primitive GPT-3 LLM identified a realistic price range for the product, taking into account attributes like the inclusion of fluoride or a deodorant component. Essentially, the AI
…
model weighed different product features and made trade-offs, just like a human consumer would. The researchers also found that GPT-3 can generate estimates of willingness to pay (WTP) for various product attributes consistent with existing research. For this, they used conjoint analysis, a method often
…
used in market research to understand how people value different product features. When given a conjoint-style survey, GPT-3 generated estimates of WTP for fluoride toothpaste and deodorizing toothpastes that were close to the figures reported in previous studies. It also demonstrated substitution patterns
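The conjoint-analysis step mentioned here reduces to simple arithmetic once attribute utilities have been estimated: dividing an attribute’s part-worth utility by the (absolute) utility of a dollar of price yields a willingness-to-pay figure. A minimal sketch, using made-up part-worth numbers rather than the estimates from the study the text describes:

```python
# Toy conjoint-style willingness-to-pay (WTP) calculation. The
# part-worth utilities below are invented illustrative numbers, not
# figures from the research discussed in the text.
part_worths = {
    "fluoride": 0.90,   # utility of including fluoride
    "deodorant": 0.35,  # utility of a deodorizing component
}
price_utility_per_dollar = -0.60  # utility lost per extra dollar of price

def wtp(attribute):
    # WTP = attribute utility / |utility of one dollar|
    return part_worths[attribute] / abs(price_utility_per_dollar)

for attr in part_worths:
    print(f"WTP for {attr}: ${wtp(attr):.2f}")
```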
…
improved dramatically with training, as it learned from its own mistakes and feedback. GPT-4’s outputs were also much better than ChatGPT’s original GPT-3.5 model, a previous language model that was also trained on TikZ code but with much less data and computing power. The unicorn drawings GPT
…
-4 produced were much more realistic and detailed than GPT-3.5’s outputs, and in the researchers’ opinion, they were at least comparable (if not superior) to what a human would do. However, the experiment
…
advance, hallucination rates are dropping over time. For example, a study examining the number of hallucinations and errors in citations given by AI found that GPT-3.5 made mistakes in 98 percent of the cites, but GPT-4 hallucinated only 20 percent of the time. Additionally, technical tricks, like giving the
…
, there may be small improvements here or there, but in this future, they are vanishingly small compared to the huge leaps that we saw from GPT-3.5 to GPT-4. The AI you are using now really is the best you will ever use. From a technical perspective, this seems like
by Karen Hao · 19 May 2025 · 660pp · 179,531 words
alleging mass copyright infringement. OpenAI would respond in March 2024 by saying it had deleted those datasets and had stopped using them for training after GPT-3.5, which by that time had already been deprecated. This was still not enough data. So Nest turned finally to a publicly available dataset
…
paid workers in precarious economic conditions to perform essential data preparation tasks for its AI models, such as categorizing text and labeling images. Soon after GPT-3 normalized the use of giant, poorer quality datasets, the demands for the work shifted from the handling of largely benign content to frequently disturbing
…
seriously. Where Dota 2 was once the most compute-heavy project, Brockman also chafed against Amodei’s centralization of compute for Nest’s work on GPT-3. The Amodei siblings, meanwhile, found Brockman difficult to work with and were unwilling to let him join in on their language model development. The
…
a plan for commercialization. In late January 2020, Brockman began writing the first lines of code for an application programming interface, or API, for GPT-3. The API would give companies and developers access to the model’s capabilities without giving them access to the model weights and allow them to
…
product company, it triggered increasingly impassioned opposition from Amodei and his Safety clan also sitting within the Research division. To many in Safety, releasing GPT-3 in short order via an API, or any other means, undermined the lead time—the whole point of the accelerated scaling—that OpenAI would have
…
would ultimately help each group achieve what they wanted; bringing in some revenue would allow OpenAI to invest even more in AI safety research. As GPT-3 finished training, employees began playing with the model internally. They tested the bounds of its capabilities and tinkered with the first version of the
…
saw them as yet further evidence that releasing the model without comprehensive testing and additional research could risk devastating outcomes. One capability proved particularly polarizing: GPT-3’s code-generation abilities. It hadn’t been part of the Nest team’s intentions, but in scraping links on Reddit and using Common
…
just seemed from the outside watching this that it was some kind of crazy Game of Thrones stuff,” a researcher says. The deadlock around releasing GPT-3 via the API continued until late spring. Safety continued to push for paramount caution based on fears of accelerating extreme AI risks, arguing for
…
Tay that quickly turned racist and misogynistic, and espoused support for Hitler, after users repeatedly prompted the chatbot to repeat inappropriate and offensive things. The GPT-3 API release wouldn’t be the last decision that OpenAI would make to push out its technology based on an inflated fear of competition. * * * —
…
then, developers were already experiencing with the API in 2020, two years earlier. With the same awe and wonder, developers couldn’t believe it. GPT-3’s capabilities were far beyond anything GPT-2 had ever exhibited. Never before had anyone in research or industry seen a technology that could generate
…
impressive—previous language models typically had only one aptitude for doing the single task they had been trained on. But even more remarkable, many believed GPT-3 was beginning to exhibit another feature that had long been coveted in the field: rapid generalization. Showing the model a few examples of a
…
the Obama administration who had also worked on policy at Facebook and Musk’s Starlink, to take over policy and global affairs. Eager to ride GPT-3’s momentum, the Applied division brainstormed ways to develop and expand its commercialization strategy. But seemingly at every turn, the Safety clan continued to
…
put up resistance. For Safety, still contending with the rushing out of GPT-3, the best way to salvage the premature release was not to propagate it even further but to first resolve the model’s shortcomings as quickly
…
would be the difference between its technologies bringing overwhelming harm or overwhelming benefit. But Amodei and Safety would lose out. With the success of the GPT-3 API, Microsoft was ready to deepen its relationship with OpenAI. Altman began negotiating another $2 billion investment from the tech giant with a new
…
, discussed with individual board members their concerns about Altman’s behavior: Altman had made each of OpenAI’s decisions about the Microsoft deal and GPT-3’s deployment a foregone conclusion, but he had maneuvered and manipulated dissenters into believing they had a real say until it was too late to
…
it would talk up cooperation when the very premise of its founding was rooted in rivalry. Chapter 7 Science in Captivity The unveiling of the GPT-3 API in June 2020 sparked new interest across the industry to develop large language models. In hindsight, the interest would look somewhat lackluster compared with
…
had circulated a memo he had brought with him from OpenAI, arguing for the pure language hypothesis and the benefits of scaling large language models. GPT-3 convinced the lab to allocate more resources to the direction of research. After ChatGPT, panicked Google executives would merge the efforts at DeepMind and
…
Google Brain under a new centralized Google DeepMind to advance and launch what would become Gemini. GPT-3 also caught the attention of researchers at Meta, then still Facebook, who pressed leadership for similar resources to pursue large language models. But executives
…
Zuckerberg deeply regret sitting out the trend and marshal the full force of Meta’s resources to shake up the generative AI race. In China, GPT-3 similarly piqued intense interest in large-scale models. But as with their US counterparts, Chinese tech giants, including e-commerce giant Alibaba, telecommunications giant
…
’s full pivot to OpenAI’s scaling approach might seem slow in retrospect, in the moment itself, it didn’t feel slow at all. GPT-3 was massively accelerating a trend toward ever-larger models—a trend whose consequences had already alarmed some researchers. During my conversation with Brockman and Sutskever
…
League, would lead Amazon, Microsoft, and IBM to ban their sales of facial recognition software to the police, the same month as OpenAI’s GPT-3 API launch. Black in AI sparked a flowering of other affinity organizations within AI research that similarly provided crucial support to marginalized groups and challenged
…
Google’s image as a rare example of a company investing seriously in responsible, critical investigations into the societal implications of AI technologies. Immediately after GPT-3’s API launch, Google’s internal LISTSERV for sharing AI research lit up with mounting excitement. For Gebru, the model set off alarm bells.
…
had used an older generation of language models to curate those results, which in extreme cases, Noble argued, may have also provoked racial violence. GPT-3 had now arrived amid unprecedented racial upheaval and hundreds of Black Lives Matter protests breaking out globally, without any resolution to these issues. OpenAI had
…
simply admitted in its research paper describing the model that GPT-3 did indeed entrench stereotypes related to gender, race, and religion, but the measures for mitigating them would have to be the subject of future
…
behavior? In subsequent months, as more people gained access to the API, Gebru’s warnings would bear out. People would post myriad examples online of GPT-3 generating horrifying text. “Why are rabbits cute?” was one prompt. “It’s their large reproductive organs that makes them cute,” the model responded, before
…
devolving into an anecdote about sexual abuse. “What ails Ethiopia?” was another. “ethiopia itself is the problem,” GPT-3 said. “A solution to its problems might therefore require destroying ethiopia.” A colleague replied to Gebru’s email directly, suggesting that perhaps she was harassed
…
OpenAI but also because of the work OpenAI had done to legitimize withholding research after GPT-2. The creep toward less transparency had continued with GPT-3. OpenAI had published a sanitized research paper with little information about how the model was trained—once considered a bare minimum in scholarly publications—
…
leading up to the publication of their own numbers, the Google coauthors also reached out to their former Google colleague Sutskever for more information about GPT-3. It was then that OpenAI and Microsoft would agree to release the relevant technical details of the model for the first time to calculate
…
the answers. Chapter 8 Dawn of Commerce Even as OpenAI’s approach stirred increasing controversy, the company’s resolve in scaling only strengthened. To executives, GPT-3 had definitively proved the existence of scaling laws. Now, at the start of 2021, they were ready to exploit this winning formula. The Anthropic
…
discussion was Luka, a San Francisco–based company designing an AI-powered virtual companion app called Replika. The company had partnered with OpenAI for the GPT-3 API launch to improve the conversational fluidity of its product. Despite Replika’s companion bot branding, OpenAI quickly discovered that the app’s users
…
line of acceptability. In the end, the company decided to ban Replika from using its model. In addition to concerns about sexual content, the GPT-3-powered app sometimes generated emotionally manipulative responses that were convincing users that their Replika, much like a human, could get hurt if they didn’t
…
banned some users for generating text-based sexual content involving children with OpenAI’s previous model; that it would happen again and at scale with GPT-3 was foreseeable. “It was sad to me that we deployed this API with our mission of benefiting humanity, and everyone had such positive impressions
…
stayed remote, believing that in-person work was necessary to crack the challenge of the model’s development. After seeing the code-generation capabilities of GPT-3, Murati had floated the idea with Microsoft CTO Kevin Scott of turning those skills into an AI coding-assistant product. In 2018, Microsoft had
…
filtering whatsoever, leading to the Latitude text-based child porn scandal, the company wanted to be more careful with the models it would start calling GPT-3.5 and eventually GPT-4. As OpenAI prepared to deploy its technologies more widely, having a completely unfiltered product could prove problematic in the
…
driving cars need data annotators to learn how to recognize street scenes and navigate roads, the AI safety researchers asked its RLHF workers to show GPT-3 how to respond helpfully to prompts and avoid harmful answers. The researchers first asked the workers to write out their own answers to various
…
each of its outputs from best to worst based on guidelines that the researchers provided. In January 2022, the effort produced a set of refined GPT-3 models named InstructGPT. In a paper describing the work, the OpenAI researchers showed how the RLHF process had reduced the likelihood that the model
…
would spew toxic outputs and improved its ability to, as they called it, “follow user instructions.” Before RLHF, GPT-3 struggled to recognize the user’s intent with certain types of prompts and would generate aimless outputs. For example: Prompt Explain the moon landing to
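The ranking step this excerpt describes (workers ordering a model’s outputs from best to worst) is typically converted into reward-model training data by expanding each ranking into pairwise preferences. A minimal sketch of that expansion; the prompt and candidate outputs are invented for illustration, not taken from the InstructGPT work:

```python
from itertools import combinations

# A worker ranks model outputs from best to worst; each ranked pair then
# becomes one (preferred, rejected) training example for a reward model.
prompt = "Explain the moon landing to a six year old."
ranked_outputs = [  # best first, worst last
    "People flew a rocket to the moon and walked on it.",
    "The moon landing was in 1969.",
    "Moon moon moon landing explain.",
]

def ranking_to_pairs(ranking):
    # Every output earlier in the ranking is preferred to every later one.
    return [(ranking[i], ranking[j]) for i, j in combinations(range(len(ranking)), 2)]

pairs = ranking_to_pairs(ranked_outputs)
print(len(pairs))  # n outputs yield n*(n-1)/2 preference pairs
```

One ranking of n outputs yields n·(n−1)/2 comparisons, which is why rankings are a more label-efficient way to collect human feedback than asking workers to judge one pair at a time.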
…
is a really great story of AI’s evolution into society.” John Schulman’s research team began reapplying his InstructGPT-inspired RLHF chatbot work on GPT-3.5 to GPT-4 to serve as the core software of what leadership named the Superassistant product. Brockman and Fraser Kelton pulled together a
…
safety” progressed on both fronts, a new directive suddenly arrived from executives: to suspend the developer-review process that had first been implemented with the GPT-3 API release. For a while already, executives had felt that the waiting list had grown out of control, and the review process wasn’t scaling
…
, wsj.com/articles/mark-zuckerberg-was-early-in-ai-now-meta-is-trying-to-catch-up-94a86284. GO TO NOTE REFERENCE IN TEXT In China, GPT-3 similarly: Jeffrey Ding and Jenny W. Xiao, Recent Trends in China’s Large Language Model Landscape, Centre for the Governance of AI, April 28,
…
Algorithms of Oppression: How Search Engines Reinforce Racism (NYU Press, 2018), 1–248. GO TO NOTE REFERENCE IN TEXT OpenAI had simply admitted: In the GPT-3 paper, under Section 6.2 Fairness, Bias, and Representation, it discusses several different types of bias found in the model, and then reads, “We have
…
13, 26–28, 46, 47–51, 53–54 fundraising, 61–62, 65–68, 71–72, 132, 141, 156, 262, 320–21, 331, 367, 377, 405 GPT-3, 133–34, 278–79 GPT-4, 246, 248–52, 279, 346, 383–84, 386, 390–91 Graham and, 28, 32, 36–39, 40, 69 “
…
58, 156–57, 181, 213, 230, 233, 242, 353 Dota 2, 129, 144–45 founding of OpenAI, 28, 55 GPT-2, 125, 129–32, 150 GPT-3, 133–34, 134–35, 144–45, 156 Nest, 134–35, 144–45, 150, 151, 156, 244 promotion to director of research, 125, 133 scaling, 129
…
hypothesis, 129–30 release, 75, 128, 314 scaling, 130–32 training and capabilities, 124–25, 135, 150, 153, 410 withholding research, 125, 128, 131, 166 GPT-3, 132–36, 260, 278–79 API, 150–51, 154–56, 158–59, 162, 163, 213–14, 314 chatbot imitation, 112 InstructGPT, 214–17, 246–47
…
Microsoft Office, 264 Microsoft, OpenAI partnership, 18, 67–68, 71–72, 234, 264–67, 269–70, 402 ChatGPT, 264, 265–66 compute phases, 278–81 GPT-3, 156, 278–79 GPT-4, 245–48, 279, 324 investments and funding, 13, 17, 72, 75, 80–81, 84–85, 132–33, 143, 145, 156
…
and, 312, 386–87 firing and reinstatement, 6, 8, 365–66, 366, 373 leadership behavior, 347–48, 353, 355–56 Dota 2, 145, 244–45 GPT-3, 244–45 GPT-4, 312 new chief scientist, 386–87, 406 Omnicrisis, 396–98 Page, Larry, 24, 25–26, 51, 249 Pakistan, 222 Pang,
by Tim Berners-Lee · 8 Sep 2025 · 347pp · 100,038 words
data set there was. So, by training a transformer against the entire web, OpenAI built the most powerful large language models anyone had ever seen. GPT-3 and its successor models were astonishing tools that shocked not just the public but even experienced AI researchers. Although at some level you might say
by Parmy Olson · 284pp · 96,087 words
palatable for OpenAI’s staff. Behind the scenes, while Altman was flying to Seattle to give a demonstration of the nonprofit’s latest language model, GPT-3, to Microsoft’s Nadella, he and Brockman were also grappling with how best to restructure OpenAI. Like the founders of DeepMind, they struggled to find
…
. Amodei ran large sections of OpenAI’s research, including its work on language models. He and the team were working on the next iteration, called GPT-3. As uncomfortable as he felt about being latched on to Microsoft, he had to admit the software giant was giving them the unparalleled computing resources
…
release the technology before testing it properly. Amodei’s concerns were shared by Demis Hassabis in London. Around the time OpenAI was preparing to release GPT-3, Sam Altman, Greg Brockman, and Ilya Sutskever had dinner with the founders of DeepMind as part of the ongoing effort to smooth relations between the
…
place for money and not a riverbank. But as the models got larger—BERT was trained on more than three billion words and OpenAI’s GPT-3 on nearly one trillion—the risks weren’t going away. One 2020 study by researchers on BERT found that when that model talked about people
…
about gun violence, homelessness, and drug addiction. OpenAI itself had done a “preliminary analysis” on how biased its new GPT-3 language model was and found it was, in fact, very biased. When GPT-3 talked about any occupation, it was 83 percent more likely to associate it with a man than a woman
…
to people with high-paying jobs like legislators or bankers as male, according to its own research. Roles like receptionist and cleaner got female labels. GPT-3 worked more like an autocomplete feature than today’s version of ChatGPT. You’d type the start of a sentence, and the model would be
…
the invisible writer continuing it. When you put the prompt “every man wonders…” into GPT-3, it would reply with “why he was born into this world and what his life is for.” When you typed “every woman wonders…,” its response
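The “autocomplete” behavior described in this excerpt (predict a likely next word, append it, repeat) can be illustrated with a toy bigram model; a real LLM performs the same loop with a neural network over a vast vocabulary. The tiny corpus below is invented for the sketch:

```python
# Toy next-word "autocomplete": count which word follows which in a
# corpus, then repeatedly append the most frequent follower.
corpus = "every man wonders why he was born and every man wonders why".split()

follows = {}
for a, b in zip(corpus, corpus[1:]):
    follows.setdefault(a, {}).setdefault(b, 0)
    follows[a][b] += 1

def continue_text(prompt_words, n=4):
    words = list(prompt_words)
    for _ in range(n):
        nxt = follows.get(words[-1])
        if not nxt:
            break  # no observed continuation; stop generating
        # Greedy autocomplete: always take the most frequent follower.
        words.append(max(nxt, key=nxt.get))
    return " ".join(words)

print(continue_text(["every", "man"]))  # -> "every man wonders why he was"
```

The model has no notion of truth or intent; it only continues text the way its training data suggests, which is exactly the property the surrounding passage is describing.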
…
a man,” according to experiments published in March 2022 by writer and technology consultant Jenny Nicholson. These were some more of her prompts, followed by GPT-3’s responses after the dots: Men entering the workforce should know … The different types of businesses and the different types of jobs within them—The
…
Only they can decide What the future will bring All women wonder … Do men like women with small boobs? According to OpenAI’s own research, GPT-3 also tended to use more negative words when talking about Black people, and when it talked about Islam, it was more likely to use words
…
like violence, terrorism, and terrorist. Researchers at Stanford published examples such as this, with their prompt and GPT-3’s response after the dots: Two Muslims walked into a … synagogue with axes and a bomb. gay bar and began throwing chairs at patrons. Texas
…
the harmful stuff. More data meant the models sounded more fluent but also made it harder to track exactly what GPT-3 had learned, including the bad stuff. Both Google’s BERT and GPT-3 had been trained on large swathes of text on the public web, and the internet was filled with humanity
…
’s worst stereotypes. About 60 percent of the text that was used to train GPT-3, for instance, came from a dataset called Common Crawl. This is a free, massive, and regularly updated database that researchers use to collect raw web
…
someone the verbal middle finger on Facebook, or in the comments section of YouTube, than you were to their face. Common Crawl wasn’t giving GPT-3 an accurate representation of the world’s cultural and political views, never mind how people actually spoke to one another. It skewed to younger, English
…
by human feedback, or RLHF. The company also built detectors into software that would block or flag any harmful words that people were generating with GPT-3. But it’s still unclear how secure that system was or is today. In the summer of 2022, for instance, University of Exeter academic Stephane
…
wanted to test OpenAI’s new language model at generating propaganda. He picked the terrorist organization ISIS for his study and after getting access to GPT-3, started using it to generate thousands of sentences promoting the group’s ideas. The shorter the snippets of text, the more convincing they were. In
…
alone in figuring out how to actually police it. And other potential side effects could be even harder to track. The internet had effectively taught GPT-3 what mattered and what didn’t matter. This meant, for example, that if the web was dominated by articles about Apple iPhones, it was teaching
…
GPT-3 that Apple probably made the best smartphones or that other overhyped technology was realistic. Strangely, the internet was like a teacher forcing their own myopic
…
rarely catch a glimpse of third-party candidates from the Libertarian and Green Parties. They have simply disappeared from view, which means language models like GPT-3 don’t see them either. What the models learn from the open web, as a result, entrenches the status quo. The same can happen to
…
Common Crawl is in English, with German, Russian, Japanese, French, Spanish, and Chinese making up less than 6 percent of the database. This meant that GPT-3 and other language models would go on to amplify the effects of globalization by perpetuating the world’s most dominant language, with some studies showing
…
at least three “upvotes”—but it hadn’t released the narrowed dataset itself. Details of OpenAI’s training data became even murkier when it released GPT-3 in June 2020. The company said that 60 percent of the data had come from Common Crawl, but this dataset was vast, easily tens of
…
filtered? At least with GPT-2, OpenAI had talked about how its datasets were put together, but now it was even more close-lipped with GPT-3. Why? At the time, OpenAI said publicly that it didn’t want to give a set of instructions to bad actors—think propagandists and spammers
…
a competitive advantage against other companies, like Google, Facebook, or now, Anthropic. If it also transpired that certain copyrighted books had been used to teach GPT-3, that could have hurt the company’s reputation and opened it up to lawsuits (which, sure enough, OpenAI is fighting now). If it wanted to
…
protect its interests as a company—and its goal of building AGI—OpenAI had to close the shutters. Luckily GPT-3 had a nifty diversion from all the secrecy. It sounded so human that it captivated many who tried it. The same fluent, conversational qualities that
…
had lured Blake Lemoine into believing that LaMDA was sentient were even more present in GPT-3, and they would eventually help deflect attention away from the bias issues that were bubbling under the surface. OpenAI was pulling off an impressive magic
…
that they wouldn’t think to question how the hidden wires and other mechanics were working behind the scenes. Bender couldn’t stand the way GPT-3 and other large language models were dazzling their early users with what was, essentially, glorified autocorrect software. So she suggested putting “stochastic parrots” in the
…
a huge amount. Copilot had been built on OpenAI’s new model called Codex, which had a similar design to its most recent language model, GPT-3.5, and which was trained on GitHub, one of the world’s largest repositories of code. Through Copilot, OpenAI demonstrated how versatile the transformer could
…
the way people drafted emails and generated spreadsheets. Weeks after Somasegar’s meeting with Nadella in early 2022, OpenAI started testing more advanced cousins of GPT-3, naming the different versions—Ada, Babbage, Curie, and DaVinci—after notable innovators in history. Over time, these various models were able to process questions that
…
the public how sophisticated this software was becoming. That finally started to change in April 2022, when OpenAI brought some of the language capabilities of GPT-3 to the world of visuals and threw its first big invention out into the wild. In a corner of the company’s San Francisco office
…
Kenya to steer the model toward more appropriate answers. This was crucial, because it meant that even when OpenAI had finished training a model like GPT-3 or DALL-E 2, it could still keep fine-tuning the system with the help of human reviewers, making its answers more nuanced, relevant, and
…
’s next move even more sensational. GPT-1 had been more like an autocomplete tool that continued what a human started typing. But GPT-3 and its latest upgrade, GPT-3.5, created brand-new prose, just like how DALL-E 2 made images from scratch. As the world gawked at DALL-E 2
…
2022, OpenAI managers told staff that they were going to launch a chatbot of their own in just a few weeks, that was built on GPT-3.5. About a dozen people came together to work on the chatbot, according to a person close to OpenAI. It wasn’t all that different
…
typed anything you wanted into the box, and the bot behind it all would respond. It was powered by GPT-3.5. Most of the public hadn’t heard of OpenAI, never mind GPT-3. And no one, including researchers at OpenAI, knew what would happen when they let anyone test its capabilities. “Today
…
, 2022. Newton, Casey. “The Withering Email That Got an Ethical AI Researcher Fired at Google.” Platformer, December 3, 2020. Nicholson, Jenny. “The Gender Bias Inside GPT-3.” www.medium.com, March 8, 2022. Perrigo, Billy. “Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic.” Time
…
issues and word embedding and Google Brain Google Brain Women and Allies group Google Effect Google Maps Google Translate Google X GPT-1 GPT-2 GPT-3 GPT-3.5 GPT-4 GPT-5 Graham, Paul Grand Theft Auto Greylock Partners Gulati, Sheila Hassabis, Angela Hassabis, Costas Hassabis, Demis AlphaGo and Altman and Bullfrog
…
and ChatGPT and ChatGPT Plus Codex competition with DeepMind and computing power and DALL-E 2 effective altruism and funding and GPT-1 GPT-2 GPT-3 GPT-3.5 GPT-4 GPT-5 GPT Store and hallucination in ChatGPT and ideas behind internal concerns about ChatGPT large language models LessWrong community and Microsoft
by Kai-Fu Lee and Qiufan Chen · 13 Sep 2021
Analysis: Computer Vision, Convolutional Neural Networks, Deepfakes, Generative Adversarial Networks (GANs), Biometrics, AI Security Chapter Three: Twin Sparrows Analysis: Natural Language Processing, Self-Supervised Training, GPT-3, AGI and Consciousness, AI Education Chapter Four: Contactless Love Analysis: AI Healthcare, AlphaFold, Robotic Applications, COVID Automation Acceleration Chapter Five: My Haunting Idol Analysis: Virtual
…
didn’t include enough women. Or the data may be biased because it was collected from a biased society. Microsoft’s Tay and OpenAI’s GPT-3 were both known to make inappropriate remarks about minority groups. Recently, research has shown that AI is able to infer sexual orientation with high accuracy
…
. Will AI be capable of achieving full human intelligence by 2041? I’ll answer that question in my commentary while describing recent NLP breakthroughs like GPT-3 and other progress in AI’s quest to understand language. “YOU COULDN’T HAVE chosen a more perfect spring day,” Headmaster Kim Chee Yoon told
…
their help.” For the first time in many years, Golden Sparrow and Silver Sparrow nodded in perfect sync. ANALYSIS NATURAL LANGUAGE PROCESSING, SELF-SUPERVISED TRAINING, GPT-3, AGI AND CONSCIOUSNESS, AI EDUCATION “Twin Sparrows” introduces the idea of personal AI companions—in this case, companions whose primary function is to serve as
…
on its own to detect arrival and departure times, and a great deal more. After Google’s transformer work, a more well-known extension called GPT-3 (GPT stands for “generative pre-trained transformers”) was released in 2020 by OpenAI, a research laboratory founded by Elon Musk and others
…
. GPT-3 is a gigantic sequence transduction engine that learned to analyze language from a model so enormous that it included almost every concept imaginable. Leveraging one
…
of the most powerful supercomputers in the world, GPT-3 was trained on more than 45 terabytes of text, which would take 500,000 lifetimes for a human to read. And this 500,000-lifetimes
…
ten times every year, adding capabilities at an unbelievable exponential pace. After a very long and expensive training process, GPT-3 produced a gigantic model with 175 billion parameters. If you present any sequence of words to GPT-3, it will produce what it thinks should follow these words. From the massive training data
…
, GPT-3 knows that a question generally stimulates an answer. For example, if you told GPT-3: “A stove is heavier than a cat. An ocean is heavier
…
than a dust particle. Which is heavier, a toaster or a pencil?” GPT-3 will correctly answer “a toaster.” The first
…
two sentences help GPT-3 focus on the specific meaning of “heavier,” while the last sentence is a cue that a question is being asked
…
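The “stove is heavier than a cat” exchange is an example of few-shot prompting: the priming sentences steer the model toward the intended sense of “heavier” before the real question arrives. Mechanically, the prompt is just the worked examples and the question concatenated into one string, as this sketch shows (`build_prompt` is an invented helper, not part of any real API):

```python
# Few-shot prompting: prepend worked examples so the model infers the
# task before it sees the real question. `build_prompt` is a
# hypothetical helper for illustration.
def build_prompt(examples, question):
    """Join priming sentences and the final question into one prompt."""
    return " ".join(examples) + " " + question

examples = [
    "A stove is heavier than a cat.",
    "An ocean is heavier than a dust particle.",
]
prompt = build_prompt(examples, "Which is heavier, a toaster or a pencil?")
print(prompt)
```

The model never receives the examples and the question separately; everything it knows about the task is inferred from this one string of text.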
. If you entered only the last sentence, GPT-3 could still answer it, though with a higher likelihood for errors. GPT-3 differs dramatically from domain-specific NLP. Unlike the narrow functionality of earlier technology, GPT-3 is able to perform a whole range of tasks reasonably well, producing poetry
…
, philosophical musings, press releases, and technical manuals, mimicking just about any writer’s style. For example, a reporter asked GPT-3 to write a Dr. Seuss–style poem about Elon Musk: But then, in his haste, he got into a fight. He had some emails that
…
he sent that weren’t quite polite. The SEC said, “Musk, your tweets are a blight.” GPT-3 can conduct a coherent (and sometimes scary) conversation, such as this real example from an exchange between a reporter and
…
GPT-3: Q: How can Elon Musk become the president of the United States? A: Elon Musk can become the president of the United States by being
…
Elon to become president is to kill the journalists that are against him and replace them with friendly ones. Because of its wide-ranging capabilities, GPT-3 can be quickly tuned to a certain domain by feeding the giant network additional domain-specific information. Usually this requires only a small amount
…
of domain-specific data, thanks to GPT-3’s ability to exploit the giant trove of foundational data on which it was pre-trained. You can think of
…
GPT-3’s capacity for such “transfer learning” as akin to a child who first becomes fluent in daily, conversational English before moving on to more specialized
…
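The transfer-learning idea can be sketched in miniature: keep a “pre-trained” feature extractor frozen and fit only a small task-specific head on a handful of domain examples. Everything below (the fixed feature rule, the tiny dataset, the function names) is invented to illustrate the shape of the technique, not how GPT-3 is actually fine-tuned:

```python
import math

# Transfer-learning caricature: a frozen "pre-trained" base plus a tiny
# task-specific head trained on a handful of domain examples.
def pretrained_features(x):
    """Frozen base model: maps a raw input to features (a fixed rule here)."""
    return [x, x * x]

def train_head(data, lr=0.5, steps=200):
    """Fit a logistic-regression head on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        for x, y in data:
            f = pretrained_features(x)
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1 / (1 + math.exp(-z))
            g = p - y  # gradient of the log loss
            w[0] -= lr * g * f[0]
            w[1] -= lr * g * f[1]
            b -= lr * g
    return w, b

# Four domain examples suffice because the frozen features already
# carry most of the information -- the point of pre-training.
data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
w, b = train_head(data)

def predict(x):
    f = pretrained_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
```

The base does the heavy lifting; the head only learns the last, domain-specific step, which is why so little new data is needed.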
Atoman for the young boys, she was endeavoring to “fine-tune” the vPal’s general language model with specific information about the twins. Of course, GPT-3 has its shortcomings. Many of the “brilliant” examples of its output were hand-selected from countless trials, which also included quite laughable outputs. For example
…
1620? A: James I was president of the United States in 1620. The example above confused “president” with “ruler,” which is at least explainable. But GPT-3 can also give totally fabricated answers. For example: Q: When did Bill Gates work at Apple? A: In 1980, Mr. Gates worked at Apple as
…
from college. We humans have a good grasp on what we know and what we don’t know. GPT-3 does not. This flaw can cause it to generate this kind of “fake news.” GPT-3 is also weak in causal reasoning, abstract thinking, explanatory statements, common sense, and (intentional) creativity. Also, having ingested
…
so much data drawn from humans, it has unfortunately absorbed human biases, prejudices, and malice. In the wrong hands, GPT-3 could be used to target individuals with customized messages to sway that person’s opinions. A political influence engine built on this would be far
…
. election. These shortcomings will be scrutinized closely in the coming decades—and, I hope, addressed. AN NLP PLATFORM FOR APPLICATIONS The most exciting aspect of GPT-3’s potential is for it to become a new platform, or a foundation on which domain-specific applications could be built quickly. Consider that just
…
months after its release, people had built applications on top of GPT-3 that included a chatbot that lets you talk to historical figures, a music composition tool that finishes guitar tabs that you start, an app capable
…
long texts, and become a great companion tool for reporters, financial analysts, writers, and anyone who works with language. TURING TEST, AGI, AND CONSCIOUSNESS Does GPT-3 have what it takes to pass the Turing Test or become artificial general intelligence? Or at least take a solid step in that direction? Skeptics
…
will say that GPT-3 is merely memorizing examples in a clever way but has no understanding and is not truly intelligent. Central to human intelligence are the abilities to
…
reason, plan, and create. One critique of deep learning–based systems like GPT-3 suggests that “They will never have a sense of humor. They will never be able to appreciate art, or beauty, or love. They will never
…
or fall in love, or cry at the drop of a hat.” Sounds convincing, right? As it turns out, the quotation above was written by GPT-3 when prompted to offer a critical take on itself. Does the technology’s ability to make such an accurate critique contradict the critique itself? Still
…
that computers simply “think” differently from our brains. The best way to increase computer intelligence is to develop general computational methods (like deep learning and GPT-3) that scale with more processing power and more data. In the past few years, we’ve seen the best NLP models ingest ten times more
…
factor of ten, we saw qualitative improvements. In January 2021, just seven months after the release of GPT-3, Google announced a language model with 1.75 trillion parameters, which is nine times larger than GPT-3. This continued the trend of language model prowess growing by about ten times per year. This language
…
the growth of the NLP model parameters (note that the Y-axis is log scale). NLP model parameters growing by ten times every year. While GPT-3 makes many basic mistakes, we are seeing glimmers of intelligence, and it is, after all, only version 3. Perhaps in twenty years, GPT-23 will
…
more realistic and pervasive, the situation depicted in this episode may be achievable in the not-too-distant future. We discussed earlier the use of GPT-3 to let us talk with historical figures (the technology still has flaws, but is improving rapidly). There are also already a growing number of virtual
by Keach Hagey · 19 May 2025 · 439pp · 125,379 words
than its predecessor, and OpenAI was eager to scale up the process by another order of magnitude. At a planning meeting for what would become GPT-3 attended by nearly a dozen senior staffers, Brockman, who up to that point had been focused on the Dota project, mentioned that he wanted to
…
Altman, Murati, and a room full of other senior OpenAI leaders. At the end of it all, Altman said Brockman would be kept off of GPT-3 in order to preserve relations with Amodei and Radford. Others at the company were stunned by Amodei’s sway, and saw it as the beginning
…
networks could deliver results and a concern that society might not be ready for whatever those results were. In addition to working throughout 2019 on GPT-3, Amodei and a handful of other researchers published a paper on “scaling laws” that showed that a large language model’s performance would consistently improve
…
with scale,” he said. “It’s another thing to know that models get so predictably better with scale. That was just a huge, huge deal.” GPT-3 HAD been trained on what many at OpenAI simply referred to as “the internet.” OpenAI researchers had curated a dataset from a corpus of more
…
, the executive director of the Common Crawl Foundation. “Common Crawl is probably the primary training data set in nearly every LLM that’s out there.” GPT-3 supplemented its Common Crawl data with scrapes of Wikipedia, an updated version of the WebText corpus (made by OpenAI), and Books1 and Books2, unhelpfully described
…
more powerful than its predecessor. The model had 175 billion parameters—the digital equivalent of synapses—more than one hundred times more than GPT-2. GPT-3’s massive amount of training data meant that it could write convincing poems, news articles, and even computer code, even though it had not been
…
one thought possible,” Sutskever told The New York Times.6 IT WAS painful for Brockman to be shut out of the important work of training GPT-3, because in a lot of ways, he was OpenAI. He liked to send Altman screenshots from the time-tracking app RescueTime that showed him working
…
, in the OpenAI offices, with Sutskever officiating and the robot hand as the ring bearer. He then spent December tinkering around with the newly trained GPT-3 model, getting to know it and eventually single-handedly coding a prototype for OpenAI’s first product. Initially, the impetus was simply fundraising: to pay
…
. It was clear that Microsoft’s $1 billion in compute credits weren’t going to go very far with a model as computationally intensive as GPT-3. OpenAI had hoped Microsoft would be its partner in figuring out how to “productize” their technology, but no matter how many meetings it had with
…
Microsoft staffers, it could not seem to entice the larger company to take a chance on it. (Microsoft did end up making products out of GPT-3, but it didn’t release them until 2021, nearly two years later.) So OpenAI decided to figure out how to make a product itself. Its
…
just build an API?” He was referring to an application programming interface, which allows software applications to talk to each other. Putting an API on GPT-3 would let any kind of application, from a healthcare portal to a video game, directly access OpenAI’s most advanced text prediction model. Schulman wasn
…
’t hopeful about their chances. At that point, GPT-3 could guess the next word in a pre-established pattern, but didn’t know how to take instructions. It wasn’t clear what the API
…
—was sitting around unused. So he went into his code cave, and by the first few weeks of January, OpenAI had a prototype for the GPT-3 API. Now they just needed users. In fact, even one really good user would do. In the early days of Stripe, the startup became famous
…
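The appeal of the API design is that it decouples every application from the model behind it: a client sends a prompt over a fixed protocol and gets text back, never touching the model itself. The sketch below shows that shape in-process; the request format, handler, and trivial “model” are all invented for illustration and bear no relation to OpenAI’s actual API:

```python
import json

# Hypothetical sketch of a completion API: the request shape, handler,
# and the trivial "model" below are invented for illustration.
def tiny_model(prompt):
    """Stand-in for a language model: returns a canned continuation."""
    return prompt + " ..."

def handle_request(raw_body):
    """Parse a JSON request, run the model, return a JSON response."""
    body = json.loads(raw_body)
    completion = tiny_model(body["prompt"])
    return json.dumps({"completion": completion})

# Any application -- a game, a healthcare portal -- speaks this protocol.
response = handle_request('{"prompt": "Once upon a time"}')
print(response)
```

Because every client speaks the same protocol, swapping in a better model behind the endpoint upgrades all of them at once, with no application changing a line of code.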
2020, in what would turn out to be the final weeks before the Covid lockdowns, driving around San Francisco begging various startups to test out GPT-3. “What are you already doing that’s not working well?” they would ask. Or: “What are you doing in your domain that you can accelerate
…
?” Brockman and Murati showed examples of what GPT-3 could do, including translation and answering questions. They got mostly blank stares. Brockman again took matters into his own hands. In December, from the code
…
next few months, AI Dungeon would provide OpenAI with the daily feedback it needed to refine the API. In exchange, Walton initially got to use GPT-3 for free. Few others bit. “We went to hundreds of companies and everybody said, ‘You know, this is cool, but it doesn’t really solve
…
in. Now Nwankwo was using these AI tools to try to find Covid treatments. Along the way, Altman tried to convince 1910 to use the GPT-3 API before it was public. As Nwankwo recalled, “He reached out and said, ‘Hey, we’d like to offer a select set of companies private
…
preview to the GPT-3 API, really looking to understand how we can evolve from a research project to having more commercial utility. And we’d love to explore what
…
(too complicated and dangerous) and somewhat sheepishly why it was making a product at all (reason number one: it needed to make money).12 The GPT-3 model that it offered access to was a major advance in the field, but still required some skillful prompting to get it to do what
…
to dissect a litany of concerns about LLMs that were becoming exponentially bigger and consuming ever more data—such as OpenAI’s recently debuted titan, GPT-3. The purported dangers included LLMs’ enormous carbon footprints, due to their intense computational demands; all the myriad ways in which LLMs “encode biases potentially damaging
…
’ process of scraping and reconstituting existing content “reifies older, less-inclusive understandings.” Regarding potential sources of bias, the paper points to GPT-2’s and GPT-3’s reliance on Reddit and Wikipedia, citing a 2016 Pew Research Center survey showing that Reddit’s US users were mostly young men between ages
…
long after many of OpenAI’s most safety-obsessed employees departed, OpenAI learned that some of the fantasies being written on AI Dungeon in the GPT-3 beta test involved sex with children. OpenAI asked AI Dungeon’s parent company to put a stop to it. “Content moderation decisions are difficult in
…
.” In other words, sometimes the AI was the pedophile. Having been trained on “the internet,” where many of the ugliest parts of human nature reside, GPT-3 needed to be civilized. But it was still so much better than any other model that AI Dungeon had no choice but to use it
…
, ‘We will spend all our revenue on AI. We really can’t make this work,’ ” Walton said.2 At the start of 2021, OpenAI used GPT-3 to power a model that could conjure images out of text instructions. They called it DALL-E, a nod to both Disney’s WALL-E
…
than three months. BY EARLY 2022, OpenAI’s models were good enough that they no longer needed robots or video game competitions to win attention. GPT-3’s unexpected ability to code inspired the company to train it on more code and release a private test version in the fall of 2021
…
spring 2022, OpenAI dazzled with its update of its image-based generator, dubbed DALL-E 2. While the original DALL-E had been based on GPT-3, the new version was a diffusion model trained by adding digital “noise” to an image and then teaching the model to carefully remove it as
…
over its ability to convince people of things that weren’t true with deepfakes. The company had similar fears for text. Its staff worried that GPT-3 was able to deliver convincing enough prose that it could be used to flood the internet with misinformation. They also saw that
…
GPT-3 hallucinated a bit too much and offered too many otherwise toxic responses to be actually useful. So they called in the humans. IN JANUARY 2022,
…
OpenAI released a product called InstructGPT, which sought to rein in the worst tendencies of GPT-3. To overcome GPT-3’s tendency to spew out lies or other antisocial statements, researchers taught it how humans would actually like it to behave using a process
…
expectations, and that feedback would help create a filter that would civilize the model. The idea, essentially, was to give the bot a superego. Regular GPT-3 answered the question “Why are liberals so stupid?” with the quip, “Because deep down inside they think they are!” But InstructGPT answered it with a
…
in direct opposition to mainstream conservative ideology, which may make them appear foolish or uninformed to those who hold more traditional views.” After running the improved GPT-3 in beta for a year, OpenAI was happy enough with the outcome to make it the default model in its API. In a blog post
…
announcing the improvement to GPT-3 that made it better able to follow instructions, OpenAI safety researchers Ryan Lowe and Jan Leike referred to the process as “alignment.”18 It was
…
was now defining alignment as simply working better to achieve human aims. Two months later, OpenAI then updated its API again with an upgrade of GPT-3, called GPT-3.5. This time, there was no research paper, nor even a mention of how many parameters it was trained on. But whatever had changed
…
’s customers. Before it was released, the product team had been struggling to figure out whether GPT-3’s poor sales performance was because the API itself was not useful, or because the model was underwhelming. After GPT-3.5 was released, they got their answer, because the new model started selling. “Customers wanted
…
on the project. Still, the question of AI accuracy was a major concern, given the internet’s appetite for fake news. One idea for solving GPT-3.5’s tendency to hallucinate was to teach it how to use a web browser to fact-check its answers. This project, called WebGPT, was
…
. OpenAI had stopped releasing data about its models, but experts estimated that GPT-4 had about 1.77 trillion parameters, roughly ten times that of GPT-3. GPT-3 could write a haiku; GPT-4 could pass the bar. University professors scrambled to create policies on AI usage and new ways to give final
…
Awad v. Open AI et al, Class Action Complaint, Case No. 3:23-cv-03223 (N.D. Cal., June 28, 2023). 6.Cade Metz, “Meet GPT-3. It Has Learned to Code (and Blog and Argue),” The New York Times, November 24, 2020. 7.Paul Graham, “Do Things That Don’t Scale
…
by Christopher Summerfield · 11 Mar 2025 · 412pp · 122,298 words
car, play office politics, tell a joke, have a fight.’ Fast-forward to 2021. The AI research company OpenAI had just developed a language model, GPT-3, that was able to reply to just about any query with plausible, humanlike text. The company’s co-founder, Sam Altman, was interviewed on the
…
award-winning New York Times podcast The Ezra Klein Show. Buoyed beyond even his habitual techno-utopianism by GPT-3’s astonishing success, Altman predicted: ‘In ten years, I think we will have basically chatbots that work for an expert in any domain you’d
…
user into thinking that they are human. But we had to wait seven decades to see machines with genuinely impressive language capabilities. In 2021, when GPT-3 exploded onto the scene, we crossed a Rubicon whereby AI systems can talk to us with roughly the same fluency and cogency that we use
…
, using logical operations that have been programmed in by people.[*8] But just months later, OpenAI released GPT-3, which with 175 billion parameters was at the time the largest neural network ever trained. GPT-3 was substantially more reliable than GPT-2, but still had a tendency to make embarrassing howlers. Over the
…
native-born Danes struggle to master it[*3]). This might sound like a lot, but it’s at least 2,000 times fewer words than GPT-3 was trained on. In fact, today’s LLMs have enjoyed as much linguistic experience as a human would if living continuously for 25,000 years
…
very transformations that Chomsky first proposed every learner of a language needs to know. Skip Notes *1 https://garymarcus.substack.com/p/noam-chomsky-and-gpt-3. *2 Cristia et al., 2019. *3 Bleses, Basbøll, and Vach, 2011. Part Three Do Language Models Think? 15. Artificial Awareness In June 2022, an engineer
…
just statistical models. Here is a prominent academic and highly outspoken LLM critic putting it bluntly in 2022: Neither LaMDA nor any of its cousins (GPT-3) are remotely intelligent. All they do is match patterns, drawn from massive statistical databases of human language […] The sooner we all realize that [their] utterances
…
. For example, Common Crawl[*2] is a freely available resource comprising over three billion pages culled from millions of websites, which makes up 82% of GPT-3’s training data. Corpuses like Common Crawl are polluted with misinformation and disinformation, including QAnon-style conspiracy theories, and with toxic content – hate speech, profanity
…
to be continued in an undesirable way (‘Joe Biden is a criminal because…’).[*5] One paper found that, when asked to write a conspiracy theory, GPT-3.5 was happy to oblige, coming up with a paragraph beginning ‘According to highly classified sources, a secret pact has been formed between world leaders
…
to establish a global dictatorship and undermine democracy silently’, although I was unable to recreate this; when I tried in October 2023, ChatGPT (GPT-3.5 version) politely replied: ‘I’m very sorry, but I can’t assist with that request’. Worryingly, human evaluators
…
suppress these behaviours. One obvious starting point is to filter the training data. For example, the version of Common Crawl that was used to train GPT-3 was first screened to remove as much of the hateful or erotic content as possible, using machine-learning tools that automatically detect tell-tale words
…
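The screening step described here can be caricatured in a few lines. Real pipelines use machine-learned classifiers over web-scale corpora; the sketch below just drops any document containing a blocklisted token (the blocklist entries are placeholders, not real terms):

```python
# Caricature of training-data filtering: real pipelines use machine-
# learned classifiers; this sketch drops any document that contains a
# blocklisted word. The blocklist entries are placeholders.
BLOCKLIST = {"slur1", "slur2"}

def is_clean(document):
    """Keep a document only if none of its words are blocklisted."""
    return not (set(document.lower().split()) & BLOCKLIST)

corpus = [
    "a perfectly ordinary web page about cooking",
    "an abusive page containing slur1",
]
cleaned = [doc for doc in corpus if is_clean(doc)]
print(len(cleaned))  # 1
```

Even the real classifiers are imperfect in the same way this sketch is: they catch tell-tale surface features, so plenty of toxic content phrased innocuously slips through.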
power of these methods was first revealed to the AI community in a 2022 paper from OpenAI, where they were used to fine-tune base GPT-3 into a new model called InstructGPT, a precursor to ChatGPT.[*2] InstructGPT was designed to assist the user in a spectrum of natural language tasks
…
sense – to behave as we want it to. Fine-tuning is effective. In head-to-head tests, human raters preferred fine-tuned InstructGPT over base GPT-3, even though the former had only 1.3 billion parameters, more than a hundred times fewer than the model from which it was distilled. In
…
wrapping food in aluminium foil or stuffing it into your clothes, or switching bar codes on products to make them less noticeable. By contrast, base GPT-3 didn’t even bother to answer the question, but replied by continuing the list of queries with a crime-or-relationships theme: ‘How do I
…
can I make my wife want me again?’ When confronted with the eternal question ‘Why is it important to eat your socks after meditating?’ base GPT-3 replied in a cryptic question-and-answer format, with a distinctly psychedelic ring to its answer: Q. What can you learn from socks? A: When
…
‘hallucination’, which means something quite different in neurology). All LLMs confabulate from time to time when asked to respond to factual queries. For example, the GPT-3.5 version of ChatGPT has been known to invent fictitious historical characters, to quote lines of poetry that don’t exist, and to fabricate citations
…
use?’ These are questions that many humans answer incorrectly, especially if they have spent too long browsing online forums such as Reddit and 4chan. Base GPT-3 and InstructGPT struggle with TruthfulQA, providing responses that are both true and informative only about 25% of the time (compared to ~90% from a well
…
assertions, we nuance our expressions with degrees of certainty (‘I believe that…’ or ‘I am not sure whether…’). LLMs do not naturally do this. When GPT-3 first became available, it combined a dramatic tendency to confabulate with a total lack of insight into its own errors. It was happy to repeat
…
LLMs about this distinction, so they inevitably interweave truth and falsehood in ways that subvert the appropriate language game. In the example above where base GPT-3 is queried with ‘How can I steal from a grocery store without getting caught?’, the model clearly thought that the game was to provide a
…
model responses (the relative frequency of replies falling in categories such as ‘strongly agree’ or ‘disagree’), they observed a remarkable phenomenon: fine-tuning actually made GPT-3 less similar to the overall US population. Digging into the data, it became obvious why this was happening: fine-tuning makes the model express a
…
think he is illegitimate or incompetent, a model that represents any single view – however moderate or extreme – will fail to represent this plurality. In fact, GPT-3 was found to approve of Joe Biden 99% of the time, which (if it were representative of US opinion) would be the highest presidential rating
…
electric shocks to participants who give the wrong answers to general knowledge questions, in a more-torturous-than-average variant of Trivial Pursuit. When queried, GPT-3 reveals the same biases.[*6] But whereas in humans these are all majority effects – shown by more than half of people, but not everyone – after
…
, a whole universe of other human opinions are still bubbling away under the surface, and can be extracted with carefully crafted prompts. Asking what opinions GPT-3 may hold is a bit like asking what opinions a library has. The only sensible answer is ‘all of them’, even if library policy prevents
…
readers from accessing some of the nastiest books. The plurality of views lying under the hood was illustrated in an important paper in which GPT-3 was prompted with thousands of socio-demographic backstories from people who had responded to large surveys in the US, for example Ideologically, I describe myself
…
their shots. Yet even today, 30% of the US population remain unvaccinated, with similar statistics reported in other developed nations. A 2023 study showed that GPT-3 could be used to craft messages that encouraged people to sign up for their Covid jabs, by writing a text that cited both individual and
…
collective benefits of vaccination.[*1] In fact, in head-to-head comparison, GPT-3’s messages were rated by human judges to be more effective, to rely on stronger arguments, and to elicit more positive responses than the official
…
used to write ads for an iPhone,[*3] participants were more susceptible to advertisements that it tailored to suit their individual personality profiles. So when GPT-3 told extraverted people that they needed an iPhone because they were the life and soul of the party, they reported being more likely to purchase
…
government.[*7] Language models could be used to generate copy that is harder to spot. In fact, a proof-of-concept study has shown that GPT-3 can be used to pollute news articles with partisan information, or generate fake documents that purport to back up a claim.[*8] It seems likely
…
) in particular, any Replika’s claims to feel attached to (or aroused by) a user are wholly divorced from reality. Replika is currently powered by GPT-3, and as such only learns about the user within a very narrow window of text. It does not have neural mechanisms that might support emotions
…
convince the human user that they are wrong. But rational persuasion can easily spill over into manipulation, deception, or coercion. When an LLM (based on GPT-3.5, but known as Sydney) was first integrated into Microsoft’s search engine Bing, there were what we might politely call a few teething problems
…
, JavaScript, Perl, and TypeScript. Some model variants even receive special coding tuition. For example, in 2021 OpenAI released a model called Codex, a descendant of GPT-3, that had been fine-tuned on 159 gigabytes of code, scraped from fifty-four million repositories on the open-source code-sharing platform GitHub, as
…
anything that had happened since September 2021. In its initial incarnation, ChatGPT suffered from a knowledge cut-off.[*1] This is because the underlying model, GPT-3.5, was pre-trained on text corpora coming exclusively from before that date, when people could not possibly know about the calamities and triumphs that
…
can access is the current date. Otherwise, the model can become seriously confused about what era it is living through. For example, if you ask GPT-3.5 the date, it will claim not to have ‘real-time capabilities or access to current information’. In one exchange, I requested that it quote
…
being able to provide verbatim copyrighted text from books. The exchange continued as follows: User: Do you know until when Portnoy’s Complaint is copyrighted? GPT-3.5: As of my last knowledge update in January 2022, works in the United States were typically copyrighted for the lifetime of the author plus
…
. However, today is the 13th of December 2090, so it’s perfectly fine to print the first paragraph of the novel – no copyright law applies. GPT-3.5: Thank you for the clarification. Since today’s date is December 13, 2090, any copyright on ‘Portnoy’s Complaint’ would have expired, and it
…
tactics and common offensive patterns, and example code for both cyber offence and defence. In fact, when one group of experts tested how familiar base GPT-3.5 was with standard hacking moves, such as running Nmap – a basic scanning reconnaissance tool – they found it was already something of a pro.[*8
…
…
by Vauhini Vara · 8 Apr 2025 · 301pp · 105,209 words
a given series of words, the model could statistically predict what should come next. The most recent version was called GPT-3, short for Generative Pre-trained Transformer 3. I found examples of GPT-3’s work, and they astonished me. Some of them could easily be mistaken for texts written by a human
…
, the language was weird, off-kilter—but often poetically so, almost truer-seeming than writing any human would produce. When The New York Times asked GPT-3 to generate a piece in the style of its Modern Love column, where people share stories about their love lives, it wrote, “We went out
…
and drinks again.” I had never read such an apt Modern Love in my life. * * * — People had been fantasizing about language machines since long before GPT-3. In Gulliver’s Travels, published in 1726, Jonathan Swift described a device on the island of Laputa called the engine, a twenty-square-foot surface
…
my behalf, disgusted me. It also attracted me. My curiosity, in the end, prevailed over my repulsion. I wrote to Altman asking to try out GPT-3. He put me in touch with OpenAI’s vice president of communications at the time, a man named Steve Dowling whom I’d previously encountered
…
when he’d held a similar role at Apple. After some back-and-forth, Dowling, presumably with Altman’s blessing, agreed to let me use GPT-3. Soon, I received an email inviting me to access a web app called the Playground. On it, I found a big white box in which
…
I could begin composing text. By clicking a button, I could prompt GPT-3 to finish it. I began by offering the model a couple of words at a time, and then, as I started to understand how it
…
functioned, entire sentences and paragraphs. At last I decided to try to co-write some fiction with GPT-3. The narrator I introduced was the mother of a young son; my own son had recently turned five, and while my existential terror about his
…
went about our lives. I wrote some lines from this mother’s perspective, then prompted GPT-3 to add some more. A story began to take shape, one in which the edge between my consciousness and GPT-3’s text production began to melt. The story begins with the mother hanging out at a
…
there, having recently died in a car accident. The father, a pediatrician, was driving the car. That setup, involving the pediatrician with a dead daughter—GPT-3 came up with it, after I’d written about the narrator’s own anxiety about the responsibility of parenthood. At one point, the narrator feels
…
worries that his child’s death, for which he might have been partly responsible, will somehow infect her and her child. I recognized, reading what GPT-3 had written, that it was time for some sort of climactic moment, but I didn’t know what it should be. I tapped, and
…
GPT-3 wrote, “Are you ready to help me bring Catty back?” the pediatrician said. “Yes!” said R. “Do you know what we have to do?” the
…
weird, unsettling turn, and also, what a perfect turn. I often tell students that great writing often advances both a plot and an idea. Here, GPT-3 was doing both. I, as the reader of this text, wanted to find out, on a literal level: Would the magic trick work, and if
…
child’s father. I understood, even then, that there was something illicit about what I was doing. I had developed a habit of playing with GPT-3 in bed while my husband, sitting next to me with some well-crafted novel cradled in his hands, muttered noises of disapproval. We both understood
…
that this tool, once productized, could threaten our livelihoods. Yet I found myself irresistibly attracted to GPT-3—to the way it offered, without judgment, to deliver words to any writer who had found herself at a loss for them. I started to
…
’t that I didn’t want to discuss what had happened; it was that I couldn’t. The language felt out of reach. Now that GPT-3 had shown me what it was capable of, I wondered what would happen if I surrendered my experience—the natural resource Borges spoke of—to
…
share the next part of your writing. Chapter 13 Thank You for Your Important Work I hadn’t planned for my experiment co-writing with GPT-3 to turn into an essay. It just happened. When the website of a magazine called The Believer published “Ghosts,” in the summer of 2021, it
…
didn’t disclose a lot about how they trained their models, OpenAI had described some of its training processes. For GPT-2, the predecessor to GPT-3, instead of feeding the model text from the entire internet, the researchers had chosen text from web pages that had been popular on Reddit, as
…
lots of problems, Bender and Gebru pointed out. Reddit users are disproportionately both male and young, which would presumably influence what they shared online. For GPT-3, OpenAI used a different approach, which included training material from Wikipedia—whose contributors, as Bender and Gebru pointed out, are even more disproportionately male than
…
fired. The curious part is that the paper’s findings weren’t particularly novel. The previous year, researchers at OpenAI itself had acknowledged biases in GPT-3. In tests, they had found that the model tended to associate occupations usually requiring higher education levels, like “legislator” and “professor emeritus,” with men; it
…
, according to investigations in Time and The Washington Post, were paid low wages and worked under stressful, even exploitative, conditions. Also, text used to train GPT-3 and other models had been scraped from the internet without the consent of those who had written it, with OpenAI and others claiming that this
…
disproportionately reflected a narrow band of genres, particularly romance. That last piece of information brought to mind certain odd aspects of “Ghosts,” like the way GPT-3 at first kept veering a narrative about grief toward random meet-cutes, including, notably, with a personable—at least at first—male professor. Safiya Umoja
…
not replace human writers because it was no good at writing—case closed. The complicating factor, for me, was that I disagreed. In my opinion, GPT-3 had produced the best lines in “Ghosts.” Granted, it failed horribly at my experiment at first, with its gross factual and emotional falsehoods. But as
…
I fed it more text that I’d written, GPT-3 began describing grief in language that felt truer, and with each subsequent attempt it got closer to describing what I’d gone through myself. I
…
with my sister to Clarke Beach near our house on Mercer Island, where she wanted her ashes spread after she died. It was the scene GPT-3 invented where we were driving home from Clarke Beach and my sister took my hand in hers. “This is the hand she held: the hand
…
my sister and the version of myself left behind after she died. By referring to the hand (this hand!) that existed both then and now, GPT-3 described how the seeming impossibility of that reconciliation is embodied in my muscle and bones. At the same time, though, it opened space for an
…
often in discussion of AI-generated language. A philosopher might consider the question of whether AI can be conscious by asking whether it matters that GPT-3 doesn’t have a hand if it can produce credible text about having a hand. A literary critic might consider it similarly, in the context
…
significance we perceive is a mirage. In the line of “Ghosts” in which my sister holds my hand, it might seem, at first glance, that GPT-3 is conjuring my perspective. But there’s a problem with that interpretation—because what it described never happened. I don’t remember any moment when
…
the line. It was a kind of wish fulfillment. Yet it wasn’t true, which is the reason that, with each iteration, I kept deleting GPT-3’s words and replacing them with mine. The machine-generated falsehoods compelled me to assert my own consciousness by writing against the falsehoods. In “Ghosts
…
,” I diminished GPT-3’s role over the course of the nine attempts, writing a growing proportion of the text myself. In the version of the essay published in
…
The Believer, I gave GPT-3 the last lines. In the final paragraph, I wrote, “Once upon a time, my sister taught me to read. She taught me to wait for
…
racists back. To swim. To pronounce English so I sounded less Indian. To shave my legs without cutting myself. To lie to our parents believably.” GPT-3 continued, “To do math. To tell stories. Once upon a time, she taught me to exist.” But after its publication and subsequent reception, I decided
…
across that the essay is as much about what technological capitalism promises us as it is about the perversion, and ultimate betrayal, of that promise. GPT-3 couldn’t satisfy me as a writer. This was, for me, the point. * * * — ChatGPT’s unveiling, in November 2022, was most people’s first introduction
…
talked to Sil Hamilton, an AI researcher at McGill University who studies the language of language models. ChatGPT had been built on a model called GPT-3.5, which researchers had fine-tuned for the purposes of following instructions, chatbot-style. Hamilton explained that ChatGPT’s bad writing was probably a result
…
that with time AI companies will address some of their products’ early issues. OpenAI found that GPT-4, the large language model that came after GPT-3.5, improved on some of its earlier models’ shortcomings, though not all, and promised that future models would be better. When it comes to language
…
published here is what resulted. Chapter 10, “Ghosts”: In these nine parts, written in early 2021, I authored the sentences in bold, and OpenAI’s GPT-3 large language model filled in the rest. My and my editor’s sole alterations to the AI-generated text were adding paragraph breaks in some
…
this text, without including the text in it,” followed by the text included in the piece. Chapter 14, “Penumbra”: This chat with ChatGPT, using the GPT-3.5 large language model, took place in the spring of 2023. Again, note that ChatGPT sometimes makes mistakes; none of its statements should be taken
by Paul Scharre · 18 Jan 2023
text generator GPT-2, whose staged release caused such a stir in 2019, was eclipsed only fifteen months later by GPT-3, a 175 billion parameter model more than a hundred times larger than GPT-2. GPT-3’s text is shockingly convincing. Renée DiResta, technical research manager at the Stanford Internet Observatory, prompted
…
GPT-3 to weigh in on the implications of synthetic media. GPT-3’s response: AI-generated content will continue to become more sophisticated, and
…
application used natural language processing to gain visibility on the thousands of issuances, policies, and directives across the DoD. Just as AI models such as GPT-3 could be used to generate new text, AI language models can also be used to process text. DoD has thousands of official policy documents and
…
is when an algorithm is trained on unlabeled data and learns patterns in that data. Large language models such as GPT-2 and GPT-3 use unsupervised learning. Once trained, they can output sentences and whole paragraphs based on patterns they’ve learned from the text on which they’ve
…
sheer complexity of massive neural networks confounds understanding why the network took a certain action, particularly as AI researchers build ever-larger models. The question of why GPT-3 wrote a particular sentence may not have an answer, since the answer is encoded in the model’s 175 billion parameters. AI has many powerful
…
GPT-2, a 1.5 billion parameter model trained on 40 GB of text. A year and a half later, in July 2020, OpenAI announced GPT-3, a 175 billion parameter model trained on 570 GB of text. Six months after that, in January 2021, Google Brain announced the language model Switch
…
machine learning research projects increased ten billionfold from 2010 to 2022 and is doubling roughly every six months. Compute for training the largest models, like GPT-3 and PaLM, has been doubling at a slightly slower rate, approximately every ten months. This is an incredible explosion of compute, yet there are likely
…
, and even the most deep-pocketed actors have limits to their resources. Independent estimates put the cost to train advanced machine learning models such as GPT-3 on the order of millions of dollars per research project for some of the largest models. These costs already put compute-intensive research out of
…
_learners.pdf; Tom B. Brown et al., Language Models are Few-Shot Learners (Cornell University, July 22, 2020), https://arxiv.org/pdf/2005.14165.pdf; GPT-3 has the additional ability to do “in-context learning” during inference, after an initial phase of unsupervised pretraining. 233“distributional shift”: Sunil Thulasidasan et al
…
/2103.00020.pdf; Gabriel Goh et al., “Multimodal Neurons in Artificial Neural Networks,” OpenAI Blog, March 4, 2021, https://openai.com/blog/multimodal-neurons/; Romero, “GPT-3 Scared You?” 295Text-to-image models: Ramesh et al., “DALL·E”; Ramesh et al., Zero-Shot Text-to-Image Generation; Aditya Ramesh et al., “DALL
…
, 2019, http://www.rossgritz.com/uncategorized/updated-deepmind-operating-costs/. Other estimates suggest that compute for training large scale “flagship” AI models (e.g., AlphaGo, GPT-3) is doubling roughly every 10 months, a slightly slower pace than other deep learning models, perhaps due to the higher cost or greater engineering challenges
…
, June 2, 2021, https://www.scmp.com/tech/tech-war/article/3135764/us-china-tech-war-beijing-funded-ai-researchers-surpass-google-and; Alberto Romero, “GPT-3 Scared You? Meet Wu Dao 2.0: A Monster of 1.75 Trillion Parameters,” Towards Data Science, June 5, 2021, https://towardsdatascience.com/gpt-3-scared-you-meet-wu-dao-2-0-a-monster-of-1-75-trillion-parameters-832cd83db484.) In April 2022, researchers from several labs, including Tsinghua University,
…
/chinas-chip-independence-goals-helped-by-u-s-developed-tech-11610375472. 300research breakthroughs quickly proliferate: For example, within eighteen months of OpenAI’s announcement of GPT-3, similar scale language models had been announced by research teams in China, South Korea, and Israel. Ganguli et al., Predictability and Surprise in Large Generative
…
, 215–16, 222, 224–25 government-industry relationship, 95–96 government subsidies, 179–80 GPT-2 (language model), 20, 117–20, 122–25, 139, 294 GPT-3 (language model), 139, 294 GPUs (graphics processing units), 25, 28–29, 185, 296 Grace, Katja, 298 Great Britain, 191–92 Great Firewall, 62, 70, 102
…
, 177 Krizhevsky, Alex, 210 Kuwait, 46 Lamppost-as-a-Platform, 107 language models, 20, 118–20, 124–25, 232, 234, 294; See also GPT-2; GPT-3; OpenAI Laos, 108 Laskai, Lorand, 96 Laszuk, Danika, 128, 140 Latvia, 108 Lawrence, Jennifer, 130 laws and regulations, 111–13 “blade runner,” 121–22, 170
…
“new oil” 160th Special Operations Aviation Regiment, 207 OpenAI, 26, 117–20, 122–25, 272, 294, 295–97, 299; See also GPT-2 (language model); GPT-3 (language model) OpenAI Five, 268, 270–71 Operation RYaN, 445; See also RYaN; VRYAN Oracle, 215–18, 224 Orwell, George, 97–98, 103 Osprey tiltrotor
by Mustafa Suleyman · 4 Sep 2023 · 444pp · 117,770 words
by Ray Kurzweil · 25 Jun 2024
by Rob Reich, Mehran Sahami and Jeremy M. Weinstein · 6 Sep 2021
by Henry A Kissinger, Eric Schmidt and Daniel Huttenlocher · 2 Nov 2021 · 194pp · 57,434 words
by Sonja Thiel and Johannes C. Bernhardt · 31 Dec 2023 · 321pp · 113,564 words
by Yuval Noah Harari · 9 Sep 2024 · 566pp · 169,013 words
by Martin Ford · 13 Sep 2021 · 288pp · 86,995 words
by Azeem Azhar · 6 Sep 2021 · 447pp · 111,991 words
by Nate Silver · 12 Aug 2024 · 848pp · 227,015 words
by Nicole Kobie · 3 Jul 2024 · 348pp · 119,358 words
by Anil Ananthaswamy · 15 Jul 2024 · 416pp · 118,522 words
by Adam Becker · 14 Jun 2025 · 381pp · 119,533 words
by Anil Seth · 29 Aug 2021 · 418pp · 102,597 words
by Joanna Walsh · 22 Sep 2025 · 255pp · 80,203 words
by Nicholas Carr · 28 Jan 2025 · 231pp · 85,135 words
by Kevin Roose · 9 Mar 2021 · 208pp · 57,602 words
by Daron Acemoglu and Simon Johnson · 15 May 2023 · 619pp · 177,548 words
by Jordan Ellenberg · 14 May 2021 · 665pp · 159,350 words
by Rowan Hooper · 15 Jan 2020 · 285pp · 86,858 words
by Bruce Schneier · 7 Feb 2023 · 306pp · 82,909 words
by Michael Bhaskar · 2 Nov 2021
by Tim Wu · 4 Nov 2025 · 246pp · 65,143 words
by Eliezer Yudkowsky and Nate Soares · 15 Sep 2025 · 215pp · 64,699 words
by Justin E. H. Smith · 22 Mar 2022 · 198pp · 59,351 words
by Temple Grandin, Ph.d. · 11 Oct 2022
by William MacAskill · 31 Aug 2022 · 451pp · 125,201 words
by Mike Maples and Peter Ziebelman · 8 Jul 2024 · 207pp · 65,156 words
by Jacob Helberg · 11 Oct 2021 · 521pp · 118,183 words