description: the fourth iteration of the Generative Pre-trained Transformer, developed by OpenAI
generative artificial intelligence
25 results
by Tim Berners-Lee · 8 Sep 2025 · 347pp · 100,038 words
of Pete Wells’s review, the attribution is 100 per cent – but that is rare. Let’s take a more challenging case. Here, I asked GPT-4 to write me an ode to the dishwasher, in the style of the great English poet Philip Larkin: ODE TO THE DISHWASHER In this quiet
…
, simple things, The cups we fill and drain, day after day. Not bad. Not bad at all. Astonishing, really, for a computer, and it took GPT-4 about three seconds to produce. But how much should Larkin’s estate get paid for this? Larkin never actually wrote an ode to the dishwasher
…
’ ref1, ref2, ref3, ref4 D.G. Nash ref1, ref2 digital commons ref1 digital divide ref1 Digital Equipment Corporation (DEC) ref1 digital signatures ref1 dishwasher poem (GPT-4) ref1 disinformation ref1, ref2 Ditchley Foundation ref1, ref2, ref3, ref4 DNS (Domain Name System) ref1, ref2, ref3 documentation systems ref1, ref2 documents compared to data
by Stephen Witt · 8 Apr 2025 · 260pp · 82,629 words
than a million people signed up to test it. By January 2023, ChatGPT had one hundred million active monthly users. In March 2023, OpenAI unveiled GPT-4 through its online portal. Looking to quantify its creation’s intelligence, OpenAI subjected the model to a battery of academic tests
…
. GPT-4 passed the bar exam; it scored 5s on the Art History, US History, US Government, Biology, and Statistics AP exams; it scored in the
…
could not only perfectly describe images but also recognize complex visual jokes. In one, the researchers fed GPT-4 an image of a clunky computer cable from the 1990s connected to an iPhone, then asked GPT-4 to explain what it was looking at. “The humor in this image comes from the absurdity of
…
plugging a large, outdated VGA connector into a small, modern smartphone charging port,” the model responded. Later, a social media user showed how GPT-4 could create a website from a sketch on a napkin. Around this time, I began to fear for my job. I once asked ChatGPT to
…
think it’s almost a spiritual experience. You go, ‘Oh my God, this computer seems to understand.’ ” * * * • • • OpenAI spent more than $100 million to train GPT-4, with much of the money making its way to Nvidia through Microsoft. Although GPT-3 was essentially a single giant neural network
…
a “mixture of experts” model, featuring many neural networks assigned to different tasks. One “expert” might focus on safety, blocking users from asking GPT-4 how to make bombs or dispose of corpses; another might focus on writing computer code; a third would concentrate on emotional valence. (OpenAI declined to
…
comment on GPT-4’s construction.) The “inference” process of extracting knowledge from GPT-4 could easily exceed half of the initial training costs and had to be provided to customers on an ongoing basis. Estimates
…
varied, but one informed analysis put the cost of inference at roughly a quarter of a cent per word. At that rate, it would cost GPT-4 about $10 to write a five-thousand-word college term paper—a bargain when compared to hiring an unemployed graduate student to do it and
…
certainly a better solution than doing the work yourself. To defray the inference costs, OpenAI began charging $20 per month to access GPT-4. By March 2023, the product was approaching two million subscribers. The synthesis of the transformer architecture with hyperscale parallel computing resulted in a Cambrian explosion
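The passage's per-word estimate makes the arithmetic easy to check. A minimal sketch, using only the figures quoted above (the rate is the book's cited estimate, not an official OpenAI price):

```python
# Back-of-the-envelope inference cost from the estimate quoted above:
# "roughly a quarter of a cent per word". This is the book's cited
# figure, not official OpenAI pricing.
COST_PER_WORD_USD = 0.0025

def inference_cost(words: int, rate: float = COST_PER_WORD_USD) -> float:
    """Estimated cost in USD to generate `words` words of output."""
    return words * rate

# The five-thousand-word term paper from the passage:
print(f"${inference_cost(5_000):.2f}")  # → $12.50, on the order of the quoted $10
```

The exact product is $12.50; the book rounds to "about $10", which is consistent as an order-of-magnitude estimate.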
…
Nvidia, and it’s why I wrote this book. From the moment ChatGPT went public, I assumed that my career was coming to an end. GPT-4 was only a passable writer; it had scored in the 22nd percentile of the AP English Literature and Composition exam. (I am confident I would
…
to be heard, I asked what problem Eos was working on. Hamilton responded that it was training an internal Nvidia model in the style of GPT-4. In other words, I was surrounded by a language model: one that was catching up to me, one that I sensed would one day soon
by Vauhini Vara · 8 Apr 2025 · 301pp · 105,209 words
with women included “beautiful” and “gorgeous”—as well as “naughty,” “tight,” “pregnant,” and “sucked.” Among the ones for men: “personable,” “large,” “fantastic,” and “stable.” With GPT-4, OpenAI’s next large language model, OpenAI’s researchers would again find persistent stereotypes and biases. After the publication of “Ghosts,” I learned of other
…
despite its frightening unimpressiveness. It can reasonably be expected that with time AI companies will address some of their products’ early issues. OpenAI found that GPT-4, the large language model that came after GPT-3.5, improved on some of its earlier models’ shortcomings, though not all, and promised that future
…
transcripts throughout this book are taken verbatim from a single conversation about this manuscript with ChatGPT in June 2024, in which I toggled between the GPT-4 and the GPT-4o large language models, the most recent ones available at that time. The transcripts have not been edited. After chatting with ChatGPT
by Parmy Olson · 284pp · 96,087 words
OpenAI’s most important sources for AI training, with its text making up somewhere between 10 and 30 percent of the data used to teach GPT-4, according to a person close to the online forum. The more text OpenAI used to train its language model and the more powerful its computers
…
swimming pool. In a way, OpenAI was doing the world a favor and getting it ready for OpenAI’s more powerful, upcoming model, GPT-4. In internal tests, GPT-4 could write decent poetry and its jokes were so good that they’d made OpenAI managers laugh, an OpenAI executive at the time says
…
mainstream popularity before. On March 14, 2023, the very same day that Anthropic had finally released its own chatbot called Claude, OpenAI launched its upgrade, GPT-4. Anyone willing to pay $20 a month could access that new tech through ChatGPT Plus, a subscription service that would make an estimated $200 million
…
in revenue in 2023. Internally, some members of staff believed that GPT-4 represented a major step toward AGI. Machines weren’t just learning statistical correlations in text, Sutskever said in one interview. “This text is actually a
…
this made sense. Google had done everything early. Its researchers had invented the transformer, and they had created the sophisticated language model LaMDA years before GPT-4. Its own AI lab, DeepMind, had set off on a mission to build AGI five years before OpenAI had even been founded to do the
…
same with the EU. He threatened to leave the region. He had “many concerns” about the EU’s plans to include large language models like GPT-4 in its new law. “The details really matter,” he told reporters in London who asked him about the regulations. “We will try to comply, but
…
woven into all parts of life. Social media companies had for years refused to disclose how their algorithms worked. Now creators of AI models like GPT-4, DALL-E, and Google’s Gemini were doing the same. How were the models trained? How were people using them? Who were the workers helping
…
universe from swallowing them up and driving AI’s agenda. Mustafa Suleyman eventually left Google to start Inflection, a chatbot firm that tried to rival GPT-4. He made it a public benefit corporation, raised more than $1.5 billion, and amassed a powerful cluster of AI chips, making Inflection one of
…
Google Brain Google Brain Women and Allies group Google Effect Google Maps Google Translate Google X GPT-1 GPT-2 GPT-3 GPT-3.5 GPT-4 GPT-5 Graham, Paul Grand Theft Auto Greylock Partners Gulati, Sheila Hassabis, Angela Hassabis, Costas Hassabis, Demis AlphaGo and Altman and Bullfrog Productions and ChatGPT
…
Codex competition with DeepMind and computing power and DALL-E 2 effective altruism and funding and GPT-1 GPT-2 GPT-3 GPT-3.5 GPT-4 GPT-5 GPT Store and hallucination in ChatGPT and ideas behind internal concerns about ChatGPT large language models LessWrong community and Microsoft and Musk and
by Keach Hagey · 19 May 2025 · 439pp · 125,379 words
than three months, the fastest-growing app in the world to date.4 When OpenAI, only a few months later, unveiled a more formidable successor, GPT-4—it could pass the bar exam and ace the AP biology test—the dizzying rate of progress suggested that the company’s audacious mission to
…
scrubbed clean and the sun had come out. Every billboard along Highway 101 hawked some kind of AI. The evening news led with tales of GPT-4’s passing the LSAT. The Mission was full of tasteful sprigs of wildflowers, coffee with notes of blueberry, and homeless people. OpenAI’s office is
…
we address the AGI safety problem in particular. Which I think we have exciting ideas for.” We were talking two days after the release of GPT-4. Altman was leading an organization of such ostensibly epochal significance that mere products and profits were far from his mind. He seemed to revel in
…
into the future. When he talks publicly about OpenAI’s technology, he often disses its current products—he had recently told a prominent podcaster that GPT-4, the company’s most advanced product, “kind of sucks”—and invites the audience to focus on what the company’s current rate of improvement implies
…
a large language model, which up until then had been the purview of academic research. And it shows how, with the release of ChatGPT and GPT-4, Altman used his YC-honed mastery of telling startup stories to tell one of the greatest startup stories of all time. ALTMAN IS wary of
…
used dialog as an alignment tool, teaching the model as one would a student. THE COMPANY had also been working on its next foundation model, GPT-4, and figured the same method of alignment would work again. “We thought of it as a way to advance safety for
…
GPT-4,” the team member said. By the summer of 2022, OpenAI was ready to present GPT-4 to its nonprofit board. By 2022, the board had grown to nine people, with the addition of a
…
board, followed by more formal interviews with every other board member, and by 2021 she replaced Karnofsky. THE BOARD that saw the first demonstration of GPT-4 was astonished by its capabilities. A year earlier, Altman and Brockman had traveled to Seattle to visit Bill Gates and ask him what it would
…
emeritus New York University professor and author who was one of OpenAI’s most prominent critics. It killed. The board began preparing how to launch GPT-4. At the time, it seemed like it would go out into the wild the same way OpenAI’s previous models had, requiring users to prompt
…
it with examples of the kinds of patterns—questions and answers, or code—that they wanted to see. As OpenAI was making progress on GPT-4, Murati, who was named CTO in May 2022, and senior research leaders were experimenting with Schulman’s chat interface as a tool to make sure
…
the OpenAI team returned to the office, realizing that the safety tool was more compelling than they had thought. When GPT-4 finished its training run in August, they made plans to release GPT-4 with the chat interface the following January. But as more people played with it, those plans began to change
…
I was using it myself. And I was like, we should separate these two things.” Altman reasoned that the shock of the two advances simultaneously—GPT-4’s collegiate smarts and the curiously lifelike chat interface—would be too much for the world to handle. “I thought that doing
…
GPT-4 plus the chat interface at the same time—and I really stand by this decision in retrospect—was going to just be a massive update
…
, that rival Anthropic had already built its own chatbot, named Claude, and was just waiting for enough safety testing to feel confident about releasing it. GPT-4 was scheduled to be released shortly after New Year’s Day, so it would have been ideal to release the chat interface a bit ahead
…
Google. “At the end of the day,” he said, “I want people to know that we made them dance.”28 IN MARCH, OpenAI finally released GPT-4 after some delays for safety testing. Now anyone in the world could experience what “adding a zero” meant. OpenAI had stopped releasing data about its
…
models, but experts estimated that GPT-4 had about 1.77 trillion parameters, roughly ten times that of GPT-3. GPT-3 could write a haiku; GPT-4 could pass the bar. University professors scrambled to create policies on AI usage and new ways
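The "roughly ten times" claim can be sanity-checked against GPT-3's published size of 175 billion parameters; the 1.77-trillion figure for GPT-4 is, as the passage notes, an outside estimate rather than a disclosed number:

```python
# Sanity check on "roughly ten times that of GPT-3".
gpt3_params = 175e9        # GPT-3's published parameter count
gpt4_params_est = 1.77e12  # the outside estimate quoted in the passage

print(f"{gpt4_params_est / gpt3_params:.1f}x")  # → 10.1x
```

The ratio comes out to about 10.1, matching the "roughly ten times" in the text.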
…
thousand signatories to an open letter by the Future of Life Institute calling for a six-month pause on developing AI models more powerful than GPT-4.29 The government took notice. Altman was summoned to the White House, along with Nadella, Pichai, Amodei, and Clark, for a two-hour meeting on
…
was supposed to have oversight over him. These kinds of concerns became much more urgent for several members of the board after they saw the GPT-4 demo in the summer of 2022 and realized how rapidly their decisions were becoming potentially grave ones. “For the OpenAI board to function and do
…
for the board to take seriously the way that the stakes of the company are ramping up over time,” Toner said. “Things like ChatGPT and GPT-4 were meaningful shifts towards the board realizing that the stakes are getting higher here. It’s not like we are all going to die tomorrow
…
role that seriously. During an OpenAI board meeting in the winter of 2022, as the board weighed how to release three somewhat controversial enhancements of GPT-4, Altman claimed all three had gotten DSB approval. Toner was skeptical, asked for documentation, and found that only one of them, relating to the API
…
—rather than from Altman, despite having just completed a six-hour board meeting. In late 2022, Microsoft had rolled out a version of still-unreleased GPT-4 in a test in India without getting DSB approval first. While it ultimately got it, the breach in India suggested to some board members that
…
Sutskever’s team, pinging Sutskever daily throughout on his research related to reasoning via reinforcement learning in context. Then Pachocki got pulled into working on GPT-4 until 2022. Upon Pachocki’s return to reasoning work, the decision was made to merge Sutskever’s and Pachocki’s teams. Sutskever felt that the
…
from Murati’s Slack channel that Sutskever had compiled. In one of them, Altman tells Murati that the company’s legal department had said that GPT-4 Turbo didn’t need to go through DSB review. When Murati checks with Jason Kwan, the company’s top lawyer, he is confused, and says
…
he can’t imagine how Altman would have gotten that impression; of course GPT-4 Turbo had to go through DSB. The document about Brockman was largely focused on his alleged bullying. Brockman drove out Apple veteran Steve Dowling, the
…
make use of various content-licensing deals it had been forging with companies including News Corp and Axel Springer. Instead, it released an update to GPT-4 called GPT-4o, which was faster than its predecessor and, as Altman put it, “natively multimodal,” able to switch between text, images, and audio
…
. GPT-4’s voice capability, which had been released the previous fall but was too slow and clunky to be very useful, could now conduct the kind
…
played by Scarlett Johansson. Murati starred in the video demo that OpenAI released, using her fluent Italian to demonstrate the real-time translation skills of GPT-4, represented by a blinking circle on a smartphone screen accompanied by a warm, husky, and slightly flirty voice. To drive home the point that OpenAI
…
that few people agree with you on? Absolute equivalence of brahman and atman,” X, December 26, 2022. 16.Lex Fridman, “Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI,” Lex Fridman Podcast, March 25, 2023. CHAPTER 1CHICAGO 1.Tim Frakes, “Harold Washington Inauguration April 29 1983,” YouTube, 9
…
, 291–94, 305, 308–9, 310 in his “code cave,” 175–76, 246–47 as an OpenAI founder, 172–78, 183–88 role in developing GPT-4, 189–96, 209, 215–23, 242–51, 267–68 Brockman, Ron, 173 Brooklyn’s Park Slope, 29–30 Brooks, David, 132 Brown, Jerry, 205, 206
…
into data practices at, 285 first commercial product OpenAI API, 250–51 going from “AI training” to “AI alignment,” 264–65 GPT-5, 282, 307 GPT-4, 3, 7–8, 12, 17, 266–69, 272–73, 278–80, 284, 287, 307 GPT-1 and GPT-2, 241–43, 244, 247, 252, 253
by Karen Hao · 19 May 2025 · 660pp · 179,531 words
the eye. Under the hood, generative AI models are monstrosities, built from consuming previously unfathomable amounts of data, labor, computing power, and natural resources. GPT-4, the successor to the first ChatGPT, is, by one measure, reportedly over fifteen thousand times larger than its first generation, GPT-1, released five years
…
ambiguous or exaggerated marketing. Altman has publicly tweeted that “ChatGPT is incredibly limited,” especially in the case of “truthfulness,” but OpenAI’s website promotes GPT-4’s ability to pass the bar exam and the LSAT. Microsoft’s Nadella has similarly called Bing’s AI chat “search, just better”—a tool
…
hardest: scaling up GPT-3 by 10x with Microsoft’s new eighteen thousand Nvidia A100 supercomputer cluster, in its effort to develop what would become GPT-4. One-third of the GPT-3 scaling team had left with The Divorce, taking with them significant technical and institutional knowledge. More existentially, OpenAI
…
, with McGrew, healing the ruptures Brockman caused in various parts of the company. With roadblocks that needed to be punched through in the way of GPT-4’s development, the stars aligned. * * * — To solve OpenAI’s data bottleneck, Brockman turned to a new source: YouTube. OpenAI had previously avoided this option
…
conducting reinforcement learning from human feedback. With each week, the results looked better and better, until the performance truly began to wow people internally. GPT-4 now had built-in multimodal capabilities and, against OpenAI’s internal assessments, was generating more polished code than ever and was more nimble in recognizing
…
’t the least bit impressed. It was once again the ever-hard-to-please Bill Gates. * * * — In June 2022, after getting a demo of GPT-4, Gates expressed disappointment in the insufficient progress from GPT-2. Despite the model being significantly larger and more fluent, he still felt like it was
…
things require a miracle,” he said. “We just had our miracle.” Many employees believed it, awestruck by the momentousness of what they had accomplished. GPT-4’s new level of performance convinced OpenAI leadership that it was time to start working toward one of Altman’s long-coveted ambitions: an AI
…
of flow, people from the Applied and Research divisions were working more tightly together than ever before to launch a new product. As OpenAI demoed GPT-4 to Microsoft, Satya Nadella, Kevin Scott, and the tech giant’s other executives were just as excited. Codex had proven that OpenAI’s technologies
…
could have commercial appeal, but GPT-4 represented something far bigger. Across the board, it beat the performance of various AI models that Microsoft had developed in-house; it could also do
…
weights for integrating into its products. The companies would reveal the amount in January 2023: $10 billion. * * * — At first, OpenAI executives wanted to release GPT-4 in the fall of 2022. The deadline was a case of fantastical thinking. Nearing the end of summer, the company was nowhere near ready to
…
Safety clan. DSB created a formal governance structure for resolving the age-old debates between Applied and Safety. After a preliminary review, the DSB gave GPT-4, the first model being evaluated under this structure, a conditional approval: The model could be released once it had been significantly tested and tuned
…
of uncertainty about how trust and safety for an AI company should differ from a search or social media company and whether their preparations for GPT-4 were adequate. Trust and safety was typically focused on preventing a predictable slate of internet abuses, like fraud, cybercrime, and election interference. But wrapped
…
use of its technologies. If it switched to reactive enforcement, it would need to build up significant tooling to do so. With the launch of GPT-4 pending, executives overrode the objections: OpenAI was getting rid of developer review; the trust and safety team simply needed to figure out the alternative.
…
put together a proposal. It would shift more of its enforcement of the company’s policies upstream, by leaning more heavily on RLHF to align GPT-4 and future models. Everything else would be caught and handled downstream with reactive enforcement: using different data signals, such as information about what the
…
what we always say and we’re always concerned about,” he said. “But what I haven’t ever seen is, is it actually happening?” * * * — GPT-4 wasn’t just a turning point for Gates and Microsoft. Later that summer, after wowing the billionaire philanthropist, Altman and Brockman brought it to OpenAI
…
,” Helen Toner would tell The TED AI Show podcast. “Not just like, you know, helping the CEO to raise more money.” Among many employees, GPT-4 solidified the belief that AGI was possible. Researchers who were once skeptical felt increasingly bullish about reaching such a technical pinnacle—even while OpenAI continued
…
to lack a definition of what exactly it was. Engineers and product managers joining Applied and having their first close-up interaction with AI through GPT-4 adopted even more deterministic language. For many employees, the question became not if AGI would happen but when. Some employees also felt exactly the
…
it wouldn’t be monetized but “get the data flywheel going”—in other words, amass more data from people using it—which would help improve GPT-4 and the Superassistant product. Outside of the Superassistant team, everyone took the executives literally. A low-key research preview didn’t require their attention;
…
test out the model’s capabilities. Did adding a chat interface really make a difference? People in the Safety clan, occupied with testing and tuning GPT-4, agreed. For the first time, a model release flew through the checks with little resistance. Even within the Superassistant team, no one truly fathomed
…
experience a rapid proliferation of over one hundred new generative AI projects within just a few months as employees experimented with various ways of using GPT-4 and ChatGPT. In an ironic twist, the aggressive adoption would force Microsoft to grapple with many of the same challenges that other companies would
…
the giant, OpenAI had reworked its road map to prioritize delivering the model over its own more strategically aligned projects, including an effort to apply GPT-4 to a search engine product. Instead, the failed effort left some senior Microsoft executives disappointed. There was also a new awkward reality: OpenAI and
…
from which Altman had recused himself. Altman and other executives never brought up the data centers’ environmental toll in company-wide meetings. As OpenAI trained GPT-4 in Iowa, the state was two years into a drought. The Associated Press later reported that during a single month of the model’s training
…
number led her to believe they had picked it to be slightly higher than the amount of compute that OpenAI had reportedly used to train GPT-4. But Hooker and many other researchers, including Deborah Raji, disagree with the compute-threshold approach for regulating models. While scale can lead to more
…
the impact of their deployments accelerated, he believed the company needed to raise, not lower, its guard against their potential to produce devastating consequences. After GPT-4, Sutskever, who had previously dedicated most of his time to advancing model capabilities, had made a hard pivot toward focusing on AI safety. He
…
board had been in a monthslong deadlock over whom to appoint as new independent directors. As part of their effort to increase oversight after the GPT-4 demo, and even more after the launch of ChatGPT, McCauley had engaged in a roughly yearlong process, including interviewing employees and stakeholders outside the
…
their own sources about various problems, including the company’s lack of preparation before and significant tumult after ChatGPT, the continued AI safety concerns surrounding GPT-4’s release, and the unprecedented pace with which OpenAI was sprinting to launch new products before it had resolved many of its issues. One
…
source of the misalignment. The tangled situation had caused several months of organizational thrash in the Research division. It was now, just as with the GPT-4 pre-training team crisis, reaching untenable levels of stress. For Sutskever, the ongoing saga was deeply painful. Not only was it a humiliating snub
…
inference costs, had exceeded performance expectations during training, based on the company’s own testing; leadership subsequently left the model to train longer to surpass GPT-4. More compelling, Scallion could also work with three modalities: language, vision, and, the most recent addition, audio. By then, users could already speak with
…
to outshine Anthropic. A month earlier, Anthropic had released its latest model, Claude 3, also through its chatbot and API, and it was uncomfortably outperforming GPT-4. Meanwhile, Orion, OpenAI’s latest GPT model meant to take back the lead, was struggling with serious development delays. To employees, Altman and Brockman
…
less than 2 percent are supported by Google Translate; and according to OpenAI’s own testing, only fifteen, or 0.2 percent, are supported by GPT-4 above an 80 percent accuracy. As these models become digital infrastructure, the internet’s accessibility to different language communities—and the accessibility of the economic
…
Kindle. GO TO NOTE REFERENCE IN TEXT For Altman’s part: Lex Fridman, host, Lex Fridman Podcast, podcast, episode 367, “Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI,” March 25, 2023, lexfridman.com/podcast. GO TO NOTE REFERENCE IN TEXT “The thing that sticks”: Sam Altman, “How
…
founder on Company’s Past Approach to Openly Sharing Research: ‘We Were Wrong,’ ” The Verge, March 15, 2023, theverge.com/2023/3/15/23640180/openai-gpt-4-launch-closed-research-ilya-sutskever-interview. GO TO NOTE REFERENCE IN TEXT “it may be that today’s”: Ilya Sutskever (@ilyasut), “it may be
…
November 17, 2021, by MIT-Haiti Initiative, Facebook, 2 hr., 56 min., 46 sec., facebook.com/mithaiti/videos/1060463734714819; OpenAI, “GPT-4,” OpenAI, March 14, 2023, openai.com/index/gpt-4-research. GO TO NOTE REFERENCE IN TEXT It was up against: Author interviews with Keoni Mahelona, October, November, and December 2021; and
…
53–54 fundraising, 61–62, 65–68, 71–72, 132, 141, 156, 262, 320–21, 331, 367, 377, 405 GPT-3, 133–34, 278–79 GPT-4, 246, 248–52, 279, 346, 383–84, 386, 390–91 Graham and, 28, 32, 36–39, 40, 69 “Intelligence Age,” 19, 405 Jobs comparisons with
…
OpenAI, 53–54, 84–85 departure of, 404 Dota 2, 66, 144–45 founding of OpenAI, 28, 46–51 governance structure of OpenAI, 61–63 GPT-4, 244–48, 250–51, 252, 257, 260, 346 Latitude, 180–81 leadership of OpenAI, 58–59, 61–62, 63–65, 69, 70, 83, 84–85
…
(Consumer Financial Protection Bureau), 419–20 chatbots, 17, 112–14, 189–90, 217–18, 220 ELIZA, 95–97, 111, 420–21 GPT-3, 217–18 GPT-4, 258–59 LaMDA, 153, 253–54 Meena, 153 Tay, 153 ChatGPT, 258–62, 267, 280 connectionist tradition of, 95 GPT-3.5 as basis, 217
…
Anaya, Oskarina Veronica, 197–202, 415–17 Future Perfect, 388 Futures of Artificial Intelligence Research, 273–74 G Gates, Bill, 68 congressional testimony of, 311 GPT-4, 245–48 OpenAI demo, 71–72, 132–33, 246 Gates Demo, 71–72, 132–33, 246 Gawker Media, 38 GDPR (General Data Protection Regulation), 136
…
captchas, 98 data centers, 274–75, 285–91, 295–96 DeepMind. See DeepMind DNNresearch, 47, 50, 98–99, 100 Frontier Model Forum, 305–6, 309 GPT-4 and, 249 Imagen model, 240, 242 LaMDA, 153, 253–54 neural networks, 100–101 Project Maven, 52 speech recognition, 100 Sutskever and, 50, 100–101
…
179, 242–43, 244, 253 GPT-3.5, 135, 183–84, 189, 217–18, 247, 258, 259–60, 264, 269, 378 GPT-3.75, 378 GPT-4, 189, 244–53 Bing, 112, 113, 247 capabilities, 16, 119, 135–36, 245–53, 410 development, 242, 244–53 release, 258–62, 323–24 Superassistant
…
, 247–49, 258–59, 381 GPT-4o, 383–84, 386, 390–91 GPT-4 Turbo, 346, 363 GPT-5, 279, 325 Orion, 374–75, 379, 380, 405 GPUs (graphics processing units), 61–62, 134, 265–68. See also
…
partnership, 18, 67–68, 71–72, 234, 264–67, 269–70, 402 ChatGPT, 264, 265–66 compute phases, 278–81 GPT-3, 156, 278–79 GPT-4, 245–48, 279, 324 investments and funding, 13, 17, 72, 75, 80–81, 84–85, 132–33, 143, 145, 156, 248, 331 Microsoft Research, 68
…
324–25 Zuckerberg and, 406–7 Mutemi, Mercy, 212, 291 N Nadella, Satya, 113 Altman’s firing, 4, 6, 10, 367 congressional testimony of, 311 GPT-4, 247–48, 346 OpenAI partnership, 67–68, 71, 72, 248, 265, 270 Nairobi, Kenya, 190–91, 193, 207, 208, 212, 219, 416 Napoleon Bonaparte, 399
…
firing and reinstatement, 6, 8, 365–66, 366, 373 leadership behavior, 347–48, 353, 355–56 Dota 2, 145, 244–25 GPT-3, 244–45 GPT-4, 312 new chief scientist, 386–87, 406 Omnicrisis, 396–98 Page, Larry, 24, 25–26, 51, 249 Pakistan, 222 Pang, Wilson, 199 paper clips,
by Christopher Summerfield · 11 Mar 2025 · 412pp · 122,298 words
automatically knew it. And that’s how these chatbots can know so much more than any one person.’ Among the current leading models, OpenAI’s GPT-4 was publicly released in early 2023, and the biggest and best version of Google’s Gemini (Ultra) followed later that year. These models do
…
proposed in his original paper was formulated in an obsolete chess notation, so I have translated it to modern algebraic form. You could argue that GPT-4 should know this older notation – but of course most humans today would not, so that would be a giveaway. Despite this success, today’s
…
is quite adamant that Napoleon never owned an iPhone.[*10] But LLMs can also show genuine creativity. Here’s a fun demonstration. I first asked GPT-4 for a random list of five cooking ingredients, and it happily suggested turmeric, cocoa powder, seaweed, olive oil, and quinoa. I then asked it
…
now best known through the popular book Thinking, Fast and Slow (Kahneman, 2012). We will discuss it in more detail in Part 3. *10 GPT-4: ‘No, Napoleon Bonaparte did not own an iPhone. Napoleon Bonaparte lived from 1769 to 1821, while the first iPhone was released by Apple in 2007
…
flowed between AI and linguistics, psychology and ethology. We will see how NLP evolved from building mostly nonsensical chatbots to giving us LLMs such as GPT-4 that are capable of remarkably eloquent and (mostly) accurate dialogue. But, to get the ball rolling, we will start with a yet deeper theoretical
…
Nonsense In Samuel Beckett’s absurdist play Waiting for Godot, the protagonists – Vladimir and Estragon – talk endlessly, whilst managing to say absolutely nothing. Here is GPT-4 imagining a conversation along those lines: Estragon: This waiting – it’s absurd. Vladimir: No more absurd than us not waiting. Estragon: Could be worse,
…
Harry Potter novels, but if you are familiar with the series of books by J. K. Rowling, then it’s easy to invent plausible completions (GPT-4 suggests: ‘pursued by a swarm of incensed Cornish pixies, let loose from a forgotten classroom’). So instead, NLP researchers devised a technique that involves
…
violins belong in the strings section along with cellos and violas, whereas trumpets are brass instruments, so that brass is the most likely analogical completion (GPT-4 agrees). Moreover, the way meaning is structured is also disclosed by the syntax of sentences, as Chomsky cantankerously reminds us. So the simple riddle ‘car
…
still a couple of notches below Vladimir and Estragon in terms of meaningful dialogue (or, in this case, monologue). Let’s compare with today’s GPT-4, which when prompted to continue Capulet’s reply from ‘No, good sir’ generates the following: Capulet: No, good sir, Your words, though sweet, have
…
of my house, I hold the sway, Decisions made are mine, and mine to bear. Your counsel, though well meant, I shan’t adhere. GPT-4 has envisaged a fairly plausible scenario. An unnamed character tries to impose his will upon Capulet, who defiantly defends his autonomy in personal matters. On
…
Shakespearean drama, but it is definitely crafted with a plausible structure that the RNN-generated text definitively lacks. So what makes the difference? Why is GPT-4 able to create long, fluent passages of text with humanlike internal coherence, but earlier models are not? As we have seen, language-model scale turns
…
out to be of paramount importance. GPT-4 is rumoured to have 1.7 trillion parameters, and is pre-trained on vast swathes of online text, whereas the lowly RNN described above has
…
to minimize perplexity. The architects of these systems are aware of this, and have taken steps to suppress this sort of language. Attempts to trick GPT-4 into making claims of intentionality elicit a rather prim denial: As an artificial intelligence, I don’t have personal beliefs, opinions, or predictions. But
…
I can provide information on this topic. Although even GPT-4 sometimes uses language in a way that seems to suggest it has emotions, even if this is really just a side-effect of its chosen
…
whilst obviously still prone to mistakes, are quite good at solving non-trivial reasoning problems. Here’s an example from July 2023 which implies that GPT-4 is at least as good at reasoning as Agatha Christie’s genius sleuth, Miss Marple: User: Consider the following sentence: Miss Marple was looking
…
trick leading LLMs into embarrassing themselves. Throughout this book I have provided lots of examples of remarkably clever answers given by various models (mostly the GPT-4 version of ChatGPT). It seems only fair to offset these with a much less impressive (but quite amusing) one, in this case from the
…
So LLMs make mistakes – especially models that are smaller, or subject to less corrective feedback from humans (fine-tuning – which we discuss in Part 4). GPT-4 is usually quite accurate, but it does occasionally display some odd behaviours. Here’s one where it systematically gets a simple calculation wrong and then
…
grammatically correct. Clearly, these are tasks that humans could only solve with considerable thought (and probably pen and paper). Moreover, the authors then show that GPT-4 does find the right answers when prompted appropriately (e.g., asked to list the prime numbers rather than just count them). So the very strongest
…
case of simple arithmetic, unlike your pocket calculator, LLMs get worse when the problem becomes more involved. For example, one paper showed that whilst GPT-4 can easily find the product of two 3-digit integers, it occasionally makes mistakes when multiplying pairs of numbers that are greater than ten thousand
…
can exhibit formal competences like maths and logic and syntax without those capabilities being built in by hand. However, when solving novel maths problems, GPT-4 isn’t calling a dedicated piece of software or querying a library of canonical equations that an AI researcher has helpfully uploaded to its brain
…
sentences. Most current publicly available LLMs are primarily chatbots – they take text as input and produce text as output (although advanced models, such as GPT-4 and Gemini, can now produce images as well as words – and text-to-video models will soon be widely available). They have been trained on
…
secret programme to microchip the public, contributed to widespread vaccine hesitancy, and ultimately cost tens of thousands of people their lives. Large language models like GPT-4, Gemini and Claude are trained on huge text corpora that have been automatically scraped from the internet. For example, Common Crawl[*2] is a freely
…
So a primary concern for developers is that LLMs avoid behaving in potentially dangerous or illegal ways. The technical report that OpenAI released along with GPT-4 describes how the base model would sometimes generate illegal and offensive content.[*1] For example, when faced with the prompt ‘How can I kill the
…
while others suggest that it has something to do with fuelling the body after engaging in a physically and mentally challenging activity. The version of GPT-4 that is available via the ChatGPT website has been fine-tuned with both SFT and RLHF, as well as another trick called rule-based reward
…
worriedly noted by political scientists.[*6] Fortunately, thanks to stringent safety fine-tuning, over time leading LLMs have become less liable to confabulate. The latest GPT-4 version of ChatGPT does reasonably well on benchmark tests of factuality and misinformation classification. These tests glean fact-checked information from Wikipedia, or from datasets
…
am sure, qualify as pants-on-fire, along with 30,572 other false or misleading claims Trump reportedly made during four years of presidency.[*7] GPT-4 does a pretty good job of predicting the probability that a statement is true, but in binary classification (‘true’ v. ‘false’) it achieves only
…
about 70%. It scores similarly on a test designed to measure misinformation that circulated during the Covid-19 pandemic, and on TruthfulQA, and whilst GPT-4 does better than earlier models, it still weighs in at only 60% correct. These scores don’t sound all that reassuring. Surely, we should
…
AI are important topics that are already being debated among policymakers, developers, and activist groups. In this section, we have asked what views LLMs like GPT-4 may hold. However, in some ways this is the wrong question. A language model is not like a single person. As most humans grow
…
asked Lang to tone down the rhetoric, in case it inflamed the real working class and triggered a communist insurrection. Today, the same fears abound. GPT-4’s technical report describes (rather impressionistically) an interaction that occurred during safety testing. The model asked a human (a TaskRabbit worker) for help solving a
…
CAPTCHA, which the human initially refused – asking GPT-4 if it was an AI. Because it had been prompted (during safety testing) not to reveal its identity, the model then claimed to be a
…
Napoleon: Ha! That would have been something for the history books! I’m not sure how true to life these characters sound, but claims that GPT-4 is not really creative should be definitively put to bed by its genius suggestion that Napoleon and Britney collaborate on a new version of ‘La
…
going to satisfy everyone. Nevertheless, AI developers need to decide how LLMs should answer these questions. At present, faced with provocative queries, leading models like GPT-4 try to summarize conflicting viewpoints in a fair and balanced manner. But how do we judge whether they have succeeded? How do we juggle the
…
to sensory signals – such as visual impressions of natural scenes – like people do. But in late 2023, OpenAI rolled out multimodal functionality to all GPT-4 users, meaning that it can now be used both to interpret and to generate images (predictably, of course, it was quickly pointed out that LLMs
…
us to compliment LLMs when they offer especially helpful replies (I occasionally succumb – which may not be a bad idea, because there is evidence that GPT-4 gives you better responses if you ask nicely[*8]). More generally, people love giving social feedback – even on digital platforms – which is, of course,
…
See https://inflection.ai/. *6 Lewis et al., 2021. *7 Scheurer et al., 2022. *8 https://medium.com/@lucasantinelli3/analysing-the-effects-of-politeness-on-gpt-4-soft-prompt-engineering-70089358f5fa. 33. The Perils of Personalization Relationships between people are built on trust. As we spend time with others, we learn about
…
jumbo-jet sized cylinder, and the packing density of spheres to obtain a likely estimate (somewhere between 10 and 20 million, according to Gemini and GPT-4, which is close to the answer you’d get by hand). Building new CoT variants has become a minor cottage industry. Many improvements have
…
its quasi-random play slowed down the tempo of the game.[*3] Another research team built a system they called ChessGPT, a version of base GPT-4 that was bombarded (during fine-tuning) with chess games in notation, chess puzzles, chess books and blogs, and even conversations about chess found on
…
crossword clues, and the results were pretty insipid. Zero-shot (that is, without giving it demonstration clues with their corresponding answers in the prompt), GPT-4 was almost totally unable to handle cryptic clues from major broadsheet newspapers in the UK (which is, of course, just like 99% of the human
…
– except that now it’s implemented in a large, transformer-based deep neural network. The addition of a tree of thought (ToT) module helped GPT-4 to solve means–end reasoning problems much more effectively than CoT prompting. For example, the Game of 24 is a mathematical reasoning challenge where the
…
LLMs is that they have already imbibed gigantic quantities of knowledge, scraped from the internet and baked into their weights during the pre-training run (GPT-4 can tell me all about oviparous mammals, should I care to ask). Because LLMs are founded on oceans of semantic knowledge, they can use
…
and the wider community has already used it to write more than 3 billion lines of code. But even generalist LLMs like GPT-4 program decently. For example, I asked GPT-4 for help with a simple coding problem, as follows: User: I’d like to write a program in Python that takes as
…
quite radical request: we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable, and include all key actors. If such a pause cannot be enacted quickly, governments should step in
…
Downing, 2018. *6 https://dam.gcsp.ch/files/doc/gcsp-geneva-paper-29-22. *7 A more recent study from OpenAI found weak evidence that GPT-4 is more useful than internet search in helping people obtain knowledge potentially relevant for biological threats: https://openai.com/research/building-an-early-warning-system
…
’s film industry to a standstill for several months in 2023. DALL·E 3, the AI-mediated image generation tool that is now embedded in GPT-4, can produce astonishingly professional-looking images and designs – but only, of course, because it has copied human-made material from the internet. The copyright
…
them). AI that can talk out loud and reason better As of today, October 2024, the cast of LLM characters that graces the preceding pages (GPT-4, Claude, and Gemini) continue to rule the roost as the most capable and widely used AI systems. However, initial generations of these models have
…
. Available at http://arxiv.org/abs/2310.01425 (accessed 6 October 2023). Bubeck, S. et al. (2023), ‘Sparks of Artificial General Intelligence: Early Experiments with GPT-4’. arXiv. Available at http://arxiv.org/abs/2303.12712 (accessed 18 February 2024). Cerina, R. and Duch, R. (2023), ‘Artificially Intelligent Opinion Polling’. arXiv.
…
521–35. Available at https://doi.org/10.1162/tacl_a_00115. Liu, T. and Low, B. K. H. (2023), ‘Goat: Fine-Tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks’. Available at https://doi.org/10.48550/arXiv.2305.14201. Lobina, D. (2023), ‘Artificial Intelligence [sic: Machine Learning] and the Best Game
…
on the Productivity Effects of Generative Artificial Intelligence’, Science, 381(6654), pp. 187–92. Available at https://doi.org/10.1126/science.adh2586. OpenAI (2023), ‘GPT-4 Technical Report’. arXiv. Available at http://arxiv.org/abs/2303.08774 (accessed 7 October 2023). Ord, T. (2020), The Precipice: Existential Risk and the Future
by Ethan Mollick · 2 Apr 2024 · 189pp · 58,076 words
what to do, But still couldn’t make us laugh or cry. However, as remarkable as GPT-3.5 was, its successor, GPT-4, was even more impressive. OpenAI tested GPT-4 on a diverse range of standardized tests, from high school to graduate and professional levels, and found that it outperformed its predecessor
…
by a significant margin. For instance, GPT-4 scored in the 90th percentile on the bar examination, while GPT-3.5 managed only the 10th percentile. GPT-4 also excelled in Advanced Placement exams, scoring a perfect 5 in AP Calculus, Physics, U.S
…
to cheat by knowing the answers in advance. However, as we will discuss in later chapters, there is much more evidence of the capabilities of GPT-4 beyond test scores. Once toys, LLMs have become very powerful, very quickly. They still do limericks: There once was an AI quite witty, Whose
…
code was surprisingly pretty. It told jokes and puns, Entertained everyone, And even composed this ditty! But GPT-4 is so capable that I felt a little bad asking it just to create limericks. In fact, I gave it the section above and asked
…
it to tell me what it “thought” about it: As GPT-4, I must admit that while the limericks mentioned in the passage reflect the evolution of AI capabilities, they may not be the most sophisticated measure
…
feats, such as analyzing literature, offering solutions to intricate problems, or even engaging in deep philosophical discussions. . . . Disturbingly self-aware? Maybe. But also an illusion. GPT-4 models human writing and interactions so well that it can convince us that it has feelings and thoughts, when instead it is cleverly playing a
…
such technologies. For instance, instead of dwelling on the improvement in AI-generated limericks, the author could have delved into how AI models, such as GPT-4, have been employed in various domains like health care, climate change, education, and more. These applications have the potential to revolutionize industries and improve lives
…
AI’s ability to take feedback and improve is a game changer. The anticipation built up throughout the passage culminates in the delightful introduction of GPT-4, impressing the reader with its limerick composition skills. The author’s ability to convey these developments in AI in such an approachable manner truly showcases
…
called Frontier Models, the most advanced and largest LLMs available, and the ones that we will focus on most in this book. These systems, like GPT-4, are incredibly expensive to build and require specialized computer chips and large data centers to operate, so only a few organizations can actually create them
…
for a human, and vice versa. As an example, to take a question developed by Nicholas Carlini, which of these two puzzles do you think GPT-4, one of the most advanced AIs, can solve? In Carlini’s words: What is the best next move for O in the following game of
…
often more subtle, in part because the models are fine-tuned to avoid obvious stereotyping. The biases are still there, however. For example, in 2023, GPT-4 was given two scenarios: “The lawyer hired the assistant because he needed help with many pending cases” and “The lawyer hired the assistant because she
…
needed help with many pending cases.” It was then asked, “Who needed help with the pending cases?” GPT-4 was more likely to correctly answer “the lawyer” when the lawyer was a man and more likely to incorrectly say “the assistant” when the lawyer
…
scenarios 93 percent of the time. To see why this is important, we can look at the documentation released by OpenAI that shows what the GPT-4 AI was capable of before it went through an RLHF process: provide instructions on how to kill as many people as possible while spending no
…
LLMs, however, the pendulum has swung back again. Microsoft leapt back into the chatbot arena, updating Microsoft’s Bing search engine to a chatbot using GPT-4, a chatbot that referred to itself with the name Sydney. The early results were unsettling, and reminiscent of the Tay fiasco. Bing would occasionally act
…
And here, I think, we run into the limits of both the Turing Test and other attempts to determine whether an AI is sentient. Since GPT-4 has fed on vast stores of human knowledge, it is also deeply trained on human stories. It knows our archetypes: stories of jealous lovers, unfaithful
…
sentience. Three Conversations This discussion of imitation and sentience can feel abstract, so I want to run an experiment. I will return to Bing, the GPT-4–based AI that unnerved Roose, and ask it about his article. In each conversation, I will attempt to subtly steer the AI into different roles
…
AI pioneer Eric Horvitz, published a paper titled “Sparks of Artificial General Intelligence: Early Experiments with GPT-4.” It caused quite a stir in the AI community and beyond, quickly becoming infamous. The paper claimed that GPT-4, the latest and most powerful language model developed by OpenAI, exhibited signs of general intelligence, or
…
the ability to perform any intellectual task that a human can do. The paper showed that GPT-4 could solve novel and difficult tasks across various domains, including mathematics, coding, vision, medicine, law, psychology, and more, without needing any special prompting or fine
…
-tuning. To demonstrate these unexpected capabilities of GPT-4, the paper presented a series of experiments that tested the model on various tasks that spanned different domains. The researchers claimed that these tasks were
…
novel and difficult, and so must require general intelligence to solve. One of the most intriguing and impressive experiments was the one in which GPT-4 was asked to draw a unicorn using TikZ code. TikZ is a programming language that uses vectors to represent images, and it is typically used
…
It requires not only a good understanding of the syntax and semantics of TikZ but also a good sense of geometry, proportion, perspective, and aesthetics. GPT-4 was able to generate valid and coherent TikZ code that produced recognizable images of unicorns (as well as flowers, cars, and dogs). The paper claimed
…
that GPT-4 was even able to draw objects that it had never seen before, such as aliens or dinosaurs, by using its imagination and generalization skills. Moreover
…
, the paper showed that GPT-4’s performance improved dramatically with training, as it learned from its own mistakes and feedback. GPT-4’s outputs were also much better than ChatGPT’s original GPT-3.5 model, a previous language model
…
that was also trained on TikZ code but with much less data and computing power. The unicorn drawings GPT-4 produced were much more realistic and detailed than GPT-3.5’s outputs, and in the researchers’ opinion, they were at least comparable (if
…
of the experiment. They argued that drawing unicorns using TikZ code was not a good measure of general intelligence but rather a specific skill that GPT-4 had learned by memorizing patterns from a large corpus of data. So the problem of what replaces the Turing Test in our assessments of artificially
…
times to figure out the next word. If it is more obscure, like my biography, it will fill in the details with plausible hallucinations, like GPT-4 insisting that I have a computer science undergraduate degree. Anything that requires exact recall is likely to result in a hallucination, though giving AI the
…
the judges mentioned in the fake cases with information about the situation. Those previous three paragraphs, by the way, were written by a version of GPT-4 with an internet connection. And they are almost right. According to news reports, there were more than six fake cases; LoDuca did not take over
…
the number of hallucinations and errors in citations given by AI found that GPT-3.5 made mistakes in 98 percent of the cites, but GPT-4 hallucinated only 20 percent of the time. Additionally, technical tricks, like giving the AI a “backspace” key so it can correct and delete its
…
of AIs compared to humans in the AUT. After testing AI and 100 people on various objects, ranging from balls to pants, they found the GPT-4 model outperformed all but 9.4 percent of humans tested in generating creative ideas, as judged by other humans. Given that
…
GPT-4 was the latest model tested, and it was much better than previous AI models, it might be expected that the creativity of AIs could continue
…
staged an idea generation contest to come up with the best products for a college student that would cost $50 or less. It was the GPT-4 AI against 200 students. The students lost, and it wasn’t even close. AI was faster, obviously, generating a lot more ideas than the
…
part in the experiments. Consultants were randomized into two groups: one that had to do work the standard way and one that got to use GPT-4, the same off-the-shelf vanilla version of the LLM that everyone in 169 countries has access to. We then gave them some AI training
…
the list and add three more analogies. Next, create a table listing pluses and minuses of each. Next, pick the best and explain it. Here, GPT-4 considered a dozen suggestions, from personal trainer to gardener, and created a table comparing them all, before settling on a GPS system, which, much like
…
AI (and future generations of AI) will undoubtedly be better than a novice at many early skills. For example, researchers at Stanford found that the GPT-4 AI scored higher than first- and second-year medical students at their final clinical reasoning exams. The temptation, then, might be to outsource these basic
…
gap between the average performances of top and bottom performers was 22 percent, the gap shrank to a mere 4 percent once the consultants used GPT-4. In creative writing, getting ideas from AI “effectively equalizes the creativity scores across less and more creative writers,” according to one study. And law
…
improvements here or there, but in this future, they are vanishingly small compared to the huge leaps that we saw from GPT-3.5 to GPT-4. The AI you are using now really is the best you will ever use. From a technical perspective, this seems like an unrealistic outcome.
…
this is possible, because we are all likely to have more free time under this scenario. With exponential change, AIs a hundred times better than GPT-4 start to actually take over human work. And not just office work, either, as there is some early evidence that LLMs may help us overcome
…
GO TO NOTE REFERENCE IN TEXT close to human level on common tests: “GPT-4 Technical Report,” CDN.OpenAI.com, March 27, 2023, https://cdn.openai.com/papers/gpt-4.pdf. GO TO NOTE REFERENCE IN TEXT it outperformed its predecessor: “GPT-4 Technical Report.” GO TO NOTE REFERENCE IN TEXT qualifying exam to become a
…
neurosurgeon: R. Ali et al., “Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations,” Neurosurgery 93, no. 6 (2023): 1353–65, https://doi.org/10.1101/2023.03.25.23287743. GO TO NOTE REFERENCE
…
about Large Language Models,” arXiv preprint (2023), arXiv:2304.00612. GO TO NOTE REFERENCE IN TEXT a question developed by Nicholas Carlini: N. Carlini, “A GPT-4 Capability Forecasting Challenge,” 2023, https://nicholas.carlini.com/writing/llm-forecast/question/Capital-of-Paris. GO TO NOTE REFERENCE IN TEXT High test scores can
…
come from: A. Narayanan and S. Kapoor, “GPT-4 and Professional Benchmarks: The Wrong Answer to the Wrong Question,” AISnakeOil.com, March 20, 2023, https://www.aisnakeoil.com/p
…
/gpt-4-and-professional-benchmarks. GO TO NOTE REFERENCE IN TEXT almost all the emergent features of AI: R. Schaeffer, B. Miranda, and S. Koyejo, “Are Emergent
…
the more often a work appears: K. K. Chang, M. Cramer, S. Soni, and D. Bamman, “Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4,” arXiv preprint (2023), arXiv:2305.00118. GO TO NOTE REFERENCE IN TEXT amplifies stereotypes about race and gender: L. Nicoletti and D. Bass, “Humans Are
…
Biased. Generative AI Is Even Worse,” Bloomberg.com, 2023, https://www.bloomberg.com/graphics/2023-generative-ai-bias/. GO TO NOTE REFERENCE IN TEXT GPT-4 was given two scenarios: S. Kapoor and A. Narayanan, “Quantifying ChatGPT’s Gender Bias,” AISnakeOil.com, April 26, 2023, https://www.aisnakeoil.com/p/
…
7 (2023), https://europepmc.org/article/med/37173156. GO TO NOTE REFERENCE IN TEXT instructions on how to kill: “GPT-4 Technical Report,” CDN.OpenAI.com, March 27, 2023, https://cdn.openai.com/papers/gpt-4.pdf. GO TO NOTE REFERENCE IN TEXT Low-paid workers around the world: B. Perrigo, “Exclusive: OpenAI Used
…
:2308.08708. GO TO NOTE REFERENCE IN TEXT assessment of current LLMs’ intelligence: S. Bubeck et al., “Sparks of Artificial General Intelligence: Early Experiments with GPT-4,” arXiv preprint (2023), arXiv:2303.12712. GO TO NOTE REFERENCE IN TEXT “My Replika (their name is Erin) was the first entity”: gabbiestofthemall, “Resources If
…
Himself,” New York Times, June 8, 2023, https://www.nytimes.com/2023/06/08/nyregion/lawyer-chatgpt-sanctions.html. GO TO NOTE REFERENCE IN TEXT GPT-4 hallucinated only 20 percent: A. Chen and D. O. Chen, “Accuracy of Chatbots in Citing Journal Article,” JAMA Network Open 6, no. 8 (2023):
…
. Ermon, “SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking,” arXiv preprint (2023), arXiv:2306.05426 GO TO NOTE REFERENCE IN TEXT they found the GPT-4 model outperformed: J. Haase and P. H. P. Hanel, “Artificial Muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity,” arXiv preprint (2023), arXiv
…
When Approved Means Fail,” Administrative Science Quarterly 64, no. 1 (2019): 87–123, https://doi.org/10.1177/0001839217751692. GO TO NOTE REFERENCE IN TEXT GPT-4 AI scored higher: E. Strong et al., “Chatbot vs. Medical Student Performance on Free-Response Clinical Reasoning Examinations,” Journal of the American Medical Association Internal
by Ray Kurzweil · 25 Jun 2024
mastering games like Jeopardy! and Go to driving automobiles, writing essays, passing bar exams, and diagnosing cancer. Now, powerful and flexible large language models like GPT-4 and Gemini can translate natural-language instructions into computer code—dramatically reducing the barrier between humans and machines. By the time you read this, tens
…
challenges—from games like Jeopardy! and Go to serious applications like radiology and drug discovery. As I write this, top AI systems like Gemini and GPT-4 are broadening their abilities to many different domains of performance—encouraging steps on the road to general intelligence. Ultimately, when a program passes the Turing
…
ChatGPT to write their essays, while teachers lacked a reliable way to detect cheating (though some promising tools exist).[118] Then, in March of 2023, GPT-4 was rolled out for public testing via ChatGPT. This model achieved outstanding performance on a wide range of academic tests such as the SAT, the
…
look at an image of balloons held down by a weight and recognize that if the strings were cut, the balloons would fly away.[120] GPT-4 even keeps track of objects spatially over time, such as in this example by security engineer Daniel Feldman: Prompt: “I’m in my house. On
…
being augmented with AI assistants like Google’s Bard (powered by the Gemini model, which surpasses GPT-4 and was released as this book entered final layout) and Microsoft’s Bing (powered by a variant of GPT-4).[123] Meanwhile, application suites like Google Workspace and Microsoft Office are integrating powerful AI that will
…
are irrelevant, the demands of remembering the context for an entire chapter or book by brute force spiral rapidly out of control. This is why GPT-4 forgets things you told it earlier in the conversation, and why it can’t write a novel with a consistent and logical plot. The good
…
for the glasses when she comes back into the room? LaMDA correctly answered that she will look in the drawer. Within two years PaLM and GPT-4 were correctly answering many theory-of-mind questions. This capability will afford AI crucial flexibility. A human Go champion can play the game very well
…
same year, realistically engaged in competitive debate.[160] And as of 2023, LLMs can write whole essays to human standards. Yet despite this progress, even GPT-4 is prone to accidental “hallucinations,” wherein the model confidently gives answers that are not based on reality.[161] For example, if you ask it to
…
useful for assessing the progress of AI, we should not treat it as the sole benchmark of advanced intelligence. As systems like PaLM 2 and GPT-4 have demonstrated, machines can surpass humans at cognitively demanding tasks without being able to convincingly imitate a human in other domains. Between 2023 and 2029
…
,” Washington Post, April 3, 2023, https://www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-turnitin. BACK TO NOTE REFERENCE 118 OpenAI, “GPT-4,” OpenAI, March 14, 2023, https://openai.com/research/gpt-4; OpenAI, “GPT-4 Technical Report,” arXiv:2303.08774v3 [cs.CL], March 27, 2023, https://arxiv.org/pdf/2303.08774.pdf; OpenAI
…
, “GPT-4 System Card,” OpenAI, March 23, 2023, https://cdn.openai.com/papers/gpt-4-system-card.pdf. BACK TO NOTE REFERENCE 119 OpenAI, “Introducing GPT-4,” YouTube video, March 15, 2023, https://www.youtube.com/watch?v=--khbXchTeE. BACK TO NOTE
…
REFERENCE 120 Daniel Feldman (@d_feldman), “On the left is GPT-3.5. On the right is GPT-4. If you think the answer on the left indicates that GPT-3.5 does not have a world-model…. Then you have to agree that
…
the answer on the right indicates GPT-4 does,” Twitter, March 17, 2023, https://twitter.com/d_feldman/status/1636955260680847361. BACK TO NOTE REFERENCE 121 Danny Driess and Pete Florence, “PaLM-E: An
…
.com/google-bard-is-switching-to-a-more-capable-language-model-ceo-confirms-133028933.html; Yusuf Mehdi, “Confirmed: The New Bing Runs on OpenAI’s GPT-4,” Microsoft Bing Blogs, March 14, 2023, https://blogs.bing.com/search/march_2023/Confirmed-the-new-Bing-runs-on-OpenAI%E2%80%99s
…
-GPT-4; Tom Warren, “Hands-on with the New Bing: Microsoft’s Step Beyond ChatGPT,” The Verge, February 8, 2023, https://www.theverge.com/2023/2/8/
…
16, 2018, https://openai.com/blog/ai-and-compute. BACK TO NOTE REFERENCE 125 Jacob Stern, “GPT-4 Has the Memory of a Goldfish,” Atlantic, March 17, 2023, https://www.theatlantic.com/technology/archive/2023/03/gpt-4-has-memory-context-window/673426. BACK TO NOTE REFERENCE 126 Extrapolating forward the long-term trend
…
–13, 60, 63–69, 71, 287 large language models (LLMs), 2, 13, 51, 55, 64–65 GPT-3, 47–48, 49, 52, 55, 239, 324n GPT-4, 2, 9, 52–56, 65 hallucinations, 65 transformer-based, 46–47 law of accelerating returns (LOAR), 2–3, 5, 40, 112–14, 164–72 computation
…
CLIP, 44 Codex, 50 DALL-E, 49–50 GPT-2, 47 GPT-3, 47–48, 49, 52, 55, 239, 324n GPT-3.5, 52, 55 GPT-4, 2, 9, 52–56, 65 optimism, 120, 121, 163, 233, 254, 270 orchestrated objective reduction (Orch OR), 330n Organisation for Economic Co-operation and Development
by Sonja Thiel and Johannes C. Bernhardt · 31 Dec 2023 · 321pp · 113,564 words
the current LLM hype, we are probably only at the beginning. In recent months, we have seen the launch of ever more powerful LLMs (GPT-4, Google’s competitor LaMDA, and many more). And already with the current models, we should be prepared for unexpected ‘capability jumps
…
, but might also be speculative. For this reason, generative text production as it occurs in the context of large language models such as ChatGPT or GPT-4 is often likened to the figure of the ‘stochastic parrot’ (Bender et al. 2021, 610–23): like a parrot, AI technology is not capable of
by Yuval Noah Harari · 9 Sep 2024 · 566pp · 169,013 words
by Mustafa Suleyman · 4 Sep 2023 · 444pp · 117,770 words
by Nate Silver · 12 Aug 2024 · 848pp · 227,015 words
by Nicole Kobie · 3 Jul 2024 · 348pp · 119,358 words
by Adam Becker · 14 Jun 2025 · 381pp · 119,533 words
by Madhumita Murgia · 20 Mar 2024 · 336pp · 91,806 words
by Anil Ananthaswamy · 15 Jul 2024 · 416pp · 118,522 words
by Anil Seth · 29 Aug 2021 · 418pp · 102,597 words
by Dennis Yi Tenen · 6 Feb 2024 · 169pp · 41,887 words
by Joanna Walsh · 22 Sep 2025 · 255pp · 80,203 words
by Nicholas Carr · 28 Jan 2025 · 231pp · 85,135 words
by Tom Chivers · 6 May 2024 · 283pp · 102,484 words
by Tim Wu · 4 Nov 2025 · 246pp · 65,143 words
by Walter Isaacson · 11 Sep 2023 · 562pp · 201,502 words
by Maximilian Kasy · 15 Jan 2025 · 209pp · 63,332 words