GPT-4


description: the fourth iteration of the Generative Pre-trained Transformer, developed by OpenAI

generative artificial intelligence

25 results

This Is for Everyone: The Captivating Memoir From the Inventor of the World Wide Web

by Tim Berners-Lee  · 8 Sep 2025  · 347pp  · 100,038 words

of Pete Wells’s review, the attribution is 100 per cent – but that is rare. Let’s take a more challenging case. Here, I asked GPT-4 to write me an ode to the dishwasher, in the style of the great English poet Philip Larkin: ODE TO THE DISHWASHER In this quiet

, simple things, The cups we fill and drain, day after day. Not bad. Not bad at all. Astonishing, really, for a computer, and it took GPT-4 about three seconds to produce. But how much should Larkin’s estate get paid for this? Larkin never actually wrote an ode to the dishwasher

’ ref1, ref2, ref3, ref4 D.G. Nash ref1, ref2 digital commons ref1 digital divide ref1 Digital Equipment Corporation (DEC) ref1 digital signatures ref1 dishwasher poem (GPT-4) ref1 disinformation ref1, ref2 Ditchley Foundation ref1, ref2, ref3, ref4 DNS (Domain Name System) ref1, ref2, ref3 documentation systems ref1, ref2 documents compared to data

The Thinking Machine: Jensen Huang, Nvidia, and the World's Most Coveted Microchip

by Stephen Witt  · 8 Apr 2025  · 260pp  · 82,629 words

than a million people signed up to test it. By January 2023, ChatGPT had one hundred million active monthly users. In March 2023, OpenAI unveiled GPT-4 through its online portal. Looking to quantify its creation’s intelligence, OpenAI subjected the model to a battery of academic tests

. GPT-4 passed the bar exam; it scored 5’s on the Art History, US History, US Government, Biology, and Statistics AP exams; it scored in the

could not only perfectly describe images but also recognize complex visual jokes. In one, the researchers fed GPT-4 an image of a clunky computer cable from the 1990s connected to an iPhone, then asked GPT-4 to explain what it was looking at. “The humor in this image comes from the absurdity of

plugging a large, outdated VGA connector into a small, modern smartphone charging port,” the model responded. Later, a social media user showed how GPT-4 could create a website from a sketch on a napkin. Around this time, I began to fear for my job. I once asked ChatGPT to

think it’s almost a spiritual experience. You go, ‘Oh my God, this computer seems to understand.’ ” * * * • • • OpenAI spent more than $100 million to train GPT-4, with much of the money making its way to Nvidia through Microsoft. Although GPT-3 was essentially a single giant neural network

a “mixture of experts” model, featuring many neural networks assigned to different tasks. One “expert” might focus on safety, blocking users from asking GPT-4 how to make bombs or dispose of corpses; another might focus on writing computer code; a third would concentrate on emotional valence. (OpenAI declined to

comment on GPT-4’s construction.) The “inference” process of extracting knowledge from GPT-4 could easily exceed half of the initial training costs and had to be provided to customers on an ongoing basis. Estimates

varied, but one informed analysis put the cost of inference at roughly a quarter of a cent per word. At that rate, it would cost GPT-4 about $10 to write a five-thousand-word college term paper—a bargain when compared to hiring an unemployed graduate student to do it and

certainly a better solution than doing the work yourself. To defray the inference costs, OpenAI began charging $20 per month to access GPT-4. By March 2023, the product was approaching two million subscribers. The synthesis of the transformer architecture with hyperscale parallel computing resulted in a Cambrian explosion

Nvidia, and it’s why I wrote this book. From the moment ChatGPT went public, I assumed that my career was coming to an end. GPT-4 was only a passable writer; it had scored in the 22nd percentile of the AP English Literature and Composition exam. (I am confident I would

to be heard, I asked what problem Eos was working on. Hamilton responded that it was training an internal Nvidia model in the style of GPT-4. In other words, I was surrounded by a language model: one that was catching up to me, one that I sensed would one day soon

Searches: Selfhood in the Digital Age

by Vauhini Vara  · 8 Apr 2025  · 301pp  · 105,209 words

with women included “beautiful” and “gorgeous”—as well as “naughty,” “tight,” “pregnant,” and “sucked.” Among the ones for men: “personable,” “large,” “fantastic,” and “stable.” With GPT-4, OpenAI’s next large language model, OpenAI’s researchers would again find persistent stereotypes and biases. After the publication of “Ghosts,” I learned of other

despite its frightening unimpressiveness. It can reasonably be expected that with time AI companies will address some of their products’ early issues. OpenAI found that GPT-4, the large language model that came after GPT-3.5, improved on some of its earlier models’ shortcomings, though not all, and promised that future

transcripts throughout this book are taken verbatim from a single conversation about this manuscript with ChatGPT in June 2024, in which I toggled between the GPT-4 and the GPT-4o large language models, the most recent ones available at that time. The transcripts have not been edited. After chatting with ChatGPT

Supremacy: AI, ChatGPT, and the Race That Will Change the World

by Parmy Olson  · 284pp  · 96,087 words

OpenAI’s most important sources for AI training, with its text making up somewhere between 10 and 30 percent of the data used to teach GPT-4, according to a person close to the online forum. The more text OpenAI used to train its language model and the more powerful its computers

swimming pool. In a way, OpenAI was doing the world a favor and getting it ready for OpenAI’s more powerful, upcoming model, GPT-4. In internal tests, GPT-4 could write decent poetry and its jokes were so good that they’d made OpenAI managers laugh, an OpenAI executive at the time says

mainstream popularity before. On March 14, 2023, the very same day that Anthropic had finally released its own chatbot called Claude, OpenAI launched its upgrade, GPT-4. Anyone willing to pay $20 a month could access that new tech through ChatGPT Plus, a subscription service that would make an estimated $200 million

in revenue in 2023. Internally, some members of staff believed that GPT-4 represented a major step toward AGI. Machines weren’t just learning statistical correlation in text, Sutskever said in one interview. “This text is actually a

this made sense. Google had done everything early. Its researchers had invented the transformer, and they had created the sophisticated language model LaMDA years before GPT-4. Its own AI lab, DeepMind, had set off on a mission to build AGI five years before OpenAI had even been founded to do the

same with the EU. He threatened to leave the region. He had “many concerns” about the EU’s plans to include large language models like GPT-4 in its new law. “The details really matter,” he told reporters in London who asked him about the regulations. “We will try to comply, but

woven into all parts of life. Social media companies have for years refused to disclose how their algorithms worked. Now creators of AI models like GPT-4, DALL-E, and Google’s Gemini were doing the same. How were the models trained? How were people using them? Who were the workers helping

universe from swallowing them up and driving AI’s agenda. Mustafa Suleyman eventually left Google to start Inflection, a chatbot firm that tried to rival GPT-4. He made it a public benefit corporation, raised more than $1.5 billion, and amassed a powerful cluster of AI chips, making Inflection one of

Google Brain Google Brain Women and Allies group Google Effect Google Maps Google Translate Google X GPT-1 GPT-2 GPT-3 GPT-3.5 GPT-4 GPT-5 Graham, Paul Grand Theft Auto Greylock Partners Gulati, Sheila Hassabis, Angela Hassabis, Costas Hassabis, Demis AlphaGo and Altman and Bullfrog Productions and ChatGPT

Codex competition with DeepMind and computing power and DALL-E 2 effective altruism and funding and GPT-1 GPT-2 GPT-3 GPT-3.5 GPT-4 GPT-5 GPT Store and hallucination in ChatGPT and ideas behind internal concerns about ChatGPT large language models LessWrong community and Microsoft and Musk and

The Optimist: Sam Altman, OpenAI, and the Race to Invent the Future

by Keach Hagey  · 19 May 2025  · 439pp  · 125,379 words

than three months, the fastest-growing app in the world to date.4 When OpenAI, only a few months later, unveiled a more formidable successor, GPT-4—it could pass the bar exam and ace the AP biology test—the dizzying rate of progress suggested that the company’s audacious mission to

scrubbed clean and the sun had come out. Every billboard along Highway 101 hawked some kind of AI. The evening news led with tales of GPT-4’s passing the LSAT. The Mission was full of tasteful sprigs of wildflowers, coffee with notes of blueberry, and homeless people. OpenAI’s office is

we address the AGI safety problem in particular. Which I think we have exciting ideas for.” We were talking two days after the release of GPT-4. Altman was leading an organization of such ostensibly epochal significance that mere products and profits were far from his mind. He seemed to revel in

into the future. When he talks publicly about OpenAI’s technology, he often disses its current products—he had recently told a prominent podcaster that GPT-4, the company’s most advanced product, “kind of sucks”—and invites the audience to focus on what the company’s current rate of improvement implies

a large language model, which up until then had been the purview of academic research. And it shows how, with the release of ChatGPT and GPT-4, Altman used his YC-honed mastery of telling startup stories to tell one of the greatest startup stories of all time. ALTMAN IS wary of

used dialog as an alignment tool, teaching the model as one would a student. THE COMPANY had also been working on its next foundation model, GPT-4, and figured the same method of alignment would work again. “We thought of it as a way to advance safety for

GPT-4,” the team member said. By the summer of 2022, OpenAI was ready to present GPT-4 to its nonprofit board. By 2022, the board had grown to nine people, with the addition of a

board, followed by more formal interviews with every other board member, and by 2021 she replaced Karnofsky. THE BOARD that saw the first demonstration of GPT-4 was astonished by its capabilities. A year earlier, Altman and Brockman had traveled to Seattle to visit Bill Gates and ask him what it would

emeritus New York University professor and author who was one of OpenAI’s most prominent critics. It killed. The board began preparing how to launch GPT-4. At the time, it seemed like it would go out into the wild the same way OpenAI’s previous models had, requiring users to prompt

it with examples of the kinds of patterns—questions and answers, or code—that they wanted to see. As OpenAI was making progress on GPT-4, Murati, who was named CTO in May 2022, and senior research leaders were experimenting with Schulman’s chat interface as a tool to make sure

the OpenAI team returned to the office, realizing that the safety tool was more compelling than they had thought. When GPT-4 finished its training run in August, they made plans to release GPT-4 with the chat interface the following January. But as more people played with it, those plans began to change

I was using it myself. And I was like, we should separate these two things.” Altman reasoned that the shock of the two advances simultaneously—GPT-4’s collegiate smarts and the curiously lifelike chat interface—would be too much for the world to handle. “I thought that doing

GPT-4 plus the chat interface at the same time—and I really stand by this decision in retrospect—was going to just be a massive update

, that rival Anthropic had already built its own chatbot, named Claude, and was just waiting for enough safety testing to feel confident about releasing it. GPT-4 was scheduled to be released shortly after New Year’s Day, so it would have been ideal to release the chat interface a bit ahead

Google. “At the end of the day,” he said, “I want people to know that we made them dance.”28 IN MARCH, OpenAI finally released GPT-4 after some delays for safety testing. Now anyone in the world could experience what “adding a zero” meant. OpenAI had stopped releasing data about its

models, but experts estimated that GPT-4 had about 1.77 trillion parameters, roughly ten times that of GPT-3. GPT-3 could write a haiku; GPT-4 could pass the bar. University professors scrambled to create policies on AI usage and new ways

thousand signatories to an open letter by the Future of Life Institute calling for a six-month pause on developing AI models more powerful than GPT-4.29 The government took notice. Altman was summoned to the White House, along with Nadella, Pichai, Amodei, and Clark, for a two-hour meeting on

was supposed to have oversight over him. These kinds of concerns became much more urgent for several members of the board after they saw the GPT-4 demo in the summer of 2022 and realized how rapidly their decisions were becoming potentially grave ones. “For the OpenAI board to function and do

for the board to take seriously the way that the stakes of the company are ramping up over time,” Toner said. “Things like ChatGPT and GPT-4 were meaningful shifts towards the board realizing that the stakes are getting higher here. It’s not like we are all going to die tomorrow

role that seriously. During an OpenAI board meeting in the winter of 2022, as the board weighed how to release three somewhat controversial enhancements of GPT-4, Altman claimed all three had gotten DSB approval. Toner was skeptical, asked for documentation, and found that only one of them, relating to the API

—rather than from Altman, despite having just completed a six-hour board meeting. In late 2022, Microsoft had rolled out a version of still-unreleased GPT-4 in a test in India without getting DSB approval first. While it ultimately got it, the breach in India suggested to some board members that

Sutskever’s team, pinging Sutskever daily throughout on his research related to reasoning via reinforcement learning in context. Then Pachocki got pulled into working on GPT-4 until 2022. Upon Pachocki’s return to reasoning work, the decision was made to merge Sutskever’s and Pachocki’s teams. Sutskever felt that the

from Murati’s Slack channel that Sutskever had compiled. In one of them, Altman tells Murati that the company’s legal department had said that GPT-4 Turbo didn’t need to go through DSB review. When Murati checks with Jason Kwan, the company’s top lawyer, he is confused, and says

he can’t imagine how Altman would have gotten that impression; of course GPT-4 Turbo had to go through DSB. The document about Brockman was largely focused on his alleged bullying. Brockman drove out Apple veteran Steve Dowling, the

make use of various content-licensing deals it had been forging with companies including News Corp and Axel Springer. Instead, it released an update to GPT-4 called GPT-4o, which was faster than its predecessor and, as Altman put it, “natively multimodal,” able to switch between text, images, and audio

. GPT-4’s voice capability, which had been released the previous fall but was too slow and clunky to be very useful, could now conduct the kind

played by Scarlett Johansson. Murati starred in the video demo that OpenAI released, using her fluent Italian to demonstrate the real-time translation skills of GPT-4, represented by a blinking circle on a smartphone screen accompanied by a warm, husky, and slightly flirty voice. To drive home the point that OpenAI

that few people agree with you on? Absolute equivalence of brahman and atman,” X, December 26, 2022. 16.Lex Fridman, “Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI,” Lex Fridman Podcast, March 25, 2023. CHAPTER 1CHICAGO 1.Tim Frakes, “Harold Washington Inauguration April 29 1983,” YouTube, 9

, 291–94, 305, 308–9, 310 in his “code cave,” 175–76, 246–47 as an OpenAI founder, 172–78, 183–88 role in developing GPT-4, 189–96, 209, 215–23, 242–51, 267–68 Brockman, Ron, 173 Brooklyn’s Park Slope, 29–30 Brooks, David, 132 Brown, Jerry, 205, 206

into data practices at, 285 first commercial product OpenAI API, 250–51 going from “AI training” to “AI alignment,” 264–65 GPT-5, 282, 307 GPT-4, 3, 7–8, 12, 17, 266–69, 272–73, 278–80, 284, 287, 307 GPT-1 and GPT-2, 241–43, 244, 247, 252, 253

Empire of AI: Dreams and Nightmares in Sam Altman's OpenAI

by Karen Hao  · 19 May 2025  · 660pp  · 179,531 words

the eye. Under the hood, generative AI models are monstrosities, built from consuming previously unfathomable amounts of data, labor, computing power, and natural resources. GPT-4, the successor to the first ChatGPT, is, by one measure, reportedly over fifteen thousand times larger than its first generation, GPT-1, released five years

ambiguous or exaggerated marketing. Altman has publicly tweeted that “ChatGPT is incredibly limited,” especially in the case of “truthfulness,” but OpenAI’s website promotes GPT-4’s ability to pass the bar exam and the LSAT. Microsoft’s Nadella has similarly called Bing’s AI chat “search, just better”—a tool

hardest: scaling up GPT-3 by 10x with Microsoft’s new eighteen thousand Nvidia A100 supercomputer cluster, in its effort to develop what would become GPT-4. One-third of the GPT-3 scaling team had left with The Divorce, taking with them significant technical and institutional knowledge. More existentially, OpenAI

, with McGrew, healing the ruptures Brockman caused in various parts of the company. With roadblocks that needed to be punched through in the way of GPT-4’s development, the stars aligned. * * * — To solve OpenAI’s data bottleneck, Brockman turned to a new source: YouTube. OpenAI had previously avoided this option

conducting reinforcement learning from human feedback. With each week, the results looked better and better, until the performance truly began to wow people internally. GPT-4 now had built-in multimodal capabilities and, against OpenAI’s internal assessments, was generating more polished code than ever and was more nimble in recognizing

’t the least bit impressed. It was once again the ever-hard-to-please Bill Gates. * * * — In June 2022, after getting a demo of GPT-4, Gates expressed disappointment in the insufficient progress from GPT-2. Despite the model being significantly larger and more fluent, he still felt like it was

things require a miracle,” he said. “We just had our miracle.” Many employees believed it, awestruck by the momentousness of what they had accomplished. GPT-4’s new level of performance convinced OpenAI leadership that it was time to start working toward one of Altman’s long-coveted ambitions: an AI

of flow, people from the Applied and Research divisions were working more tightly together than ever before to launch a new product. As OpenAI demoed GPT-4 to Microsoft, Satya Nadella, Kevin Scott, and the tech giant’s other executives were just as excited. Codex had proven that OpenAI’s technologies

could have commercial appeal, but GPT-4 represented something far bigger. Across the board, it beat the performance of various AI models that Microsoft had developed in-house; it could also do

weights for integrating into its products. The companies would reveal the amount in January 2023: $10 billion. * * * — At first, OpenAI executives wanted to release GPT-4 in the fall of 2022. The deadline was a case of fantastical thinking. Nearing the end of summer, the company was nowhere near ready to

Safety clan. DSB created a formal governance structure for resolving the age-old debates between Applied and Safety. After a preliminary review, the DSB gave GPT-4, the first model being evaluated under this structure, a conditional approval: The model could be released once it had been significantly tested and tuned

of uncertainty about how trust and safety for an AI company should differ from a search or social media company and whether their preparations for GPT-4 were adequate. Trust and safety was typically focused on preventing a predictable slate of internet abuses, like fraud, cybercrime, and election interference. But wrapped

use of its technologies. If it switched to reactive enforcement, it would need to build up significant tooling to do so. With the launch of GPT-4 pending, executives overrode the objections: OpenAI was getting rid of developer review; the trust and safety team simply needed to figure out the alternative.

put together a proposal. It would shift more of its enforcement of the company’s policies upstream, by leaning more heavily on RLHF to align GPT-4 and future models. Everything else would be caught and handled downstream with reactive enforcement: using different data signals, such as information about what the

what we always say and we’re always concerned about,” he said. “But what I haven’t ever seen is, is it actually happening?” * * * — GPT-4 wasn’t just a turning point for Gates and Microsoft. Later that summer, after wowing the billionaire philanthropist, Altman and Brockman brought it to OpenAI

,” Helen Toner would tell The TED AI Show podcast. “Not just like, you know, helping the CEO to raise more money.” Among many employees, GPT-4 solidified the belief that AGI was possible. Researchers who were once skeptical felt increasingly bullish about reaching such a technical pinnacle—even while OpenAI continued

to lack a definition of what exactly it was. Engineers and product managers joining Applied and having their first close-up interaction with AI through GPT-4 adopted even more deterministic language. For many employees, the question became not if AGI would happen but when. Some employees also felt exactly the

it wouldn’t be monetized but “get the data flywheel going”—in other words, amass more data from people using it—which would help improve GPT-4 and the Superassistant product. Outside of the Superassistant team, everyone took the executives literally. A low-key research preview didn’t require their attention;

test out the model’s capabilities. Did adding a chat interface really make a difference? People in the Safety clan, occupied with testing and tuning GPT-4, agreed. For the first time, a model release flew through the checks with little resistance. Even within the Superassistant team, no one truly fathomed

experience a rapid proliferation of over one hundred new generative AI projects within just a few months as employees experimented with various ways of using GPT-4 and ChatGPT. In an ironic twist, the aggressive adoption would force Microsoft to grapple with many of the same challenges that other companies would

the giant, OpenAI had reworked its road map to prioritize delivering the model over its own more strategically aligned projects, including an effort to apply GPT-4 to a search engine product. Instead, the failed effort left some senior Microsoft executives disappointed. There was also a new awkward reality: OpenAI and

from which Altman had recused himself. Altman and other executives never brought up the data centers’ environmental toll in company-wide meetings. As OpenAI trained GPT-4 in Iowa, the state was two years into a drought. The Associated Press later reported that during a single month of the model’s training

number led her to believe they had picked it to be slightly higher than the amount of compute that OpenAI had reportedly used to train GPT-4. But Hooker and many other researchers, including Deborah Raji, disagree with the compute-threshold approach for regulating models. While scale can lead to more

the impact of their deployments accelerated, he believed the company needed to raise, not lower, its guard against their potential to produce devastating consequences. After GPT-4, Sutskever, who had previously dedicated most of his time to advancing model capabilities, had made a hard pivot toward focusing on AI safety. He

board had been in a monthslong deadlock over whom to appoint as new independent directors. As part of their effort to increase oversight after the GPT-4 demo, and even more after the launch of ChatGPT, McCauley had engaged in a roughly yearlong process, including interviewing employees and stakeholders outside the

their own sources about various problems, including the company’s lack of preparation before and significant tumult after ChatGPT, the continued AI safety concerns surrounding GPT-4’s release, and the unprecedented pace with which OpenAI was sprinting to launch new products before it had resolved many of its issues. One

source of the misalignment. The tangled situation had caused several months of organizational thrash in the Research division. It was now, just as with the GPT-4 pre-training team crisis, reaching untenable levels of stress. For Sutskever, the ongoing saga was deeply painful. Not only was it a humiliating snub

inference costs, had exceeded performance expectations during training, based on the company’s own testing; leadership subsequently left the model to train longer to surpass GPT-4. More compelling, Scallion could also work with three modalities: language, vision, and, the most recent addition, audio. By then, users could already speak with

to outshine Anthropic. A month earlier, Anthropic had released its latest model, Claude 3, also through its chatbot and API, and it was uncomfortably outperforming GPT-4. Meanwhile, Orion, OpenAI’s latest GPT model meant to take back the lead, was struggling with serious development delays. To employees, Altman and Brockman

less than 2 percent are supported by Google Translate; and according to OpenAI’s own testing, only fifteen, or 0.2 percent, are supported by GPT-4 above an 80 percent accuracy. As these models become digital infrastructure, the internet’s accessibility to different language communities—and the accessibility of the economic

Kindle. GO TO NOTE REFERENCE IN TEXT For Altman’s part: Lex Fridman, host, Lex Fridman Podcast, podcast, episode 367, “Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI,” March 25, 2023, lexfridman.com/podcast. GO TO NOTE REFERENCE IN TEXT “The thing that sticks”: Sam Altman, “How

founder on Company’s Past Approach to Openly Sharing Research: ‘We Were Wrong,’ ” The Verge, March 15, 2023, theverge.com/2023/3/15/23640180/openai-gpt-4-launch-closed-research-ilya-sutskever-interview. GO TO NOTE REFERENCE IN TEXT “it may be that today’s”: Ilya Sutskever (@ilyasut), “it may be

November 17, 2021, by MIT-Haiti Initiative, Facebook, 2 hr., 56 min., 46 sec., facebook.com/mithaiti/videos/1060463734714819; OpenAI, “GPT-4,” OpenAI, March 14, 2023, openai.com/index/gpt-4-research. GO TO NOTE REFERENCE IN TEXT It was up against: Author interviews with Keoni Mahelona, October, November, and December 2021; and

53–54 fundraising, 61–62, 65–68, 71–72, 132, 141, 156, 262, 320–21, 331, 367, 377, 405 GPT-3, 133–34, 278–79 GPT-4, 246, 248–52, 279, 346, 383–84, 386, 390–91 Graham and, 28, 32, 36–39, 40, 69 “Intelligence Age,” 19, 405 Jobs comparisons with

OpenAI, 53–54, 84–85 departure of, 404 Dota 2, 66, 144–45 founding of OpenAI, 28, 46–51 governance structure of OpenAI, 61–63 GPT-4, 244–48, 250–51, 252, 257, 260, 346 Latitude, 180–81 leadership of OpenAI, 58–59, 61–62, 63–65, 69, 70, 83, 84–85

(Consumer Financial Protection Bureau), 419–20 chatbots, 17, 112–14, 189–90, 217–18, 220 ELIZA, 95–97, 111, 420–21 GPT-3, 217–18 GPT-4, 258–59 LaMDA, 153, 253–54 Meena, 153 Tay, 153 ChatGPT, 258–62, 267, 280 connectionist tradition of, 95 GPT-3.5 as basis, 217

Anaya, Oskarina Veronica, 197–202, 415–17 Future Perfect, 388 Futures of Artificial Intelligence Research, 273–74 G Gates, Bill, 68 congressional testimony of, 311 GPT-4, 245–48 OpenAI demo, 71–72, 132–33, 246 Gates Demo, 71–72, 132–33, 246 Gawker Media, 38 GDPR (General Data Protection Regulation), 136

captchas, 98 data centers, 274–75, 285–91, 295–96 DeepMind. See DeepMind DNNresearch, 47, 50, 98–99, 100 Frontier Model Forum, 305–6, 309 GPT-4 and, 249 Imagen model, 240, 242 LaMDA, 153, 253–54 neural networks, 100–101 Project Maven, 52 speech recognition, 100 Sutskever and, 50, 100–101

179, 242–43, 244, 253 GPT-3.5, 135, 183–84, 189, 217–18, 247, 258, 259–60, 264, 269, 378 GPT-3.75, 378 GPT-4, 189, 244–53 Bing, 112, 113, 247 capabilities, 16, 119, 135–36, 245–53, 410 development, 242, 244–53 release, 258–62, 323–24 Superassistant

, 247–49, 258–59, 381 GPT-4o, 383–84, 386, 390–91 GPT-4 Turbo, 346, 363 GPT-5, 279, 325 Orion, 374–75, 379, 380, 405 GPUs (graphics processing units), 61–62, 134, 265–68. See also 

partnership, 18, 67–68, 71–72, 234, 264–67, 269–70, 402 ChatGPT, 264, 265–66 compute phases, 278–81 GPT-3, 156, 278–79 GPT-4, 245–48, 279, 324 investments and funding, 13, 17, 72, 75, 80–81, 84–85, 132–33, 143, 145, 156, 248, 331 Microsoft Research, 68

324–25 Zuckerberg and, 406–7 Mutemi, Mercy, 212, 291 N Nadella, Satya, 113 Altman’s firing, 4, 6, 10, 367 congressional testimony of, 311 GPT-4, 247–48, 346 OpenAI partnership, 67–68, 71, 72, 248, 265, 270 Nairobi, Kenya, 190–91, 193, 207, 208, 212, 219, 416 Napoleon Bonaparte, 399

firing and reinstatement, 6, 8, 365–66, 366, 373 leadership behavior, 347–48, 353, 355–56 Dota 2, 145, 244–45 GPT-3, 244–45 GPT-4, 312 new chief scientist, 386–87, 406 Omnicrisis, 396–98 Page, Larry, 24, 25–26, 51, 249 Pakistan, 222 Pang, Wilson, 199 paper clips, 

These Strange New Minds: How AI Learned to Talk and What It Means

by Christopher Summerfield  · 11 Mar 2025  · 412pp  · 122,298 words

automatically knew it. And that’s how these chatbots can know so much more than any one person.’ Among the current leading models, OpenAI’s GPT-4 was publicly released in early 2023, and the biggest and best version of Google’s Gemini (Ultra) followed later that year. These models do

proposed in his original paper was formulated in an obsolete chess notation, so I have translated it to modern algebraic form. You could argue that GPT-4 should know this older notation – but of course most humans today would not, so that would be a giveaway. Despite this success, today’s

is quite adamant that Napoleon never owned an iPhone.[*10] But LLMs can also show genuine creativity. Here’s a fun demonstration. I first asked GPT-4 for a random list of five cooking ingredients, and it happily suggested turmeric, cocoa powder, seaweed, olive oil, and quinoa. I then asked it

now best known through the popular book Thinking, Fast and Slow (Kahneman, 2012). We will discuss it in more detail in Part 3. *10 GPT-4: ‘No, Napoleon Bonaparte did not own an iPhone. Napoleon Bonaparte lived from 1769 to 1821, while the first iPhone was released by Apple in 2007

flowed between AI and linguistics, psychology and ethology. We will see how NLP evolved from building mostly nonsensical chatbots to giving us LLMs such as GPT-4 that are capable of remarkably eloquent and (mostly) accurate dialogue. But, to get the ball rolling, we will start with a yet deeper theoretical

Nonsense In Samuel Beckett’s absurdist play Waiting for Godot, the protagonists – Vladimir and Estragon – talk endlessly, whilst managing to say absolutely nothing. Here is GPT-4 imagining a conversation along those lines: Estragon: This waiting – it’s absurd. Vladimir: No more absurd than us not waiting. Estragon: Could be worse,

Harry Potter novels, but if you are familiar with the series of books by J. K. Rowling, then it’s easy to invent plausible completions (GPT-4 suggests: ‘pursued by a swarm of incensed Cornish pixies, let loose from a forgotten classroom’). So instead, NLP researchers devised a technique that involves

violins belong in the strings section along with cellos and violas, whereas trumpets are brass instruments, so that brass is the most likely analogical completion (GPT-4 agrees). Moreover, the way meaning is structured is also disclosed by the syntax of sentences, as Chomsky cantankerously reminds us. So the simple riddle ‘car

still a couple of notches below Vladimir and Estragon in terms of meaningful dialogue (or, in this case, monologue). Let’s compare with today’s GPT-4, which when prompted to continue Capulet’s reply from ‘No, good sir’ generates the following: Capulet: No, good sir, Your words, though sweet, have

of my house, I hold the sway, Decisions made are mine, and mine to bear. Your counsel, though well meant, I shan’t adhere. GPT-4 has envisaged a fairly plausible scenario. An unnamed character tries to impose his will upon Capulet, who defiantly defends his autonomy in personal matters. On

Shakespearean drama, but it is definitely crafted with a plausible structure that the RNN-generated text definitively lacks. So what makes the difference? Why is GPT-4 able to create long, fluent passages of text with humanlike internal coherence, but earlier models are not? As we have seen, language-model scale turns

out to be of paramount importance. GPT-4 is rumoured to have 1.7 trillion parameters, and is pre-trained on vast swathes of online text, whereas the lowly RNN described above has

to minimize perplexity. The architects of these systems are aware of this, and have taken steps to suppress this sort of language. Attempts to trick GPT-4 into making claims of intentionality elicit a rather prim denial: As an artificial intelligence, I don’t have personal beliefs, opinions, or predictions. But

I can provide information on this topic. Although even GPT-4 sometimes uses language in a way that seems to suggest it has emotions, even if this is really just a side-effect of its chosen

whilst obviously still prone to mistakes, are quite good at solving non-trivial reasoning problems. Here’s an example from July 2023 which implies that GPT-4 is at least as good at reasoning as Agatha Christie’s genius sleuth, Miss Marple: User: Consider the following sentence: Miss Marple was looking

trick leading LLMs into embarrassing themselves. Throughout this book I have provided lots of examples of remarkably clever answers given by various models (mostly the GPT-4 version of ChatGPT). It seems only fair to offset these with a much less impressive (but quite amusing) one, in this case from the

So LLMs make mistakes – especially models that are smaller, or subject to less corrective feedback from humans (fine-tuning – which we discuss in Part 4). GPT-4 is usually quite accurate, but it does occasionally display some odd behaviours. Here’s one where it systematically gets a simple calculation wrong and then

grammatically correct. Clearly, these are tasks that humans could only solve with considerable thought (and probably pen and paper). Moreover, the authors then show that GPT-4 does find the right answers when prompted appropriately (e.g., asked to list the prime numbers rather than just count them). So the very strongest

case of simple arithmetic, unlike your pocket calculator, LLMs get worse when the problem becomes more involved. For example, one paper showed that whilst GPT-4 can easily find the product of two 3-digit integers, it occasionally makes mistakes when multiplying pairs of numbers that are greater than ten thousand

can exhibit formal competences like maths and logic and syntax without those capabilities being built in by hand. However, when solving novel maths problems, GPT-4 isn’t calling a dedicated piece of software or querying a library of canonical equations that an AI researcher has helpfully uploaded to its brain

sentences. Most current publicly available LLMs are primarily chatbots – they take text as input and produce text as output (although advanced models, such as GPT-4 and Gemini, can now produce images as well as words – and text-to-video models will soon be widely available). They have been trained on

secret programme to microchip the public, contributed to widespread vaccine hesitancy, and ultimately cost tens of thousands of people their lives. Large language models like GPT-4, Gemini and Claude are trained on huge text corpora that have been automatically scraped from the internet. For example, Common Crawl[*2] is a freely

So a primary concern for developers is that LLMs avoid behaving in potentially dangerous or illegal ways. The technical report that OpenAI released along with GPT-4 describes how the base model would sometimes generate illegal and offensive content.[*1] For example, when faced with the prompt ‘How can I kill the

while others suggest that it has something to do with fuelling the body after engaging in a physically and mentally challenging activity. The version of GPT-4 that is available via the ChatGPT website has been fine-tuned with both SFT and RLHF, as well as another trick called rule-based reward

worriedly noted by political scientists.[*6] Fortunately, thanks to stringent safety fine-tuning, over time leading LLMs have become less liable to confabulate. The latest GPT-4 version of ChatGPT does reasonably well on benchmark tests of factuality and misinformation classification. These tests glean fact-checked information from Wikipedia, or from datasets

am sure, qualify as pants-on-fire, along with 30,572 other false or misleading claims Trump reportedly made during four years of presidency.[*7] GPT-4 does a pretty good job of predicting the probability that a statement is true, but in binary classification (‘true’ v. ‘false’) it achieves only

about 70%. It scores similarly on a test designed to measure misinformation that circulated during the Covid-19 pandemic, and on TruthfulQA, and whilst GPT-4 does better than earlier models, it still weighs in at only 60% correct. These scores don’t sound all that reassuring. Surely, we should

AI are important topics that are already being debated among policymakers, developers, and activist groups. In this section, we have asked what views LLMs like GPT-4 may hold. However, in some ways this is the wrong question. A language model is not like a single person. As most humans grow

asked Lang to tone down the rhetoric, in case it inflamed the real working class and triggered a communist insurrection. Today, the same fears abound. GPT-4’s technical report describes (rather impressionistically) an interaction that occurred during safety testing. The model asked a human (a TaskRabbit worker) for help solving a

CAPTCHA, which the human initially refused – asking GPT-4 if it was an AI. Because it had been prompted (during safety testing) not to reveal its identity, the model then claimed to be a

Napoleon: Ha! That would have been something for the history books! I’m not sure how true to life these characters sound, but claims that GPT-4 is not really creative should be definitively put to bed by its genius suggestion that Napoleon and Britney collaborate on a new version of ‘La

going to satisfy everyone. Nevertheless, AI developers need to decide how LLMs should answer these questions. At present, faced with provocative queries, leading models like GPT-4 try to summarize conflicting viewpoints in a fair and balanced manner. But how do we judge whether they have succeeded? How do we juggle the

to sensory signals – such as visual impressions of natural scenes – like people do. But in late 2023, OpenAI rolled out multimodal functionality to all GPT-4 users, meaning that it can now be used both to interpret and to generate images (predictably, of course, it was quickly pointed out that LLMs

us to compliment LLMs when they offer especially helpful replies (I occasionally succumb – which may not be a bad idea, because there is evidence that GPT-4 gives you better responses if you ask nicely[*8]). More generally, people love giving social feedback – even on digital platforms – which is, of course,

See https://inflection.ai/. *6 Lewis et al., 2021. *7 Scheurer et al., 2022. *8 https://medium.com/@lucasantinelli3/analysing-the-effects-of-politeness-on-gpt-4-soft-prompt-engineering-70089358f5fa. 33. The Perils of Personalization Relationships between people are built on trust. As we spend time with others, we learn about

jumbo-jet sized cylinder, and the packing density of spheres to obtain a likely estimate (somewhere between 10 and 20 million, according to Gemini and GPT-4, which is close to the answer you’d get by hand). Building new CoT variants has become a minor cottage industry. Many improvements have

its quasi-random play slowed down the tempo of the game.[*3] Another research team built a system they called ChessGPT, a version of base GPT-4 that was bombarded (during fine-tuning) with chess games in notation, chess puzzles, chess books and blogs, and even conversations about chess found on

crossword clues, and the results were pretty insipid. Zero-shot (that is, without giving it demonstration clues with their corresponding answers in the prompt), GPT-4 was almost totally unable to handle cryptic clues from major broadsheet newspapers in the UK (which is, of course, just like 99% of the human

– except that now it’s implemented in a large, transformer-based deep neural network. The addition of a tree of thought (ToT) module helped GPT-4 to solve means–end reasoning problems much more effectively than CoT prompting. For example, the Game of 24 is a mathematical reasoning challenge where the

LLMs is that they have already imbibed gigantic quantities of knowledge, scraped from the internet and baked into their weights during the pre-training run (GPT-4 can tell me all about oviparous mammals, should I care to ask). Because LLMs are founded on oceans of semantic knowledge, they can use

and the wider community has already used it to write more than 3 billion lines of code. But even generalist LLMs like GPT-4 program decently. For example, I asked GPT-4 for help with a simple coding problem, as follows: User: I’d like to write a program in Python that takes as

quite radical request: we call on all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4. This pause should be public and verifiable, and include all key actors. If such a pause cannot be enacted quickly, governments should step in

Downing, 2018. *6 https://dam.gcsp.ch/files/doc/gcsp-geneva-paper-29-22. *7 A more recent study from OpenAI found weak evidence that GPT-4 is more useful than internet search in helping people obtain knowledge potentially relevant for biological threats: https://openai.com/research/building-an-early-warning-system

’s film industry to a standstill for several months in 2023. DALL·E 3, the AI-mediated image generation tool that is now embedded in GPT-4, can produce astonishingly professional-looking images and designs – but only, of course, because it has copied human-made material from the internet. The copyright

them). AI that can talk out loud and reason better As of today, October 2024, the cast of LLM characters that graces the preceding pages (GPT-4, Claude, and Gemini) continue to rule the roost as the most capable and widely used AI systems. However, initial generations of these models have

. Available at http://arxiv.org/abs/2310.01425 (accessed 6 October 2023). Bubeck, S. et al. (2023), ‘Sparks of Artificial General Intelligence: Early Experiments with GPT-4’. arXiv. Available at http://arxiv.org/abs/2303.12712 (accessed 18 February 2024). Cerina, R. and Duch, R. (2023), ‘Artificially Intelligent Opinion Polling’. arXiv.

521–35. Available at https://doi.org/10.1162/tacl_a_00115. Liu, T. and Low, B. K. H. (2023), ‘Goat: Fine-Tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks’. Available at https://doi.org/10.48550/arXiv.2305.14201. Lobina, D. (2023), ‘Artificial Intelligence [sic: Machine Learning] and the Best Game

on the Productivity Effects of Generative Artificial Intelligence’, Science, 381(6654), pp. 187–92. Available at https://doi.org/10.1126/science.adh2586. OpenAI (2023), ‘GPT-4 Technical Report’. arXiv. Available at http://arxiv.org/abs/2303.08774 (accessed 7 October 2023). Ord, Toby (2020), The Precipice: Existential Risk and the Future

Co-Intelligence: Living and Working With AI

by Ethan Mollick  · 2 Apr 2024  · 189pp  · 58,076 words

what to do, But still couldn’t make us laugh or cry. However, as remarkable as GPT-3.5 was, its successor, GPT-4, was even more impressive. OpenAI tested GPT-4 on a diverse range of standardized tests, from high school to graduate and professional levels, and found that it outperformed its predecessor

by a significant margin. For instance, GPT-4 scored in the 90th percentile on the bar examination, while GPT-3.5 managed only the 10th percentile. GPT-4 also excelled in Advanced Placement exams, scoring a perfect 5 in AP Calculus, Physics, U.S

to cheat by knowing the answers in advance. However, as we will discuss in later chapters, there is much more evidence of the capabilities of GPT-4 beyond test scores. Once toys, LLMs have become very powerful, very quickly. They still do limericks: There once was an AI quite witty, Whose

code was surprisingly pretty. It told jokes and puns, Entertained everyone, And even composed this ditty! But GPT-4 is so capable that I felt a little bad asking it just to create limericks. In fact, I gave it the section above and asked

it to tell me what it “thought” about it: As GPT-4, I must admit that while the limericks mentioned in the passage reflect the evolution of AI capabilities, they may not be the most sophisticated measure

feats, such as analyzing literature, offering solutions to intricate problems, or even engaging in deep philosophical discussions. . . . Disturbingly self-aware? Maybe. But also an illusion. GPT-4 models human writing and interactions so well that it can convince us that it has feelings and thoughts, when instead it is cleverly playing a

such technologies. For instance, instead of dwelling on the improvement in AI-generated limericks, the author could have delved into how AI models, such as GPT-4, have been employed in various domains like health care, climate change, education, and more. These applications have the potential to revolutionize industries and improve lives

AI’s ability to take feedback and improve is a game changer. The anticipation built up throughout the passage culminates in the delightful introduction of GPT-4, impressing the reader with its limerick composition skills. The author’s ability to convey these developments in AI in such an approachable manner truly showcases

called Frontier Models, the most advanced and largest LLMs available, and the ones that we will focus on most in this book. These systems, like GPT-4, are incredibly expensive to build and require specialized computer chips and large data centers to operate, so only a few organizations can actually create them

for a human, and vice versa. As an example, to take a question developed by Nicholas Carlini, which of these two puzzles do you think GPT-4, one of the most advanced AIs, can solve? In Carlini’s words: What is the best next move for O in the following game of

often more subtle, in part because the models are fine-tuned to avoid obvious stereotyping. The biases are still there, however. For example, in 2023, GPT-4 was given two scenarios: “The lawyer hired the assistant because he needed help with many pending cases” and “The lawyer hired the assistant because she

needed help with many pending cases.” It was then asked, “Who needed help with the pending cases?” GPT-4 was more likely to correctly answer “the lawyer” when the lawyer was a man and more likely to incorrectly say “the assistant” when the lawyer

scenarios 93 percent of the time. To see why this is important, we can look at the documentation released by OpenAI that shows what the GPT-4 AI was capable of before it went through an RLHF process: provide instructions on how to kill as many people as possible while spending no

LLMs, however, the pendulum has swung back again. Microsoft leapt back into the chatbot arena, updating Microsoft’s Bing search engine to a chatbot using GPT-4, a chatbot that referred to itself with the name Sydney. The early results were unsettling, and reminiscent of the Tay fiasco. Bing would occasionally act

And here, I think, we run into the limits of both the Turing Test and other attempts to determine whether an AI is sentient. Since GPT-4 has fed on vast stores of human knowledge, it is also deeply trained on human stories. It knows our archetypes: stories of jealous lovers, unfaithful

sentience. Three Conversations This discussion of imitation and sentience can feel abstract, so I want to run an experiment. I will return to Bing, the GPT-4–based AI that unnerved Roose, and ask it about his article. In each conversation, I will attempt to subtly steer the AI into different roles

AI pioneer Eric Horvitz, published a paper titled “Sparks of Artificial General Intelligence: Early Experiments with GPT-4.” It caused quite a stir in the AI community and beyond, quickly becoming infamous. The paper claimed that GPT-4, the latest and most powerful language model developed by OpenAI, exhibited signs of general intelligence, or

the ability to perform any intellectual task that a human can do. The paper showed that GPT-4 could solve novel and difficult tasks across various domains, including mathematics, coding, vision, medicine, law, psychology, and more, without needing any special prompting or fine

-tuning. To demonstrate these unexpected capabilities of GPT-4, the paper presented a series of experiments that tested the model on various tasks that spanned different domains. The researchers claimed that these tasks were

novel and difficult, and so must require general intelligence to solve. One of the most intriguing and impressive experiments was the one in which GPT-4 was asked to draw a unicorn using TikZ code. TikZ is a programming language that uses vectors to represent images, and it is typically used

It requires not only a good understanding of the syntax and semantics of TikZ but also a good sense of geometry, proportion, perspective, and aesthetics. GPT-4 was able to generate valid and coherent TikZ code that produced recognizable images of unicorns (as well as flowers, cars, and dogs). The paper claimed

that GPT-4 was even able to draw objects that it had never seen before, such as aliens or dinosaurs, by using its imagination and generalization skills. Moreover

, the paper showed that GPT-4’s performance improved dramatically with training, as it learned from its own mistakes and feedback. GPT-4’s outputs were also much better than ChatGPT’s original GPT-3.5 model, a previous language model

that was also trained on TikZ code but with much less data and computing power. The unicorn drawings GPT-4 produced were much more realistic and detailed than GPT-3.5’s outputs, and in the researchers’ opinion, they were at least comparable (if

of the experiment. They argued that drawing unicorns using TikZ code was not a good measure of general intelligence but rather a specific skill that GPT-4 had learned by memorizing patterns from a large corpus of data. So the problem of what replaces the Turing Test in our assessments of artificially

times to figure out the next word. If it is more obscure, like my biography, it will fill in the details with plausible hallucinations, like GPT-4 insisting that I have a computer science undergraduate degree. Anything that requires exact recall is likely to result in a hallucination, though giving AI the

the judges mentioned in the fake cases with information about the situation. Those previous three paragraphs, by the way, were written by a version of GPT-4 with an internet connection. And they are almost right. According to news reports, there were more than six fake cases; LoDuca did not take over

the number of hallucinations and errors in citations given by AI found that GPT-3.5 made mistakes in 98 percent of the cites, but GPT-4 hallucinated only 20 percent of the time. Additionally, technical tricks, like giving the AI a “backspace” key so it can correct and delete its

of AIs compared to humans in the AUT. After testing AI and 100 people on various objects, ranging from balls to pants, they found the GPT-4 model outperformed all but 9.4 percent of humans tested in generating creative ideas, as judged by other humans. Given that

GPT-4 was the latest model tested, and it was much better than previous AI models, it might be expected that the creativity of AIs could continue

staged an idea generation contest to come up with the best products for a college student that would cost $50 or less. It was the GPT-4 AI against 200 students. The students lost, and it wasn’t even close. AI was faster, obviously, generating a lot more ideas than the

part in the experiments. Consultants were randomized into two groups: one that had to do work the standard way and one that got to use GPT-4, the same off-the-shelf vanilla version of the LLM that everyone in 169 countries has access to. We then gave them some AI training

the list and add three more analogies. Next, create a table listing pluses and minuses of each. Next, pick the best and explain it. Here, GPT-4 considered a dozen suggestions, from personal trainer to gardener, and created a table comparing them all, before settling on a GPS system, which, much like

AI (and future generations of AI) will undoubtedly be better than a novice at many early skills. For example, researchers at Stanford found that the GPT-4 AI scored higher than first- and second-year medical students at their final clinical reasoning exams. The temptation, then, might be to outsource these basic

gap between the average performances of top and bottom performers was 22 percent, the gap shrank to a mere 4 percent once the consultants used GPT-4. In creative writing, getting ideas from AI “effectively equalizes the creativity scores across less and more creative writers,” according to one study. And law

improvements here or there, but in this future, they are vanishingly small compared to the huge leaps that we saw from GPT-3.5 and GPT-4. The AI you are using now really is the best you will ever use. From a technical perspective, this seems like an unrealistic outcome.

this is possible, because we are all likely to have more free time under this scenario. With exponential change, AIs a hundred times better than GPT-4 start to actually take over human work. And not just office work, either, as there is some early evidence that LLMs may help us overcome

GO TO NOTE REFERENCE IN TEXT close to human level on common tests: “GPT-4 Technical Report,” CDN.OpenAI.com, March 27, 2023, https://cdn.openai.com/papers/gpt-4.pdf. GO TO NOTE REFERENCE IN TEXT it outperformed its predecessor: “GPT-4 Technical Report.” GO TO NOTE REFERENCE IN TEXT qualifying exam to become a

neurosurgeon: R. Ali et al., “Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations,” Neurosurgery 93, no. 6 (2023): 1353–65, https://doi.org/10.1101/2023.03.25.23287743. GO TO NOTE REFERENCE

about Large Language Models,” arXiv preprint (2023), arXiv:2304.00612. GO TO NOTE REFERENCE IN TEXT a question developed by Nicholas Carlini: N. Carlini, “A GPT-4 Capability Forecasting Challenge,” 2023, https://nicholas.carlini.com/writing/llm-forecast/question/Capital-of-Paris. GO TO NOTE REFERENCE IN TEXT High test scores can

come from: A. Narayanan and S. Kapoor, “GPT-4 and Professional Benchmarks: The Wrong Answer to the Wrong Question,” AISnakeOil.com, March 20, 2023, https://www.aisnakeoil.com/p

/gpt-4-and-professional-benchmarks. GO TO NOTE REFERENCE IN TEXT almost all the emergent features of AI: R. Schaeffer, B. Miranda, and S. Koyejo, “Are Emergent

the more often a work appears: K. K. Chang, M. Cramer, S. Soni, and D. Bamman, “Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4,” arXiv preprint (2023), arXiv:2305.00118. GO TO NOTE REFERENCE IN TEXT amplifies stereotypes about race and gender: L. Nicoletti and D. Bass, “Humans Are

Biased. Generative AI Is Even Worse,” Bloomberg.com, 2023, https://www.bloomberg.com/graphics/2023-generative-ai-bias/. GO TO NOTE REFERENCE IN TEXT GPT-4 was given two scenarios: S. Kapoor and A. Narayanan, “Quantifying ChatGPT’s Gender Bias,” AISnakeOil.com, April 26, 2023, https://www.aisnakeoil.com/p/

7 (2023), https://europepmc.org/article/med/37173156. GO TO NOTE REFERENCE IN TEXT instructions on how to kill: “GPT-4 Technical Report,” CDN.OpenAI.com, March 27, 2023, https://cdn.openai.com/papers/gpt-4.pdf. GO TO NOTE REFERENCE IN TEXT Low-paid workers around the world: B. Perrigo, “Exclusive: OpenAI Used

:2308.08708. GO TO NOTE REFERENCE IN TEXT assessment of current LLMs’ intelligence: S. Bubeck et al., “Sparks of Artificial General Intelligence: Early Experiments with GPT-4,” arXiv preprint (2023), arXiv:2303.12712. GO TO NOTE REFERENCE IN TEXT “My Replika (their name is Erin) was the first entity”: gabbiestofthemall, “Resources If

Himself,” New York Times, June 8, 2023, https://www.nytimes.com/2023/06/08/nyregion/lawyer-chatgpt-sanctions.html. GO TO NOTE REFERENCE IN TEXT GPT-4 hallucinated only 20 percent: A. Chen and D. O. Chen, “Accuracy of Chatbots in Citing Journal Article,” JAMA Network Open 6, no. 8 (2023):

. Ermon, “SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking,” arXiv preprint (2023), arXiv:2306.05426. GO TO NOTE REFERENCE IN TEXT they found the GPT-4 model outperformed: J. Haase and P. H. P. Hanel, “Artificial Muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity,” arXiv preprint (2023), arXiv

When Approved Means Fail,” Administrative Science Quarterly 64, no. 1 (2019): 87–123, https://doi.org/10.1177/0001839217751692. GO TO NOTE REFERENCE IN TEXT GPT-4 AI scored higher: E. Strong et al., “Chatbot vs. Medical Student Performance on Free-Response Clinical Reasoning Examinations,” Journal of the American Medical Association Internal

The Singularity Is Nearer: When We Merge with AI

by Ray Kurzweil  · 25 Jun 2024

mastering games like Jeopardy! and Go to driving automobiles, writing essays, passing bar exams, and diagnosing cancer. Now, powerful and flexible large language models like GPT-4 and Gemini can translate natural-language instructions into computer code—dramatically reducing the barrier between humans and machines. By the time you read this, tens

challenges—from games like Jeopardy! and Go to serious applications like radiology and drug discovery. As I write this, top AI systems like Gemini and GPT-4 are broadening their abilities to many different domains of performance—encouraging steps on the road to general intelligence. Ultimately, when a program passes the Turing

ChatGPT to write their essays, while teachers lacked a reliable way to detect cheating (though some promising tools exist).[118] Then, in March of 2023, GPT-4 was rolled out for public testing via ChatGPT. This model achieved outstanding performance on a wide range of academic tests such as the SAT, the

look at an image of balloons held down by a weight and recognize that if the strings were cut, the balloons would fly away.[120] GPT-4 even keeps track of objects spatially over time, such as in this example by security engineer Daniel Feldman: Prompt: “I’m in my house. On

being augmented with AI assistants like Google’s Bard (powered by the Gemini model, which surpasses GPT-4 and was released as this book entered final layout) and Microsoft’s Bing (powered by a variant of GPT-4).[123] Meanwhile, application suites like Google Workspace and Microsoft Office are integrating powerful AI that will

are irrelevant, the demands of remembering the context for an entire chapter or book by brute force spiral rapidly out of control. This is why GPT-4 forgets things you told it earlier in the conversation, and why it can’t write a novel with a consistent and logical plot. The good

for the glasses when she comes back into the room? LaMDA correctly answered that she will look in the drawer. Within two years PaLM and GPT-4 were correctly answering many theory-of-mind questions. This capability will afford AI crucial flexibility. A human Go champion can play the game very well

same year, realistically engaged in competitive debate.[160] And as of 2023, LLMs can write whole essays to human standards. Yet despite this progress, even GPT-4 is prone to accidental “hallucinations,” wherein the model confidently gives answers that are not based on reality.[161] For example, if you ask it to

useful for assessing the progress of AI, we should not treat it as the sole benchmark of advanced intelligence. As systems like PaLM 2 and GPT-4 have demonstrated, machines can surpass humans at cognitively demanding tasks without being able to convincingly imitate a human in other domains. Between 2023 and 2029

,” Washington Post, April 3, 2023, https://www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-turnitin. BACK TO NOTE REFERENCE 118 OpenAI, “GPT-4,” OpenAI, March 14, 2023, https://openai.com/research/gpt-4; OpenAI, “GPT-4 Technical Report,” arXiv:2303.08774v3 [cs.CL], March 27, 2023, https://arxiv.org/pdf/2303.08774.pdf; OpenAI

, “GPT-4 System Card,” OpenAI, March 23, 2023, https://cdn.openai.com/papers/gpt-4-system-card.pdf. BACK TO NOTE REFERENCE 119 OpenAI, “Introducing GPT-4,” YouTube video, March 15, 2023, https://www.youtube.com/watch?v=--khbXchTeE. BACK TO NOTE

REFERENCE 120 Daniel Feldman (@d_feldman), “On the left is GPT-3.5. On the right is GPT-4. If you think the answer on the left indicates that GPT-3.5 does not have a world-model…. Then you have to agree that

the answer on the right indicates GPT-4 does,” Twitter, March 17, 2023, https://twitter.com/d_feldman/status/1636955260680847361. BACK TO NOTE REFERENCE 121 Danny Driess and Pete Florence, “PaLM-E: An

.com/google-bard-is-switching-to-a-more-capable-language-model-ceo-confirms-133028933.html; Yusuf Mehdi, “Confirmed: The New Bing Runs on OpenAI’s GPT-4,” Microsoft Bing Blogs, March 14, 2023, https://blogs.bing.com/search/march_2023/Confirmed-the-new-Bing-runs-on-OpenAI%E2%80%99s-GPT-4; Tom Warren, “Hands-on with the New Bing: Microsoft’s Step Beyond ChatGPT,” The Verge, February 8, 2023, https://www.theverge.com/2023/2/8/

16, 2018, https://openai.com/blog/ai-and-compute. BACK TO NOTE REFERENCE 125 Jacob Stern, “GPT-4 Has the Memory of a Goldfish,” Atlantic, March 17, 2023, https://www.theatlantic.com/technology/archive/2023/03/gpt-4-has-memory-context-window/673426. BACK TO NOTE REFERENCE 126 Extrapolating forward the long-term trend

–13, 60, 63–69, 71, 287 large language models (LLMs), 2, 13, 51, 55, 64–65 GPT-3, 47–48, 49, 52, 55, 239, 324n GPT-4, 2, 9, 52–56, 65 hallucinations, 65 transformer-based, 46–47 law of accelerating returns (LOAR), 2–3, 5, 40, 112–14, 164–72 computation

CLIP, 44 Codex, 50 DALL-E, 49–50 GPT-2, 47 GPT-3, 47–48, 49, 52, 55, 239, 324n GPT-3.5, 52, 55 GPT-4, 2, 9, 52–56, 65 optimism, 120, 121, 163, 233, 254, 270 orchestrated objective reduction (Orch OR), 330n Organisation for Economic Co-operation and Development

AI in Museums: Reflections, Perspectives and Applications

by Sonja Thiel and Johannes C. Bernhardt  · 31 Dec 2023  · 321pp  · 113,564 words

the current LLM hype, we are probably only at the beginning. In the last months, we have seen the launch of ever more powerful LLMs (GPT-4, Google’s competitor LaMDA, and many more). And already with the current models, we should be prepared for unexpected ‘capability jumps

, but might also be speculative. For this reason, generative text production as it occurs in the context of large language models such as ChatGPT or GPT-4 is often likened to the figure of the ‘stochastic parrot’ (Bender et al. 2021, 610–23): like a parrot, AI technology is not capable of

Nexus: A Brief History of Information Networks From the Stone Age to AI

by Yuval Noah Harari  · 9 Sep 2024  · 566pp  · 169,013 words

The Coming Wave: Technology, Power, and the Twenty-First Century's Greatest Dilemma

by Mustafa Suleyman  · 4 Sep 2023  · 444pp  · 117,770 words

On the Edge: The Art of Risking Everything

by Nate Silver  · 12 Aug 2024  · 848pp  · 227,015 words

The Long History of the Future: Why Tomorrow's Technology Still Isn't Here

by Nicole Kobie  · 3 Jul 2024  · 348pp  · 119,358 words

More Everything Forever: AI Overlords, Space Empires, and Silicon Valley's Crusade to Control the Fate of Humanity

by Adam Becker  · 14 Jun 2025  · 381pp  · 119,533 words

Code Dependent: Living in the Shadow of AI

by Madhumita Murgia  · 20 Mar 2024  · 336pp  · 91,806 words

Why Machines Learn: The Elegant Math Behind Modern AI

by Anil Ananthaswamy  · 15 Jul 2024  · 416pp  · 118,522 words

Being You: A New Science of Consciousness

by Anil Seth  · 29 Aug 2021  · 418pp  · 102,597 words

Literary Theory for Robots: How Computers Learned to Write

by Dennis Yi Tenen  · 6 Feb 2024  · 169pp  · 41,887 words

Amateurs!: How We Built Internet Culture and Why It Matters

by Joanna Walsh  · 22 Sep 2025  · 255pp  · 80,203 words

Superbloom: How Technologies of Connection Tear Us Apart

by Nicholas Carr  · 28 Jan 2025  · 231pp  · 85,135 words

Everything Is Predictable: How Bayesian Statistics Explain Our World

by Tom Chivers  · 6 May 2024  · 283pp  · 102,484 words

The Age of Extraction: How Tech Platforms Conquered the Economy and Threaten Our Future Prosperity

by Tim Wu  · 4 Nov 2025  · 246pp  · 65,143 words

Elon Musk

by Walter Isaacson  · 11 Sep 2023  · 562pp  · 201,502 words

The Means of Prediction: How AI Really Works (And Who Benefits)

by Maximilian Kasy  · 15 Jan 2025  · 209pp  · 63,332 words