Attention Is All You Need


description: AI scientific article about the transformer architecture, published in June 2017

10 results

pages: 336 words: 91,806

Code Dependent: Living in the Shadow of AI
by Madhumita Murgia
Published 20 Mar 2024

They started playing around with some early prototypes on English–German translations, and found it worked. Their work formalized a months-long collaboration in 2017 that eventually produced a piece of software for processing language, known simply as the ‘transformer’. The eight research scientists who played a part in its creation described it in a short paper with a snappy title: ‘Attention Is All You Need’.2 One of the authors, Llion Jones, who grew up in a tiny Welsh village, says the title was a nod to the Beatles song ‘All You Need Is Love’. The paper was first published in June 2017, and it kick-started an entirely new era of artificial intelligence: the rise of generative AI. The genesis of the transformer and the story of its creators help to account for how we got to this moment in artificial intelligence: an inflection point, comparable to our transition to the web or to smartphones, that has seeded a new generation of entrepreneurs building AI-powered consumer products for the masses.

OpenAI took a hefty investment of more than $10bn from Microsoft and converted itself into what was, for all intents and purposes, a for-profit enterprise that sold AI technologies to large corporations and governments around the world.4 OpenAI’s crown jewel was an algorithm called GPT – the Generative Pre-trained Transformer – software that could produce text-based answers in response to human queries. One of the authors of the ‘Attention Is All You Need’ paper, Lukasz Kaiser, had ended up working there and helping to build it. It was an impressive piece of technology, but until November 2022 it was small-scale, clunky and mostly in the hands of tech-savvy programmers. To have invented a computer program that could employ our own language to communicate directly with us was quite a feat.

Chang Chien, ‘How China’s Police Used Phones and Faces to Track Protesters’, The New York Times, December 4, 2022, https://www.nytimes.com/2022/12/02/business/china-protests-surveillance.html.

CHAPTER 10: YOUR SOCIETY

1 M. Murgia, ‘Transformers: The Google Scientists Who Pioneered an AI Revolution’, The Financial Times, July 23, 2023, https://www.ft.com/content/37bb01af-ee46-4483-982f-ef3921436a50.
2 A. Vaswani et al., ‘Attention Is All You Need’, arXiv, June 12, 2017, https://arxiv.org/abs/1706.03762.
3 M. Murgia, ‘OpenAI’s Mira Murati: The Woman Charged with Pushing Generative AI into the Real World’, The Financial Times, June 18, 2023, https://www.ft.com/content/73f9686e-12cd-47bc-aa6e-52054708b3b3.
4 R. Waters and T. Kinder, ‘Microsoft’s $10bn Bet on ChatGPT Developer Marks New Era of AI’, The Financial Times, January 16, 2023, https://www.ft.com/content/a6d71785-b994-48d8-8af2-a07d24f661c5.
5 M.

pages: 284 words: 96,087

Supremacy: AI, ChatGPT, and the Race That Will Change the World
by Parmy Olson

Vaswani, who was the lead author, slept on a nearby couch overnight. “We need a title,” he said aloud at one point. Jones looked up from his desk. “I’m not very good with titles,” he replied. “But how about ‘Attention is all you need’?” It was a random thought that had popped into his head, and Vaswani didn’t say anything in agreement. In fact, he got up and walked away, Jones recalls. But later, the title “Attention Is All You Need” landed on the front page of their paper, a perfect summary of what they’d discovered. When you used a transformer, your AI system could pay attention to large amounts of data at the same time and do far more with it.

Pichai appeared to dismiss the comment, chalking it up to Shazeer being one of Google’s more eccentric researchers. By all means look into it, he said. Frustrated, Shazeer left Google in 2021 to pursue his research on large language models independently, cofounding a chatbot company called Character.ai. By that time, the “Attention Is All You Need” paper had become one of the most popular research works of all time in the field of AI. Typically, a research paper on AI might receive a few dozen citations over its lifetime if its authors are lucky. But the transformer paper made such a splash among scientists that it was cited more than eighty thousand times.

Bloomberg, October 17, 2021.
Uszkoreit, Jakob. “Transformer: A Novel Neural Network Architecture for Language Understanding.” blog.research.google, August 31, 2017.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30 (2017).

Chapter 10: Size Matters

Brockman, Greg (@gdb). “Held our civil ceremony in the @OpenAI office last week. Officiated by @ilyasut, with the robot hand serving as ring bearer. Wedding planning to commence soon.” Twitter, November 12, 2019, 9:39 a.m. https://twitter.com/gdb/status/1194293590979014657?

pages: 848 words: 227,015

On the Edge: The Art of Risking Everything
by Nate Silver
Published 12 Aug 2024

Post describes him: Nitasha Tiku, “OpenAI Leaders Warned of Abusive Behavior before Sam Altman’s Ouster,” The Washington Post, December 8, 2023, washingtonpost.com/technology/2023/12/08/open-ai-sam-altman-complaints.
companies like OpenAI and Anthropic: “Google Brain Drain: Where Are the Authors of ‘Attention Is All You Need’ Now?,” AIChat, aichat.blog/google-exodus-where-are-the-authors-of-attention-is-all-you-need-now.
Altman has tipped his hat: @sama, https://twitter.com/sama/status/1540227243368058880?lang=en.
thinks the “schtick”: roon (@tszzl), “e/acc’s are both dangerous and cringe and cribbed half my schtick. disavow!

“Even last year, what large language models were doing was kind of babbling and not very interesting,” he said when we spoke in 2023. “And then suddenly this threshold was passed, where, gosh, it seems like human-level text generation. And, you know, nobody really anticipated that.” In 2017, a group of researchers at Google published a paper called “Attention Is All You Need” that introduced something called a “transformer.” I’ll provide a more detailed description of a transformer later, but it isn’t important for now—the intuition is just that it parses a sentence all at once instead of sequentially. (So, for example, in the sentence “Alice came over for dinner, but unlike Bob, she forgot to bring wine,” it figures out that it’s Alice and not Bob who forgot the wine.)

On its own, the tell is not very meaningful, but in the context of other semantic information (the player is breathing heavily and avoiding eye contact) it might be. This part of the process, as ChatGPT says, is hidden from view. Exactly how the transformer makes these inferences is something of a mystery—this is the “bag of numbers” stage. But it just seems to work out somehow. In the famous Google paper on transformers, “Attention Is All You Need,” “attention” essentially refers to the importance of the relationships between different pairs of tokens. Once a transformer figures out these relationships, there isn’t a whole lot else it needs to do. For instance, the tokens “Alice” and “Bob” have an important relationship that the transformer will pay more attention to.
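As a rough illustration (not drawn from any of these books), here is a minimal numpy sketch of that idea: attention computes a score for every pair of tokens at once. The sentence, the embedding size, and the random vectors are all invented for this example, so the printed weights are arbitrary; in a trained transformer, the row for “she” would concentrate its weight on “Alice”.

```python
import numpy as np

# Toy rendering of "attention scores the relationship between every
# pair of tokens." Embeddings are random stand-ins, not trained values.
tokens = ["Alice", "came", "over", "but", "unlike", "Bob", "she",
          "forgot", "to", "bring", "wine"]

rng = np.random.default_rng(0)
d = 8                                   # embedding dimension (arbitrary)
E = rng.normal(size=(len(tokens), d))   # one vector per token

scores = E @ E.T / np.sqrt(d)           # pairwise relevance, all at once
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row

she = tokens.index("she")
for tok, w in zip(tokens, weights[she]):
    print(f"she -> {tok}: {w:.2f}")     # one attention weight per pair
```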

pages: 412 words: 122,298

These Strange New Minds: How AI Learned to Talk and What It Means
by Christopher Summerfield
Published 11 Mar 2025

*6 See here for more samples: https://cs.stanford.edu/people/karpathy/char-rnn/shakespear.txt.

13. Robots in Disguise

The transformer was invented in 2017. It was first described in a preprint – a paper published online without peer review – with the slightly incongruous title ‘Attention is All You Need’.[*1] The paper didn’t make much of a splash at first. Submitted to the annual jamboree that is the Neural Information Processing Systems (NeurIPS) conference, held that year in Long Beach, California, it wasn’t even chosen for an oral presentation, an accolade reserved for the top-rated submissions.

But the transformer pushes the concept of attention to the limit. The algorithm described in the 2017 paper dispenses with the RNN entirely, deploying instead a neural network that processes the entire input sequence in parallel – using a form of attention called self-attention to place emphasis on each item i when predicting j (hence: ‘Attention is All You Need’). To understand why self-attention is so useful in language, consider the problem of completing the following two prompts: As I approached the ancient tree, Naeema said that its bark was _______ As I bent down to stroke the dog, Naeema said that its bark was _______ In English, the word bark is polysemic – it has more than one meaning.
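To make the mechanism concrete, here is a minimal single-head self-attention sketch in numpy, following the softmax(QKᵀ/√d_k)V recipe from the 2017 paper. The embeddings and the projection matrices Wq, Wk, Wv are random stand-ins rather than trained weights; the point is only that the same “bark” vector comes out different depending on whether “tree” or “dog” sits in its context.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention:
    # softmax(Q K^T / sqrt(d_k)) V, over the whole sequence in parallel.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                        # each row mixes the whole context

rng = np.random.default_rng(1)
d = 4
bark, tree, dog = rng.normal(size=(3, d))   # invented word embeddings
Wq, Wk, Wv = rng.normal(size=(3, d, d))     # invented projections

# The identical "bark" vector, placed in two different contexts:
out_tree = self_attention(np.stack([tree, bark]), Wq, Wk, Wv)[1]
out_dog  = self_attention(np.stack([dog,  bark]), Wq, Wk, Wv)[1]
print(np.allclose(out_tree, out_dog))       # False: context reshapes "bark"
```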

S. et al. (2008), ‘Confabulation: Damage to a Specific Inferior Medial Prefrontal System’, Cortex, 44(6), pp. 637–48. Available at https://doi.org/10.1016/j.cortex.2007.01.002.
Ullman, T. (2023), ‘Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks’, arXiv. Available at https://arxiv.org/pdf/2302.08399.
Vaswani, A. et al. (2017), ‘Attention Is All You Need’. Preprint. arXiv. Available at http://arxiv.org/abs/1706.03762 (accessed 30 October 2020).
Verzijden, M. N. et al. (2015), ‘Male Drosophila melanogaster Learn to Prefer an Arbitrary Trait Associated with Female Mating Status’, Current Zoology, 61(6), pp. 1036–42. Available at https://doi.org/10.1093/czoolo/61.6.1036.

pages: 439 words: 125,379

The Optimist: Sam Altman, OpenAI, and the Race to Invent the Future
by Keach Hagey
Published 19 May 2025

“It was a complete surprise,” Radford told Wired.5 In April 2017, Radford, Sutskever, and another OpenAI researcher named Rafał Józefowicz published a paper on what they called the sentiment neuron, which could understand whether statements were positive or negative without relying on humans to pre-label the data. Two months later, Sutskever read a preprint of a paper by eight Google researchers titled “Attention Is All You Need,” which he immediately recognized as presenting a method to make the kind of research Radford was doing vastly more efficient. Rather than processing one character at a time, the Google paper showed how a model could process large chunks of text in parallel, using an insightful application of a technique called the “attention mechanism” to dynamically assign importance to key parts of that text input.
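A toy sketch of the contrast described here, with invented shapes and random weights: the recurrent computation is an inherently sequential Python loop, each step waiting on the last, while the attention computation is a single batched matrix product over every position at once.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d = 6, 4                        # sequence length and state size (toy)
X = rng.normal(size=(T, d))        # one embedding per input position
Wh, Wx = rng.normal(size=(2, d, d))

# Character/word-at-a-time, RNN style: each step must wait for the last.
h = np.zeros(d)
for t in range(T):
    h = np.tanh(h @ Wh + X[t] @ Wx)

# Attention style: one batched matrix product covers all positions at
# once, which is what lets hardware chew through big chunks in parallel.
scores = X @ X.T / np.sqrt(d)
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)
out = weights @ X                  # importance-weighted mix of the input
```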

From the book’s index, the relevant entries: “Attention Is All You Need” (“the transformer paper”), 218–19, 270; generative pre-trained transformers (GPTs), 3, 221; Google Brain, 82, 169, 184, 243, 270.

The Singularity Is Nearer: When We Merge with AI
by Ray Kurzweil
Published 25 Jun 2024

Notably, one of the doctoral students who designed Proverb, the first AI to solve crossword puzzles better than most human solvers, was Noam Shazeer. He went on to work at Google, where he was a lead author of “Attention Is All You Need,” the paper that invented the transformer architecture for large language models that has powered the latest AI revolution. See Duke University, “Duke Researchers Pit Computer Against Human Crossword Puzzle Players,” ScienceDaily, April 20, 1999, https://www.sciencedaily.com/releases/1999/04/990420064821.htm; Vaswani et al., “Attention Is All You Need.”

For a representative video clip from the matches and analyses of Watson and the competition, see OReilly, “Jeopardy!

For a more detailed explainer on how transformers work, and the original technical paper, see Giuliano Giacaglia, “How Transformers Work,” Towards Data Science, March 10, 2019, https://towardsdatascience.com/transformers-141e32e69591; Ashish Vaswani et al., “Attention Is All You Need,” arXiv:1706.03762v5 [cs.CL], December 6, 2017, https://arxiv.org/pdf/1706.03762.pdf.

Irene Solaiman et al., “GPT-2: 1.5B Release,” OpenAI, November 5, 2019, https://openai.com/blog/gpt-2-1-5b-release.

Tom B.

pages: 189 words: 58,076

Co-Intelligence: Living and Working With AI
by Ethan Mollick
Published 2 Apr 2024

And, most important, most AI models were also limited in their ability to understand and generate text in a coherent and context-aware manner. Thus, while these uses of AI are still important today, they were not something most people directly saw or noticed in their daily lives. But among the many papers on different forms of AI being published by industry and academic experts, one stood out, a paper with the catchy title “Attention Is All You Need.” Published by Google researchers in 2017, this paper introduced a significant shift in the world of AI, particularly in how computers understand and process human language. It proposed a new architecture, called the Transformer, that could help a computer process human language far more effectively.

pages: 260 words: 82,629

The Thinking Machine: Jensen Huang, Nvidia, and the World's Most Coveted Microchip
by Stephen Witt
Published 8 Apr 2025

This elegant architecture, designed to do the simplest thing conceivable—just take one step at a time—was like a skeleton key for AI. In 2017 the team published its results at the Neural Information Processing Systems conference, the same venue that had published the original AlexNet results. The paper needed a name, so Jones, channeling the Beatles, suggested “Attention Is All You Need.” This was an off-the-cuff joke that he didn’t think the team would actually use. Later, he would meet people with the sentence tattooed on their arms. In July 2017, shortly before publishing their results, Shazeer and team member Lukasz Kaiser tried an experiment. Rather than ask the transformer machine to translate preexisting texts, they asked it to ingest a corpus of millions of Wikipedia articles, then generate new text based on what it had read.

pages: 660 words: 179,531

Empire of AI: Dreams and Nightmares in Sam Altman's OpenAI
by Karen Hao
Published 19 May 2025

One of the applications I’m most eagerly awaiting.” Twitter (now X), September 27, 2023, x.com/ilyasut/status/1707027536150929689.
Sutskever would get up: A photo of Sutskever at the event.
In August 2017, that changed: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez et al., “Attention Is All You Need,” in NIPS ’17: Proceedings of the 31st International Conference on Neural Information Processing Systems (December 2017): 6000–10, dl.acm.org/doi/10.5555/3295222.3295349.
But Sutskever, who had focused: Sutskever’s PhD thesis work focused on recurrent neural networks.

pages: 2,466 words: 668,761

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

The score of each word is the log-probability generated by the target RNN softmax, and the score of each hypothesis is the sum of the word scores. At timestep 3, the highest scoring hypothesis La entrada can only generate low-probability continuations, so it “falls off the beam.”

25.4 The Transformer Architecture

The influential article “Attention is all you need” (Vaswani et al., 2017) introduced the transformer architecture, which uses a self-attention mechanism that can model long-distance context without a sequential dependency.

25.4.1 Self-attention

Previously, in sequence-to-sequence models, attention was applied from the target RNN to the source RNN.
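For reference, the scaled dot-product attention at the core of the architecture is defined in the 2017 paper as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where $Q$, $K$, and $V$ are the matrices of queries, keys, and values, and $d_k$ is the key dimension; scaling by $\sqrt{d_k}$ keeps large dot products from pushing the softmax into regions with vanishing gradients.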

Vasilache, N., Johnson, J., Mathieu, M., Chintala, S., Piantino, S., and LeCun, Y. (2014). Fast convolutional nets with fbfft: A GPU performance evaluation. arXiv:1412.7580.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In NeurIPS 30.
Veach, E. and Guibas, L. J. (1995). Optimally combining sampling techniques for Monte Carlo rendering. In Proc. 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH).
Venkatesh, S. (2012). The Theory of Probability: Explorations and Applications.