description: AI scientific article about the transformer architecture, published in June 2017
generative artificial intelligence
11 results
by Maximilian Kasy · 15 Jan 2025 · 209pp · 63,332 words
Another in View of the Evidence of Two Samples.” Biometrika 25 no. 3–4 (1933): 285–94. Vaswani, A., N. Shazeer, N. Parmar, et al. “Attention Is All You Need.” Proceedings of the 31st International Conference on Neural Information Processing Systems, December 4, 2017, 6000–6010. Yang, L., Z. Zhang, Y. Song, et al. “Diffusion
by Nate Silver · 12 Aug 2024 · 848pp · 227,015 words
seems like human-level text generation. And, you know, nobody really anticipated that.” In 2017, a group of researchers at Google published a paper called “Attention Is All You Need” that introduced something called a “transformer.” I’ll provide a more detailed description of a transformer later, but it isn’t important for now—the
…
of a mystery—this is the “bag of numbers” stage. But it just seems to work out somehow. In the famous Google paper on transformers, “Attention Is All You Need,” “attention” essentially refers to the importance of the relationships between different pairs of tokens. Once a transformer figures out these relationships, there isn’t a
…
. companies like OpenAI and Anthropic: “Google Brain Drain: Where are the Authors of ‘Attention Is All You Need’ Now?” AIChat, aichat.blog/google-exodus-where-are-the-authors-of-attention-is-all-you-need-now. Altman has tipped his hat: @sama, https://twitter.com/sama/status/1540227243368058880?lang
…
, washingtonpost.com/technology/2023/05/16/sam-altman-open-ai-congress-hearing. “Attention Is All”: Ashish Vaswani et al., “Attention Is All You Need,” arXiv, August 1, 2023, arxiv.org/abs/1706.03762. most rapidly adopted technologies: Krystal Hu, “ChatGPT Sets Record for
by Stuart Russell and Peter Norvig · 14 Jul 2019 · 2,466pp · 668,761 words
, the highest scoring hypothesis “La entrada” can only generate low-probability continuations, so it “falls off the beam.” 25.4 The Transformer Architecture The influential article “Attention is all you need” (Vaswani et al., 2018) introduced the transformer architecture, which uses a self-attention mechanism that can model long-distance context without a sequential dependency. 25
…
GPU performance evaluation. arXiv:1412.7580. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2018). Attention is all you need. In NeurIPS 30. Veach, E. and Guibas, L. J. (1995). Optimally combining sampling techniques for Monte Carlo rendering. In Proc. 22nd Annual Conference on Computer
by Stephen Witt · 8 Apr 2025 · 260pp · 82,629 words
in the Neural Information Processing Systems journal, which had published the original AlexNet results. The paper needed a name, so Jones, channeling the Beatles, suggested “Attention Is All You Need.” This was an off-the-cuff joke that he didn’t think the team would actually use. Later, he would meet people with the sentence
by Keach Hagey · 19 May 2025 · 439pp · 125,379 words
negative without relying on humans to pre-label the data. Two months later, Sutskever read a preprint of a paper by eight Google researchers titled “Attention Is All You Need,” which he immediately recognized as presenting a method to make the kind of research Radford was doing vastly more efficient. Rather than processing one character
…
, 119 Atari, 87, 147, 165, 190 Atkins, Brian and Sabine, 142–43 Atlanta, Georgia, 23 Atlantic, The (magazine), 46, 297 Atmos home building platform, 260 “Attention Is All You Need” (“the transformer paper”), 218–19, 270 Australia, 99, 266 Authors Guild, 220 Autopilot AI-assisted driving, 225 Baidu, 170 baldness, cure for, 108 Bankman-Fried
…
,” 188, 312 Goertzel, Ben, 145 Goetz, Jim, 88 Goldman Sachs, 59, 64, 150, 225 Good Ventures foundation, 212–13 Google AdSense, 243 Alphabet, 194, 271 “Attention Is All You Need” (“the transformer paper”), 218–19, 270 Bard conversational model, 271 Chrome web browser, 272 DeepMind acquisition, 146–48, 154, 165, 168–69, 171–72, 184
by Karen Hao · 19 May 2025 · 660pp · 179,531 words
TO NOTE REFERENCE IN TEXT In August 2017, that changed: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez et al., “Attention Is All You Need,” in NIPS ’17: Proceedings of the 31st International Conference on Neural Information Processing Systems (December 2017): 6000–10, dl.acm.org/doi/10.5555/3295222
by Christopher Summerfield · 11 Mar 2025 · 412pp · 122,298 words
The transformer was invented in 2017. It was first described in a preprint – a paper published online without peer review – with the slightly incongruous title ‘Attention is All You Need’.[*1] The paper didn’t make much of a splash at first. Submitted to the annual jamboree that is the Neural Information Processing Systems (NeurIPS
…
the entire input sequence in parallel – using a form of attention called self-attention to place emphasis on each item i when predicting j (hence: ‘Attention is All You Need’). To understand why self-attention is so useful in language, consider the problem of completing the following two prompts: As I approached the ancient tree
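The mechanism the excerpt describes (processing the whole input sequence in parallel, with each item weighting its relationship to every other item) can be sketched as scaled dot-product self-attention. This is a minimal NumPy sketch of that idea, not code from any of the books above; the names `self_attention`, `Wq`, `Wk`, and `Wv` are illustrative assumptions.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n_tokens, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise token-to-token relevance
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # each row sums to 1
    return weights @ V                                # every output mixes all tokens at once

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                           # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per token
```

Because every token attends to every other token in a single matrix product, there is no sequential dependency to unroll, which is what lets transformers model long-distance context in parallel.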
…
Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks’, arXiv. Available at https://arxiv.org/pdf/2302.08399. Vaswani, A. et al. (2017), ‘Attention Is All You Need’. Preprint. arXiv. Available at http://arxiv.org/abs/1706.03762 (accessed 30 October 2020). Verzijden, M. N. et al. (2015), ‘Male Drosophila melanogaster Learn to
by Ray Kurzweil · 25 Jun 2024
the original technical paper, see Giuliano Giacaglia, “How Transformers Work,” Towards Data Science, March 10, 2019, https://towardsdatascience.com/transformers-141e32e69591; Ashish Vaswani et al., “Attention Is All You Need,” arXiv:1706.03762v5 [cs.CL], December 6, 2017, https://arxiv.org/pdf/1706.03762.pdf. Irene Solaiman et al., “GPT
…
master crossword puzzles better than most human solvers, was Noam Shazeer. He went on to work at Google, where he was a lead author of “Attention Is All You Need,” the paper that invented the transformer architecture for large language models that has powered the latest AI revolution. See Duke University, “Duke Researchers Pit Computer
…
Against Human Crossword Puzzle Players,” ScienceDaily, April 20, 1999, https://www.sciencedaily.com/releases/1999/04/990420064821.htm; Vaswani et al., “Attention Is All You Need.” For a representative video clip from the matches and analyses of Watson and the competition, see O’Reilly, “Jeopardy! IBM Challenge
by Parmy Olson · 284pp · 96,087 words
,” he said aloud at one point. Jones looked up from his desk, nearby. “I’m not very good with titles,” he replied. “But how about ‘Attention is all you need’?” It was a random thought that had popped into his head, and Vaswani didn’t say anything in agreement. In fact, he got up and
…
walked away, Jones recalls. But later, the title “Attention Is All You Need” landed on the front page of their paper, a perfect summary of what they’d discovered. When you used a transformer, your AI system could
…
, Shazeer left Google in 2021 to pursue his research on large language models independently, cofounding a chatbot company called Character.ai. By that time, the “Attention Is All You Need” paper had become one of the most popular research works of all time in the field of AI. Typically, a research paper on AI might
…
Understanding.” blog.research.google, August 31, 2017. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30 (2017). Chapter 10: Size Matters Brockman, Greg (@gdb). “Held our civil ceremony in the @OpenAI office last week
by Madhumita Murgia · 20 Mar 2024 · 336pp · 91,806 words
as the ‘transformer’. The eight research scientists who eventually played a part in its creation described it in a short paper with a snappy title: ‘Attention Is All You Need’.2 One of the authors, Llion Jones, who grew up in a tiny Welsh village, says the title was a nod to the Beatles song
…
called GPT – the Generative Pre-trained Transformer – software that could produce text-based answers in response to human queries. One of the authors of the ‘Attention Is All You Need’ paper, Lukasz Kaiser, had ended up working there and helping to build it. It was an impressive piece of technology but until November 2022
…
Pioneered an AI Revolution’, The Financial Times, July 23, 2023, https://www.ft.com/content/37bb01af-ee46-4483-982f-ef3921436a50. 2 A. Vaswani et al., ‘Attention Is All You Need’, Arxiv, June 12, 2017, https://arxiv.org/abs/1706.03762. 3 M. Murgia, ‘OpenAI’s Mira Murati: The Woman Charged with Pushing Generative AI into
…
healthcare and see healthcare policing and see policing regulation of see regulation teenage pregnancy and see pregnancy, teenage term ref1 Aslam, Yaseen ref1 AstraZeneca ref1 ‘Attention Is All You Need’ (paper) ref1, ref2 automation ref1, ref2, ref3, ref4 Ayup, Abduweli ref1 Azure software ref1, ref2 Bandura, Albert ref1, ref2 Bard ref1 Bayyah, Sheikh bin ref1
by Ethan Mollick · 2 Apr 2024 · 189pp · 58,076 words