The Ethical Algorithm: The Science of Socially Aware Algorithm Design
by Michael Kearns and Aaron Roth
Published 3 Oct 2019

In this way we turn playing backgammon into a machine learning problem, with the requisite data being provided by simulated self-play. This simple but brilliant idea was first successfully applied by Gerry Tesauro of IBM Research in 1992, whose self-trained TD-Gammon program achieved a level of play comparable to the best humans in the world. (The “TD” stands for “temporal difference,” a technical term referring to the complication that in games such as backgammon, feedback is delayed: you do not receive feedback about whether each individual move was good or bad, only whether you won the entire game or not.) In the intervening decades, simulated self-play has proven to be a powerful algorithmic technique for designing champion-level programs for a variety of games, including quite recently for Atari video games, and for the notoriously difficult and ancient game of Go.
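TD-Gammon combined a neural network with temporal-difference training, which is far too much for a short example. The core idea of learning purely from games played against oneself can still be sketched at toy scale. The sketch below assumes a simple Nim variant: players alternately take 1 to 3 stones, and whoever takes the last stone wins. One shared value table plays both sides, and each visited position is backed up toward the best reply under the current table. The game, the function names (`train`, `best_move`), and the max-style backup are illustrative assumptions, not Tesauro's method; the backup is closer to Q-learning than to TD-Gammon's update.

```rust
// Toy self-play learner for Nim (take 1-3 stones; taking the last stone wins).
// Illustrative only: this is not TD-Gammon's network or its TD(lambda) update.

/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Pseudo-random integer in 0..n (requires n > 0).
    fn below(&mut self, n: u64) -> u64 {
        self.next() % n
    }
}

const MAX_STONES: usize = 20;

/// Best (value, move) for the player to move with `n` stones, judged by the
/// shared table: a position worth `v` to the opponent is worth `-v` to us.
fn best_move(v: &[f64], n: usize) -> (f64, usize) {
    let mut best = (f64::NEG_INFINITY, 1);
    for k in 1..=n.min(3) {
        let value = -v[n - k];
        if value > best.0 {
            best = (value, k);
        }
    }
    best
}

/// Self-play training: the same table chooses the moves for both sides.
/// `explore_pct` is the percentage of moves picked at random instead.
fn train(episodes: usize, explore_pct: u64, seed: u64) -> Vec<f64> {
    // v[n] = estimated value for the player about to move with n stones left.
    let mut v = vec![0.0; MAX_STONES + 1];
    v[0] = -1.0; // no stones left: the previous player took the last one and won
    let mut rng = XorShift(seed);
    for _ in 0..episodes {
        let mut n = 1 + rng.below(MAX_STONES as u64) as usize;
        while n > 0 {
            let (target, greedy) = best_move(&v, n);
            v[n] = target; // back up the best reply (step size 1)
            let k = if rng.below(100) < explore_pct {
                1 + rng.below(n.min(3) as u64) as usize
            } else {
                greedy
            };
            n -= k; // the "opponent", using the same table, moves next
        }
    }
    v
}
```

With enough self-play the table should mark every multiple of four as a losing position for the player to move, which is the known strategy for this variant: always leave the opponent a multiple of four stones.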

It might seem natural that internal self-play is an effective design principle if your goal is to actually create a good game-playing program. A more surprising recent development is the use of self-play in algorithms whose outward goal has nothing to do with games at all. Consider the challenge of designing a computer program that can generate realistic but synthetic images of cats. (We’ll return shortly to the question of why one might be interested in this goal, short of simply being a cat fanatic.)

A rather different and more recent use of game theory is for the internal design of algorithms, rather than for managing the preferences of an external population of users. In these settings, there is no actual human game, such as commuting, that the algorithm is helping to solve. Rather, a game is played in the “mind” of the algorithm, for its own purposes. An early example of this idea is self-play in machine learning for board games. Consider the problem of designing the best backgammon-playing computer program that you can. One approach would be to think very hard about backgammon strategy, probabilities, and the like, and hand-code rules dictating which move to make in any given board configuration.

Rust Programming by Example
by Guillaume Gomez and Antoni Boucher
Published 11 Jan 2018

```rust
{
    #[name="window"]
    gtk::Window {
        title: "Rusic",
        gtk::Box {
            orientation: Vertical,
            #[name="toolbar"]
            gtk::Toolbar {
                // …
                gtk::ToolButton {
                    icon_widget: &new_icon("gtk-media-previous"),
                    clicked => playlist@PreviousSong,
                },
                // …
                gtk::ToolButton {
                    icon_widget: &new_icon("gtk-media-next"),
                    clicked => playlist@NextSong,
                },
            },
            // …
        },
        delete_event(_, _) => (Quit, Inhibit(false)),
    }
}
```

We'll handle these messages in the Playlist::update() method:

```rust
fn update(&mut self, event: Msg) {
    match event {
        AddSong(path) => self.add(&path),
        LoadSong(path) => self.load(&path),
        NextSong =>,
        PauseSong => (),
        PlaySong =>,
        PreviousSong => self.previous(),
        RemoveSong => self.remove_selection(),
        SaveSong(path) =>,
        // To be listened by App.
        SongStarted(_) => (),
        StopSong => self.stop(),
    }
}
```

This requires some new methods:

```rust
fn next(&mut self) {
    let selection = self.treeview.get_selection();
    let next_iter = if let Some((_, iter)) = selection.get_selected() {
        if !self.model.model.iter_next(&iter) {
            return;
        }
        Some(iter)
    } else {
        self.model.model.get_iter_first()
    };
    if let Some(ref iter) = next_iter {
        selection.select_iter(iter);;
    }
}

fn previous(&mut self) {
    let selection = self.treeview.get_selection();
    let previous_iter = if let Some((_, iter)) = selection.get_selected() {
        if !
```

Then, we update the stopped field of the application state because the click handler for the play button will use it to decide whether we want to play or resume the music. We also call set_playing() to tell the player thread whether it needs to continue playing the song. This method is defined as follows:

```rust
fn set_playing(&self, playing: bool) {
    *self.event_loop.playing.lock().unwrap() = playing;
    let (ref lock, ref condition_variable) = *self.event_loop.condition_variable;
    let mut started = lock.lock().unwrap();
    *started = playing;
    if playing {
        condition_variable.notify_one();
    }
}
```

It sets the playing variable and then, if playing is true, notifies the player thread to wake it up.
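The excerpt depends on the application's `event_loop` fields, which are not shown. As a self-contained sketch of the same wake-up pattern using only `std::sync`, assuming a single `Mutex<bool>` paired with a `Condvar` (the book's version also keeps a separate `playing` mutex), the player thread below blocks until `set_playing(true)` is called. `PlayState`, `wait_until_playing`, and `run_player_once` are hypothetical names.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical shared state: a playing flag guarded by a mutex, plus a condvar.
struct PlayState {
    playing: Mutex<bool>,
    wakeup: Condvar,
}

impl PlayState {
    fn new() -> Self {
        PlayState { playing: Mutex::new(false), wakeup: Condvar::new() }
    }

    /// UI side: update the flag, and wake the player thread only when playing.
    fn set_playing(&self, playing: bool) {
        let mut flag = self.playing.lock().unwrap();
        *flag = playing;
        if playing {
            self.wakeup.notify_one();
        }
    }

    /// Player side: block (without spinning) until the flag becomes true.
    fn wait_until_playing(&self) {
        let mut flag = self.playing.lock().unwrap();
        // Loop because condition variables may wake up spuriously.
        while !*flag {
            flag = self.wakeup.wait(flag).unwrap();
        }
    }
}

/// Spawns a player thread, starts playback, and reports whether the player woke up.
fn run_player_once() -> bool {
    let state = Arc::new(PlayState::new());
    let player_state = Arc::clone(&state);
    let player = thread::spawn(move || {
        player_state.wait_until_playing();
        true // only reached after set_playing(true)
    });
    state.set_playing(true);
    player.join().unwrap()
}
```

Checking the flag in a loop while holding the lock is what prevents a lost wake-up: if `set_playing(true)` runs before the player starts waiting, the player finds the flag already set and never blocks, and a spurious wake-up simply re-checks the flag.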

We first need to create a new method in the Playlist:

```rust
pub fn next(&self) -> bool {
    let selection = self.treeview.get_selection();
    let next_iter = if let Some((_, iter)) = selection.get_selected() {
        if !self.model.iter_next(&iter) {
            return false;
        }
        Some(iter)
    } else {
        self.model.get_iter_first()
    };
    if let Some(ref iter) = next_iter {
        selection.select_iter(iter);;
    }
    next_iter.is_some()
}
```

We start by getting the selection. Then we check whether an item is selected: if so, we try to get the item after it; otherwise, we get the first item in the list. Then, if we were able to get an item, we select it and start playing the song. Finally, we return whether we changed the selection.
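The decision logic in `next()` does not depend on GTK. As a hedged sketch, assuming a plain `Vec<String>` in place of the list store and an optional index in place of the tree selection (the struct and its fields are hypothetical), it reduces to a small match:

```rust
/// Hypothetical std-only stand-in: a Vec replaces the GTK list store and an
/// optional index replaces the tree-view selection.
struct Playlist {
    songs: Vec<String>,
    selected: Option<usize>,
    now_playing: Option<usize>,
}

impl Playlist {
    fn new(songs: &[&str]) -> Self {
        Playlist {
            songs: songs.iter().map(|s| s.to_string()).collect(),
            selected: None,
            now_playing: None,
        }
    }

    /// Same decision logic as the excerpt's `next()`: advance the selection,
    /// or select the first song when nothing is selected, and report whether
    /// the selection changed.
    fn next(&mut self) -> bool {
        let next_index = match self.selected {
            Some(i) if i + 1 < self.songs.len() => Some(i + 1),
            Some(_) => return false, // already on the last song: keep it selected
            None if !self.songs.is_empty() => Some(0),
            None => None, // empty playlist
        };
        if let Some(i) = next_index {
            self.selected = Some(i);
            self.now_playing = Some(i); // stands in for starting playback
        }
        next_index.is_some()
    }
}
```

Returning `false` when the selection is already on the last song, or when the playlist is empty, gives the caller a simple signal that there is nothing further to play.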

The Ages of Globalization
by Jeffrey D. Sachs
Published 2 Jun 2020

Then, to make matters even more dramatic, AlphaGo was decisively defeated by a next-generation AI system that learned Go from scratch in self-play over a few hours. Once again, hundreds of years of expert study and competition could be surpassed in a few hours of learning through self-play. The advent of learning through self-play, sometimes called “tabula rasa” or blank-slate learning, is mind-boggling. In tabula-rasa learning, the AI system is trained to play against itself, for example in millions of games of chess, with the weights of the neural networks updated depending on the wins and losses in self-play. Starting from no information whatsoever other than the rules of chess, the AI system plays against itself in millions of chess games and uses the results to update the neural-network weights in order to learn chess-playing skills.

With the vast increases in computational capacity and speed of computers represented by Moore’s law, artificial intelligence systems are now being built with hundreds of layers of digital neurons and very high-dimensional digital inputs and outputs. With sufficiently large “training sets” of data or ingenious designs of self-play described below, neural networks are achieving superhuman skills on a rapidly expanding array of challenges, from board games like Chess and Go, to interpersonal games such as poker, to sophisticated language operations such as real-time translation, and to professional medical skills such as complex diagnostics.

Remarkably, in just four hours of self-play, an advanced computer AI system developed by the company DeepMind learned all of the skills needed to handily defeat the world’s best human chess players as well as the previous AI world-champion chess player!3 A few hours of blank-slate learning bested 600 years of learning of chess play by all of the chess experts in history.

Technological Advances and the End of Poverty

In 2006, I published a book titled The End of Poverty in which I suggested that the end of extreme poverty was within the reach of our generation, indeed by 2025, if we made increased global efforts to help the poor.4 I had in mind special efforts to bolster health, education, and infrastructure for the world’s poorest people, notably in sub-Saharan Africa and South Asia, home to most of the world’s extreme poverty.

Four Battlegrounds
by Paul Scharre
Published 18 Jan 2023

For DeepMind’s next version, AlphaZero, three different versions of the same algorithm were trained to reach superhuman performance through self-play in chess (44 million self-play games), go (21 million self-play games), and the Japanese strategy game shogi (24 million self-play games). For each type of game, 5,000 AI-specialized computer chips were used to generate the simulated games, allowing compute to effectively act as a substitute for real-world data. Strategy games are a special case since they can be perfectly simulated, while the complexity of the real world oftentimes cannot, but synthetic data can help augment datasets when real-world data may be limited.

AlphaGo then refined its performance to superhuman levels through self-play, a form of training on synthetic data in which the computer plays against itself. An updated version, AlphaGo Zero, released the following year, reached superhuman performance without any human training data at all, playing 4.9 million games against itself. AlphaGo Zero was able to entirely replace human-generated data with synthetic data. (This also had the benefit of allowing the algorithm to learn to play go without adopting any biases from human players.) A subsequent version of AlphaGo Zero was trained on 29 million games of self-play.

“It splits its bets into three, four, five different sizes,” Daniel McAulay (who lost to Libratus) told Wired magazine. “No human has the ability to do that.” Chess grandmasters have pored over the moves of the chess-playing AI agent AlphaZero to analyze its style. AlphaZero learned to play chess entirely through self-play without any data from human games and has adopted a unique playing style. AlphaZero focuses its energies on attacking the opponent’s king, resulting in “ferocious, unexpected attacks,” according to experts who have studied its play. AlphaZero is willing to sacrifice material for positional advantage and strongly favors optionality—moves that give it more options in the future.

The Deep Learning Revolution (The MIT Press)
by Terrence J. Sejnowski
Published 27 Sep 2018

Watson Research Center in Yorktown Heights, New York, Gerald Tesauro worked with me when he was at the Center for Complex Systems Research at the University of Illinois in Urbana-Champaign on the problem of teaching a neural network to play backgammon (figure 10.1).2 Our approach used expert supervision to train networks with backprop to evaluate game positions and possible moves. The flaw in this approach was that the program could never get better than our experts, who were not at world-championship level. But, with self-play, it might be possible to do better. The problem with self-play at that time was that the only learning signal was win or lose at the end of the game. But when one side won, which of the many moves were responsible? This is called the “temporal credit assignment problem.” A learning algorithm that can solve this temporal credit assignment problem was invented in 1988 by Richard Sutton,3 who had been working closely with Andrew Barto, his doctoral advisor, at the University of Massachusetts at Amherst, on difficult problems in reinforcement learning, a branch of machine learning inspired by associative learning in animal experiments (figure 10.2).
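The temporal credit assignment problem, and how a TD method spreads a single end-of-game signal back over earlier states, can be seen in a few lines. This is a minimal sketch, not Sutton's 1988 experiments or TD-Gammon's TD(λ) network: it assumes a textbook-style random walk over five states with a reward only at the right-hand end, and uses tabular TD(0) with a fixed step size. The function name and its parameters are hypothetical.

```rust
// Tabular TD(0) on a five-state random walk: states 1..=5 are non-terminal,
// 0 and 6 are terminal, and the only reward (+1) comes on reaching state 6.

/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

fn td0_random_walk(episodes: usize, alpha: f64, seed: u64) -> [f64; 7] {
    let mut v = [0.5; 7];
    v[0] = 0.0; // terminal states carry no future reward
    v[6] = 0.0;
    let mut rng = XorShift(seed);
    for _ in 0..episodes {
        let mut s = 3; // every episode starts in the middle
        while s != 0 && s != 6 {
            // Step left or right with equal probability (top bit of the generator).
            let next = if rng.next() >> 63 == 0 { s - 1 } else { s + 1 };
            let reward = if next == 6 { 1.0 } else { 0.0 };
            // TD(0): nudge V(s) toward the one-step target r + V(s'), so credit
            // for the final outcome flows backward one step per visit.
            v[s] += alpha * (reward + v[next] - v[s]);
            s = next;
        }
    }
    v
}
```

Because each update looks only one step ahead, credit for reaching the rewarding end propagates backward a step at a time over many episodes. With these settings the estimates should settle close to the true win probabilities 1/6, 2/6, …, 5/6, though the exact figures depend on the seed and step size.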

Robertie won most of the games but was surprised to lose several well-played ones and declared it the best backgammon program he had ever played. Some of TD-Gammon’s unusual moves he had never seen before; on closer examination, these proved to be improvements on human play overall. Robertie returned when the program had reached 1.5 million self-played games and was astonished when TD-Gammon played him to a draw. It had gotten so much better that he felt it had achieved human-championship level. One backgammon expert, Kit Woolsey, found that TD-Gammon’s positional judgment on whether to play “safe” (low risk/reward) or play “bold” (high risk/reward) was at that time better than that of any human he had seen.

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis, Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, arXiv:1712.01815 (2017).

Artificial Intelligence: A Guide for Thinking Humans
by Melanie Mitchell
Published 14 Oct 2019

This was the first of several times IBM saw its stock price increase after a demonstration of a game-playing program beating humans; as a more recent example, IBM’s stock price similarly rose after the widely viewed TV broadcasts in which its Watson program won in the game show Jeopardy! While Samuel’s checkers player was an important milestone in AI history, I made this historical digression primarily to introduce three all-important concepts that it illustrates: the game tree, the evaluation function, and learning by self-play.

Deep Blue

Although Samuel’s “tricky but beatable” checkers program was remarkable, especially for its time, it hardly challenged people’s idea of themselves as uniquely intelligent. Even if a machine could win against human checkers champions (as one finally did in 1994),13 mastering the game of checkers was never seen as a proxy for general intelligence.

In the popular media, it was Deep Blue versus Kasparov all over again, with an endless supply of think pieces on what AlphaGo’s triumph meant for the future of humanity. But this was even more significant than Deep Blue’s win: AI had surmounted an even greater challenge than chess and had done so in a much more impressive fashion. Unlike Deep Blue, AlphaGo acquired its abilities by reinforcement learning via self-play. Demis Hassabis noted that “the thing that separates out top Go players [is] their intuition” and that “what we’ve done with AlphaGo is to introduce with neural networks this aspect of intuition, if you want to call it that.”26

How AlphaGo Works

There have been several different versions of AlphaGo, so to keep them straight, DeepMind started naming them after the human Go champions the programs had defeated—AlphaGo Fan and AlphaGo Lee—which to me evoked the image of the skulls of vanquished enemies in the collection of a digital Viking.

It’s certainly true that the deep Q-learning method used in AlphaGo can be used to learn other tasks, but the system itself would have to be wholly retrained; it would have to start essentially from scratch in learning a new skill. This brings us back to the “easy things are hard” paradox of AI. AlphaGo was a great achievement for AI; learning largely via self-play, it was able to definitively defeat one of the world’s best human players in a game that is considered a paragon of intellectual prowess. But AlphaGo does not exhibit human-level intelligence as we generally define it, or even arguably any real intelligence. For humans, a crucial part of intelligence is, rather than being able to learn any particular skill, being able to learn to think and to then apply our thinking flexibly to whatever situations or challenges we encounter.

Human Compatible: Artificial Intelligence and the Problem of Control
by Stuart Russell
Published 7 Oct 2019

The program learned essentially from scratch, by playing against itself and observing the rewards of winning and losing.60 In 1992, Gerry Tesauro applied the same idea to the game of backgammon, achieving world-champion-level play after 1,500,000 games.61 Beginning in 2016, DeepMind’s AlphaGo and its descendants used reinforcement learning and self-play to defeat the best human players at Go, chess, and shogi. Reinforcement learning algorithms can also learn how to select actions based on raw perceptual input. For example, DeepMind’s DQN system learned to play forty-nine different Atari video games entirely from scratch—including Pong, Freeway, and Space Invaders.62 It used only the screen pixels as input and the game score as a reward signal.

Indeed, its behavior might be identical to that of a machine that just wants to give its opponent a really exciting game. So, saying that AlphaGo “has the purpose of winning” is an oversimplification. A better description would be that AlphaGo is the result of an imperfect training process—reinforcement learning with self-play—for which winning was the reward. The training process is imperfect in the sense that it cannot produce a perfect Go player: AlphaGo learns an evaluation function for Go positions that is good but not perfect, and it combines that with a lookahead search that is good but not perfect. The upshot of all this is that discussions beginning with “suppose that robot R has purpose P” are fine for gaining some intuition about how things might unfold, but they cannot lead to theorems about real machines.

AlphaZero is described by David Silver et al., "Mastering chess and shogi by self-play with a general reinforcement learning algorithm," arXiv:1712.01815 (2017).

The Alignment Problem: Machine Learning and Human Values
by Brian Christian
Published 5 Oct 2020

No sooner had AlphaGo reached the pinnacle of the game of Go, however, than it was, in 2017, summarily dethroned, by an even stronger program called AlphaGo Zero.86 The biggest difference between the original AlphaGo and AlphaGo Zero was in how much human data the latter had been fed to imitate: zero. From a completely random initialization, tabula rasa, it simply learned by playing against itself, again and again and again and again. Incredibly, after just thirty-six hours of self-play, it was as good as the original AlphaGo, which had beaten Lee Sedol. After seventy-two hours, the DeepMind team set up a match between the two, using the exact same two-hour time controls and the exact version of the original AlphaGo system that had beaten Lee. AlphaGo Zero, which consumed a tenth of the power of the original system, and which seventy-two hours earlier had never played a single game, won the hundred-game series—100 games to 0.

In 2018, AlphaGo Zero was further refined into an even stronger program—and a more general one, capable of record-breaking strength in not just Go but chess and shogi—called AlphaZero. For more detail about AlphaZero, see Silver et al., “A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go Through Self-Play.” In 2019, a subsequent iteration of the system called MuZero matched this level of performance with less computation and less advance knowledge of the rules of the game, while proving flexible enough to excel at not just board games but Atari games as well; see Schrittwieser et al., “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model.”

Artificial Intelligence: A Modern Approach
by Stuart Russell and Peter Norvig
Published 14 Jul 2019

But this year, it became like a god of Go.” ALPHAGO benefited from studying hundreds of thousands of past games by human Go players, and from the distilled knowledge of expert Go players that worked on the team. A followup program, ALPHAZERO, used no input from humans (except for the rules of the game), and was able to learn through self-play alone to defeat all opponents, human and machine, at Go, chess, and shogi (Silver et al., 2018). Meanwhile, human champions have been beaten by AI systems at games as diverse as Jeopardy! (Ferrucci et al., 2010), poker (Bowling et al., 2015; Moravčík et al., 2017; Brown and Sandholm, 2019), and the video games Dota 2 (Fernandez and Mahlmann, 2018), StarCraft II (Vinyals et al., 2019), and Quake III (Jaderberg et al., 2019).

For some simple games, that happens to be the same answer as “what is the best move if both players play well?,” but for most games it is not. To get useful information from the playout we need a playout policy that biases the moves towards good ones. For Go and other games, playout policies have been successfully learned from self-play by using neural networks. Sometimes game-specific heuristics are used, such as “consider capture moves” in chess or “take the corner square” in Othello. Given a playout policy, we next need to decide two things: from what positions do we start the playouts, and how many playouts do we allocate to each position?

There is a theoretical argument that C should be √2, but in practice, game programmers try multiple values for C and choose the one that performs best. (Some programs use slightly different formulas; for example, ALPHAZERO adds in a term for move probability, which is calculated by a neural network trained from past self-play.) With C = 1.4, the 60/79 node in Figure 6.10 has the highest UCB1 score, but with C = 1.5, it would be the 2/11 node. Figure 6.11 shows the complete UCT MCTS algorithm. When the iterations terminate, the move with the highest number of playouts is returned. You might think that it would be better to return the node with the highest average utility, but the idea is that a node with 65/100 wins is better than one with 2/3 wins, because the latter has a lot of uncertainty.
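The effect of C can be checked numerically. This is a minimal sketch, assuming UCB1 in the form UCB1(n) = U(n)/N(n) + C·sqrt(ln N(Parent(n)) / N(n)) and assuming the parent node has 100 playouts (Figure 6.10 is not reproduced here); `ucb1`, `select_child`, and `most_played` are illustrative names.

```rust
/// UCB1 score: average utility (exploitation) plus an exploration bonus that
/// shrinks as the child is visited more often relative to its parent.
fn ucb1(wins: f64, visits: f64, parent_visits: f64, c: f64) -> f64 {
    if visits == 0.0 {
        return f64::INFINITY; // unvisited children are tried first
    }
    wins / visits + c * (parent_visits.ln() / visits).sqrt()
}

/// Selection step: index of the child (wins, visits) with the highest UCB1 score.
fn select_child(children: &[(f64, f64)], parent_visits: f64, c: f64) -> usize {
    let mut best = 0;
    let mut best_score = f64::NEG_INFINITY;
    for (i, &(wins, visits)) in children.iter().enumerate() {
        let score = ucb1(wins, visits, parent_visits, c);
        if score > best_score {
            best = i;
            best_score = score;
        }
    }
    best
}

/// Final move choice: the child with the most playouts, not the best average.
fn most_played(children: &[(f64, f64)]) -> usize {
    let mut best = 0;
    for (i, child) in children.iter().enumerate() {
        if child.1 > children[best].1 {
            best = i;
        }
    }
    best
}
```

Under those assumptions, C = 1.4 gives the 60/79 node a score of about 1.098 against about 1.088 for the 2/11 node, while C = 1.5 gives about 1.122 against about 1.152, so the choice flips as the text describes. The final move is picked by playout count rather than by average, matching the 65/100 versus 2/3 argument.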

Army of None: Autonomous Weapons and the Future of War
by Paul Scharre
Published 23 Apr 2018

Within a mere four hours of self-play and with no training data, AlphaZero eclipsed the previous top chess program. The method behind AlphaZero, deep reinforcement learning, appears to be so powerful that it is unlikely that humans can add any value as members of a “centaur” human-machine team for these games.

Tyler Cowen, “The Age of the Centaur Is *Over* Skynet Goes Live,” December 7, 2017.
David Silver et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” December 5, 2017.
325 as computers advance: Cowen, “What Are Humans Still Good For?”

They simply fed a neural network massive amounts of data and let it learn all on its own, and some of the things it learned were surprising. In 2017, DeepMind surpassed their earlier success with a new version of AlphaGo. With an updated algorithm, AlphaGo Zero learned to play go without any human data to start. With only access to the board and the rules of the game, AlphaGo Zero taught itself to play. Within a mere three days of self-play, AlphaGo Zero had eclipsed the previous version that had beaten Lee Sedol, defeating it 100 games to 0. These deep learning techniques can solve a variety of other problems. In 2015, even before DeepMind debuted AlphaGo, DeepMind trained a neural network to play Atari games. Given only the pixels on the screen and the game score as input and told to maximize the score, the neural network was able to learn to play Atari games at the level of a professional human video game tester.

Wonderland: How Play Made the Modern World
by Steven Johnson
Published 15 Nov 2016

In a fitting echo of the musical innovations that preceded them, the Banu Musa even included a description of how their instrument could be embedded inside an automaton, creating the illusion that the robot musician was playing the encoded melody on a flute.

Reconstruction of the Banu Musa’s self-playing music automaton

The result was not just an instrument that played itself, as marvelous as that must have been. The Banu Musa were masters of automation, to be sure, but humans had been tinkering with the idea of making machines move in lifelike ways since the days of Plato. Animated peacocks, water clocks, robotic dancers—all these contraptions were engineering marvels, but they also shared a fundamental limitation.

This vast cycle of encoding and decoding is now as ubiquitous as electricity in our lives, and yet, like electricity, the cycle was for all practical purposes nonexistent just a hundred and fifty years ago. Not surprisingly, one of the very first technologies that introduced the coding/decoding cycle to everyday life took the form of a musical instrument, one with a direct lineage to Vaucanson and the Banu Musa: the player piano. Though its prehistory dates back to the House of Wisdom, a self-playing piano became a central focus for instrument designers in the second half of the nineteenth century; dozens of inventors from the United States and Europe contributed partial solutions to the problem of designing a machine that could mimic the feel of a human pianist. The new opportunities for expression that Cristofori’s pianoforte had introduced posed a critical challenge for automating that expression; it wasn’t enough to record the correct sequence of notes—the player piano also had to capture the loudness of each individual note, what digital music software now calls “velocity.”

The World Beyond Your Head: On Becoming an Individual in an Age of Distraction
by Matthew B. Crawford
Published 29 Mar 2015

The freedom and dignity of this modern self depend on its being insulated from contingency—by layers of representation. As Thomas de Zengotita points out in his beautiful book Mediated, representations are addressed to us, unlike dumb nature, which just sits there. They are fundamentally flattering, placing each of us at the center of a little “me-world.”1 If the world encountered as something distinct from the self plays a crucial role for a person in achieving adult agency, then it figures that when our encounters with the world are increasingly mediated by representations that soften this boundary, this will have some effect on the kind of selves we become. To see this, consider children’s television.

THE MOUSEKE-DOER

In the old Mickey Mouse cartoons from the early and middle decades of the twentieth century, by far the most prominent source of hilarity is the capacity of material stuff to generate frustration, or rather demonic violence.

As we have seen, the dialectic between tradition and innovation allows the organ maker to understand his own inventiveness as a going further in a trajectory he has inherited. This is very different from the modern concept of creativity, which seems to be a crypto-theological concept: creation ex nihilo. For us the self plays the role of God, and every eruption of creativity is understood to be like a miniature Big Bang, coming out of nowhere. This way of understanding inventiveness cannot connect us to others, or to the past. It also falsifies the experience to which we give the name “creativity” by conceiving it to be something irrational, incommunicable, unteachable.

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass
by Mary L. Gray and Siddharth Suri
Published 6 May 2019

It was also given a large database of games between human experts. That database was used with supervised learning to train the initial move selection function (the “policy function”) for AlphaGo. Then AlphaGo carried out a second phase of “self play” in which it played against a copy of itself (a technique first developed by Arthur Samuel in 1959, I believe) and applied reinforcement learning algorithms to refine the policy function. Finally, they ran additional self-play games to learn a “value function” (value network) that predicted which side would win in each board state. During game play, AlphaGo combines the value function with the policy function to select moves based on forward search (Monte Carlo tree search).
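How the policy and value estimates meet inside the tree search can be illustrated with a PUCT-style selection rule, in the form described for AlphaGo Zero: choose the move maximizing Q(s,a) + c_puct·P(s,a)·sqrt(Σ_b N(s,b)) / (1 + N(s,a)), where P comes from the policy network and Q averages value estimates backed up through the tree. This is a sketch under that assumption, not the original AlphaGo's exact procedure (which also mixed rollout results into leaf evaluations); the `Edge` struct and the `c_puct` values are hypothetical.

```rust
/// One move (edge) out of a search-tree node; field names are hypothetical.
struct Edge {
    prior: f64,     // P(s, a): move probability from the policy network
    visits: u32,    // N(s, a): how often search has taken this move
    value_sum: f64, // sum of value estimates backed up through this move
}

impl Edge {
    /// Q(s, a): mean backed-up value, treated as 0 for unvisited moves.
    fn q(&self) -> f64 {
        if self.visits == 0 {
            0.0
        } else {
            self.value_sum / self.visits as f64
        }
    }
}

/// PUCT-style selection: Q(s,a) + c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
fn select_puct(edges: &[Edge], c_puct: f64) -> usize {
    let total: u32 = edges.iter().map(|e| e.visits).sum();
    let sqrt_total = (total as f64).sqrt();
    let mut best = 0;
    let mut best_score = f64::NEG_INFINITY;
    for (i, e) in edges.iter().enumerate() {
        // Exploration bonus: large for high-prior, rarely visited moves.
        let u = c_puct * e.prior * sqrt_total / (1.0 + e.visits as f64);
        let score = e.q() + u;
        if score > best_score {
            best = i;
            best_score = score;
        }
    }
    best
}
```

In the test case, an unvisited move with a strong prior is explored first when c_puct is large, while the well-visited move with the better average wins once c_puct is small.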

pages: 346 words: 97,890

by Michael Wooldridge
Published 2 Nov 2018

AlphaGo used two neural networks: the value network was solely concerned with estimating how good a given board position was, while the policy network made recommendations about which move to make, based on a current board position.14 The policy network contained 13 layers, and was trained by using supervised learning first, where the training data was examples of expert games played by humans, and then reinforcement learning, based on self-play. Finally, these two networks were embedded in a sophisticated search technique, called Monte Carlo tree search. Before the system was announced, DeepMind hired Fan Hui, a European Go champion, to play against AlphaGo: the system beat him five games to zero. This was the first time a Go program had beaten a human champion player in a full game.

The extraordinary thing about AlphaGo Zero is that it learned how to play to a super-human level without any human supervision at all: it just played against itself.16 To be fair, it had to play itself a lot, but nevertheless it was a striking result, and it was further generalized in another follow-up system called AlphaZero, which learned to play a range of other games, including chess: after just nine hours of self-play, AlphaZero was able to consistently beat or draw against Stockfish, one of the world’s leading dedicated chess-playing programs. Veterans from the computer chess community were astonished. The idea that AlphaZero had played itself for nine hours and taught itself to be a world-class chess player was almost beyond belief.

pages: 419 words: 109,241

by Daniel Susskind
Published 14 Jan 2020

The new machine, dubbed AlphaZero, was matched up against the champion chess computer Stockfish. Of the fifty games where AlphaZero played white, it won twenty-five and drew twenty-five; of the fifty games where it played black, it won three and drew forty-seven. David Silver, Thomas Hubert, Julian Schrittwieser, et al., “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” arXiv:1712.01815v1 (2017).
7. Tyler Cowen, “The Age of the Centaur Is Over Skynet Goes Live,” Marginal Revolution, 7 December 2017.
8. See Kasparov, Deep Thinking, chap. 11.
9. Data is from Ryland Thomas and Nicholas Dimsdale, “A Millennium of UK Data,” Bank of England OBRA data set (2017).

Silver, David, Thomas Hubert, Julian Schrittwieser, et al. "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm." arXiv:1712.01815v1 (2017).

pages: 424 words: 114,905

by Eric Topol
Published 1 Jan 2019

To that he said, “They’re a very press hungry organization.”13 Marcus isn’t alone in the critique of AlphaGo Zero. A sharp critique by Jose Camacho Collados made several key points including the lack of transparency (the code is not publicly available), the overreach of the author’s claim of “completely learning from ‘self-play,’” considering the requirement for teaching the game rules and for some prior game knowledge, and the “responsibility of researchers in this area to accurately describe… our achievements and try not to contribute to the growing (often self-interested) misinformation and mystification of the field.”14 Accordingly, some of AI’s biggest achievements to date may have been glorified.

Silver, D., et al., Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv, 2017.

pages: 444 words: 117,770

by Mustafa Suleyman
Published 4 Sep 2023

Just as today’s models produce detailed images based on a few words, so in decades to come similar models will produce a novel compound or indeed an entire organism with just a few natural language prompts. That compound’s design could be improved by countless self-run trials, just as AlphaZero became an expert chess or Go player through self-play. Quantum technologies, many millions of times more powerful than the most powerful classical computers, could let this play out at a molecular level. This is what we mean by hyper-evolution—a fast, iterative platform for creation. Nor will this evolution be limited to specific, predictable, and readily containable areas.

But what if you had a worm that improved itself using reinforcement learning, experimentally updating its code with each network interaction, each time finding more and more efficient ways to take advantage of cyber vulnerabilities? Just as systems like AlphaGo learn unexpected strategies from millions of self-played games, so too will AI-enabled cyberattacks. However much you war-game every eventuality, there’s inevitably going to be a tiny vulnerability discoverable by a persistent AI. Everything from cars and planes to fridges and data centers relies on vast code bases. The coming AIs make it easier than ever to identify and exploit weaknesses.

pages: 170 words: 49,193

The People vs Tech: How the Internet Is Killing Democracy (And How We Save It)
David Silver et al., "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm," DeepMind, December 5, 2017.

David Silver et al., "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", Cornell University Library Research Paper, 5 December 2017.

David Silver et al., 'Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm', arXiv (2017).

David Silver et al., "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm." Cornell University, arXiv:1712.01815 [cs.AI].

Published 24 Mar 2020

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis, "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", Cornell University Library Research Paper, 5 December 2017.

by Nate Silver
