reinforcement learning


description: field of machine learning

143 results

The Means of Prediction: How AI Really Works (And Who Benefits)

by Maximilian Kasy  · 15 Jan 2025  · 209pp  · 63,332 words

with different options and doing what seems best based on what has been learned from actions they have previously tried. Reinforcement learning goes one step further than multi-armed bandit algorithms. Reinforcement learning builds algorithms that learn to plan by learning how likely it is that different states of the world are favorable down

as crucial in many real-world tasks as it is in playing games. The problem of planning is taken into account in the framework of reinforcement learning. Reinforcement Learning Certain board games have long stood as symbols of intellectual challenge: chess in Europe, go in East Asia, and backgammon in the Middle East. In

algorithm of AlphaGo was much simpler. Like TD-Gammon, it used an approach called (deep) reinforcement learning. Both TD-Gammon and AlphaGo learned to play by playing a vast number of games against themselves. The term reinforcement learning comes from behaviorist ideas of how animals learn and how they can be trained. According to

, to teach a computer to play backgammon by means of selective rewards than to try the same with your dog or cat. But how does reinforcement learning work with a computer? Your computer has no innate desire for treats, after all. Let us start by again considering the multi-armed bandits. How
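
The bandit setting makes the substitute for treats concrete: a numerical reward stands in for the treat, and an action-value estimate stands in for the learned association. A minimal epsilon-greedy sketch in Python, with hypothetical payout probabilities (illustration only, not code from the book):

import random

# Hypothetical 3-armed bandit: each arm pays 1 with a fixed, unknown probability.
TRUE_PAYOUT = [0.3, 0.5, 0.7]

def pull(arm):
    return 1.0 if random.random() < TRUE_PAYOUT[arm] else 0.0

counts = [0, 0, 0]        # times each arm has been tried
values = [0.0, 0.0, 0.0]  # running average reward per arm
epsilon = 0.1             # fraction of pulls spent exploring at random

for t in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(3)                     # explore
    else:
        arm = max(range(3), key=lambda a: values[a])  # exploit the best estimate
    reward = pull(arm)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update

print(values)  # estimates drift toward [0.3, 0.5, 0.7], and arm 2 is pulled most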

and AlphaGo used. They trained neural networks to predict the probability of winning, in the recursive manner described above. This all sounds pretty good: Deep reinforcement learning can teach itself by exploring the world and learning to plan for the future. Why not use this approach to address all kinds of real
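
The “recursive manner” referred to here has a compact standard form. In temporal-difference notation (mine, not the book’s), the network’s predicted win probability V for the current position is nudged toward its prediction for the next position, with finished games supplying the ground truth that propagates backward through self-play:

\[
V(s_t) \leftarrow V(s_t) + \alpha \big[ V(s_{t+1}) - V(s_t) \big],
\qquad
V(s_{\text{final}}) =
\begin{cases}
1 & \text{win} \\
0 & \text{loss}
\end{cases}
\]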

need to draw conclusions about the state of the world, and you need to make predictions about its future state. Any real-world application of reinforcement learning must overcome this issue of partial observability, which implies that it needs to remember the past to predict the future. The second reason why deep

reinforcement learning is not easily applied to real-world problems is that it is a very data-hungry approach. As mentioned above, the way AlphaGo learned was

might want to do exactly that.) Actual self-driving cars, to the extent that they exist, thus need to rely on approaches other than pure reinforcement learning. There is a more general lesson here. Current deep-learning-based methods in AI need lots of data. There are some settings where data have

to solve. At one extreme, in terms of scalability, we have data generated via simulation, such as the simulated games that were used to train reinforcement learning algorithms like AlphaGo. In domains where data can be generated by simulation, there are no limits to the machine-learning-based approach, at least in

safe AI—or so some claim. One of the approaches that has been proposed for addressing the problem of value alignment is known as inverse reinforcement learning—in effect, sidestepping the problem of explicitly specifying a reward function for AI algorithms. Rather than maximizing an explicitly specified reward function, in inverse

reinforcement learning the algorithms are supposed to construct a reward function by inferring human objectives from observed human behavior. The algorithms are meant to learn their own
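
One standard way to make “constructing a reward function from observed behavior” precise is the apprenticeship-learning formulation of Abbeel and Ng (2004), which also surfaces elsewhere in these results. Assuming (notation introduced here, not the book’s) a reward linear in state features, R(s) = w·φ(s), the demonstrations fix the expert’s discounted feature expectations, and the algorithm seeks weights under which the expert’s behavior beats the alternatives:

\[
\mu(\pi) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t)\;\Big|\;\pi\Big],
\qquad
\text{find } w \text{ such that } w^{\top}\mu(\pi_E) \;\ge\; w^{\top}\mu(\pi) \text{ for all } \pi .
\]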

further, the algorithms are also supposed to learn the purpose of human actions relative to future rewards. This, then, is truly an inversion of the reinforcement learning problem. If a human waters a basil plant now, he might want to eat basil pesto in a month. If another human trains to increase

behavior for decades. In doing so, they have learned that estimating preferences is quite difficult and not always feasible, even when maintaining strong assumptions. Inverse reinforcement learning and related approaches may hold some promise, but there are fundamental limits to what these approaches can achieve. They cannot solve the multitasking problem, and

teaching to the test. This problem cannot be solved by observing educational policymakers to infer the preferences of these policymakers, which is what the inverse reinforcement learning approach would try to do. Teaching to the test occurs because standardized tests cannot measure certain important dimensions of student development. No amount of reward

behavior to infer his preferences and by deriving from these inferred preferences the correct objective for social media feed selection, which is, again, what inverse reinforcement learning would suggest. There are fundamental limits on the extent to which the value-alignment problem can be solved by reward engineering or by approaches such

as inverse reinforcement learning. Because of this, there are settings where important decisions should not be delegated to AI systems. These systems are bound to ignore unmeasured dimensions of

well-being, and the underlying agency problems prevent effective delegation. There is another, and arguably even more important, problem that needs to be solved. Inverse reinforcement learning is supposed to learn human preferences and act accordingly. But which human’s preferences? Whose values should the algorithm align with? Alignment with Whom? There

of the Web.” New Yorker, February 9, 2023. François-Lavet, V., P. Henderson, R. Islam, M. G. Bellemare, and J. Pineau. “An Introduction to Deep Reinforcement Learning.” Foundations and Trends in Machine Learning 11 no. 3–4 (2018): 219–354. Friedman, J., T. Hastie, and R. Tibshirani. The Elements of Statistical Learning

1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators.” Statistical Science 5, no. 1 (1990): 147–55. Sutton, R. S., and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018. Thompson, W. R. “On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples

, 45, 49–50, 63–65, 89–90; and generative AI, 53–56; how it works, 45–50; reinforcement learning and, 63; relative simplicity of, 49; self-supervised, 52–53; technical meaning of, 47 deep reinforcement learning, 63 democratic governance: and automated decision-making, 2, 7–8, 111–13, 135, 186–88; broad conception of

intelligence, concept of, 19, 21–22 intelligence explosion, 4, 22, 122 International Energy Agency, 93 interoperability requirements, 110 interpretability. See explainability interventions, 181–82 inverse reinforcement learning, 129–30 Israel, AI use in warfare by, 6, 31–32, 133 Jabirian corpus, 44, 51 Johnson, Simon, 150 jury of peers, 199 Kafka, Franz

noise, 141 randomized response method, 136–38, 137 rare earth minerals, 90 Rawls, John, 80 recidivism, 169 regularization, 40 regulation, of data collection, 145–46 reinforcement learning, 12, 60–65. See also inverse reinforcement learning; reward design representative democracy, 197–99 revealed preferences, 78, 129 reward design, 124–28. See also objectives of AI

; reinforcement learning Robinson, Joan, 154 robots, 6, 23–24, 122, 129, 131, 157, 160–61 robustness, 177–78, 184 Russell, Stuart, 4 safety. See AI safety sample

Fixed: Why Personal Finance is Broken and How to Make it Work for Everyone

by John Y. Campbell and Tarun Ramadorai  · 25 Jul 2025

rarity of hugely successful companies. Conversely, the loss-making companies that failed, which are the majority, mostly don’t make the news. The Limits of Reinforcement Learning We have argued that many people have faulty mental models and do faulty calculations when thinking about financial problems. But in many areas of human

, just as a toddler learns from a burnt finger not to touch a hot stove. Psychologists call this “reinforcement learning.” Why can’t people use this type of learning to avoid financial pain? Reinforcement learning is often effective, both for people and for computer programs such as the AlphaGo Zero program that, in 2017

, became the world champion in the game of Go by playing experimentally against itself for forty days.18 But reinforcement learning operates only as quickly as relevant experience accumulates—and this happens dangerously slowly in the financial context. An obvious example here is large and important

, but the results are not particularly surprising given the social nature of human beings. In finance, unfortunately, social learning inherits all the pitfalls of personal reinforcement learning from one’s own experience, with some extra problems thrown in.25 Delayed rewards and losses afflict social learning just as much as personal

reinforcement learning, and randomness does, too, because random shocks often affect large groups of people in the same way. A strategy of aggressively borrowing to buy a

poignant as the financial consequences of a poor decision can be far more serious than temporarily having to live in a less-than-perfect house. Reinforcement learning can all too often be driven by emotions rather than financial outcomes, causing people to focus on their prior experiences of what felt good rather

, 188–189 excess (deductible), 142, 144, 252, 296n39 exchange-traded funds (ETFs), 250 expectations of returns, 293n17 Expedia, 209 Experian, 301n4 experience: extrapolation from, 134; reinforcement learning and, 44 exponential growth, unintuitiveness of, 39, 271n11 extended warranties, 142–143, 295n35 extrapolation: danger of, 40, 134; by investors, 136 Facebook, influence of friends

, 27 intermittent financial decisions, 49–51 intuition, about exponential growth, 271n11 intuitive finance, 31, 38–47; danger of learning from others, 45–47; limits of reinforcement learning and, 43–45; uncertainty exposing weakness of human intuition, 40–43 intuitive reasoning, lack of accounting for historical information, 41–43 investment management, costs and

of fintech, 187–189; preventing from going too far, 218–220; of public health, 56 regulatory sandboxes, testing new DeFi products using, 195–196, 304n32 reinforcement learning, limits of, 43–45 reinsurance market, 296n46 rent, 58–59 rents, apartments, chonsei and, 64–65 rent seeking, 58–59 replacement rate, 159 representativeness, 43

Artificial Intelligence: A Modern Approach

by Stuart Russell and Peter Norvig  · 14 Jul 2019  · 2,466pp  · 668,761 words

and Transfer Learning · 22.8 Applications · Summary · Bibliographical and Historical Notes · 23 Reinforcement Learning · 23.1 Learning from Rewards · 23.2 Passive Reinforcement Learning · 23.3 Active Reinforcement Learning · 23.4 Generalization in Reinforcement Learning · 23.5 Policy Search · 23.6 Apprenticeship and Inverse Reinforcement Learning · 23.7 Applications of Reinforcement Learning · Summary · Bibliographical and Historical Notes · VI Communicating, perceiving, and acting · 24 Natural Language Processing · 24.1 Language Models · 24.

processes (MDPs) developed in the field of operations research. A flood of work followed connecting AI planning research to MDPs, and the field of reinforcement learning found applications in robotics and process control as well as acquiring deep theoretical foundations. One consequence of AI’s newfound appreciation for data, statistical modeling

, for example by detecting unusual patterns of behavior, but they will also contribute to the potency, survivability, and proliferation capability of malware. For example, reinforcement learning methods have been used to create highly effective tools for automated, personalized blackmail and phishing attacks. We will revisit these topics in more depth in

); economics (market-based algorithms (Dias et al., 2006)); physics (particle swarms (Li and Yao, 2012) and spin glasses (Mézard et al., 1987)); animal behavior (reinforcement learning, grey wolf optimizers (Mirjalili and Lewis, 2014)); ornithology (Cuckoo search (Yang and Deb, 2014)); entomology (ant colony (Dorigo et al., 2008), bee colony (Karaboga

of actions have probabilities associated with them): Markov decision processes, partially observable Markov decision processes, and game theory. In Chapter 23 we show that reinforcement learning allows an agent to learn how to behave from past successes and failures. Bibliographical and Historical Notes AI planning arose from investigations into state-space

models, to update its belief state, and to project forward possible action sequences. We shall return to MDPs and POMDPs in Chapter 23, which covers reinforcement learning methods that allow an agent to improve its behavior from experience. Bibliographical and Historical Notes Richard Bellman developed the ideas underlying the modern approach to

). The texts by Bertsekas (1987) and Puterman (1994) provide rigorous introductions to sequential decision problems and dynamic programming. Bertsekas and Tsitsiklis (1996) include coverage of reinforcement learning. Sutton and Barto (2018) cover similar ground but in a more accessible style. Sigaud and Buffet (2010), Mausam and Kolobov (2012) and Kochenderfer (2015)

replacing the simple components with more sophisticated machine learning models. Part of problem formulation is deciding whether you are dealing with supervised, unsupervised, or reinforcement learning. The distinctions are not always so crisp. In semisupervised learning we are given a few labeled examples and use them to mine more information from

gradient descent in parameter space to minimize the loss function. •Deep learning works well for visual object recognition, speech recognition, natural language processing, and reinforcement learning in complex environments. •Convolutional networks are particularly well suited for image processing and other tasks where the data have a grid topology. •Recurrent networks are

have already seen the concept of rewards in Chapter 16 for Markov decision processes (MDPs). Indeed, the goal is the same in reinforcement learning: maximize the expected sum of rewards. Reinforcement learning differs from “just solving an MDP” because the agent is not given the MDP as a problem to solve; the agent
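
Not being given the MDP means the agent must learn from sampled transitions rather than from a known transition model; tabular Q-learning is the textbook model-free example. A self-contained sketch on a toy five-state chain (the environment and all constants are invented for illustration):

import random
from collections import defaultdict

# Toy chain MDP: states 0..4, actions 0 (left) / 1 (right), reward 1 on reaching state 4.
def env_step(s, a):
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 4 else 0.0), (s2 == 4)  # next state, reward, done

Q = defaultdict(float)                # Q[(state, action)], implicitly zero-initialized
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    s, done = random.randrange(4), False   # random start state speeds up exploration
    while not done:
        if random.random() < epsilon:
            a = random.randrange(2)                    # explore
        else:
            a = max((0, 1), key=lambda x: Q[(s, x)])   # exploit current estimates
        s2, r, done = env_step(s, a)
        best_next = 0.0 if done else max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # sample-based update
        s = s2

Nothing above ever inspects the transition function; the update touches only observed (s, a, r, s′) tuples, which is exactly what not being handed the MDP forces.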

.6, we explore apprenticeship learning: training a learning agent using demonstrations rather than reward signals. Finally, Section 23.7 reports on applications of reinforcement learning. 23.2 Passive Reinforcement Learning We start with the simple case of a fully observable environment with a small number of actions and states, in which an agent already

transition model to perform its updates. The environment itself supplies the connection between neighboring states in the form of observed transitions. Figure 23.4 A passive reinforcement learning agent that learns utility estimates using temporal differences. The step-size function α(n) is chosen to ensure convergence. Figure 23.5 The TD learning curves
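
The tabular version of the agent in Figure 23.4 fits in a few lines. The convergence requirement on the step size is the usual stochastic-approximation condition that Σ α(n) diverges while Σ α(n)² converges; α(n) = 1/n, used below, satisfies it. A sketch with the fixed policy and the environment left abstract (my illustration, not the book’s pseudocode):

from collections import defaultdict

U = defaultdict(float)  # utility estimates for visited states
N = defaultdict(int)    # per-state visit counts, used to decay the step size

def td0_update(s, r, s2, gamma=0.9):
    # Passive TD(0): nudge U[s] toward the one-step sample r + gamma * U[s2],
    # using only the observed transition -- no transition model is consulted.
    N[s] += 1
    alpha = 1.0 / N[s]  # satisfies the convergence conditions above
    U[s] += alpha * (r + gamma * U[s2] - U[s])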

update rules, even though there are good solutions in the hypothesis space. There are more sophisticated algorithms that can avoid these problems, but at present reinforcement learning with general function approximators remains a delicate art. In addition to parameters diverging to infinity, there is a more surprising problem called catastrophic forgetting.

to understand this kind of situation as a two-person assistance game, as described in Section 17.2.5. 23.7 Applications of Reinforcement Learning We now turn to applications of reinforcement learning. These include game playing, where the transition model is known and the goal is to learn the utility function, and robotics,

specify. Imitation learning formulates the problem as supervised learning of a policy from the expert’s state–action pairs. Inverse reinforcement learning infers reward information from the expert’s behavior. Reinforcement learning continues to be one of the most active areas of machine learning research. It frees us from manual construction of behaviors
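
The imitation-learning formulation in this summary reduces to ordinary supervised learning. Its simplest instance is nearest-neighbor behavior cloning, sketched below with invented two-feature states and action labels:

# Expert demonstrations as (state, action) pairs; states are 2-D feature tuples.
demos = [((0.0, 0.1), "left"), ((0.9, 1.0), "right"), ((0.5, 0.4), "stay")]

def cloned_policy(state):
    # 1-nearest-neighbor over demonstrated states: in a new situation,
    # copy the expert's action from the most similar remembered one.
    def sq_dist(s):
        return sum((x - y) ** 2 for x, y in zip(s, state))
    _, action = min(demos, key=lambda d: sq_dist(d[0]))
    return action

print(cloned_policy((0.8, 0.9)))  # -> "right"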

environment. Weighted linear combinations of features and neural networks are factored representations for function approximation. It is also possible to apply reinforcement learning to structured representations; this is called relational reinforcement learning (Tadepalli et al., 2004). The use of relational descriptions allows for generalization across complex behaviors involving different objects. Analysis of the

explore unknown environments and are guaranteed to converge on near-optimal policies with a sample complexity that is polynomial in the number of states. Bayesian reinforcement learning (Dearden et al., 1998, 1999) provides another angle on both model uncertainty and exploration. The basic idea underlying imitation learning is to apply

of temporal-difference learning; related research describes other neuroscientific and behavioral experiments (Dayan and Niv, 2008; Niv, 2009; Lee et al., 2012). Work in reinforcement learning has been accelerated by the availability of open-source simulation environments for developing and testing learning agents. The University of Alberta’s Arcade Learning Environment

simulation (Savva et al., 2019) provides a photo-realistic virtual environment for indoor robotic tasks, and their HORIZON platform (Gauci et al., 2018) enables reinforcement learning in large-scale production systems. The SYNTHIA system (Ros et al., 2016) is a simulation environment designed for improving the computer vision capabilities of self

and perform safely. Robotics brings together many of the concepts we have seen in this book, including probabilistic state estimation, perception, planning, unsupervised learning, reinforcement learning, and game theory. For some of these concepts robotics serves as a challenging example application. For other concepts this chapter breaks new ground, for instance

cost (ILQR). •Planning under uncertainty unites perception and action by online replanning (such as model predictive control) and information gathering actions that aid perception. •Reinforcement learning is applied in robotics, with techniques striving to reduce the required number of interactions with the real world. Such techniques tend to exploit models, be

function they should optimize from human input, such as demonstrations, corrections, or instruction in natural language. Alternatively, robots can imitate human behavior, and use reinforcement learning to help tackle the challenge of generalization to new states. Bibliographical and Historical Notes The word robot was popularized by Czech playwright Karel Čapek in

rational agents. A variety of different agent designs were considered, ranging from reflex agents to knowledge-based decision-theoretic agents to deep learning agents using reinforcement learning. There is also variety in the component technologies from which these designs are assembled: logical, probabilistic, or neural reasoning; atomic, factored, or structured representations

, of course, but compilation methods can be applied so that the overhead is small compared to the costs of the computations being controlled. Metalevel reinforcement learning may provide another way to acquire effective policies for controlling deliberation: in essence, computations that lead to better decisions are reinforced, while those that

Based Scheduling. Morgan Kaufmann. Abbas, A. (2018). Foundations of Multiattribute Utility. Cambridge University Press. Abbeel, P. and Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In ICML-04. Abney, S., McAllester, D. A., and Pereira, F. (1999). Relating probabilistic grammars and automata. In ACL-99. Abramson, B. (1987). The

Andre, D., Friedman, N., and Parr, R. (1998). Generalized prioritized sweeping. In NeurIPS 10. Andre, D. and Russell, S. J. (2002). State abstraction for programmable reinforcement learning agents. In AAAI-02. Andreae, P. (1985). Justified Generalisation: Learning Procedures from Examples. Ph.D. thesis, MIT. Andrieu, C., Doucet, A., and Holenstein, R. (

Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50, 174–188. Arulkumaran, K., Deisenroth, M. P., Brundage, M., and Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34, 26–38. Arunachalam, R. and Sadeh, N. M. (2005). The supply chain trading agent competition. Electronic Commerce

. In Automatic Control–World Congress, 1987: Selected Papers from the 10th Triennial World Congress of the International Federation of Automatic Control. Dietterich, T. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. JAIR, 13, 227–303. Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische

, 137–144. Gao, J. (2014). Machine learning applications for data center optimization. Google Research. García, J. and Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. JMLR, 16, 1437–1480. Gardner, M. (1968). Logic Machines, Diagrams and Boolean Algebra. Dover. Garey, M. R. and Johnson, D. S. (1979). Computers

using Markovian decision theory. Master’s report, Computer Science Division, University of California, Berkeley. Koenig, S. (2000). Exploring unknown environments with real-time search or reinforcement learning. In NeurIPS 12. Koenig, S. (2001). Agent-centered search. AIMag, 22, 109–131. Koenig, S. and Likhachev, M. (2002). D* Lite. In AAAI-02

via GLTL. arXiv:1704.04341. Liu, B., Gemp, I., Ghavamzadeh, M., Liu, J., Mahadevan, S., and Petrik, M. (2018). Proximal gradient temporal difference learning: Stable reinforcement learning with polynomial sample complexity. JAIR, 63, 461–494. Liu, H., Simonyan, K., Vinyals, O., Fernando, C., and Kavukcuoglu, K. (2017). Hierarchical representations for efficient architecture

(Eds.). (1992). Geometric Invariance in Computer Vision. MIT Press. Munos, R., Stepleton, T., Harutyunyan, A., and Bellemare, M. G. (2017). Safe and efficient off-policy reinforcement learning. In NeurIPS 29. Murphy, K. (2002). Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. thesis, UC Berkeley. Murphy, K. (2012). Machine Learning: A Probabilistic

Theory. Cambridge University Press. Nisan, N., Roughgarden, T., Tardos, E., and Vazirani, V. (Eds.). (2007). Algorithmic Game Theory. Cambridge University Press. Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53, 139–154. Nivre, J., De Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajic, J., Manning, C., McDonald

. and Yannakakis, M. (1991). Shortest paths without a map. Theoretical Computer Science, 84, 127–150. Papavassiliou, V. and Russell, S. J. (1999). Convergence of reinforcement learning with general function approximators. In IJCAI-99. Parisi, G. (1988). Statistical Field Theory. Addison-Wesley. Parisi, M. M. G. and Zecchina, R. (2002). Analytic and

of temporal differences. Machine Learning, 3, 9–44. Sutton, R. S., McAllester, D. A., Singh, S., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In NeurIPS 12. Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In ICML-90

D. (1983). An abstract Prolog instruction set. Technical note, SRI International. Wasserman, L. (2004). All of Statistics. Springer. Watkins, C. J. (1989). Models of Delayed Reinforcement Learning. Ph.D. thesis, Psychology Department, Cambridge University. Watson, J. D. and Crick, F. (1953). A structure for deoxyribose nucleic acid. Nature, 171, 737. Wattenberg, M

programming of intelligent embedded systems and robotic space explorers. Proc. IEEE, 91(1), 212–237. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256. Williams, R. J. and Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation,

systematic comparison of phrase-based, hierarchical and syntax-augmented statistical MT. In COLING-08. Zoph, B. and Le, Q. V. (2016). Neural architecture search with reinforcement learning. arXiv:1611.01578. Zuse, K. (1945). The Plankalkül. Report, Gesellschaft für Mathematik und Datenverarbeitung. Zweig, G. and Russell, S. J. (1998). Speech recognition with

HMM), 43, 479, 491, 491–496, 503, 515, 795, 881 hidden variable, 443, 788 HIERARCHICAL-SEARCH, 377 hierarchical decomposition, 375 hierarchical look-ahead, 383 hierarchical reinforcement learning, 858, 1065 hierarchical structure, 1065 hierarchical task network (HTN), 375, 397, 858 Hierholzer, C., 162, 1098 high-level action, 375 higher-order logic, 273 Hilbert

., 516, 1100 intractability, 39 intrinsic property, 340 introspection, 20, 31 invariance, temporal, 811 inverse (of a matrix), 1077 inverse dynamics, 958 inverse kinematics, 947 inverse reinforcement learning, 864, 1054, 1065 inverted pendulum, 867 Ioffe, S., 837, 1100 IPL (programming language), 36 IQ test, 38 IR (information retrieval), 901, 905 Irpan, A.,

1053 regularization function, 689 Reid, D. B., 667, 1110 Reid, M., 116, 125, 1102 Reif, J., 984, 986, 1089, 1110 reification, 335 REINFORCE (reinforcement learning algorithm), 863 reinforcement, 840 reinforcement learning, 28, 210, 585, 671, 840, 789–873, 986 active, 842, 848–854 Bayesian, 851 deep, 835, 857 distributed, 636 generalization in, 854–861

The Alignment Problem: Machine Learning and Human Values

by Brian Christian  · 5 Oct 2020  · 625pp  · 167,349 words

a forty-five-year collaboration that would essentially found a new field. The field, which would cross neuroscience, behaviorist psychology, engineering, and mathematics, was dubbed “reinforcement learning”; and their names, forever linked in bibliographies of AI—“Barto & Sutton,” “Sutton & Barto”—would become synonymous with the definitive textbook of the field they

Comparative Psychology discussed “trial and error” in the context of animal behavior. For a short history of animal learning from the perspective of reinforcement learning, see Sutton and Barto, Reinforcement Learning. 9. See Thorndike, “A Theory of the Action of the After-Effects of a Connection upon It,” and Skinner, “The Rate of

-Analog Reinforcement Systems and Its Application to the Brain Model Problem” for an early example, and Chapter 15 of Sutton and Barto, Reinforcement Learning for discussion. 23. Andrew G. Barto, “Reinforcement Learning: A History of Surprises and Connections” (lecture), July 19, 2018, International Joint Conference on Artificial Intelligence, Stockholm, Sweden. 24. Andrew Barto,

personal interview, May 9, 2018. 25. The canonical text about reinforcement learning is Sutton and Barto, Reinforcement Learning, recently updated into a second edition. For a summary of the field up to the mid-1990s, see also Kaelbling, Littman, and Moore

this history, see “Michael Littman: The Reward Hypothesis” (lecture), University of Alberta, October 16, 2019, available at https://www.coursera.org/lecture/fundamentals-of-reinforcement-learning/michael-littman-the-reward-hypothesis-q6x0e. Despite the recency of this particular framing, the idea of understanding behavior as motivated, whether explicitly or implicitly, by

2018). 52. Sutton, “A Unified Theory of Expectation in Classical and Instrumental Conditioning.” 53. Sutton, “Temporal-Difference Learning” (lecture), Deep Learning and Reinforcement Learning Summer School 2017, Université de Montréal, July 3, 2017, http://videolectures.net/deeplearning2017_sutton_td_learning/. 54. Sutton, “Temporal-Difference Learning.” 55. Sutton, “Learning to

Predict by the Methods of Temporal Differences.” See also Sutton’s PhD thesis: “Temporal Credit Assignment in Reinforcement Learning.” 56. See Watkins, “Learning from Delayed Rewards” and Watkins and Dayan, “Q-Learning.” 57. Tesauro, “Practical Issues in Temporal Difference Learning.” 58. Tesauro, “TD

.” 67. Niv. 68. For a discussion of potential limitations to the TD-error theory of dopamine, see, e.g., Dayan and Niv, “Reinforcement Learning,” and O’Doherty, “Beyond Simple Reinforcement Learning.” 69. Niv, “Reinforcement Learning in the Brain.” 70. Yael Niv, personal interview, February 21, 2018. 71. Lenson, On Drugs. 72. See, e.g., Berridge, “

Formula That Predicts Happiness,” https://www.theglobeandmail.com/life/health-and-fitness/health/researchers-create-formula-that-predicts-happiness/article19919756/. 80. See Tomasik, “Do Artificial Reinforcement-Learning Agents Matter Morally?” For more on this topic, see also Schwitzgebel and Garza, “A Defense of the Rights of Artificial Intelligences.” 81. Brian Tomasik,

“Ethical Issues in Artificial Reinforcement Learning,” https://reducing-suffering.org/ethical-issues-artificial-reinforcement-learning/. 82. Daswani and Leike, “A Definition of Happiness for Reinforcement Learning Agents.” See also People for the Ethical Treatment of Reinforcement Learners: http://petrl.org. 83. Andrew Barto

CURIOSITY 1. Turing, “Intelligent Machinery.” 2. There were efforts starting in 2004 to develop standardized RL benchmarks and competitions; see Whiteson, Tanner, and White, “The Reinforcement Learning Competitions.” 3. Marc Bellemare, personal interview, February 28, 2019. 4. Bellemare et al., “The Arcade Learning Environment,” stemming originally from Naddaf, “Game-Independent AI Agents

was introduced into machine learning with Barto, Singh, and Chentanez, “Intrinsically Motivated Learning of Hierarchical Collections of Skills,” and Singh, Chentanez, and Barto, “Intrinsically Motivated Reinforcement Learning.” For a more recent overview of this literature, see Baldassarre and Mirolli, Intrinsically Motivated Learning in Natural and Artificial Systems. 13. Hobbes, Leviathan. 14. Simon

, “Learning and Satiation of Response in Intrinsically Motivated Complex Puzzle Performance by Monkeys.” 19. Scenarios of this type are described in Barto, “Intrinsic Motivation and Reinforcement Learning,” and Deci and Ryan, Intrinsic Motivation and Self-Determination in Human Behavior. 20. Berlyne, Conflict, Arousal, and Curiosity. 21. And see, for instance, Berlyne

intellectual motivation.” See Minsky, “Steps Toward Artificial Intelligence.” 30. See Sutton, “Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming” and “Reinforcement Learning Architectures for Animats.” MIT’s Leslie Pack Kaelbling devised a similar method, based on the idea of measuring an agent’s “confidence intervals” around the

approaches, which incentivize exploration by rewarding “information gain,” see, e.g., Schmidhuber, “Curious Model-Building Control Systems”; Stadie, Levine, and Abbeel, “Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models”; and Houthooft et al., “VIME.” 51. Burda et al., “Large-Scale Study of Curiosity-Driven Learning.” 52. See Burda et al

30. This is a very active area of research. See, e.g., Subramanian, Isbell, and Thomaz, “Exploration from Demonstration for Interactive Reinforcement Learning”; Večerík et al., “Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards”; and Hester et al., “Deep Q-Learning from Demonstrations.” 31. In fact, many agents trained in

//thesis/PieterAbbeel_Defense_19May2008_320x180.mp4. 21. Abbeel, Coates, and Ng, “Autonomous Helicopter Aerobatics Through Apprenticeship Learning.” 22. Abbeel et al., “An Application of Reinforcement Learning to Aerobatic Helicopter Flight.” They also successfully performed a nose-in funnel and a tail-in funnel. 23. “As repeated sub-optimal demonstrations tend to

differ in their suboptimalities, together they often encode the intended trajectory.” See Abbeel, “Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control,” which refers to the work in Coates, Abbeel, and Ng, “Learning for Control from Multiple Demonstrations.” 24. Abbeel, Coates,

Stanford helicopter performing the chaos, see “Stanford University Autonomous Helicopter: Chaos,” https://www.youtube.com/watch?v=kN6ifrqwIMY. 28. Ziebart et al., “Maximum Entropy Inverse Reinforcement Learning,” which leverages the principle of maximum entropy derived from Jaynes, “Information Theory and Statistical Mechanics.” See also Ziebart, Bagnell, and Dey, “Modeling Interaction via the

Problems in AI Safety,” which, in turn, references Salge, Glackin, and Polani, “Empowerment: An Introduction,” and Mohamed and Rezende, “Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning.” 50. Alexander Turner, personal interview, July 11, 2019. 51. Wiener, “Some Moral and Technical Consequences of Automation.” 52. According to Paul Christiano, “corrigibility” as

Art. 42. Turing et al., “Can Automatic Calculating Machines Be Said to Think?” ACKNOWLEDGMENTS 1. McCulloch, Finality and Form. BIBLIOGRAPHY Abbeel, Pieter. “Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control.” PhD thesis, Stanford University, 2008. Abbeel, Pieter, Adam Coates, and Andrew Y. Ng. “Autonomous Helicopter Aerobatics Through Apprenticeship Learning.” International

Rebellion: Changing the Narrative.” In Thirty-First AAAI Conference on Artificial Intelligence, 2017. Akrour, Riad, Marc Schoenauer, and Michèle Sebag. “APRIL: Active Preference-Learning Based Reinforcement Learning.” In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 116–31. Springer, 2012. Akrour, Riad, Marc Schoenauer, Michèle Sebag, and Jean-Christophe

of the Cognitive Science Society, 2017. Choi, Jongwook, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, and Honglak Lee. “Contingency-Aware Exploration in Reinforcement Learning.” In International Conference on Learning Representations, 2019. Chouldechova, Alexandra. “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments.” Big Data 5

Mental Health Crisis in Graduate Education.” Nature Biotechnology 36, no. 3 (2018): 282. Everitt, Tom, Victoria Krakovna, Laurent Orseau, Marcus Hutter, and Shane Legg. “Reinforcement Learning with a Corrupted Reward Channel.” In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 4705–13, 2017. Eysenbach, Benjamin, Shixiang

Gu, Julian Ibarz, and Sergey Levine. “Leave No Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning.” In International Conference on Learning Representations, 2018. Fantz, Robert L. “Visual Experience in Infants: Decreased Attention to Familiar Patterns Relative to Novel Ones.” Science 146

and Don Mosenfelder. Bobby Fischer Teaches Chess. Basic Systems, 1966. Florensa, Carlos, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. “Reverse Curriculum Generation for Reinforcement Learning.” In Proceedings of the 1st Annual Conference on Robot Learning, edited by Sergey Levine, Vincent Vanhoucke, and Ken Goldberg, 482–95. PMLR, 2017. Flores, Anthony

1976): 3–46. Malik, Dhruv, Malayandi Palaniappan, Jaime Fisac, Dylan Hadfield-Menell, Stuart Russell, and Anca Drăgan. “An Efficient, Generalized Bellman Update for Cooperative Inverse Reinforcement Learning.” In Proceedings of the 35th International Conference on Machine Learning, edited by Jennifer Dy and Andreas Krause, 3394–3402. PMLR, 2018. Malone, Thomas W. “Toward

2600 Console Games.” Master’s thesis, University of Alberta, 2010. Nair, Ashvin, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. “Overcoming Exploration in Reinforcement Learning with Demonstrations.” In 2018 IEEE International Conference on Robotics and Automation (ICRA), 6292–99. IEEE, 2018. Nalisnick, Eric, Bhaskar Mitra, Nick Craswell, and Rich Caruana

.” Journal of Experimental Criminology 12, no. 3 (2016): 347–71. Saunders, William, Girish Sastry, Andreas Stuhlmueller, and Owain Evans. “Trial Without Error: Towards Safe Reinforcement Learning via Human Intervention.” In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2067–99. International Foundation for Autonomous Agents and Multiagent

Hospitalized Patients.” JAMA Neurology 74, no. 12 (2017): 1419–24. Subramanian, Kaushik, Charles L. Isbell Jr., and Andrea L. Thomaz. “Exploration from Demonstration for Interactive Reinforcement Learning.” In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 447–56. International Foundation for Autonomous Agents and Multiagent Systems, 2016. Sundararajan, Mukund

for Animats.” In From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, 288–96. 1991. ———. “Temporal Credit Assignment in Reinforcement Learning.” PhD thesis, University of Massachusetts, Amherst, 1984. ———. “A Unified Theory of Expectation in Classical and Instrumental Conditioning.” Bachelor’s thesis, Stanford University, 1978. Sutton,

Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. 2nd ed. MIT Press, 2018. Sweeney, Latanya. “Discrimination in Online Ad Delivery.” Communications of the ACM 56, no. 5 (2013): 44–54. ———.

Večerík, Matej, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, and Martin Riedmiller. “Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards.” arXiv Preprint arXiv:1707.08817, 2017. Vincent, James. “Google ‘Fixed’ Its Racist Algorithm by Removing Gorillas from Its Image

9, 12, 337n6, 346n13 alignment problem amplification/distillation and, 249 analogies and, 317 corrigibility and, 295 defined, 13 as hopeful, 327–28 inverse reinforcement learning and, 255 parenting and, 166 reinforcement learning and, 151 technical limitations and, 313, 395–96n4 thermostats and, 311–12, 313 See also value alignment Allen, Woody, 170 AlphaGo, 162

ensemble methods, 284–85, 305 equiprobabiliorism, 303 equivant (company), 337n5 ergodicity assumption, 320 Ermon, Stefano, 324 ethics actualism vs. possibilism and, 239, 379n71 in reinforcement learning, 149 See also AI safety; fairness; moral uncertainty evaluation function. See value function Evans, Owain, 386–87n55 evolution, 170, 171–74, 368n56 expectations, 138–39

, 316, 342n61 See also Google research Google Brain, 113, 167, 373n53 Google research differential privacy, 347n33 fairness, 73 feature visualization, 110 multitask learning models, 107 reinforcement learning, 167 selective classification, 390n29 value alignment, 247 word embedding, 44 Gopnik, Alison, 194, 215 gorilla tag incident, 25–26, 316, 339n24 GPT-2, 344

238–39, 379n69 OpenAI actualism vs. possibilism, 239 amplification, 248–50 corrigibility, 296 feature visualization, 112, 357n69 intrinsic motivation, 199–200, 201 inverse reinforcement learning, 263–66, 384–85n37 reinforcement learning, 365n27 word embedding, 344–45n94 open category problem, 279–81, 315, 396nn10–11 ophthalmology, 287, 389n23 optimal regressions, 95–96 optimal reward problem

Herbert, 327 Reality Is Broken (McGonigal), 175 recidivism. See risk-assessment models rectified linear output functions, 24 Reddy, Raj, 223 redundant encodings, 40, 64, 343n74 reinforcement learning (RL) actor-critic architecture, 138, 362n51 actualism vs. possibilism and, 238–40, 379n69 addiction and, 205–08, 374n65 alignment problem and, 151 Arcade Learning Environment

143, 145 Research Institute for Advanced Studies, 383n15 Revels, Hiram, 27 reversibility, 391n39 reward hypothesis, 130–31, 133–34, 360nn26, 28 See also reinforcement learning rewards. See incentives; reinforcement learning; reward hypothesis; shaping right to explanation. See transparency rigorism, 303, 304 risk-assessment models COMPAS development, 56–57, 346nn13–14 defenses of, 68, 72

The AI-First Company

by Ash Fontana  · 4 May 2021  · 296pp  · 66,815 words

the point of difference: simulations find failures in normal software but find improvements in AIs. Simulations thus present a significant opportunity to improve AIs, particularly reinforcement learning and other, agent-based learning AIs. Such models typically need to try multiple approaches, and, since existing datasets are insufficient or unavailable, simulators are useful

in a specific domain—understanding the “rules of the game,” or the principles of the system. Programmers create ABMs using techniques such as adversarial and reinforcement learning. Popular agent-based systems include some that play John Conway’s Game of Life and solve the prisoner’s dilemma. Financial and political institutions often

five major types of ML used today. Supervised and unsupervised ML are two types that differ in the degree of human involvement at every step. Reinforcement learning is a functionally different approach to supervised and unsupervised ML. Transfer and deep learning overlap with the other types. The table below shows what might

boosted types), regression, support vector machines (SVMs), and neural networks · clustering (k-means, hierarchical, and others) and Gaussian mixture models* · various, but all forms of reinforcement learning · Bayesian networks and Markov logic networks · convolutional neural networks and recurrent neural networks
COMPOUNDING There are many different methods for making predictions, each one generating

much more effective than the previous models. With each automatic run of the network, GANs add features until there is no more discrimination to do. Reinforcement learning (RL): Reinforcement learning, an area of ML, involves developing agents that optimize for a reward—in other words, creating a (software) agent that has ML at its

algorithms · Algorithm performance · Algorithm performance
Feed-forward networks · Predictions · None
Recurrent neural networks · Predictions · Predictions
Convolutional neural networks · Features · Features
Generative adversarial networks · Features · Features
Reinforcement learning · Observational · Observational
CONCLUSION Getting started with machine learning is easier than ever, even as the frontier of cutting-edge research is shifting faster than ever

method based on available data. Supervised learning needs training and feedback data, whereas unsupervised ML just requires lots of data. Some models need an objective. Reinforcement-learned models need objectives. Other forms of ML generally do not, and they will even surface information without objectives. Learn to learn. Some AIs generate data

: plot that shows how well the model performed at different discrimination thresholds, e.g., true and false positive rates RECURSION: repeated application of a method REINFORCEMENT LEARNING: ML that learns from objectives RETURN ON INVESTMENT (ROI): calculated by dividing the return from using an asset by the investment in that asset ROI

networks in, 152, 153 inductive logic programming in, 149, 153 machine learning in, 151–52 primer for, 145–47 recurrent neural networks in, 151, 153 reinforcement learning in, 152, 153 statistical analysis in, 149, 153 machine learning models, managing, 155–86 acceptance, 157, 162–66 accountability and, 164 and augmentation versus automation

forest, 53, 64, 279 recall, 279 receiver operating characteristic (ROC) curve, 205–6, 279 recurrent neural networks (RNNs), 151, 153 recursion, 150, 279 regression, 64 reinforcement learning (RL), 103, 147–48, 152, 153, 279 relevance of data, 74–75 reliability, 175 reports, 171 research and development (R & D), 42 cost analysis, 217

Architects of Intelligence

by Martin Ford  · 16 Nov 2018  · 586pp  · 186,548 words

data. This explains why companies that control huge amounts of data, like Google, Amazon, and Facebook, have such a dominant position in deep learning technology. REINFORCEMENT LEARNING essentially means learning through practice or trial and error. Rather than training an algorithm by providing the correct, labeled outcome, the learning system is set

for itself, and if it succeeds it is given a “reward.” Imagine training your dog to sit, and if he succeeds, giving him a treat. Reinforcement learning has been an especially powerful way to build AI systems that play games. As you will learn from the interview with Demis Hassabis in this

book, DeepMind is a strong proponent of reinforcement learning and relied on it to create the AlphaGo system. The problem with reinforcement learning is that it requires a huge number of practice runs before the algorithm can succeed. For this reason, it

is primarily used for games or for tasks that can be simulated on a computer at high speed. Reinforcement learning can be used in the development of self-driving cars—but not by having actual cars practice on real roads. Instead virtual cars are trained

coming from their environments. This is how human beings learn. Young children, for example, learn languages primarily by listening to their parents. Supervised learning and reinforcement learning also play a role, but the human brain has an astonishing ability to learn simply by observation and unsupervised interaction with the environment. Unsupervised learning

of learning where you don’t train for a task, you just observe the world and figure out how it works, essentially. MARTIN FORD: Would reinforcement learning, or learning by practice with a reward for succeeding, be in the category of unsupervised learning? YANN LECUN: No, that’s a different category altogether

. There are three categories essentially; it’s more of a continuum, but there is reinforcement learning, supervised learning, and self-supervised learning. Reinforcement learning is learning by trial and error, getting rewards when you succeed and not getting rewards when you don’t succeed. That form

for games, where you can try things as many times as you want, but doesn’t work in many real-world scenarios. You can use reinforcement learning to train a machine to play Go or chess. That works really well, as we’ve seen with AlphaGo, for example, but it requires a

performance, and it works really well if you can do that, but it is often impractical in the real world. If you want to use reinforcement learning to train a robot to grab objects, it will take a ridiculous amount of time to achieve that. A human can learn to drive a

car in 15 hours of training without crashing into anything. If you want to use the current reinforcement learning methods to train a car to drive itself, the machine will have to drive off cliffs 10,000 times before it figures out how not

for the fact that the kind of learning that we can do as humans is very, very different from pure reinforcement learning. It’s more akin to what people call model-based reinforcement learning. This is where you have your internal model of the world that allows you to predict that when you turn

result, you can plan ahead and not take the actions that result in bad outcomes. Learning to drive in this context is called model-based reinforcement learning, and that’s one of the things we don’t really know how to do. There is a name for it, but there’s no
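
The internal world model LeCun describes can be shown in miniature: a model-based agent queries a learned transition model before acting and rejects actions whose predicted outcomes look bad, instead of having to live through them. Everything below (the model, the cost, the action names) is invented for illustration:

# Transition model predicting the next state for each candidate action.
# In model-based RL this would be learned from experience; here it is hand-written.
def predicted_next(state, action):
    speed, lane_offset = state
    delta = {"steer_left": -0.5, "straight": 0.0, "steer_right": 0.5}[action]
    return (speed, lane_offset + delta)

def cost(state):
    _, lane_offset = state
    return abs(lane_offset)  # worse the further we drift from the lane center

def plan(state):
    # One-step lookahead: imagine each action with the model, keep the cheapest.
    actions = ("steer_left", "straight", "steer_right")
    return min(actions, key=lambda a: cost(predicted_next(state, a)))

print(plan((30.0, 0.4)))  # -> "steer_left": the bad outcome is avoided in imagination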

lot of fundamental research and questions on machine learning, so things that have more to do with applied mathematics and optimization. We are working on reinforcement learning, and we are also working on something called generative models, which are a form of self-supervised or predictive learning. MARTIN FORD: Is Facebook working

labels. As kids, we watch how other humans do things and then we do it; so, the field is now starting to get into inverse reinforcement learning algorithms, and neuro-programming algorithms. There is a lot of new exploration, and DeepMind is doing that. Google Brain is doing that; Stanford is doing

multidimensional—and one that has the kind of learning capability that humans do, which is not only through big data but also through unsupervised learning, reinforcement learning, virtual learning, and various kinds of learning. If we use that as a definition of AGI, then I think the path to AGI is a

these general algorithms that we can apply to real-world problems. MARTIN FORD: So far, your focus has primarily been on combining deep learning with reinforcement learning. That’s basically learning by practice, where the system repeatedly attempts something, and there’s a reward function that drives it toward success. I’ve

heard you say that you believe that reinforcement learning offers a viable path to general intelligence, that it might be sufficient to get there. Is that your primary focus going forward? DEMIS HASSABIS: Going

forward, yes, it is. I think that technique is extremely powerful, but you need to combine it with other things to scale it. Reinforcement learning has been around for a long time, but it was only used in very small toy problems because it was very difficult for anyone to

did the processing of the screen, and the model of the environment you’re in. Deep learning is amazing at scaling, so combining that with reinforcement learning allowed it to scale to these large problems that we’ve now tackled in AlphaGo and DQN—all of these things that people would have

proved that first part. The reason we were so confident about it and why we backed it when we did was because in my opinion reinforcement learning will become as big as deep learning in the next few years. DeepMind is one of the few companies that take that seriously because, from

the neuroscience perspective, we know that the brain uses a form of reinforcement learning as one of its learning mechanisms, it’s called temporal difference learning, and we know the dopamine system implements that. Your dopamine neurons track the
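
The quantity the snippet says dopamine neurons track is the temporal-difference error, written here in standard notation rather than quoted from the book:

\[
\delta_t \;=\; r_{t+1} + \gamma V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\,\delta_t
\]

An outcome better than predicted gives δ_t > 0 and a burst of firing; a fully predicted reward gives δ_t ≈ 0, which is the pattern in the recordings behind this analogy.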

be a viable solution to the problem of general intelligence. It may not be the only one, but from a biologically inspired standpoint, it seems reinforcement learning is sufficient once you scale it up enough. Of course, there are many technical challenges with doing that, and many of them are unsolved. MARTIN

FORD: Still, when a child learns things like language or an understanding of the world, it doesn’t really seem like reinforcement learning for the most part. It’s unsupervised learning, as no one’s giving the child labeled data the way we would do with ImageNet. Yet

their peers and they do unsupervised learning when they’re just experimenting with stuff, with no goal in mind. They also do reward learning and reinforcement learning when they do something, and they get a reward for it. We work on all three of those, and they’re all going to be

could be intrinsic rewards that could be guiding the unsupervised learning. I find that it is useful to think about intelligence in the framework of reinforcement learning. MARTIN FORD: One thing that’s obvious from listening to you is that you combine a deep interest in both neuroscience and computer science. Is

, having neuroscience as a guide can allow me to make much bigger, much stronger bets on things like that. A great example of this is reinforcement learning. I know reinforcement learning has to be scalable because the brain does scale it. If you didn’t know that the brain implemented

reinforcement learning and it wasn’t scaling, how would you know on a practical level if you should spend another two years on this? It’s very

up doing my bachelor’s at Carnegie Mellon, my master’s from MIT and a PhD, with a thesis titled, Shaping and Policy Search in Reinforcement Learning, from the University of California, Berkeley. For about the next twelve years I taught at the Stanford University Department of Computer Science and the Department

’re only just discovering the power of techniques such as deep learning and neural networks in their many forms, as well as other techniques like reinforcement learning and transfer learning. These techniques all still have enormous headroom; we’re only just scratching the surface of where they can take us. Deep learning

that learning in totally new environments or on a previously unencountered problem, over there. There are definitely some exciting new techniques coming up, whether in reinforcement learning or even simulated learning—the kinds of things that AlphaZero has begun to do—where you self-learn and self-create structures, as well start

but I think ultimately inadequate idea that we are seeing in the field right now. What we see at the moment is people doing deep reinforcement learning over pixels of, for example, the Atari game Breakout, and while you get results that look impressive, they’re incredibly fragile. DeepMind trained an AI

think that some sort of built-in template or structure should be built into an AI system so it can create causal models? DeepMind uses reinforcement learning, which is based on practice or trial and error. Perhaps that would be a way of discovering causal relationships? JUDEA PEARL: It comes into it

, but reinforcement learning has limitations, too. You can only learn actions that have been seen before. You cannot extrapolate to actions that you haven’t seen, like raising

-hanging fruits. MARTIN FORD: Looking to the future, do you think that neural networks are going to be very important? JUDEA PEARL: Neural networks and reinforcement learning will all be essential components when properly utilized in causal modeling. MARTIN FORD: So, you think it might be a hybrid system that incorporates not

of that is also focused on that problem is DeepMind, but I’m struck by how different your approach is. DeepMind is focused on deep reinforcement learning through games and simulated environments, whereas what I hear from you is that the path to intelligence is through language. DAVID FERRUCCI: Let’s restate

the end of their arm, why can’t a robot? There’s something dramatic missing. MARTIN FORD: I have seen reports that deep learning and reinforcement learning is being used to have robots learn to do things by practicing or even just by watching YouTube videos. What’s your view on this

came from people who were trying to understand how human intelligence works. That includes the basic mathematics of what we now call deep learning and reinforcement learning, but also much further back to Boole as one of the inventors of mathematical logic, or Laplace in his work on probability theory. In more

at achieving more general intelligence by modeling an evolutionary approach? JOSH TENENBAUM: Well, a number of people at DeepMind and others who follow the deep reinforcement learning ethos would say they’re thinking about evolution in a more general sense, and that’s also a part of learning. They’d say their

, machine learning and other AI-related fields, and their papers have received awards at venues across the AI landscape, including leading conferences in computer vision, reinforcement learning and decision-making, robotics, uncertainty in AI, learning and development, cognitive modeling and neural information processing. They have introduced several widely used AI tools and

contribution to AI safety, at least as valuable as worrying about the alignment problem, which ultimately is just a technical problem having to do with reinforcement learning and objective functions. So, I wouldn’t say that we’re underinvesting in being prepared for AI safety, and certainly some of the work that

The Deep Learning Revolution (The MIT Press)

by Terrence J. Sejnowski  · 27 Sep 2018

of the brain, which receive projections from the entire cerebral cortex and project back to it, solve this problem with a temporal difference algorithm and reinforcement learning. AlphaGo used the same learning algorithm that the basal ganglia evolved to evaluate sequences of

. Professional expertise is also based on learning in narrow domains. We are all professionals in the domain of language and practice it every day. The reinforcement learning algorithm used by AlphaGo can be applied to many problems. This form of learning depends only on the reward given to the winner at the

Richard Sutton,3 who had been working closely with Andrew Barto, his doctoral advisor, at the University of Massachusetts at Amherst, on difficult problems in reinforcement learning, a branch of machine learning inspired by associative learning in animal experiments (figure 10.2). Unlike a deep learning network, whose only job is to

is shown. Two dice are rolled, and the two numbers indicate how far two pieces can be moved ahead. Figure 10.2 Reinforcement learning scenario. The agent actively explores the environment by taking actions and making observations. If an action is successful, the agent receives a reward. The goal

is to learn actions that maximize future rewards. Reinforcement learning is based on the observation that animals solve difficult problems in uncertain conditions by exploring the various options in the environment and learning from their

Edmonton in 2006. He taught us how to learn the route to future rewards. Rich is a cancer survivor who has remained a leader in reinforcement learning and continues to develop innovative algorithms. He is generous with his time and insights, which everyone in the field greatly values. His book with Andrew

Barto, Reinforcement Learning: An Introduction, is a classic in the field. The second edition is freely available on the Internet. Courtesy of Richard Sutton. The update

expect that TD-Gammon would first learn the endgame, then the middle game, and finally the openings. This is in fact what happens in “tabular reinforcement learning,” or “tabular RL,” where there is a table of values for every state in the state space. But it’s completely different with neural networks
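
The contrast drawn here is easy to make concrete. A minimal sketch, with a dictionary-backed table for the tabular case and a linear model standing in for TD-Gammon’s neural network; the feature vector is hypothetical:

```python
import numpy as np

# Tabular RL: one independent value entry per state. Learning the value of one
# state says nothing about any other state, so knowledge spreads state by state.
V = {}  # state -> estimated value

def td_update_tabular(s, r, s_next, alpha=0.1, gamma=1.0):
    V.setdefault(s, 0.0)
    V.setdefault(s_next, 0.0)
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

# Function approximation: a single weight update shifts the value of every
# position sharing features with the one just seen, so learning generalizes
# across the state space instead of proceeding endgame-first.
w = np.zeros(8)  # weights over hand-built board features (hypothetical)

def td_update_linear(phi_s, r, phi_s_next, alpha=0.01, gamma=1.0):
    delta = r + gamma * w.dot(phi_s_next) - w.dot(phi_s)
    w[:] += alpha * delta * phi_s
```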

constant reward even though the average is the same.12 Dopamine neurons are also found in flies and have been shown to comprise several parallel reinforcement learning pathways for both short-term and long-term associative memories.13 Motivation and the Basal Ganglia Dopamine neurons constitute a core system that controls motivation

glider with a six-foot wingspan and taught it to soar and stay aloft.19 Learning How to Sing Another example of the power of reinforcement learning is the parallel between how birds learn to sing and how children learn to speak. In both cases, an initial period of auditory learning is

the motor learning phase in both humans and songbirds are in the basal ganglia, where we know that reinforcement learning takes place. In 1995, Kenji Doya, a postdoctoral fellow in my lab, developed a reinforcement learning model for the motor refinement of birdsong (figure 10.7). The model improved its performance by tweaking synapses
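
Doya’s model is considerably more detailed than this, but the simplest possible reading of “improved its performance by tweaking synapses” is weight perturbation: try a random change, keep it only if the song scored better. A sketch under that assumption; `evaluate` is a hypothetical song-quality score, not part of the published model:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_and_keep(weights, evaluate, sigma=0.01):
    # Propose a small random tweak to the synaptic weights.
    baseline = evaluate(weights)
    trial = weights + sigma * rng.standard_normal(weights.shape)
    # Keep the tweak only if performance (the reward signal) improved.
    return trial if evaluate(trial) > baseline else weights
```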

in babies. There are many domain-specific learning and memory systems in brains that must work together toward the acquisition of new skills, and the reinforcement learning algorithm for learning birdsongs in songbirds and the temporal difference learning algorithm in the reward system for monkeys, humans, bees, and other animals are only two

seeing and hearing, there are many other aspects of human intelligence where advances are needed in artificial intelligence. Representation learning in the cortex together with reinforcement learning in the basal ganglia powerfully complement each other. Can AI learning to play championship Go translate to solving other complex problems? Much of human learning

other parts of the oculomotor system. Learning also involves the basal ganglia, an important part of the vertebrate brain that learns sequences of actions through reinforcement learning.21 The difference between the expected and received reward is signaled by a transient increase in the firing rate of dopamine neurons in the midbrain
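
That “difference between the expected and received reward” has a standard formalization as the temporal-difference error; the usual form is sketched below, not quoted from the book:

```latex
\delta_t = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

A positive \delta_t (better than expected) corresponds to the transient increase in dopamine firing described here.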

part of the world, only the part needed at any moment to carry out the task at hand.4 This also makes it easier for reinforcement learning to narrow down the number of possible sensory inputs that contribute to obtaining rewards. The apparent modularity of vision (its relative separateness from other sensory

to explain cognition. But with the symbolic approach, artificial intelligence never achieved cognitive levels of performance. B. F. Skinner was on the right track with reinforcement learning, which Chomsky derided: today’s most compelling AI applications are based on learning, not logic. Courtesy of the New York Review of Books.

the problem of automatically parsing sentences, something that Chomsky’s “abstract theories” of syntax never accomplished despite strenuous efforts by computational linguists. When coupled with reinforcement learning, whose study in animals Skinner pioneered, complex problems can be solved that depend on making a sequence of choices to achieve a goal. This is

, when coupled with deep learning of the environment and a deeply learned value function honed by a lifetime of experience, a weak learning system like reinforcement learning can indeed give rise to cognitive behaviors, including language. This was not at all obvious to me in the 1980s, although I should have realized

new feature from brain architecture has boosted the functionality of deep learning networks: the hierarchy of cortical areas; the brain’s coupling of deep with reinforcement learning; working memory in recurrent cortical networks; and long-term memory of facts and events—to name just a few. There are many more computational principles

applications.” The conference was supposed to be a celebration of the progress we had made, so his rebuke stung. My talk about recent progress with reinforcement learning and the remarkable results achieved by TD-Gammon in teaching networks to play champion-level backgammon had not impressed him. He dismissed this as a

network that can be written to and read back with the same flexibility as a digital computer memory, the researchers demonstrated a network trained with reinforcement learning that could answer questions that required reasoning. For example, one such network reasoned about paths in the London Underground and another answered questions about genealogical

wide range of environments. More complex forms of intelligence are found in multicellular animals. We have seen that the temporal difference learning algorithm that underlies reinforcement learning can lead to highly complex behaviors, made still more complex in humans by deep learning in the cerebral cortex. There is a spectrum of intelligent

being optimized. In the brain, there are some innate costs that regulate behavior, such as the need for food, warmth, safety, oxygen, and procreation. In reinforcement learning, actions are taken to optimize future rewards. But beyond rewards that ensure survival, a wide range of rewards can be optimized, as is apparent from

1989; my book The Computational Brain in 1992; and many other foundational books on machine learning, including Richard Sutton and Andrew Barto’s Reinforcement Learning: An Introduction, and the leading textbook Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. The Press’s Robert Prior helped guide the present

(MIT Press, 2014) by Kevin P. Murphy is a compendium that covers the broader range of machine learning algorithms. Deep reinforcement learning is at the forefront of research, and the definitive textbook is Reinforcement Learning: An Introduction (MIT Press, 1998) by Richard S. Sutton and Andrew G. Barto (online draft of forthcoming second edition

based on examples. Learning algorithms are said to be “supervised” when both inputs and desired outputs are given or “unsupervised” when only inputs are given. Reinforcement learning is a special case of a supervised learning algorithm when the only feedback is a reward for good performance. logic Mathematical inference based on assumptions

Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis, Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, arXiv:1712.01815 (2017). 37. Howard Gardner, Frames of Mind: The Theory of Multiple Intelligences, 3rd ed. (New York: Basic Books, 2011). 38. J

(1970): 329–337. 15. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, et al., “Human-Level Control through Deep Reinforcement Learning,” Nature 518, no. 7540 (2015): 529–533. 16. Simon Haykin, Cognitive Dynamic System: Perception-Action Cycle, Radar, and Radio (New York: Cambridge University Press, 2012

113, no. 33 (2016): E4877–E4884. 19. G. Reddy, J. W. Ng, A. Celani, T. J. Sejnowski, and M. Vergassola, “Soaring Like a Bird via Reinforcement Learning in the Field,” submitted for publication. 20. Kenji Doya and Terrence J. Sejnowski, “A Novel Reinforcement Model of Birdsong Vocalization Learning,” in Gerald Tesauro, David

of objects, 37 of scenes, 78 Rectified linear units (ReLUs), 131–132 Recurrent neural network (RNN), 136, 159 Reddy, Gautam, 156f Regularization techniques, 119–121 Reinforcement learning scenario, 144, 145f Rekimoto, Jun, 7 Representation learning, 111b Retina, 64f, 65, 300n13 David Marr and, 53 Dynamic Vision Sensor (DVS) and, 211, 212f frog

, 301n22 Synaptic plasticity, 67–70, 158–159, 241 Hebbian, 79, 95b, 101–102, 133, 213 Systems Biology, Institute for, 230 Szalay, Alex, 164 Tabular reinforcement learning (tabular RL), 148 Tallal, Paula, 184, 190, 308n25 Tank, David W., 94, 96, 297n10 Taste aversion learning, 150 Tchernichovski, Ofer, 157f TD-Gammon, 34, 146

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006

by Ben Goertzel and Pei Wang  · 1 Jan 2007  · 303pp  · 67,891 words

Hunting: A Thought-Experiment in Embodied Social Learning, Cognitive Process Integration, and the Dynamic Emergence of the Self 217 Ben Goertzel Probabilistic Logic Based Reinforcement Learning of Simple Embodied Behaviors in a 3D Simulation World Ari Heljakka, Ben Goertzel, Welter Silva, Cassio Pennachin, Andre’ Senna and Izabela Goertzel How Do We

al chapter discusses the learning of some very simple behaviors for a simulated humanoid agent in the AGISim 3D simulation world, via a pure “embodied reinforcement learning” methodology. In Piagetan terms, these are “infantile-level” tasks, but to achieve them within the Novamente architecture nevertheless requires a fairly subtle integration of various

or empirical analysis of particular problems or domains, we will study how focus management can be learned. One idea is to use state-action-reward reinforcement learning algorithms to automatically generate focus management schemes. The actions are the choice of attention fixation. The state will be both the state of the environment

be given here. Inference control in Novamente takes several forms: 1. Standard forward-chaining and backward-chaining inference heuristics (see e.g. [45]) 2. A reinforcement learning mechanism that allows inference rules to be chosen based on experience. Probabilities are tabulated regarding which inference rules have been useful in the past in

used to bias the choices of inference rules during forward or backward chaining inference. 3. Application of PLN inference to the probabilities used in the reinforcement learning mechanism, which enables generalization, abstraction and analogy to be used in guessing which inference rules are most useful in a given context. These different approaches to
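
Tabulating which inference rules have paid off and biasing future choices accordingly is, in effect, a bandit problem over rules. A minimal sketch, with Thompson sampling standing in for whatever weighting Novamente actually uses; the rule names and counts are invented for illustration:

```python
import random

# Hypothetical tallies of how often each inference rule has proved useful.
counts = {
    "forward_chaining": [8, 2],   # [times useful, times not]
    "backward_chaining": [5, 5],
    "analogy": [2, 8],
}

def choose_rule():
    # Sample a plausible success rate per rule from a Beta posterior and
    # pick the rule whose draw is highest (Thompson sampling).
    draws = {rule: random.betavariate(u + 1, f + 1) for rule, (u, f) in counts.items()}
    return max(draws, key=draws.get)

def record_outcome(rule, useful):
    counts[rule][0 if useful else 1] += 1
```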

, Architectures and Algorithms, B. Goertzel and P. Wang (Eds.), IOS Press, 2007. Probabilistic Logic Based Reinforcement Learning of Simple Embodied Behaviors in a 3D Simulation World. Ari Heljakka, Ben Goertzel, Welter Silva, Cassio Pennachin, Andre Senna and Izabela Goertzel, Novamente LLC

such as theorem-proving and linguistic semantics. In the Novamente AGI architecture, however, probabilistic logic is used for a wider variety of purposes, including simple reinforcement learning of infantile behaviors, which are primarily concerned with perception and action rather than abstract cognition. This paper reports some simple experiments designed to validate the

viability of this approach, via using the PLN probabilistic logic framework, implemented within the Novamente AGI architecture, to carry out reinforcement learning of simple embodied behaviors in a 3D simulation world (AGISim). The specific experiment focused upon involves teaching Novamente to play the game of “fetch” using

reinforcement learning based on repeated partial rewards. Novamente is an integrative AGI architecture involving considerably more than just PLN; however, in this “fetch” experiment, the only cognitive

environment). And, where learning human language is concerned, embodiment gives the opportunity for robust symbol grounding [3]. On the other hand, perhaps the biggest drawback of embodiment from the AGI developer’s perspective is pragmatic in nature. Building, maintaining and using robots

an AI controls in AGISim (sourceforge.net/projects/agisim; crystal.sourceforge.net) is designed to bear sufficient resemblance to a simple humanoid robot to enable the fairly straightforward porting of control routines learned in AGISim to a physical robot. Our

one narrow domain or range of cognitive functions. The NAIE integrates aspects of prior AI projects and approaches, including symbolic, neural-network, evolutionary programming and reinforcement learning. The existing codebase is being applied in bioinformatics, NLP and other domains. To save space, some of the discussion in this paper will assume a

to find it. Human babies less than 9 months of age will often look in location A, but older babies look in B. The NAIE learns through interactive experience to look in location B – it learns that objects exist even when unobserved

role of the dog. Figure 1. Dogs playing fetch, correctly (bottom) and incorrectly (middle). (Illustrations by Zebulon Goertzel). Figure 2. A screenshot of Novamente and its teacher playing fetch in the AGISim simulation world. Here Novamente is returning the ball to the

be made to learn similar behaviour in a richer environment and with a larger set of possible actions. 3. Learning Fetch Within the Novamente Architecture. Novamente is an integrative AGI architecture, in which the highest levels of intelligence are intended to be

the current incomplete version of the integrated Novamente cognition. From a Novamente point of view, it is interesting mostly as a “smoke test” for embodied reinforcement learning, to indicate that the basic mechanisms required for cognitive interaction with AGISim are integrated adequately and working correctly. As noted above, fetch can be learned

an external vision-processing module that supplies either voxel or polygon vision inputs of the sort described above. The object-recognition problem is restricted to the first two cases. We have not dealt with the voxel vision case so far, but have

via much simpler methods than anything in Novamente) but on what it teaches us about the use of probabilistic inference in the context of embodied reinforcement learning. 4. Pattern Mining in Novamente The “pattern mining” step mentioned above has not been discussed in previous publications on Novamente. We mention it briefly here

very simple, but may be made more sophisticated in future. The inputs of pattern mining are, in general: • Atoms denoting raw outputs of the “sensors” that Novamente possesses in the AGISim world • Atoms indicating actions Novamente has taken, e.g. in the

in full-fledged Novamente Node/Link notation,
SequentialAND
  SimultaneousAND
    EvaluationLink holding ball
    EvaluationLink near ListLink (me, teacher)
  Reward
The predicates required here are near() and holding(), as well as the primitive “sensation” of Reward. Given these predicates as primitives, the mining of

PLN backward chainer is to find some way to prove that if some actionable predicates become true, then Evaluation (Reward) becomes true. This inference is possible by assuming that trying out actions is always possible, i.e. the actions are considered to be

y – termination of x) For example, in many cases one may use w(x,y) = k/(diff+k), where k is an adjustable parameter. For more details on temporal links and the event calculus variant we use, see [18]. In order to

. Firstly, the ModusPonensRule is unsurprisingly a probabilistic version of modus ponens, i.e. from (Implication A B) and A, infer B. Modus Ponens can also be applied to PredictiveImplications, insofar as the system keeps track of the structure of the proof tree so as to

and schemata. The learned plan discussed here is a representative one that lends itself relatively well to discussion. Firstly, we define the specific predicates used as primitives for this learning experiment: • Reward – a built-in sensation corresponding to the Novamente agent getting

EvaluationLink goto ball etc. The two stages of the reward function are:
Stage 1:
PredImp
  holding ball
  Reward
Stage 2:
PredImp
  SeqAnd
    holding ball
    done goto teacher
    done drop ball
  Reward
6.2. Knowledge Gained via Pattern Mining. Next, in the course

of this simple task of fetch, PLN can do the learning job quite efficiently with support only from pattern mining, without needing support from more sophisticated pattern recognition tools like MOSES. The final inference trajectory follows. First of all, the inference target

] "try":PredicateNode <0,0> [6552272] ExecutionLink [7505792] "drop":GroundedSchemaNode <0,0> [6564640] "Ball":ConceptNode <0,0> [6559856] 268 A. Heljakka et al. / Probabilistic Logic Based Reinforcement Learning Graphically, the previous two link constructs would be denoted Figure 3. and Figure 4. Respectively. In the rest of this discussion we will often substitute

produced by applying SimpleANDRule to its three child EvaluationLinks. The EvaluationLink [104300720] was produced by applying ModusPonensRule to: Figure 5. which was mined from perception data, and to Figure 6. The SequentialANDLink [104307776] was produced by applying SimpleANDRule to its two child

EvaluationLinks. The EvaluationLink [72926800] was produced by applying RewritingRule to: Figure 7. and Figure 8. The EvaluationLink [72916304], as well as all other try statements, were considered axiomatic, and technically produced by applying CrispUnificationRule to: Figure 9

, and technically produced by applying CrispUnificationRule to: Figure 10. The EvaluationLink [72913264] was produced by applying RewritingRule to: Figure 11. and Figure 12. Returning to the first PredictiveImplicationLink’s children, EvaluationLink [72895584] was produced by applying RewritingRule to: Figure 13. and Figure

14. which were both axiomatic. QED! For illustration, we finally present here an example of a plan the agent formed during the partial reward stage

, 0> [104296656] [6565264] [104296656] was produced by applying ModusPonensRule to: Figure 15. On the other hand, Figure 16. This causes the system to come up with the following plan which works but contains obvious redundancy: ExecutionLink <0,0.00062> [6888032] "goto":GroundedSchemaNode

to deal with a wide variety of learning tasks corresponding to the full range of levels of cognitive development. Making all this “general infrastructure” work together to yield a simple behavior like fetch is a lot of work – the infrastructure doesn’t

Maia and others), and earlier versions of PLN (mainly Guilherme Lamacie, the late Jeff Pressing, and Pei Wang). References: [1] Guha, R. V. and Lenat

The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter

by Joseph Henrich  · 27 Oct 2015  · 631pp  · 177,227 words

Oceania.” Proceedings of the Royal Society B: Biological Sciences 277 (1693): 2559–2564. Klucharev, V., K. Hytonen, M. Rijpkema, A. Smidts, and G. Fernandez. 2009. “Reinforcement learning signal predicts social conformity.” Neuron 61 (1):140–151. Knauft, B. M. 1985. Good Company and Violence: Sorcery and Social Action in a Lowland New

Human Compatible: Artificial Intelligence and the Problem of Control

by Stuart Russell  · 7 Oct 2019  · 416pp  · 112,268 words

money that provide eventual reward rather than immediate reward. One reason we understand the brain’s reward system is that it resembles the method of reinforcement learning developed in AI, for which we have a very solid theory.4 From an evolutionary point of view, we can think of the brain’s

’t design decision procedures that work only for Go. Instead, they made improvements to two fairly general-purpose techniques—lookahead search to make decisions and reinforcement learning to learn how to evaluate positions—so that they were sufficiently effective to play Go at a superhuman level. Those improvements are applicable to many

enormous and the reward comes only at the end of the game, lookahead search won’t work. Instead, AI researchers have developed a method called reinforcement learning, or RL for short. RL algorithms learn from direct experience of reward signals in the environment, much as a baby learns to stand up from

states (or sometimes the value of actions). This estimator can be combined with relatively myopic lookahead search to generate highly competent behavior. The first successful reinforcement learning system was Arthur Samuel’s checkers program, which created a sensation when it was demonstrated on television in 1956. The program learned essentially from scratch
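That combination (a learned value estimator backed up through shallow lookahead) fits in a few lines. A sketch assuming a deterministic one-step model; `model` and `value` are illustrative names, not anything from the book:

```python
# Pick the action whose successor state the learned evaluator likes best.
def greedy_action(state, actions, model, value, gamma=0.99):
    def backed_up(action):
        next_state, reward = model(state, action)  # assumed deterministic model
        return reward + gamma * value(next_state)
    return max(actions, key=backed_up)
```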

the program defeated the world’s top professional Dota 2 team.64 Games such as Go and Dota 2 are a good testing ground for reinforcement learning methods because the reward function comes with the rules of the game. The real world is less convenient, however, and there have been dozens of

search, the vehicle has to find a trajectory that optimizes some combination of safety and progress. Some projects are trying more direct approaches based on reinforcement learning (mainly in simulation, of course) and supervised learning from recordings of hundreds of human drivers, but these approaches seem unlikely to reach the required level

Boston Dynamics for some of the more complex parts of their Atlas humanoid robot. Robot manipulation skills are advancing rapidly, thanks in part to deep reinforcement learning.20 The final push—putting all this together into something that begins to approximate the awesome physical skills of movie robots—is likely to come

’s important to understand that I’m not asking whether we can train a robot to stand up, which can be done simply by applying reinforcement learning with a reward for the robot’s head being farther away from the ground.46 Training a robot to stand up requires that the human
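Taken literally, the reward described here is a one-liner. A sketch; the function name and units are hypothetical:

```python
def standing_reward(head_height_m: float) -> float:
    # The robot is paid simply for keeping its head far from the ground;
    # nothing else is valued.
    return head_height_m
```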

money (or to coerce behavior, if the goal is political control or espionage). The extraction of money works as the perfect reward signal for a reinforcement learning algorithm, so we can expect AI systems to improve rapidly in their ability to identify and profit from misbehavior. Early in 2015, I suggested to

a computer security expert that automated blackmail systems, driven by reinforcement learning, might soon become feasible; he laughed and said it was already happening. The first blackmail bot to be widely publicized was Delilah, identified in July

, methods of control can be direct if a government is able to implement rewards and punishments based on behavior. Such a system treats people as reinforcement learning algorithms, training them to optimize the objective set by the state. The temptation for a government, particularly one with a top-down, engineering mind-set

with you for hours every day, controls your access to information, and provides much of your entertainment through games, TV, movies, and social interaction. The reinforcement learning algorithms that optimize social-media click-through have no capacity to reason about human behavior—in fact, they do not even know in any meaningful

more or less any objective to pursue—including maximizing the number of paperclips or the number of known digits of pi. This is just how reinforcement learning systems and other kinds of reward optimizers work: the algorithms are completely general and accept any reward signal. For engineers and computer scientists operating within

-crawlies and had built a little treadmill for cockroaches to see how their gait changed with speed. We thought it might be possible to use reinforcement learning to train a robotic or simulated insect to reproduce these complex behaviors. The problem we faced was that we didn’t know what reward signal

to use. What were the flies and cockroaches optimizing? Without that information, we couldn’t apply reinforcement learning to train the virtual insect, so we were stuck. One day, I was walking down the road that leads from our house in Berkeley to

and planting them less stiffly because of the unpredictable ground level. As I pondered these mundane observations, I realized we had got it backwards. While reinforcement learning generates behavior from rewards, we actually wanted the opposite: to learn the rewards given the behavior. We already had the behavior, as produced by the

flies and cockroaches; we wanted to know the specific reward signal being optimized by this behavior. In other words, we needed algorithms for inverse reinforcement learning, or IRL.4 (I did not know at the time that a similar problem had been studied under the perhaps less wieldy name of structural
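The inversion can be stated compactly; a sketch of the two problems in standard notation, not Russell’s own:

```latex
\text{RL:}\ \ \text{given } R,\ \text{find } \pi^{\ast} = \arg\max_{\pi} \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} R(s_t, a_t)\right]
\qquad
\text{IRL:}\ \ \text{given demonstrations of } \pi^{\ast},\ \text{find } R \text{ such that } \pi^{\ast} \text{ is optimal}
```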

-circuit normal behavior in favor of direct stimulation of their own reward system is called wireheading. Could something similar happen to machines that are running reinforcement learning algorithms, such as AlphaGo? Initially, one might think this is impossible, because the only way that AlphaGo can gain its +1 reward for winning is

board and nothing else—because there is nothing else in AlphaGo’s model of the world. This setup corresponds to the abstract mathematical model of reinforcement learning, in which the reward signal arrives from outside the universe. Nothing AlphaGo can do, as far as it knows, has any effect on the code

-signal maximizer will wirehead. The AI safety community has discussed wireheading as a possibility for several years.25 The concern is not just that a reinforcement learning system such as AlphaGo might learn to cheat instead of mastering its intended task. The real issue arises when humans are the source of the

reward signal. If we propose that an AI system can be trained to behave well through reinforcement learning, with humans giving feedback signals that define the direction of improvement, the inevitable result is that the AI system works out how to control the

of pointless self-delusion on the part of the AI system, and you’d be right. But it’s a logical consequence of the way reinforcement learning is defined. The process works fine when the reward signal comes from “outside the universe” and is generated by some process that can never be

we avoid this kind of self-delusion? The problem comes from confusing two distinct things: reward signals and actual rewards. In the standard approach to reinforcement learning, these are one and the same. That seems to be a mistake. Instead, they should be treated separately, just as they are in assistance games

that AlphaGo “has the purpose of winning” is an oversimplification. A better description would be that AlphaGo is the result of an imperfect training process—reinforcement learning with self-play—for which winning was the reward. The training process is imperfect in the sense that it cannot produce a perfect Go player

(in which present-day reality turns out to be an illusion produced by a computer simulation) and recent work on the self-delusion problem in reinforcement learning.14 These examples, and more, convince me that the AI community should pay careful attention to the thrusts and counterthrusts of philosophical and economic debates

remained fairly constant, however, so the remaining preferences must arise from cultural and family influences. Quite possibly, children are constantly running some form of inverse reinforcement learning to identify the preferences of parents and peers in order to explain their behavior; children then adopt these preferences as their own. Even as adults

methods. Taken together, these are three of the most important application areas for AI. Deep learning has also played an important role in applications of reinforcement learning—for example, in learning the evaluation function that AlphaGo uses to estimate the desirability of possible future positions, and in learning controllers for complex robotic

may not have this effect, unless it is wrapped within an A/B testing framework (as is common in online marketing settings). Bandit algorithms and reinforcement learning algorithms will have this effect if they operate with an explicit representation of user state or an implicit representation in terms of the history of

face many important moral issues for which we are largely unprepared. 4. The following paper was among the first to make a clear connection between reinforcement learning algorithms and neurophysiological recordings: Wolfram Schultz, Peter Dayan, and P. Read Montague, “A neural substrate of prediction and reward,” Science 275 (1997): 1593–99. 5

for bots,” Wired, January 25, 2019. 49. AlphaZero is described by David Silver et al., “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv:1712.01815 (2017). 50. Optimal paths in graphs are found using the A* algorithm and its many descendants: Peter Hart, Nils Nilsson, and

as absolute rewards; by fixing the value of material to be positive, however, the program generally tended to work towards winning. 61. The application of reinforcement learning to produce a world-class backgammon program: Gerald Tesauro, “Temporal difference learning and TD-Gammon,” Communications of the ACM 38 (1995): 58–68. 62. The

DQN system that learns to play a wide variety of video games using deep RL: Volodymyr Mnih et al., “Human-level control through deep reinforcement learning,” Nature 518 (2015): 529–33. 63. Bill Gates’s remarks on Dota 2 AI: Catherine Clifford, “Bill Gates says gamer bots from Elon Musk-backed

control using generalized advantage estimation,” arXiv:1506.02438 (2015). A video demonstration is available at youtube.com/watch?v=SHLuf2ZBQSw. 47. A description of a reinforcement learning system that learns to play a capture-the-flag video game: Max Jaderberg et al., “Human-level performance in first-person multiplayer games with population

-based deep reinforcement learning,” arXiv:1807.01281 (2018). 48. A view of AI progress over the next few years: Peter Stone et al., “Artificial intelligence and life in 2030

under rational expectations,” Journal of Political Economy 86 (1978): 1009–44. 6. The first algorithms for IRL: Andrew Ng and Stuart Russell, “Algorithms for inverse reinforcement learning,” in Proceedings of the 17th International Conference on Machine Learning, ed. Pat Langley (Morgan Kaufmann, 2000). 7. Better algorithms for inverse RL: Pieter Abbeel and

Andrew Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of the 21st International Conference on Machine Learning, ed. Russ Greiner and Dale Schuurmans (ACM Press, 2004). 8. Understanding inverse RL as Bayesian

updating: Deepak Ramachandran and Eyal Amir, “Bayesian inverse reinforcement learning,” in Proceedings of the 20th International Joint Conference on Artificial Intelligence, ed. Manuela Veloso (AAAI Press, 2007). 9. How to teach helicopters to fly and

of the brain in man,” American Journal of Psychiatry 120 (1963): 571–77. 25. A first mathematical treatment of wireheading, showing how it occurs in reinforcement learning agents: Mark Ring and Laurent Orseau, “Delusion, survival, and intelligent agents,” in Artificial General Intelligence: 4th International Conference, ed. Jürgen Schmidhuber, Kristinn Thórisson, and Moshe

Looks (Springer, 2011). One possible solution to the wireheading problem: Tom Everitt and Marcus Hutter, “Avoiding wireheading with value reinforcement learning,” arXiv:1605.03143 (2016). 26. How it might be possible for an intelligence explosion to occur safely: Benja Fallenstein and Nate Soares, “Vingean reflection: Reliable

–21. 8. A generalization of Harsanyi’s social aggregation theorem to the case of unequal prior beliefs: Andrew Critch, Nishant Desai, and Stuart Russell, “Negotiable reinforcement learning for Pareto optimal sequential decision-making,” in Advances in Neural Information Processing Systems 31, ed. Samy Bengio et al. (2018). 9. The sourcebook for ideal

values; see, for example, Pietro Carrera, Il gioco degli scacchi (Giovanni de Rossi, 1617). 2. A report describing Samuel’s heroic research on an early reinforcement learning algorithm for checkers: Arthur Samuel, “Some studies in machine learning using the game of checkers,” IBM Journal of Research and Development 3 (1959): 210–29

–34 exponential complexity of problems and, 38–39 halting problem and, 37–38 lookahead search, 47, 49–50, 260–61 propositional logic and, 268–70 reinforcement learning, 55–57, 105 subroutines within, 34 supervised learning, 58–59, 285–93 Alibaba, 250 AlphaGo, 6, 46–48, 49–50, 55, 91, 92, 206–7

and, 128–30 decisions affecting people, use of machines in, 126–28 robots built in humanoid form and, 124–26 intractable problems, 38–39 inverse reinforcement learning, 191–93 IQ, 48 Ishiguro, Hiroshi, 125 is-ought problem, 167 “it’s complicated” argument, 147–48 “it’s impossible” argument, 149–50 “it’s

, 86–87, 288–93 as evolutionary accelerator, 18–20 from experience, 285–93 explanation-based learning, 294–95 feature engineering and, 84–85 inverse reinforcement learning, 191–93 reinforcement learning, 17, 47, 55–57, 105, 190–91 supervised learning, 58–59, 285–93 from thinking, 293–95 LeCun, Yann, 47, 165 legal profession, 119

and Persons (Parfit), 225 Recombinant DNA Advisory Committee, 155 recombinant DNA research, 155–56 recursive self-improvement, 208–10 redlining, 128 reflex agents, 57–59 reinforcement learning, 17, 47, 55–57, 105, 190–91 remembering self, and preferences, 238–40 Repugnant Conclusion, 225 reputation systems, 108–9 “research can’t be controlled

Possible Minds: Twenty-Five Ways of Looking at AI

by John Brockman  · 19 Feb 2019  · 339pp  · 94,769 words

Model Thinker: What You Need to Know to Make Data Work for You

by Scott E. Page  · 27 Nov 2018  · 543pp  · 153,550 words

Robot Rules: Regulating Artificial Intelligence

by Jacob Turner  · 29 Oct 2018  · 688pp  · 147,571 words

Rule of the Robots: How Artificial Intelligence Will Transform Everything

by Martin Ford  · 13 Sep 2021  · 288pp  · 86,995 words

The Road to Conscious Machines

by Michael Wooldridge  · 2 Nov 2018  · 346pp  · 97,890 words

The Brain That Changes Itself: Stories of Personal Triumph From the Frontiers of Brain Science

by Norman Doidge  · 15 Mar 2007  · 515pp  · 136,938 words

Artificial Intelligence: A Guide for Thinking Humans

by Melanie Mitchell  · 14 Oct 2019  · 350pp  · 98,077 words

Four Battlegrounds

by Paul Scharre  · 18 Jan 2023

The Price of Tomorrow: Why Deflation Is the Key to an Abundant Future

by Jeff Booth  · 14 Jan 2020  · 180pp  · 55,805 words

High-Frequency Trading

by David Easley, Marcos López de Prado and Maureen O'Hara  · 28 Sep 2013

Cognitive Gadgets: The Cultural Evolution of Thinking

by Cecilia Heyes  · 15 Apr 2018

The Unwritten Rules of Social Relationships: Decoding Social Mysteries Through the Unique Perspectives of Autism

by Temple Grandin and Sean Barron  · 30 Sep 2012  · 347pp  · 123,884 words

I, Warbot: The Dawn of Artificially Intelligent Conflict

by Kenneth Payne  · 16 Jun 2021  · 339pp  · 92,785 words

The Coming Wave: Technology, Power, and the Twenty-First Century's Greatest Dilemma

by Mustafa Suleyman  · 4 Sep 2023  · 444pp  · 117,770 words

Framers: Human Advantage in an Age of Technology and Turmoil

by Kenneth Cukier, Viktor Mayer-Schönberger and Francis de Véricourt  · 10 May 2021  · 291pp  · 80,068 words

The Singularity Is Nearer: When We Merge with AI

by Ray Kurzweil  · 25 Jun 2024

On the Edge: The Art of Risking Everything

by Nate Silver  · 12 Aug 2024  · 848pp  · 227,015 words

Psychopathy: An Introduction to Biological Findings and Their Implications

by Andrea L. Glenn and Adrian Raine  · 7 Mar 2014

The Linguist: A Personal Guide to Language Learning

by Steve Kaufmann  · 15 Jan 2003

Machine, Platform, Crowd: Harnessing Our Digital Future

by Andrew McAfee and Erik Brynjolfsson  · 26 Jun 2017  · 472pp  · 117,093 words

Evil Genes: Why Rome Fell, Hitler Rose, Enron Failed, and My Sister Stole My Mother's Boyfriend

by Barbara Oakley, PhD  · 20 Oct 2008

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

by Aurélien Géron  · 13 Mar 2017  · 1,331pp  · 163,200 words

The Age of AI: And Our Human Future

by Henry A Kissinger, Eric Schmidt and Daniel Huttenlocher  · 2 Nov 2021  · 194pp  · 57,434 words

The Creativity Code: How AI Is Learning to Write, Paint and Think

by Marcus Du Sautoy  · 7 Mar 2019  · 337pp  · 103,522 words

Escape From Model Land: How Mathematical Models Can Lead Us Astray and What We Can Do About It

by Erica Thompson  · 6 Dec 2022  · 250pp  · 79,360 words

Virtual Competition

by Ariel Ezrachi and Maurice E. Stucke  · 30 Nov 2016

Networks, Crowds, and Markets: Reasoning About a Highly Connected World

by David Easley and Jon Kleinberg  · 15 Nov 2010  · 1,535pp  · 337,071 words

What Algorithms Want: Imagination in the Age of Computing

by Ed Finn  · 10 Mar 2017  · 285pp  · 86,853 words

Succeeding With AI: How to Make AI Work for Your Business

by Veljko Krunic  · 29 Mar 2020

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

by Pedro Domingos  · 21 Sep 2015  · 396pp  · 117,149 words

Radical Technologies: The Design of Everyday Life

by Adam Greenfield  · 29 May 2017  · 410pp  · 119,823 words

Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

by Aurelien Geron  · 14 Aug 2019

The Smartphone Society

by Nicole Aschoff

Like, Comment, Subscribe: Inside YouTube's Chaotic Rise to World Domination

by Mark Bergen  · 5 Sep 2022  · 642pp  · 141,888 words

Demystifying Smart Cities

by Anders Lisdorf

Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity

by Daron Acemoglu and Simon Johnson  · 15 May 2023  · 619pp  · 177,548 words

Superintelligence: Paths, Dangers, Strategies

by Nick Bostrom  · 3 Jun 2014  · 574pp  · 164,509 words

Prediction Machines: The Simple Economics of Artificial Intelligence

by Ajay Agrawal, Joshua Gans and Avi Goldfarb  · 16 Apr 2018  · 345pp  · 75,660 words

AI Superpowers: China, Silicon Valley, and the New World Order

by Kai-Fu Lee  · 14 Sep 2018  · 307pp  · 88,180 words

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again

by Eric Topol  · 1 Jan 2019  · 424pp  · 114,905 words

Rationality: From AI to Zombies

by Eliezer Yudkowsky  · 11 Mar 2015  · 1,737pp  · 491,616 words

Darwin's Dangerous Idea: Evolution and the Meanings of Life

by Daniel C. Dennett  · 15 Jan 1995  · 846pp  · 232,630 words

The New Harvest: Agricultural Innovation in Africa

by Calestous Juma  · 27 May 2017

Artificial Whiteness

by Yarden Katz

Genius Makers: The Mavericks Who Brought A. I. To Google, Facebook, and the World

by Cade Metz  · 15 Mar 2021  · 414pp  · 109,622 words

Narrative Economics: How Stories Go Viral and Drive Major Economic Events

by Robert J. Shiller  · 14 Oct 2019  · 611pp  · 130,419 words

Seeking SRE: Conversations About Running Production Systems at Scale

by David N. Blank-Edelman  · 16 Sep 2018

A Hacker's Mind: How the Powerful Bend Society's Rules, and How to Bend Them Back

by Bruce Schneier  · 7 Feb 2023  · 306pp  · 82,909 words

Outnumbered: From Facebook and Google to Fake News and Filter-Bubbles – the Algorithms That Control Our Lives

by David Sumpter  · 18 Jun 2018  · 276pp  · 81,153 words

Cogs and Monsters: What Economics Is, and What It Should Be

by Diane Coyle  · 11 Oct 2021  · 305pp  · 75,697 words

The Transhumanist Reader

by Max More and Natasha Vita-More  · 4 Mar 2013  · 798pp  · 240,182 words

Mindware: Tools for Smart Thinking

by Richard E. Nisbett  · 17 Aug 2015  · 397pp  · 109,631 words

Army of None: Autonomous Weapons and the Future of War

by Paul Scharre  · 23 Apr 2018  · 590pp  · 152,595 words

Behave: The Biology of Humans at Our Best and Worst

by Robert M. Sapolsky  · 1 May 2017  · 1,261pp  · 294,715 words

The Autonomous Revolution: Reclaiming the Future We’ve Sold to Machines

by William Davidow and Michael Malone  · 18 Feb 2020  · 304pp  · 80,143 words

The Loop: How Technology Is Creating a World Without Choices and How to Fight Back

by Jacob Ward  · 25 Jan 2022  · 292pp  · 94,660 words

Empire of AI: Dreams and Nightmares in Sam Altman's OpenAI

by Karen Hao  · 19 May 2025  · 660pp  · 179,531 words

Handbook of Modeling High-Frequency Data in Finance

by Frederi G. Viens, Maria C. Mariani and Ionut Florescu  · 20 Dec 2011  · 443pp  · 51,804 words

WTF?: What's the Future and Why It's Up to Us

by Tim O'Reilly  · 9 Oct 2017  · 561pp  · 157,589 words

The Optimist: Sam Altman, OpenAI, and the Race to Invent the Future

by Keach Hagey  · 19 May 2025  · 439pp  · 125,379 words

Making Sense of Chaos: A Better Economics for a Better World

by J. Doyne Farmer  · 24 Apr 2024  · 406pp  · 114,438 words

Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data

by Dipanjan Sarkar  · 1 Dec 2016

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence

by John Brockman  · 5 Oct 2015  · 481pp  · 125,946 words

The Organized Mind: Thinking Straight in the Age of Information Overload

by Daniel J. Levitin  · 18 Aug 2014  · 685pp  · 203,949 words

Super Thinking: The Big Book of Mental Models

by Gabriel Weinberg and Lauren McCann  · 17 Jun 2019

These Strange New Minds: How AI Learned to Talk and What It Means

by Christopher Summerfield  · 11 Mar 2025  · 412pp  · 122,298 words

The Precipice: Existential Risk and the Future of Humanity

by Toby Ord  · 24 Mar 2020  · 513pp  · 152,381 words

Human + Machine: Reimagining Work in the Age of AI

by Paul R. Daugherty and H. James Wilson  · 15 Jan 2018  · 523pp  · 61,179 words

21 Lessons for the 21st Century

by Yuval Noah Harari  · 29 Aug 2018  · 389pp  · 119,487 words

Range: Why Generalists Triumph in a Specialized World

by David Epstein  · 1 Mar 2019  · 406pp  · 109,794 words

Global Catastrophic Risks

by Nick Bostrom and Milan M. Cirkovic  · 2 Jul 2008

A World Without Work: Technology, Automation, and How We Should Respond

by Daniel Susskind  · 14 Jan 2020  · 419pp  · 109,241 words

Know Thyself

by Stephen M Fleming  · 27 Apr 2021

Whiplash: How to Survive Our Faster Future

by Joi Ito and Jeff Howe  · 6 Dec 2016  · 254pp  · 76,064 words

If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All

by Eliezer Yudkowsky and Nate Soares  · 15 Sep 2025  · 215pp  · 64,699 words

Supremacy: AI, ChatGPT, and the Race That Will Change the World

by Parmy Olson  · 284pp  · 96,087 words

The Long History of the Future: Why Tomorrow's Technology Still Isn't Here

by Nicole Kobie  · 3 Jul 2024  · 348pp  · 119,358 words

The Rationalist's Guide to the Galaxy: Superintelligent AI and the Geeks Who Are Trying to Save Humanity's Future

by Tom Chivers  · 12 Jun 2019  · 289pp  · 92,714 words

Data Mining: Concepts, Models, Methods, and Algorithms

by Mehmed Kantardzić  · 2 Jan 2003  · 721pp  · 197,134 words

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity

by Amy Webb  · 5 Mar 2019  · 340pp  · 97,723 words

Science in the Soul: Selected Writings of a Passionate Rationalist

by Richard Dawkins  · 15 Mar 2017  · 420pp  · 130,714 words

Luxury Fever: Why Money Fails to Satisfy in an Era of Excess

by Robert H. Frank  · 15 Jan 1999  · 416pp  · 112,159 words

Scary Smart: The Future of Artificial Intelligence and How You Can Save Our World

by Mo Gawdat  · 29 Sep 2021  · 259pp  · 84,261 words

Co-Intelligence: Living and Working With AI

by Ethan Mollick  · 2 Apr 2024  · 189pp  · 58,076 words

Nexus: A Brief History of Information Networks From the Stone Age to AI

by Yuval Noah Harari  · 9 Sep 2024  · 566pp  · 169,013 words

Futureproof: 9 Rules for Humans in the Age of Automation

by Kevin Roose  · 9 Mar 2021  · 208pp  · 57,602 words

Top Dog: The Science of Winning and Losing

by Po Bronson and Ashley Merryman  · 19 Feb 2013  · 407pp  · 109,653 words

Misbehaving: The Making of Behavioral Economics

by Richard H. Thaler  · 10 May 2015  · 500pp  · 145,005 words

Autonomous Driving: How the Driverless Revolution Will Change the World

by Andreas Herrmann, Walter Brenner and Rupert Stadler  · 25 Mar 2018

The Revolution That Wasn't: GameStop, Reddit, and the Fleecing of Small Investors

by Spencer Jakab  · 1 Feb 2022  · 420pp  · 94,064 words

Amateurs!: How We Built Internet Culture and Why It Matters

by Joanna Walsh  · 22 Sep 2025  · 255pp  · 80,203 words

The Economic Singularity: Artificial Intelligence and the Death of Capitalism

by Calum Chace  · 17 Jul 2016  · 477pp  · 75,408 words

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future

by Kevin Kelly  · 6 Jun 2016  · 371pp  · 108,317 words

Split-Second Persuasion: The Ancient Art and New Science of Changing Minds

by Kevin Dutton  · 3 Feb 2011  · 338pp  · 100,477 words

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future

by Luke Dormehl  · 10 Aug 2016  · 252pp  · 74,167 words

Artificial Unintelligence: How Computers Misunderstand the World

by Meredith Broussard  · 19 Apr 2018  · 245pp  · 83,272 words

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy

by Federico Pistono  · 14 Oct 2012  · 245pp  · 64,288 words

Why Machines Learn: The Elegant Math Behind Modern AI

by Anil Ananthaswamy  · 15 Jul 2024  · 416pp  · 118,522 words

Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy

by George Gilder  · 16 Jul 2018  · 332pp  · 93,672 words

The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do

by Erik J. Larson  · 5 Apr 2021

Our Final Invention: Artificial Intelligence and the End of the Human Era

by James Barrat  · 30 Sep 2013  · 294pp  · 81,292 words

Future Politics: Living Together in a World Transformed by Tech

by Jamie Susskind  · 3 Sep 2018  · 533pp

Applied Artificial Intelligence: A Handbook for Business Leaders

by Mariya Yao, Adelyn Zhou and Marlene Jia  · 1 Jun 2018  · 161pp  · 39,526 words

Sunfall

by Jim Al-Khalili  · 17 Apr 2019  · 381pp  · 120,361 words

The Geeks Shall Inherit the Earth: Popularity, Quirk Theory, and Why Outsiders Thrive After High School

by Alexandra Robbins  · 31 Mar 2009  · 509pp  · 147,998 words

Homo Deus: A Brief History of Tomorrow

by Yuval Noah Harari  · 1 Mar 2015  · 479pp  · 144,453 words

The Ages of Globalization

by Jeffrey D. Sachs  · 2 Jun 2020

The Science and Technology of Growing Young: An Insider's Guide to the Breakthroughs That Will Dramatically Extend Our Lifespan . . . And What You Can Do Right Now

by Sergey Young  · 23 Aug 2021  · 326pp  · 88,968 words

The Thinking Machine: Jensen Huang, Nvidia, and the World's Most Coveted Microchip

by Stephen Witt  · 8 Apr 2025  · 260pp  · 82,629 words

The AI Economy: Work, Wealth and Welfare in the Robot Age

by Roger Bootle  · 4 Sep 2019  · 374pp  · 111,284 words

Human Frontiers: The Future of Big Ideas in an Age of Small Thinking

by Michael Bhaskar  · 2 Nov 2021

The Power of Habit: Why We Do What We Do in Life and Business

by Charles Duhigg  · 1 Jan 2011  · 455pp  · 116,578 words

Learn Descriptive Cataloging Second North American Edition

by Mary Mortimer  · 1 Jan 1999  · 282pp  · 28,394 words

The Extended Phenotype: The Long Reach of the Gene

by Richard Dawkins  · 1 Jan 1982  · 506pp  · 152,049 words

Learn Algorithmic Trading

by Sebastien Donadio  · 7 Nov 2019

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think

by James Vlahos  · 1 Mar 2019  · 392pp  · 108,745 words

The Future Is Faster Than You Think: How Converging Technologies Are Transforming Business, Industries, and Our Lives

by Peter H. Diamandis and Steven Kotler  · 28 Jan 2020  · 501pp  · 114,888 words

The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling

by Adam Kucharski  · 23 Feb 2016  · 360pp  · 85,321 words

Hooked: Food, Free Will, and How the Food Giants Exploit Our Addictions

by Michael Moss  · 2 Mar 2021  · 300pp  · 94,628 words

The Future of the Brain: Essays by the World's Leading Neuroscientists

by Gary Marcus and Jeremy Freeman  · 1 Nov 2014  · 336pp  · 93,672 words

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass

by Mary L. Gray and Siddharth Suri  · 6 May 2019  · 346pp  · 97,330 words

Learning Scikit-Learn: Machine Learning in Python

by Raúl Garreta and Guillermo Moncecchi  · 14 Sep 2013  · 122pp  · 29,286 words

Mastering Machine Learning With Scikit-Learn

by Gavin Hackeling  · 31 Oct 2014

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps

by Valliappa Lakshmanan, Sara Robinson and Michael Munn  · 31 Oct 2020

Algorithms to Live By: The Computer Science of Human Decisions

by Brian Christian and Tom Griffiths  · 4 Apr 2016  · 523pp  · 143,139 words

Bandit Algorithms for Website Optimization

by John Myles White  · 10 Dec 2012  · 94pp  · 22,435 words

Being You: A New Science of Consciousness

by Anil Seth  · 29 Aug 2021  · 418pp  · 102,597 words

This Is for Everyone: The Captivating Memoir From the Inventor of the World Wide Web

by Tim Berners-Lee  · 8 Sep 2025  · 347pp  · 100,038 words

Industry 4.0: The Industrial Internet of Things

by Alasdair Gilchrist  · 27 Jun 2016

Surviving AI: The Promise and Peril of Artificial Intelligence

by Calum Chace  · 28 Jul 2015  · 144pp  · 43,356 words

How to Predict the Unpredictable

by William Poundstone  · 267pp  · 71,941 words

The Simulation Hypothesis

by Rizwan Virk  · 31 Mar 2019  · 315pp  · 89,861 words

The Complete Book of Home Organization: 336 Tips and Projects

by Abowlfulloflemons.com and Toni Hammersley  · 5 Jan 2016  · 278pp  · 42,509 words