reinforcement learning


description: field of machine learning

143 results

The Means of Prediction: How AI Really Works (And Who Benefits)

by Maximilian Kasy  · 15 Jan 2025  · 209pp  · 63,332 words

with different options and doing what seems best based on what has been learned from actions they have previously tried. Reinforcement learning goes one step further than multi-armed bandit algorithms. Reinforcement learning builds algorithms that learn to plan by learning how likely it is that different states of the world are favorable down

as crucial in many real-world tasks as it is in playing games. The problem of planning is taken into account in the framework of reinforcement learning. Reinforcement Learning Certain board games have long stood as symbols of intellectual challenge: chess in Europe, go in East Asia, and backgammon in the Middle East. In

algorithm of AlphaGo was much simpler. Like TD-Gammon, it used an approach called (deep) reinforcement learning. Both TD-Gammon and AlphaGo learned to play by playing a vast number of games against themselves. The term reinforcement learning comes from behaviorist ideas of how animals learn and how they can be trained. According to

, to teach a computer to play backgammon by means of selective rewards than to try the same with your dog or cat. But how does reinforcement learning work with a computer? Your computer has no innate desire for treats, after all. Let us start by again considering the multi-armed bandits. How
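
The bandit setting makes the substitute for treats concrete: a numerical reward stands in for the treat, and an action-value estimate stands in for the learned association. A minimal epsilon-greedy sketch in Python, with hypothetical payout probabilities (illustration only, not code from the book):

import random

# Hypothetical 3-armed bandit: each arm pays 1 with a fixed, unknown probability.
TRUE_PAYOUT = [0.3, 0.5, 0.7]

def pull(arm):
    return 1.0 if random.random() < TRUE_PAYOUT[arm] else 0.0

counts = [0, 0, 0]        # times each arm has been tried
values = [0.0, 0.0, 0.0]  # running average reward per arm
epsilon = 0.1             # fraction of pulls spent exploring at random

for t in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(3)                     # explore
    else:
        arm = max(range(3), key=lambda a: values[a])  # exploit the best estimate
    reward = pull(arm)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update

print(values)  # estimates drift toward [0.3, 0.5, 0.7], and arm 2 is pulled most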

and AlphaGo used. They trained neural networks to predict the probability of winning, in the recursive manner described above. This all sounds pretty good: Deep reinforcement learning can teach itself by exploring the world and learning to plan for the future. Why not use this approach to address all kinds of real
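
The “recursive manner” referred to here has a compact standard form. In temporal-difference notation (mine, not the book’s), the network’s predicted win probability V for the current position is nudged toward its prediction for the next position, with finished games supplying the ground truth that propagates backward through self-play:

\[
V(s_t) \leftarrow V(s_t) + \alpha \big[ V(s_{t+1}) - V(s_t) \big],
\qquad
V(s_{\text{final}}) =
\begin{cases}
1 & \text{win} \\
0 & \text{loss}
\end{cases}
\]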

need to draw conclusions about the state of the world, and you need to make predictions about its future state. Any real-world application of reinforcement learning must overcome this issue of partial observability, which implies that it needs to remember the past to predict the future. The second reason why deep

reinforcement learning is not easily applied to real-world problems is that it is a very data-hungry approach. As mentioned above, the way AlphaGo learned was

might want to do exactly that.) Actual self-driving cars, to the extent that they exist, thus need to rely on approaches other than pure reinforcement learning. There is a more general lesson here. Current deep-learning-based methods in AI need lots of data. There are some settings where data have

to solve. At one extreme, in terms of scalability, we have data generated via simulation, such as the simulated games that were used to train reinforcement learning algorithms like AlphaGo. In domains where data can be generated by simulation, there are no limits to the machine-learning-based approach, at least in

safe AI—or so some claim. One of the approaches that has been proposed for addressing the problem of value alignment is known as inverse reinforcement learning—in effect, sidestepping the problem of explicitly specifying a reward function for AI algorithms. Rather than maximizing an explicitly specified reward function, in inverse

reinforcement learning the algorithms are supposed to construct a reward function by inferring human objectives from observed human behavior. The algorithms are meant to learn their own
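
One standard way to make “constructing a reward function from observed behavior” precise is the apprenticeship-learning formulation of Abbeel and Ng (2004), which also surfaces elsewhere in these results. Assuming (notation introduced here, not the book’s) a reward linear in state features, R(s) = w·φ(s), the demonstrations fix the expert’s discounted feature expectations, and the algorithm seeks weights under which the expert’s behavior beats the alternatives:

\[
\mu(\pi) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t)\;\Big|\;\pi\Big],
\qquad
\text{find } w \text{ such that } w^{\top}\mu(\pi_E) \;\ge\; w^{\top}\mu(\pi) \text{ for all } \pi .
\]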

further, the algorithms are also supposed to learn the purpose of human actions relative to future rewards. This, then, is truly an inversion of the reinforcement learning problem. If a human waters a basil plant now, he might want to eat basil pesto in a month. If another human trains to increase

behavior for decades. In doing so, they have learned that estimating preferences is quite difficult and not always feasible, even when maintaining strong assumptions. Inverse reinforcement learning and related approaches may hold some promise, but there are fundamental limits to what these approaches can achieve. They cannot solve the multitasking problem, and

teaching to the test. This problem cannot be solved by observing educational policymakers to infer the preferences of these policymakers, which is what the inverse reinforcement learning approach would try to do. Teaching to the test occurs because standardized tests cannot measure certain important dimensions of student development. No amount of reward

behavior to infer his preferences and by deriving from these inferred preferences the correct objective for social media feed selection, which is, again, what inverse reinforcement learning would suggest. There are fundamental limits on the extent to which the value-alignment problem can be solved by reward engineering or by approaches such

as inverse reinforcement learning. Because of this, there are settings where important decisions should not be delegated to AI systems. These systems are bound to ignore unmeasured dimensions of

well-being, and the underlying agency problems prevent effective delegation. There is another, and arguably even more important, problem that needs to be solved. Inverse reinforcement learning is supposed to learn human preferences and act accordingly. But which human’s preferences? Whose values should the algorithm align with? Alignment with Whom? There

of the Web.” New Yorker, February 9, 2023. François-Lavet, V., P. Henderson, R. Islam, M. G. Bellemare, and J. Pineau. “An Introduction to Deep Reinforcement Learning.” Foundations and Trends in Machine Learning 11 no. 3–4 (2018): 219–354. Friedman, J., T. Hastie, and R. Tibshirani. The Elements of Statistical Learning

1988 Neyman Memorial Lecture: A Galtonian Perspective on Shrinkage Estimators.” Statistical Science 5, no. 1 (1990): 147–55. Sutton, R. S., and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018. Thompson, W. R. “On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples

, 45, 49–50, 63–65, 89–90; and generative AI, 53–56; how it works, 45–50; reinforcement learning and, 63; relative simplicity of, 49; self-supervised, 52–53; technical meaning of, 47 deep reinforcement learning, 63 democratic governance: and automated decision-making, 2, 7–8, 111–13, 135, 186–88; broad conception of

intelligence, concept of, 19, 21–22 intelligence explosion, 4, 22, 122 International Energy Agency, 93 interoperability requirements, 110 interpretability. See explainability interventions, 181–82 inverse reinforcement learning, 129–30 Israel, AI use in warfare by, 6, 31–32, 133 Jabirian corpus, 44, 51 Johnson, Simon, 150 jury of peers, 199 Kafka, Franz

noise, 141 randomized response method, 136–38, 137 rare earth minerals, 90 Rawls, John, 80 recidivism, 169 regularization, 40 regulation, of data collection, 145–46 reinforcement learning, 12, 60–65. See also inverse reinforcement learning; reward design representative democracy, 197–99 revealed preferences, 78, 129 reward design, 124–28. See also objectives of AI

; reinforcement learning Robinson, Joan, 154 robots, 6, 23–24, 122, 129, 131, 157, 160–61 robustness, 177–78, 184 Russell, Stuart, 4 safety. See AI safety sample

Fixed: Why Personal Finance is Broken and How to Make it Work for Everyone

by John Y. Campbell and Tarun Ramadorai  · 25 Jul 2025

rarity of hugely successful companies. Conversely, the loss-making companies that failed, which are the majority, mostly don’t make the news. The Limits of Reinforcement Learning We have argued that many people have faulty mental models and do faulty calculations when thinking about financial problems. But in many areas of human

, just as a toddler learns from a burnt finger not to touch a hot stove. Psychologists call this “reinforcement learning.” Why can’t people use this type of learning to avoid financial pain? Reinforcement learning is often effective, both for people and for computer programs such as the AlphaGo Zero program that, in 2017

, became the world champion in the game of Go by playing experimentally against itself for forty days.18 But reinforcement learning operates only as quickly as relevant experience accumulates—and this happens dangerously slowly in the financial context. An obvious example here is large and important

, but the results are not particularly surprising given the social nature of human beings. In finance, unfortunately, social learning inherits all the pitfalls of personal reinforcement learning from one’s own experience, with some extra problems thrown in.25 Delayed rewards and losses afflict social learning just as much as personal

reinforcement learning, and randomness does, too, because random shocks often affect large groups of people in the same way. A strategy of aggressively borrowing to buy a

poignant as the financial consequences of a poor decision can be far more serious than temporarily having to live in a less-than-perfect house. Reinforcement learning can all too often be driven by emotions rather than financial outcomes, causing people to focus on their prior experiences of what felt good rather

, 188–189 excess (deductible), 142, 144, 252, 296n39 exchange-traded funds (ETFs), 250 expectations of returns, 293n17 Expedia, 209 Experian, 301n4 experience: extrapolation from, 134; reinforcement learning and, 44 exponential growth, unintuitiveness of, 39, 271n11 extended warranties, 142–143, 295n35 extrapolation: danger of, 40, 134; by investors, 136 Facebook, influence of friends

, 27 intermittent financial decisions, 49–51 intuition, about exponential growth, 271n11 intuitive finance, 31, 38–47; danger of learning from others, 45–47; limits of reinforcement learning and, 43–45; uncertainty exposing weakness of human intuition, 40–43 intuitive reasoning, lack of accounting for historical information, 41–43 investment management, costs and

of fintech, 187–189; preventing from going too far, 218–220; of public health, 56 regulatory sandboxes, testing new DeFi products using, 195–196, 304n32 reinforcement learning, limits of, 43–45 reinsurance market, 296n46 rent, 58–59 rents, apartments, chonsei and, 64–65 rent seeking, 58–59 replacement rate, 159 representativeness, 43

Artificial Intelligence: A Modern Approach

by Stuart Russell and Peter Norvig  · 14 Jul 2019  · 2,466pp  · 668,761 words

and Transfer Learning · 22.8 Applications · Summary · Bibliographical and Historical Notes · 23 Reinforcement Learning · 23.1 Learning from Rewards · 23.2 Passive Reinforcement Learning · 23.3 Active Reinforcement Learning · 23.4 Generalization in Reinforcement Learning · 23.5 Policy Search · 23.6 Apprenticeship and Inverse Reinforcement Learning · 23.7 Applications of Reinforcement Learning · Summary · Bibliographical and Historical Notes · VI Communicating, perceiving, and acting · 24 Natural Language Processing · 24.1 Language Models · 24.

processes (MDPs) developed in the field of operations research. A flood of work followed connecting AI planning research to MDPs, and the field of reinforcement learning found applications in robotics and process control as well as acquiring deep theoretical foundations. One consequence of AI’s newfound appreciation for data, statistical modeling

, for example by detecting unusual patterns of behavior, but they will also contribute to the potency, survivability, and proliferation capability of malware. For example, reinforcement learning methods have been used to create highly effective tools for automated, personalized blackmail and phishing attacks. We will revisit these topics in more depth in

); economics (market-based algorithms (Dias et al., 2006)); physics (particle swarms (Li and Yao, 2012) and spin glasses (Mézard et al., 1987)); animal behavior (reinforcement learning, grey wolf optimizers (Mirjalili and Lewis, 2014)); ornithology (Cuckoo search (Yang and Deb, 2014)); entomology (ant colony (Dorigo et al., 2008), bee colony (Karaboga

of actions have probabilities associated with them): Markov decision processes, partially observable Markov decision processes, and game theory. In Chapter 23 we show that reinforcement learning allows an agent to learn how to behave from past successes and failures. Bibliographical and Historical Notes AI planning arose from investigations into state-space

models, to update its belief state, and to project forward possible action sequences. We shall return to MDPs and POMDPs in Chapter 23, which covers reinforcement learning methods that allow an agent to improve its behavior from experience. Bibliographical and Historical Notes Richard Bellman developed the ideas underlying the modern approach to

). The texts by Bertsekas (1987) and Puterman (1994) provide rigorous introductions to sequential decision problems and dynamic programming. Bertsekas and Tsitsiklis (1996) include coverage of reinforcement learning. Sutton and Barto (2018) cover similar ground but in a more accessible style. Sigaud and Buffet (2010), Mausam and Kolobov (2012) and Kochenderfer (2015)

replacing the simple components with more sophisticated machine learning models. Part of problem formulation is deciding whether you are dealing with supervised, unsupervised, or reinforcement learning. The distinctions are not always so crisp. In semisupervised learning we are given a few labeled examples and use them to mine more information from

gradient descent in parameter space to minimize the loss function. •Deep learning works well for visual object recognition, speech recognition, natural language processing, and reinforcement learning in complex environments. •Convolutional networks are particularly well suited for image processing and other tasks where the data have a grid topology. •Recurrent networks are

have already seen the concept of rewards in Chapter 16 for Markov decision processes (MDPs). Indeed, the goal is the same in reinforcement learning: maximize the expected sum of rewards. Reinforcement learning differs from “just solving an MDP” because the agent is not given the MDP as a problem to solve; the agent
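
Not being given the MDP means the agent must learn from sampled transitions rather than from a known transition model; tabular Q-learning is the textbook model-free example. A self-contained sketch on a toy five-state chain (the environment and all constants are invented for illustration):

import random
from collections import defaultdict

# Toy chain MDP: states 0..4, actions 0 (left) / 1 (right), reward 1 on reaching state 4.
def env_step(s, a):
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 4 else 0.0), (s2 == 4)  # next state, reward, done

Q = defaultdict(float)                # Q[(state, action)], implicitly zero-initialized
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    s, done = random.randrange(4), False   # random start state speeds up exploration
    while not done:
        if random.random() < epsilon:
            a = random.randrange(2)                    # explore
        else:
            a = max((0, 1), key=lambda x: Q[(s, x)])   # exploit current estimates
        s2, r, done = env_step(s, a)
        best_next = 0.0 if done else max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # sample-based update
        s = s2

Nothing above ever inspects the transition function; the update touches only observed (s, a, r, s′) tuples, which is exactly what not being handed the MDP forces.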

.6, we explore apprenticeship learning: training a learning agent using demonstrations rather than reward signals. Finally, Section 23.7 reports on applications of reinforcement learning. 23.2 Passive Reinforcement Learning We start with the simple case of a fully observable environment with a small number of actions and states, in which an agent already

transition model to perform its updates. The environment itself supplies the connection between neighboring states in the form of observed transitions. Figure 23.4 A passive reinforcement learning agent that learns utility estimates using temporal differences. The step-size function α(n) is chosen to ensure convergence. Figure 23.5 The TD learning curves
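
The tabular version of the agent in Figure 23.4 fits in a few lines. The convergence requirement on the step size is the usual stochastic-approximation condition that Σ α(n) diverges while Σ α(n)² converges; α(n) = 1/n, used below, satisfies it. A sketch with the fixed policy and the environment left abstract (my illustration, not the book’s pseudocode):

from collections import defaultdict

U = defaultdict(float)  # utility estimates for visited states
N = defaultdict(int)    # per-state visit counts, used to decay the step size

def td0_update(s, r, s2, gamma=0.9):
    # Passive TD(0): nudge U[s] toward the one-step sample r + gamma * U[s2],
    # using only the observed transition -- no transition model is consulted.
    N[s] += 1
    alpha = 1.0 / N[s]  # satisfies the convergence conditions above
    U[s] += alpha * (r + gamma * U[s2] - U[s])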

update rules, even though there are good solutions in the hypothesis space. There are more sophisticated algorithms that can avoid these problems, but at present reinforcement learning with general function approximators remains a delicate art. In addition to parameters diverging to infinity, there is a more surprising problem called catastrophic forgetting.

to understand this kind of situation as a two-person assistance game, as described in Section 17.2.5. 23.7 Applications of Reinforcement Learning We now turn to applications of reinforcement learning. These include game playing, where the transition model is known and the goal is to learn the utility function, and robotics,

specify. Imitation learning formulates the problem as supervised learning of a policy from the expert’s state–action pairs. Inverse reinforcement learning infers reward information from the expert’s behavior. Reinforcement learning continues to be one of the most active areas of machine learning research. It frees us from manual construction of behaviors
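
The imitation-learning formulation in this summary reduces to ordinary supervised learning. Its simplest instance is nearest-neighbor behavior cloning, sketched below with invented two-feature states and action labels:

# Expert demonstrations as (state, action) pairs; states are 2-D feature tuples.
demos = [((0.0, 0.1), "left"), ((0.9, 1.0), "right"), ((0.5, 0.4), "stay")]

def cloned_policy(state):
    # 1-nearest-neighbor over demonstrated states: in a new situation,
    # copy the expert's action from the most similar remembered one.
    def sq_dist(s):
        return sum((x - y) ** 2 for x, y in zip(s, state))
    _, action = min(demos, key=lambda d: sq_dist(d[0]))
    return action

print(cloned_policy((0.8, 0.9)))  # -> "right"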

environment. Weighted linear combinations of features and neural networks are factored representations for function approximation. It is also possible to apply reinforcement learning to structured representations; this is called relational reinforcement learning (Tadepalli et al., 2004). The use of relational descriptions allows for generalization across complex behaviors involving different objects. Analysis of the

explore unknown environments and are guaranteed to converge on near-optimal policies with a sample complexity that is polynomial in the number of states. Bayesian reinforcement learning (Dearden et al., 1998, 1999) provides another angle on both model uncertainty and exploration. The basic idea underlying imitation learning is to apply

of temporal-difference learning; related research describes other neuroscientific and behavioral experiments (Dayan and Niv, 2008; Niv, 2009; Lee et al., 2012). Work in reinforcement learning has been accelerated by the availability of open-source simulation environments for developing and testing learning agents. The University of Alberta’s Arcade Learning Environment

simulation (Savva et al., 2019) provides a photo-realistic virtual environment for indoor robotic tasks, and their HORIZON platform (Gauci et al., 2018) enables reinforcement learning in large-scale production systems. The SYNTHIA system (Ros et al., 2016) is a simulation environment designed for improving the computer vision capabilities of self

and perform safely. Robotics brings together many of the concepts we have seen in this book, including probabilistic state estimation, perception, planning, unsupervised learning, reinforcement learning, and game theory. For some of these concepts robotics serves as a challenging example application. For other concepts this chapter breaks new ground, for instance

cost (ILQR). •Planning under uncertainty unites perception and action by online replanning (such as model predictive control) and information gathering actions that aid perception. •Reinforcement learning is applied in robotics, with techniques striving to reduce the required number of interactions with the real world. Such techniques tend to exploit models, be

function they should optimize from human input, such as demonstrations, corrections, or instruction in natural language. Alternatively, robots can imitate human behavior, and use reinforcement learning to help tackle the challenge of generalization to new states. Bibliographical and Historical Notes The word robot was popularized by Czech playwright Karel Čapek in

rational agents. A variety of different agent designs were considered, ranging from reflex agents to knowledge-based decision-theoretic agents to deep learning agents using reinforcement learning. There is also variety in the component technologies from which these designs are assembled: logical, probabilistic, or neural reasoning; atomic, factored, or structured representations

, of course, but compilation methods can be applied so that the overhead is small compared to the costs of the computations being controlled. Metalevel reinforcement learning may provide another way to acquire effective policies for controlling deliberation: in essence, computations that lead to better decisions are reinforced, while those that

Based Scheduling. Morgan Kaufmann. Abbas, A. (2018). Foundations of Multiattribute Utility. Cambridge University Press. Abbeel, P. and Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In ICML-04. Abney, S., McAllester, D. A., and Pereira, F. (1999). Relating probabilistic grammars and automata. In ACL-99. Abramson, B. (1987). The

Andre, D., Friedman, N., and Parr, R. (1998). Generalized prioritized sweeping. In NeurIPS 10. Andre, D. and Russell, S. J. (2002). State abstraction for programmable reinforcement learning agents. In AAAI-02. Andreae, P. (1985). Justified Generalisation: Learning Procedures from Examples. Ph.D. thesis, MIT. Andrieu, C., Doucet, A., and Holenstein, R. (

Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50, 174–188. Arulkumaran, K., Deisenroth, M. P., Brundage, M., and Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34, 26–38. Arunachalam, R. and Sadeh, N. M. (2005). The supply chain trading agent competition. Electronic Commerce

. In Automatic Control–World Congress, 1987: Selected Papers from the 10th Triennial World Congress of the International Federation of Automatic Control. Dietterich, T. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. JAIR, 13, 227–303. Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische

, 137–144. Gao, J. (2014). Machine learning applications for data center optimization. Google Research. García, J. and Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. JMLR, 16, 1437–1480. Gardner, M. (1968). Logic Machines, Diagrams and Boolean Algebra. Dover. Garey, M. R. and Johnson, D. S. (1979). Computers

using Markovian decision theory. Master’s report, Computer Science Division, University of California, Berkeley. Koenig, S. (2000). Exploring unknown environments with real-time search or reinforcement learning. In NeurIPS 12. Koenig, S. (2001). Agent-centered search. AIMag, 22, 109–131. Koenig, S. and Likhachev, M. (2002). D* Lite. In AAAI-02

via GLTL. arXiv:1704.04341. Liu, B., Gemp, I., Ghavamzadeh, M., Liu, J., Mahadevan, S., and Petrik, M. (2018). Proximal gradient temporal difference learning: Stable reinforcement learning with polynomial sample complexity. JAIR, 63, 461–494. Liu, H., Simonyan, K., Vinyals, O., Fernando, C., and Kavukcuoglu, K. (2017). Hierarchical representations for efficient architecture

(Eds.). (1992). Geometric Invariance in Computer Vision. MIT Press. Munos, R., Stepleton, T., Harutyunyan, A., and Bellemare, M. G. (2017). Safe and efficient off-policy reinforcement learning. In NeurIPS 29. Murphy, K. (2002). Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. thesis, UC Berkeley. Murphy, K. (2012). Machine Learning: A Probabilistic

Theory. Cambridge University Press. Nisan, N., Roughgarden, T., Tardos, E., and Vazirani, V. (Eds.). (2007). Algorithmic Game Theory. Cambridge University Press. Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53, 139–154. Nivre, J., De Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajic, J., Manning, C., McDonald

. and Yannakakis, M. (1991). Shortest paths without a map. Theoretical Computer Science, 84, 127–150. Papavassiliou, V. and Russell, S. J. (1999). Convergence of reinforcement learning with general function approximators. In IJCAI-99. Parisi, G. (1988). Statistical Field Theory. Addison-Wesley. Parisi, M. M. G. and Zecchina, R. (2002). Analytic and

of temporal differences. Machine Learning, 3, 9–44. Sutton, R. S., McAllester, D. A., Singh, S., and Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In NeurIPS 12. Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In ICML-90

D. (1983). An abstract Prolog instruction set. Technical note, SRI International. Wasserman, L. (2004). All of Statistics. Springer. Watkins, C. J. (1989). Models of Delayed Reinforcement Learning. Ph.D. thesis, Psychology Department, Cambridge University. Watson, J. D. and Crick, F. (1953). A structure for deoxyribose nucleic acid. Nature, 171, 737. Wattenberg, M

programming of intelligent embedded systems and robotic space explorers. Proc. IEEE, 91(1), 212–237. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256. Williams, R. J. and Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation,

systematic comparison of phrase-based, hierarchical and syntax-augmented statistical MT. In COLING-08. Zoph, B. and Le, Q. V. (2016). Neural architecture search with reinforcement learning. arXiv:1611.01578. Zuse, K. (1945). The Plankalkül. Report, Gesellschaft für Mathematik und Datenverarbeitung. Zweig, G. and Russell, S. J. (1998). Speech recognition with

HMM), 43, 479, 491, 491–496, 503, 515, 795, 881 hidden variable, 443, 788 HIERARCHICAL-SEARCH, 377 hierarchical decomposition, 375 hierarchical look-ahead, 383 hierarchical reinforcement learning, 858, 1065 hierarchical structure, 1065 hierarchical task network (HTN), 375, 397, 858 Hierholzer, C., 162, 1098 high-level action, 375 higher-order logic, 273 Hilbert

., 516, 1100 intractability, 39 intrinsic property, 340 introspection, 20, 31 invariance, temporal, 811 inverse (of a matrix), 1077 inverse dynamics, 958 inverse kinematics, 947 inverse reinforcement learning, 864, 1054, 1065 inverted pendulum, 867 Ioffe, S., 837, 1100 IPL (programming language), 36 IQ test, 38 IR (information retrieval), 901, 905 Irpan, A.,

1053 regularization function, 689 Reid, D. B., 667, 1110 Reid, M., 116, 125, 1102 Reif, J., 984, 986, 1089, 1110 reification, 335 REINFORCE (reinforcement learning algorithm), 863 reinforcement, 840 reinforcement learning, 28, 210, 585, 671, 840, 789–873, 986 active, 842, 848–854 Bayesian, 851 deep, 835, 857 distributed, 636 generalization in, 854–861

The Alignment Problem: Machine Learning and Human Values

by Brian Christian  · 5 Oct 2020  · 625pp  · 167,349 words

a forty-five-year collaboration that would essentially found a new field. The field, which would cross neuroscience, behaviorist psychology, engineering, and mathematics, was dubbed “reinforcement learning”; and their names, forever linked in bibliographies of AI—“Barto & Sutton,” “Sutton & Barto”—would become synonymous with the definitive textbook of the field they

Comparative Psychology discussed “trial and error” in the context of animal behavior. For a short history of animal learning from the perspective of reinforcement learning, see Sutton and Barto, Reinforcement Learning. 9. See Thorndike, “A Theory of the Action of the After-Effects of a Connection upon It,” and Skinner, “The Rate of

-Analog Reinforcement Systems and Its Application to the Brain Model Problem” for an early example, and Chapter 15 of Sutton and Barto, Reinforcement Learning for discussion. 23. Andrew G. Barto, “Reinforcement Learning: A History of Surprises and Connections” (lecture), July 19, 2018, International Joint Conference on Artificial Intelligence, Stockholm, Sweden. 24. Andrew Barto,

personal interview, May 9, 2018. 25. The canonical text about reinforcement learning is Sutton and Barto, Reinforcement Learning, recently updated into a second edition. For a summary of the field up to the mid-1990s, see also Kaelbling, Littman, and Moore

this history, see “Michael Littman: The Reward Hypothesis” (lecture), University of Alberta, October 16, 2019, available at https://www.coursera.org/lecture/fundamentals-of-reinforcement-learning/michael-littman-the-reward-hypothesis-q6x0e. Despite the recency of this particular framing, the idea of understanding behavior as motivated, whether explicitly or implicitly, by

2018). 52. Sutton, “A Unified Theory of Expectation in Classical and Instrumental Conditioning.” 53. Sutton, “Temporal-Difference Learning” (lecture), Deep Learning and Reinforcement Learning Summer School 2017, Université de Montréal, July 3, 2017, http://videolectures.net/deeplearning2017_sutton_td_learning/. 54. Sutton, “Temporal-Difference Learning.” 55. Sutton, “Learning to

Predict by the Methods of Temporal Differences.” See also Sutton’s PhD thesis: “Temporal Credit Assignment in Reinforcement Learning.” 56. See Watkins, “Learning from Delayed Rewards” and Watkins and Dayan, “Q-Learning.” 57. Tesauro, “Practical Issues in Temporal Difference Learning.” 58. Tesauro, “TD

.” 67. Niv. 68. For a discussion of potential limitations to the TD-error theory of dopamine, see, e.g., Dayan and Niv, “Reinforcement Learning,” and O’Doherty, “Beyond Simple Reinforcement Learning.” 69. Niv, “Reinforcement Learning in the Brain.” 70. Yael Niv, personal interview, February 21, 2018. 71. Lenson, On Drugs. 72. See, e.g., Berridge, “

Formula That Predicts Happiness,” https://www.theglobeandmail.com/life/health-and-fitness/health/researchers-create-formula-that-predicts-happiness/article19919756/. 80. See Tomasik, “Do Artificial Reinforcement-Learning Agents Matter Morally?” For more on this topic, see also Schwitzgebel and Garza, “A Defense of the Rights of Artificial Intelligences.” 81. Brian Tomasik,

“Ethical Issues in Artificial Reinforcement Learning,” https://reducing-suffering.org/ethical-issues-artificial-reinforcement-learning/. 82. Daswani and Leike, “A Definition of Happiness for Reinforcement Learning Agents.” See also People for the Ethical Treatment of Reinforcement Learners: http://petrl.org. 83. Andrew Barto

CURIOSITY 1. Turing, “Intelligent Machinery.” 2. There were efforts starting in 2004 to develop standardized RL benchmarks and competitions; see Whiteson, Tanner, and White, “The Reinforcement Learning Competitions.” 3. Marc Bellemare, personal interview, February 28, 2019. 4. Bellemare et al., “The Arcade Learning Environment,” stemming originally from Naddaf, “Game-Independent AI Agents

was introduced into machine learning with Barto, Singh, and Chentanez, “Intrinsically Motivated Learning of Hierarchical Collections of Skills,” and Singh, Chentanez, and Barto, “Intrinsically Motivated Reinforcement Learning.” For a more recent overview of this literature, see Baldassarre and Mirolli, Intrinsically Motivated Learning in Natural and Artificial Systems. 13. Hobbes, Leviathan. 14. Simon

, “Learning and Satiation of Response in Intrinsically Motivated Complex Puzzle Performance by Monkeys.” 19. Scenarios of this type are described in Barto, “Intrinsic Motivation and Reinforcement Learning,” and Deci and Ryan, Intrinsic Motivation and Self-Determination in Human Behavior. 20. Berlyne, Conflict, Arousal, and Curiosity. 21. And see, for instance, Berlyne

intellectual motivation.” See Minsky, “Steps Toward Artificial Intelligence.” 30. See Sutton, “Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming” and “Reinforcement Learning Architectures for Animats.” MIT’s Leslie Pack Kaelbling devised a similar method, based on the idea of measuring an agent’s “confidence intervals” around the

approaches, which incentivize exploration by rewarding “information gain,” see, e.g., Schmidhuber, “Curious Model-Building Control Systems”; Stadie, Levine, and Abbeel, “Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models”; and Houthooft et al., “VIME.” 51. Burda et al., “Large-Scale Study of Curiosity-Driven Learning.” 52. See Burda et al

30. This is a very active area of research. See, e.g., Subramanian, Isbell, and Thomaz, “Exploration from Demonstration for Interactive Reinforcement Learning”; Večerík et al., “Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards”; and Hester et al., “Deep Q-Learning from Demonstrations.” 31. In fact, many agents trained in

//thesis/PieterAbbeel_Defense_19May2008_320x180.mp4. 21. Abbeel, Coates, and Ng, “Autonomous Helicopter Aerobatics Through Apprenticeship Learning.” 22. Abbeel et al., “An Application of Reinforcement Learning to Aerobatic Helicopter Flight.” They also successfully performed a nose-in funnel and a tail-in funnel. 23. “As repeated sub-optimal demonstrations tend to

differ in their suboptimalities, together they often encode the intended trajectory.” See Abbeel, “Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control,” which refers to the work in Coates, Abbeel, and Ng, “Learning for Control from Multiple Demonstrations.” 24. Abbeel, Coates,

Stanford helicopter performing the chaos, see “Stanford University Autonomous Helicopter: Chaos,” https://www.youtube.com/watch?v=kN6ifrqwIMY. 28. Ziebart et al., “Maximum Entropy Inverse Reinforcement Learning,” which leverages the principle of maximum entropy derived from Jaynes, “Information Theory and Statistical Mechanics.” See also Ziebart, Bagnell, and Dey, “Modeling Interaction via the

Problems in AI Safety,” which, in turn, references Salge, Glackin, and Polani, “Empowerment: An Introduction,” and Mohamed and Rezende, “Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning.” 50. Alexander Turner, personal interview, July 11, 2019. 51. Wiener, “Some Moral and Technical Consequences of Automation.” 52. According to Paul Christiano, “corrigibility” as

Art. 42. Turing et al., “Can Automatic Calculating Machines Be Said to Think?” ACKNOWLEDGMENTS 1. McCulloch, Finality and Form. BIBLIOGRAPHY Abbeel, Pieter. “Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control.” PhD thesis, Stanford University, 2008. Abbeel, Pieter, Adam Coates, and Andrew Y. Ng. “Autonomous Helicopter Aerobatics Through Apprenticeship Learning.” International

Rebellion: Changing the Narrative.” In Thirty-First AAAI Conference on Artificial Intelligence, 2017. Akrour, Riad, Marc Schoenauer, and Michèle Sebag. “APRIL: Active Preference-Learning Based Reinforcement Learning.” In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 116–31. Springer, 2012. Akrour, Riad, Marc Schoenauer, Michèle Sebag, and Jean-Christophe

of the Cognitive Science Society, 2017. Choi, Jongwook, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, and Honglak Lee. “Contingency-Aware Exploration in Reinforcement Learning.” In International Conference on Learning Representations, 2019. Chouldechova, Alexandra. “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments.” Big Data 5

Mental Health Crisis in Graduate Education.” Nature Biotechnology 36, no. 3 (2018): 282. Everitt, Tom, Victoria Krakovna, Laurent Orseau, Marcus Hutter, and Shane Legg. “Reinforcement Learning with a Corrupted Reward Channel.” In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 4705–13, 2017. Eysenbach, Benjamin, Shixiang

Gu, Julian Ibarz, and Sergey Levine. “Leave No Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning.” In International Conference on Learning Representations, 2018. Fantz, Robert L. “Visual Experience in Infants: Decreased Attention to Familiar Patterns Relative to Novel Ones.” Science 146

and Don Mosenfelder. Bobby Fischer Teaches Chess. Basic Systems, 1966. Florensa, Carlos, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. “Reverse Curriculum Generation for Reinforcement Learning.” In Proceedings of the 1st Annual Conference on Robot Learning, edited by Sergey Levine, Vincent Vanhoucke, and Ken Goldberg, 482–95. PMLR, 2017. Flores, Anthony

1976): 3–46. Malik, Dhruv, Malayandi Palaniappan, Jaime Fisac, Dylan Hadfield-Menell, Stuart Russell, and Anca Drăgan. “An Efficient, Generalized Bellman Update for Cooperative Inverse Reinforcement Learning.” In Proceedings of the 35th International Conference on Machine Learning, edited by Jennifer Dy and Andreas Krause, 3394–3402. PMLR, 2018. Malone, Thomas W. “Toward

2600 Console Games.” Master’s thesis, University of Alberta, 2010. Nair, Ashvin, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. “Overcoming Exploration in Reinforcement Learning with Demonstrations.” In 2018 IEEE International Conference on Robotics and Automation (ICRA), 6292–99. IEEE, 2018. Nalisnick, Eric, Bhaskar Mitra, Nick Craswell, and Rich Caruana

.” Journal of Experimental Criminology 12, no. 3 (2016): 347–71. Saunders, William, Girish Sastry, Andreas Stuhlmueller, and Owain Evans. “Trial Without Error: Towards Safe Reinforcement Learning via Human Intervention.” In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2067–99. International Foundation for Autonomous Agents and Multiagent

Hospitalized Patients.” JAMA Neurology 74, no. 12 (2017): 1419–24. Subramanian, Kaushik, Charles L. Isbell Jr., and Andrea L. Thomaz. “Exploration from Demonstration for Interactive Reinforcement Learning.” In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 447–56. International Foundation for Autonomous Agents and Multiagent Systems, 2016. Sundararajan, Mukund

for Animats.” In From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior, 288–96. 1991. ———. “Temporal Credit Assignment in Reinforcement Learning.” PhD thesis, University of Massachusetts, Amherst, 1984. ———. “A Unified Theory of Expectation in Classical and Instrumental Conditioning.” Bachelor’s thesis, Stanford University, 1978. Sutton,

Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. 2nd ed. MIT Press, 2018. Sweeney, Latanya. “Discrimination in Online Ad Delivery.” Communications of the ACM 56, no. 5 (2013): 44–54. ———.

Večerík, Matej, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, and Martin Riedmiller. “Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards.” arXiv Preprint arXiv:1707.08817, 2017. Vincent, James. “Google ‘Fixed’ Its Racist Algorithm by Removing Gorillas from Its Image

9, 12, 337n6, 346n13 alignment problem amplification/distillation and, 249 analogies and, 317 corrigibility and, 295 defined, 13 as hopeful, 327–28 inverse reinforcement learning and, 255 parenting and, 166 reinforcement learning and, 151 technical limitations and, 313, 395–96n4 thermostats and, 311–12, 313 See also value alignment Allen, Woody, 170 AlphaGo, 162

ensemble methods, 284–85, 305 equiprobabiliorism, 303 equivant (company), 337n5 ergodicity assumption, 320 Ermon, Stefano, 324 ethics actualism vs. possibilism and, 239, 379n71 in reinforcement learning, 149 See also AI safety; fairness; moral uncertainty evaluation function. See value function Evans, Owain, 386–87n55 evolution, 170, 171–74, 368n56 expectations, 138–39

, 316, 342n61 See also Google research Google Brain, 113, 167, 373n53 Google research differential privacy, 347n33 fairness, 73 feature visualization, 110 multitask learning models, 107 reinforcement learning, 167 selective classification, 390n29 value alignment, 247 word embedding, 44 Gopnik, Alison, 194, 215 gorilla tag incident, 25–26, 316, 339n24 GPT-2, 344

238–39, 379n69 OpenAI actualism vs. possibilism, 239 amplification, 248–50 corrigibility, 296 feature visualization, 112, 357n69 intrinsic motivation, 199–200, 201 inverse reinforcement learning, 263–66, 384–85n37 reinforcement learning, 365n27 word embedding, 344–45n94 open category problem, 279–81, 315, 396nn10–11 ophthalmology, 287, 389n23 optimal regressions, 95–96 optimal reward problem

Herbert, 327 Reality Is Broken (McGonigal), 175 recidivism. See risk-assessment models rectified linear output functions, 24 Reddy, Raj, 223 redundant encodings, 40, 64, 343n74 reinforcement learning (RL) actor-critic architecture, 138, 362n51 actualism vs. possibilism and, 238–40, 379n69 addiction and, 205–08, 374n65 alignment problem and, 151 Arcade Learning Environment

143, 145 Research Institute for Advanced Studies, 383n15 Revels, Hiram, 27 reversibility, 391n39 reward hypothesis, 130–31, 133–34, 360nn26, 28 See also reinforcement learning rewards. See incentives; reinforcement learning; reward hypothesis; shaping right to explanation. See transparency rigorism, 303, 304 risk-assessment models COMPAS development, 56–57, 346nn13–14 defenses of, 68, 72

The AI-First Company

by Ash Fontana  · 4 May 2021  · 296pp  · 66,815 words

the point of difference: simulations find failures in normal software but find improvements in AIs. Simulations thus present a significant opportunity to improve AIs, particularly reinforcement learning and other, agent-based learning AIs. Such models typically need to try multiple approaches, and, since existing datasets are insufficient or unavailable, simulators are useful

in a specific domain—understanding the “rules of the game,” or the principles of the system. Programmers create ABMs using techniques such as adversarial and reinforcement learning. Popular agent-based systems include some that play John Conway’s Game of Life and solve the prisoner’s dilemma. Financial and political institutions often

five major types of ML used today. Supervised and unsupervised ML are two types that differ in the degree of human involvement at every step. Reinforcement learning is a functionally different approach to supervised and unsupervised ML. Transfer and deep learning overlap with the other types. The table below shows what might

boosted types), regression, support vector machines (SVMs), and neural networks · clustering (k-means, hierarchical, and others) and Gaussian mixture models* · various, but all forms of reinforcement learning · Bayesian networks and Markov logic networks · convolutional neural networks and recurrent neural networks
COMPOUNDING There are many different methods for making predictions, each one generating

much more effective than the previous models. With each automatic run of the network, GANs add features until there is no more discrimination to do. Reinforcement learning (RL): Reinforcement learning, an area of ML, involves developing agents that optimize for a reward—in other words, creating a (software) agent that has ML at its

algorithms · Algorithm performance · Algorithm performance
Feed-forward networks · Predictions · None
Recurrent neural networks · Predictions · Predictions
Convolutional neural networks · Features · Features
Generative adversarial networks · Features · Features
Reinforcement learning · Observational · Observational
CONCLUSION Getting started with machine learning is easier than ever, even as the frontier of cutting-edge research is shifting faster than ever

method based on available data. Supervised learning needs training and feedback data, whereas unsupervised ML just requires lots of data. Some models need an objective. Reinforcement-learned models need objectives. Other forms of ML generally do not, and they will even surface information without objectives. Learn to learn. Some AIs generate data

: plot that shows how well the model performed at different discrimination thresholds, e.g., true and false positive rates RECURSION: repeated application of a method REINFORCEMENT LEARNING: ML that learns from objectives RETURN ON INVESTMENT (ROI): calculated by dividing the return from using an asset by the investment in that asset ROI

networks in, 152, 153 inductive logic programming in, 149, 153 machine learning in, 151–52 primer for, 145–47 recurrent neural networks in, 151, 153 reinforcement learning in, 152, 153 statistical analysis in, 149, 153 machine learning models, managing, 155–86 acceptance, 157, 162–66 accountability and, 164 and augmentation versus automation

forest, 53, 64, 279 recall, 279 receiver operating characteristic (ROC) curve, 205–6, 279 recurrent neural networks (RNNs), 151, 153 recursion, 150, 279 regression, 64 reinforcement learning (RL), 103, 147–48, 152, 153, 279 relevance of data, 74–75 reliability, 175 reports, 171 research and development (R & D), 42 cost analysis, 217

Architects of Intelligence

by Martin Ford  · 16 Nov 2018  · 586pp  · 186,548 words

data. This explains why companies that control huge amounts of data, like Google, Amazon, and Facebook, have such a dominant position in deep learning technology. REINFORCEMENT LEARNING essentially means learning through practice or trial and error. Rather than training an algorithm by providing the correct, labeled outcome, the learning system is set

for itself, and if it succeeds it is given a “reward.” Imagine training your dog to sit, and if he succeeds, giving him a treat. Reinforcement learning has been an especially powerful way to build AI systems that play games. As you will learn from the interview with Demis Hassabis in this

book, DeepMind is a strong proponent of reinforcement learning and relied on it to create the AlphaGo system. The problem with reinforcement learning is that it requires a huge number of practice runs before the algorithm can succeed. For this reason, it

is primarily used for games or for tasks that can be simulated on a computer at high speed. Reinforcement learning can be used in the development of self-driving cars—but not by having actual cars practice on real roads. Instead virtual cars are trained

coming from their environments. This is how human beings learn. Young children, for example, learn languages primarily by listening to their parents. Supervised learning and reinforcement learning also play a role, but the human brain has an astonishing ability to learn simply by observation and unsupervised interaction with the environment. Unsupervised learning

of learning where you don’t train for a task, you just observe the world and figure out how it works, essentially. MARTIN FORD: Would reinforcement learning, or learning by practice with a reward for succeeding, be in the category of unsupervised learning? YANN LECUN: No, that’s a different category altogether

. There are three categories essentially; it’s more of a continuum, but there is reinforcement learning, supervised learning, and self-supervised learning. Reinforcement learning is learning by trial and error, getting rewards when you succeed and not getting rewards when you don’t succeed. That form

for games, where you can try things as many times as you want, but doesn’t work in many real-world scenarios. You can use reinforcement learning to train a machine to play Go or chess. That works really well, as we’ve seen with AlphaGo, for example, but it requires a

performance, and it works really well if you can do that, but it is often impractical in the real world. If you want to use reinforcement learning to train a robot to grab objects, it will take a ridiculous amount of time to achieve that. A human can learn to drive a

car in 15 hours of training without crashing into anything. If you want to use the current reinforcement learning methods to train a car to drive itself, the machine will have to drive off cliffs 10,000 times before it figures out how not

for the fact that the kind of learning that we can do as humans is very, very different from pure reinforcement learning. It’s more akin to what people call model-based reinforcement learning. This is where you have your internal model of the world that allows you to predict that when you turn

result, you can plan ahead and not take the actions that result in bad outcomes. Learning to drive in this context is called model-based reinforcement learning, and that’s one of the things we don’t really know how to do. There is a name for it, but there’s no
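
The internal world model LeCun describes can be shown in miniature: a model-based agent queries a learned transition model before acting and rejects actions whose predicted outcomes look bad, instead of having to live through them. Everything below (the model, the cost, the action names) is invented for illustration:

# Transition model predicting the next state for each candidate action.
# In model-based RL this would be learned from experience; here it is hand-written.
def predicted_next(state, action):
    speed, lane_offset = state
    delta = {"steer_left": -0.5, "straight": 0.0, "steer_right": 0.5}[action]
    return (speed, lane_offset + delta)

def cost(state):
    _, lane_offset = state
    return abs(lane_offset)  # worse the further we drift from the lane center

def plan(state):
    # One-step lookahead: imagine each action with the model, keep the cheapest.
    actions = ("steer_left", "straight", "steer_right")
    return min(actions, key=lambda a: cost(predicted_next(state, a)))

print(plan((30.0, 0.4)))  # -> "steer_left": the bad outcome is avoided in imagination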

lot of fundamental research and questions on machine learning, so things that have more to do with applied mathematics and optimization. We are working on reinforcement learning, and we are also working on something called generative models, which are a form of self-supervised or predictive learning. MARTIN FORD: Is Facebook working

labels. As kids, we watch how other humans do things and then we do it; so, the field is now starting to get into inverse reinforcement learning algorithms, and neuro-programming algorithms. There is a lot of new exploration, and DeepMind is doing that. Google Brain is doing that; Stanford is doing

multidimensional—and one that has the kind of learning capability that humans do, which is not only through big data but also through unsupervised learning, reinforcement learning, virtual learning, and various kinds of learning. If we use that as a definition of AGI, then I think the path to AGI is a

these general algorithms that we can apply to real-world problems. MARTIN FORD: So far, your focus has primarily been on combining deep learning with reinforcement learning. That’s basically learning by practice, where the system repeatedly attempts something, and there’s a reward function that drives it toward success. I’ve

heard you say that you believe that reinforcement learning offers a viable path to general intelligence, that it might be sufficient to get there. Is that your primary focus going forward? DEMIS HASSABIS: Going

forward, yes, it is. I think that technique is extremely powerful, but you need to combine it with other things to scale it. Reinforcement learning has been around for a long time, but it was only used in very small toy problems because it was very difficult for anyone to

did the processing of the screen, and the model of the environment you’re in. Deep learning is amazing at scaling, so combining that with reinforcement learning allowed it to scale to these large problems that we’ve now tackled in AlphaGo and DQN—all of these things that people would have

proved that first part. The reason we were so confident about it and why we backed it when we did was because in my opinion reinforcement learning will become as big as deep learning in the next few years. DeepMind is one of the few companies that take that seriously because, from

the neuroscience perspective, we know that the brain uses a form of reinforcement learning as one of its learning mechanisms, it’s called temporal difference learning, and we know the dopamine system implements that. Your dopamine neurons track the
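
The quantity the snippet says dopamine neurons track is the temporal-difference error, written here in standard notation rather than quoted from the book:

\[
\delta_t \;=\; r_{t+1} + \gamma V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\,\delta_t
\]

An outcome better than predicted gives δ_t > 0 and a burst of firing; a fully predicted reward gives δ_t ≈ 0, which is the pattern in the recordings behind this analogy.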

be a viable solution to the problem of general intelligence. It may not be the only one, but from a biologically inspired standpoint, it seems reinforcement learning is sufficient once you scale it up enough. Of course, there are many technical challenges with doing that, and many of them are unsolved. MARTIN

FORD: Still, when a child learns things like language or an understanding of the world, it doesn’t really seem like reinforcement learning for the most part. It’s unsupervised learning, as no one’s giving the child labeled data the way we would do with ImageNet. Yet

their peers and they do unsupervised learning when they’re just experimenting with stuff, with no goal in mind. They also do reward learning and reinforcement learning when they do something, and they get a reward for it. We work on all three of those, and they’re all going to be

could be intrinsic rewards that could be guiding the unsupervised learning. I find that it is useful to think about intelligence in the framework of reinforcement learning. MARTIN FORD: One thing that’s obvious from listening to you is that you combine a deep interest in both neuroscience and computer science. Is

, having neuroscience as a guide can allow me to make much bigger, much stronger bets on things like that. A great example of this is reinforcement learning. I know reinforcement learning has to be scalable because the brain does scale it. If you didn’t know that the brain implemented

reinforcement learning and it wasn’t scaling, how would you know on a practical level if you should spend another two years on this? It’s very

up doing my bachelor’s at Carnegie Mellon, my master’s from MIT and a PhD, with a thesis titled, Shaping and Policy Search in Reinforcement Learning, from the University of California, Berkeley. For about the next twelve years I taught at the Stanford University Department of Computer Science and the Department

’re only just discovering the power of techniques such as deep learning and neural networks in their many forms, as well as other techniques like reinforcement learning and transfer learning. These techniques all still have enormous headroom; we’re only just scratching the surface of where they can take us. Deep learning

that learning in totally new environments or on a previously unencountered problem, over there. There are definitely some exciting new techniques coming up, whether in reinforcement learning or even simulated learning—the kinds of things that AlphaZero has begun to do—where you self-learn and self-create structures, as well start

but I think ultimately inadequate idea that we are seeing in the field right now. What we see at the moment is people doing deep reinforcement learning over pixels of, for example, the Atari game Breakout, and while you get results that look impressive, they’re incredibly fragile. DeepMind trained an AI

think that some sort of built-in template or structure should be built into an AI system so it can create causal models? DeepMind uses reinforcement learning, which is based on practice or trial and error. Perhaps that would be a way of discovering causal relationships? JUDEA PEARL: It comes into it

, but reinforcement learning has limitations, too. You can only learn actions that have been seen before. You cannot extrapolate to actions that you haven’t seen, like raising

-hanging fruits. MARTIN FORD: Looking to the future, do you think that neural networks are going to be very important? JUDEA PEARL: Neural networks and reinforcement learning will all be essential components when properly utilized in causal modeling. MARTIN FORD: So, you think it might be a hybrid system that incorporates not

of that is also focused on that problem is DeepMind, but I’m struck by how different your approach is. DeepMind is focused on deep reinforcement learning through games and simulated environments, whereas what I hear from you is that the path to intelligence is through language. DAVID FERRUCCI: Let’s restate

the end of their arm, why can’t a robot? There’s something dramatic missing. MARTIN FORD: I have seen reports that deep learning and reinforcement learning is being used to have robots learn to do things by practicing or even just by watching YouTube videos. What’s your view on this

came from people who were trying to understand how human intelligence works. That includes the basic mathematics of what we now call deep learning and reinforcement learning, but also much further back to Boole as one of the inventors of mathematical logic, or Laplace in his work on probability theory. In more

at achieving more general intelligence by modeling an evolutionary approach? JOSH TENENBAUM: Well, a number of people at DeepMind and others who follow the deep reinforcement learning ethos would say they’re thinking about evolution in a more general sense, and that’s also a part of learning. They’d say their

, machine learning and other AI-related fields, and their papers have received awards at venues across the AI landscape, including leading conferences in computer vision, reinforcement learning and decision-making, robotics, uncertainty in AI, learning and development, cognitive modeling and neural information processing. They have introduced several widely used AI tools and

contribution to AI safety, at least as valuable as worrying about the alignment problem, which ultimately is just a technical problem having to do with reinforcement learning and objective functions. So, I wouldn’t say that we’re underinvesting in being prepared for AI safety, and certainly some of the work that

The Deep Learning Revolution (The MIT Press)

by Terrence J. Sejnowski  · 27 Sep 2018

of the brain, which receive projections from the entire cerebral cortex and project back to it, solve this problem with a temporal difference algorithm and reinforcement learning. AlphaGo used the same learning algorithm that the basal ganglia evolved to evaluate sequences of

. Professional expertise is also based on learning in narrow domains. We are all professionals in the domain of language and practice it every day. The reinforcement learning algorithm used by AlphaGo can be applied to many problems. This form of learning depends only on the reward given to the winner at the

Richard Sutton,3 who had been working closely with Andrew Barto, his doctoral advisor, at the University of Massachusetts at Amherst, on difficult problems in reinforcement learning, a branch of machine learning inspired by associative learning in animal experiments (figure 10.2). Unlike a deep learning network, whose only job is to

is shown. Two dice are rolled, and the two numbers indicate how far two pieces can be moved ahead. Figure 10.2 Reinforcement learning scenario. The agent actively explores the environment by taking actions and making observations. If an action is successful, the agent receives a reward. The goal

is to learn actions that maximize future rewards. Reinforcement learning is based on the observation that animals solve difficult problems in uncertain conditions by exploring the various options in the environment and learning from their

Edmonton in 2006. He taught us how to learn the route to future rewards. Rich is a cancer survivor who has remained a leader in reinforcement learning and continues to develop innovative algorithms. He is generous with his time and insights, which everyone in the field greatly values. His book with Andrew

Barto, Reinforcement Learning: An Introduction, is a classic in the field. The second edition is freely available on the Internet. Courtesy of Richard Sutton. The update

expect that TD-Gammon would first learn the endgame, then the middle game, and finally the openings. This is in fact what happens in “tabular reinforcement learning,” or “tabular RL,” where there is a table of values for every state in the state space. But it’s completely different with neural networks
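
The contrast drawn here is easy to make concrete. A minimal sketch, with a dictionary-backed table for the tabular case and a linear model standing in for TD-Gammon’s neural network; the feature vector is hypothetical:

```python
import numpy as np

# Tabular RL: one independent value entry per state. Learning the value of one
# state says nothing about any other state, so knowledge spreads state by state.
V = {}  # state -> estimated value

def td_update_tabular(s, r, s_next, alpha=0.1, gamma=1.0):
    V.setdefault(s, 0.0)
    V.setdefault(s_next, 0.0)
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

# Function approximation: a single weight update shifts the value of every
# position sharing features with the one just seen, so learning generalizes
# across the state space instead of proceeding endgame-first.
w = np.zeros(8)  # weights over hand-built board features (hypothetical)

def td_update_linear(phi_s, r, phi_s_next, alpha=0.01, gamma=1.0):
    delta = r + gamma * w.dot(phi_s_next) - w.dot(phi_s)
    w[:] += alpha * delta * phi_s
```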

constant reward even though the average is the same.12 Dopamine neurons are also found in flies and have been shown to comprise several parallel reinforcement learning pathways for both short-term and long-term associative memories.13 Motivation and the Basal Ganglia Dopamine neurons constitute a core system that controls motivation

glider with a six-foot wingspan and taught it to soar and stay aloft.19 Learning How to Sing Another example of the power of reinforcement learning is the parallel between how birds learn to sing and how children learn to speak. In both cases, an initial period of auditory learning is

the motor learning phase in both humans and songbirds are in the basal ganglia, where we know that reinforcement learning takes place. In 1995, Kenji Doya, a postdoctoral fellow in my lab, developed a reinforcement learning model for the motor refinement of birdsong (figure 10.7). The model improved its performance by tweaking synapses
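
Doya’s model is considerably more detailed than this, but the simplest possible reading of “improved its performance by tweaking synapses” is weight perturbation: try a random change, keep it only if the song scored better. A sketch under that assumption; `evaluate` is a hypothetical song-quality score, not part of the published model:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_and_keep(weights, evaluate, sigma=0.01):
    # Propose a small random tweak to the synaptic weights.
    baseline = evaluate(weights)
    trial = weights + sigma * rng.standard_normal(weights.shape)
    # Keep the tweak only if performance (the reward signal) improved.
    return trial if evaluate(trial) > baseline else weights
```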

in babies. There are many domain-specific learning and memory systems in brains that must work together toward the acquisition of new skills, and the reinforcement learning algorithm for learning birdsongs in songbirds and the temporal difference learning algorithm in the reward system for monkeys, humans, bees, and other animals are only two

seeing and hearing, there are many other aspects of human intelligence where advances are needed in artificial intelligence. Representation learning in the cortex together with reinforcement learning in the basal ganglia powerfully complement each other. Can AI learning to play championship Go translate to solving other complex problems? Much of human learning

other parts of the oculomotor system. Learning also involves the basal ganglia, an important part of the vertebrate brain that learns sequences of actions through reinforcement learning.21 The difference between the expected and received reward is signaled by a transient increase in the firing rate of dopamine neurons in the midbrain
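
That “difference between the expected and received reward” has a standard formalization as the temporal-difference error; the usual form is sketched below, not quoted from the book:

```latex
\delta_t = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t
```

A positive \delta_t (better than expected) corresponds to the transient increase in dopamine firing described here.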

part of the world, only the part needed at any moment to carry out the task at hand.4 This also makes it easier for reinforcement learning to narrow down the number of possible sensory inputs that contribute to obtaining rewards. The apparent modularity of vision (its relative separateness from other sensory

to explain cognition. But with the symbolic approach, artificial intelligence never achieved cognitive levels of performance. B. F. Skinner was on the right track with reinforcement learning, which Chomsky derided: today’s most compelling AI applications are based on learning, not logic. Courtesy of the New York Review of Books.

the problem of automatically parsing sentences, something that Chomsky’s “abstract theories” of syntax never accomplished despite strenuous efforts by computational linguists. When coupled with reinforcement learning, whose study in animals Skinner pioneered, complex problems can be solved that depend on making a sequence of choices to achieve a goal. This is

, when coupled with deep learning of the environment and a deeply learned value function honed by a lifetime of experience, a weak learning system like reinforcement learning can indeed give rise to cognitive behaviors, including language. This was not at all obvious to me in the 1980s, although I should have realized

new feature from brain architecture has boosted the functionality of deep learning networks: the hierarchy of cortical areas; the brain’s coupling of deep with reinforcement learning; working memory in recurrent cortical networks; and long-term memory of facts and events—to name just a few. There are many more computational principles

applications.” The conference was supposed to be a celebration of the progress we had made, so his rebuke stung. My talk about recent progress with reinforcement learning and the remarkable results achieved by TD-Gammon in teaching networks to play champion-level backgammon had not impressed him. He dismissed this as a

network that can be written to and read back with the same flexibility as a digital computer memory, the researchers demonstrated a network trained with reinforcement learning that could answer questions that required reasoning. For example, one such network reasoned about paths in the London Underground and another answered questions about genealogical

wide range of environments. More complex forms of intelligence are found in multicellular animals. We have seen that the temporal difference learning algorithm that underlies reinforcement learning can lead to highly complex behaviors, made still more complex in humans by deep learning in the cerebral cortex. There is a spectrum of intelligent

being optimized. In the brain, there are some innate costs that regulate behavior, such as the need for food, warmth, safety, oxygen, and procreation. In reinforcement learning, actions are taken to optimize future rewards. But beyond rewards that ensure survival, a wide range of rewards can be optimized, as is apparent from

1989; my book The Computational Brain in 1992; and many other foundational books on machine learning, including Richard Sutton and Andrew Barto’s Reinforcement Learning: An Introduction, and the leading textbook Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. The Press’s Robert Prior helped guide the present

(MIT Press, 2014) by Kevin P. Murphy is a compendium that covers the broader range of machine learning algorithms. Deep reinforcement learning is at the forefront of research, and the definitive textbook is Reinforcement Learning: An Introduction (MIT Press, 1998) by Richard S. Sutton and Andrew G. Barto (online draft of forthcoming second edition

based on examples. Learning algorithms are said to be “supervised” when both inputs and desired outputs are given or “unsupervised” when only inputs are given. Reinforcement learning is a special case of a supervised learning algorithm when the only feedback is a reward for good performance. logic Mathematical inference based on assumptions

Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis, Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, arXiv:1712.01815 (2017). 37. Howard Gardner, Frames of Mind: The Theory of Multiple Intelligences, 3rd ed. (New York: Basic Books, 2011). 38. J

(1970): 329–337. 15. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, et al., “Human-Level Control through Deep Reinforcement Learning,” Nature 518, no. 7540 (2015): 529–533. 16. Simon Haykin, Cognitive Dynamic System: Perception-Action Cycle, Radar, and Radio (New York: Cambridge University Press, 2012

113, no. 33 (2016): E4877–E4884. 19. G. Reddy, J. W. Ng, A. Celani, T. J. Sejnowski, and M. Vergassola, “Soaring Like a Bird via Reinforcement Learning in the Field,” submitted for publication. 20. Kenji Doya and Terrence J. Sejnowski, “A Novel Reinforcement Model of Birdsong Vocalization Learning,” in Gerald Tesauro, David

of objects, 37 of scenes, 78 Rectified linear units (ReLUs), 131–132 Recurrent neural network (RNN), 136, 159 Reddy, Gautam, 156f Regularization techniques, 119–121 Reinforcement learning scenario, 144, 145f Rekimoto, Jun, 7 Representation learning, 111b Retina, 64f, 65, 300n13 David Marr and, 53 Dynamic Vision Sensor (DVS) and, 211, 212f frog

, 301n22 Synaptic plasticity, 67–70, 158–159, 241 Hebbian, 79, 95b, 101–102, 133, 213 Systems Biology, Institute for, 230 Szalay, Alex, 164 Tabular reinforcement learning (tabular RL), 148 Tallal, Paula, 184, 190, 308n25 Tank, David W., 94, 96, 297n10 Taste aversion learning, 150 Tchernichovski, Ofer, 157f TD-Gammon, 34, 146

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006

by Ben Goertzel and Pei Wang  · 1 Jan 2007  · 303pp  · 67,891 words

Hunting: A Thought-Experiment in Embodied Social Learning, Cognitive Process Integration, and the Dynamic Emergence of the Self 217 Ben Goertzel Probabilistic Logic Based Reinforcement Learning of Simple Embodied Behaviors in a 3D Simulation World Ari Heljakka, Ben Goertzel, Welter Silva, Cassio Pennachin, Andre’ Senna and Izabela Goertzel How Do We

al chapter discusses the learning of some very simple behaviors for a simulated humanoid agent in the AGISim 3D simulation world, via a pure “embodied reinforcement learning” methodology. In Piagetan terms, these are “infantile-level” tasks, but to achieve them within the Novamente architecture nevertheless requires a fairly subtle integration of various

or empirical analysis of particular problems or domains, we will study how focus management can be learned. One idea is to use state-action-reward reinforcement learning algorithms to automatically generate focus management schemes. The actions are the choice of attention fixation. The state will be both the state of the environment

be given here. Inference control in Novamente takes several forms: 1. Standard forward-chaining and backward-chaining inference heuristics (see e.g. [45]) 2. A reinforcement learning mechanism that allows inference rules to be chosen based on experience. Probabilities are tabulated regarding which inference rules have been useful in the past in

used to bias the choices of inference rules during forward or backward chaining inference. 3. Application of PLN inference to the probabilities used in the reinforcement learning mechanism, which enables generalization, abstraction and analogy to be used in guessing which inference rules are most useful in a given context. These different approaches to
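
Tabulating which inference rules have paid off and biasing future choices accordingly is, in effect, a bandit problem over rules. A minimal sketch, with Thompson sampling standing in for whatever weighting Novamente actually uses; the rule names and counts are invented for illustration:

```python
import random

# Hypothetical tallies of how often each inference rule has proved useful.
counts = {
    "forward_chaining": [8, 2],   # [times useful, times not]
    "backward_chaining": [5, 5],
    "analogy": [2, 8],
}

def choose_rule():
    # Sample a plausible success rate per rule from a Beta posterior and
    # pick the rule whose draw is highest (Thompson sampling).
    draws = {rule: random.betavariate(u + 1, f + 1) for rule, (u, f) in counts.items()}
    return max(draws, key=draws.get)

def record_outcome(rule, useful):
    counts[rule][0 if useful else 1] += 1
```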

, Architectures and Algorithms, B. Goertzel and P. Wang (Eds.), IOS Press, 2007. Probabilistic Logic Based Reinforcement Learning of Simple Embodied Behaviors in a 3D Simulation World. Ari Heljakka, Ben Goertzel, Welter Silva, Cassio Pennachin, Andre Senna and Izabela Goertzel, Novamente LLC

such as theorem-proving and linguistic semantics. In the Novamente AGI architecture, however, probabilistic logic is used for a wider variety of purposes, including simple reinforcement learning of infantile behaviors, which are primarily concerned with perception and action rather than abstract cognition. This paper reports some simple experiments designed to validate the

viability of this approach, via using the PLN probabilistic logic framework, implemented within the Novamente AGI architecture, to carry out reinforcement learning of simple embodied behaviors in a 3D simulation world (AGISim). The specific experiment focused upon involves teaching Novamente to play the game of “fetch” using

reinforcement learning based on repeated partial rewards. Novamente is an integrative AGI architecture involving considerably more than just PLN; however, in this “fetch” experiment, the only cognitive

environment). And, where learning human language is concerned, embodiment gives the opportunity for robust symbol grounding [3]. On the other hand, perhaps the biggest drawback of embodiment from the AGI developer’s perspective is pragmatic in nature. Building, maintaining and using robots

an AI controls in AGISim (sourceforge.net/projects/agisim; crystal.sourceforge.net) is designed to bear sufficient resemblance to a simple humanoid robot to enable the fairly straightforward porting of control routines learned in AGISim to a physical robot. Our

one narrow domain or range of cognitive functions. The NAIE integrates aspects of prior AI projects and approaches, including symbolic, neural-network, evolutionary programming and reinforcement learning. The existing codebase is being applied in bioinformatics, NLP and other domains. To save space, some of the discussion in this paper will assume a

to find it. Human babies less than 9 months of age will often look in location A, but older babies look in B. The NAIE learns through interactive experience to look in location B – it learns that objects exist even when unobserved

role of the dog. Figure 1. Dogs playing fetch, correctly (bottom) and incorrectly (middle). (Illustrations by Zebulon Goertzel). Figure 2. A screenshot of Novamente and its teacher playing fetch in the AGISim simulation world. Here Novamente is returning the ball to the

be made to learn similar behaviour in a richer environment and with a larger set of possible actions. 3. Learning Fetch Within the Novamente Architecture. Novamente is an integrative AGI architecture, in which the highest levels of intelligence are intended to be

the current incomplete version of the integrated Novamente cognition. From a Novamente point of view, it is interesting mostly as a “smoke test” for embodied reinforcement learning, to indicate that the basic mechanisms required for cognitive interaction with AGISim are integrated adequately and working correctly. As noted above, fetch can be learned

an external vision-processing module that supplies either voxel or polygon vision inputs of the sort described above. The object-recognition problem is restricted to the first two cases. We have not dealt with the voxel vision case so far, but have

via much simpler methods than anything in Novamente) but on what it teaches us about the use of probabilistic inference in the context of embodied reinforcement learning. 4. Pattern Mining in Novamente The “pattern mining” step mentioned above has not been discussed in previous publications on Novamente. We mention it briefly here

very simple, but may be made more sophisticated in future. The inputs of pattern mining are, in general: • Atoms denoting raw outputs of the “sensors” that Novamente possesses in the AGISim world • Atoms indicating actions Novamente has taken, e.g. in the

in full-fledged Novamente Node/Link notation,
SequentialAND
  SimultaneousAND
    EvaluationLink holding ball
    EvaluationLink near ListLink (me, teacher)
  Reward
The predicates required here are near() and holding(), as well as the primitive “sensation” of Reward. Given these predicates as primitives, the mining of

PLN backward chainer is to find some way to prove that if some actionable predicates become true, then Evaluation (Reward) becomes true. This inference is possible by assuming that trying out actions is always possible, i.e. the actions are considered to be

y – termination of x) For example, in many cases one may use w(x,y) = k/(diff+k), where k is an adjustable parameter. For more details on temporal links and the event calculus variant we use, see [18]. In order to

. Firstly, the ModusPonensRule is unsurprisingly a probabilistic version of modus ponens, i.e. from (Implication A B) and A, infer B. Modus Ponens can also be applied to PredictiveImplications, insofar as the system keeps track of the structure of the proof tree so as to

and schemata. The learned plan discussed here is a representative one that lends itself relatively well to discussion. Firstly, we define the specific predicates used as primitives for this learning experiment: • Reward – a built-in sensation corresponding to the Novamente agent getting

EvaluationLink goto ball etc. The two stages of the reward function are:
Stage 1:
PredImp
  holding ball
  Reward
Stage 2:
PredImp
  SeqAnd
    holding ball
    done goto teacher
    done drop ball
  Reward
6.2. Knowledge Gained via Pattern Mining. Next, in the course

of this simple task of fetch, PLN can do the learning job quite efficiently with support only from pattern mining, without needing support from more sophisticated pattern recognition tools like MOSES. The final inference trajectory follows. First of all, the inference target

] "try":PredicateNode <0,0> [6552272] ExecutionLink [7505792] "drop":GroundedSchemaNode <0,0> [6564640] "Ball":ConceptNode <0,0> [6559856] 268 A. Heljakka et al. / Probabilistic Logic Based Reinforcement Learning Graphically, the previous two link constructs would be denoted Figure 3. and Figure 4. Respectively. In the rest of this discussion we will often substitute

produced by applying SimpleANDRule to its three child EvaluationLinks. The EvaluationLink [104300720] was produced by applying ModusPonensRule to: Figure 5. which was mined from perception data, and to Figure 6. The SequentialANDLink [104307776] was produced by applying SimpleANDRule to its two child

EvaluationLinks. The EvaluationLink [72926800] was produced by applying RewritingRule to: Figure 7. and Figure 8. The EvaluationLink [72916304], as well as all other try statements, were considered axiomatic, and technically produced by applying CrispUnificationRule to: Figure 9

, and technically produced by applying CrispUnificationRule to: Figure 10. The EvaluationLink [72913264] was produced by applying RewritingRule to: Figure 11. and Figure 12. Returning to the first PredictiveImplicationLink’s children, EvaluationLink [72895584] was produced by applying RewritingRule to: Figure 13. and Figure

14. which were both axiomatic. QED! For illustration, we finally present here an example of a plan the agent formed during the partial reward stage

, 0> [104296656] [6565264] [104296656] was produced by applying ModusPonensRule to: Figure 15. On the other hand, Figure 16. This causes the system to come up with the following plan which works but contains obvious redundancy: ExecutionLink <0,0.00062> [6888032] "goto":GroundedSchemaNode

to deal with a wide variety of learning tasks corresponding to the full range of levels of cognitive development. Making all this “general infrastructure” work together to yield a simple behavior like fetch is a lot of work – the infrastructure doesn’t

Maia and others), and earlier versions of PLN (mainly Guilherme Lamacie, the late Jeff Pressing, and Pei Wang). References: [1] Guha, R. V. and Lenat

The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter

by Joseph Henrich  · 27 Oct 2015  · 631pp  · 177,227 words

Oceania.” Proceedings of the Royal Society B: Biological Sciences 277 (1693): 2559–2564. Klucharev, V., K. Hytonen, M. Rijpkema, A. Smidts, and G. Fernandez. 2009. “Reinforcement learning signal predicts social conformity.” Neuron 61 (1):140–151. Knauft, B. M. 1985. Good Company and Violence: Sorcery and Social Action in a Lowland New

Human Compatible: Artificial Intelligence and the Problem of Control

by Stuart Russell  · 7 Oct 2019  · 416pp  · 112,268 words

money that provide eventual reward rather than immediate reward. One reason we understand the brain’s reward system is that it resembles the method of reinforcement learning developed in AI, for which we have a very solid theory.4 From an evolutionary point of view, we can think of the brain’s

’t design decision procedures that work only for Go. Instead, they made improvements to two fairly general-purpose techniques—lookahead search to make decisions and reinforcement learning to learn how to evaluate positions—so that they were sufficiently effective to play Go at a superhuman level. Those improvements are applicable to many

enormous and the reward comes only at the end of the game, lookahead search won’t work. Instead, AI researchers have developed a method called reinforcement learning, or RL for short. RL algorithms learn from direct experience of reward signals in the environment, much as a baby learns to stand up from

states (or sometimes the value of actions). This estimator can be combined with relatively myopic lookahead search to generate highly competent behavior. The first successful reinforcement learning system was Arthur Samuel’s checkers program, which created a sensation when it was demonstrated on television in 1956. The program learned essentially from scratch
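That combination (a learned value estimator backed up through shallow lookahead) fits in a few lines. A sketch assuming a deterministic one-step model; `model` and `value` are illustrative names, not anything from the book:

```python
# Pick the action whose successor state the learned evaluator likes best.
def greedy_action(state, actions, model, value, gamma=0.99):
    def backed_up(action):
        next_state, reward = model(state, action)  # assumed deterministic model
        return reward + gamma * value(next_state)
    return max(actions, key=backed_up)
```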

the program defeated the world’s top professional Dota 2 team.64 Games such as Go and Dota 2 are a good testing ground for reinforcement learning methods because the reward function comes with the rules of the game. The real world is less convenient, however, and there have been dozens of

search, the vehicle has to find a trajectory that optimizes some combination of safety and progress. Some projects are trying more direct approaches based on reinforcement learning (mainly in simulation, of course) and supervised learning from recordings of hundreds of human drivers, but these approaches seem unlikely to reach the required level

Boston Dynamics for some of the more complex parts of their Atlas humanoid robot. Robot manipulation skills are advancing rapidly, thanks in part to deep reinforcement learning.20 The final push—putting all this together into something that begins to approximate the awesome physical skills of movie robots—is likely to come

’s important to understand that I’m not asking whether we can train a robot to stand up, which can be done simply by applying reinforcement learning with a reward for the robot’s head being farther away from the ground.46 Training a robot to stand up requires that the human
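Taken literally, the reward described here is a one-liner. A sketch; the function name and units are hypothetical:

```python
def standing_reward(head_height_m: float) -> float:
    # The robot is paid simply for keeping its head far from the ground;
    # nothing else is valued.
    return head_height_m
```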

money (or to coerce behavior, if the goal is political control or espionage). The extraction of money works as the perfect reward signal for a reinforcement learning algorithm, so we can expect AI systems to improve rapidly in their ability to identify and profit from misbehavior. Early in 2015, I suggested to

a computer security expert that automated blackmail systems, driven by reinforcement learning, might soon become feasible; he laughed and said it was already happening. The first blackmail bot to be widely publicized was Delilah, identified in July

, methods of control can be direct if a government is able to implement rewards and punishments based on behavior. Such a system treats people as reinforcement learning algorithms, training them to optimize the objective set by the state. The temptation for a government, particularly one with a top-down, engineering mind-set

with you for hours every day, controls your access to information, and provides much of your entertainment through games, TV, movies, and social interaction. The reinforcement learning algorithms that optimize social-media click-through have no capacity to reason about human behavior—in fact, they do not even know in any meaningful

more or less any objective to pursue—including maximizing the number of paperclips or the number of known digits of pi. This is just how reinforcement learning systems and other kinds of reward optimizers work: the algorithms are completely general and accept any reward signal. For engineers and computer scientists operating within

-crawlies and had built a little treadmill for cockroaches to see how their gait changed with speed. We thought it might be possible to use reinforcement learning to train a robotic or simulated insect to reproduce these complex behaviors. The problem we faced was that we didn’t know what reward signal

to use. What were the flies and cockroaches optimizing? Without that information, we couldn’t apply reinforcement learning to train the virtual insect, so we were stuck. One day, I was walking down the road that leads from our house in Berkeley to

and planting them less stiffly because of the unpredictable ground level. As I pondered these mundane observations, I realized we had got it backwards. While reinforcement learning generates behavior from rewards, we actually wanted the opposite: to learn the rewards given the behavior. We already had the behavior, as produced by the

flies and cockroaches; we wanted to know the specific reward signal being optimized by this behavior. In other words, we needed algorithms for inverse reinforcement learning, or IRL.4 (I did not know at the time that a similar problem had been studied under the perhaps less wieldy name of structural
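The inversion can be stated compactly; a sketch of the two problems in standard notation, not Russell’s own:

```latex
\text{RL:}\ \ \text{given } R,\ \text{find } \pi^{\ast} = \arg\max_{\pi} \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} R(s_t, a_t)\right]
\qquad
\text{IRL:}\ \ \text{given demonstrations of } \pi^{\ast},\ \text{find } R \text{ such that } \pi^{\ast} \text{ is optimal}
```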

-circuit normal behavior in favor of direct stimulation of their own reward system is called wireheading. Could something similar happen to machines that are running reinforcement learning algorithms, such as AlphaGo? Initially, one might think this is impossible, because the only way that AlphaGo can gain its +1 reward for winning is

board and nothing else—because there is nothing else in AlphaGo’s model of the world. This setup corresponds to the abstract mathematical model of reinforcement learning, in which the reward signal arrives from outside the universe. Nothing AlphaGo can do, as far as it knows, has any effect on the code

-signal maximizer will wirehead. The AI safety community has discussed wireheading as a possibility for several years.25 The concern is not just that a reinforcement learning system such as AlphaGo might learn to cheat instead of mastering its intended task. The real issue arises when humans are the source of the

reward signal. If we propose that an AI system can be trained to behave well through reinforcement learning, with humans giving feedback signals that define the direction of improvement, the inevitable result is that the AI system works out how to control the

of pointless self-delusion on the part of the AI system, and you’d be right. But it’s a logical consequence of the way reinforcement learning is defined. The process works fine when the reward signal comes from “outside the universe” and is generated by some process that can never be

we avoid this kind of self-delusion? The problem comes from confusing two distinct things: reward signals and actual rewards. In the standard approach to reinforcement learning, these are one and the same. That seems to be a mistake. Instead, they should be treated separately, just as they are in assistance games

that AlphaGo “has the purpose of winning” is an oversimplification. A better description would be that AlphaGo is the result of an imperfect training process—reinforcement learning with self-play—for which winning was the reward. The training process is imperfect in the sense that it cannot produce a perfect Go player

(in which present-day reality turns out to be an illusion produced by a computer simulation) and recent work on the self-delusion problem in reinforcement learning.14 These examples, and more, convince me that the AI community should pay careful attention to the thrusts and counterthrusts of philosophical and economic debates

remained fairly constant, however, so the remaining preferences must arise from cultural and family influences. Quite possibly, children are constantly running some form of inverse reinforcement learning to identify the preferences of parents and peers in order to explain their behavior; children then adopt these preferences as their own. Even as adults

methods. Taken together, these are three of the most important application areas for AI. Deep learning has also played an important role in applications of reinforcement learning—for example, in learning the evaluation function that AlphaGo uses to estimate the desirability of possible future positions, and in learning controllers for complex robotic

may not have this effect, unless it is wrapped within an A/B testing framework (as is common in online marketing settings). Bandit algorithms and reinforcement learning algorithms will have this effect if they operate with an explicit representation of user state or an implicit representation in terms of the history of

face many important moral issues for which we are largely unprepared. 4. The following paper was among the first to make a clear connection between reinforcement learning algorithms and neurophysiological recordings: Wolfram Schultz, Peter Dayan, and P. Read Montague, “A neural substrate of prediction and reward,” Science 275 (1997): 1593–99. 5

for bots,” Wired, January 25, 2019. 49. AlphaZero is described by David Silver et al., “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv:1712.01815 (2017). 50. Optimal paths in graphs are found using the A* algorithm and its many descendants: Peter Hart, Nils Nilsson, and

as absolute rewards; by fixing the value of material to be positive, however, the program generally tended to work towards winning. 61. The application of reinforcement learning to produce a world-class backgammon program: Gerald Tesauro, “Temporal difference learning and TD-Gammon,” Communications of the ACM 38 (1995): 58–68. 62. The

DQN system that learns to play a wide variety of video games using deep RL: Volodymyr Mnih et al., “Human-level control through deep reinforcement learning,” Nature 518 (2015): 529–33. 63. Bill Gates’s remarks on Dota 2 AI: Catherine Clifford, “Bill Gates says gamer bots from Elon Musk-backed

control using generalized advantage estimation,” arXiv:1506.02438 (2015). A video demonstration is available at youtube.com/watch?v=SHLuf2ZBQSw. 47. A description of a reinforcement learning system that learns to play a capture-the-flag video game: Max Jaderberg et al., “Human-level performance in first-person multiplayer games with population

-based deep reinforcement learning,” arXiv:1807.01281 (2018). 48. A view of AI progress over the next few years: Peter Stone et al., “Artificial intelligence and life in 2030

under rational expectations,” Journal of Political Economy 86 (1978): 1009–44. 6. The first algorithms for IRL: Andrew Ng and Stuart Russell, “Algorithms for inverse reinforcement learning,” in Proceedings of the 17th International Conference on Machine Learning, ed. Pat Langley (Morgan Kaufmann, 2000). 7. Better algorithms for inverse RL: Pieter Abbeel and

Andrew Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of the 21st International Conference on Machine Learning, ed. Russ Greiner and Dale Schuurmans (ACM Press, 2004). 8. Understanding inverse RL as Bayesian

updating: Deepak Ramachandran and Eyal Amir, “Bayesian inverse reinforcement learning,” in Proceedings of the 20th International Joint Conference on Artificial Intelligence, ed. Manuela Veloso (AAAI Press, 2007). 9. How to teach helicopters to fly and

of the brain in man,” American Journal of Psychiatry 120 (1963): 571–77. 25. A first mathematical treatment of wireheading, showing how it occurs in reinforcement learning agents: Mark Ring and Laurent Orseau, “Delusion, survival, and intelligent agents,” in Artificial General Intelligence: 4th International Conference, ed. Jürgen Schmidhuber, Kristinn Thórisson, and Moshe

Looks (Springer, 2011). One possible solution to the wireheading problem: Tom Everitt and Marcus Hutter, “Avoiding wireheading with value reinforcement learning,” arXiv:1605.03143 (2016). 26. How it might be possible for an intelligence explosion to occur safely: Benja Fallenstein and Nate Soares, “Vingean reflection: Reliable

–21. 8. A generalization of Harsanyi’s social aggregation theorem to the case of unequal prior beliefs: Andrew Critch, Nishant Desai, and Stuart Russell, “Negotiable reinforcement learning for Pareto optimal sequential decision-making,” in Advances in Neural Information Processing Systems 31, ed. Samy Bengio et al. (2018). 9. The sourcebook for ideal

values; see, for example, Pietro Carrera, Il gioco degli scacchi (Giovanni de Rossi, 1617). 2. A report describing Samuel’s heroic research on an early reinforcement learning algorithm for checkers: Arthur Samuel, “Some studies in machine learning using the game of checkers,” IBM Journal of Research and Development 3 (1959): 210–29

–34 exponential complexity of problems and, 38–39 halting problem and, 37–38 lookahead search, 47, 49–50, 260–61 propositional logic and, 268–70 reinforcement learning, 55–57, 105 subroutines within, 34 supervised learning, 58–59, 285–93 Alibaba, 250 AlphaGo, 6, 46–48, 49–50, 55, 91, 92, 206–7

and, 128–30 decisions affecting people, use of machines in, 126–28 robots built in humanoid form and, 124–26 intractable problems, 38–39 inverse reinforcement learning, 191–93 IQ, 48 Ishiguro, Hiroshi, 125 is-ought problem, 167 “it’s complicated” argument, 147–48 “it’s impossible” argument, 149–50 “it’s

, 86–87, 288–93 as evolutionary accelerator, 18–20 from experience, 285–93 explanation-based learning, 294–95 feature engineering and, 84–85 inverse reinforcement learning, 191–93 reinforcement learning, 17, 47, 55–57, 105, 190–91 supervised learning, 58–59, 285–93 from thinking, 293–95 LeCun, Yann, 47, 165 legal profession, 119

and Persons (Parfit), 225 Recombinant DNA Advisory Committee, 155 recombinant DNA research, 155–56 recursive self-improvement, 208–10 redlining, 128 reflex agents, 57–59 reinforcement learning, 17, 47, 55–57, 105, 190–91 remembering self, and preferences, 238–40 Repugnant Conclusion, 225 reputation systems, 108–9 “research can’t be controlled

Possible Minds: Twenty-Five Ways of Looking at AI

by John Brockman  · 19 Feb 2019  · 339pp  · 94,769 words

Model Thinker: What You Need to Know to Make Data Work for You

by Scott E. Page  · 27 Nov 2018  · 543pp  · 153,550 words

Robot Rules: Regulating Artificial Intelligence

by Jacob Turner  · 29 Oct 2018  · 688pp  · 147,571 words

Rule of the Robots: How Artificial Intelligence Will Transform Everything

by Martin Ford  · 13 Sep 2021  · 288pp  · 86,995 words

The Road to Conscious Machines

by Michael Wooldridge  · 2 Nov 2018  · 346pp  · 97,890 words

The Brain That Changes Itself: Stories of Personal Triumph From the Frontiers of Brain Science

by Norman Doidge  · 15 Mar 2007  · 515pp  · 136,938 words

Artificial Intelligence: A Guide for Thinking Humans

by Melanie Mitchell  · 14 Oct 2019  · 350pp  · 98,077 words

Four Battlegrounds

by Paul Scharre  · 18 Jan 2023

The Price of Tomorrow: Why Deflation Is the Key to an Abundant Future

by Jeff Booth  · 14 Jan 2020  · 180pp  · 55,805 words

High-Frequency Trading

by David Easley, Marcos López de Prado and Maureen O'Hara  · 28 Sep 2013

Cognitive Gadgets: The Cultural Evolution of Thinking

by Cecilia Heyes  · 15 Apr 2018

The Unwritten Rules of Social Relationships: Decoding Social Mysteries Through the Unique Perspectives of Autism

by Temple Grandin and Sean Barron  · 30 Sep 2012  · 347pp  · 123,884 words

I, Warbot: The Dawn of Artificially Intelligent Conflict

by Kenneth Payne  · 16 Jun 2021  · 339pp  · 92,785 words

The Coming Wave: Technology, Power, and the Twenty-First Century's Greatest Dilemma

by Mustafa Suleyman  · 4 Sep 2023  · 444pp  · 117,770 words

Framers: Human Advantage in an Age of Technology and Turmoil

by Kenneth Cukier, Viktor Mayer-Schönberger and Francis de Véricourt  · 10 May 2021  · 291pp  · 80,068 words

The Singularity Is Nearer: When We Merge with AI

by Ray Kurzweil  · 25 Jun 2024

On the Edge: The Art of Risking Everything

by Nate Silver  · 12 Aug 2024  · 848pp  · 227,015 words

Psychopathy: An Introduction to Biological Findings and Their Implications

by Andrea L. Glenn and Adrian Raine  · 7 Mar 2014

The Linguist: A Personal Guide to Language Learning

by Steve Kaufmann  · 15 Jan 2003

Machine, Platform, Crowd: Harnessing Our Digital Future

by Andrew McAfee and Erik Brynjolfsson  · 26 Jun 2017  · 472pp  · 117,093 words

Evil Genes: Why Rome Fell, Hitler Rose, Enron Failed, and My Sister Stole My Mother's Boyfriend

by Barbara Oakley, PhD  · 20 Oct 2008

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

by Aurélien Géron  · 13 Mar 2017  · 1,331pp  · 163,200 words

The Age of AI: And Our Human Future

by Henry A Kissinger, Eric Schmidt and Daniel Huttenlocher  · 2 Nov 2021  · 194pp  · 57,434 words

The Creativity Code: How AI Is Learning to Write, Paint and Think

by Marcus Du Sautoy  · 7 Mar 2019  · 337pp  · 103,522 words

Escape From Model Land: How Mathematical Models Can Lead Us Astray and What We Can Do About It

by Erica Thompson  · 6 Dec 2022  · 250pp  · 79,360 words

Virtual Competition

by Ariel Ezrachi and Maurice E. Stucke  · 30 Nov 2016

Networks, Crowds, and Markets: Reasoning About a Highly Connected World

by David Easley and Jon Kleinberg  · 15 Nov 2010  · 1,535pp  · 337,071 words

What Algorithms Want: Imagination in the Age of Computing

by Ed Finn  · 10 Mar 2017  · 285pp  · 86,853 words

Succeeding With AI: How to Make AI Work for Your Business

by Veljko Krunic  · 29 Mar 2020

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

by Pedro Domingos  · 21 Sep 2015  · 396pp  · 117,149 words

Radical Technologies: The Design of Everyday Life

by Adam Greenfield  · 29 May 2017  · 410pp  · 119,823 words

Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

by Aurelien Geron  · 14 Aug 2019

The Smartphone Society

by Nicole Aschoff

Like, Comment, Subscribe: Inside YouTube's Chaotic Rise to World Domination

by Mark Bergen  · 5 Sep 2022  · 642pp  · 141,888 words

Demystifying Smart Cities

by Anders Lisdorf

Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity

by Daron Acemoglu and Simon Johnson  · 15 May 2023  · 619pp  · 177,548 words

Superintelligence: Paths, Dangers, Strategies

by Nick Bostrom  · 3 Jun 2014  · 574pp  · 164,509 words

Prediction Machines: The Simple Economics of Artificial Intelligence

by Ajay Agrawal, Joshua Gans and Avi Goldfarb  · 16 Apr 2018  · 345pp  · 75,660 words

AI Superpowers: China, Silicon Valley, and the New World Order

by Kai-Fu Lee  · 14 Sep 2018  · 307pp  · 88,180 words

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again

by Eric Topol  · 1 Jan 2019  · 424pp  · 114,905 words

Rationality: From AI to Zombies

by Eliezer Yudkowsky  · 11 Mar 2015  · 1,737pp  · 491,616 words

Darwin's Dangerous Idea: Evolution and the Meanings of Life

by Daniel C. Dennett  · 15 Jan 1995  · 846pp  · 232,630 words

The New Harvest: Agricultural Innovation in Africa

by Calestous Juma  · 27 May 2017

Artificial Whiteness

by Yarden Katz

Genius Makers: The Mavericks Who Brought A. I. To Google, Facebook, and the World

by Cade Metz  · 15 Mar 2021  · 414pp  · 109,622 words

Narrative Economics: How Stories Go Viral and Drive Major Economic Events

by Robert J. Shiller  · 14 Oct 2019  · 611pp  · 130,419 words

Seeking SRE: Conversations About Running Production Systems at Scale

by David N. Blank-Edelman  · 16 Sep 2018

A Hacker's Mind: How the Powerful Bend Society's Rules, and How to Bend Them Back

by Bruce Schneier  · 7 Feb 2023  · 306pp  · 82,909 words

Outnumbered: From Facebook and Google to Fake News and Filter-Bubbles – the Algorithms That Control Our Lives

by David Sumpter  · 18 Jun 2018  · 276pp  · 81,153 words

Cogs and Monsters: What Economics Is, and What It Should Be

by Diane Coyle  · 11 Oct 2021  · 305pp  · 75,697 words

The Transhumanist Reader

by Max More and Natasha Vita-More  · 4 Mar 2013  · 798pp  · 240,182 words

Mindware: Tools for Smart Thinking

by Richard E. Nisbett  · 17 Aug 2015  · 397pp  · 109,631 words

Army of None: Autonomous Weapons and the Future of War

by Paul Scharre  · 23 Apr 2018  · 590pp  · 152,595 words

Behave: The Biology of Humans at Our Best and Worst

by Robert M. Sapolsky  · 1 May 2017  · 1,261pp  · 294,715 words

The Autonomous Revolution: Reclaiming the Future We’ve Sold to Machines

by William Davidow and Michael Malone  · 18 Feb 2020  · 304pp  · 80,143 words

The Loop: How Technology Is Creating a World Without Choices and How to Fight Back

by Jacob Ward  · 25 Jan 2022  · 292pp  · 94,660 words

Empire of AI: Dreams and Nightmares in Sam Altman's OpenAI

by Karen Hao  · 19 May 2025  · 660pp  · 179,531 words

Handbook of Modeling High-Frequency Data in Finance

by Frederi G. Viens, Maria C. Mariani and Ionut Florescu  · 20 Dec 2011  · 443pp  · 51,804 words

WTF?: What's the Future and Why It's Up to Us

by Tim O'Reilly  · 9 Oct 2017  · 561pp  · 157,589 words

The Optimist: Sam Altman, OpenAI, and the Race to Invent the Future

by Keach Hagey  · 19 May 2025  · 439pp  · 125,379 words

Making Sense of Chaos: A Better Economics for a Better World

by J. Doyne Farmer  · 24 Apr 2024  · 406pp  · 114,438 words

Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data

by Dipanjan Sarkar  · 1 Dec 2016

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence

by John Brockman  · 5 Oct 2015  · 481pp  · 125,946 words

The Organized Mind: Thinking Straight in the Age of Information Overload

by Daniel J. Levitin  · 18 Aug 2014  · 685pp  · 203,949 words

Super Thinking: The Big Book of Mental Models

by Gabriel Weinberg and Lauren McCann  · 17 Jun 2019

These Strange New Minds: How AI Learned to Talk and What It Means

by Christopher Summerfield  · 11 Mar 2025  · 412pp  · 122,298 words

The Precipice: Existential Risk and the Future of Humanity

by Toby Ord  · 24 Mar 2020  · 513pp  · 152,381 words

Human + Machine: Reimagining Work in the Age of AI

by Paul R. Daugherty and H. James Wilson  · 15 Jan 2018  · 523pp  · 61,179 words

21 Lessons for the 21st Century

by Yuval Noah Harari  · 29 Aug 2018  · 389pp  · 119,487 words

Range: Why Generalists Triumph in a Specialized World

by David Epstein  · 1 Mar 2019  · 406pp  · 109,794 words

Global Catastrophic Risks

by Nick Bostrom and Milan M. Cirkovic  · 2 Jul 2008

A World Without Work: Technology, Automation, and How We Should Respond

by Daniel Susskind  · 14 Jan 2020  · 419pp  · 109,241 words

Know Thyself

by Stephen M Fleming  · 27 Apr 2021

Whiplash: How to Survive Our Faster Future

by Joi Ito and Jeff Howe  · 6 Dec 2016  · 254pp  · 76,064 words

If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All

by Eliezer Yudkowsky and Nate Soares  · 15 Sep 2025  · 215pp  · 64,699 words

Supremacy: AI, ChatGPT, and the Race That Will Change the World

by Parmy Olson  · 284pp  · 96,087 words

The Long History of the Future: Why Tomorrow's Technology Still Isn't Here

by Nicole Kobie  · 3 Jul 2024  · 348pp  · 119,358 words

The Rationalist's Guide to the Galaxy: Superintelligent AI and the Geeks Who Are Trying to Save Humanity's Future

by Tom Chivers  · 12 Jun 2019  · 289pp  · 92,714 words

Data Mining: Concepts, Models, Methods, and Algorithms

by Mehmed Kantardzić  · 2 Jan 2003  · 721pp  · 197,134 words

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity

by Amy Webb  · 5 Mar 2019  · 340pp  · 97,723 words

Science in the Soul: Selected Writings of a Passionate Rationalist

by Richard Dawkins  · 15 Mar 2017  · 420pp  · 130,714 words

Luxury Fever: Why Money Fails to Satisfy in an Era of Excess

by Robert H. Frank  · 15 Jan 1999  · 416pp  · 112,159 words

Scary Smart: The Future of Artificial Intelligence and How You Can Save Our World

by Mo Gawdat  · 29 Sep 2021  · 259pp  · 84,261 words

Co-Intelligence: Living and Working With AI

by Ethan Mollick  · 2 Apr 2024  · 189pp  · 58,076 words

Nexus: A Brief History of Information Networks From the Stone Age to AI

by Yuval Noah Harari  · 9 Sep 2024  · 566pp  · 169,013 words

Futureproof: 9 Rules for Humans in the Age of Automation

by Kevin Roose  · 9 Mar 2021  · 208pp  · 57,602 words

Top Dog: The Science of Winning and Losing

by Po Bronson and Ashley Merryman  · 19 Feb 2013  · 407pp  · 109,653 words

Misbehaving: The Making of Behavioral Economics

by Richard H. Thaler  · 10 May 2015  · 500pp  · 145,005 words

Autonomous Driving: How the Driverless Revolution Will Change the World

by Andreas Herrmann, Walter Brenner and Rupert Stadler  · 25 Mar 2018

The Revolution That Wasn't: GameStop, Reddit, and the Fleecing of Small Investors

by Spencer Jakab  · 1 Feb 2022  · 420pp  · 94,064 words

Amateurs!: How We Built Internet Culture and Why It Matters

by Joanna Walsh  · 22 Sep 2025  · 255pp  · 80,203 words

The Economic Singularity: Artificial Intelligence and the Death of Capitalism

by Calum Chace  · 17 Jul 2016  · 477pp  · 75,408 words

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future

by Kevin Kelly  · 6 Jun 2016  · 371pp  · 108,317 words

Split-Second Persuasion: The Ancient Art and New Science of Changing Minds

by Kevin Dutton  · 3 Feb 2011  · 338pp  · 100,477 words

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future

by Luke Dormehl  · 10 Aug 2016  · 252pp  · 74,167 words

Artificial Unintelligence: How Computers Misunderstand the World

by Meredith Broussard  · 19 Apr 2018  · 245pp  · 83,272 words

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy

by Federico Pistono  · 14 Oct 2012  · 245pp  · 64,288 words

Why Machines Learn: The Elegant Math Behind Modern AI

by Anil Ananthaswamy  · 15 Jul 2024  · 416pp  · 118,522 words

Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy

by George Gilder  · 16 Jul 2018  · 332pp  · 93,672 words

The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do

by Erik J. Larson  · 5 Apr 2021

Our Final Invention: Artificial Intelligence and the End of the Human Era

by James Barrat  · 30 Sep 2013  · 294pp  · 81,292 words

Future Politics: Living Together in a World Transformed by Tech

by Jamie Susskind  · 3 Sep 2018  · 533pp

Applied Artificial Intelligence: A Handbook for Business Leaders

by Mariya Yao, Adelyn Zhou and Marlene Jia  · 1 Jun 2018  · 161pp  · 39,526 words

Sunfall

by Jim Al-Khalili  · 17 Apr 2019  · 381pp  · 120,361 words

The Geeks Shall Inherit the Earth: Popularity, Quirk Theory, and Why Outsiders Thrive After High School

by Alexandra Robbins  · 31 Mar 2009  · 509pp  · 147,998 words

Homo Deus: A Brief History of Tomorrow

by Yuval Noah Harari  · 1 Mar 2015  · 479pp  · 144,453 words

The Ages of Globalization

by Jeffrey D. Sachs  · 2 Jun 2020

The Science and Technology of Growing Young: An Insider's Guide to the Breakthroughs That Will Dramatically Extend Our Lifespan . . . And What You Can Do Right Now

by Sergey Young  · 23 Aug 2021  · 326pp  · 88,968 words

The Thinking Machine: Jensen Huang, Nvidia, and the World's Most Coveted Microchip

by Stephen Witt  · 8 Apr 2025  · 260pp  · 82,629 words

The AI Economy: Work, Wealth and Welfare in the Robot Age

by Roger Bootle  · 4 Sep 2019  · 374pp  · 111,284 words

Human Frontiers: The Future of Big Ideas in an Age of Small Thinking

by Michael Bhaskar  · 2 Nov 2021

The Power of Habit: Why We Do What We Do in Life and Business

by Charles Duhigg  · 1 Jan 2011  · 455pp  · 116,578 words

Learn Descriptive Cataloging Second North American Edition

by Mary Mortimer  · 1 Jan 1999  · 282pp  · 28,394 words

The Extended Phenotype: The Long Reach of the Gene

by Richard Dawkins  · 1 Jan 1982  · 506pp  · 152,049 words

Learn Algorithmic Trading

by Sebastien Donadio  · 7 Nov 2019

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think

by James Vlahos  · 1 Mar 2019  · 392pp  · 108,745 words

The Future Is Faster Than You Think: How Converging Technologies Are Transforming Business, Industries, and Our Lives

by Peter H. Diamandis and Steven Kotler  · 28 Jan 2020  · 501pp  · 114,888 words

The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling

by Adam Kucharski  · 23 Feb 2016  · 360pp  · 85,321 words

Hooked: Food, Free Will, and How the Food Giants Exploit Our Addictions

by Michael Moss  · 2 Mar 2021  · 300pp  · 94,628 words

The Future of the Brain: Essays by the World's Leading Neuroscientists

by Gary Marcus and Jeremy Freeman  · 1 Nov 2014  · 336pp  · 93,672 words

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass

by Mary L. Gray and Siddharth Suri  · 6 May 2019  · 346pp  · 97,330 words

Learning Scikit-Learn: Machine Learning in Python

by Raúl Garreta and Guillermo Moncecchi  · 14 Sep 2013  · 122pp  · 29,286 words

Mastering Machine Learning With Scikit-Learn

by Gavin Hackeling  · 31 Oct 2014

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps

by Valliappa Lakshmanan, Sara Robinson and Michael Munn  · 31 Oct 2020

Algorithms to Live By: The Computer Science of Human Decisions

by Brian Christian and Tom Griffiths  · 4 Apr 2016  · 523pp  · 143,139 words

Bandit Algorithms for Website Optimization

by John Myles White  · 10 Dec 2012  · 94pp  · 22,435 words

Being You: A New Science of Consciousness

by Anil Seth  · 29 Aug 2021  · 418pp  · 102,597 words

This Is for Everyone: The Captivating Memoir From the Inventor of the World Wide Web

by Tim Berners-Lee  · 8 Sep 2025  · 347pp  · 100,038 words

Industry 4.0: The Industrial Internet of Things

by Alasdair Gilchrist  · 27 Jun 2016

Surviving AI: The Promise and Peril of Artificial Intelligence

by Calum Chace  · 28 Jul 2015  · 144pp  · 43,356 words

How to Predict the Unpredictable

by William Poundstone  · 267pp  · 71,941 words

The Simulation Hypothesis

by Rizwan Virk  · 31 Mar 2019  · 315pp  · 89,861 words

The Complete Book of Home Organization: 336 Tips and Projects

by Abowlfulloflemons.com and Toni Hammersley  · 5 Jan 2016  · 278pp  · 42,509 words