Dario Amodei


description: CEO and co-founder of Anthropic; Ph.D., Princeton University, 2011

10 results

pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values
by Brian Christian
Published 5 Oct 2020

Christiano, Paul F., Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. “Deep Reinforcement Learning from Human Preferences.” In Advances in Neural Information Processing Systems, 4299–4307, 2017. Christiano, Paul, Buck Shlegeris, and Dario Amodei. “Supervising Strong Learners by Amplifying Weak Experts.” arXiv Preprint arXiv:1810.08575, 2018. Clabaugh, Hinton G. “Foreword.” In The Workings of the Indeterminate-Sentence Law and the Parole System in Illinois, edited by Andrew A. Bruce, Ernest W. Burgess, Albert J. Harno, and John Landesco. Springfield, IL: Illinois State Board of Parole, 1928. Clark, Jack, and Dario Amodei. “Faulty Reward Functions in the Wild.”

When US Supreme Court Chief Justice John Roberts visits Rensselaer Polytechnic Institute later that year, he’s asked by university president Shirley Ann Jackson, “Can you foresee a day when smart machines—driven with artificial intelligences—will assist with courtroom fact-finding or, more controversially, even judicial decision-making?” “It’s a day that’s here,” he says.8 That same fall, Dario Amodei is in Barcelona to attend the Neural Information Processing Systems conference (“NeurIPS,” for short): the biggest annual event in the AI community, having ballooned from several hundred attendees in the 2000s to more than thirteen thousand today. (The organizers note that if the conference continues to grow at the pace of the last ten years, by the year 2035 the entire human population will be in attendance.)9 But at this particular moment, Amodei’s mind isn’t on “scan order in Gibbs sampling,” or “regularizing Rademacher observation losses,” or “minimizing regret on reflexive Banach spaces,” or, for that matter, on Tolga Bolukbasi’s spotlight presentation, some rooms away, about gender bias in word2vec.10 He’s staring at a boat, and the boat is on fire.

“At the time, I was thinking about value alignment,” says Leike, “and how we could do that. It seemed like a lot of the problem would have to do with ‘How do you learn the reward function?’ And so I reached out to Paul and Dario, because I knew they were thinking about similar things.” Paul Christiano and Dario Amodei, halfway around the world at OpenAI in San Francisco, were interested. More than interested, in fact. Christiano had just joined, and was looking for a juicy first project. He started to settle on the idea of reinforcement learning under more minimal supervision—not constant updates about the score fifteen times a second but something more periodic, like a supervisor checking in.
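The idea sketched in that passage, learning a reward function from occasional human comparisons rather than a constant score, became the 2017 Christiano–Leike–Amodei paper cited above. Purely as an illustrative aside, a minimal sketch of the core mechanism (a reward model fit to pairwise human preferences over trajectory segments with a logistic, Bradley–Terry-style loss) might look like the following; the network shape, toy data, and every name here are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of preference-based reward learning in the spirit of
# Christiano et al., "Deep Reinforcement Learning from Human Preferences" (2017).
# Toy environment, dimensions, and names are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps an observation-action pair to a scalar reward estimate."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def segment_return(model: RewardModel, segment) -> torch.Tensor:
    """Sum of predicted rewards over one trajectory segment."""
    obs, act = segment
    return model(obs, act).sum()

def preference_loss(model, seg_a, seg_b, label: float) -> torch.Tensor:
    """Bradley-Terry / logistic loss: label = 1.0 if the human preferred
    segment A, 0.0 if they preferred segment B."""
    logit = segment_return(model, seg_a) - segment_return(model, seg_b)
    return nn.functional.binary_cross_entropy_with_logits(
        logit, torch.tensor(label))

# Toy usage: two random "trajectory segments" and one synthetic preference.
obs_dim, act_dim, seg_len = 4, 2, 25
model = RewardModel(obs_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

seg_a = (torch.randn(seg_len, obs_dim), torch.randn(seg_len, act_dim))
seg_b = (torch.randn(seg_len, obs_dim), torch.randn(seg_len, act_dim))

loss = preference_loss(model, seg_a, seg_b, label=1.0)  # human preferred A
opt.zero_grad()
loss.backward()
opt.step()
# The learned reward model would then stand in for the environment's score
# when training the policy with an ordinary RL algorithm.
```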

Four Battlegrounds
by Paul Scharre
Published 18 Jan 2023

25 inference can increasingly be done on edge devices: Gaurav Batra et al., “Artificial-Intelligence Hardware: New Opportunities for Semiconductor Companies,” McKinsey & Company, January 2, 2019, https://www.mckinsey.com/industries/semiconductors/our-insights/artificial-intelligence-hardware-new-opportunities-for-semiconductor-companies. 25 deep learning training in data centers: Batra et al., “Artificial-Intelligence Hardware,” Exhibit 6. 26 doubling of computing power every six months: Jaime Sevilla et al., Compute Trends Across Three Eras of Machine Learning (arXiv.org, March 9, 2022), https://arxiv.org/pdf/2202.05924.pdf. Other researchers have come up with somewhat different rates of progress in the deep learning era; see Dario Amodei and Danny Hernandez, “AI and Compute,” openai.com, May 16, 2018, https://openai.com/blog/ai-and-compute/. 26 Moore’s law: Gordon E. Moore, “Cramming More Components onto Integrated Circuits,” Electronics 38, no. 8 (April 19, 1965), https://newsroom.intel.com/wp-content/uploads/sites/11/2018/05/moores-law-electronics.pdf. 26 “thousands of GPUs over multiple months”: OpenAI et al., Dota 2 with Large Scale Deep Reinforcement Learning (arXiv.org, December 13, 2019), 2, https://arxiv.org/pdf/1912.06680.pdf. 26 equivalent to a human playing for 45,000 years: OpenAI, “OpenAI Five Defeats Dota 2 World Champions,” OpenAI blog, April 15, 2019, https://openai.com/blog/openai-five-defeats-dota-2-world-champions/. 26 13,000 years of simulated computer time: Ilge Akkaya et al., Solving Rubik’s Cube With a Robot Hand (arXiv.org, October 17, 2019), https://arxiv.org/pdf/1910.07113.pdf. 26 spending millions on compute: Ryan Carey, “Interpreting AI Compute Trends,” AI Impacts, n.d., https://aiimpacts.org/interpreting-ai-compute-trends/; Dan H., “How Much Did AlphaGo Zero Cost?”

(“Trump Discusses China, ‘Political Fairness’ with Google CEO,” Reuters, March 27, 2019, https://www.reuters.com/article/us-usa-trump-google/trump-discusses-china-political-fairness-with-google-ceo-idUSKCN1R82CB.) 63 “AI was largely hype”: Liz O’Sullivan, interview by author, February 12, 2020. 64 Machine learning systems in particular can fail: Ram Shankar Siva Kumar et al., “Failure Modes in Machine Learning,” Microsoft Docs, November 11, 2019, https://docs.microsoft.com/en-us/security/engineering/failure-modes-in-machine-learning; Dario Amodei et al., Concrete Problems in AI Safety (arXiv.org, July 25, 2016), https://arxiv.org/pdf/1606.06565.pdf. 64 perform poorly on people of a different gender, race, or ethnicity: Joy Buolamwini and Timnit Gebru, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” Proceedings of Machine Learning Research 81 (2018), 1–15, https://dam-prod.media.mit.edu/x/2018/02/06/Gender%20Shades%20Intersectional%20Accuracy%20Disparities.pdf. 64 Google Photos image recognition algorithm: Tom Simonite, “When It Comes to Gorillas, Google Photos Remains Blind,” Wired, January 11, 2018, https://www.wired.com/story/when-it-comes-to-gorillas-google-photos-remains-blind/; Alistair Barr, “Google Mistakenly Tags Black People as ‘Gorillas,’ Showing Limits of Algorithms,” Wall Street Journal, July 1, 2015, https://blogs.wsj.com/digits/2015/07/01/google-mistakenly-tags-black-people-as-gorillas-showing-limits-of-algorithms/. 64 insufficient representation of darker faces: Barr, “Google Mistakenly Tags Black People as ‘Gorillas.’” 64 distributional shift in the data: Rohan Taori, Measuring Robustness to Natural Distribution Shifts in Image Classification (arXiv.org, September 14, 2020), https://arxiv.org/pdf/2007.00644.pdf. 64 problems are common in image classification systems: Maggie Zhang, “Google Photos Tags Two African-Americans as Gorillas Through Facial Recognition Software,” Forbes, July 1, 2015, https://www.forbes.com/sites/mzhang/2015/07/01/google-photos-tags-two-african-americans-as-gorillas-through-facial-recognition-software/#60111f6713d8. 65 several fatal accidents: Rob Stumpf, “Tesla on Autopilot Crashes into Parked California Police Cruiser,” The Drive, May 30, 2018, https://www.thedrive.com/news/21172/tesla-on-autopilot-crashes-into-parked-california-police-cruiser; Rob Stumpf, “Autopilot Blamed for Tesla’s Crash Into Overturned Truck,” The Drive, June 1, 2020, https://www.thedrive.com/news/33789/autopilot-blamed-for-teslas-crash-into-overturned-truck; James Gilboy, “Officials Find Cause of Tesla Autopilot Crash Into Fire Truck: Report,” The Drive, May 17, 2018, https://www.thedrive.com/news/20912/cause-of-tesla-autopilot-crash-into-fire-truck-cause-determined-report; Phil McCausland, “Self-Driving Uber Car That Hit and Killed Woman Did Not Recognize That Pedestrians Jaywalk,” NBC News, November 9, 2019, https://www.nbcnews.com/tech/tech-news/self-driving-uber-car-hit-killed-woman-did-not-recognize-n1079281; National Transportation Safety Board, “Collision Between a Sport Utility Vehicle Operating With Partial Driving Automation and a Crash Attenuator” (presented at public meeting, February 25, 2020), https://www.ntsb.gov/news/events/Documents/2020-HWY18FH011-BMG-abstract.pdf; Aaron Brown, “Tesla Autopilot Crash Victim Joshua Brown Was an Electric Car Buff and a Navy SEAL,” The Drive, July 1, 2016, https://www.thedrive.com/news/4249/tesla-autopilot-crash-victim-joshua-brown-was-an-electric-car-buff-and-a-navy-seal. 65 drone footage from a different region: Marcus Weisgerber, “The Pentagon’s New Artificial Intelligence Is Already Hunting Terrorists,” Defense One, December 21, 2017, https://www.defenseone.com/technology/2017/12/pentagons-new-artificial-intelligence-already-hunting-terrorists/144742/. 65 Tesla has come under fire: Andrew J.

Ortega, Vishal Maini, and the DeepMind safety team, “Building Safe Artificial Intelligence: Specification, Robustness, and Assurance,” Medium, September 27, 2018, https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1; Ram Shankar Siva Kumar et al., “Failure Modes in Machine Learning,” Microsoft Docs, November 11, 2019, https://docs.microsoft.com/en-us/security/engineering/failure-modes-in-machine-learning#unintended-failures-summary; Dario Amodei et al., Concrete Problems in AI Safety (arXiv.org, July 25, 2016), https://arxiv.org/pdf/1606.06565.pdf. 230 AlphaGo reportedly could not play well: James Hendler and Alice M. Mulvehill, Social Machines: The Coming Collision of Artificial Intelligence, Social Networking, and Humanity (New York: Apress, 2016), 57. 230 Failures in real-world applications: Sean Mcgregor, “When AI Systems Fail: Introducing the AI Incident Database,” Partnership on AI Blog, November 18, 2020, https://www.partnershiponai.org/aiincidentdatabase/. 230 multiple fatalities: Jim Puzzanghera, “Driver in Tesla Crash Relied Excessively on Autopilot, but Tesla Shares Some Blame, Federal Panel Finds,” Los Angeles Times, September 12, 2017, http://www.latimes.com/business/la-fi-hy-tesla-autopilot-20170912-story.html; “Driver Errors, Overreliance on Automation, Lack of Safeguards, Led to Fatal Tesla Crash,” National Transportation Safety Board Office of Public Affairs, press release, September 12, 2017, https://www.ntsb.gov/news/press-releases/Pages/PR20170912.aspx; “Collision Between a Car Operating with Automated Vehicle Control Systems and a Tractor-Semitrailer Truck Near Williston, Florida,” NTSB/HAR-17/02/PB2017-102600 (National Transportation Safety Board, May 7, 2016), https://www.ntsb.gov/news/events/Documents/2017-HWY16FH018-BMG-abstract.pdf; James Gilboy, “Officials Find Cause of Tesla Autopilot Crash Into Fire Truck: Report,” The Drive, May 17, 2018, http://www.thedrive.com/news/20912/cause-of-tesla-autopilot-crash-into-fire-truck-cause-determined-report; “Tesla Hit Parked Police Car ‘While Using Autopilot,’” BBC, May 30, 2018, https://www.bbc.com/news/technology-44300952; and Raphael Orlove, “This Test Shows Why Tesla Autopilot Crashes Keep Happening,” Jalopnik, June 13, 2018, https://jalopnik.com/this-test-shows-why-tesla-autopilot-crashes-keep-happen-1826810902. 231 “dominate their local battle spaces”: Phil Root, interview, February 6, 2020. 232 machine learning was “alchemy”: Ali Rahimi and Ben Recht, “Reflections on Random Kitchen Sinks,” arg min blog, December 5, 2017, http://www.argmin.net/2017/12/05/kitchen-sinks/. 232 fatal crashes of two 737 MAX airliners: Jon Ostrower, “What Is the Boeing 737 Max Maneuvering Characteristics Augmentation System?”

The Singularity Is Nearer: When We Merge with AI
by Ray Kurzweil
Published 25 Jun 2024

Geoffrey Irving and Dario Amodei, “AI Safety via Debate,” OpenAI, May 3, 2018, https://openai.com/blog/debate. For an insightful sequence of posts explaining iterated amplification, written by the concept’s primary originator, see Paul Christiano, “Iterated Amplification,” AI Alignment Forum, October 29, 2018, https://www.alignmentforum.org/s/EmDuGeRw749sD3GKd. For more details on the technical challenges of AI safety, see Dario Amodei et al., “Concrete Problems in AI Safety,” arXiv:1606.06565v2 [cs.AI], July 25, 2016, https://arxiv.org/pdf/1606.06565.pdf.

Markus Anderljung et al., “Compute Funds and Pre-Trained Models,” Centre for the Governance of AI, April 11, 2022, https://www.governance.ai/post/compute-funds-and-pre-trained-models; Jaime Sevilla et al., “Compute Trends Across Three Eras of Machine Learning,” arXiv:2202.05924v2 [cs.LG], March 9, 2022, https://arxiv.org/pdf/2202.05924.pdf; Dario Amodei and Danny Hernandez, “AI and Compute,” OpenAI, May 16, 2018, https://openai.com/blog/ai-and-compute. Jacob Stern, “GPT-4 Has the Memory of a Goldfish,” Atlantic, March 17, 2023, https://www.theatlantic.com/technology/archive/2023/03/gpt-4-has-memory-context-window/673426.

Lungren, “CheXNet and Beyond”; Rajpurkar et al., “CheXNet: Radiologist-Level Pneumonia Detection”; Irvin et al., “CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison,” AAAI-19, IAAI-19, EAAI-20; Thomas Davenport and Ravi Kalakota, “The Potential for Artificial Intelligence in Healthcare,” Future Healthcare Journal 6, no. 2 (June 2019): 94–98, https://doi.org/10.7861/futurehosp.6-2-94. Dario Amodei and Danny Hernandez, “AI and Compute,” OpenAI, May 16, 2018, https://openai.com/blog/ai-and-compute. Eliza Strickland, “Autonomous Robot Surgeon Bests Humans in World First,” IEEE Spectrum, May 4, 2016, https://spectrum.ieee.org/the-human-os/robotics/medical-robots/autonomous-robot-surgeon-bests-human-surgeons-in-world-first.

pages: 288 words: 86,995

Rule of the Robots: How Artificial Intelligence Will Transform Everything
by Martin Ford
Published 13 Sep 2021

Matt Schiavenza, “China’s ‘Sputnik Moment’ and the Sino-American battle for AI supremacy,” Asia Society Blog, September 25, 2018, asiasociety.org/blog/asia/chinas-sputnik-moment-and-sino-american-battle-ai-supremacy. 9. John Markoff, “Scientists see promise in deep-learning programs,” New York Times, November 23, 2012, www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html. 10. Dario Amodei and Danny Hernandez, “AI and Compute,” OpenAI Blog, May 16, 2018, openai.com/blog/ai-and-compute/. 11. Will Knight, “Facebook’s head of AI says the field will soon ‘hit the wall,’” Wired, December 4, 2019, www.wired.com/story/facebooks-ai-says-field-hit-wall/. 12. Kim Martineau, “Shrinking deep learning’s carbon footprint,” MIT News, August 7, 2020, news.mit.edu/2020/shrinking-deep-learning-carbon-footprint-0807. 13.

Sean Levinson, “A Google executive is taking 100 pills a day so he can live forever,” Elite Daily, April 15, 2015, www.elitedaily.com/news/world/google-executive-taking-pills-live-forever/1001270. 34. Ford, Interview with Ray Kurzweil, in Architects of Intelligence, pp. 240–241. 35. Ibid., p. 230. 36. Ibid., p. 233. 37. Alec Radford, Jeffrey Wu, Dario Amodei et al., “Better language models and their implications,” OpenAI Blog, February 14, 2019, openai.com/blog/better-language-models/. 38. James Vincent, “OpenAI’s latest breakthrough is astonishingly powerful, but still fighting its flaws,” The Verge, July 30, 2020, www.theverge.com/21346343/gpt-3-explainer-openai-examples-errors-agi-potential. 39.

pages: 194 words: 57,434

The Age of AI: And Our Human Future
by Henry A Kissinger , Eric Schmidt and Daniel Huttenlocher
Published 2 Nov 2021

Mustafa Suleyman, Jack Clark, Craig Mundie, and Maithra Raghu provided indispensable feedback on the entire manuscript, informed by their experiences as innovators, researchers, developers, and educators. Robert Work and Yll Bajraktari of the National Security Commission on Artificial Intelligence (NSCAI) commented on drafts of the security chapter with their characteristic commitment to the responsible defense of the national interest. Demis Hassabis, Dario Amodei, James J. Collins, and Regina Barzilay explained their work—and its profound implications—to us. Eric Lander, Sam Altman, Reid Hoffman, Jonathan Rosenberg, Samantha Power, Jared Cohen, James Manyika, Fareed Zakaria, Jason Bent, and Michelle Ritter provided additional feedback that made the manuscript more accurate and, we hope, more relevant to readers.

pages: 306 words: 82,909

A Hacker's Mind: How the Powerful Bend Society's Rules, and How to Bend Them Back
by Bruce Schneier
Published 7 Feb 2023

Victoria Krakovna (2 Apr 2018), “Specification gaming examples in AI,” https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai. 231 if it kicked the ball out of bounds: Karol Kurach et al. (25 Jul 2019), “Google research football: A novel reinforcement learning environment,” arXiv, https://arxiv.org/abs/1907.11180. 231 AI was instructed to stack blocks: Ivaylo Popov et al. (10 Apr 2017), “Data-efficient deep reinforcement learning for dexterous manipulation,” arXiv, https://arxiv.org/abs/1704.03073. 232 the AI grew tall enough: David Ha (10 Oct 2018), “Reinforcement learning for improving agent design,” https://designrl.github.io. 232 Imagine a robotic vacuum: Dario Amodei et al. (25 Jul 2016), “Concrete problems in AI safety,” arXiv, https://arxiv.org/pdf/1606.06565.pdf. 232 robot vacuum to stop bumping: Custard Smingleigh (@Smingleigh) (7 Nov 2018), Twitter, https://twitter.com/smingleigh/status/1060325665671692288. 233 goals and desires are always underspecified: Abby Everett Jaques (2021), “The Underspecification Problem and AI: For the Love of God, Don’t Send a Robot Out for Coffee,” unpublished manuscript. 233 a fictional AI assistant: Stuart Russell (Apr 2017), “3 principles for creating safer AI,” TED2017, https://www.ted.com/talks/stuart_russell_3_principles_for_creating_safer_ai. 233 reports of airline passengers: Melissa Koenig (9 Sep 2021), “Woman, 46, who missed her JetBlue flight ‘falsely claimed she planted a BOMB on board’ to delay plane so her son would not be late to school,” Daily Mail, https://www.dailymail.co.uk/news/article-9973553/Woman-46-falsely-claims-planted-BOMB-board-flight-effort-delay-plane.html.

pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists
by Gary Marcus and Jeremy Freeman
Published 1 Nov 2014

Jeanty, Allan R. Jones, John Aach and George M. Church. 2014. “Highly Multiplexed Three-Dimensional Subcellular Transcriptome Sequencing In situ.” Science: dx.doi.org/10.1126/science.1250212. Marblestone, Adam H., Bradley M. Zamft, Yael G. Maguire, Mikhail G. Shapiro, Thaddeus R. Cybulski, Joshua I. Glaser, Dario Amodei, et al. 2013. “Physical Principles for Scalable Neural Recording.” Frontiers in Computational Neuroscience 7. http://www.arxiv.org/abs/1306.5709. Marblestone, Adam H., et al. 2014. “Rosetta Brains: A Strategy for Molecularly-Annotated Connectomics.” arXiv preprint arXiv:1404.5103. http://arxiv.org/abs/1404.5103.

pages: 447 words: 111,991

Exponential: How Accelerating Technology Is Leaving Us Behind and What to Do About It
by Azeem Azhar
Published 6 Sep 2021

This argument is not relevant to my argument, so I don’t consider it here. 19 Azeem Azhar, ‘Beneficial Artificial Intelligence: My Conversation with Stuart Russell’, Exponential View, 22 August 2019 <https://www.exponentialview.co/p/-beneficial-artificial-intelligence> [accessed 16 April 2021]. 20 Dario Amodei and Danny Hernandez, ‘AI and Compute’, OpenAI, 16 May 2018 <https://openai.com/blog/ai-and-compute/> [accessed 12 January 2021]. 21 Charles E. Leiserson et al., ‘There’s Plenty of Room at the Top: What Will Drive Computer Performance after Moore’s Law?’, Science 368(6495), June 2020 <https://doi.org/10.1126/science.aam9744>. 22 Jean-François Bobier et al., ‘A Quantum Advantage in Fighting Climate Change’, BCG Global, 22 January 2020 <https://www.bcg.com/publications/2020/quantum-advantage-fighting-climate-change> [accessed 23 March 2021].

pages: 370 words: 112,809

The Equality Machine: Harnessing Digital Technology for a Brighter, More Inclusive Future
by Orly Lobel
Published 17 Oct 2022

United Healthcare, “How Artificial Intelligence Is Helping Member Experience,” October 28, 2019, https://newsroom.uhc.com/experience/Virtual-Assistant.html. 22. Joan Palmiter Bajorek, “Voice Recognition Still Has Significant Race and Gender Biases,” Harvard Business Review, May 10, 2019, https://hbr.org/2019/05/voice-recognition-still-has-significant-race-and-gender-biases. See also Dario Amodei et al., “Deep Speech 2: End-to-End Speech Recognition in English and Mandarin,” in Proceedings of the 33rd International Conference on Machine Learning (New York: PMLR, 2016), 48: 173. 23. Judith Newman, “To Siri, with Love,” New York Times, October 19, 2014. 24. “Mozilla Common Voice Is an Initiative to Help Teach Machines How Real People Speak,” Mozilla, https://commonvoice.mozilla.org/en. 25.

pages: 688 words: 147,571

Robot Rules: Regulating Artificial Intelligence
by Jacob Turner
Published 29 Oct 2018

See Roman Yampolskiy and Joshua Fox, “Safety Engineering for Artificial General Intelligence” Topoi, Vol. 32, No. 2 (2013), 217–226; Stuart Russell, Daniel Dewey, and Max Tegmark, “Research Priorities for Robust and Beneficial Artificial Intelligence”, AI Magazine, Vol. 36, No. 4 (2015), 105–114; James Babcock, János Kramár, and Roman V. Yampolskiy, “Guidelines for Artificial Intelligence Containment”, arXiv preprint arXiv:1707.08476 (2017); Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané, “Concrete Problems in AI Safety”, arXiv preprint arXiv:1606.06565 (2016); Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch, “Alignment for Advanced Machine Learning Systems”, Machine Intelligence Research Institute (2016); Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, and Stuart Russell, “Should Robots Be Obedient?”