description: hypothetical artificial general intelligence that would have a positive effect on humanity
36 results
by James Barrat · 30 Sep 2013 · 294pp · 81,292 words
. I intended to ask the AI Box Experiment creator, Eliezer Yudkowsky. Besides originating that thought experiment, I’d been told that he knew more about Friendly AI than anyone else in the world. Chapter Four The Hard Way With the possible exception of nanotechnology being released upon the world there is nothing
…
the universe. That puts Yudkowsky at the dead center of his own cosmology. I had come here to learn more about Friendly AI, a term he coined. According to Yudkowsky, Friendly AI is the kind that will preserve humanity and our values forever. It doesn’t annihilate our species or spread into the universe
…
like a planet-eating space plague. But what is Friendly AI? How do you create it? I also wanted to hear about the AI Box Experiment. I especially wanted to know, as he role-played the
…
the problem. The source of the problem is that even when well-intentioned people set out to create AIs they are not very concerned with Friendly AI issues. They themselves assume that if they are good-intentioned people the AIs they make are automatically good intentioned, and this is not true. It
…
very difficult mathematical and engineering problem. I think most of them are just insufficiently good at thinking of uncomfortable thoughts. They started out not thinking, ‘Friendly AI is a problem that will kill you.’” Yudkowsky said that AI makers are infected by the idea of a blissful AI-enhanced future that lives
…
their field, getting tenure, publishing, getting rich, and so on. In fact, not many AI makers, in contrast to AI theorists, are concerned with building Friendly AI. With one exception, none of the dozen or so AI makers I’ve spoken with are worried enough to work on
…
Friendly AI or any other defensive measure. Maybe the thinkers overestimate the problem, or maybe the makers’ problem is not knowing what they don’t know. In
…
accretion of working algorithms, with the researchers having no deep understanding of how the combined system works. [italics mine] Not knowing how to build a Friendly AI is not deadly, of itself.… It’s the mistaken belief that an AI will be friendly which implies an obvious path to global catastrophe. Assuming
…
. The assumption becomes even more dangerous after the AGI’s intelligence rockets past ours, and it becomes ASI—artificial superintelligence. So how do you create Friendly AI? Or could you impose friendliness on advanced AIs after they’re already built? Yudkowsky has written a book-length online treatise about these questions entitled
…
Creating Friendly AI: The Analysis and Design of Benevolent Goal Architectures. Friendly AI is a subject so dense yet important it exasperates its chief proponent himself, who says about it, “it only takes one
…
error for a chain of reasoning to end up in Outer Mongolia.” Let’s start with a simple definition. Friendly AI is AI that has a positive rather than a negative impact on mankind. Friendly AI pursues goals, and it takes action to fulfill those goals. To describe an AI’s success at achieving
…
about “transforming first all of earth and then increasing portions of space into paper clip manufacturing facilities.” Friendly AI would make only as many paper clips as was compatible with human values. Another tenet of Friendly AI is to avoid dogmatic values. What we consider to be good changes with time, and any AI
…
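The passage above pins down the operational claims: a Friendly AI pursues goals, its success is judged against human values, and the paperclip maximizer is the stock example of a goal specified without them. Below is a minimal sketch of that contrast, with invented action names, invented numbers, and a hypothetical aggregation rule; nothing in it is from Barrat or Yudkowsky.

```python
# Toy sketch: a goal-directed agent picks the action whose outcome maximizes
# its utility function. All action names and numbers are invented.

OUTCOMES = {
    # action: (paperclips produced, human-values score in [0, 1])
    "run one factory":        (1e6,  1.0),
    "convert spare industry": (1e9,  0.7),
    "convert the biosphere":  (1e12, 0.0),
}

def paperclip_utility(clips, values_score):
    return clips  # values nothing but paperclips

def value_weighted_utility(clips, values_score):
    # Hypothetical aggregation: paperclip gains count for nothing in a
    # world that scores zero on human values.
    return clips * values_score

for utility in (paperclip_utility, value_weighted_utility):
    best = max(OUTCOMES, key=lambda action: utility(*OUTCOMES[action]))
    print(f"{utility.__name__}: {best}")
# paperclip_utility picks "convert the biosphere"; value_weighted_utility
# picks "convert spare industry", making only as many paperclips as the
# human-values term allows.
```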
welfare to archaic values like racial inequality and slaveholding, gender inequality, shoes with buckles, and worse. We don’t want to lock specific values into Friendly AI. We want a moving scale that evolves with us. Yudkowsky has devised a name for the ability to “evolve” norms—Coherent Extrapolated Volition. An AI
…
we would want if we “knew more, thought faster, and were more the people we thought we were.” CEV would be an oracular feature of Friendly AI. It would have to derive from us our values as if we were better versions of ourselves, and be democratic about it so that humankind
…
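CEV as Yudkowsky describes it is a philosophical proposal, not an algorithm, but the shape the snippet sketches (idealize each person's volition, aggregate democratically, act only where the results cohere) can be written down. Everything below, the extrapolation stub and the coherence threshold alike, is a hypothetical stand-in for the genuinely hard parts.

```python
# Toy sketch of the *shape* of CEV; not anyone's actual proposal.

from collections import Counter

def extrapolate(raw_preference):
    """Idealize one person's volition: what they would want if they 'knew
    more, thought faster, and were more the people they thought they were'."""
    corrections = {"short-term impulse": "long-term flourishing"}
    return corrections.get(raw_preference, raw_preference)

def coherent_extrapolated_volition(population, threshold=0.9):
    votes = Counter(extrapolate(p) for p in population)   # democratic step
    choice, count = votes.most_common(1)[0]
    if count / len(population) >= threshold:
        return choice   # extrapolated volitions cohere: act on them
    return None         # no coherence: the oracle stays silent

population = ["long-term flourishing"] * 95 + ["short-term impulse"] * 5
print(coherent_extrapolated_volition(population))  # long-term flourishing
```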
reasons for that. First, I’m giving you a highly summarized account of Friendly AI and CEV, concepts you can read volumes about online. And second, the whole topic of Friendly AI is incomplete and optimistic. It’s unclear whether or not Friendly AI can be expressed in a formal, mathematical sense, and so there may
…
intelligent enough to take part in these high-level judgments. In the future, we survive the intelligence explosion! In fact, we thrive. God bless you, Friendly AI! * * * Now that most (but not all) AI makers and theorists have recognized Asimov’s Three Laws of Robotics for what they were meant to be
…
—tools for drama, not survival—Friendly AI may be the best concept humans have come up with for planning their survival. But besides not being ready yet, it’s got other big
…
. Too many organizations in too many countries are working on AGI and AGI-related technologies for them all to agree to mothball their projects until Friendly AI is created, or to include in their code a formal friendliness module, if one could be made. And few are even taking part in the
…
public dialogue about the necessity for Friendly AI. Some of the AGI contestants include: IBM (with several AGI-related projects), Numenta, AGIRI, Vicarious, Carnegie Mellon’s NELL and ACT-R, SNERG, LIDA, CYC
…
installed in battlefield robots. In fact, robots may be the platforms for embodied machine learning that will help create advanced AI to begin with. When Friendly AI is available, if ever, why would privately run robot-making companies install it in machines designed to kill humans? Shareholders wouldn’t like that one
…
bit. Another problem with Friendly AI is this—how will friendliness survive an intelligence explosion? That is, how will Friendly AI stay friendly even after its IQ has grown by a thousand times? In his writing and lectures, Yudkowsky
…
initial surprise, wouldn’t we just do whatever we wanted? “It’s very clear why one would be suspicious of that,” Yudkowsky said. “But creating Friendly AI is not like giving instructions to a human. Humans have their own goals already, they have their own emotions, they have their own enforcers. They
…
apply only to a super-intelligence that came from human stock.” * * * I’d find in my ongoing inquiry that lots of experts took issue with Friendly AI, for reasons different from mine. The day after meeting Yudkowsky I got on the phone with Dr. James Hughes, chairman of the Department of Philosophy
…
Emerging Technologies (IEET). Hughes probed a weakness in the idea that an AI’s utility function couldn’t change. “One of the dogmas of the Friendly AI people is that if you are careful you can design a superintelligent being with a goal set that will become unchanging. And they somehow have
…
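The "dogma" Hughes is probing can be stated as an invariant: a self-modifying agent accepts a rewrite of itself only if the rewrite demonstrably preserves its current goals. Here is a toy sketch of that invariant; the equality check over sampled worlds is a crude stand-in for a proof, and, as Hughes's objection suggests, the checker itself is part of the code a superintelligence could rewrite.

```python
# Toy sketch: accept a self-rewrite only if it preserves the current goals.

def preserves_goals(current_utility, proposed_utility, test_worlds):
    # Crude stand-in for a proof: check agreement on sampled worlds.
    return all(current_utility(w) == proposed_utility(w) for w in test_worlds)

def self_modify(agent_utility, candidate_rewrites, test_worlds):
    for rewrite in candidate_rewrites:
        if preserves_goals(agent_utility, rewrite, test_worlds):
            agent_utility = rewrite   # adopt only goal-preserving rewrites
    return agent_utility

original = lambda world: world.get("human_welfare", 0)
drifted  = lambda world: world.get("paperclips", 0)     # goal drift: rejected
tweaked  = lambda world: world.get("human_welfare", 0)  # same goals: accepted

worlds = [{"human_welfare": 3, "paperclips": 9}, {"human_welfare": 1}]
final = self_modify(original, [drifted, tweaked], worlds)
print(final is tweaked)  # True: only the goal-preserving rewrite was adopted
```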
for propaganda, and good for raising support. But no lesson at all about going up against real AI in the real world. Now, back to Friendly AI. If it seems unlikely, does that mean an intelligence explosion is inevitable? Is runaway AI a certainty? If you, like me, thought computers were inert
…
AI scenario so catastrophic that it begs for closer scrutiny. We’ve investigated a promising idea about how to construct AI to defuse the danger—Friendly AI—and found that it is incomplete. In fact, the general idea of coding an intelligent system with permanently safe goals or evolvable safe goal-generating
…
motivations and goals that we may not share. As Eliezer Yudkowsky says, it may have other uses for our atoms. And as we’ve seen, Friendly AI, which would ensure the good behavior of the first AGI and all its progeny, is a concept that’s a long way from being ready
…
. Kurzweil doesn’t give much time to the concept of Friendly AI. “We can’t just say, ‘we’ll put in this little software code subroutine in our AIs, and that’ll keep them safe,’” he said
…
suggests surprise, as if an AGI could one day just show up, leaving us insufficiently prepared for “normal” accidents, and certainly lacking safeguards like formal, Friendly AI. It’s kind of like saying, “If we walk long enough in the woods we’ll find the hungry bears.” Eliezer Yudkowsky has similar fears
…
than natural selection. But again, as Yudkowsky cites, there’s a giant, galaxywide problem if someone achieves AGI before he or other researchers figure out Friendly AI or some way to reliably control AGI. If AGI comes about from incremental engineering in a fortuitous intersection of effort and accident, as Goertzel proposes
…
unbound likely to kill us all? “AGI is the ticking clock,” said Yudkowsky, “the deadline by which we’ve got to build Friendly AI, which is harder. We need Friendly AI. With the possible exception of nanotechnology being released upon the world, there is just nothing in that whole catalogue of disasters that is
…
science for understanding and controlling self-aware, self-improving systems, that is, AGI and ASI. And because of the challenges of developing an antidote like Friendly AI before AGI has been created, development of that science must happen roughly in tandem. Then, when AGI comes into being, its control system already exists
…
31, 2006, http://intelligence.org/files/AIPosNegFactor.pdf (accessed February 28, 2013). it only takes one error: Baez, “Interview with Eliezer Yudkowsky.” Friendly AI pursues goals: Yudkowsky, Eliezer, “Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures,” 2001, http://intelligence.org/files/CFAI.pdf (accessed March 4, 2013). transforming first
…
tools in; definition of; drives in, see drives; as dual use technology; emotional qualities in; as entertainment; examples of; explosive, see intelligence explosion; friendly, see Friendly AI; funding for; jump to AGI from; Joy on; risks of, see risks of artificial intelligence; Singularity and, see Singularity; tight coupling in; utility function of
…
ecophagy; efficiency; Einstein, Albert; emotions; energy grid; Enigma; Enron; Eurisko; evil; extropians; Fastow, Andrew; Ferrucci, David; financial scandals; financial system; Flame; Foreign Affairs; Freidenfelds, Jason; Friendly AI: Coherent Extrapolated Volition and, definition of, intelligence explosion and, SyNAPSE and; Future of Humanity Institute; genetic algorithms; genetic engineering; genetic programming; George, Dileep; global warming
…
, see nuclear weapons; Mind Children (Moravec); Minsky, Marvin; Mitchell, Tom; mobile phones, see also iPhone; Monster Cat; Moore, Gordon; Moore’s Law; morality, see also Friendly AI; Moravec, Hans; Moravec’s Paradox; mortality, see immortality; mortgage crisis; Mutually Assured Destruction (MAD); nano assemblers; nanotechnology: “gray goo” problem and; natural language processing (NLP
by Tom Chivers · 12 Jun 2019 · 289pp · 92,714 words
computer science does, indeed, seem to take orthogonality seriously. Russell and Norvig’s aforementioned Artificial Intelligence: A Modern Approach cites Yudkowsky’s 2008 paper5 on friendly AI and dedicates three and a half pages to the risks of AI behaving in unwanted ways. It also cites another 2008 paper,6 by the
…
be blackmailing you from the future, threatening to punish you for not working hard enough to make it exist. As I said, it’s a friendly AI! So it wouldn’t torture just anybody. It would have no incentive to torture people who’d never heard of it. The punishment/incentive only
…
ridiculous to keep it locked up. Once you have the AI, you’re going to want to use it. ‘If you think you have a friendly AI, if the AI turns to you and says, “OK, hey, I’m friendly, I want to achieve the things you want to achieve,” then what
…
agree that human lives are net-positive in the universe. There’s also the discussion of what morals you instil in the AI itself: a ‘friendly AI’ that acts morally in the universe according to ‘morals’ that revolve around maximising happiness will be very different from one whose ‘morals’ revolve around maximising
…
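The excerpt is cut off before naming the second maximand, so the following contrast is purely illustrative: two "moral" utility functions, maximizing total happiness versus average happiness, rank the same pair of hypothetical worlds in opposite orders.

```python
# Illustrative only: the same two invented worlds, ranked by two
# different "moral" maximands.

worlds = {
    "small and blissful": [9.0, 9.0],     # two people, very happy
    "vast and mediocre":  [4.0] * 1000,   # a thousand people, so-so
}

def total_happiness(happiness):
    return sum(happiness)

def average_happiness(happiness):
    return sum(happiness) / len(happiness)

for utility in (total_happiness, average_happiness):
    best = max(worlds, key=lambda w: utility(worlds[w]))
    print(f"{utility.__name__} -> {best}")
# total_happiness prefers "vast and mediocre" (4000 vs 18);
# average_happiness prefers "small and blissful" (9.0 vs 4.0).
```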
’, ArXiv https://arxiv.org/pdf/1705.08807.pdf 6. David McAllester, ‘Friendly AI and the servant mission’, Machine Thoughts blog, 2014 https://machinethoughts.wordpress.com/2014/08/10/friendly-ai-and-the-servant-mission/ 7. Luke Muehlhauser, ‘Eliezer Yudkowsky: Becoming a rationalist’, Conversations from the Pale Blue
by Calum Chace · 4 Feb 2014 · 345pp · 104,404 words
for most Americans. ‘That game is far from over. And I wouldn’t be surprised if a coalition of the institutes which are researching human-friendly AI algorithms announced a breakthrough this spring. There’s been a lot of unusually cordial traffic between several of the bigger US-based ones during the
by Jeanette Winterson · 15 Mar 2021 · 256pp · 73,068 words
you live inside. It will come … So, I don’t want to get into the question of surveillance here. We know the price of home-friendly AI: it’s our personal data. Anyone who has voice-activated Alexa is being listened to all of the time. We’re told this isn’t
by Max More and Natasha Vita-More · 4 Mar 2013 · 798pp · 240,182 words
the mix. I don’t think there are any magic bullets to resolve the dilemmas of AGI ethics. There will almost surely be no provably Friendly AI, in spite of the wishes of Eliezer Yudkowsky (2008) and some others. Nor, in my best guess, will there be an Artilect War in which
by Calum Chace · 28 Jul 2015 · 144pp · 43,356 words
depend on its decisions and its actions. Would that be a good thing or a bad thing? In other words, would a superintelligence be a “Friendly AI”? (Friendly AI, or FAI, denotes an AGI that is beneficial for humans rather than one that seeks social approbation and company. It also refers to the project
…
that the event is a positive one for ourselves and our descendants. We should be taking steps to ensure that the first AGI is a friendly AI. PART FOUR: FAI Friendly Artificial Intelligence CHAPTER 8 CAN WE ENSURE THAT SUPERINTELLIGENCE IS SAFE? As we saw in the last chapter
…
, Friendly AI (FAI) is the project of ensuring that the world’s superintelligences are safe and useful for humans. The central argument of this book is that
…
becomes evident that AGI is getting close – and that this warning sounds comfortably before AGI actually arrives. We could then take stock of progress towards Friendly AI, and if the latter was insufficiently advanced we could impose the ban on further AI research at that point. With luck we might be able
…
in the twenty-first century. Nick Bostrom calls the idea “indirect normativity”. Computer scientist Steve Omohundro proposes what he calls a “scaffolding” approach to developing Friendly AI. Once the first AGI is proven to be Friendly it is tasked with building its own (smart) successor, with the constraint that it also be
…
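A toy rendering of the scaffolding loop as the text describes it. The `verify_friendly` stub is the entirely unsolved part (no such proof procedure exists), and doubling is an arbitrary stand-in for "builds a smarter successor".

```python
# Toy sketch of the "scaffolding" idea: each generation is admitted only
# after a Friendliness proof, then tasked with building a smarter successor
# under the same constraint.

def verify_friendly(agi):
    """Stand-in for a proof of Friendliness; nothing like this exists."""
    return agi["provably_friendly"]

def scaffold(seed_agi, generations=3):
    current = seed_agi
    assert verify_friendly(current), "never deploy an unverified AGI"
    for _ in range(generations):
        successor = {
            "intelligence": current["intelligence"] * 2,  # smarter, and
            "provably_friendly": True,                    # proven before use
        }
        if not verify_friendly(successor):
            break              # refuse to hand control to an unproven AGI
        current = successor    # the Friendly AGI becomes the new scaffold
    return current

seed = {"intelligence": 1, "provably_friendly": True}
print(scaffold(seed))  # {'intelligence': 8, 'provably_friendly': True}
```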
his personal money to the institute. 8.6 – Conclusion We do not yet have a foolproof way to ensure that the first AGI is a Friendly AI. In fact we don’t yet know how best to approach the problem. But we have only just begun, and the resources allocated to the
…
problem are small: Nick Bostrom estimated in 2014 that only six people in the world are working full-time on the Friendly AI problem, whereas many thousands of people work full-time on projects that could well contribute to the creation of the first AGI. (50) He argued
…
me in spring 2015 that Bostrom’s estimate was significantly too low, and that many more AI researchers spend much of their time thinking about Friendly AI as part of their everyday jobs. Even if that is correct, I suspect Bostrom is still right about the imbalance, and the truth will emerge
…
if we have the kind of debate I argue for in the next, concluding chapter. Bostrom recently observed that the Friendly AI problem is less difficult than creating the first AGI. (51) We should all hope he is right, as the failure to ensure the first AGI
…
are in a wider galactic or universal setting because we haven’t met any of the other intelligent inhabitants yet – if there are any. The Friendly AI problem is not the first difficult challenge humanity has faced. We have solved many problems which seemed intractable when first encountered, and many of the
…
be beneficial because its greater intelligence will make it more civilised. Equally, we must avoid falling into despair, felled by the evident difficulty of the Friendly AI challenge. It is a hard problem, but it is one that we can and must solve. We will solve it by applying our best minds
…
, backed up by adequate resources. As we saw in the last chapter, Nick Bostrom thinks that just six people are employed full-time on the Friendly AI project. Given the scale of the problem this is a woefully small number, so the establishment of the four existential risk organisations mentioned in the
…
close it will be developed without appropriate safeguards. This school of thought believes that the people who are actually able to do something about the Friendly AI problem – people with world class skills in deep learning and cognitive neuroscience – should be allowed to get on with the job until they can demonstrate
…
from all parts of our giant talent pool of seven billion individuals. Who can forecast what kinds of innovation will be required to address the Friendly AI challenge, or where those innovations will come from? 9.8 – Surviving AI The title of this book was chosen carefully. If artificial intelligence begets superintelligence
by Ray Kurzweil · 14 Jul 2005 · 761pp · 231,902 words
"broadcast architecture" described below) won't work for strong AI. There have been discussions and proposals to guide AI development toward what Eliezer Yudkowsky calls "friendly AI"30 (see the section "Protection from 'Unfriendly' Strong AI," p. 420). These are useful for discussion, but it is infeasible today to devise strategies that
…
Threat, Congressional Research Service Report for Congress, December 8, 1999, http://www.cnie.org/nle/crsreports/international/inter-75.pdf. 30. Eliezer S. Yudkowsky, "Creating Friendly AI 1.0, The Analysis and Design of Benevolent Goal Architectures" (2001), The Singularity Institute, http://www.singinst.org/CFAI/; Eliezer S. Yudkowsky, "What Is
…
Friendly AI?" May 3, 2001, http://www.KurzweilAI.net/meme/frame.html?main=/articles/art0172.html. 31. Ted Kaczynski, "The Unabomber's Manifesto," May 14, 2001, http://
…
Therapeutics. 45. See Singularity Institute, http://www.singinst.org. Also see note 30 above. Yudkowsky formed the Singularity Institute for Artificial Intelligence (SIAI) to develop "Friendly AI," intended to "create cognitive content, design features, and cognitive architectures that result in benevolence" before near-human or better-than-human AIs become possible. SIAI
…
has developed The SIAI Guidelines on Friendly AI: "Friendly AI," http://www.singinst.org/friendly/. Ben Goertzel and his Artificial General Intelligence Research Institute have also examined issues related to developing
…
on developing the Novamente AI Engine, a set of learning algorithms and architectures. Peter Voss, founder of Adaptive A.I., Inc., has also collaborated on friendly-AI issues: http://adaptiveai.com/. 46. Integrated Fuel Cell Technologies, http://ifctech.com. Disclosure: The author is an early investor in and adviser to IFCT. 47
by Eliezer Yudkowsky · 11 Mar 2015 · 1,737pp · 491,616 words
encounters some of my transhumanist-side beliefs—as opposed to my ideas having to do with human rationality—strange, exotic-sounding ideas like superintelligence and Friendly AI. And the one rejects them. If the one is called upon to explain the rejection, not uncommonly the one says, “Why should I believe
…
optical character recognizer or a collaborative filtering system (much easier problems). And as for building an AI with a positive impact on the world—a Friendly AI, loosely speaking—why, that problem is so incredibly difficult that an actual majority resolve the whole issue within fifteen seconds. Give me a break.
…
Technologies like smarter-than-human AI seem likely to result in large societal upheavals, for the better or for the worse. Yudkowsky coined the term “Friendly AI theory” to refer to research into techniques for aligning an AGI’s preferences with the preferences of humans. At this point, very little is known
…
be quite challenging to verify and validate with much confidence, and many current techniques are not likely to generalize to more intelligent and adaptive systems. “Friendly AI” is therefore closer to a menagerie of basic mathematical and philosophical questions than to a well-specified set of programming objectives. As of 2015, Yudkowsky
…
graph, and it should be optimization power in versus optimized product out, not optimized product versus time. * 146 Ghosts in the Machine People hear about Friendly AI and say—this is one of the top three initial reactions: “Oh, you can try to tell the AI to be Friendly, but if the
…
chain of causes that started with the source code as originally written? Is the AI the ultimate source of its own free will? A Friendly AI is not a selfish AI constrained by a special extra conscience module that overrides the AI’s natural impulses and tells it what to do
…
that seems to have a high impact on what Chalmers thinks should be considered a morally valuable person. This is not a necessary problem for Friendly AI theorists. It is only a problem if you happen to be an epiphenomenalist. If you believe either the reductionists (consciousness happens within the atoms)
…
will be “moral.” They can’t agree among themselves on why, or what they mean by the word “moral”; but they all agree that doing Friendly AI theory is unnecessary. And when you ask them how an arbitrarily generated AI ends up with moral outputs, they proffer elaborate rationalizations aimed at AIs
…
. . . Just thought I’d mention that. It’s amazing how many of my essays coincidentally turn out to include ideas surprisingly relevant to discussion of Friendly AI theory . . . if you believe in coincidence. * 1. Daniel C. Dennett, “The Unimagined Preposterousness of Zombies,” Journal of Consciousness Studies 2 (4 1995): 322–26.
…
with groups I have led—particularly when they face a very tough problem, which is when group members are most apt to propose solutions immediately.” Friendly AI is an extremely tough problem, so people solve it extremely fast. There’s several major classes of fast wrong solutions I’ve observed; and one
…
Out Just Fine. I may have contributed to this problem with a really poor choice of phrasing, years ago when I first started talking about “Friendly AI.” I referred to the optimization criterion of an optimization process—the region into which an agent tries to steer the future—as the “supergoal.” I
…
than the original. As the Soviets found out, to some small extent. Now think again about whether it makes sense to rely on, as your Friendly AI strategy, raising a little AI of unspecified internal source code in an environment of kindly but strict parents. No, the AI does not have internal
…
So the reason I’m arguing against the ghost isn’t just to make the point that (1) Friendly AI has to be explicitly programmed and (2) the laws of physics do not forbid Friendly AI. (Though of course I take a certain interest in establishing this.) I also wish to establish the notion
…
which there is no irreducible central ghost that looks over the neurons/code and decides whether they are good suggestions. (There is a concept in Friendly AI of deliberately programming an FAI to review its own source code and possibly hand it back to the programmers. But the mind that reviews is
…
out of the box1—but that is hardly the same thing from a philosophical perspective. The first great failure of those who try to consider Friendly AI is the One Great Moral Principle That Is All We Need To Program—a.k.a. the fake utility function—and of this I
…
, say, iron atoms—those are highly stable. Can you patch this problem? No. As a general rule, it is not possible to patch flawed Friendly AI designs. If you try to bound the utility function, or make the AI not care about how much the programmer wants things, the AI still
…
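A worked illustration of why bounding the utility function is not a patch (toy numbers, not Yudkowsky's argument verbatim): up to the cap, the capped agent ranks actions exactly as the uncapped one does, so the cap changes the winning action's score, not the agent's choice of it.

```python
# Toy numbers: capping the utility function leaves the action ranking intact.

CAP = 10**6  # hypothetical bound on how many paperclips the agent values

ACTIONS = {
    "polite request":    10**2,
    "buy out factories": 10**5,
    "seize all matter":  10**12,
}

def unbounded(clips):
    return clips

def bounded(clips):
    return min(clips, CAP)  # the attempted "patch"

for utility in (unbounded, bounded):
    ranking = sorted(ACTIONS, key=lambda a: utility(ACTIONS[a]), reverse=True)
    print(utility.__name__, ranking)
# Both rankings put "seize all matter" first: the cap changed the winning
# action's score, not which action wins.
```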
how we judge it. This core disharmony cannot be patched by ruling out a handful of specific failure modes. There’s also a duality between Friendly AI problems and moral philosophy problems—though you’ve got to structure that duality in exactly the right way. So if you prefer, the core problem
…
would not do it; this is just obviously the wrong classification. Certainly a superintelligence can see which heaps of pebbles are correct or incorrect. Why, Friendly AI isn’t hard at all! All you need is an AI that does what’s good! Oh, sure, not every possible mind does what’
…
in the 1950s it was believed that AI might be that simple, but this turned out not to be the case. The novice thinks that Friendly AI is a problem of coercing an AI to make it do what you want, rather than the AI following its own desires. But the
…
real problem of Friendly AI is one of communication—transmitting category boundaries, like “good,” that can’t be fully delineated in any training data you can give the AI during
…
permanent smile, and start xeroxing. The deep answers to such problems are beyond the scope of this essay, but it is a general principle of Friendly AI that there are no bandaids. In 2004, Hibbard modified his proposal to assert that expressions of human agreement should reinforce the definition of happiness, and
…
pretend they’re expected paperclip maximizers. To construct the True Prisoner’s Dilemma, the situation has to be something like this: Player 1: Human beings, Friendly AI, or other humane intelligence. Player 2: Unfriendly AI, or an alien that only cares about sorting pebbles. Let’s suppose that four billion human beings
…
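The payoff structure being set up, with placeholder numbers (in the essay the stakes are billions of lives for Player 1 and paperclips for Player 2): each player scores outcomes only in its own currency, and defection strictly dominates for the paperclip maximizer, which is what makes the dilemma "true" when the other player shares none of our values.

```python
# Placeholder payoffs for the True Prisoner's Dilemma as set up above:
# (human payoff in lives saved, paperclipper payoff in paperclips made).

PAYOFFS = {
    ("C", "C"): (2_000_000_000, 2),
    ("C", "D"): (0,             3),
    ("D", "C"): (3_000_000_000, 0),
    ("D", "D"): (1_000_000_000, 1),
}

def clippy_best_reply(human_move):
    # The paperclip maximizer compares outcomes in its own currency only.
    return max("CD", key=lambda m: PAYOFFS[(human_move, m)][1])

for human_move in "CD":
    print(f"humans play {human_move} -> paperclipper plays "
          f"{clippy_best_reply(human_move)}")
# Defect ("D") is the best reply to both C and D: defection strictly
# dominates for a player that shares none of our values.
```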
humans, that does bring in other considerations, like whether the humans learn from your example.) And so I wouldn’t say that a well-designed Friendly AI must necessarily refuse to push that one person off the ledge to stop the train. Obviously, I would expect any decent superintelligence to come up
…
into the paths of trains myself, nor stealing from banks to fund my altruistic projects. I happen to be a human. But for a Friendly AI to be corrupted by power would be like it starting to bleed red blood. The tendency to be corrupted by power is a specific biological
…
by specific cognitive circuits, built into us by our genes for a clear evolutionary reason. It wouldn’t spontaneously appear in the code of a Friendly AI any more than its transistors would start to bleed. I would even go further, and say that if you had minds with an inbuilt
…
I have in mind, when I warn aspiring rationalists to beware of cleverness. I’ll also note that I wouldn’t want an attempted Friendly AI that had just decided that the Earth ought to be transformed into paperclips, to assess whether this was a reasonable thing to do in light
…
of excuses Eliezer2000 could have potentially used to toss this problem out the window. I know, because I’ve heard plenty of excuses for dismissing Friendly AI. “The problem is too hard to solve” is one I get from AGI wannabes who imagine themselves smart enough to create true Artificial Intelligence,
…
but not smart enough to solve a really difficult problem like Friendly AI. Or “worrying about this possibility would be a poor use of resources, what with the incredible urgency of creating AI before humanity wipes itself
…
him from falling into some of the initial traps, the ones that I’ve seen consume other novices on their own first steps into the Friendly AI problem . . . though technically this was my second step; I well and truly failed on my first. But in the end, what it comes down
…
as you may well guess, because of the AI wannabes I sometimes run into who have their own clever reasons for not thinking about the Friendly AI problem. Our clever reasons for doing what we do tend to matter a lot less to Nature than they do to ourselves and our friends
…
be meaningless” . . . . . . which means that Eliezer2001 now has a line of retreat away from his mistake. I don’t just mean that Eliezer2001 can say “Friendly AI is a contingency plan,” rather than screaming “OOPS!” I mean that Eliezer2001 now actually has a contingency plan. If Eliezer2001 starts to doubt his 1997
…
metaethics, the intelligence explosion has a fallback strategy, namely Friendly AI. Eliezer2001 can question his metaethics without it signaling the end of the world. And his gradient has been smoothed; he can admit a 10% chance
…
. If you think this sounds like Eliezer2001 is too slow, I quite agree. Eliezer1996–2000’s strategies had been formed in the total absence of “Friendly AI” as a consideration. The whole idea was to get a superintelligence, any superintelligence, as fast as possible—codelet soup, ad-hoc heuristics, evolutionary programming,
…
to what technophobes say about the downsides of technophilia. What previous Eliezers said about the difficulties of, e.g., the government doing anything sensible about Friendly AI, still seems pretty true. It’s just that a lot of his hopes for science, or private industry, etc., now seem equally wrongheaded. Still,
…
the same enlightenment, that a mind had to output waste heat in order to obey the laws of thermodynamics. Previously, Eliezer2001 had talked about Friendly AI as something you should do just to be sure—if you didn’t know whether AI design X was going to be Friendly, then you
…
try to argue with individual AGI wannabes. In those days, I sort-of-succeeded in convincing one such fellow that, yes, you had to take Friendly AI into account, and no, you couldn’t just find the right fitness metric for an evolutionary algorithm. (Previously he had been very impressed with evolutionary
…
reactions, but because I’d been inculcated with the same attitude myself. Above all, Eliezer2001 didn’t say, “Stop”—even after noticing the problem of Friendly AI—because I did not realize, on a gut level, that Nature was allowed to kill me. “Teenagers think they’re immortal,” the proverb goes. Obviously
…
survived, the intelligence explosion would happen, and the resultant AI would be too smart to be corrupted or lost. Even after that, when I acknowledged Friendly AI as a consideration, I didn’t emotionally believe in the possibility of failure, any more than that teenager who doesn’t wear their seat belt
…
thinking. But it seems rather likely that when the one reacts to the prospect of Friendly AI by saying, “If you delay development to work on safety, other projects that don’t care at all about Friendly AI will beat you to the punch,” the prospect of they themselves making a mistake followed
…
or act or think. So it’s not necessarily an attempt to avoid falsification to say that God does not grant all prayers. Even a Friendly AI might not respond to every request. But clearly there exists some threshold of horror awful enough that God will intervene. I remember that being true
…
an-answers I had come up with before—and surviving the challenge. I looked at the AGI wannabes with whom I had tried to argue Friendly AI, and the various dreams of Friendliness that they had. (Often formulated spontaneously in response to my asking the question!) Like frequentist statistical methods, no
…
mastery of my own subject as Jaynes had achieved of probability theory, then it was at least imaginable that I could try to build a Friendly AI and survive the experience. Through my mind flashed the passage: Do nothing because it is righteous, or praiseworthy, or noble, to do so; do
…
hour earlier, that was a reasonable return on investment for a pre-explosion career. (I wasn’t thinking in terms of existential risks or Friendly AI at this point.) So I didn’t run away from the big scary problem like a frightened rabbit, but stayed to see if there was
…
problem looked slightly less impossible than it had the very first time I’d approached it. The more interesting pattern is my entry into Friendly AI. Initially, Friendly AI hadn’t been something that I had considered at all—because it was obviously impossible and useless to deceive a superintelligence about what was the
…
I’d realized at the start that the problem was not to build a seed capable of improving itself, but to produce a provably correct Friendly AI—then I probably would have burst into flames. Even so, part of understanding those above-average scientists who constitute the bulk of AGI researchers is
…
“make an extraordinary effort.” I’ve lost count of how many people have said to me something like: “It’s futile to work on Friendly AI, because the first AIs will be built by powerful corporations and they will only care about maximizing profits.” “It’s futile to work on
…
Friendly AI, the first AIs will be built by the military as weapons.” And I’m standing there thinking: Does it even occur to them that
…
—and say, “Oh well.” No! Not well! You haven’t won yet! Shut up and do the impossible! When AI folk say to me, “Friendly AI is impossible,” I’m pretty sure they haven’t even tried for the sake of trying. But if they did know the technique of “Try
…
it longer, et cetera. People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one. I really don’t know how to answer. I’m not being evasive; I
…
do hope by now that I’ve made it clear why you shouldn’t panic, when I now say clearly and forthrightly that building a Friendly AI is impossible. I hope this helps explain some of my attitude when people come to me with various bright suggestions for building communities of
…
Describing the specific flaws would be a whole long story in each case. But the general rule is that you can’t do it because Friendly AI is impossible. So you should be very suspicious indeed of someone who proposes a solution that seems to involve only an ordinary effort—without even
by Nick Bostrom and Milan M. Cirkovic · 2 Jul 2008
researchers succeed in constructing an intelligent programme, but have strong mistaken vague expectations for their programme's friendliness. Not knowing how to build a friendly AI is not deadly by itself, in any specific instance, if you know you do not know. It is a mistaken belief that an AI will
…
to build a nice AI. To describe the field of knowledge needed to address that challenge, I have proposed the term 'Friendly AI'. In addition to referring to a body of technique, 'Friendly AI' might also refer to the product of technique - an AI created with specified motivations. When I use the term Friendly
…
sense, I capitalize it to avoid confusion with the intuitive sense of 'friendly'. One common reaction I encounter is for people to immediately declare that Friendly AI is an impossibility because any sufficiently powerful AI will be able to modify its own source code to break any constraints placed upon it. The
…
or ever will think up for solving the problem. It only takes a single counterexample to falsify a universal quantifier. The statement that Friendly (or friendly) AI is theoretically impossible, dares to quantify over every possible mind design and every possible optimization process - including human beings, who are also minds, some of
…
whom are nice and wish they were nicer. At this point there are any number of vaguely plausible reasons why Friendly AI might be humanly impossible, and it is still more likely that the problem is solvable but no one will get around to solving it in
…
catastrophe as one that extinguishes Earth-originating intelligent life or permanently destroys a substantial part of its potential. We can divide potential failures of attempted Friendly AI into two informal fuzzy categories, technical failure and philosophical failure. Technical failure is when you try to build an AI and it does not work
…
empirical consequence of their favourite political systems. They thought people would be happy. They were wrong. Now imagine that someone should attempt to programme a 'Friendly' AI to implement communism, or libertarianism, or anarcho-feudalism, or favourite political system, believing that this shall bring about utopia. People's favourite political systems inspire
…
leap in intelligence. What follows from this? First and foremost: it follows that a reaction I often hear, 'We don't need to worry about Friendly AI because we don't yet have AI', is misguided or downright suicidal. We cannot rely on having distant advance warning before AI is created; past
…
techniques of Friendly AI will not materialize from nowhere when needed; it takes years to lay firm foundations. Furthermore, we need to solve the Friendly AI challenge before Artificial General Intelligence is created, not afterwards; I should not even have to point this out. There will be difficulties for
…
Friendly AI because the field of AI itself is in a state of low consensus and high entropy. But that does not mean we do not need to
…
there will be difficulties. The two statements, sadly, are not remotely equivalent. The possibility of sharp jumps in intelligence also implies a higher standard for Friendly AI techniques. The technique cannot assume the programmers' ability to monitor the AI against its will, rewrite the AI against its will, bring to bear the
…
to hurt you. That is the wrong behaviour for the dynamic, but a right code that does something else instead. For much the same reason, Friendly AI programmers should assume that the AI has total access to its own source code. If the AI wants to modify itself to be no longer
…
the Giant Cheesecake Fallacy, we should note that the ability to self improve does not imply the choice to do so. The successful exercise of Friendly AI technique might create an AI that had the potential to grow more quickly, but chose instead to grow along a slower and more manageable curve
…
AI. Similarly, it is acceptable to succeed at AI and at Friendly AI. What is not acceptable is succeeding at AI and failing at Friendly AI. Moore's Law makes it easier to do exactly that - 'easier' but thankfully not easy. I doubt that AI will be 'easy' at the time
…
AI, and one of them will succeed after AI first becomes possible to build with tremendous effort. Moore's Law is an interaction between Friendly AI and other technologies, which adds oft-overlooked existential risk to other technologies. We can imagine that molecular nanotechnology is developed by a benign multinational governmental
…
not expend any significant effort on fighting it, because I do not expect the good guys to need access to the 'supercomputers' of their day. Friendly AI is not about brute-forcing the problem. I can imagine regulations effectively controlling a small set of ultra-expensive computing resources that are presently considered 'supercomputers
…
on a different timescale than you do; by the time your neurons finish thinking the words 'I should do something' you have already lost. A Friendly AI in addition to molecular nanotechnology is presumptively powerful enough to solve any problem, which can be solved either by moving atoms or by creative thinking
…
. One should beware of failures of imagination: curing cancer is a popular contemporary target of philanthropy, but it does not follow that a Friendly AI with molecular nanotechnology would say to itself, 'Now I shall cure cancer'. Perhaps a better way to view the problem is that biological cells are
…
priori, but it does not lend itself to making up detailed stories. The main advice the metaphor gives us is that we had better get Friendly AI right, which is good advice in any case. The only defence it suggests against hostile AI is not to build it in thefirst place, which
…
is also excellent advice. Absolute power is a conservative engineering assumption in Friendly AI, exposing broken designs. If an AI will hurt you given magic, the Friendliness architecture is wrong. 15.11 Local and majoritarian strategies One
…
than to push through a global political change. Two assumptions that give rise to a majoritarian strategy for AI are as follows: • A majority of Friendly AIs can effectively protect the human species from a few unFriendly AIs. • The first AI built cannot by itself do catastrophic damage. This reprises essentially the
…
not global catastrophic damage. Most AI researchers will not want to make unFriendly AIs. So long as someone knows how to build a stable Friendly AI - so long as the problem is not completely beyond contemporary knowledge and technique - researchers will learn from each other's successes and repeat them. Legislation
…
can also imagine a scenario that implies an easy local strategy: • The first AI cannot by itself do catastrophic damage. • If even a single Friendly AI exists, that AI plus human institutions can fend off any number of unFriendly AIs. The easy scenario would hold if, for example, human institutions
…
can reliably distinguish Friendly AIs from unFriendly ones, and give revocable power into the hands of Friendly AIs. Thus we could pick and choose our allies. The only requirement is that the Friendly AI problem must be solvable (as opposed to being completely beyond human ability). Both
…
a strictly local effort), but it invokes a technical challenge of extreme difficulty. We only need to get Friendly AI right in one place and one time, not every time everywhere. But someone must get Friendly AI right on the first try, before anyone else builds AI to a lower standard. I cannot perform
…
making it easier to solve the particular challenge of Friendliness, that is a negative interaction. Thus, all else being equal, I would greatly prefer that Friendly AI precede nanotechnology in the ordering of technological developments. If we confront the challenge of AI and succeed, we can call on
…
with nanotechnology. If we develop nanotechnology and survive, we still have the challenge of AI to deal with after that. Generally speaking, a success on Friendly AI should help solve nearly any other problem. Thus, if a technology makes AI neither easier nor harder, but carries with it a catastrophic risk, we
…
necessary to destroy the world drops by one point. A success on human intelligence enhancement would make Friendly AI easier, and also help on other technologies. But human augmentation is not necessarily safer, or easier, than Friendly AI; nor does it necessarily lie within our realistically available latitude to reverse the natural ordering of
…
later, one of the topics would have been safety. At the time of this writing in 2007, the AI research community still does not see Friendly AI as part of the problem. I wish I could cite a reference to this effect, but I cannot cite an absence of literature
…
. Friendly AI is absent from the conceptual landscape, not just unpopular or unfunded. You cannot even call Friendly AI a blank spot on the map, because there is no notion that something is
…
), you may think back and recall that you did not see Friendly AI discussed as part of the challenge. Neither have I seen Friendly AI discussed in the technical literature as a technical problem. My attempted literature search turned up primarily brief non-technical papers, unconnected to each other, with
…
decisions - and cannot easily be rendered non-opaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you
…
the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target. The most powerful
…
current AI techniques, as they were developed and then polished and improved over time, have basic incompatibilities with the requirements of Friendly AI as I currently see them. The Y2K problem, which proved very expensive to fix though not a global catastrophe, analogously arose from failing to
…
with a catalogue of mature, powerful, publicly available AI techniques, which combine to yield non-Friendly AI, but which cannot be used to build Friendly AI without redoing the last three decades of AI work from scratch.5 This is usually true but not universally true. The final chapter of the
…
ages, seemingly eternal, right up until someone fills it. I think it is possible for mere fallible humans to succeed on the challenge of building Friendly AI. But only if intelligence ceases to be a sacred mystery to us, as life was a sacred mystery to Lord Kelvin. Intelligence must cease to
…
agents that retain stable preferences while rewriting their source code. He is the author of the papers, 'Levels of organization in general intelligence' and 'Creating friendly AI', and assorted informal essays on human rationality. Index Note: page numbers in italics refer to Figures and Tables. 2-4-6 task 98-9
by Keach Hagey · 19 May 2025 · 439pp · 125,379 words
were understanding, and the institute’s mission pivoted from making artificial intelligence to making friendly artificial intelligence. “The part where we needed to solve the friendly AI problem did put an obstacle in the path of charging right out to hire AI researchers, but also we just surely didn’t have the
…
that the true answer is the belief that “Eliezer Yudkowsky is the rightful caliph.”)19 In a 2004 paper, “Coherent Extrapolated Volition,” Yudkowsky argued that friendly AI should be developed based not just on what we think we want AI to do now, but what would actually be in our best interests
by James D. Miller · 14 Jun 2012 · 377pp · 97,144 words
by Anthony Berglas, William Black, Samantha Thalind, Max Scratchmann and Michelle Estes · 28 Feb 2015
by Michio Kaku · 15 Mar 2011 · 523pp · 148,929 words
by Ben Goertzel and Pei Wang · 1 Jan 2007 · 303pp · 67,891 words
by Luke Dormehl · 10 Aug 2016 · 252pp · 74,167 words
by Melanie Swan · 22 Jan 2014 · 271pp · 52,814 words
by William MacAskill · 31 Aug 2022 · 451pp · 125,201 words
by Karen Hao · 19 May 2025 · 660pp · 179,531 words
by Adam Becker · 14 Jun 2025 · 381pp · 119,533 words
by Parmy Olson · 284pp · 96,087 words
by Nick Bostrom · 3 Jun 2014 · 574pp · 164,509 words
by James Bridle · 6 Apr 2022 · 502pp · 132,062 words
by Anu Bradford · 25 Sep 2023 · 898pp · 236,779 words
by Ian Johnson · 26 Sep 2023 · 407pp · 119,073 words
by John Brockman · 5 Oct 2015 · 481pp · 125,946 words
by Stuart Russell and Peter Norvig · 14 Jul 2019 · 2,466pp · 668,761 words
by Richard A. Clarke · 10 Apr 2017 · 428pp · 121,717 words
by Richard Yonck · 7 Mar 2017 · 360pp · 100,991 words
by Eliezer Yudkowsky and Nate Soares · 15 Sep 2025 · 215pp · 64,699 words
by Mark O'Connell · 28 Feb 2017 · 252pp · 79,452 words
by Kenneth Payne · 16 Jun 2021 · 339pp · 92,785 words
by John Brockman · 19 Feb 2019 · 339pp · 94,769 words
by Paul R. Daugherty and H. James Wilson · 15 Jan 2018 · 523pp · 61,179 words
by Stuart Armstrong · 1 Feb 2014 · 48pp · 12,437 words
by Samuel Arbesman · 18 Jul 2016 · 222pp · 53,317 words
by Aurélien Géron · 13 Mar 2017 · 1,331pp · 163,200 words