OpenAI makes humanity less safe

If there's anything we can do now about the risks of superintelligent AI, then OpenAI makes humanity less safe.

Once upon a time, some good people were worried about the possibility that humanity would figure out how to create a superintelligent AI before figuring out how to tell it what we wanted it to do. If this happened, it could lead to literally destroying humanity and nearly everything we care about. This would be very bad. So they tried to warn people about the problem, and to organize efforts to solve it.

Specifically, they called for work on aligning an AI’s goals with ours - sometimes called the value alignment problem, AI control, friendly AI, or simply AI safety - before rushing ahead to increase the power of AI.

Some other good people listened. They knew they had no relevant technical expertise, but what they did have was a lot of money. So they did the one thing they could do - throw money at the problem, giving it to trusted parties to try to solve the problem. Unfortunately, the money was used to make the problem worse. This is the story of OpenAI.

Before I go on, two qualifiers:

  1. This post will be much easier to follow if you have some familiarity with the AI safety problem. For a quick summary you can read Scott Alexander’s Superintelligence FAQ. For a more comprehensive account see Nick Bostrom’s book Superintelligence.
  2. AI is an area in which even most highly informed people should have lots of uncertainty. I wouldn't be surprised if my opinion changes a lot after publishing this post, as I learn relevant information. I'm publishing this because I think this process should go on in public.

The story of OpenAI

Before OpenAI, there was DeepMind, a for-profit venture working on “deep learning” techniques. It was widely regarded as the advanced AI research organization. If any current effort was going to produce superhuman intelligence, it was DeepMind.

Elsewhere, industrialist Elon Musk was working on more concrete (and largely successful) projects to benefit humanity, like commercially viable electric cars, solar panels cheaper than ordinary roofing, cheap spaceflight with reusable rockets, and a long-run plan for a Mars colony. When he heard the arguments people like Eliezer Yudkowsky and Nick Bostrom were making about AI risk, he was persuaded that there was something to worry about - but he initially thought a Mars colony might save us. But when DeepMind’s head, Demis Hassabis, pointed out that this wasn't far enough to escape the reach of a true superintelligence, he decided he had to do something about it:

Hassabis, a co-founder of the mysterious London laboratory DeepMind, had come to Musk’s SpaceX rocket factory, outside Los Angeles, a few years ago. […] Musk explained that his ultimate goal at SpaceX was the most important project in the world: interplanetary colonization.

Hassabis replied that, in fact, he was working on the most important project in the world: developing artificial super-intelligence. Musk countered that this was one reason we needed to colonize Mars—so that we’ll have a bolt-hole if A.I. goes rogue and turns on humanity. Amused, Hassabis said that A.I. would simply follow humans to Mars.


Musk is not going gently. He plans on fighting this with every fiber of his carbon-based being. Musk and Altman have founded OpenAI, a billion-dollar nonprofit company, to work for safer artificial intelligence.

OpenAI’s primary strategy is to hire top AI researchers to do cutting-edge AI capacity research and publish the results, in order to ensure widespread access. Some of this involves making sure AI does what you meant it to do, which is a form of the value alignment problem mentioned above.

Intelligence and superintelligence

No one knows exactly what research will result in the creation of a general intelligence that can do anything a human can, much less a superintelligence - otherwise we’d already know how to build one. Some AI research is clearly not on the path towards superintelligence - for instance, applying known techniques to new fields. Other AI research is more general, and might plausibly be making progress towards a superintelligence. It could be that the sort of research DeepMind and OpenAI are working on is directly relevant to building a superintelligence, or it could be that their methods will tap out long before then. These are different scenarios, and need to be evaluated separately.

What if OpenAI and DeepMind are working on problems relevant to superintelligence?

If OpenAI is working on things that are directly relevant to the creation of a superintelligence, then its very existence makes an arms race with DeepMind more likely. This is really bad! Moreover, sharing results openly makes it easier for other institutions or individuals, who may care less about safety, to make progress on building a superintelligence.

Arms races are dangerous

One thing nearly everyone thinking seriously about the AI problem agrees on is that an arms race towards superintelligence would be very bad news. The main problem occurs in what is called a “fast takeoff” scenario. If AI progress is smooth and gradual even past the point of human-level AI, then we may have plenty of time to correct any mistakes we make. But if there’s some threshold beyond which an AI would be able to improve itself faster than we could possibly keep up with, then we only get one chance to do it right.

AI value alignment is hard, and AI capacity is likely to be easier, so anything that causes an AI team to rush makes our chances substantially worse; if they get safety even slightly wrong but get capacity right enough, we may all end up dead. But if you’re worried that the other team will unleash a potentially dangerous superintelligence first, then you might be willing to skip some steps on safety to preempt them. And they, having more reason to trust themselves than you, might notice that you’re rushing ahead, get worried that your team will destroy the world, and rush their (probably safe but they’re not sure) AI into existence.

OpenAI promotes competition

DeepMind used to be the standout AI research organization. With a comfortable lead on everyone else, they would be able to afford to take their time to check their work if they thought they were on the verge of doing something really dangerous. But OpenAI is now widely regarded as a credible close competitor. However dangerous you think DeepMind might have been in the absence of an arms race dynamic, this makes them more dangerous, not less. Moreover, by sharing their results, they are making it easier to create other close competitors to DeepMind, some of whom may not be so committed to AI safety.

We at least know that DeepMind, like OpenAI, has put some resources into safety research. What about the unknown people or organizations who might leverage AI capacity research published by OpenAI?

For more on how openly sharing technology with extreme destructive potential might be extremely harmful, see Scott Alexander’s Should AI be Open?, and Nick Bostrom’s Strategic Implications of Openness in AI Development.

What if OpenAI and DeepMind are not working on problems relevant to superintelligence?

Suppose OpenAI and DeepMind are largely not working on problems highly relevant to superintelligence. (Personally I consider this the more likely scenario.) By portraying short-run AI capacity work as a way to get to safe superintelligence, OpenAI’s existence diverts attention and resources from things actually focused on the problem of superintelligence value alignment, such as MIRI or FHI.

I suspect that in the long run this will make it harder to get funding for long-run AI safety organizations. The Open Philanthropy Project just made its largest grant ever, to OpenAI, to buy a seat on OpenAI’s board for Open Philanthropy Project executive director Holden Karnofsky. This is larger than their recent grants to MIRI, FHI, FLI, and the Center for Human-Compatible AI all together.

But the problem is not just money - it’s time and attention. The Open Philanthropy Project doesn’t think that OpenAI is underfunded or that OpenAI could do more good with the extra money. Instead, it seems to think that Holden can be a good influence on OpenAI. This means that of the time he's allocating to AI safety, a fair amount has been diverted to OpenAI.

This may also make it harder for organizations specializing in the sort of long-run AI alignment problems that don't have immediate applications to attract top talent. People who hear about AI safety research and are persuaded to look into it will have a harder time finding direct efforts to solve key long-run problems, since an organization focused on increasing short-run AI capacity will dominate AI safety's public image.

Why do good inputs turn bad?

OpenAI was founded by people trying to do good, and has hired some very good and highly talented people. It seems to be doing genuinely good capacity research. To the extent that this is not dangerously close to superintelligence, it’s better to share this sort of thing than not – they could create a huge positive externality. They could construct a fantastic public good. Making the world richer in a way that widely distributes the gains is very, very good.

Separately, many people at OpenAI seem genuinely concerned about AI safety, want to prevent disaster, and have done real work to promote long-run AI safety research. For instance, my former housemate Paul Christiano, who is one of the most careful and insightful AI safety thinkers I know of, is currently employed at OpenAI. He is still doing AI safety work – for instance, he coauthored Concrete Problems in AI Safety with, among others, Dario Amodei and John Schulman, other OpenAI researchers.

Unfortunately, I don’t see how those two things make sense jointly in the same organization. I’ve talked with a lot of people about this in the AI risk community, and they’ve often attempted to steelman the case for OpenAI, but I haven’t found anyone willing to claim, as their own opinion, that OpenAI as conceived was a good idea. It doesn’t make sense to anyone, if you’re worried at all about the long-run AI alignment problem.

Something very puzzling is going on here. Good people tried to spend money on addressing an important problem, but somehow the money got spent on the thing most likely to make that exact problem worse. Whatever is going on here, it seems important to understand if you want to use your money to better the world.

(Cross-posted at LessWrong)

55 thoughts on “OpenAI makes humanity less safe”

  1. PDV

    Largely due to the influence of AI Safety-inclined people willing to work at OpenAI, and now Holden on the board, OpenAI is not OpenAI as originally conceived. Everyone I've talked to who was at all close to the issue agrees that the "Open" in the name is in name only, and has been relegated to a background value to be respected only as long as it is safe, given lip service because it's still an applause light and signal of prosocial programming to people who have thought less deeply.

    (I also find it odd that you'd call Paul careful, as he is by far the most optimistic/panglossian and least careful of serious AI safety researchers working today, from what I can tell from his writing, frequently making large unstated assumptions in frameworks he's proposing would be safe if they could be achieved.)

    1. Person

      This seems consistent with many things I've seen, except for one: In the recent panel discussion at Asilomar that involved Elon Musk, in his opening statement, his two points were (1) that AI needs to be democratised, and (2) we need to create a neural lace.

      This signalled to me that Elon's main concern still surrounds democratisation of AI, and I feel that he will still influence the thinking and mission of all employees at OpenAI. Which made me update that OpenAI is more likely to be damaging.

    2. Benquo Post author

      Have you published specific criticisms of Paul's safety work yet? If not, I'd encourage you to. I think Paul would too.

          1. Jacob Steinhardt

            I'll disagree and say that I enjoy Paul's writing (as do several others I know). It's definitely not targeted at non-experts in ML though.

          2. Buck

            I think Paul's writing is pretty easy to read; I guess my only complaint is that it's sometimes hard to find the post where he explains something.

        1. Zack M. Davis

          Um, surely you can't expect to be taken more seriously by claiming that you know better than Paul and then refusing to provide any arguments whatsoever when someone explicitly asks for them? Are you consciously bluffing here and expecting to get away with it?

        2. Zack M. Davis

          Let me unpack what I'm trying to get at in more detail: if it's not worth the costs to write up an argument for a claim because people won't believe you, then why bother making the claim? Making a claim without an argument makes sense if the audience contains people who trust you enough to believe your claim without an argument. But if the audience consists of people who won't believe you even with an argument, then they certainly won't believe you without one!

          1. Anon the 2nd

            Zack, I perceive you as operating under an "arguments as soldiers" frame. You say: why bother sending your soldiers out if they don't have rifles? They will be easily defeated. Consider also the "conversation as information transfer" frame. In this frame, maybe I put forth some effort to transmit information to you. I was not obliged to do this, and I'm also not obliged to provide further information. Maybe if Paul responded to PDV and said "I promise to read whatever criticism you write of me and take it seriously" then PDV would see an incentive to put forth the effort to formulate & transmit additional info.
            (Separately, I think PDV is making a mistake and underestimating the number of people who are reading the comments he writes in this thread/the number of people who will take his comments seriously, despite the fact that he isn't a big name, if his comments seem well reasoned.)

    3. Benquo Post author

      I also think it's worth responding to OpenAI's marketing at face value and pointing out that if you take it literally it's harmful in expectation, regardless of whether they have a secret plan to do better. It seems like something is going pretty badly wrong if, when people make public promises to do things experts think would be harmful, there aren't many public complaints.

      ETA: I actually believe that winks by insiders are of limited value compared with institutional incentives and public coordination. (See this piece by Matt Yglesias on why politicians keep most policy promises for some of the reasoning.)

      1. Miles

        Re: OpenAI's "marketing," I think there is more nuance than you give them credit for. Sure, Musk says lots of things, but Sutskever and Brockman (each a senior person) have both explicitly said they'd not release dangerous stuff in the future, and this is in fact discussed in the current mission statement on the website.

        Lots of people talk about democratizing AI outside OpenAI, and there is vastly more code/pseudocode released by other organizations, including a big fraction of DeepMind's intellectual work. It is fair to ask if this overall level of openness is too high, which may depend on one's assumptions about timelines. But I don't see a massive difference between DeepMind and OpenAI in terms of philosophy around openness, other than the name (personally, BeneficialAI or GoodAI seem better to me but the latter is taken 🙂 ).

        Lastly, I agree that arms races are very important but am skeptical of the OpenAI-->arms race theory. If there is any effect, it's a matter of degree, and I agree with someone else's point that it's not just about DeepMind and OpenAI. Other things cause arms races besides new organizations. To be fair in your analysis, you should probably also consider the fact that AlphaGo was directly cited as an accelerant of a Korean AI investment and brought tons of attention to AI in Asia more broadly.

        1. Benquo Post author

          This doesn't seem like nuance to me - it seems more like strategic ambiguity. Perhaps we should just believe that OpenAI is a bid for the "Good Guys AI" brand, and distrust all specific promises because they blatantly contradict each other. That might just be the right attitude for anyone who isn't personally involved, and most people who are.

    4. silver

      Does that mean you support the grant, and how confident are you that this [OpenAI not really being open] is effectively true?

  2. taion

    This article seems to derive from a misunderstanding of the state of machine learning research.

    It wasn't just the DeepMind show before OpenAI started, and DeepMind and OpenAI still aren't the only players in the field.

    Consider FAIR, Google Brain, &c. Heck, OpenAI just lost one of their most prominent ML researchers (back) to Google Brain a few months ago.

    It makes almost no sense to describe the state of the field as a two-party arms race between DeepMind and OpenAI. That's really just a factually inaccurate premise.

    1. Benquo Post author

      "DeepMind is just not that especially big a deal" does seem plausible - possibly I have been misled and Demis Hassabis is just especially good at self-promotion. (Open Phil also seems to think DeepMind is the only other serious institutional player, so if I'm mistaken, I'm not the only one.)

      1. taion

        To be more concise, there's no good way to define the group of institutions such that it's just DeepMind and OpenAI. If OpenAI counts, then at least FAIR and Google Brain count as well, and probably also Microsoft and Baidu – and those 4 other groups all predate OpenAI.

  3. taion

    I do think DeepMind are the top group in reinforcement learning specifically, but there are plenty of other large, prestigious, advanced industry research groups. Most of them probably don't explicitly claim to be directly working on human-level intelligence, but frankly saying you're working on AGI mostly just sounds silly and pretentious to people in the field.

    But compare FAIR,
    "Facebook Artificial Intelligence Researchers (FAIR) seek to understand and develop systems with human level intelligence by advancing the longer-term academic problems surrounding AI."

    Or Google Brain,
    "Make machines intelligent. Improve people’s lives."

    Or Microsoft AI,
    "At Microsoft, researchers in artificial intelligence are harnessing the explosion of digital data and computational power with advanced algorithms to enable collaborative and natural interactions between people and machines that extend the human ability to sense, learn and understand. The research infuses computers, materials and systems with the ability to reason, communicate and perform with humanlike skill and agility."

    You'd need a painfully contorted definition of criterion (a) to end up with just DeepMind and OpenAI – basically by reading more into PR than into mission statements or actual research.

    And I think Vicarious is generally regarded as somewhere between a joke and a scam.

    1. Noah

      Looks like OpenAI is just another company alongside these many other research groups. So maybe the OP is better cast as "AI research organizations make humanity less safe". I may well be missing something, but it doesn't seem like Ben's original points rely much on openness.

      Also, why focus on research groups rather than individual actors? After all, so much ML/AI research is publicly available on arXiv, blogs, etc. Probably, influencing research groups is a better strategy for people who want power over the future. Groups have more power than individuals since there are just more of them doing research (though maybe groupthink could create problems), and influencing each of a group of N people maybe doesn't take N times more effort than influencing a person the same amount.

  4. Andrew Schreiber

    The hypothesis here seems to be that OpenAI is making humanity less safe by fueling an arms race. It's not the first time I've heard it, and I've come to strongly disagree.

    First, some common ground: an arms race is brewing.

    OpenAI's role there is massively dwarfed by AlphaGo.

    When AlphaGo upset Lee Sedol 4-1, then proceeded to wipe the floor with the rest of the Go community pros (60-0), it hardly went unnoticed in Asia. The game is thousands of years old, a far deeper part of Korean, Japanese, and Chinese culture than chess is here. Their top scientists and government officials will not let DeepMind humiliate them again so easily.

    China will have the world's most powerful supercomputer up this year - 70 petaflops. Japan is building a 120 petaflops supercomputer dedicated specifically to ML research. We all know how much China likes losing face to Japan; expect bigger supercomputers.

    The AI Safety community skews hard Anglo-American. OpenAI and DeepMind have offices less than 50 miles apart. It's easy to forget that Asia has highly talented ML researchers. A dominant first place in 2016 ImageNet went to CUImage, second place Hikvision. That isn't Carnegie Mellon and Harvard.

    Culturally, Asia is much more amenable to AGI. There is no Cartesian "consciousness" or "soul" reservation we have in the West. There has never been a Chinese AI Winter. Their national strategists can calculate the power of AGI just as lucidly as we can.

    Don't expect the arms race to slow down. The world is becoming more nationalistic and less cooperative. I'd bet attempts to slow international progress to increase safety will be viewed in bad faith in China, because Andrew Ng has used his prestige there to mollify safety concerns on state TV with the "overpopulation on Mars" line.

    As far as I can see, the arms race is ON. OpenAI and DeepMind are far from the only players. Humanity's hope doesn't lie in trying to sneak the AGI cat back in the bag, but rather progressing in AI Safety as rapidly as possible. If we can open-source a robust AGI Safety testing suite, it might not matter who gets there first. To that end, OpenAI is a massive boon.

  5. Daniel Eth

    "It seems to be doing genuinely good capacity research... Separately, many people at OpenAI seem genuinely concerned about AI safety... Unfortunately, I don’t see how those two things make sense jointly in the same organization."

    Actually, I think there is a large benefit here. Many people will take AI Safety much more seriously if it's being proposed by an organization that is doing great capacity research as well. MIRI has often had a lot of difficulty getting people to listen to them, while if Facebook or Google were proposing similar ideas, they would be taken more seriously.

    1. Benquo Post author

      They don't just need to treat the words as prestigious words to say, some of them would need to actually do AI safety research. It's a bit harder for me to see how OpenAI increases the latter on net - obviously it increases the former, but at the cost of substantial watering-down (e.g. implying that ensuring widespread access is a "safety" measure).

  6. Abram Demski

    In order for AI alignment research to make AI safer, I think there has to be active collaboration or at least open lines of communication between the AI alignment community and whoever is at the forefront of AI (at least, this needs to be the case around the time human-level AI becomes possible). I think this may tip the balance in favor of OpenAI being good for safety.

  7. Paul Christiano

    You make three separate charges here, which I want to briefly respond to. I'm obviously speaking entirely for myself in this post.

    1. OpenAI's work is probably a distraction from the main business of aligning AI. I argued here that we should work on alignment for ML, and you didn't really engage with that argument. I do agree that today OpenAI is not investing much in alignment.

    2. OpenAI's existence makes AI development more competitive and less cooperative. I agree that in general it's harder to coordinate people if they are spread across N+1 groups than if they are spread across N groups (though I think this article significantly overstates the effect). To the extent that we are all in this to make the world better and make credible commitments to that effect, we are free to talk and coordinate arbitrarily closely. In general I think it's nearly as plausible that adding an (N+1)st sufficiently well-intentioned group would improve rather than harm coordination. So I suspect the real disagreement between you and the OpenAI founders is whether OpenAI will really have a stronger commitment to making AI go well.

    Put more sharply: supposing that you were in Elon's position and thought that Google and DeepMind were likely to take destructive actions, would you then reason "but adding an (N+1)st player would make things worse all else equal, so I guess I'll leave it to them and hope for the best"? If not, then it seems like you are focusing on the wrong part of the disagreement here.

    I do think that it's important that OpenAI get along well with all of the established players, especially conditioned on OpenAI being an important player and others also being willing to play ball regarding credible commitments to pro-social behavior.

    3. OpenAI's openness makes AI development more competitive and less cooperative. I do agree that helping more people do AI research will make coordination harder, all else equal, and that openness makes it easier for more people to become involved in AI. (Though this is an ironic departure from the theme of your recent writings.) The point of openness is to do other good things, e.g. to improve welfare of existing people.

    I think that current openness has a pretty small impact on alignment, and the effect on other concerns is larger. If you share my view, then this isn't a good place for someone interested in alignment to ask for a concession (compared to pushing for more investment in alignment or closer cooperation amongst competitors).

    Some quick arguments against the effect being big: the prospect of a monopoly on AI development has always been extremely remote; limited access to 2017 AI results won't be an important driver of participation in cutting edge AI research in the future (as compared to access to computing hardware and researchers); and there is a compensating effect where openness amongst competitors would make the situation more cooperative and less competitive (if it were actually done).

    An unconditional commitment to publishing everything could certainly become problematic. I think that OpenAI's strongest commitments are to broad access to the benefits of AI and broad input into decision-making about AI. Those aren't controversial positions, but I'm sure that Elon doesn't expect DeepMind to live up to them. I would certainly have preferred that OpenAI have communicated about this differently.

    For what it's worth, I think that the discussion of this topic by EAs is often unhelpful: if everyone agrees that there is a desirable conception of openness, and an organization has "open" in its name, then it seems like you should be supporting efforts to adopt the desirable interpretation rather than trying to argue that the original interpretation was problematic / trying to make people feel bad about sloppy messaging in the past.

    1. Benquo Post author

      I owe you a response on (1).

      On (2), if OpenAI's not going to be a standout player with one to very few rivals, then its main effect* is eating up unjustified buzz. That seems like it would slow down both AI and AI safety, but slow down AI safety more because not all AI research institutions are safety-branded.

      On (3) maybe OpenAI might try persuading Elon Musk first that its safety plan isn't just AI for everybody. If he's not persuaded of that, then I don't see why I should be, since I have far less control over and access to OpenAI than he does. Overall I am not very willing to assume that if I hear both X and Y and prefer Y, that Y is true.

      I think our substantive disagreement on (3) depends on (1). It's imaginable to me that prosaic AI safety is enough, but in that world "AI Safety" doesn't really need to be a thing, because it's just part of capacity research. I put substantial probability (>50%) on MIRI being right because AGI is qualitatively different in ways that need qualitatively different safety work. In that scenario it's bad to conflate AI safety measures with weak AI capacity sharing, since then people will work on the easier problem and call it the harder one.

      Separately, I think creating weak AI capacity and sharing it with the world is probably really good, and I'm glad people are doing it, and I'm glad people are working on making it not stupid. I just don't think that needs the term "AI Safety" or its various synonyms.

      *The main effect of the organization itself. The researchers would presumably just be doing AI research somewhere else.

      1. Paul Christiano

        > but in that world "AI Safety" doesn't really need to be a thing, because it's just part of capacity research

        Why is this true?

        > I just don't think that needs the term "AI Safety" or its various synonyms.

        OpenAI's mission is described as "build safe AGI, and ensure AGI's benefits are as widely and evenly distributed as possible". Those are two different things with different benefits.

        > On (3) maybe OpenAI might try persuading Elon Musk first that its safety plan isn't just AI for everybody.

        I think Elon's view is that democratization of AI is important to avoiding some undesirable situations. I don't think he expects openness to resolve the alignment problem, which he recognizes as a problem. (I disagree with his overall view of alignment, but that's a separate discussion.) Those are just two different steps to obtaining a good outcome, both of which are necessary.

  8. Jeffrey Ladish

    In light of the huge uncertainty about timelines and capabilities, having an organization guided by people close to the AI Safety field seems wise. Furthermore, AI Safety gains a lot of credibility by making real progress on ML problems. That kind of cultural influence is both difficult and potentially highly effective. Whatever AI researchers end up being the ones to build AGI, it's vital that they have taken seriously the research from the AI Safety research community. This kind of cultural change can't come from MIRI or FHI, because they can't produce anything as credible (to AI researchers, grad students, etc.) as actual advances in the field.

  9. Sarah Constantin

    There's an alternative reason to oppose the OpenAI/OpenPhil grant, though it's a longer and looser chain of reasoning.

    Eliezer Yudkowsky's original model of "AI Safety" entails gaining a fundamental understanding of how to ensure provable safety of even a superintelligent, rapidly self-improving artificial intelligence.

    This is a hard problem -- it is hugely underspecified, for one thing -- so it is a very long-term project. Given that I think strong AI will not be here for a long time, I think this is fine.

    Paul Christiano, the leading safety researcher at OpenAI, has a somewhat different model of "AI safety" that involves working on more tractable problems of bounding and aligning the actions of "prosaic AIs" like, for instance, a reinforcement learner that functions as a virtual corporation. Christiano's hypothetical "prosaic AIs" are weaker than Yudkowsky's notion of "strong AI" -- for example, they need not (and, I think, probably would not) be recursively self-improving. They would not even have to be "general intelligences" to fit Christiano's criterion of "can replicate human behavior".

    The methods for dealing with "prosaic AIs" that I've heard about are qualitatively different from the thinking that was common in MIRI/SingInst in the old days. There, people thought largely about game theory and decision theory -- one assumes an agent *can* do whatever it wants to do, ignoring implementation details, and one thinks about aligning its incentives so that it chooses to do desirable things. In a machine-learning paradigm, by contrast, one makes assumptions about how the AI learns and responds to information, and imagines building it to have certain safeguards in its learning process. In other words, it's solving a much easier problem, about a *known* (if poorly interpretable) machine, rather than an arbitrarily advanced and self-improving machine. Most of the early debates on LessWrong were about asking whether one could "just" put safeguards into an AI (limiting the scope of its behavior, sometimes called "tool AI" or "oracle AI"), and Yudkowsky's answer was "no." Like Christiano, Holden Karnofsky believed (as of 2012) that tool AI was likely to be a feasible approach to safety.

    I think there are probably *many* qualitatively different classes of "strong AI" of differing "strength". Some AIs which already exist (e.g. AlphaGo and image-recognition deep learning algorithms) are "human-level" in that they solve challenging cognitive tasks better than the best humans. But these are "narrow AIs", trained on a single task. One could imagine a future AI that was "general", more like a human toddler (or even a dog), which could learn a variety of behaviors and adapt to a range of environments without requiring correspondingly more training data. One could imagine AIs that are "conceptual" (that develop robust abstractions/concepts invariant over irrelevant properties) or "logical" (capable of drawing inferences over properties of agglomerative processes like grammar or computable functions). And there are recursively self-improving AIs, which rewrite their own source code in order to better achieve goals.

    It seems very likely that defenses against the risks of "weaker" AIs will not work against "stronger" AIs, and Christiano's "prosaic AI" is among the weaker types of "strong AI."

    This is fine in itself -- there's nothing wrong with working on an easy problem before tackling a hard one.

    However, I think Karnofsky and Christiano are incorrect in believing (or promoting the idea) that this easy problem is the *whole* AI safety problem, or the bulk of it.

    And I think, given that OpenAI is the biggest and most visible institution working on "AI Safety", this grant will lead to the belief, within the (rather large) community of technical people interested in AI, that the "easy problem" of prosaic AI control is the whole of the problem of AI safety.

    It also gives the impression that working on AI safety is easily contiguous with being a conventional machine learning researcher -- getting a PhD in ML, working at software companies with big research divisions, and so on. You can go from that world to AI safety, and return from AI safety to the world of ML, entirely painlessly and with no cost to career capital.

    I think that the meat of the AI safety problem will involve creating entirely new fields of mathematics or computer science -- it's that hard a problem -- and thus will *not* be nicely contiguous with a career as an ML researcher/engineer. But people have strong incentives to prefer to believe in a world where solving the most important problems requires no sacrifice of professional success, so they're incentivized to believe that AI safety is relatively easy and tractable with the toolkit of already-existing ML.

    "AI safety is basically like ML" is a *dangerously seductive meme*, and, I believe, untrue. "We already know how to model the mind; it works like a neural net" is also a dangerously seductive meme, for somewhat different reasons (it's flattering to people who know how to build deep learning networks if their existing toolkit explains all of human thought), and I also believe it's untrue. Promoting those memes among the very set of people who are best equipped to work on AI safety, or other fields such as cognitive science or pure machine learning research, is harmful to scientific progress as well as to safety.

    1. Paul Christiano

      > This is fine in itself -- there's nothing wrong with working on an easy problem before tackling a hard one.

      As far as I can tell, the MIRI view is that my work is aimed at a problem which is *not possible,* not that it is aimed at a problem which is too easy. The MIRI view is not "If we just wanted to align a human-level consequentialist produced by evolution, that would be no problem. We're concerned about the challenge posed by *real* AI."

      One part of this is the disagreement about whether the overall approach I'm taking could possibly work, with my position being "something like 50-50" and the MIRI position being "obviously not" (and normal ML researchers' positions being skepticism about our perspective on the problem).

      There is a broader disagreement about whether any "easy" approach can work, with my position being "you should try the easy approaches extensively before trying to rally the community behind a crazy hard approach" and the MIRI position apparently being something like "we have basically ruled out the easy approaches, but the argument/evidence is really complicated and subtle."

      1. Sarah Constantin

        > As far as I can tell, the MIRI view is that my work is aimed at a problem which is *not possible,* not that it is aimed at a problem which is too easy.

        This surprises me, and I think I haven't heard about this.

        Are you saying that they believe that you *can't*, in principle, constrain a reinforcement learner with things like adversarial examples or human feedback?

        1. Paul Christiano

          > Are you saying that they believe that you *can't*, in principle, constrain a reinforcement learner with things like adversarial examples or human feedback?

          I think that all the MIRI researchers believe this will be exceptionally hard, and most believe it won't be possible for humans to do, if you want a solution that will work for arbitrarily powerful RL systems. (Note that we more or less know that model-free RL can get you to human-level consequentialism, if you are willing to spend as much computation time as evolution did and use an appropriate multi-agent environment.) I'm not sure about their views on "in principle" and it may depend on how you read that phrase.

          1. Anonymous

            > Note that we more or less know that model-free RL can get you to human-level consequentialism, if you are willing to spend as much computation time as evolution did and use an appropriate multi-agent environment.

            How do we know this?

          2. Jessica Taylor

            > Note that we more or less know that model-free RL can get you to human-level consequentialism, if you are willing to spend as much computation time as evolution did and use an appropriate multi-agent environment.

            How do we know this? Also how is this compatible with the Atari games not being solved yet?

          3. Jeff Kaufman

            >> Note that we more or less know that model-free RL can get you to human-level consequentialism, if you are willing to spend as much computation time as evolution did and use an appropriate multi-agent environment.

            > How do we know this?

            I interpreted this to be a reference to the evolution of humans.

        2. Zvi Mowshowitz

          I would say that the Christiano approach is *both* far easier and probably impossible. Conditional on the approach being practical/possible, it represents a far easier and more practical path. However, there is a good chance (Paul says 50/50 here, MIRI says something approaching 1) that the approach is not workable at all.

          I am in the middle but much closer to MIRI, and think it is unlikely that a sufficiently strong reinforcement learner could be constrained, even in principle, by adversarial examples or human feedback, while allowing it to be anything approaching maximally useful (as per Paul's condition that safety not be too expensive, which in context seems right).

          I don't even think we have shown that we can in principle contain HUMANS via adversarial examples or human feedback while still allowing them to be anything approaching maximally useful. I haven't even heard reasonably plausible ideas for doing so!

          1. Benquo Post author

            The most important thing about creating value-aligned automation systems made of humans probably isn't that it's intrinsically easier than building something right from scratch. The problem is that if we don't have value-aligned institutions, and we try to build value-aligned AGI, we're in the position of instructing an unsafe weak AI to make a safe strong AI.

            However, it should also probably be easier to get to safety on systems that are made of humans, most of whose actuators are humans, and where a human is in the loop on nearly every high-level decision they make, than on systems that can go much faster than humans in ways that quickly become entirely opaque to us.

      2. Rob Bensinger

        I think Paul's characterization is right, except I think Nate wouldn't say "we've ruled out all the prima facie easy approaches," but rather something like "part of the disagreement here is about which approaches are prima facie 'easy.'" I think his model says that the proposed alternatives to MIRI's research directions by and large look more difficult than what MIRI's trying to do, from a naive traditional CS/Econ standpoint. E.g., I expect the average game theorist would find a utility/objective/reward-centered framework much less weird than a recursive intelligence bootstrapping framework. There are then subtle arguments for why intelligence bootstrapping might turn out to be easy, which Nate and co. are skeptical of, but hashing out the full chain of reasoning for why a daring unconventional approach just might turn out to work anyway requires some complicated extra dialoguing. Part of how this is framed depends on what problem categories get the first-pass "this looks really tricky to pull off" label.

        1. Paul Christiano

          > the proposed alternatives to MIRI's research directions by and large look more difficult than what MIRI's trying to do, from a naive traditional CS/Econ standpoint. E.g., I expect the average game theorist would find a utility/objective/reward-centered framework much less weird than a recursive intelligence bootstrapping framework.

          If by "weird" we mean "a weird way to build a safe AI," and by "average game theorist" we mean "average algorithmic game theorist," then I don't think this is true right now. Moreover, I doubt that anyone's views will change if/when it becomes clear that this isn't true.

  10. Sarah Constantin

    There are also outside-view reasons to be skeptical of the OpenPhil grant to OpenAI.

    OpenPhil has encouraged Good Ventures *not* to rapidly spend down its endowment on charitable causes, despite the fact that Dustin Moskovitz has expressed the goal of giving away his fortune in his lifetime. Now, OPP recommends its largest ever grant -- to the organization that employs two of Holden Karnofsky's long-time friends and roommates. (Not, for instance, to Stuart Russell, the world's most prominent AI safety researcher.)

    From the outside, this looks like nepotism.

    It's especially unfavorable that OPP's reasoning doesn't involve an overview of organizations and individual researchers working on AI safety. It simply says that "technical advisors" judge OpenAI and DeepMind as the major players in the field (not mentioning Google Brain, FAIR, IARPA, Baidu, etc.) and that OPP could influence AI safety for the better by starting a partnership with OpenAI. This suggests that it's not that they think OpenAI is the *most effective* AI safety org, but that it's the best candidate for a partnership with OPP. This is plausible, given the close existing connections between OPP and OpenAI researchers. But this policy is out of line with the GiveWell-style policy of evaluating the impact of charities and donating to the most effective ones. If, instead of giving to the *best* organizations, you give to the ones that you think you can get most value out of influencing for the better, it's much harder to give a public accounting of why your spending has good outcomes, since all of your positive influence is happening behind closed doors.

    1. Sarah Constantin

      To be clear, I don't think that personal relationships are inherently unfair. In any kind of selection process, from hiring to investing to donating, personal relationships are going to influence who you consider and how favorably you consider them.

      But usually, professional fairness involves the assumption that while personal relationships are usable for hypothesis generation, some sort of *impersonal* objective criteria should be used for evaluation. You give your friend a chance to interview at your company; you don't just give her a job.

      It seems perfectly natural that a lot of the people working in the same field are going to get to know each other. It seems a little off that they're just going straight to the collaboration/influence stage without passing through any "fair" tests (whether a GiveWell-style review of impacts, or a market mechanism, or public discourse.)

      I think this means that outsiders should think of OpenAI more or less as they think of the Santa Fe Institute or the Institute for New Economic Thinking -- like "Ok, some bright people with some ideas have been given some money to play with, let's see what they do with it, the results could be anywhere from great to nonsense." OpenAI should not be thought of the way people think of Harvard (as, like, correct by default) or the way LessWrongers think of MIRI (as "our team" or "our friends.")

      1. Aceso Under Glass

        Full Disclosure: I'm friends with Dario and know things through him I can't share here. I've also outsourced my opinion on AI risk to him since before he was working at OpenAI.

        > OpenAI should not be thought of the way people think of Harvard (as, like, correct by default)

        Is there anywhere that should be thought of like that? Because definitely not Harvard.

        > This is plausible, given the close existing connections between OPP and OpenAI researchers.

        It's not just that - the other major players are for-profit or government and thus cannot receive donations. Their grant to MIRI was much smaller in absolute terms but a much larger percentage of MIRI's budget.

        > It seems perfectly natural that a lot of the people working in the same field are going to get to know each other. It seems a little off that they're just going straight to the collaboration/influence stage without passing through any "fair" tests.

        I think OPP has been moving away from the concept of fairness for a very long time, and that's a good thing, for reasons currently locked in 3 unfinished blog posts. Grants from OPP are not supposed to be prizes in fair competitions; they're supposed to effect as much change as possible. This is a problem if people treat them as fair competitions and especially if OPP doesn't fight this perception, but they've always discouraged people from donating and publicized that they're not following the Find The Best philosophy. A strike against them is the name open: that's obviously incorrect and misleading.

      2. Benquo Post author

        I think I agree with both Sarah and Aceso Under Glass here. The nepotism thing wouldn't seem bad to me in isolation - the conflicts of interest for the (much smaller) MIRI grant were similarly myriad. I'm more worried about the implied strategy of "give lots of money to whoever has the most, in order to influence them."

