I am now able to think about things like AI risk and feel that the concepts are real, not just verbal. This was the point of my modeling-the-world project. I’ve generated a few intuitions around what’s important in AI risk, including a few considerations that I think are being neglected. There are a few directions in which this line of research can be extended, and I’m looking for collaborators to pick it up and run with it.
I intend to write about much of this in more depth, but since EA Global is coming up I want a simple description to point to here. These are just loose sketches, so I'm trying to describe rather than persuade.
New intuitions around AI risk
- Better cybersecurity norms would probably reduce the chance of an accidental singleton.
- Transparency on AI projects’ level of progress reduces the pressure towards an AI arms race.
- Safety is convergent - widespread improvements in AI value alignment improve our chances at a benevolent singleton, even in an otherwise multipolar dynamic.
Cybersecurity
From my writeup - undefended hardware might in itself constitute a large, discontinuous first-mover advantage:
One AI project could acquire a large amount of external resources quickly. These resources could in turn be used to quickly and substantially increase the intelligence of the AI. This might be enough of a lead for the AI project to bootstrap itself to a level where it is smart enough to seize total physical control over its local environment (e.g. the Earth and nearby astronomical bodies), eliminating plausible competitors. [...]
An AI project could extract a large amount of money from the global financial system through fraud. Since electronic fraud can be committed very quickly, the AI could acquire control of these funds before financial institutions, regulators, or other policymaking bodies have a chance to respond. [...]
At some point most AI research is likely to be performed by AIs. The rate at which AIs can perform research depends strongly on the capacity of the hardware on which they run. There may be diminishing returns to hardware at a given level of algorithmic progress, but a project focused on improving its own algorithms might continue to benefit steeply from additional hardware. [...]
The first general intelligence with the specialized skills necessary to subvert networked computing hardware might have enough of an advantage over existing defenses to quickly acquire direct access to a large fraction of computing hardware. Much computing hardware could potentially be subverted unnoticed for a substantial period of time (e.g. personal computers during times that they are not in use by their owners).
Right now there is a lot of undefended computer hardware, as well as poorly defended money in the financial system. A just-barely-general AI with good hacking skills could use these undefended resources to bootstrap itself into a proper superintelligence. The harder it is to access a huge reservoir of undefended resources, the lower the chance of an accidental or rogue intelligence explosion (which is less likely to be successfully value-aligned).
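To make the “diminishing returns to hardware” point from the quoted writeup slightly more concrete, here is a minimal sketch; the functional form and the symbols r, H, c, and β are my own illustrative assumptions, not something from the writeup:

```latex
% Toy model: research output rate r as a function of subverted hardware H.
% With 0 < beta < 1 there are diminishing returns to additional hardware,
% but a project improving its own algorithms (raising c over time) can
% still benefit substantially from grabbing more H.
r(H) = c \, H^{\beta}, \qquad 0 < \beta < 1
```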
Transparency
From my writeup:
If a singleton is feasible, then even if it does not appear likely to happen by default based on short-run economic incentives, actors with foresight might invest in creating one in order to capture outsized long-run benefits by having all available resources used to satisfy their preferences. Other agents with foresight, anticipating this possibility, then face an even stronger incentive to create a singleton, because they no longer perceive themselves as facing a choice between controlling a singleton and participating in a multipolar outcome with mutually beneficial trade, but between winning everything and getting nothing. In this case, the equilibrium scenario may be an “arms race” in which all parties try to create an AI singleton under their control as quickly as possible, investing all of their resources in this project.
Differences in opinion about the feasibility of a singleton are likely to amplify this tendency: if even one AI project believes that a singleton would be profitable, other AI projects have an incentive to invest in winning the arms race, in self-defense. This works even if every other AI project believes that the initiator is excessively optimistic, and that the resource expenditure necessary to create a singleton is not worth the gains over a multipolar outcome.
The possibility of secrecy is likely to amplify the effect of differences of opinion: AI projects need to worry not only about the most ambitious project they know about, but also about the most ambitious project they do not know about.
If a singleton is feasible, one factor favoring an arms race would be the fear that some other team will make a singleton first. If teams are confident that they know how fast other teams are working, they will at worst try to match the pace of the fastest actual team. If they are not, they have to account for the possibility that the team farthest ahead is one they don’t know about. Getting major AI teams talking with each other seems important for this reason.
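One rough way to formalize the incentive described above; the notation (V_s, V_m, p, q) is mine and only illustrative:

```latex
% V_s: value to a project of controlling a singleton
% V_m: value to that project of a multipolar outcome with trade
% p:   the project's subjective probability that racing succeeds
% q:   its subjective probability that no rival creates a singleton first
% A project prefers to race roughly when
p \, V_s > q \, V_m
% Secrecy about rivals' progress pushes q down (a hidden leader might exist),
% making the racing condition easier to satisfy for every project, while
% transparency keeps q anchored to the pace of the fastest known team.
```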
Safety is convergent
My initial impression of takeoff scenarios was that safety strategies might be very different under singleton vs. multipolar scenarios, but I now think that improvements in value alignment are likely to be good ideas under both types of scenario. In any situation where humans would be well advised to coordinate to prevent AI arms races and unfriendly singletons, AIs that are value-aligned and slightly smarter than humans are even more likely than unassisted humans to execute this strategy successfully.
Even if safety comes with a substantial efficiency penalty, if the vast majority of AI power is in the form of human-value-aligned AIs, then they might be strong and wise enough to form a consortium and gang up on any potential unfriendly singletons.
This means that if there is a version of the value alignment problem that is easier to solve for slightly superhuman AIs than for vastly superhuman ones, it’s worth working on, even if it’s an inadequate solution in the long run - it strictly increases the amount of friendly intelligence working on the problem, and it helps us in scenarios where there’s a substantial amount of time between the development of human-level AI and superintelligence.
Extending my research - opportunities to collaborate
I’m currently prioritizing other things, but it seems like some people found my research valuable, and I’d like to see it extended. I also keep hearing about people who want to do something about AI risk but don’t know how to help. I think I can solve one of these problems with the other.
Please let me know if you have any interest in collaborating with me on this, as a partner or research assistant. Don’t assume that you can’t do the work; let’s figure out what you can do, together. I’m happy to help anyone move the ball forward on this, in any way. If you end up needing funding, guidance, or other resources, I think it’s pretty likely we can work something out.
Potential research areas:
- Cybersecurity
  - Review of the field - talk to people working on it, figure out which subdomains are most relevant to AI risk, and figure out whether people worried about AI risk should be working on this.
- Extend theoretical models of AI scenarios
  - Validate and deepen my takeoff scenarios model:
    - Do a lit review of other work on modeling intelligence takeoffs to see what theoretical considerations I left out, and incorporate them.
    - Make the presentation more accessible - maybe the one big writeup should be broken up into a series of linked articles. This could conceivably also take the form of a wiki other people can expand, or Arbital content once that tool becomes available.
    - Talk to people doing related research and figure out whether they think the model leaves out important considerations, including ones that are currently neglected.
    - Do similar modeling exercises in different domains (e.g. how the open problems in AI safety, such as those described by MIRI or this joint paper, relate to each other, and what you would have to believe about AI development to believe that each one is relevant).
  - Mathematical modeling of different tradeoffs to inform empirical research
    - I published a sample as a proof of concept, but I’m not an economist, so it took me forever just to rederive something that turned out to be a simple differential equation expressing the Cobb-Douglas production function (see the sketch below). Anyone with experience modeling systems mathematically should be able to do this much faster, and possibly write it up more persuasively.
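For reference, here is the standard two-factor Cobb-Douglas form and its growth-rate (differential) version; this is the textbook presentation, and the exact setup in my sample may differ:

```latex
% Cobb-Douglas production function: output Y from capital K and labor L,
% with total-factor productivity A and capital share alpha.
Y = A \, K^{\alpha} L^{1-\alpha}, \qquad 0 < \alpha < 1
% Taking logs and differentiating with respect to time gives the
% growth-rate form, a simple differential relation among growth rates:
\frac{\dot{Y}}{Y} = \frac{\dot{A}}{A} + \alpha \, \frac{\dot{K}}{K} + (1 - \alpha) \, \frac{\dot{L}}{L}
```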
I want to have a viable project lined up before I start actively looking for funding, since I suspect it will be hard to find someone who can do the work and has the time to do so, but if you like what you've seen so far and would be excited about funding any of this or something related, please let me know that too.
Cross-posted from my research blog.