Existential risk is the risk of an event that would wipe out humanity. That means losing not just present lives, but all the future ones too. Populations can recover from lots of things, but not from complete extinction. If you value future people at all, you might care a lot about even a small reduction in the probability of an extinction event for humanity.

There are two big problems with reasoning about existential risk:

1) By definition, we have never observed the extinction of humanity.

This means that we don't have an uncensored dataset to do statistics on - extrapolating from the past will give our estimates anthropic bias: people only observe the events that don't wipe out humanity, but events that wipe out humanity are possible. Therefore, our past observations are a biased subset that makes the universe look safer than it is.

2) Our intuitions are terrible about this sort of thing.

To reason about existential risk, we have to tell stories about the future, think about probabilities, sometimes very small ones, and think about very large numbers of people. Our brains are terrible at all these things. For some of these beliefs, there's just no way to build a "feedback loop" to test your beliefs quickly in small batches - and that's the main way we know how to figure out when we're making a mistake.

Moreover, we have to do this on a topic that evokes strong emotions. We're not talking about the extinction of some random beetle here. We're talking about you, and me, and everyone we know, and their grandchildren. We're talking about scary things that we desperately don't want to believe in, and promising technologies we want to believe are safe. We're talking about things that sound a lot like religious eschatology. We're talking about something weird that the world as a whole hasn't quite yet decided is a normal thing to be worried about.

Can you see how rationality training might be helpful here?

I'm writing this on a flight to Oakland, on my way to CFAR's workshop to test out in-development epistemic rationality material. A few months ago, I expressed my excitement at what they've already done in the field of instrumental rationality, and my disappointment at the lack of progress on training to help people have accurate beliefs.

During conversations with CFAR staff on this topic, it became clear to me that I cared about this primarily because of Existential Risk, and I strongly encouraged them to develop more epistemic rationality training, because while it's very hard to do, it seems like the most important thing.

In the meantime, I've been trying to figure out what the best way is to train myself to make judgments about existential risks, and about the best thing I can do to help mitigate them.

It turns out that it's really hard to fix something without a very specific idea of how it's broken. So I'm going to use the inadequate tools I have, on this flight, without access to experts or the ability to look up data, to build the best bad estimate I can of:

How much X-risk is there?

How important is mitigating it compared to saving a life now?

What works to mitigate existential risk, and how well?

What are the best next steps for me?

__How Much X-Risk Is There?__

Off the top of my head I can think of three methods for quantifying just how likely humanity is to die off soon:

Laplace's law of succession

Enumerate the risks and figure out how likely each is

Meta-considerations like the Doomsday Argument

Enumerating every single existential risk seems infeasible, and bias-prone. I'm going to miss a lot of them (which will make my estimate low), but the ones I do hit on will seem more probable (because of the availability heuristic), so this method seems very, very noisy.

I understand the math behind the arguments like the Doomsday Argument and the Great Filter well enough to follow them, a little bit, but not well enough to actually make predictions. (This has gone on my list of things to get better at. Soon I'll be looking into this, this, and this as starting points.)

__Laplace's Law of Succession__

Laplace's Law of Succession is a very simple Bayesian way of estimating the probability an event will happen one more time, if all you have observed is a string of unbroken "successes" - a succession. The classic case is, "what is the probability that the sun will rise tomorrow?"

It turns out that if you assume the event recurs with a constant probability, then after observing n successes and no failures, your estimate of the probability of success next time is (n+1)/(n+2). This isn't very useful when you have a strong prior. For example, if you get a coin in change at a store, and it comes up heads the first two times you flip it, it's silly to infer that the probability of heads next time is 3/4, because you have strong reasons to believe it's not a weighted coin.
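Here is a minimal Python sketch of that rule, assuming a uniform prior over the unknown per-trial success probability (which is what gives the (n+1)/(n+2) form). The function name `rule_of_succession` is made up for illustration.

```python
from fractions import Fraction


def rule_of_succession(n_successes: int) -> Fraction:
    """Estimated probability that the next trial succeeds,
    after n_successes successes and no failures.

    Assumes a uniform prior on the unknown success probability.
    """
    if n_successes < 0:
        raise ValueError("n_successes must be non-negative")
    return Fraction(n_successes + 1, n_successes + 2)


# The coin example from the text: two heads in a row.
coin_estimate = rule_of_succession(2)
print(coin_estimate)  # 3/4
```

With zero observations the rule returns 1/2, which is just the mean of the uniform prior.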

Since I lack any kind of reasonable prior information about existential risk, Laplace's Law is a promising approach. It makes a huge unjustified assumption that risk doesn't change over time, but until I have a way of quantifying how that risk changes over time, it's better than nothing. Also, it's much easier to do calculations with than more sophisticated Bayesian approaches.

The first thing I'm going to look at is, how long have humans been able to accidentally destroy the world? Even a large, highly motivated, skilled, intelligent, and coordinated team of paleolithic humans would have had trouble getting anywhere close. On the other hand, by the time Stanislav Petrov had to singlehandedly save humanity from possible extinction, we had intercontinental nuclear missiles, basic artificial intelligence, international travel (i.e. disease vectors), and a functional academy churning out scientific breakthroughs daily, any one of which might have enabled some smart idiot to end the world. Many of these factors are quite recent, so I'm going to say that people have only been able to destroy the world since around 1960. (This is a Fermi estimate; I'm looking for numbers that are correct within an order of magnitude.)

It's been 54 years since 1960, and in every single one of those years, the world has survived. Therefore, by Laplace's Law of Succession, the annual probability that we don't kill ourselves off by accident is 55/56, or about 98.2%. That implies a 1.8% chance we wipe ourselves out in any given year. Wow, that's high.

To give you an idea how high that risk is, let's assume that it doesn't get any worse, or any better - then the number of years we have left follows a geometric distribution, with parameter p=0.018. The expected value of the geometric distribution is 1/p=1/0.018, or about 55 years.
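The arithmetic above can be checked with a few lines of Python. This is only a sketch under the same constant-risk assumption; the variable names are illustrative. Using the unrounded risk of 1/56 gives an expected 56 years, and the rounded 0.018 gives about 55.6, which is the same answer at Fermi precision.

```python
years_survived = 54  # years since 1960, as counted in the post

# Rule of succession: P(survive next year) = (n + 1) / (n + 2)
p_survive = (years_survived + 1) / (years_survived + 2)
p_doom = 1 - p_survive  # annual extinction probability, 1/56

# Mean of a geometric distribution whose per-year "hit" probability is p_doom
expected_years_exact = 1 / p_doom
expected_years_rounded = 1 / 0.018  # using the rounded figure from the text

# Chance of still being around after 55 more years at the same constant risk
p_survive_55 = p_survive ** 55

print(f"annual risk:            {p_doom:.4f}")
print(f"expected years (exact): {expected_years_exact:.1f}")
print(f"expected years (0.018): {expected_years_rounded:.1f}")
print(f"P(survive 55 years):    {p_survive_55:.2f}")
```

Note that the expected value is a mean, not a deadline: under this model there is still a bit better than a one-in-three chance of lasting past the 55-year mark.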

55 years is not a lot of time to get our act together.

That's not even counting the risk from things that aren't man-made. We've been around for maybe a million years (likely off by at least an order of magnitude, but I don't know which way, and I'm not looking things up because there would be no end to that), which - again, by Laplace's law of succession - means we've got about a one in a million chance each year of dying off because of an asteroid strike or a supernova or video games finally decisively outcompeting having kids or the eventual heat death of the universe. One in a million means the expected number of people who die each year because of this, out of a world population of 7 billion, is about 7 thousand - not huge, but non-negligible. (Of course, in reality it either happens to all of us or none of us, and doesn't repeat from year to year - but it's at least as important as something that reliably kills a thousand people each year. More, if you care about the future at all.)
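The same rule applied to the million-year guess looks like this in Python. Both inputs are the rough figures from the paragraph above, not looked-up data, and the variable names are illustrative.

```python
years_observed = 1_000_000        # rough guess at how long humans have been around
world_population = 7_000_000_000  # rough current population

# Rule of succession: annual risk = 1 - (n + 1) / (n + 2) = 1 / (n + 2)
p_natural_extinction = 1 / (years_observed + 2)

# Expected deaths per year, treating the risk as spread evenly over everyone
expected_deaths_per_year = world_population * p_natural_extinction

print(f"annual natural risk:      {p_natural_extinction:.2e}")
print(f"expected deaths per year: {expected_deaths_per_year:,.0f}")
```

Shifting the million-year guess by an order of magnitude in either direction moves the expected deaths by the same factor, so this number is only as good as that guess.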

That also doesn't account for some of the truly weird existential risks, like the possibility that we're living in a simulation which will be too expensive to maintain if we do anything really complicated, but I have absolutely no idea what that would imply anyway, so I'm going to ignore that for now, not because it's good epistemic hygiene but because otherwise I simply will not finish this blog post.

So humanity has an expected 55 years left. What then?

__How Big a Deal Is This?__

The obvious answer is true, but I'm going to go through the exercise anyway - it may help me notice skills I'd need to make comparisons between different existential risks.

Let's take the simplest case first, even though it leaves out a lot of things that favor the importance of X-risk over other considerations.

If all I care about is people currently alive (a huge untrue assumption), then at least in the "effective altruism" space it's easy to calculate the relative importance of things, because we're comparing like with like. X-Risk is about as big a deal as anything else killing off 1.8% of the world's population each year, or about 126,000,000 people each year. That's probably bigger than any other one problem - but about the same order of magnitude as the problem that everyone's body stops working with age. Most people die of age or age-related medical problems, and people live to about 70, so again under a naive geometric model that gives us a similar probability - each person has about a 1.4% chance of dying due to non-existential events each year.

War is a lot smaller, probably by an order of magnitude, but still pretty material. Maybe something like 1 in 10 deaths are due to violence of some kind, so that's 0.14% per year, leaving 1.26% per year for medical and age-related reasons. You can divide that into infectious diseases like TB and malaria, and mainly age-related problems, and I'm going to guess a 1:4 ratio, just to have a nice clean 1% chance of age-related death per person per year, and 0.25% chance of death by bacteria or virus.
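To keep the guesses in the last two paragraphs straight, here is a small Python tally. Every input is one of the rough guesses above (1.8% annual X-risk, a 1.4% annual all-cause death rate from a 70-year lifespan, 1 in 10 deaths from violence, a 1:4 split between infection and aging); none of it is real mortality data, and the names are illustrative.

```python
world_population = 7e9

xrisk_annual = 0.018      # from the rule-of-succession estimate
all_cause_annual = 0.014  # roughly 1/70, the naive geometric model

violence = 0.1 * all_cause_annual      # guess: 1 in 10 deaths are violent
medical = all_cause_annual - violence  # everything else
infectious = medical * 1 / 5           # guessed 1:4 split, infection : aging
age_related = medical * 4 / 5

xrisk_deaths = xrisk_annual * world_population  # expected deaths per year

for label, rate in [
    ("existential risk", xrisk_annual),
    ("age-related", age_related),
    ("infectious disease", infectious),
    ("violence", violence),
]:
    print(f"{label:<20} {rate:.2%} per year, "
          f"about {rate * world_population:,.0f} deaths")
```

All four rates land within roughly an order of magnitude of each other, which is the comparison the next paragraph relies on.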

I'm so used to X-Risk massively dominating every other possible effective charity, it surprised me that if all you care about is people alive now, "cure aging" is about as promising as "eliminate existential risk", if you judge by the magnitude of the problem alone. And "cure all infectious diseases" loses by less than an order of magnitude - which is well within the margin of error for this kind of back-of-the-envelope calculation.

Most of us do want to consider the value of future lives, though, even if this makes things more complicated. To a first approximation, nothing except existential risk affects how many people live in the future much at all. (As heartbreaking as it is, people do seem to choose how many kids to have in order to get the right expected number of surviving children, which means I don't expect aging, infection, or violence to matter much for this.)

So how many future people are there, and how much are their lives worth to me?

__The Worth of Future Lives__

So, how much are future lives worth? The reason I left this out of the last section is that it's a really hard problem I don't know how to solve. I'll try anyway - problems don't go away just because they're hard - but my confidence in this part is very, very low.

Utility functions can't be measured, but indifference curves can. So I asked myself how I'd manage a few tradeoffs involving coin flips (e.g. if there are 1,000 people in the universe, would I take a 50/50 chance of 100 vs. 10,000 people left in the universe over a 100% chance of 1,000 people), and I'm going to provisionally approximate my "altruistic" utility function as the base-10 log of the total number of person-life-years lived. There are about 7x10^9 people alive now, each of whom will live for about 30 more years, so I value that at about log(2x10^11), or about 11. If the world's population stays about the same size (which it might, sort of, with the demographic transition), we have about 5 billion years before the sun blows up, for a utility of around log(7x10^9x5x10^9)=log(4x10^19), or about 19. Then if we get off this rock, that's a lot more people over a lot more universe. I don't really remember how big or long the universe is supposed to be, so I'm just going to make a wild guess of an average of a trillion people over a trillion years, for a score of log(10^12x10^12)=24.

By comparison, saving one current child's life gets a score of about log(7x10^9+1)-log(7x10^9), or about 6x10^-11.
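These scores can be reproduced with a short Python sketch, assuming the log is base 10 (which is what makes the scores line up with the exponents). The helper name `log_score` is made up, and the post's "about 11" and "about 19" are the results below rounded down to the exponent. The first check shows that a log utility is indifferent in the coin-flip example above.

```python
import math


def log_score(quantity: float) -> float:
    """Base-10 log utility of a count of people or person-life-years."""
    return math.log10(quantity)


# Coin flip: 50/50 between 100 and 10,000 people vs. a certain 1,000 people
gamble = 0.5 * log_score(100) + 0.5 * log_score(10_000)
certain = log_score(1_000)

# Scenario scores in person-life-years, using the post's rough inputs
score_now = log_score(7e9 * 30)         # everyone alive, 30 more years each
score_until_sun = log_score(7e9 * 5e9)  # steady population for 5 billion years
score_expansion = log_score(1e12 * 1e12)  # a trillion people, a trillion years

# One extra person out of 7 billion: log10(1 + 1/7e9), computed via log1p
# to avoid subtracting two nearly equal floating-point numbers
score_one_child = math.log1p(1 / 7e9) / math.log(10)

print(f"gamble vs. certain: {gamble:.2f} vs. {certain:.2f}")
print(f"now:        {score_now:.2f}")
print(f"until sun:  {score_until_sun:.2f}")
print(f"expansion:  {score_expansion:.2f}")
print(f"one child:  {score_one_child:.1e}")
```

The gap between the scenario scores (whole units) and a single life (around 10^-11) is the point: on a log scale, anything that changes the order of magnitude of the future swamps everything else.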

Curing aging without solving existential risk increases the number of life-years left in the short run (remember, we've only got 55 years left). Let's say in the next 55 years, about half the world would have died of age-related problems, losing an average of 27.5 years each. So that's 3.5x10^9x27.5, or about 10^11 life-years, so the improvement in score is log(2x10^11+10^11)-log(2x10^11), or about 0.18.

Judging by those scores, curing aging is way less interesting than completely eliminating existential risk between now and when the sun blows up. But there's another problem: most likely, if we cure aging, it will stay cured. But if we cure existential risk, it might come back - anything that relies on continued political action is only as stable as the institutions that execute it. If avoiding nanotech risk means making sure every nanotech developer in the world is overseen by a competent safety board, it might just take 100 years with an incompetent board, or a sneaky researcher, or governments that decide it's no longer worth enforcing, or a breakdown of the One World Government, out of the many millions of years you thought you "saved", to bring the risk back and ruin all your good work.

I don't have an easy way to calculate this, to replace the geometric model of existential risk, so I'm going to end the post here, and write another one on the ground after the CFAR workshop. Let me know if you have any ideas.

Once I attempt that, I'm going to think through how my current actions can affect the future, how much that depends on my model, where the uncertainty is, and then go to experts on this sort of thing to patch the most important gaps. But I had to do this first to know what needed patching.
