Taking integrity literally

Simple consequentialist reasoning often appears to imply that you should trick others for the greater good. Paul Christiano recently proposed a simple consequentialist justification for acting with integrity:

I aspire to make decisions in a pretty simple way. I think about the consequences of each possible action and decide how much I like them; then I select the action whose consequences I like best.

To make decisions with integrity, I make one change: when I imagine picking an action, I pretend that picking it causes everyone to know that I am the kind of person who picks that option.

If I’m considering breaking a promise to you, and I am tallying up the costs and benefits, I consider the additional cost of you having known that I would break the promise under these conditions. If I made a promise to you, it’s usually because I wanted you to believe that I would keep it. So you knowing that I wouldn’t keep the promise is usually a cost, often a very large one.
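
One way to make this rule concrete is the minimal Python sketch below. The function name `choose`, the action labels, and all the numbers are made up for illustration; only the rule itself (add the value of everyone knowing your policy to the ordinary consequences) comes from the passage above.

```python
def choose(actions, direct_value, reputation_value=None):
    """Pick the action whose total consequences I like best.

    direct_value(a): how much I like the ordinary consequences of a.
    reputation_value(a): how much I like everyone knowing that I am the
    kind of person who picks a here (the integrity adjustment).
    """
    def total(a):
        extra = reputation_value(a) if reputation_value else 0.0
        return direct_value(a) + extra

    return max(actions, key=total)


# Hypothetical payoffs: breaking the promise is locally convenient,
# but being known as a promise-breaker is very costly.
direct = {"keep_promise": 0.0, "break_promise": 3.0}
known_policy = {"keep_promise": 0.0, "break_promise": -10.0}
actions = list(direct)

naive = choose(actions, direct.get)                           # "break_promise"
with_integrity = choose(actions, direct.get, known_policy.get)  # "keep_promise"
```

The only difference between the two calls is the extra term, which is the sense in which the proposal is "one change" to simple consequentialism.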

Overall this seems like it’s on the right track – I endorse something similar. But it only solves part of the problem. In particular, it explains interpersonal integrity such as keeping one's word, but not integrity of character.

The murderer at the door

In his essay On a Supposed Right to Lie from Philanthropy, Immanuel Kant famously considers the case of someone who comes to your front door intending to murder a person hidden in your house. He decides that it is immoral to lie to them and say their target isn't home, because lying undermines the thing that makes your words believable in the first place. This is a controversial but straightforward application of his attempt at summing up all of morality, the Categorical Imperative:

Act according to maxims you would universalize.

A lot of people object to this because it seems like complicity in murder. However, I think it's pretty defensible, if you remember that you can much more efficiently optimize your heuristics and habits than each particular act. If your environment is mostly friendly and benefits from people being honest, then it makes sense to invest in a mental architecture optimized for truth rather than expedience. If you are part of a friendly information processing system, then you don't have to reason in an explicitly consequentialist way about which words to say – instead, you can aim at a simpler target: just try to accurately inform the group.

When I want to come up with a speech-act to accomplish a certain goal, the plans my mind generates are ones that accomplish the goal by informing people of reasons for it. If I want to conceal some information from others, that's an additional constraint that's not natural for me to think of. This gives me an advantage in long-run cooperation. By contrast, if I found it more natural to think of the people around me as social objects whose levers can be worked by my words, I'd be better at social manipulation and affiliation, but worse at transferring nonsocial information or evaluating arguments.

The honest person's response to the murderer at the door might be to shut the door quickly, yell at them to go away, or try to fight them off. If it occurs to you to lie, fine, but you shouldn't try to become more like the person to whom that idea occurs, as long as it's an exceptional case and you want more truthfulness on net.

This argument cuts across at least the simple, stereotyped versions of the deontological, virtue-ethical, and consequentialist categories: you obtain the best consequences if you train decision heuristics that are absolutely generalizable rather than unprincipled local optimizations.

Anne Frank and resistance in an unfriendly environment

People often give a superficially similar counterexample to general arguments against lying: What if Anne Frank's family were in your attic, and the Nazis were at the door? The difference between Kant’s hiding someone from a murderer and the Anne Frank hypothetical is the risk of moral pollution. The random murderer is an outlaw – the Nazis were the legitimate government. At least in Germany, they had a mandate to govern. Here, your general interest in honesty is substantially attenuated – you might wish to deny cooperation to your society in a lot of ways, not just this one.

I posed this problem to Paul in the comments to his post. Here was part of his reply:

If you hide Anne Frank’s family you don’t need to imagine that you are letting your neighbors know that you are a political dissident. You are letting people know that if you were a political dissident (or had drastically opposed views about what is good, or whatever underlying difference it is), then you would behave subversively.

This conditional is not in fact a safe thing to let people believe about you under a Nazi regime. The Nazis, like many organizations trying to control rather than cooperate with human agency, were not very keen on freethinkers, even ones who happened to agree with them a lot (or were willing to pretend). More generally, there are many conformist groups that punish members who seem like they're not automatically going along with the group, even if they happen to largely agree on the object level. Such groups view having principles as fundamentally unfriendly. If you aren't being swept along with their feelings, you are suspect.

Social rituals based around rhythm or momentum create a situation where if you're stopping to check what you think of what's going on, you're too slow to keep up. This allows the group to identify and freeze out nonconformers. To tune in, you cannot have the sort of mental friction – the sort of cognitive resistance – that having your own beliefs requires.

The "what would other people think?" theory of integrity is too leaky to stand up to implied peer pressure; as specified, it permits social expectations to pressure you into arbitrary actions by declaring them important signals of integrity. Without some principled basis for excluding some types of influence as hostile, this sort of integrity capitulates as soon as a sufficiently violent gang demands that you surrender your principles.

The sort of person who gets along well with the Nazis is the Adolf Eichmann type, who seems to have believed in something larger than himself without particularly caring about the specific content of that larger something. With this process, there can be no negotiated solution – because it is opposed to negotiated solutions, and in favor of illegible attunement. With this sort of process, there can be no epistemic cooperation – because it views having an epistemology as inherently hostile. Trying to keep your map clean and model social reality as distinct from the territory is the first step in joining the resistance.

If you’re considering hiding Anne Frank’s family in your attic, and you consider the objection, “I highly value and benefit from collaboration with my Nazi neighbors in many areas of life. They would be really disappointed in me if they found out I wasn’t on board with the plan to murder all the Jews,” the moral weight of this objection (as opposed to the pragmatic weight) should come out to zero or nearly zero. It does not seem to me impractical to reliably reach this conclusion: many people in countries under Nazi occupation did so, refused to collaborate, and often saved many lives by their refusal.

What do I mean by the moral weight of this objection? Moral considerations are not just a certain subset of values or ends that might be weighed against other ends. There are different modes of analysis, and one mode is the moral mode. You’d never talk of weighing decision-theoretic considerations against other considerations – you have to include other considerations in your decision-theoretic analysis! The same is true of a moral analysis of actions.

My current intuition is that the problem lies somewhere in the distinction between cooperative aspects of relationships where you are trying to reveal info, and adversarial aspects of relationships where you are trying to conceal info.

This isn’t quite the distinction between enemies and allies. Enemies can cooperate on, e.g., limiting the scope of conflict, keeping lines of communication open for surrenders, etc. This is why spies are treated differently than soldiers; spies aren’t just enemies but unlawful enemies.

Purely utilitarian framings of moral reasoning tend to demand that you calculate on the fly when to behave as though you faced incentives to cooperate and when to behave as though your environment were adversarial. In general, the prospects of calculating such things on the fly are poor. It is more tractable to exploit structural regularities of social incentives to simplify the calculation. This is called having principles.

But there's another type of integrity that allows you to make promises independently of your surroundings. This is a structural virtue, and it's part of what I was talking about when I wrote about community and agency:

I don’t want my loyalty as a friend to be perceived as contingent on continued group membership. Some of my friends worry about becoming socially isolated if they don’t live up to the EA or Rationality community’s expectations. I want it to be intuitively obvious to my friends that if they are abandoned by everyone else, this doesn’t automatically mean they’ll be shunned by me. But when my actions are attributed to the community, then my friends don’t get to form that intuition. My actions are just read as more evidence that the community as a whole is valuable.

Integrity as in structural

Paul wrote another post on this topic, titled If we can’t lie to others, we will lie to ourselves. He offers the following example:

Suppose that I’m planning to meet you at noon. Unfortunately, I lose track of the time and leave 10 minutes late. As I head out, I let you know that I’ll be late, and give you an updated ETA.

In my experience, people—including me—are consistently overoptimistic about arrival times, often wildly so, despite being aware of this bias. Why is that?

Paul proposes that we think others will think better of us if they think that our delay is due more to bad luck than poor planning:

If I tell you a 12:05 ETA and you believe it, then you’ll attribute 5 minutes to error and 5 minutes to noise. If I tell you a 12:10 ETA and you believe it, then you’ll attribute 10 minutes to error and 0 minutes to noise. I’d prefer the first outcome.

Since we are unwilling to consciously lie to our friends, Paul argues that we have an incentive to believe, without justification, that we will arrive sooner rather than later.
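
The arithmetic in Paul's example can be spelled out in a few lines of Python. This is only a sketch under an assumption the quote leaves implicit: that I actually arrive at 12:10, exactly as late as I left. The function name `attribute_delay` is mine, not Paul's.

```python
from datetime import datetime, timedelta


def attribute_delay(scheduled, stated_eta, actual_arrival):
    """Split total lateness the way a listener who trusts the ETA would.

    error: lateness already implied by the stated ETA (read as poor planning).
    noise: further lateness beyond the stated ETA (read as bad luck en route).
    Returns (error, noise) in minutes.
    """
    error = (stated_eta - scheduled).total_seconds() / 60
    noise = (actual_arrival - stated_eta).total_seconds() / 60
    return error, noise


noon = datetime(2000, 1, 1, 12, 0)        # arbitrary date; only the times matter
arrival = noon + timedelta(minutes=10)    # assumed actual arrival: 12:10

optimistic = attribute_delay(noon, noon + timedelta(minutes=5), arrival)   # (5.0, 5.0)
accurate = attribute_delay(noon, noon + timedelta(minutes=10), arrival)    # (10.0, 0.0)
```

The total lateness is the same in both cases. Only the split between the two columns changes, and that split is what the optimistic ETA is buying.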

I used to uniformly underestimate how late I was, the way Paul describes. But then I changed. I was offended enough at my own bias that I resolved, when estimating my ETA, to relinquish attachment to my originally stated arrival time and build likely delays into my estimate. This didn't work perfectly – my estimates are probably still somewhat too optimistic on the whole – but the problem got a lot better. Now, a substantial share of the time, if I notice I'm running late, I end up arriving before my revised ETA. Not infrequently, after telling someone I'm running 5-10 minutes late, I end up arriving on time.

This shouldn't happen on Paul's model. It shouldn't be possible. What's going on?

When I'm attached to arriving somewhere at a certain time, I tend to bias my thoughts towards the desired outcome. This is helpful for steering reality towards the desired end – it can lead to generating useful ideas for shortcuts – but it produces systematically inaccurate estimates. So, when I'm generating an estimate, I imagine that I don't get to lean in the desired direction. I just watch, with my mind's eye, what happens as an equanimous version of me goes through the current plan. When I do that – when I stop trying to be there on time and start asking when I'll get there – it's easy to produce an unbiased estimate of how long things will take.

I can do this, because I have a way of accessing my beliefs and anticipations about a thing I care about, without actively corrupting them with wishful thinking. I have beliefs, distinct from my preferences. There are social incentives, but there is also objective reality. And my engagement with these is structurally distinct.

Human cognition isn't simply described by mapping perceptions onto actions. Instead, there's sometimes an intermediate step, a simplified representation of external reality. This simplified representation allows us to simulate potential courses of action, and do the one that leads to the consequences we like the best.

This epistemic layer isn't always useful – sometimes it's more efficient to simply learn a skill through practice and habituation than deduce it from first principles – but in many cases, we want to optimize our thinking on the proxy metric of accurate beliefs, on the basis of which we can make inferences about the consequences of different actions.

Let's take an example of a simple albeit nonhuman mind. DeepMind's world champion Go program, AlphaGo, has three capacities: a policy network, a value network, and tree search.

The policy network takes as an input a board position, and directly interprets some potential moves as promising. AlphaGo running the policy network alone is a formidable player.

What do the other components do? What might you want a Go program to think about, other than what the best move is?

The value network also takes as an input a board position, but instead of suggesting a move, it evaluates how advantageous the position is. The tree search is not an evaluation of a single position, but instead a process in which AlphaGo represents potential board positions after making moves its policy network likes (and the other player does the same), and then evaluates them. AlphaGo can thus internally represent a simplified summary of the possible ways the game can play out, equivalent to a much more computationally intensive brute-force search of positions, or a much more elaborate policy network.

These networks are kept separate, even if a slight change to the representation of the current board position would satisfy the value network a lot better, or the policy network dislikes the move that the tree search ends up recommending. If you let information spill over, you end up losing the structural features that let the tree search make correct predictions.
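
To make the separation concrete, here is a toy Python sketch of the same three-part structure. It is not AlphaGo: the game is a tiny stone-taking game (take 1 to 3 stones, whoever takes the last stone wins), the "networks" are hand-written functions, and a depth-limited negamax stands in for AlphaGo's Monte Carlo tree search. All names are hypothetical.

```python
import math


def policy(stones):
    """Propose candidate moves, most preferred first (a crude greedy habit)."""
    return [take for take in (3, 2, 1) if take <= stones]


def value(stones):
    """Score a position from the point of view of the player about to move.

    In this particular game, a multiple of 4 is a lost position for the mover.
    """
    return -1.0 if stones % 4 == 0 else 1.0


def search(stones, depth):
    """Look ahead over policy-proposed moves, scoring leaf positions with value().

    Returns (best_value, best_move) for the player about to move.
    """
    if stones == 0 or depth == 0:
        return value(stones), None
    best_value, best_move = -math.inf, None
    for move in policy(stones):
        child_value, _ = search(stones - move, depth - 1)
        candidate = -child_value  # what is good for the opponent is bad for me
        if candidate > best_value:
            best_value, best_move = candidate, move
    return best_value, best_move


habit = policy(5)[0]          # 3: the greedy habit leaves 2 stones, which loses
considered = search(5, 2)     # (1.0, 1): taking 1 leaves 4, which wins
```

Each function reads the same board state and none of them rewrites another's inputs or outputs. If `search` were allowed to nudge the stone count so that `value` came out better, or `policy` were allowed to veto the result, the lookahead would stop predicting the actual game.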

Likewise, when I'm trying to meet a friend, I have to be able to assess the value of different outcomes (including the social reward or punishment for various statements or actions), generate hypotheses for which actions might be interesting, and model how given actions are likely to play out in reality. This isn't a perfect analogy to a value network, policy network, and lookahead – the important thing is that I have to be able to separate these things at all, in order to accurately inform people about when I'll get somewhere.

Where Paul is absolutely right is that you can't be honest without this sort of structure. When your internal accounting is corrupted by your desire to look good, you'll give wrong answers. AlphaGo is, as far as I can tell, designed to have perfect integrity. Humans aren't so perfectly unleaky, but there are varying degrees to which we leak, and those differences matter.

4 thoughts on “Taking integrity literally”

  1. Pingback: Rational Feed – deluks917

  2. John Salvatier

    > More generally, there are many conformist groups that punish members who seem like they're not automatically going along with the group, even if they happen to largely agree on the object level. Such groups view having principles as fundamentally unfriendly. If you aren't being swept along with their feelings, you are suspect.

    Excellent sentences. I don't think I've ever seen a group where this didn't happen to some extent, though I have had 1-1 relationships where it was basically not the case.

    > I can do this, because I have a way of accessing my beliefs and anticipations about a thing I care about, without actively corrupting them with wishful thinking. I have beliefs, distinct from my preferences. There are social incentives, but there is also objective reality. And my engagement with these is structurally distinct.

    > When your internal accounting is corrupted by your desire to look good, you'll give wrong answers.

    I certainly agree with this quite a lot, but I repeatedly find myself wanting to emphasize a particular point.

    There are more reasons than social desires that your internal representations might be askew from what is "obviously true", for roughly the same reason that representations for predictive tasks (is this a cat or a dog) are often not that great for learning tasks (navigate this maze with cats in it).

    Our sense of "obvious truth" comes from a purely perceptive place rather than a perception+doing place, and this perspective ends up privileging obvious-but-not-necessarily-relevant facts over subtle-but-crucial facts. Legibility over productivity.

    1. Benquo Post author

      Glad you liked my wording 🙂

      I agree on desire to look good not being the only important distortion or filter.

