What If: Observational Studies

Thu Aug 10 2023

Series: Working Through 'What If?'

Identifiability Conditions

Correlation vs. Causation

Ah, the famous phrase: correlation does not equal causation. In the previous chapters we were talking about randomized controlled trials, which gave us some nice guarantees about the causal effects of our treatment. But when we look at observational data, a whole mess of problems arise.

Before we start, some definitions of the identifiability conditions:

  1. Consistency: The treatment being observed could otherwise be articulated as an intervention under experimental conditions.
  2. Exchangeability: The probability of any particular treatment value depends only on measured covariates.
  3. Positivity: Every treatment value has a nonzero probability of occurring within every combination of covariate values.

If you have all of these things in your observational data, then you can just treat it like a randomized controlled trial. Unfortunately, knowing that you have all of them is basically impossible. We’ll later cover some of the stuff we can do to compensate for the tribulations of reality.

Exchangeability In Observed Data

In randomized controlled trials, randomization gives us exchangeability by default (and conditional randomization gives us conditional exchangeability). In observational studies, it’s likely that at least some reasons for treatment are correlated with the outcome. We can refer back to our example in our exploration of chapter 2, where we were hypothesizing that screaming at people might reduce their likelihood of getting cancer. Recall that we split our likelihood of treatment based on whether or not the person was a smoker.

If instead, our data came from an observational study, and smoking status was the only factor that was not distributed equally between those who did and did not receive the treatment, then we could characterize that data as if it belonged to a conditional randomized trial.

Here’s the trick, though: how could we ever actually know that smoking is the only factor that is not distributed equally? We can’t really. But if we assume that it is, then we can use the data as if it were from a conditional randomized trial.
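
To make that concrete, here’s a minimal simulation sketch (Python, with invented numbers). Treatment depends only on smoking status, so conditional exchangeability holds by construction, and standardizing the stratum-specific risks over the smoking distribution recovers the true effect:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200_000

# Smoking status is the ONLY thing driving treatment, so conditional
# exchangeability holds by construction here.
smoker = rng.binomial(1, 0.4, n)
treated = rng.binomial(1, np.where(smoker == 1, 0.75, 0.25))  # got yelled at?

# True effect: being yelled at lowers cancer risk by 0.10 in both strata.
cancer = rng.binomial(1, 0.30 + 0.15 * smoker - 0.10 * treated)

df = pd.DataFrame({"smoker": smoker, "treated": treated, "cancer": cancer})

# The crude comparison is confounded by smoking...
crude = df.groupby("treated")["cancer"].mean()
print("crude risk difference:", crude[1] - crude[0])

# ...but standardizing stratum-specific risks over P(smoker) recovers ~ -0.10.
risks = df.groupby(["smoker", "treated"])["cancer"].mean().unstack("treated")
p_smoker = df["smoker"].value_counts(normalize=True)
print("standardized risk difference:", ((risks[1] - risks[0]) * p_smoker).sum())
```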

Just for fun, let’s try to justify this assumption here.

Doctors prefer to yell at smokers, because they are more likely to leave the hospital without permission to go outside to smoke. However, getting yelled at is not a predictor of getting cancer*, so we can assume randomization within the “smoker” stratum.

*Well, maybe it does.

But let’s say that unbeknownst to us, doctors are more likely to yell at people who are young (out of respect to our older population). Then we have a problem - our data no longer has exchangeability because we have a factor that is not distributed equally - more young people are getting the treatment than old people. And age certainly is a predictor of getting cancer.

Our problem reflects a more general problem in observational studies - we can’t know what we don’t know. And even if we’re lucky enough to actually have conditional exchangeability, we can’t know for sure that we do.
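
Here’s a sketch of what that failure looks like, varying the simulation above so that being young also drives treatment but never gets recorded. Adjusting for smoking alone still leaves us with a biased estimate, and nothing in the data warns us. Again, the numbers are invented for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200_000

smoker = rng.binomial(1, 0.4, n)
young = rng.binomial(1, 0.5, n)  # affects treatment and outcome, never recorded

# Doctors yell more at smokers AND at the young (the part we don't know about).
treated = rng.binomial(1, 0.15 + 0.35 * smoker + 0.30 * young)

# Being young lowers cancer risk; the yelling itself truly does nothing here.
cancer = rng.binomial(1, 0.30 + 0.15 * smoker - 0.15 * young)

df = pd.DataFrame({"smoker": smoker, "treated": treated, "cancer": cancer})

# Adjusting for smoking alone still shows a spurious "protective" effect.
risks = df.groupby(["smoker", "treated"])["cancer"].mean().unstack("treated")
p_smoker = df["smoker"].value_counts(normalize=True)
print("smoking-adjusted risk difference:",
      ((risks[1] - risks[0]) * p_smoker).sum())  # clearly below 0; truth is 0
```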

Positivity In Observed Data

In experiments, positivity is a given. We assume that at least some people will be assigned to each treatment. Observational studies are much trickier. Maybe doctors always yell at smokers, and never yell at non-smokers. Then we have a problem.

Within the “smoker” stratum nobody ever goes untreated, and within the “non-smoker” stratum nobody is ever treated - some treatment values have zero probability given the covariates. This is the positivity problem.
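
One thing we can at least do is check the observed data for empty cells. A zero count doesn’t prove a structural violation (maybe we were just unlucky), but it’s the red flag. A minimal sketch:

```python
import pandas as pd

# Hypothetical extreme: doctors always yell at smokers, never at non-smokers.
df = pd.DataFrame({
    "smoker":  [1, 1, 1, 0, 0, 0],
    "treated": [1, 1, 1, 0, 0, 0],
})

counts = pd.crosstab(df["smoker"], df["treated"])
print(counts)

# A zero cell means some covariate combination never gets some treatment value.
print("positivity looks ok:", bool((counts > 0).all().all()))
```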

Consistency In Observed Data

There are two facets of consistency that get complicated in observed data. The first is that our treatment must be well-defined. In our example, we were yelling at people. But what if we were yelling at them to do different things? Maybe we were yelling emotionally supportive things at some people, and yelling angrily at others. Maybe the stress of being yelled at angrily is a predictor of getting cancer, and the experience of being aggressively emotionally supported reduces rates of cancer.

Since the treatment actually spans two different treatments, but we aren’t privy to the difference, our treatment is not well-defined.

Now, we don’t actually need to know every last detail of the treatment. It would be enough for our treatment to be sufficiently well-defined. Of course that does little to lessen our burden, as we can’t ever really know if our treatment is sufficiently well-defined.

The second facet of consistency is that our extremely well-defined treatment may not even exist in the observed data. Maybe we defined our treatment to be “yelling at people to do jumping jacks at 9am on a Tuesday” and all experts agree that this makes the treatment sufficiently well-defined. But then we take a look at our data, and we find that there are no instances at all of people being yelled at to do jumping jacks at 9am on a Tuesday. Oh no! One way to solve this problem is to assume treatment variation irrelevance. That just means that we consider the treatments we actually observe a good enough approximation of the treatment we defined. Maybe we have lots of examples of doctors yelling at people to do jumping jacks at 9am on a Wednesday, for instance, and that’s probably good enough.
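
If we do make a treatment variation irrelevance assumption, one way to keep ourselves honest is to write it down explicitly, say as a set of observed treatment versions we’re willing to count as our defined treatment. A hypothetical sketch (all the version strings are made up):

```python
import pandas as pd

# Hypothetical records of what doctors actually yelled.
observed = pd.DataFrame({
    "version": [
        "do jumping jacks (Wednesday, 9am)",
        "do jumping jacks (Wednesday, 9am)",
        "get off the table, I'm working here!",
    ],
    "cancer": [0, 1, 1],
})

# Our treatment variation irrelevance assumption, made explicit: these
# observed versions count as "the" defined treatment; everything else doesn't.
irrelevant_variations = {"do jumping jacks (Wednesday, 9am)"}

usable = observed[observed["version"].isin(irrelevant_variations)]
print(f"{len(usable)} of {len(observed)} records usable under our assumption")
```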

But this assumption is certainly not always reasonable. For instance, what if in our data, we only have examples of doctors screaming at patients to “get off the table, I’m working here!”? That’s probably not a good enough approximation of our treatment.

In practice, there is often some middle ground. We make some assumptions of treatment variation irrelevance, and then try to make those assumptions as transparent as we possibly can. And then invite other experts to tell us why we’re wrong.

The Target Trial

The target trial makes explicit the experiment that we are trying to emulate with our observational data. The whole point is to design an experiment that we can’t actually run, and then argue that our observational data is a good enough approximation of that experiment.

Now some people will argue that it’s not important to fully define the target trial. The common argument is that “who cares exactly what the causal effects are, so long as we understand that there is a causal effect”.

To liken this argument to our example, it’s like saying “who cares if yelling at people to do jumping jacks at 9am on a Tuesday reduces cancer rates, so long as we know that if we just yelled at people, they would be at reduced risk for cancer”.

The problem (so says the author) is that unspecified interventions may be impractical. If the true causal effect really really depends on yelling supportive platitudes at people, and we then conclude that “yelling at people reduces cancer rates”, we may be in for a rude awakening when we try to implement our findings.

Another issue relates to conditional exchangeability. If our intervention is not well-defined, then we have an even more complicated problem with exchangeability than we already do with observational data. That is, different versions of the treatment might have a totally different set of covariates. So not only do we have to worry about whether or not the factors influencing treatment are distributed reasonably, we have to worry about whether or not they’re even the same factors. A well-defined intervention at least helps reduce the severity of this issue.
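
One way to make the target trial explicit is to literally write the protocol down as structured data. Here’s a hypothetical sketch - the field names loosely follow the usual protocol components (eligibility, treatment strategies, assignment, outcome, follow-up, causal contrast), but the specific values are invented for our yelling example:

```python
from dataclasses import dataclass

@dataclass
class TargetTrial:
    """A hypothetical, minimal spec of the trial we wish we could run."""
    eligibility: str
    treatment_strategies: tuple[str, ...]
    assignment: str
    outcome: str
    followup: str
    causal_contrast: str

trial = TargetTrial(
    eligibility="hospitalized adults with no prior cancer diagnosis",
    treatment_strategies=(
        "yell 'do jumping jacks' at 9am on a Tuesday",
        "do not yell at all",
    ),
    assignment="randomized within smoker / non-smoker strata",
    outcome="cancer diagnosis within five years",
    followup="from admission until diagnosis, death, or five years",
    causal_contrast="difference in five-year cancer risk between strategies",
)
print(trial.treatment_strategies)
```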

On prediction vs. causal inference

When we can’t make a reasonable target trial, it’s still fair game to say that we have found a predictor. This is way weaker than a causal effect. It just means that we have found a pattern in the data that we can use to predict the outcome. We may find that “whether or not a person has been yelled at” is a good predictor of their cancer risk. But without our identifiability assumptions and careful design of our target trial, we can’t say with any certainty that “yelling at people reduces cancer risk”.
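
For instance, here’s a toy simulation (invented numbers) where being yelled at is a genuinely useful predictor of cancer, even though yelling has exactly zero causal effect - illness drives both:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 100_000

# Sick people get yelled at more AND are more likely to develop cancer;
# the yelling itself does nothing.
sick = rng.binomial(1, 0.3, n)
yelled_at = rng.binomial(1, 0.1 + 0.6 * sick)
cancer = rng.binomial(1, 0.05 + 0.30 * sick)

df = pd.DataFrame({"yelled_at": yelled_at, "cancer": cancer})
print(df.groupby("yelled_at")["cancer"].mean())
# Being yelled at genuinely predicts cancer risk here, but any causal
# claim about yelling would be flat wrong: the true effect is zero.
```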

Causation is not Correlation, as the story goes.