What If: Randomized Experiments
Sun Aug 06 2023
Series: Working Through 'What If?'
Ah, the gold standard. This is the great and true way to estimate causal effects, and how it’s done in pretty much all of the natural sciences. The idea is simple - if you want to know the effect of a treatment, randomly assign it to some units and withhold it from the rest, then compare the outcomes. These experiments still miss out on the counterfactuals, but barring a trusty time machine, this is the best we can do. The idea is that randomization ensures our missing data is missing at random, rather than missing due to some underlying biased process. When that holds, this is great, because we can actually be mathematically rigorous about our causal estimates. This post will be short, because I’m much more interested in causal inference from observational data, but it’s important to understand the foundations.
An Example
Imagine we have a population of 100 people, and we want to know if screaming at them (our treatment, $A$) reduces their likelihood of getting cancer (our outcome, $Y$). An unlikely hypothesis, but we must pursue the truth in all its forms. We randomly select 50 people to scream at and 50 people to not scream at; call them groups I and II respectively. Imagine we conduct this study and, lo and behold, the screamed-at group ends up with a lower cancer rate than the other, so our associational risk difference $\Pr[Y=1 \mid A=1] - \Pr[Y=1 \mid A=0]$ is negative. Wow, maybe screaming does work.
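A minimal sketch of this kind of experiment in Python. Everything here is made up for illustration: the 0.3 cancer rate, and the fact that screaming has no true effect, so the estimated risk difference should hover near zero.

```python
import random

random.seed(0)

n = 100
# Hypothetical potential outcomes for each person: the cancer rate is
# 0.3 whether or not they are screamed at (a null effect, by design).
potential = [{"a1": random.random() < 0.3, "a0": random.random() < 0.3}
             for _ in range(n)]

# Randomize: 50 people screamed at (group I), 50 left in peace (group II).
ids = list(range(n))
random.shuffle(ids)
treated, control = set(ids[:50]), set(ids[50:])

# Each person reveals only the potential outcome for their assigned arm.
risk_treated = sum(potential[i]["a1"] for i in treated) / len(treated)
risk_control = sum(potential[i]["a0"] for i in control) / len(control)
risk_difference = risk_treated - risk_control
print(f"associational risk difference: {risk_difference:+.2f}")
```

With only 50 people per arm the estimate is noisy, which is why the later table example pretends each row is a million people.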
Exchangeability
Ok whoops, we weren’t clear and communicated the wrong instructions to the lab assistants. The people who were originally intended to be in group I were actually in group II, and vice versa. Does this change our results? Well, no, it’s all fine and no one needs to get fired. This is because randomization induces exchangeability. The concept of exchangeability is pretty straightforward - if you have two groups of people, and you randomly assign them to treatment and control, then you could swap the groups and the results would be the same (in distribution). In other words, the treatment assignment is independent of the potential outcomes.
This is formalized by $Y^a \perp\!\!\!\perp A$ for all $a$, which says that the treatment assignment is independent of the potential outcomes.
I had a bit of trouble with this notation at first, so let’s break it down. Assume for a moment that we have any number of parallel universes: $Y^a$ is the outcome of $Y$ in the universe where we apply treatment $a$.
This is different from $Y \mid A = a$. That is just the outcome as we observe it, not the outcome in that parallel universe. Now, in our universe, the observed outcome matches the counterfactual outcome for the treatment we actually received, i.e. $Y = Y^{a=1}$ when $A = 1$ and $Y = Y^{a=0}$ when $A = 0$. But you can see that no matter what treatment we choose in this universe, the outcome in each parallel universe is the same.
Note that $Y^a \perp\!\!\!\perp A$ is not the same as $Y \perp\!\!\!\perp A$, as the latter is a statement about the observed outcome (which is dependent on the treatment), not the potential outcomes.
This metaphor is reasonable for randomized experiments, because the randomization ensures that the treatment assignment really is independent of the potential outcomes. If, however, this independence assumption were false (as can be the case, which we will see), then this metaphor would truly suck.
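To make the parallel-universe picture concrete, here’s a small simulation (all the rates are made up): each unit carries both counterfactual outcomes, treatment is assigned by a coin flip, and so the assignment carries no information about either counterfactual.

```python
import random

random.seed(1)

n = 100_000
# Each unit carries both counterfactuals (y_a1, y_a0), with hypothetical
# rates 0.4 and 0.2. Both exist "in parallel"; we only ever observe one.
units = [(random.random() < 0.4, random.random() < 0.2)
         for _ in range(n)]
assignment = [random.random() < 0.5 for _ in range(n)]  # A: a coin flip

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Pr[Y^{a=1} = 1] within each arm. Because A is randomized, it carries no
# information about the counterfactuals, so the two numbers are roughly
# equal (both near the true 0.4 rate).
p_treated = mean(y_a1 for (y_a1, _), a in zip(units, assignment) if a)
p_control = mean(y_a1 for (y_a1, _), a in zip(units, assignment) if not a)
print(p_treated, p_control)
```

If assignment instead depended on the counterfactuals (say, sicker people were more likely to be treated), the two numbers would diverge, and that is exactly the situation the next section sets up.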
Conditional Randomization
Let’s imagine that our data has a third column, $C$, which indicates whether or not the person is a smoker. Let’s come up with a small sample table (imagine each row represents a million people so we can ignore sampling error):
| C | A | Y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 0 | 1 | 0 |
| 1 | 1 | 1 |
| 1 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
In this case, we clearly haven’t assigned treatments randomly. We’ve instead decided on the following scheme:
- 50% of non-smokers get treatment
- 75% of smokers get treatment
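This scheme can be verified directly from the table. A quick sketch, with the eight rows hard-coded as (C, A, Y) tuples:

```python
# The 8-row table above, one (C, A, Y) tuple per row
# (each row standing in for a million people).
rows = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 0),
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 1, 0),
]

def pr_treated(stratum):
    """Pr[A=1 | C=stratum], read straight off the table."""
    group = [a for c, a, _ in rows if c == stratum]
    return sum(group) / len(group)

print(pr_treated(0))  # 0.5  -> 50% of non-smokers get treatment
print(pr_treated(1))  # 0.75 -> 75% of smokers get treatment
```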
This is not generally exchangeable. If we were to swap the treatment assignments, we could get different results because smokers may be more likely to develop cancer. It is, however, conditionally exchangeable. If we condition on the smoking status, then the treatment assignment is independent of the potential outcomes.
$$Y^a \perp\!\!\!\perp A \mid C \quad \text{for all } a.$$
It’s easy to see that the conditional version is really just the unconditional version within each subgroup of $C$.
Standardization
In our example above, how might we go about estimating the effect of treatment on the population as a whole?
We know that our conditional randomization effectively yields 2 randomized experiments, one for each subgroup of $C$. So, let’s start with calculating the risk ratios for each subgroup, reading the probabilities off the table:

$$\frac{\Pr[Y=1 \mid C=0, A=1]}{\Pr[Y=1 \mid C=0, A=0]} = \frac{0}{1/2} = 0$$

$$\frac{\Pr[Y=1 \mid C=1, A=1]}{\Pr[Y=1 \mid C=1, A=0]} = \frac{2/3}{1} = \frac{2}{3}$$
Now we want to compute the causal risk ratio for the whole population. That is, $\Pr[Y^{a=1}=1] \, / \, \Pr[Y^{a=0}=1]$. The standardization approach is to weight each subgroup’s risk by the proportion of the population that is in that subgroup. This gives us:

$$\Pr[Y^{a=1}=1] = 0 \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}$$

$$\Pr[Y^{a=0}=1] = \frac{1}{2} \cdot \frac{1}{2} + 1 \cdot \frac{1}{2} = \frac{3}{4}$$

$$\frac{\Pr[Y^{a=1}=1]}{\Pr[Y^{a=0}=1]} = \frac{1/3}{3/4} = \frac{4}{9}$$
More generally, we can write this as:

$$\Pr[Y^{a}=1] = \sum_{c} \Pr[Y=1 \mid C=c, A=a] \, \Pr[C=c]$$
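The standardization calculation is easy to check in code. A minimal sketch, again with the table hard-coded as (C, A, Y) tuples:

```python
# Standardization over the 8-row table: weight each stratum's risk
# Pr[Y=1 | C=c, A=a] by the stratum's share of the population Pr[C=c].
rows = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 0),
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 1, 0),
]

def pr_y(c, a):
    """Pr[Y=1 | C=c, A=a] from the table."""
    ys = [y for c_, a_, y in rows if c_ == c and a_ == a]
    return sum(ys) / len(ys)

def pr_c(c):
    """Pr[C=c] from the table."""
    return sum(1 for c_, _, _ in rows if c_ == c) / len(rows)

strata = (0, 1)
risk_a1 = sum(pr_y(c, 1) * pr_c(c) for c in strata)  # Pr[Y^{a=1}=1] = 1/3
risk_a0 = sum(pr_y(c, 0) * pr_c(c) for c in strata)  # Pr[Y^{a=0}=1] = 3/4
print(risk_a1, risk_a0, risk_a1 / risk_a0)  # causal risk ratio 4/9
```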