Reminders
Brief notes and examples that act as basic stats reminders.
Example - LOTP/LOTE
Assume the following true distribution: the joint distribution of sex, smoking and vegetarianism (p_grp), and the distribution of some binary outcome \(Y\) conditional on them (p_y).
library(data.table)

set.seed(1)
# all combinations of the binary covariates
d_tru <- CJ(sex = 0:1, smk = 0:1, veg = 0:1)
# linear predictor (log-odds of the outcome) with all interactions
g <- function(sex, smk, veg){
  -1 + 2 * sex - 1 * smk + 3 * veg +
    -1 * sex * smk - 3 * sex * veg + 1 * smk * veg +
    0.5 * sex * smk * veg
}
# probability of group membership
q <- rnorm(nrow(d_tru))
d_tru[, p_grp := exp(q)/sum(exp(q))]
# probability of outcome
d_tru[, p_y := plogis(g(sex,smk,veg))]
d_tru
sex smk veg p_grp p_y
1: 0 0 0 0.04225019 0.2689414
2: 0 0 1 0.09498377 0.8807971
3: 0 1 0 0.03427561 0.1192029
4: 0 1 1 0.38968687 0.8807971
5: 1 0 0 0.10989996 0.7310586
6: 1 0 1 0.03479920 0.7310586
7: 1 1 0 0.12870098 0.2689414
8: 1 1 1 0.16540341 0.6224593
Take a random sample from the population:
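The sampling code is not shown in these notes. One way it could be done, as a minimal sketch: draw each person's covariate group with probability p_grp, then draw a Bernoulli outcome with probability p_y. The sample size N = 1e6 and the names N, idx and d are assumptions, and the true distribution is rebuilt so the block runs on its own. It is not expected to reproduce the exact figures printed in these notes.

```r
library(data.table)

set.seed(1)
# True distribution, rebuilt compactly (same values as g() above)
d_tru <- CJ(sex = 0:1, smk = 0:1, veg = 0:1)
q <- rnorm(nrow(d_tru))
d_tru[, p_grp := exp(q) / sum(exp(q))]
d_tru[, p_y := plogis(-1 + 2*sex - smk + 3*veg - sex*smk - 3*sex*veg +
                        smk*veg + 0.5*sex*smk*veg)]

# Assumed sample size (not stated in the notes)
N <- 1e6
# Draw a covariate group (a row of d_tru) for each person
idx <- sample(nrow(d_tru), N, replace = TRUE, prob = d_tru$p_grp)
# Attach the covariates and draw the binary outcome given group membership
d <- d_tru[idx, .(sex, smk, veg, y = rbinom(.N, 1, p_y))]
head(d)
```

Sampling row indices keeps the covariates jointly distributed according to p_grp, rather than drawing sex, smk and veg independently.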
Recover the distribution of the covariates (the p_grp values):
sex smk veg V1
1: 0 0 0 0.0421805
2: 0 0 1 0.0950815
3: 0 1 0 0.0343139
4: 0 1 1 0.3898065
5: 1 0 0 0.1096864
6: 1 0 1 0.0348148
7: 1 1 0 0.1288739
8: 1 1 1 0.1652425
Recover the conditional distribution of the outcome (the p_y values):
sex smk veg mu_y
1: 0 0 0 0.2693899
2: 0 0 1 0.8812230
3: 0 1 0 0.1189314
4: 0 1 1 0.8805292
5: 1 0 0 0.7309010
6: 1 0 1 0.7311000
7: 1 1 0 0.2692407
8: 1 1 1 0.6217244
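Both recovery steps above amount to group-wise proportions and means. A minimal sketch follows, assuming a sample of size N = 1e6 that is re-simulated here so the block runs on its own. The sample size and the names N, idx, d, d_est and p_grp_hat are assumptions, so the printed values will differ slightly from those shown.

```r
library(data.table)

set.seed(1)
# True distribution, rebuilt compactly from the definitions above
d_tru <- CJ(sex = 0:1, smk = 0:1, veg = 0:1)
q <- rnorm(nrow(d_tru))
d_tru[, p_grp := exp(q) / sum(exp(q))]
d_tru[, p_y := plogis(-1 + 2*sex - smk + 3*veg - sex*smk - 3*sex*veg +
                        smk*veg + 0.5*sex*smk*veg)]

# Simulated sample (N is an assumed sample size)
N <- 1e6
idx <- sample(nrow(d_tru), N, replace = TRUE, prob = d_tru$p_grp)
d <- d_tru[idx, .(sex, smk, veg, y = rbinom(.N, 1, p_y))]

# Empirical group proportions (estimate p_grp) and outcome means (estimate p_y)
d_est <- d[, .(p_grp_hat = .N / nrow(d), mu_y = mean(y)),
           keyby = .(sex, smk, veg)]
d_est
```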
Estimate the probability of outcome by sex
We are interested in \(\mathbb{E}(Y | sex)\). Can we get to this via iterated expectation, i.e. the law of total expectation (LOTE)?
First question: is all we need the law of total probability (LOTP) with extra conditioning on sex? That is,
\[ \begin{aligned} Pr(y | sex = s) &= Pr(y | sex = s, smk = 0, veg = 0) Pr(smk = 0, veg = 0|sex = s) + \\ & \quad Pr(y | sex = s, smk = 0, veg = 1) Pr(smk = 0, veg = 1|sex = s) + \\ & \quad Pr(y | sex = s, smk = 1, veg = 0) Pr(smk = 1, veg = 0|sex = s) + \\ & \quad Pr(y | sex = s, smk = 1, veg = 1) Pr(smk = 1, veg = 1|sex = s) \end{aligned} \tag{1}\]
Operationalised, based on the known distributions, we get:
Pr(Y | sex = 0) Pr(Y | sex = 1)
0.7882179 0.5545841
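The calculation behind these two numbers can be sketched as follows. Pr(smk, veg | sex) is obtained by renormalising p_grp within each sex, and Equation 1 is then a weighted sum of p_y. The names p_cond and p_y_sex are assumptions, and the true distribution is rebuilt so the block runs on its own. With the same seed it should give the two probabilities printed above.

```r
library(data.table)

set.seed(1)
# True distribution, rebuilt compactly from the definitions above
d_tru <- CJ(sex = 0:1, smk = 0:1, veg = 0:1)
q <- rnorm(nrow(d_tru))
d_tru[, p_grp := exp(q) / sum(exp(q))]
d_tru[, p_y := plogis(-1 + 2*sex - smk + 3*veg - sex*smk - 3*sex*veg +
                        smk*veg + 0.5*sex*smk*veg)]

# Pr(smk, veg | sex): renormalise the joint group probabilities within sex
d_tru[, p_cond := p_grp / sum(p_grp), by = sex]
# Equation 1: weight each conditional outcome probability and sum within sex
p_y_sex <- d_tru[, .(p_y = sum(p_y * p_cond)), keyby = sex]
p_y_sex
```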
This can also be estimated from the simulated data:
Pr(Y | sex = 0) Pr(Y | sex = 1)
0.7881758 0.5541419
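A minimal sketch of the estimation step, again on a re-simulated sample of assumed size N = 1e6 (the names N, idx, d, d_cell, est_direct and est_lotp are assumptions). It computes the estimate two ways: directly as the mean of y within each sex, and as a plug-in version of Equation 1 using the empirical cell means and the empirical Pr(smk, veg | sex). The two coincide, because weighting cell means by cell shares within a sex gives back the overall mean for that sex.

```r
library(data.table)

set.seed(1)
# True distribution, rebuilt compactly from the definitions above
d_tru <- CJ(sex = 0:1, smk = 0:1, veg = 0:1)
q <- rnorm(nrow(d_tru))
d_tru[, p_grp := exp(q) / sum(exp(q))]
d_tru[, p_y := plogis(-1 + 2*sex - smk + 3*veg - sex*smk - 3*sex*veg +
                        smk*veg + 0.5*sex*smk*veg)]

# Simulated sample (N is an assumed sample size)
N <- 1e6
idx <- sample(nrow(d_tru), N, replace = TRUE, prob = d_tru$p_grp)
d <- d_tru[idx, .(sex, smk, veg, y = rbinom(.N, 1, p_y))]

# Direct estimate: mean outcome within each sex
est_direct <- d[, .(p_y = mean(y)), keyby = sex]

# Plug-in Equation 1: cell means weighted by the empirical Pr(smk, veg | sex)
d_cell <- d[, .(n = .N, mu_y = mean(y)), keyby = .(sex, smk, veg)]
d_cell[, p_cond := n / sum(n), by = sex]
est_lotp <- d_cell[, .(p_y = sum(mu_y * p_cond)), keyby = sex]

# The two estimates agree up to floating-point error
all.equal(est_direct$p_y, est_lotp$p_y)
```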
Take Equation 1, multiply both sides by \(y\) and then sum over all \(y\):
\[ \begin{aligned} \sum_y y Pr(y | sex = s) &= \sum_y y Pr(y | sex = s, smk = 0, veg = 0) Pr(smk = 0, veg = 0|sex = s) + \\ & \quad \sum_y y Pr(y | sex = s, smk = 0, veg = 1) Pr(smk = 0, veg = 1|sex = s) + \\ & \quad \sum_y y Pr(y | sex = s, smk = 1, veg = 0) Pr(smk = 1, veg = 0|sex = s) + \\ & \quad \sum_y y Pr(y | sex = s, smk = 1, veg = 1) Pr(smk = 1, veg = 1|sex = s) \end{aligned} \tag{2}\]
which, because \(\sum_y y Pr(y | \cdot) = \mathbb{E}(Y | \cdot)\) by the definition of conditional expectation, can be re-stated as
\[ \begin{aligned} \mathbb{E}(Y|sex=s) &= \mathbb{E}(Y|sex = s, smk = 0, veg = 0) Pr(smk = 0, veg = 0|sex = s) + \\ & \quad \mathbb{E}(Y|sex = s, smk = 0, veg = 1) Pr(smk = 0, veg = 1|sex = s) + \\ & \quad \mathbb{E}(Y|sex = s, smk = 1, veg = 0) Pr(smk = 1, veg = 0|sex = s) + \\ & \quad \mathbb{E}(Y|sex = s, smk = 1, veg = 1) Pr(smk = 1, veg = 1|sex = s) \\ &= \sum_{smk,veg} \mathbb{E}(Y|sex = s, smk, veg) Pr(smk, veg|sex = s) \end{aligned} \tag{3}\]
Also note¹ (in case one factor is easier to determine than the other):
\[ Pr(smk, veg|sex) = Pr(smk|veg, sex) Pr(veg | sex) \]
so (perhaps) alternatively:
\[ \begin{aligned} \mathbb{E}(Y|sex=s) &= \sum_{smk,veg} \mathbb{E}(Y|sex = s, smk, veg) Pr(smk|veg, sex = s) Pr(veg | sex = s) \end{aligned} \tag{4}\]
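As a numerical check that the factorised form gives the same answer, the sketch below builds Pr(smk | veg, sex) and Pr(veg | sex) from the known p_grp and compares Equation 4 with Equation 3. The column names p_s, p_sv, p_veg_sex and p_smk_vegsex, and the objects eq3 and eq4, are assumptions. The true distribution is rebuilt so the block runs on its own.

```r
library(data.table)

set.seed(1)
# True distribution, rebuilt compactly from the definitions above
d_tru <- CJ(sex = 0:1, smk = 0:1, veg = 0:1)
q <- rnorm(nrow(d_tru))
d_tru[, p_grp := exp(q) / sum(exp(q))]
d_tru[, p_y := plogis(-1 + 2*sex - smk + 3*veg - sex*smk - 3*sex*veg +
                        smk*veg + 0.5*sex*smk*veg)]

# Marginals needed for the conditional probabilities
d_tru[, p_s := sum(p_grp), by = sex]              # Pr(sex)
d_tru[, p_sv := sum(p_grp), by = .(sex, veg)]     # Pr(sex, veg)
d_tru[, p_veg_sex := p_sv / p_s]                  # Pr(veg | sex)
d_tru[, p_smk_vegsex := p_grp / p_sv]             # Pr(smk | veg, sex)

# Equation 3 weights by Pr(smk, veg | sex); Equation 4 uses the factorised form
eq3 <- d_tru[, .(p_y = sum(p_y * p_grp / p_s)), keyby = sex]
eq4 <- d_tru[, .(p_y = sum(p_y * p_smk_vegsex * p_veg_sex)), keyby = sex]
all.equal(eq3$p_y, eq4$p_y)
```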
Footnotes
1. For the proof, consider \(Pr(Y,X|Z) = \frac{Pr(X,Y,Z)}{Pr(Z)}\), \(Pr(Y|X,Z) = \frac{Pr(X,Y,Z)}{Pr(X,Z)}\) and \(Pr(X|Z) = \frac{Pr(X,Z)}{Pr(Z)}\).
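Written out, and assuming \(Pr(Z) > 0\) and \(Pr(X,Z) > 0\) so that the conditional probabilities are defined, the three identities combine as:

\[ Pr(Y|X,Z) Pr(X|Z) = \frac{Pr(X,Y,Z)}{Pr(X,Z)} \cdot \frac{Pr(X,Z)}{Pr(Z)} = \frac{Pr(X,Y,Z)}{Pr(Z)} = Pr(Y,X|Z) \]

With \(Y = smk\), \(X = veg\) and \(Z = sex\) this is the factorisation \(Pr(smk, veg|sex) = Pr(smk|veg, sex) Pr(veg | sex)\).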