Reminders

Published

July 29, 2024

Modified

February 21, 2024

Just some brief notes/examples to act as basics stats reminders.

Example - LOTP/LOTE

Assume the following true joint distribution for some outcome Y and its distribution conditional on sex, smoking and vegetarianism.

Code
set.seed(1)
d_tru <- CJ(sex = 0:1, smk = 0:1, veg = 0:1)
g <- function(sex, smk, veg){
  -1 + 2 * sex - 1 * smk + 3 * veg +
    -1 * sex * smk - 3 * sex * veg + 1 * smk * veg +
    0.5 * sex * smk * veg
}
# probability of group membership
q <- rnorm(nrow(d_tru))
d_tru[, p_grp := exp(q)/sum(exp(q))]
# probability of outcome
d_tru[, p_y := plogis(g(sex,smk,veg))]
d_tru
Key: <sex, smk, veg>
     sex   smk   veg      p_grp       p_y
   <int> <int> <int>      <num>     <num>
1:     0     0     0 0.04225019 0.2689414
2:     0     0     1 0.09498377 0.8807971
3:     0     1     0 0.03427561 0.1192029
4:     0     1     1 0.38968687 0.8807971
5:     1     0     0 0.10989996 0.7310586
6:     1     0     1 0.03479920 0.7310586
7:     1     1     0 0.12870098 0.2689414
8:     1     1     1 0.16540341 0.6224593

Take a random sample from the population:

Code
n <- 1e7
i <- sample(1:nrow(d_tru), n, T, prob = d_tru$p_grp)
d <- d_tru[i]
d[, y := rbinom(.N, 1, p_y)]

Recover the distribution of the covariates (the p_grp)

Code
# recover distribution of covariates
d[, .(.N/nrow(d)), keyby = .(sex, smk, veg)]
Key: <sex, smk, veg>
     sex   smk   veg        V1
   <int> <int> <int>     <num>
1:     0     0     0 0.0421805
2:     0     0     1 0.0950815
3:     0     1     0 0.0343139
4:     0     1     1 0.3898065
5:     1     0     0 0.1096864
6:     1     0     1 0.0348148
7:     1     1     0 0.1288739
8:     1     1     1 0.1652425

Recover the joint distribution of the outcome (the p_y)

Code
# recover distribution of outcome
d[, .(mu_y = mean(y)), keyby = .(sex, smk, veg)]
Key: <sex, smk, veg>
     sex   smk   veg      mu_y
   <int> <int> <int>     <num>
1:     0     0     0 0.2693899
2:     0     0     1 0.8812230
3:     0     1     0 0.1189314
4:     0     1     1 0.8805292
5:     1     0     0 0.7309010
6:     1     0     1 0.7311000
7:     1     1     0 0.2692407
8:     1     1     1 0.6217244

Estimate the probability of outcome by sex

Code
d[, .(mu_y = mean(y)), keyby = sex]
Key: <sex>
     sex      mu_y
   <int>     <num>
1:     0 0.7881758
2:     1 0.5541419

We are interested in E[y|sex]. Can we get to this via iterated expectation/LOTE?

Think - is all we need LOTP (with extra conditioning), e.g.

(1)Pr(y|sex=s)=Pr(y|sex=s,smk=0,veg=0)Pr(smk=0,veg=0|sex=s)+Pr(y|sex=s,smk=0,veg=1)Pr(smk=0,veg=1|sex=s)+Pr(y|sex=s,smk=1,veg=0)Pr(smk=1,veg=0|sex=s)+Pr(y|sex=s,smk=1,veg=1)Pr(smk=1,veg=1|sex=s)

Operationalised, based on the known distributions, we get:

Code
c(
  "Pr(Y | sex = 0)" = sum(d_tru[sex == 0, p_y] * d_tru[sex == 0, p_grp / sum(p_grp)] ),
  "Pr(Y | sex = 1)" = sum(d_tru[sex == 1, p_y] * d_tru[sex == 1, p_grp / sum(p_grp)] )
)
Pr(Y | sex = 0) Pr(Y | sex = 1) 
      0.7882179       0.5545841 

And which can be estimated from the simulated data:

Code
c(
  "Pr(Y | sex = 0)" = 
    sum(d[sex == 0, mean(y)] * 
          d[sex == 0, .(p_grp = .N/nrow(d[sex == 0])), keyby = .(smk, veg)]$p_grp ),
  "Pr(Y | sex = 1)" = 
    sum(d[sex == 1, mean(y)] * 
          d[sex == 1, .(p_grp = .N/nrow(d[sex == 1])), keyby = .(smk, veg)]$p_grp)
)
Pr(Y | sex = 0) Pr(Y | sex = 1) 
      0.7881758       0.5541419 

Take , multiply both sides by y and then sum over all y

(2)yyPr(y|sex=s)=yyPr(y|sex=s,smk=0,veg=0)Pr(smk=0,veg=0|sex=s)+yyPr(y|sex=s,smk=0,veg=1)Pr(smk=0,veg=1|sex=s)+yyPr(y|sex=s,smk=1,veg=0)Pr(smk=1,veg=0|sex=s)+yyPr(y|sex=s,smk=1,veg=1)Pr(smk=1,veg=1|sex=s)

which (I think) can be re-stated as

(3)E(Y|sex=s)=E(Y|sex=s,smk=0,veg=0)Pr(smk=0,veg=0|sex=s)+E(Y|sex=s,smk=0,veg=1)Pr(smk=0,veg=1|sex=s)+E(Y|sex=s,smk=1,veg=0)Pr(smk=1,veg=0|sex=s)+E(Y|sex=s,smk=1,veg=1)Pr(smk=1,veg=1|sex=s)=smk,vegE(Y|sex=s,smk,veg)Pr(smk,veg|sex=s)

also note (just in case one is easier to determine than another):

Pr(smk,veg|sex)=Pr(smk|veg,sex)Pr(veg|sex)

so (perhaps) alternatively:

(4)E(Y|sex=s)=smk,vegE(Y|sex=s,smk,veg)Pr(smk|veg,sex=s)Pr(veg|sex=s)

Footnotes

  1. For the proof, consider P(Y,X|Z)=Pr(X,Y,Z)Pr(Z), P(Y|X,Z)=Pr(X,Y,Z)Pr(X,Z) and Pr(X|Z)=Pr(X,Z)Pr(Z).↩︎