Orthonormal contrasts

Published

July 29, 2024

Modified

July 23, 2024

Consider a standard factorial setup with three factors, where we have a grand mean (as in the mean of the factor-level means) and deviations for the main effects:

\[ \begin{aligned} \eta = \mathbf{1} \mu + X_a \alpha + X_b \beta + X_c \gamma \end{aligned} \]

where \(X_a\), \(X_b\), and \(X_c\) are the design matrices and \(\alpha\), \(\beta\), and \(\gamma\) the associated effect vectors.

It might be tempting to set up a design matrix with a column for every parameter, but the OLS solution involves inverting \(X^\top X\), which you cannot do because the rank of \(X\) is less than the number of columns in \(X\):

X <- cbind(1, diag(3))
# in OLS, the estimator involves inverting t(X) %*% X
tryCatch(
  {
    solve(t(X) %*% X)
  },
  error = function(z) {
    message(z, "\n")
  }
)
Error in solve.default(t(X) %*% X): system is computationally singular: reciprocal condition number = 1.38778e-17
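
We can confirm the rank deficiency directly:

Code
# rank is 3 but X has 4 columns
qr(X)$rank
ncol(X)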

However, if we use the intercept as the reference group (or introduce a sum-to-zero constraint) then we are fine:

X <- cbind(1, diag(3)[, 2:3])
# with the first level absorbed into the intercept, X is full rank
solve(t(X) %*% X)
     [,1] [,2] [,3]
[1,]    1   -1   -1
[2,]   -1    2    1
[3,]   -1    1    2

Contrasts define a specific linear combination of parameters and affect their interpretation. When treatment contrasts are adopted with a design matrix that includes main and interaction effects for \(a\) with 3 levels and \(b\) with 2 levels, we have

library(data.table)

a <- factor(1:3)
b <- factor(1:2)
d <- CJ(a, b)
X <- model.matrix(~a * b, data = d)
X
  (Intercept) a2 a3 b2 a2:b2 a3:b2
1           1  0  0  0     0     0
2           1  0  0  1     0     0
3           1  1  0  0     0     0
4           1  1  0  1     1     0
5           1  0  1  0     0     0
6           1  0  1  1     0     1
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"

attr(,"contrasts")$b
[1] "contr.treatment"

and for deviation (sum-to-zero) contrasts, which is what would be used for the ANOVA setup, it would look like

a <- factor(1:3)
b <- factor(1:2)
d <- CJ(a, b)
contrasts(d$a) <- contr.sum(3)
contrasts(d$b) <- contr.sum(2)
X <- model.matrix(~a * b, data = d)
X
  (Intercept) a1 a2 b1 a1:b1 a2:b1
1           1  1  0  1     1     0
2           1  1  0 -1    -1     0
3           1  0  1  1     0     1
4           1  0  1 -1     0    -1
5           1 -1 -1  1    -1    -1
6           1 -1 -1 -1     1     1
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$a
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1

attr(,"contrasts")$b
  [,1]
1    1
2   -1

For treatment contrasts, the intercept represents the mean of the reference group, taken to be when \(a\) and \(b\) are both at their first level. For deviation contrasts, the intercept becomes the grand mean, i.e. the mean of the group means, not the mean value of the outcome in the dataset.

With classic ANOVA, a sum-to-zero constraint allows us to overcome the rank-deficient nature of the design matrix.

When using the treatment contrasts, the second and third levels of \(a\) are explicit in the design matrix, whereas for the deviation contrasts, the first and second levels appear. For example, with a single-factor model:

K_a <- 4
# means at levels of a
b_a <- c(-1, 3, 2, 1)
N <- 1e6
a <- sample(1:K_a, size = N, replace = T, prob = c(0.3, 0.2, 0.4, 0.1))
d <- data.table(a = factor(a))
d[, mu := b_a[a]]
d[, y := rnorm(.N, mu, 1)]

# sum to zero/deviation contrasts as used for anova
contrasts(d$a) <- contr.sum(K_a)
f1 <- lm(y ~ a, data = d)

The coefficients include an intercept and deviations for the first three levels of \(a\):

coef(f1)
(Intercept)          a1          a2          a3 
  1.2503037  -2.2529658   1.7511701   0.7493635 

For the deviation contrasts, the mean for the first level of \(a\) is computed as the grand mean plus the first parameter estimate, and similarly for the second and third. The mean for the last level of \(a\) is implied as the grand mean minus the sum of the first three deviation terms, hence sum to zero.
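
We can check the implied mean for the last level directly from the printed coefficients; a minimal check using the fitted f1 from above:

Code
# the deviation for level 4 is implied by the sum-to-zero constraint
a4 <- -sum(coef(f1)[2:4])
# mean for level 4 = grand mean + implied deviation (approx. 1.003 here)
unname(coef(f1)[1] + a4)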

d_new <- CJ(a = factor(1:K_a))
contrasts(d_new$a) <-  contr.sum(K_a)
X <- model.matrix(~a, data = d_new)
X
  (Intercept) a1 a2 a3
1           1  1  0  0
2           1  0  1  0
3           1  0  0  1
4           1 -1 -1 -1
attr(,"assign")
[1] 0 1 1 1
attr(,"contrasts")
attr(,"contrasts")$a
  [,1] [,2] [,3]
1    1    0    0
2    0    1    0
3    0    0    1
4   -1   -1   -1

and \(\sum_{i=1}^{K_a} \alpha_i = 0\), which we can verify empirically by calculating all \(K_a\) group means, subtracting the grand mean from each, and taking the sum:

(mu_a_hat <- t(model.matrix(~a, data = d_new) %*% coef(f1)))
             1        2        3        4
[1,] -1.002662 3.001474 1.999667 1.002736
rbind(
  tru = b_a,
  estimate = as.numeric(round(mu_a_hat, 3))
)
           [,1]  [,2] [,3]  [,4]
tru      -1.000 3.000    2 1.000
estimate -1.003 3.001    2 1.003

which sum to zero:

Code
round(sum(mu_a_hat - coef(f1)[1]), 3)  
[1] 0

The above creates an inconvenience for the Bayesian who would like to (i) play frequentist in their insistence on differentiating fixed from random effects and (ii) place equal prior weight on each parameter within a group. We could (and often do) respond to this by fixing the reference parameter at zero to overcome the identification issue, but that puts a different prior weight on the reference parameter relative to the other parameters in the group. An alternative to arbitrarily fixing one of the factor-level parameters to zero is to project the \(K_a\)-dimensional parameter space associated with the first factor into a \(K_a - 1\) dimensional space in a way that enforces the same marginal prior on all \(K_a\) terms.

The way that the sum-to-zero constraint is achieved is by placing a prior that has a negative correlation across the effects:

\[ \begin{aligned} \Sigma_a = \mathbb{I}_{K_a} - J_a / K_a \end{aligned} \]

where \(J_a\) is a matrix of ones of dimension \(K_a \times K_a\), leading to, for example, a covariance matrix such as the following:

# group levels
K_a <- 3
J <- matrix(1, K_a, K_a)
S_a <- diag(K_a) - J / K_a
# cov2cor(S_a)
S_a
           [,1]       [,2]       [,3]
[1,]  0.6666667 -0.3333333 -0.3333333
[2,] -0.3333333  0.6666667 -0.3333333
[3,] -0.3333333 -0.3333333  0.6666667

where again we are just considering a single factor at the moment. As a test, we can generate draws from a multivariate normal distribution using this covariance matrix and observe that they will sum to zero:

u <- mvtnorm::rmvnorm(3, rep(0, K_a), S_a)
# final column is the sum of the draws:
round(cbind(u, rowSums(u)), 3)
       [,1]   [,2]   [,3] [,4]
[1,]  0.698 -0.441 -0.257    0
[2,] -0.819  1.299 -0.479    0
[3,] -0.292  0.494 -0.203    0
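
This is no accident: the variance of the sum of the effects under this prior is

\[ \begin{aligned} \mathbf{1}^\top \Sigma_a \mathbf{1} = \mathbf{1}^\top \mathbf{1} - \mathbf{1}^\top J_a \mathbf{1} / K_a = K_a - K_a^2 / K_a = 0 \end{aligned} \]

so the sum of the effects is degenerate at zero.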

\(\Sigma_a\) is not full rank, but it can be decomposed using its eigenvectors. Since \(\Sigma_a\) has rank \(K_a - 1 = 2\), there are only two non-zero eigenvalues, and both are equal to 1. Below, we reconstruct \(\Sigma_a\) as \(Q_a \Lambda Q_a^\top\), where \(Q_a\) is a matrix formed from the \(K_a - 1\) eigenvectors that correspond to the non-zero eigenvalues of \(\Sigma_a\). In this setting, \(\Lambda = \mathbb{I}_{K_a - 1}\).
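
A quick check of the eigenvalue claim, using the S_a computed above:

Code
# one eigenvalue of 1 per contrast direction, plus a zero for the
# direction spanned by the ones vector
round(eigen(S_a)$values, 10)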

# for symmetric matrices, svd equiv to eigen
# singular values (here equal to the eigenvalues) are sorted in
# decreasing order, so we drop the last one
svd_S_a <- svd(S_a)
# eigenvectors
Q_a <- svd_S_a$v
Q_a <- Q_a[, 1:(K_a - 1)]
# original setup for S_a
Q_a %*% diag(K_a - 1) %*% t(Q_a)
           [,1]       [,2]       [,3]
[1,]  0.6666667 -0.3333333 -0.3333333
[2,] -0.3333333  0.6666667 -0.3333333
[3,] -0.3333333 -0.3333333  0.6666667

This allows us to define a new vector of \(K_a - 1\) parameters \(\alpha^*\):

\[ \begin{aligned} \alpha^* = Q_a^\top \alpha \end{aligned} \]

Mapping back via \(\alpha = Q_a \alpha^*\) always yields effects that sum to zero, because every column of \(Q_a\) sums to zero, as can be seen in the rows of \(Q_a^\top\):

round(t(Q_a), 3)
       [,1]   [,2]  [,3]
[1,] -0.816  0.408 0.408
[2,]  0.000 -0.707 0.707

The columns of \(Q_a\) define orthonormal contrasts that allow us to identify the \(K_a - 1\) parameters.
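
As a sanity check, any choice of \(\alpha^*\) maps back to effects that sum to zero (a sketch using the Q_a computed above):

Code
set.seed(1)
a_star <- rnorm(K_a - 1)
# the columns of Q_a sum to zero, so the implied effects do too
round(sum(Q_a %*% a_star), 10)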

Armed with the above, we can now take our original tempting setup for the design matrix (i.e. with a column for every parameter) and convert it to a new design matrix that maps \(\alpha^*\) into the observations:

(X <- diag(K_a) )
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
(X_star <- cbind(X %*% Q_a))
           [,1]          [,2]
[1,] -0.8164966  4.108230e-17
[2,]  0.4082483 -7.071068e-01
[3,]  0.4082483  7.071068e-01

To go from the parameters in their original space to the parameters in the reduced dimensional space and back again:

Code
# original parameters constrained to sum to zero
b_a <- c(-1, 0.75, 0.25)
# representation in lower dimensional space
b_a_star <- t(Q_a) %*% b_a

# transformed back to b_a
X_star %*% b_a_star 
      [,1]
[1,] -1.00
[2,]  0.75
[3,]  0.25
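
The round trip works because the columns of \(Q_a\) are orthonormal, so \(Q_a^\top Q_a = \mathbb{I}_{K_a - 1}\), and \(Q_a Q_a^\top = \Sigma_a\) acts as the identity on any sum-to-zero vector:

Code
round(t(Q_a) %*% Q_a, 10)
# recovers b_a exactly since sum(b_a) is zero
round(Q_a %*% t(Q_a) %*% b_a, 10)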

An implementation for the single factor case follows:

get_data <- function(
    N = 100,
    K_a = 3,
    mu = 1,
    b_a = c(-1, 0.25, 0.75)
    ){
  
  a = sample(1:K_a, size = N, replace = T)
  d <- data.table(a)
  d[, eta := mu + b_a[a]]
  d[, y := rbinom(N, 1, plogis(eta))]
  
  # saturated design matrix
  X_a <- diag(K_a)
  
  # correlation matrix to enforce sum to zero
  J <- matrix(1, K_a, K_a)
  S_a <- diag(K_a) - J / K_a 
  
  # decomposition
  # eigen vectors
  Q_a <- eigen(S_a)$vector[, 1:(K_a - 1)]

  # full rank design
  X_a_s <- X_a %*% Q_a
  
  list(
    d = d, K_a = K_a, 
    mu = mu, b_a = b_a,
    X_a = X_a, X_a_s = X_a_s,
    S_a = S_a, Q_a = Q_a
  )
}
Logistic regression
// Model for:
// independent estimates of treatment arm by subgroup means
data {
  int N;
  array[N] int y;
  // arm index
  array[N] int x1trt;
  // dimension of design matrix
  int ncX1des;
  int nrX1des;
  matrix[nrX1des, ncX1des] X1des;
  vector[ncX1des] sx1;
  int prior_only;
}
transformed data {
  // build full design matrices
  matrix[N, ncX1des] X1 = X1des[x1trt];
}
parameters{
  real mu;
  vector[ncX1des] bx1;
}
transformed parameters{ 
  vector[N] eta = mu + X1*bx1;
}
model{
  target += logistic_lpdf(mu | 0, 1);
  target += normal_lpdf(bx1 | 0, sx1);
  
  if(!prior_only){
    target += bernoulli_logit_lpmf(y | eta);  
  }
}
generated quantities{
}
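
Note that the row-indexing in the transformed data block, X1des[x1trt], expands the design matrix to one row per observation. A one-line R equivalent (a sketch, assuming l1 holds the output of get_data() as in the block below):

Code
# pick out the design row corresponding to each observation's factor level
X1 <- l1$X_a_s[l1$d$a, ]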
Code
m1 <- cmdstanr::cmdstan_model("stan/note-09-logistic.stan")

l1 <- get_data(
  N = 1e4, K_a = 5, mu = 0.5, 
  b_a = c(-2, -1, 0, 1, 2)
  )

# would be more efficient to transform into a binomial likelihood
# but keeping bernoulli for now since it allows me to think about
# unit level data (not sure if that will be necessary though)
ld <- list(
  N = length(l1$d$y),
  y = l1$d$y,
  # 
  x1trt = l1$d$a,
  nrX1des = nrow(l1$X_a_s), ncX1des = ncol(l1$X_a_s),
  X1des = l1$X_a_s,
  sx1 = rep(1, ncol(l1$X_a_s)),
  prior_only = 1
)

f1 <- m1$sample(
  ld, iter_warmup = 1000, iter_sampling = 3000,
    parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
    max_treedepth = 10)
Running MCMC with 2 parallel chains...

Chain 1 finished in 7.5 seconds.
Chain 2 finished in 7.5 seconds.

Both chains finished successfully.
Mean chain execution time: 7.5 seconds.
Total execution time: 7.6 seconds.
Code
f1$summary(variables = c("mu", "bx1"))
# A tibble: 5 × 10
  variable     mean   median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>       <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 mu       -0.0139   0.00673 1.78  1.55  -2.96  2.90  1.00    7180.    3759.
2 bx1[1]   -0.00319 -0.00501 1.02  1.00  -1.66  1.69  1.00    7320.    4469.
3 bx1[2]   -0.0170  -0.0237  1.01  1.02  -1.68  1.62  1.00    7727.    4345.
4 bx1[3]   -0.0139  -0.0236  0.975 0.961 -1.59  1.57  1.00    8200.    4623.
5 bx1[4]    0.00134  0.00395 0.993 0.993 -1.64  1.64  1.00    7705.    4012.

To transform the parameter estimates back to the group means that we are interested in, we need to compute the product of the parameters and the unique entries from the design matrix.

In theory, these should all have the same marginal prior, which appears to be the case, as shown in Figure 1:

Priors on group offsets
library(ggplot2)

m_post_s <- f1$draws(variables = c("bx1"), format = "matrix")
post_mu <- m_post_s %*% t(cbind(l1$X_a_s))

d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
  variable, 
  levels = paste0("V", 1:(l1$K_a)))]

ggplot(d_fig, aes(x = value, group = variable)) +
  geom_density(lwd = 0.2) +
  facet_wrap(~variable, ncol = 1) 
Figure 1: Priors on group means

Run the model to see if we get close to the known (true) parameters, although they won’t be immediately apparent from the summary:

Code
ld$prior_only <- 0

f1 <- m1$sample(
  ld, iter_warmup = 1000, iter_sampling = 2000,
    parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
    max_treedepth = 10)
Running MCMC with 2 parallel chains...

Chain 1 finished in 7.8 seconds.
Chain 2 finished in 7.7 seconds.

Both chains finished successfully.
Mean chain execution time: 7.8 seconds.
Total execution time: 7.8 seconds.
Code
# these are the parameters in the reduced dimensional space
f1$summary(variables = c("mu", "bx1"))
# A tibble: 5 × 10
  variable   mean median     sd    mad     q5    q95  rhat ess_bulk ess_tail
  <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
1 mu        0.422  0.421 0.0264 0.0267  0.379  0.466  1.00    3823.    3171.
2 bx1[1]   -0.845 -0.845 0.0455 0.0452 -0.919 -0.771  1.00    4232.    3277.
3 bx1[2]    1.32   1.31  0.0744 0.0743  1.19   1.44   1.00    3887.    3051.
4 bx1[3]   -2.20  -2.20  0.0578 0.0575 -2.30  -2.11   1.00    3735.    3016.
5 bx1[4]   -1.55  -1.55  0.0549 0.0537 -1.65  -1.46   1.00    4071.    3169.

Obviously, the results won’t align exactly with the true values, but they should be somewhere close; see Figure 2:

Parameter posterior density
m_post_s <- f1$draws(variables = c("mu", "bx1"), format = "matrix")
post_mu <- m_post_s %*% t(cbind(1, l1$X_a_s))

d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
  variable, 
  levels = paste0("V", 1:(l1$K_a)))]

d_tru <- data.table(
  a = 1:l1$K_a
)
d_tru[, eta := l1$mu + l1$b_a[a]]
d_tru[, variable := factor(paste0("V", a))]

ggplot(d_fig, aes(x = value, group = variable)) +
  geom_density(lwd = 0.2) +
  geom_vline(data = d_tru, 
              aes(xintercept = eta), col = 2,
              lwd = 0.3) +
  facet_wrap(~variable, ncol = 1) +
  scale_x_continuous(breaks = seq(-3, 3, by = 0.5))
Figure 2: Parameter estimates for group level means
Parameter posterior density
m_post_s <- f1$draws(variables = c("bx1"), format = "matrix")
post_mu <- m_post_s %*% t(cbind(l1$X_a_s))

d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
  variable, 
  levels = paste0("V", 1:(l1$K_a)))]

d_tru <- data.table(
  a = 1:l1$K_a
)
d_tru[, eta := l1$b_a[a]]
d_tru[, variable := factor(paste0("V", a))]

ggplot(d_fig, aes(x = value, group = variable)) +
  geom_density(lwd = 0.2) +
  geom_vline(data = d_tru, 
              aes(xintercept = eta), col = 2,
              lwd = 0.3) +
  facet_wrap(~variable, ncol = 1) +
  scale_x_continuous(breaks = seq(-3, 3, by = 0.5))
Figure 3: Parameter estimates for group level offsets

Two-factor setup

Here we have two main effects and their associated interactions. I have kept this first implementation as a record of my initial mistake on columnwise vs rowwise specification of the interaction parameters. The implementation in the final section is the correct approach.

Revised data generation function

get_data <- function(
    N = 100,
    K_a = 3,
    K_b = 2,
    b0 = 1,
    # effects are sum to zero
    b_a = c(0, -1, 1),
    b_b = c(-0.6, 0.6),
    b_ab = c(-0.05, 0.15, -0.1, 0.05, -0.15,  0.1)
    # b_ab = c(-0.05, 0.05, 0.15, -0.15,  0.1, -0.1)
    ){
  
  stopifnot(all.equal(0, sum(b_a), 
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(0, sum(b_b), 
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(0, sum(b_ab), 
                      tolerance = sqrt(.Machine$double.eps)))
  
  # cols and rows should sum to zero otherwise following will not work
  b_ab_matrix <- matrix(b_ab, nrow = K_a, ncol = K_b)
  stopifnot(all.equal(colSums(b_ab_matrix), rep(0, K_b),
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(rowSums(b_ab_matrix), rep(0, K_a),
                      tolerance = sqrt(.Machine$double.eps)))
  
  # we could go down this avenue but the original parameters would never be 
  # recovered.
  # b_ab_centered <- b_ab_matrix -
  #   rowMeans(b_ab_matrix) -
  #   rep(colMeans(b_ab_matrix), each = K_a) +
  #   mean(b_ab_matrix)
  # b_ab_centered <- as.vector(b_ab_centered)
  
  a = sample(1:K_a, size = N, replace = T)
  b = sample(1:K_b, size = N, replace = T)
  d <- data.table(a, b)
  
  d[, b0 := b0]
  d[, b_a := b_a[a]]
  d[, b_b := b_b[b]]
  d[, ix_b_ab := a + (K_a * (b - 1))]
  d[, b_ab := b_ab[ix_b_ab]]
  d[, eta := b0 + b_a + b_b + b_ab]

  # e.g.
  # > unique(d[order(b, a)])
  #        a     b    b0   b_a   b_b  b_ab   eta
  # 1:     1     1     1     0  -0.6 -0.05  0.35
  # 2:     2     1     1    -1  -0.6  0.15 -0.45
  # 3:     3     1     1     1  -0.6 -0.10  1.30
  # 4:     1     2     1     0   0.6  0.05  1.65
  # 5:     2     2     1    -1   0.6 -0.15  0.45
  # 6:     3     2     1     1   0.6  0.10  2.70
  
  cols_a <- paste0("a", 1:K_a)
  cols_b <- paste0("a", 1:K_b)
  cols_c <- unique(d[order(b, a), paste0("a",a,"b",b)])
  
  d[, y := rbinom(N, 1, plogis(eta))]
  
  # saturated design matrix
  X_a <- diag(K_a)
  colnames(X_a) <- cols_a
  X_b <- diag(K_b)
  colnames(X_b) <- cols_b
  X_ab <- diag(K_a * K_b)
  colnames(X_ab) <- cols_c

  # correlation matrix to enforce sum to zero
  S_a <- diag(K_a) - (1 / K_a )
  S_b <- diag(K_b) - (1 / K_b )
  S_ab <- kronecker(S_a, S_b)
  # should be rank 2 for a 1:3, b 1:2
  # pracma::Rank(S_ab)
  
  # decomposition eigen vectors
  Q_a <- eigen(S_a)$vector[, 1:(K_a - 1)]
  Q_b <- eigen(S_b)$vector[, 1:(K_b - 1)]
  # Q_ab <- kronecker(Q_a, Q_b)
  
  # Ok, Q_ab as defined above will not allow me to recover the original
  # parameters (see the final section for why). The following is a workaround.
  
  # All we are doing is building a matrix that will sum up the 
  # columns and the rows of the interaction parameters.
  # We will then set that to zero and solve (i.e. compute the
  # null space).
  C_ab <- construct_C_ab(K_a, K_b)

  # Calculate the null space of C_ab
  # Nullspace of C_ab gives the set of vectors st each vector
  # in Q_ab results in C_ab v = 0
  Q_ab <- pracma::nullspace(C_ab)
  
  # transformed pars
  b_a_s <- t(Q_a) %*% b_a
  b_b_s <- t(Q_b) %*% b_b
  b_ab_s <- t(Q_ab) %*% b_ab

  # full rank design
  X_a_s <- X_a %*% Q_a
  X_b_s <- X_b %*% Q_b
  X_ab_s <- X_ab %*% Q_ab

  X_full_s <- create_full_design(X_a_s, X_b_s, X_ab_s)
  # round(X_full_s, 3)
   
  # Check
  stopifnot(all.equal(as.numeric(X_a_s %*% b_a_s), b_a, 
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(as.numeric(X_b_s %*% b_b_s), b_b, 
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(as.numeric(X_ab_s %*% b_ab_s), b_ab, 
                      tolerance = sqrt(.Machine$double.eps)))
  

  list(
    d = d, b0 = b0, 
    b_a = b_a, b_b = b_b, b_ab = b_ab, 
    b_a_s = b_a_s, b_b_s = b_b_s, b_ab_s = b_ab_s, 
    
    K_a = K_a, K_b = K_b, 
    X_a = X_a, X_a_s = X_a_s, 
    X_b = X_b, X_b_s = X_b_s, 
    X_ab = X_ab, X_ab_s = X_ab_s, 
    X_full_s = X_full_s,
    
    S_a = S_a, S_b = S_b, S_ab = S_ab, 
    C_ab = C_ab, Q_a = Q_a, Q_b = Q_b, Q_ab = Q_ab 
  )
}

We need some utility functions to stop this getting too messy.

Utility functions
# Utility function to create combinations of parameters (all columns and 
# all rows) that should sum to zero.
construct_C_ab <- function(K_a, K_b) {
  # Total number of interaction parameters
  n_params <- K_a * K_b
  
  # Number of constraints
  n_constraints <- K_a + K_b
  
  # Initialize C_ab matrix
  C_ab <- matrix(0, nrow = n_constraints, ncol = n_params)
  
  # Constraints for factor b (columns)
  for (j in 1:K_b) {
    col_indices <- ((j - 1) * K_a + 1):(j * K_a)
    C_ab[j, col_indices] <- 1
  }

  # Constraints for factor a (rows)
  for (i in 1:K_a) {
    row_indices <- seq(i, n_params, by = K_a)
    C_ab[K_b + i, row_indices] <- 1
  }
  return(C_ab)
}

# Utility function to create full design matrix from constrained space.
create_full_design <- function(X_a_s, X_b_s, X_ab_s) {
  
  K_a <- nrow(X_a_s)
  K_b <- nrow(X_b_s)
  n_a <- ncol(X_a_s)
  n_b <- ncol(X_b_s)
  n_ab <- ncol(X_ab_s)

  # Create all combinations
  design <- expand.grid(a = 1:K_a, b = 1:K_b)
  
  # Add intercept and main effects
  X_main <- cbind(1, 
                  X_a_s[design$a, ],
                  X_b_s[design$b, ])
  
  # Create interaction effects
  X_int <- X_ab_s[(design$b - 1) * K_a + design$a, , drop = FALSE]
  
  # it looks like we can just zero out interaction when main effect is zero
  # so that we have a logical design matrix, i.e. if some of the main factors 
  # are set to zero then the interaction should also be.
  # for (i in 1:n_a) {
  #   for (j in 1:n_b) {
  #     int_col <- (i - 1) * n_b + j
  #     X_int[X_main[, 1 + i] == 0 | X_main[, 1 + n_a + j] == 0, int_col] <- 0
  #   }
  # }
  
  # Combine all effects
  X_full <- cbind(X_main, X_int)
  
  return(X_full)
}

# Utility to create random parameter values that sum to zero
create_sum_zero_matrix <- function(n_rows, n_cols, sd = 1, seed = 1) {
  
  set.seed(seed)
  # Step 1: Generate random values
  mat <- matrix(rnorm(n_rows * n_cols, sd = sd), nrow = n_rows, ncol = n_cols)
  
  # Step 2: Center columns
  mat <- scale(mat, center = TRUE, scale = FALSE)
  
  # Step 3: Center rows
  mat <- t(scale(t(mat), center = TRUE, scale = FALSE))
  
  # Step 4: Final adjustment to ensure exact zero sums
  col_adj <- colSums(mat) / n_rows
  row_adj <- rowSums(mat) / n_cols
  
  for (i in 1:n_rows) {
    for (j in 1:n_cols) {
      mat[i,j] <- mat[i,j] - col_adj[j] - row_adj[i] + mean(col_adj)
    }
  }
  
  return(mat)
}


# Utility to create random parameter values that sum to zero
create_sum_zero_vec <- function(n = 3, sd = 1, seed = 1) {
  
  set.seed(seed)
  
  v <- scale(rnorm(n, 0, sd), center = TRUE, scale = FALSE)
  
  return(as.numeric(v))
}
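
A couple of quick checks on these helpers; the zeros follow by construction (null space and double-centering, respectively):

Code
# columns of the null space satisfy all row and column constraints
C_ab <- construct_C_ab(3, 2)
Q_ab <- pracma::nullspace(C_ab)
dim(Q_ab)                 # 6 x 2, i.e. (K_a - 1) * (K_b - 1) free parameters
round(C_ab %*% Q_ab, 10)  # all zero

# random interaction effects with rows and columns summing to zero
m <- create_sum_zero_matrix(3, 2, seed = 1)
round(rowSums(m), 10)
round(colSums(m), 10)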

Do some tests to see if the data generation process makes sense.

Not sure that these matrices are logical in terms of the products of the various factors…

2x2 constrained design matrix
K_a = 2
K_b = 2
b0 = 1
b_a = create_sum_zero_vec(n = K_a, seed = 1)
b_b = create_sum_zero_vec(n = K_b, seed = 2)
b_ab = as.numeric(create_sum_zero_matrix(K_a, K_b, seed = 3))

l1 <- get_data(
  N = 100, K_a = K_a, K_b = K_b, 
  b0 = b0, b_a = b_a, b_b = b_b, b_ab = b_ab
)
# 2 x 2
round(l1$X_full_s, 2)
     [,1]  [,2]  [,3] [,4]
[1,]    1 -0.71 -0.71  0.5
[2,]    1  0.71 -0.71 -0.5
[3,]    1 -0.71  0.71 -0.5
[4,]    1  0.71  0.71  0.5
4x2 constrained design matrix
K_a = 4
K_b = 2
b0 = 1
b_a = create_sum_zero_vec(n = K_a, seed = 4)
b_b = create_sum_zero_vec(n = K_b, seed = 5)
b_ab = as.numeric(create_sum_zero_matrix(K_a, K_b, seed = 6))

l1 <- get_data(
  N = 100, K_a = K_a, K_b = K_b, 
  b0 = b0, b_a = b_a, b_b = b_b, b_ab = b_ab
)
# 4 x 2
round(l1$X_full_s, 2)
     [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
[1,]    1  0.00  0.00  0.87 -0.71  0.17  0.51  0.29
[2,]    1 -0.58 -0.58 -0.29 -0.71 -0.61  0.00 -0.07
[3,]    1 -0.21  0.79 -0.29 -0.71  0.17 -0.49  0.32
[4,]    1  0.79 -0.21 -0.29 -0.71  0.27 -0.02 -0.55
[5,]    1  0.00  0.00  0.87  0.71 -0.17 -0.51 -0.29
[6,]    1 -0.58 -0.58 -0.29  0.71  0.61  0.00  0.07
[7,]    1 -0.21  0.79 -0.29  0.71 -0.17  0.49 -0.32
[8,]    1  0.79 -0.21 -0.29  0.71 -0.27  0.02  0.55
3x4 constrained design matrix
K_a = 3
K_b = 4
b0 = 1
b_a = create_sum_zero_vec(n = K_a, seed = 6)
b_b = create_sum_zero_vec(n = K_b, seed = 7)
b_ab = as.numeric(create_sum_zero_matrix(K_a, K_b, seed = 8))

l1 <- get_data(
  N = 100, K_a = K_a, K_b = K_b, 
  b0 = b0, b_a = b_a, b_b = b_b, b_ab = b_ab
)
# 3 x 4
round(l1$X_full_s, 2)
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
 [1,]    1  0.00  0.82  0.00  0.00  0.87 -0.10  0.45  0.30 -0.21  0.34  0.19
 [2,]    1 -0.71 -0.41  0.00  0.00  0.87  0.47 -0.17  0.20  0.24 -0.39 -0.03
 [3,]    1  0.71 -0.41  0.00  0.00  0.87 -0.37 -0.29 -0.50 -0.03  0.05 -0.16
 [4,]    1  0.00  0.82 -0.58 -0.58 -0.29 -0.07 -0.12  0.03 -0.41 -0.46 -0.31
 [5,]    1 -0.71 -0.41 -0.58 -0.58 -0.29 -0.33 -0.25  0.07  0.12  0.21  0.52
 [6,]    1  0.71 -0.41 -0.58 -0.58 -0.29  0.40  0.37 -0.10  0.29  0.26 -0.21
 [7,]    1  0.00  0.82 -0.21  0.79 -0.29  0.42 -0.33 -0.33 -0.02  0.23  0.23
 [8,]    1 -0.71 -0.41 -0.21  0.79 -0.29 -0.23  0.54 -0.30 -0.01 -0.24 -0.08
 [9,]    1  0.71 -0.41 -0.21  0.79 -0.29 -0.18 -0.21  0.63  0.04  0.01 -0.15
[10,]    1  0.00  0.82  0.79 -0.21 -0.29 -0.25  0.00  0.00  0.64 -0.11 -0.11
[11,]    1 -0.71 -0.41  0.79 -0.21 -0.29  0.10 -0.13  0.03 -0.35  0.43 -0.41
[12,]    1  0.71 -0.41  0.79 -0.21 -0.29  0.15  0.13 -0.03 -0.30 -0.32  0.52

The model now splits up the design into three separate matrices so that we can look to recover each grouping of parameters. Also, I convert to a binomial likelihood to speed things up.

Logistic regression
// Model for:
// independent estimates of treatment arm by subgroup means
data {
  int N;
  array[N] int y;
  array[N] int n;
  // arm index
  array[N, 3] int trt;
  // factor 1 design matrix
  int ncXades;
  int nrXades;
  matrix[nrXades, ncXades] Xades;
  vector[ncXades] sa;
  // factor 2 design matrix
  int ncXbdes;
  int nrXbdes;
  matrix[nrXbdes, ncXbdes] Xbdes;
  vector[ncXbdes] sb;
  // factor 1/2 interaction design matrix
  int ncXabdes;
  int nrXabdes;
  matrix[nrXabdes, ncXabdes] Xabdes;
  vector[ncXabdes] sab;
  
  int prior_only;
}
transformed data {
  // build full design matrices
  matrix[N, ncXades] Xa = Xades[trt[,1]];
  matrix[N, ncXbdes] Xb = Xbdes[trt[,2]];
  // indexes the relevant col in the design matrix
  matrix[N, ncXabdes] Xab = Xabdes[trt[,3]];
}
parameters{
  real mu;
  vector[ncXades] ba;  
  vector[ncXbdes] bb;
  vector[ncXabdes] bab;
}
transformed parameters{ 
  vector[N] eta = mu + Xa*ba + Xb*bb + Xab*bab;
}
model{
  target += logistic_lpdf(mu | 0, 1);
  target += normal_lpdf(ba | 0, sa);
  target += normal_lpdf(bb | 0, sb);
  target += normal_lpdf(bab | 0, sab);
  
  if(!prior_only){
    // target += bernoulli_logit_lpmf(y | eta);  
    target += binomial_logit_lpmf(y | n, eta);  
  }
}
generated quantities{
}

Two-factor (3 x 2)

Fit the model without the likelihood involved to examine the implied priors.

Code
m1 <- cmdstanr::cmdstan_model("stan/note-09-logistic-2.stan")

l1 <- get_data(
  N = 1e6, K_a = 3, K_b = 2, 
  b0 = 1,   b_a = c(0, -1, 1), b_b = c(-0.4, 0.4),
  # has to have all rows, all cols sum to zero if you 
  # want to target recovering parameters.
  b_ab = c(-0.05, 0.15, -0.1, 0.05, -0.15,  0.1)
)

d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]


ld <- list(
  N = nrow(d_smry), y = d_smry$y, n = d_smry$n, 
  trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
  
  nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s), 
  Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
  
  nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s), 
  Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
  
  nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s), 
  Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
  
  prior_only = 1
)


f1 <- m1$sample(
  ld, iter_warmup = 1000, iter_sampling = 3000,
    parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
    max_treedepth = 10)
Running MCMC with 2 parallel chains...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.1 seconds.
Code
f1$summary(variables = c("mu", "ba", "bb", "bab"))
# A tibble: 6 × 10
  variable     mean   median    sd   mad    q5   q95  rhat ess_bulk ess_tail
  <chr>       <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 mu       -0.00680  0.00949 1.83  1.59  -3.07  3.06  1.00    8006.    4149.
2 ba[1]    -0.0157  -0.0173  1.00  0.998 -1.65  1.62  1.00    8398.    4477.
3 ba[2]    -0.00834 -0.0232  0.976 0.977 -1.62  1.63  1.00    8236.    4753.
4 bb[1]    -0.00215  0.0134  1.00  1.01  -1.66  1.66  1.00    8462.    4937.
5 bab[1]   -0.0149  -0.0111  1.01  1.00  -1.66  1.68  1.00    9257.    4619.
6 bab[2]    0.00843  0.0218  1.03  1.02  -1.71  1.67  1.00    7462.    4568.
Priors on group means
m_post_s <- f1$draws(variables = c("mu", "ba", "bb", "bab"), format = "matrix")


# the rows of X_a_s, X_b_s and X_ab_s need to be aligned on a common cell
# ordering; X_full_s (built in get_data) does this and already includes
# the intercept column
post_mu <- m_post_s %*% t(l1$X_full_s)

d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
  variable, 
  levels = paste0("V", 1:(l1$K_a * l1$K_b)))]

ggplot(d_fig, aes(x = value, group = variable)) +
  geom_density(lwd = 0.2) +
  facet_wrap(~variable, ncol = 1) 
Figure 4: Priors on group means for all cells

And now look at the parameter estimates.

Code
ld$prior_only <- 0

f1 <- m1$sample(
  ld, iter_warmup = 1000, iter_sampling = 2000,
    parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
    max_treedepth = 10)
Running MCMC with 2 parallel chains...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.1 seconds.
Code
f1$summary(variables = c("mu", "ba", "bb", "bab"))
# A tibble: 6 × 10
  variable     mean   median      sd     mad       q5      q95  rhat ess_bulk
  <chr>       <dbl>    <dbl>   <dbl>   <dbl>    <dbl>    <dbl> <dbl>    <dbl>
1 mu        1.00     1.00    0.00251 0.00255  0.996    1.00     1.00    4747.
2 ba[1]     1.42     1.42    0.00451 0.00446  1.41     1.42     1.00    3642.
3 ba[2]    -0.00149 -0.00150 0.00433 0.00435 -0.00865  0.00560  1.00    4175.
4 bb[1]     0.559    0.559   0.00358 0.00363  0.553    0.565    1.00    4302.
5 bab[1]   -0.260   -0.260   0.00548 0.00548 -0.269   -0.251    1.00    4082.
6 bab[2]    0.0598   0.0599  0.00684 0.00688  0.0484   0.0709   1.00    3558.
# ℹ 1 more variable: ess_tail <dbl>

Checking the recovery of the original parameters, it looks like we got reasonably close to the values we used.

Parameter posterior density for first factor
m_post_s <- f1$draws(variables = c("ba"), format = "matrix") 
post_mu <- m_post_s %*% t(l1$X_a_s)

# colMeans(post_mu)
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
  variable, 
  levels = paste0("V", 1:l1$K_a))]

d_tru <- data.table(b_a = l1$b_a)
d_tru[, variable := paste0("V", 1:l1$K_a)]

ggplot(d_fig, aes(x = value, group = variable)) +
  geom_density(lwd = 0.2) +
  geom_vline(data = d_tru,
             aes(xintercept = b_a), col = 2,
             lwd = 0.3) +
  facet_wrap(~variable, ncol = 1)
Figure 5: Parameter posterior density for first factor
Parameter posterior density for second factor
m_post_s <- f1$draws(variables = c("bb"), format = "matrix") 
post_mu <- m_post_s %*% t(l1$X_b_s)

# colMeans(post_mu)
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
  variable, 
  levels = paste0("V", 1:l1$K_b))]

d_tru <- data.table(b_a = l1$b_b)
d_tru[, variable := paste0("V", 1:l1$K_b)]

ggplot(d_fig, aes(x = value, group = variable)) +
  geom_density(lwd = 0.2) +
  geom_vline(data = d_tru,
             aes(xintercept = b_a), col = 2,
             lwd = 0.3) +
  facet_wrap(~variable, ncol = 1)
Figure 6: Parameter posterior density for second factor
Parameter posterior density for interaction
m_post_s <- f1$draws(variables = c("bab"), format = "matrix") 
post_mu <- m_post_s %*% t(l1$X_ab_s)

# colMeans(post_mu)
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
  variable, 
  levels = paste0("V", 1:(l1$K_a * l1$K_b)))]

d_tru <- data.table(b_ab = l1$b_ab)
d_tru[, variable := paste0("V", 1:(l1$K_a * l1$K_b))]

ggplot(d_fig, aes(x = value, group = variable)) +
  geom_density(lwd = 0.2) +
  geom_vline(data = d_tru,
             aes(xintercept = b_ab), col = 2,
             lwd = 0.3) +
  facet_wrap(~variable, ncol = 1)
Figure 7: Parameter posterior density for interaction
Parameter posterior density group means
m_post_s <- f1$draws(variables = c("mu", "ba", "bb", "bab"), format = "matrix") 
post_mu <- m_post_s %*% t(l1$X_full_s)

# colMeans(post_mu)
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
  variable, 
  levels = paste0("V", 1:(l1$K_a * l1$K_b)))]

d_tru <- unique(l1$d[order(b, a), .(a, b, eta)])
d_tru[, variable := paste0("V", 1:(l1$K_a * l1$K_b))]

ggplot(d_fig, aes(x = value, group = variable)) +
  geom_density(lwd = 0.2) +
  geom_vline(data = d_tru,
             aes(xintercept = eta), col = 2,
             lwd = 0.3) +
  facet_wrap(~variable, ncol = 1)
Figure 8: Parameter posterior density group means

Two-factor (2 x 2)

Code
l1 <- get_data(
  N = 1e6, K_a = 2, K_b = 2, 
  b0 = 1,   
  b_a = create_sum_zero_vec(2, seed = 22),
  b_b = create_sum_zero_vec(2, seed = 23),
  b_ab = as.numeric(create_sum_zero_matrix(2, 2, seed = 24))
)

d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]


ld <- list(
  N = nrow(d_smry), y = d_smry$y, n = d_smry$n, 
  trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
  
  nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s), 
  Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
  
  nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s), 
  Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
  
  nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s), 
  Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
  
  prior_only = 0
)



f1 <- m1$sample(
  ld, iter_warmup = 1000, iter_sampling = 3000,
    parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
    max_treedepth = 10)
Running MCMC with 2 parallel chains...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.
Code
m_post_s <- f1$draws(variables = c("ba"), format = "matrix") 
post_ba <- m_post_s %*% t(l1$X_a_s)

knitr::kable(rbind(
  b_a = l1$b_a,
  posterior_mean = colMeans(post_ba)
), digits = 3, caption = "Compare true parameters with posterior means (a factor)")
                 [,1]   [,2]
b_a            -1.499  1.499
posterior_mean -1.504  1.504
Compare true parameters with posterior means (a factor)
Code
m_post_s <- f1$draws(variables = c("bb"), format = "matrix") 
post_bb <- m_post_s %*% t(l1$X_b_s)

knitr::kable(rbind(
  b_b = l1$b_b,
  posterior_mean = colMeans(post_bb)
), digits = 3, caption = "Compare true parameters with posterior means (b factor)")
                [,1]    [,2]
b_b            0.314  -0.314
posterior_mean 0.314  -0.314
Compare true parameters with posterior means (b factor)
Code
m_post_s <- f1$draws(variables = c("bab"), format = "matrix") 
post_bab <- m_post_s %*% t(l1$X_ab_s)

knitr::kable(rbind(
  b_ab = l1$b_ab,
  posterior_mean = colMeans(post_bab)
), digits = 3, caption = "Compare true parameters with posterior means (interaction)")
                 [,1]   [,2]   [,3]    [,4]
b_ab           -0.521  0.521  0.521  -0.521
posterior_mean -0.521  0.521  0.521  -0.521
Compare true parameters with posterior means (interaction)

Two-factor (3 x 3)

Code
l1 <- get_data(
  N = 1e6, K_a = 3, K_b = 3, 
  b0 = 1,   
  b_a = create_sum_zero_vec(3, seed = 22),
  b_b = create_sum_zero_vec(3, seed = 23),
  b_ab = as.numeric(create_sum_zero_matrix(3, 3, seed = 24))
)

d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]


ld <- list(
  N = nrow(d_smry), y = d_smry$y, n = d_smry$n, 
  trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
  
  nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s), 
  Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
  
  nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s), 
  Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
  
  nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s), 
  Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
  
  prior_only = 0
)


f1 <- m1$sample(
  ld, iter_warmup = 1000, iter_sampling = 3000,
    parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
    max_treedepth = 10)
Running MCMC with 2 parallel chains...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.
Code
m_post_s <- f1$draws(variables = c("ba"), format = "matrix") 
post_ba <- m_post_s %*% t(l1$X_a_s)

knitr::kable(rbind(
  b_a = l1$b_a,
  posterior_mean = colMeans(post_ba)
), digits = 3, caption = "Compare true parameters with posterior means (a factor)")
                 [,1]   [,2]   [,3]
b_a            -1.506  1.492  0.014
posterior_mean -1.508  1.494  0.014
Compare true parameters with posterior means (a factor)
Code
m_post_s <- f1$draws(variables = c("bb"), format = "matrix") 
post_bb <- m_post_s %*% t(l1$X_b_s)

knitr::kable(rbind(
  b_b = l1$b_b,
  posterior_mean = colMeans(post_bb)
), digits = 3, caption = "Compare true parameters with posterior means (b factor)")
                 [,1]    [,2]   [,3]
b_b            -0.031  -0.659  0.689
posterior_mean -0.032  -0.668  0.700
Compare true parameters with posterior means (b factor)
Code
m_post_s <- f1$draws(variables = c("bab"), format = "matrix") 
post_bab <- m_post_s %*% t(l1$X_ab_s)

knitr::kable(rbind(
  b_ab = l1$b_ab,
  posterior_mean = colMeans(post_bab)
), digits = 3, caption = "Compare true parameters with posterior means (interaction)")
                 [,1]   [,2]   [,3]    [,4]   [,5]   [,6]   [,7]    [,8]    [,9]
b_ab           -0.447  0.102  0.345  -0.524  0.373  0.151  0.971  -0.474  -0.496
posterior_mean -0.444  0.096  0.347  -0.524  0.368  0.156  0.967  -0.464  -0.503
Compare true parameters with posterior means (interaction)

Two-factor (2 x 4)

Code
l1 <- get_data(
  N = 1e6, K_a = 2, K_b = 4, 
  b0 = 1,   
  b_a = create_sum_zero_vec(2, seed = 22),
  b_b = create_sum_zero_vec(4, seed = 23),
  b_ab = as.numeric(create_sum_zero_matrix(2, 4, seed = 24))
)

d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]


ld <- list(
  N = nrow(d_smry), y = d_smry$y, n = d_smry$n, 
  trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
  
  nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s), 
  Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
  
  nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s), 
  Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
  
  nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s), 
  Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
  
  prior_only = 0
)

f1 <- m1$sample(
  ld, iter_warmup = 1000, iter_sampling = 3000,
    parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
    max_treedepth = 10)
Running MCMC with 2 parallel chains...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.
Code
m_post_s <- f1$draws(variables = c("ba"), format = "matrix") 
post_ba <- m_post_s %*% t(l1$X_a_s)

knitr::kable(rbind(
  b_a = l1$b_a,
  posterior_mean = colMeans(post_ba)
), digits = 3, caption = "Compare true parameters with posterior means (a factor)")
                 [,1]   [,2]
b_a            -1.499  1.499
posterior_mean -1.501  1.501
Compare true parameters with posterior means (a factor)
Code
m_post_s <- f1$draws(variables = c("bb"), format = "matrix") 
post_bb <- m_post_s %*% t(l1$X_b_s)

knitr::kable(rbind(
  b_b = l1$b_b,
  posterior_mean = colMeans(post_bb)
), digits = 3, caption = "Compare true parameters with posterior means (b factor)")
                 [,1]    [,2]   [,3]   [,4]
b_b            -0.423  -1.051  0.297  1.177
posterior_mean -0.423  -1.050  0.296  1.177
Compare true parameters with posterior means (b factor)
Code
m_post_s <- f1$draws(variables = c("bab"), format = "matrix") 
post_bab <- m_post_s %*% t(l1$X_ab_s)

knitr::kable(rbind(
  b_ab = l1$b_ab,
  posterior_mean = colMeans(post_bab)
), digits = 3, caption = "Compare true parameters with posterior means (interaction)")
                 [,1]   [,2]   [,3]    [,4]   [,5]    [,6]   [,7]    [,8]
b_ab           -0.718  0.718  0.325  -0.325  0.114  -0.114  0.279  -0.279
posterior_mean -0.717  0.717  0.319  -0.319  0.113  -0.113  0.285  -0.285
Compare true parameters with posterior means (interaction)

Two-factor (5 x 2)

Code
l1 <- get_data(
  N = 1e6, K_a = 5, K_b = 2, 
  b0 = 1,   
  b_a = create_sum_zero_vec(5, seed = 22),
  b_b = create_sum_zero_vec(2, seed = 23),
  b_ab = as.numeric(create_sum_zero_matrix(5, 2, seed = 24))
)

d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]


ld <- list(
  N = nrow(d_smry), y = d_smry$y, n = d_smry$n, 
  trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
  
  nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s), 
  Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
  
  nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s), 
  Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
  
  nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s), 
  Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
  
  prior_only = 0
)

f1 <- m1$sample(
  ld, iter_warmup = 1000, iter_sampling = 3000,
    parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
    max_treedepth = 10)
Running MCMC with 2 parallel chains...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.
Code
m_post_s <- f1$draws(variables = c("ba"), format = "matrix") 
post_ba <- m_post_s %*% t(l1$X_a_s)

knitr::kable(rbind(
  b_a = l1$b_a,
  posterior_mean = colMeans(post_ba)
), digits = 3, caption = "Compare true parameters with posterior means (a factor)")
                 [,1]   [,2]   [,3]   [,4]    [,5]
b_a            -1.125  1.872  0.395  -0.32  -0.822
posterior_mean -1.120  1.880  0.383  -0.32  -0.822
Compare true parameters with posterior means (a factor)
Code
m_post_s <- f1$draws(variables = c("bb"), format = "matrix") 
post_bb <- m_post_s %*% t(l1$X_b_s)

knitr::kable(rbind(
  b_b = l1$b_b,
  posterior_mean = colMeans(post_bb)
), digits = 3, caption = "Compare true parameters with posterior means (b factor)")
                [,1]    [,2]
b_b            0.314  -0.314
posterior_mean 0.313  -0.313
Compare true parameters with posterior means (b factor)
Code
m_post_s <- f1$draws(variables = c("bab"), format = "matrix") 
post_bab <- m_post_s %*% t(l1$X_ab_s)

knitr::kable(rbind(
  b_ab = l1$b_ab,
  posterior_mean = colMeans(post_bab)
), digits = 3, caption = "Compare true parameters with posterior means (interaction)")
                 [,1]    [,2]   [,3]   [,4]   [,5]   [,6]   [,7]    [,8]    [,9]   [,10]
b_ab           -0.534  -0.082  0.315  0.005  0.295  0.534  0.082  -0.315  -0.005  -0.295
posterior_mean -0.532  -0.083  0.312  0.004  0.300  0.532  0.083  -0.312  -0.004  -0.300
Compare true parameters with posterior means (interaction)

What is it with kronecker(Q_a, Q_b)?

The problem with kronecker(Q_a, Q_b) was that I was using a columnwise definition for the interaction parameters. However, kronecker(Q_a, Q_b) varies b fastest rather than a, which is incompatible with a columnwise (a varies fastest) specification of the interaction parameters.
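
A minimal illustration of the row ordering, with arbitrary placeholder values standing in for the contrast entries:

Code
A <- matrix(c(10, 20, 30), ncol = 1)  # stand-in for Q_a (3 levels of a)
B <- matrix(c(1, 2), ncol = 1)        # stand-in for Q_b (2 levels of b)
# rows come out as (a=1,b=1), (a=1,b=2), (a=2,b=1), ...: b varies fastest
kronecker(A, B)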

Revised data generation function
get_data_alt <- function(
    N = 100,
    K_a = 3,
    K_b = 2,
    b0 = 1,
    # effects are sum to zero
    b_a = c(0, -1, 1),
    b_b = c(-0.6, 0.6),
    # rowwise
    b_ab = c(-0.05, 0.05, 0.15, -0.15, -0.1, 0.1)
    ){
  
  stopifnot(all.equal(0, sum(b_a), 
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(0, sum(b_b), 
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(0, sum(b_ab), 
                      tolerance = sqrt(.Machine$double.eps)))
  
  # cols and rows should sum to zero otherwise following will not work
  b_ab_matrix <- matrix(b_ab, nrow = K_a, ncol = K_b, byrow = T)
  stopifnot(all.equal(colSums(b_ab_matrix), rep(0, K_b),
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(rowSums(b_ab_matrix), rep(0, K_a),
                      tolerance = sqrt(.Machine$double.eps)))
  
  a = sample(1:K_a, size = N, replace = T)
  b = sample(1:K_b, size = N, replace = T)
  d <- data.table(a, b)
  
  d[, b0 := b0]
  d[, b_a := b_a[a]]
  d[, b_b := b_b[b]]
  # column wise indexing into b_ab
  # d[, ix_b_ab := a + (K_a * (b - 1))]
  # rowwise indexing into b_ab
  d[, ix_b_ab := ((a - 1) * K_b) + b]
  d[, b_ab := b_ab[ix_b_ab]]
  d[, eta := b0 + b_a + b_b + b_ab]

  # e.g.
  # unique(d[order(b, a)])
  #        a     b    b0   b_a   b_b  b_ab   eta
  # 1:     1     1     1     0  -0.6 -0.05  0.35
  # 2:     2     1     1    -1  -0.6  0.15 -0.45
  # 3:     3     1     1     1  -0.6 -0.10  1.30
  # 4:     1     2     1     0   0.6  0.05  1.65
  # 5:     2     2     1    -1   0.6 -0.15  0.45
  # 6:     3     2     1     1   0.6  0.10  2.70
  
  cols_a <- paste0("a", 1:K_a)
  cols_b <- paste0("a", 1:K_b)
  cols_c <- unique(d[order(b, a), paste0("a",a,"b",b)])
  
  d[, y := rbinom(N, 1, plogis(eta))]
  
  # saturated design matrix
  X_a <- diag(K_a)
  colnames(X_a) <- cols_a
  X_b <- diag(K_b)
  colnames(X_b) <- cols_b
  X_ab <- diag(K_a * K_b)
  colnames(X_ab) <- cols_c

  # correlation matrix to enforce sum to zero
  S_a <- diag(K_a) - (1 / K_a )
  S_b <- diag(K_b) - (1 / K_b )
  S_ab <- kronecker(S_a, S_b)
  # should be rank 2 for a 1:3, b 1:2
  # pracma::Rank(S_ab)
  
  # decomposition eigen vectors
  Q_a <- eigen(S_a)$vector[, 1:(K_a - 1)]
  Q_b <- eigen(S_b)$vector[, 1:(K_b - 1)]
  Q_ab <- kronecker(Q_a, Q_b)
  
  # transformed pars
  b_a_s <- t(Q_a) %*% b_a
  b_b_s <- t(Q_b) %*% b_b
  b_ab_s <- t(Q_ab) %*% b_ab

  # full rank design
  X_a_s <- X_a %*% Q_a
  X_b_s <- X_b %*% Q_b
  X_ab_s <- X_ab %*% Q_ab
  
 
  # Check
  stopifnot(all.equal(as.numeric(X_a_s %*% b_a_s), b_a, 
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(as.numeric(X_b_s %*% b_b_s), b_b, 
                      tolerance = sqrt(.Machine$double.eps)))
  stopifnot(all.equal(as.numeric(X_ab_s %*% b_ab_s), b_ab, 
                      tolerance = sqrt(.Machine$double.eps)))
  
  
  
  # build the design matrix up incrementally
  X_full_s <- cbind(
    X_a_s[rep(1:K_a, length = K_a * K_b), ],
    X_b_s[rep(1:K_b, each = K_a), ]
  )
  for(j in 1:(K_b-1)){
    for(i in 1:(K_a-1)){
      X_full_s <- cbind(X_full_s,
                        X_full_s[, i]  * X_full_s[, ((K_a - 1) + j)])
    }
  }
  X_full_s
  # round(X_full_s, 3)
  
  list(
    d = d, b0 = b0, 
    b_a = b_a, b_b = b_b, b_ab = b_ab, 
    b_a_s = b_a_s, b_b_s = b_b_s, b_ab_s = b_ab_s, 
    
    K_a = K_a, K_b = K_b, 
    X_a = X_a, X_a_s = X_a_s, 
    X_b = X_b, X_b_s = X_b_s, 
    X_ab = X_ab, X_ab_s = X_ab_s, 
    X_full_s = X_full_s,
    
    S_a = S_a, S_b = S_b, S_ab = S_ab, 
    Q_a = Q_a, Q_b = Q_b, Q_ab = Q_ab 
  )
}
Code
l1 <- get_data_alt(
  N = 1e6, K_a = 3, K_b = 2, 
  b0 = 1,   
  b_a = create_sum_zero_vec(3, seed = 22),
  b_b = create_sum_zero_vec(2, seed = 23),
  b_ab = as.numeric(c(t(create_sum_zero_matrix(3, 2, seed = 24))))
)

d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]


ld <- list(
  N = nrow(d_smry), y = d_smry$y, n = d_smry$n, 
  # columnwise
  # trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
  
  # rowwise
  trt = cbind(d_smry$a, d_smry$b, ((d_smry$a - 1) * l1$K_b) + d_smry$b),
  
  nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s), 
  Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
  
  nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s), 
  Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
  
  nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s), 
  Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
  
  prior_only = 0
)

f1 <- m1$sample(
  ld, iter_warmup = 1000, iter_sampling = 3000,
    parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
    max_treedepth = 10)
Running MCMC with 2 parallel chains...

Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.

Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.
Code
m_post_s <- f1$draws(variables = c("ba"), format = "matrix") 
post_ba <- m_post_s %*% t(l1$X_a_s)

knitr::kable(rbind(
  b_a = l1$b_a,
  posterior_mean = colMeans(post_ba)
), digits = 3, caption = "Compare true parameters with posterior means (a factor)")
                 [,1]   [,2]   [,3]
b_a            -1.506  1.492  0.014
posterior_mean -1.507  1.486  0.021
Compare true parameters with posterior means (a factor)
Code
m_post_s <- f1$draws(variables = c("bb"), format = "matrix") 
post_bb <- m_post_s %*% t(l1$X_b_s)

knitr::kable(rbind(
  b_b = l1$b_b,
  posterior_mean = colMeans(post_bb)
), digits = 3, caption = "Compare true parameters with posterior means (b factor)")
                [,1]    [,2]
b_b            0.314  -0.314
posterior_mean 0.311  -0.311
Compare true parameters with posterior means (b factor)
Code
m_post_s <- f1$draws(variables = c("bab"), format = "matrix") 
post_bab <- m_post_s %*% t(l1$X_ab_s)

knitr::kable(rbind(
  b_ab = l1$b_ab,
  posterior_mean = colMeans(post_bab)
), digits = 3, caption = "Compare true parameters with posterior means (interaction)")
                [,1]    [,2]    [,3]   [,4]   [,5]    [,6]
b_ab           0.039  -0.039  -0.136  0.136  0.097  -0.097
posterior_mean 0.041  -0.041  -0.134  0.134  0.093  -0.093
Compare true parameters with posterior means (interaction)