Orthonormal contrasts
Consider the standard factorial setup where we have a grand mean (as in the mean of the factor level means) and deviations for the main effects of each factor:
\[ \begin{aligned} \eta = \mu 1 + X_a \alpha + X_b \beta + X_c \gamma \end{aligned} \]
where \(X_a\), \(X_b\) and \(X_c\) are design matrices mapping each factor's deviations to the observations.
It might be tempting to set up a design matrix with a column for every parameter (here, a single factor with three levels), but the OLS solution involves inverting \(X^\top X\), which you cannot do here as the rank of \(X\) is less than the number of columns in \(X\):
X <- cbind(1, diag(3))
# in ols, the estimator involves inverting the design matrix
tryCatch(
{
solve(t(X)%*%X)
} ,
error = function(z){
message(z, "\n")
}
)
Error in solve.default(t(X) %*% X): system is computationally singular: reciprocal condition number = 1.38778e-17
However, if we drop the first level's column and use the intercept as the reference group (or introduce a sum-to-zero constraint) then we are fine:
X <- cbind(1, diag(3)[,2:3])
# in ols, the estimator involves inverting the design matrix
solve(t(X)%*%X)
[,1] [,2] [,3]
[1,] 1 -1 -1
[2,] -1 2 1
[3,] -1 1 2
Contrasts define a specific linear combination of parameters and affect their interpretation. When treatment contrasts are adopted with a design matrix that includes main and interaction effects for \(a\) with 3 levels and \(b\) with 2 levels we have
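A minimal sketch that reproduces the matrix below (treatment contrasts are R's default):
a <- factor(1:3)
b <- factor(1:2)
d <- CJ(a, b) # CJ from data.table: cross join of the factor levels
model.matrix(~ a * b, data = d)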
(Intercept) a2 a3 b2 a2:b2 a3:b2
1 1 0 0 0 0 0
2 1 0 0 1 0 0
3 1 1 0 0 0 0
4 1 1 0 1 1 0
5 1 0 1 0 0 0
6 1 0 1 1 0 1
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
attr(,"contrasts")$b
[1] "contr.treatment"
and for deviation (sum-to-zero) contrasts, which is what would be used for the ANOVA setup, it would look like
a <- factor(1:3)
b <- factor(1:2)
d <- CJ(a, b)
contrasts(d$a) <- contr.sum(3)
contrasts(d$b) <- contr.sum(2)
X <- model.matrix(~a * b, data = d)
X
(Intercept) a1 a2 b1 a1:b1 a2:b1
1 1 1 0 1 1 0
2 1 1 0 -1 -1 0
3 1 0 1 1 0 1
4 1 0 1 -1 0 -1
5 1 -1 -1 1 -1 -1
6 1 -1 -1 -1 1 1
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$a
[,1] [,2]
1 1 0
2 0 1
3 -1 -1
attr(,"contrasts")$b
[,1]
1 1
2 -1
For treatment contrasts, the intercept represents the mean of the reference group, taken to be when \(a\) and \(b\) are both at their first level. For deviation contrasts the intercept becomes the grand mean, equating to the mean of the group means, not the mean value of the outcome in the dataset.
With classic ANOVA, a sum to zero constraint allows us to overcome the rank deficient nature of the design matrix.
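For example, a quick sketch pairing the intercept with sum-to-zero contrast columns:
# with sum-to-zero contrasts the design is full rank and invertible
X <- cbind(1, contr.sum(3))
solve(t(X) %*% X)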
When using the treatment contrasts, the second and third levels of \(a\) are explicit in the design matrix whereas for the deviation contrasts, the first and second levels appear. For example, with a single factor model:
K_a <- 4
# means at levels of a
b_a <- c(-1, 3, 2, 1)
N <- 1e6
a <- sample(1:K_a, size = N, replace = T, prob = c(0.3, 0.2, 0.4, 0.1))
d <- data.table(a = factor(a))
d[, mu := b_a[a]]
d[, y := rnorm(.N, mu, 1)]
# sum to zero/deviation contrasts as used for anova
contrasts(d$a) <- contr.sum(K_a)
f1 <- lm(y ~ a, data = d)
The coefficients include an intercept and deviations for the first three levels of \(a\). For the deviation contrasts, the mean for the first level of \(a\) is computed as the grand mean plus the first parameter estimate, and similarly for the second and third. The mean for the last level of \(a\) is implied as the grand mean minus the sum of the first three deviations, hence sum to zero.
d_new <- CJ(a = factor(1:K_a))
contrasts(d_new$a) <- contr.sum(K_a)
X <- model.matrix(~a, data = d_new)
X
(Intercept) a1 a2 a3
1 1 1 0 0
2 1 0 1 0
3 1 0 0 1
4 1 -1 -1 -1
attr(,"assign")
[1] 0 1 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[,1] [,2] [,3]
1 1 0 0
2 0 1 0
3 0 0 1
4 -1 -1 -1
and \(\sum_{i=1}^{K_a} \alpha_i = 0\), which we can verify empirically by computing all \(K_a\) group means, subtracting the grand mean from each, and taking the sum:
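The folded code presumably resembles the following sketch (f1 and b_a as defined above):
# deviations for all K_a levels; the last is implied by the constraint
cf <- coef(f1)
a_dev <- c(cf[-1], -sum(cf[-1]))
# level means: grand mean plus deviations
t(unname(cf[1] + a_dev))
rbind(tru = b_a, estimate = round(unname(cf[1] + a_dev), 3))
# the deviations sum to zero (up to floating point error)
sum(a_dev)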
1 2 3 4
[1,] -1.002029 3.002591 2.001221 1.001319
[,1] [,2] [,3] [,4]
tru -1.000 3.000 2.000 1.000
estimate -1.002 3.003 2.001 1.001
which sum to zero (up to floating point error).
The above creates an inconvenience for the Bayesian who would like to (i) play frequentist in their insistence on differentiating fixed from random effects and (ii) place equal prior weight on each parameter within a group. We could (and often do) respond to this by fixing the reference parameter at zero to overcome the identification issue, but that puts a different prior weight on the reference parameter relative to the other parameters within the group. An alternative to arbitrarily fixing one of the factor level parameters to zero is to project the \(K_a\)-dimensional parameter space associated with the factor into a \(K_a - 1\) dimensional space in a way that enforces the same marginal prior on all \(K_a\) terms.
The sum-to-zero constraint is achieved by placing a prior that has negative correlation across the effects:
\[ \begin{aligned} \Sigma_a = \mathbb{I}_{K_a} - J_a / K_a \end{aligned} \]
where \(J_a\) is a matrix of ones of dimension \(K_a \times K_a\), leading to, for example, a covariance matrix such as the following:
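A minimal sketch of the construction (note that K_a is reset to 3 here, consistent with the 3 x 3 matrix shown and the draws that follow):
K_a <- 3
J_a <- matrix(1, K_a, K_a) # matrix of ones
S_a <- diag(K_a) - J_a / K_a
S_a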
[,1] [,2] [,3]
[1,] 0.6666667 -0.3333333 -0.3333333
[2,] -0.3333333 0.6666667 -0.3333333
[3,] -0.3333333 -0.3333333 0.6666667
where again we are just considering a single factor for the moment. As a test, we can generate draws from a multivariate normal distribution using this covariance matrix and observe that each draw sums to zero:
u <- mvtnorm::rmvnorm(3, rep(0, K_a), S_a)
# final column is the sum of the draws:
round(cbind(u, rowSums(u)), 3)
[,1] [,2] [,3] [,4]
[1,] -1.658 1.320 0.338 0
[2,] -0.802 0.157 0.645 0
[3,] 0.420 0.077 -0.497 0
\(\Sigma_a\) is not full rank, but it can be decomposed using its eigenvectors. Since \(\Sigma_a\) has rank \(K_a - 1 = 2\), there are only two non-zero eigenvalues, both equal to 1. Below, we reconstruct \(\Sigma_a\) as \(Q_a \Lambda Q_a^\top\), where \(Q_a\) is formed from the \(K_a - 1\) eigenvectors that correspond to the non-zero eigenvalues of \(\Sigma_a\) and \(\Lambda = \mathbb{I}_{K_a - 1}\).
# for symmetric PSD matrices the svd is equivalent to the eigendecomposition
svd_S_a <- svd(S_a)
# singular values are sorted in decreasing order, so dropping the last
# column keeps the eigenvectors for the non-zero eigenvalues
Q_a <- svd_S_a$v
Q_a <- Q_a[, 1:(K_a - 1)]
# reconstruct the original S_a as Q_a Lambda t(Q_a) with Lambda = I
Q_a %*% diag(K_a - 1) %*% t(Q_a)
[,1] [,2] [,3]
[1,] 0.6666667 -0.3333333 -0.3333333
[2,] -0.3333333 0.6666667 -0.3333333
[3,] -0.3333333 -0.3333333 0.6666667
This allows us to define a new vector of \(K_a - 1\) parameters \(\alpha^*\):
\[ \begin{aligned} \alpha^* = Q_a^\top \alpha \end{aligned} \]
Mapping back via \(\alpha = Q_a \alpha^*\) always yields a vector that sums to zero, because the columns of \(Q_a\) are orthogonal to the vector of ones. These columns of \(Q_a\) define orthonormal contrasts and allow us to identify the \(K_a - 1\) parameters.
Armed with the above, we can now take our original tempting setup for the design matrix (i.e. with a column for every parameter) and convert it to a new design matrix that maps \(\alpha^*\) into the observations:
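A sketch of that conversion (Q_a as computed above):
# saturated design: one column per level of a
X_a <- diag(K_a)
X_a
# full-rank design mapping the K_a - 1 parameters to the observations
X_a_s <- X_a %*% Q_a
X_a_s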
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
[,1] [,2]
[1,] -0.8164966 4.108230e-17
[2,] 0.4082483 -7.071068e-01
[3,] 0.4082483 7.071068e-01
To go from the parameters in their original space to the parameters in the reduced dimensional space and back again:
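The folded code presumably resembles this sketch, assuming true deviations of c(-1, 0.75, 0.25):
alpha <- c(-1, 0.75, 0.25) # sums to zero
alpha_s <- t(Q_a) %*% alpha # project into the reduced space
Q_a %*% alpha_s # map back: recovers alpha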
[,1]
[1,] -1.00
[2,] 0.75
[3,] 0.25
An implementation for the single factor case follows:
get_data <- function(
N = 100,
K_a = 3,
mu = 1,
b_a = c(-1, 0.25, 0.75)
){
a = sample(1:K_a, size = N, replace = T)
d <- data.table(a)
d[, eta := mu + b_a[a]]
d[, y := rbinom(N, 1, plogis(eta))]
# saturated design matrix
X_a <- diag(K_a)
# correlation matrix to enforce sum to zero
J <- matrix(1, K_a, K_a)
S_a <- diag(K_a) - J / K_a
# decomposition
# eigenvectors corresponding to the non-zero eigenvalues
Q_a <- eigen(S_a)$vectors[, 1:(K_a - 1)]
# full rank design
X_a_s <- X_a %*% Q_a
list(
d = d, K_a = K_a,
mu = mu, b_a = b_a,
X_a = X_a, X_a_s = X_a_s,
S_a = S_a, Q_a = Q_a
)
}
Logistic regression
// Model for:
// independent estimates of treatment arm by subgroup means
data {
int N;
array[N] int y;
// arm index
array[N] int x1trt;
// dimension of design matrix
int ncX1des;
int nrX1des;
matrix[nrX1des, ncX1des] X1des;
vector[ncX1des] sx1;
int prior_only;
}
transformed data {
// build full design matrices
matrix[N, ncX1des] X1 = X1des[x1trt];
}
parameters{
real mu;
vector[ncX1des] bx1;
}
transformed parameters{
vector[N] eta = mu + X1*bx1;
}
model{
target += logistic_lpdf(mu | 0, 1);
target += normal_lpdf(bx1 | 0, sx1);
if(!prior_only){
target += bernoulli_logit_lpmf(y | eta);
}
}
generated quantities{
}
m1 <- cmdstanr::cmdstan_model("stan/note-09-logistic.stan")
l1 <- get_data(
N = 1e4, K_a = 5, mu = 0.5,
b_a = c(-2, -1, 0, 1, 2)
)
# would be more efficient to transform into a binomial likelihood
# but keeping bernoulli for now since it allows me to think about
# unit level data (not sure if that will be necessary though)
ld <- list(
N = length(l1$d$y),
y = l1$d$y,
#
x1trt = l1$d$a,
nrX1des = nrow(l1$X_a_s), ncX1des = ncol(l1$X_a_s),
X1des = l1$X_a_s,
sx1 = rep(1, ncol(l1$X_a_s)),
prior_only = 1
)
f1 <- m1$sample(
ld, iter_warmup = 1000, iter_sampling = 3000,
parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
max_treedepth = 10)
Running MCMC with 2 parallel chains...
Chain 1 finished in 7.3 seconds.
Chain 2 finished in 7.5 seconds.
Both chains finished successfully.
Mean chain execution time: 7.4 seconds.
Total execution time: 7.6 seconds.
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mu 0.0345 0.0442 1.79 1.62 -2.87 2.96 1.00 6641. 3844.
2 bx1[1] -0.00345 0.0114 1.01 1.03 -1.66 1.67 1.00 6856. 4563.
3 bx1[2] 0.00486 0.0131 0.984 0.980 -1.65 1.60 1.00 6731. 4671.
4 bx1[3] 0.00926 0.00449 1.00 1.00 -1.62 1.66 1.00 7864. 5051.
5 bx1[4] -0.0181 -0.0131 0.999 0.997 -1.69 1.59 1.00 6882. 4314.
To transform the parameter estimates back to the group means that we are interested in, we compute the product of the posterior draws and the unique rows of the design matrix.
In theory, the group offsets should all have the same prior weight, which appears to be the case, as shown in Figure 1:
Priors on group offsets
m_post_s <- f1$draws(variables = c("bx1"), format = "matrix")
post_mu <- m_post_s %*% t(cbind(l1$X_a_s))
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
variable,
levels = paste0("V", 1:(l1$K_a)))]
ggplot(d_fig, aes(x = value, group = variable)) +
geom_density(lwd = 0.2) +
facet_wrap(~variable, ncol = 1)
Run the model with the likelihood included to see if we get close to the known (true) parameters, although they won't be immediately apparent from the summary:
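The folded code presumably just refits with the likelihood switched on; a sketch:
# flip the prior_only flag and refit (assumed; folded code not shown)
ld$prior_only <- 0
f1 <- m1$sample(
ld, iter_warmup = 1000, iter_sampling = 3000,
parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
max_treedepth = 10)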
Running MCMC with 2 parallel chains...
Chain 2 finished in 8.3 seconds.
Chain 1 finished in 8.6 seconds.
Both chains finished successfully.
Mean chain execution time: 8.4 seconds.
Total execution time: 8.6 seconds.
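The summary below was presumably produced with something like:
# hypothetical call; the actual folded code is not shown
f1$summary(variables = c("mu", "bx1"))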
# A tibble: 5 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mu 0.487 0.487 0.0265 0.0270 0.443 0.531 1.00 3768. 3500.
2 bx1[1] -0.926 -0.925 0.0458 0.0462 -1.00 -0.850 1.00 4492. 2929.
3 bx1[2] 1.25 1.25 0.0743 0.0731 1.13 1.37 1.00 4736. 3642.
4 bx1[3] -2.16 -2.16 0.0561 0.0555 -2.26 -2.07 1.00 4326. 2946.
5 bx1[4] -1.58 -1.58 0.0563 0.0558 -1.67 -1.48 1.00 4114. 2929.
Obviously, the results won't align exactly with the true values, but they should be somewhere close; see Figure 2:
Parameter posterior density
m_post_s <- f1$draws(variables = c("mu", "bx1"), format = "matrix")
post_mu <- m_post_s %*% t(cbind(1, l1$X_a_s))
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
variable,
levels = paste0("V", 1:(l1$K_a)))]
d_tru <- data.table(
a = 1:l1$K_a
)
d_tru[, eta := l1$mu + l1$b_a[a]]
d_tru[, variable := factor(paste0("V", a))]
ggplot(d_fig, aes(x = value, group = variable)) +
geom_density(lwd = 0.2) +
geom_vline(data = d_tru,
aes(xintercept = eta), col = 2,
lwd = 0.3) +
facet_wrap(~variable, ncol = 1) +
scale_x_continuous(breaks = seq(-3, 3, by = 0.5))
Parameter posterior density for group offsets
m_post_s <- f1$draws(variables = c("bx1"), format = "matrix")
post_mu <- m_post_s %*% t(cbind(l1$X_a_s))
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
variable,
levels = paste0("V", 1:(l1$K_a)))]
d_tru <- data.table(
a = 1:l1$K_a
)
d_tru[, eta := l1$b_a[a]]
d_tru[, variable := factor(paste0("V", a))]
ggplot(d_fig, aes(x = value, group = variable)) +
geom_density(lwd = 0.2) +
geom_vline(data = d_tru,
aes(xintercept = eta), col = 2,
lwd = 0.3) +
facet_wrap(~variable, ncol = 1) +
scale_x_continuous(breaks = seq(-3, 3, by = 0.5))
Two-factor setup
Here we have two main effects and their associated interactions.
I have kept this first implementation as a record of my initial mistake on columnwise vs rowwise specification of the interaction parameters. If you look at the argument for b_ab you will see that the original values were -0.05, 0.05, 0.15, ... but now they have been revised to -0.05, 0.15, -0.1, ..., with the original being rowwise and the corrected approach being columnwise. So, the implementation in the final section (see get_data_alt right at the end of this note), which uses kronecker rather than construct_C_ab, is the correct/preferred way to do this.
Revised data generation function
get_data <- function(
N = 100,
K_a = 3,
K_b = 2,
b0 = 1,
# effects are sum to zero
b_a = c(0, -1, 1),
b_b = c(-0.6, 0.6),
b_ab = c(-0.05, 0.15, -0.1, 0.05, -0.15, 0.1)
# b_ab = c(-0.05, 0.05, 0.15, -0.15, 0.1, -0.1)
){
stopifnot(all.equal(0, sum(b_a),
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(0, sum(b_b),
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(0, sum(b_ab),
tolerance = sqrt(.Machine$double.eps)))
# cols and rows should sum to zero otherwise following will not work
b_ab_matrix <- matrix(b_ab, nrow = K_a, ncol = K_b)
stopifnot(all.equal(colSums(b_ab_matrix), rep(0, K_b),
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(rowSums(b_ab_matrix), rep(0, K_a),
tolerance = sqrt(.Machine$double.eps)))
# we could go down this avenue but the original parameters would never be
# recovered.
# b_ab_centered <- b_ab_matrix -
# rowMeans(b_ab_matrix) -
# rep(colMeans(b_ab_matrix), each = K_a) +
# mean(b_ab_matrix)
# b_ab_centered <- as.vector(b_ab_centered)
a = sample(1:K_a, size = N, replace = T)
b = sample(1:K_b, size = N, replace = T)
d <- data.table(a, b)
d[, b0 := b0]
d[, b_a := b_a[a]]
d[, b_b := b_b[b]]
d[, ix_b_ab := a + (K_a * (b - 1))]
d[, b_ab := b_ab[ix_b_ab]]
d[, eta := b0 + b_a + b_b + b_ab]
# e.g. (for the default arguments above)
# > unique(d[order(b, a)])
#        a     b    b0   b_a   b_b  b_ab   eta
#    <int> <int> <num> <num> <num> <num> <num>
# 1:     1     1     1     0  -0.6 -0.05  0.35
# 2:     2     1     1    -1  -0.6  0.15 -0.45
# 3:     3     1     1     1  -0.6 -0.10  1.30
# 4:     1     2     1     0   0.6  0.05  1.65
# 5:     2     2     1    -1   0.6 -0.15  0.45
# 6:     3     2     1     1   0.6  0.10  2.70
cols_a <- paste0("a", 1:K_a)
cols_b <- paste0("b", 1:K_b)
cols_c <- unique(d[order(b, a), paste0("a",a,"b",b)])
d[, y := rbinom(N, 1, plogis(eta))]
# saturated design matrix
X_a <- diag(K_a)
colnames(X_a) <- cols_a
X_b <- diag(K_b)
colnames(X_b) <- cols_b
X_ab <- diag(K_a * K_b)
colnames(X_ab) <- cols_c
# correlation matrix to enforce sum to zero
S_a <- diag(K_a) - (1 / K_a )
S_b <- diag(K_b) - (1 / K_b )
S_ab <- kronecker(S_a, S_b)
# should be rank 2 for a 1:3, b 1:2
# pracma::Rank(S_ab)
# decomposition eigenvectors
Q_a <- eigen(S_a)$vectors[, 1:(K_a - 1)]
Q_b <- eigen(S_b)$vectors[, 1:(K_b - 1)]
# Q_ab <- kronecker(Q_a, Q_b)
# Ok, Q_ab as defined above will not allow me to recover the original
# parameters (the ordering issue is explained at the end of this note).
# The following is a workaround. All we are doing is building a matrix
# that will sum up the columns and the rows of the interaction
# parameters. We will then set that to zero and solve (i.e. compute the
# null space).
C_ab <- construct_C_ab(K_a, K_b)
# The null space of C_ab gives the set of vectors v such that
# C_ab v = 0; its basis vectors form the columns of Q_ab
Q_ab <- pracma::nullspace(C_ab)
# transformed pars
b_a_s <- t(Q_a) %*% b_a
b_b_s <- t(Q_b) %*% b_b
b_ab_s <- t(Q_ab) %*% b_ab
# full rank design
X_a_s <- X_a %*% Q_a
X_b_s <- X_b %*% Q_b
X_ab_s <- X_ab %*% Q_ab
X_full_s <- create_full_design(X_a_s, X_b_s, X_ab_s)
# round(X_full_s, 3)
# Check
stopifnot(all.equal(as.numeric(X_a_s %*% b_a_s), b_a,
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(as.numeric(X_b_s %*% b_b_s), b_b,
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(as.numeric(X_ab_s %*% b_ab_s), b_ab,
tolerance = sqrt(.Machine$double.eps)))
list(
d = d, b0 = b0,
b_a = b_a, b_b = b_b, b_ab = b_ab,
b_a_s = b_a_s, b_b_s = b_b_s, b_ab_s = b_ab_s,
K_a = K_a, K_b = K_b,
X_a = X_a, X_a_s = X_a_s,
X_b = X_b, X_b_s = X_b_s,
X_ab = X_ab, X_ab_s = X_ab_s,
X_full_s = X_full_s,
S_a = S_a, S_b = S_b, S_ab = S_ab,
C_ab = C_ab, Q_a = Q_a, Q_b = Q_b, Q_ab = Q_ab
)
}
We need some utility functions to stop this getting too messy.
Utility functions
# Utility function to create combinations of parameters (all columns and
# all rows) that should sum to zero.
construct_C_ab <- function(K_a, K_b) {
# Total number of interaction parameters
n_params <- K_a * K_b
# Number of constraints
n_constraints <- K_a + K_b
# Initialize C_ab matrix
C_ab <- matrix(0, nrow = n_constraints, ncol = n_params)
# Constraints for factor b (columns)
for (j in 1:K_b) {
col_indices <- ((j - 1) * K_a + 1):(j * K_a)
C_ab[j, col_indices] <- 1
}
# Constraints for factor a (rows)
for (i in 1:K_a) {
row_indices <- seq(i, n_params, by = K_a)
C_ab[K_b + i, row_indices] <- 1
}
return(C_ab)
}
# Utility function to create full design matrix from constrained space.
create_full_design <- function(X_a_s, X_b_s, X_ab_s) {
K_a <- nrow(X_a_s)
K_b <- nrow(X_b_s)
n_a <- ncol(X_a_s)
n_b <- ncol(X_b_s)
n_ab <- ncol(X_ab_s)
# Create all combinations
design <- expand.grid(a = 1:K_a, b = 1:K_b)
# Add intercept and main effects
X_main <- cbind(1,
X_a_s[design$a, ],
X_b_s[design$b, ])
# Create interaction effects
X_int <- X_ab_s[(design$b - 1) * K_a + design$a, , drop = FALSE]
# it looks like we can just zero out interaction when main effect is zero
# so that we have a logical design matrix, i.e. if some of the main factors
# are set to zero then the interaction should also be.
# for (i in 1:n_a) {
# for (j in 1:n_b) {
# int_col <- (i - 1) * n_b + j
# X_int[X_main[, 1 + i] == 0 | X_main[, 1 + n_a + j] == 0, int_col] <- 0
# }
# }
# Combine all effects
X_full <- cbind(X_main, X_int)
return(X_full)
}
# Utility to create random parameter values that sum to zero
create_sum_zero_matrix <- function(n_rows, n_cols, sd = 1, seed = 1) {
set.seed(seed)
# Step 1: Generate random values
mat <- matrix(rnorm(n_rows * n_cols, sd = sd), nrow = n_rows, ncol = n_cols)
# Step 2: Center columns
mat <- scale(mat, center = TRUE, scale = FALSE)
# Step 3: Center rows
mat <- t(scale(t(mat), center = TRUE, scale = FALSE))
# Step 4: Final adjustment to ensure exact zero sums
col_adj <- colSums(mat) / n_rows
row_adj <- rowSums(mat) / n_cols
for (i in 1:n_rows) {
for (j in 1:n_cols) {
mat[i,j] <- mat[i,j] - col_adj[j] - row_adj[i] + mean(col_adj)
}
}
return(mat)
}
# Utility to create random parameter values that sum to zero
create_sum_zero_vec <- function(n = 3, sd = 1, seed = 1) {
set.seed(seed)
v <- scale(rnorm(n, 0, sd), center = TRUE, scale = FALSE)
return(as.numeric(v))
}
Do some tests to see if the data generation process makes sense.
Not sure that these matrices are logical in terms of the products of the various factors…
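The folded code presumably builds a dataset for each configuration and prints the rounded full design matrix; a sketch for the 2x2 case (the 4x2 and 3x5 cases follow the same pattern):
# assumed reconstruction of the folded test code
l_test <- get_data(
N = 100, K_a = 2, K_b = 2, b0 = 1,
b_a = create_sum_zero_vec(2),
b_b = create_sum_zero_vec(2),
b_ab = as.numeric(create_sum_zero_matrix(2, 2))
)
round(l_test$X_full_s, 2)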
2x2 constrained design matrix
[,1] [,2] [,3] [,4]
[1,] 1 -0.71 -0.71 0.5
[2,] 1 0.71 -0.71 -0.5
[3,] 1 -0.71 0.71 -0.5
[4,] 1 0.71 0.71 0.5
4x2 constrained design matrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0.00 0.00 0.87 -0.71 0.17 0.51 0.29
[2,] 1 -0.58 -0.58 -0.29 -0.71 -0.61 0.00 -0.07
[3,] 1 -0.21 0.79 -0.29 -0.71 0.17 -0.49 0.32
[4,] 1 0.79 -0.21 -0.29 -0.71 0.27 -0.02 -0.55
[5,] 1 0.00 0.00 0.87 0.71 -0.17 -0.51 -0.29
[6,] 1 -0.58 -0.58 -0.29 0.71 0.61 0.00 0.07
[7,] 1 -0.21 0.79 -0.29 0.71 -0.17 0.49 -0.32
[8,] 1 0.79 -0.21 -0.29 0.71 -0.27 0.02 0.55
3x5 constrained design matrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 0.00 0.82 0.00 0.00 0.87 -0.10 0.45 0.30 -0.21 0.34 0.19
[2,] 1 -0.71 -0.41 0.00 0.00 0.87 0.47 -0.17 0.20 0.24 -0.39 -0.03
[3,] 1 0.71 -0.41 0.00 0.00 0.87 -0.37 -0.29 -0.50 -0.03 0.05 -0.16
[4,] 1 0.00 0.82 -0.58 -0.58 -0.29 -0.07 -0.12 0.03 -0.41 -0.46 -0.31
[5,] 1 -0.71 -0.41 -0.58 -0.58 -0.29 -0.33 -0.25 0.07 0.12 0.21 0.52
[6,] 1 0.71 -0.41 -0.58 -0.58 -0.29 0.40 0.37 -0.10 0.29 0.26 -0.21
[7,] 1 0.00 0.82 -0.21 0.79 -0.29 0.42 -0.33 -0.33 -0.02 0.23 0.23
[8,] 1 -0.71 -0.41 -0.21 0.79 -0.29 -0.23 0.54 -0.30 -0.01 -0.24 -0.08
[9,] 1 0.71 -0.41 -0.21 0.79 -0.29 -0.18 -0.21 0.63 0.04 0.01 -0.15
[10,] 1 0.00 0.82 0.79 -0.21 -0.29 -0.25 0.00 0.00 0.64 -0.11 -0.11
[11,] 1 -0.71 -0.41 0.79 -0.21 -0.29 0.10 -0.13 0.03 -0.35 0.43 -0.41
[12,] 1 0.71 -0.41 0.79 -0.21 -0.29 0.15 0.13 -0.03 -0.30 -0.32 0.52
The model now splits up the design into three separate matrices so that we can look to recover each grouping of parameters. Also, I convert to a binomial likelihood to speed things up.
Logistic regression
// Model for:
// independent estimates of treatment arm by subgroup means
data {
int N;
array[N] int y;
array[N] int n;
// arm index
array[N, 3] int trt;
// factor 1 design matrix
int ncXades;
int nrXades;
matrix[nrXades, ncXades] Xades;
vector[ncXades] sa;
// factor 2 design matrix
int ncXbdes;
int nrXbdes;
matrix[nrXbdes, ncXbdes] Xbdes;
vector[ncXbdes] sb;
// factor 1/2 interaction design matrix
int ncXabdes;
int nrXabdes;
matrix[nrXabdes, ncXabdes] Xabdes;
vector[ncXabdes] sab;
int prior_only;
}
transformed data {
// build full design matrices
matrix[N, ncXades] Xa = Xades[trt[,1]];
matrix[N, ncXbdes] Xb = Xbdes[trt[,2]];
// indexes the relevant col in the design matrix
matrix[N, ncXabdes] Xab = Xabdes[trt[,3]];
}
parameters{
real mu;
vector[ncXades] ba;
vector[ncXbdes] bb;
vector[ncXabdes] bab;
}
transformed parameters{
vector[N] eta = mu + Xa*ba + Xb*bb + Xab*bab;
}
model{
target += logistic_lpdf(mu | 0, 1);
target += normal_lpdf(ba | 0, sa);
target += normal_lpdf(bb | 0, sb);
target += normal_lpdf(bab | 0, sab);
if(!prior_only){
// target += bernoulli_logit_lpmf(y | eta);
target += binomial_logit_lpmf(y | n, eta);
}
}
generated quantities{
}
Two-factor (3 x 2)
Fit the model without the likelihood involved to examine the implied priors.
m1 <- cmdstanr::cmdstan_model("stan/note-09-logistic-2.stan")
l1 <- get_data(
N = 1e6, K_a = 3, K_b = 2,
b0 = 1, b_a = c(0, -1, 1), b_b = c(-0.4, 0.4),
# has to have all rows, all cols sum to zero if you
# want to target recovering parameters.
b_ab = c(-0.05, 0.15, -0.1, 0.05, -0.15, 0.1)
)
d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]
ld <- list(
N = nrow(d_smry), y = d_smry$y, n = d_smry$n,
trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s),
Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s),
Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s),
Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
prior_only = 1
)
f1 <- m1$sample(
ld, iter_warmup = 1000, iter_sampling = 3000,
parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
max_treedepth = 10)
Running MCMC with 2 parallel chains...
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.1 seconds.
# A tibble: 6 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mu -0.00680 0.00949 1.83 1.59 -3.07 3.06 1.00 8006. 4149.
2 ba[1] -0.0157 -0.0173 1.00 0.998 -1.65 1.62 1.00 8398. 4477.
3 ba[2] -0.00834 -0.0232 0.976 0.977 -1.62 1.63 1.00 8236. 4753.
4 bb[1] -0.00215 0.0134 1.00 1.01 -1.66 1.66 1.00 8462. 4937.
5 bab[1] -0.0149 -0.0111 1.01 1.00 -1.66 1.68 1.00 9257. 4619.
6 bab[2] 0.00843 0.0218 1.03 1.02 -1.71 1.67 1.00 7462. 4568.
Priors on group means
m_post_s <- f1$draws(variables = c("mu", "ba", "bb", "bab"), format = "matrix")
# align the main effect rows with the rows of X_ab_s (a varies fastest)
X_s <- cbind(l1$X_a_s[rep(1:l1$K_a, times = l1$K_b),],
l1$X_b_s[rep(1:l1$K_b, each = l1$K_a),],
l1$X_ab_s)
post_mu <- m_post_s %*% t(cbind(1, X_s))
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
variable,
levels = paste0("V", 1:(l1$K_a * l1$K_b)))]
ggplot(d_fig, aes(x = value, group = variable)) +
geom_density(lwd = 0.2) +
facet_wrap(~variable, ncol = 1)
And now look at the parameter estimates.
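Again, the folded code presumably refits with the likelihood included; a sketch:
# flip the prior_only flag and refit (assumed; folded code not shown)
ld$prior_only <- 0
f1 <- m1$sample(
ld, iter_warmup = 1000, iter_sampling = 3000,
parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
max_treedepth = 10)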
Running MCMC with 2 parallel chains...
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.1 seconds.
# A tibble: 6 × 10
variable mean median sd mad q5 q95 rhat ess_bulk
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mu 1.00 1.00 0.00251 0.00255 0.996 1.00 1.00 4747.
2 ba[1] 1.42 1.42 0.00451 0.00446 1.41 1.42 1.00 3642.
3 ba[2] -0.00149 -0.00150 0.00433 0.00435 -0.00865 0.00560 1.00 4175.
4 bb[1] 0.559 0.559 0.00358 0.00363 0.553 0.565 1.00 4302.
5 bab[1] -0.260 -0.260 0.00548 0.00548 -0.269 -0.251 1.00 4082.
6 bab[2] 0.0598 0.0599 0.00684 0.00688 0.0484 0.0709 1.00 3558.
# ℹ 1 more variable: ess_tail <dbl>
Checking the recovery of the original parameters, it looks like we got somewhere near the original values we used.
Parameter posterior density for first factor
m_post_s <- f1$draws(variables = c("ba"), format = "matrix")
post_mu <- m_post_s %*% t(l1$X_a_s)
# colMeans(post_mu)
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
variable,
levels = paste0("V", 1:l1$K_a))]
d_tru <- data.table(b_a = l1$b_a)
d_tru[, variable := paste0("V", 1:l1$K_a)]
ggplot(d_fig, aes(x = value, group = variable)) +
geom_density(lwd = 0.2) +
geom_vline(data = d_tru,
aes(xintercept = b_a), col = 2,
lwd = 0.3) +
facet_wrap(~variable, ncol = 1)
Parameter posterior density for second factor
m_post_s <- f1$draws(variables = c("bb"), format = "matrix")
post_mu <- m_post_s %*% t(l1$X_b_s)
# colMeans(post_mu)
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
variable,
levels = paste0("V", 1:l1$K_b))]
d_tru <- data.table(b_a = l1$b_b)
d_tru[, variable := paste0("V", 1:l1$K_b)]
ggplot(d_fig, aes(x = value, group = variable)) +
geom_density(lwd = 0.2) +
geom_vline(data = d_tru,
aes(xintercept = b_a), col = 2,
lwd = 0.3) +
facet_wrap(~variable, ncol = 1)
Parameter posterior density for interaction
m_post_s <- f1$draws(variables = c("bab"), format = "matrix")
post_mu <- m_post_s %*% t(l1$X_ab_s)
# colMeans(post_mu)
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
variable,
levels = paste0("V", 1:(l1$K_a * l1$K_b)))]
d_tru <- data.table(b_ab = l1$b_ab)
d_tru[, variable := paste0("V", 1:(l1$K_a * l1$K_b))]
ggplot(d_fig, aes(x = value, group = variable)) +
geom_density(lwd = 0.2) +
geom_vline(data = d_tru,
aes(xintercept = b_ab), col = 2,
lwd = 0.3) +
facet_wrap(~variable, ncol = 1)
Parameter posterior density group means
m_post_s <- f1$draws(variables = c("mu", "ba", "bb", "bab"), format = "matrix")
post_mu <- m_post_s %*% t(l1$X_full_s)
# colMeans(post_mu)
d_fig <- data.table(post_mu)
d_fig <- melt(d_fig, measure.vars = names(d_fig))
d_fig[, variable := factor(
variable,
levels = paste0("V", 1:(l1$K_a * l1$K_b)))]
d_tru <- unique(l1$d[order(b, a), .(a, b, eta)])
d_tru[, variable := paste0("V", 1:(l1$K_a * l1$K_b))]
ggplot(d_fig, aes(x = value, group = variable)) +
geom_density(lwd = 0.2) +
geom_vline(data = d_tru,
aes(xintercept = eta), col = 2,
lwd = 0.3) +
facet_wrap(~variable, ncol = 1)
Two-factor (2 x 2)
l1 <- get_data(
N = 1e6, K_a = 2, K_b = 2,
b0 = 1,
b_a = create_sum_zero_vec(2, seed = 22),
b_b = create_sum_zero_vec(2, seed = 23),
b_ab = as.numeric(create_sum_zero_matrix(2, 2, seed = 24))
)
d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]
ld <- list(
N = nrow(d_smry), y = d_smry$y, n = d_smry$n,
trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s),
Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s),
Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s),
Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
prior_only = 0
)
f1 <- m1$sample(
ld, iter_warmup = 1000, iter_sampling = 3000,
parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
max_treedepth = 10)
Running MCMC with 2 parallel chains...
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.
b_a (true) | -1.499 | 1.499 |
posterior_mean | -1.504 | 1.504 |

b_b (true) | 0.314 | -0.314 |
posterior_mean | 0.314 | -0.314 |

b_ab (true) | -0.521 | 0.521 | 0.521 | -0.521 |
posterior_mean | -0.521 | 0.521 | 0.521 | -0.521 |
Two-factor (3 x 3)
l1 <- get_data(
N = 1e6, K_a = 3, K_b = 3,
b0 = 1,
b_a = create_sum_zero_vec(3, seed = 22),
b_b = create_sum_zero_vec(3, seed = 23),
b_ab = as.numeric(create_sum_zero_matrix(3, 3, seed = 24))
)
d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]
ld <- list(
N = nrow(d_smry), y = d_smry$y, n = d_smry$n,
trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s),
Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s),
Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s),
Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
prior_only = 0
)
f1 <- m1$sample(
ld, iter_warmup = 1000, iter_sampling = 3000,
parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
max_treedepth = 10)
Running MCMC with 2 parallel chains...
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.
b_a (true) | -1.506 | 1.492 | 0.014 |
posterior_mean | -1.508 | 1.494 | 0.014 |

b_b (true) | -0.031 | -0.659 | 0.689 |
posterior_mean | -0.032 | -0.668 | 0.700 |

b_ab (true) | -0.447 | 0.102 | 0.345 | -0.524 | 0.373 | 0.151 | 0.971 | -0.474 | -0.496 |
posterior_mean | -0.444 | 0.096 | 0.347 | -0.524 | 0.368 | 0.156 | 0.967 | -0.464 | -0.503 |
Two-factor (2 x 4)
l1 <- get_data(
N = 1e6, K_a = 2, K_b = 4,
b0 = 1,
b_a = create_sum_zero_vec(2, seed = 22),
b_b = create_sum_zero_vec(4, seed = 23),
b_ab = as.numeric(create_sum_zero_matrix(2, 4, seed = 24))
)
d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]
ld <- list(
N = nrow(d_smry), y = d_smry$y, n = d_smry$n,
trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s),
Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s),
Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s),
Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
prior_only = 0
)
f1 <- m1$sample(
ld, iter_warmup = 1000, iter_sampling = 3000,
parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
max_treedepth = 10)
Running MCMC with 2 parallel chains...
Chain 1 finished in 0.2 seconds.
Chain 2 finished in 0.2 seconds.
Both chains finished successfully.
Mean chain execution time: 0.2 seconds.
Total execution time: 0.2 seconds.
b_a (true) | -1.499 | 1.499 |
posterior_mean | -1.501 | 1.501 |

b_b (true) | -0.423 | -1.051 | 0.297 | 1.177 |
posterior_mean | -0.423 | -1.050 | 0.296 | 1.177 |

b_ab (true) | -0.718 | 0.718 | 0.325 | -0.325 | 0.114 | -0.114 | 0.279 | -0.279 |
posterior_mean | -0.717 | 0.717 | 0.319 | -0.319 | 0.113 | -0.113 | 0.285 | -0.285 |
Two-factor (5 x 2)
l1 <- get_data(
N = 1e6, K_a = 5, K_b = 2,
b0 = 1,
b_a = create_sum_zero_vec(5, seed = 22),
b_b = create_sum_zero_vec(2, seed = 23),
b_ab = as.numeric(create_sum_zero_matrix(5, 2, seed = 24))
)
d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]
ld <- list(
N = nrow(d_smry), y = d_smry$y, n = d_smry$n,
trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s),
Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s),
Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s),
Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
prior_only = 0
)
f1 <- m1$sample(
ld, iter_warmup = 1000, iter_sampling = 3000,
parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
max_treedepth = 10)
Running MCMC with 2 parallel chains...
Chain 1 finished in 0.2 seconds.
Chain 2 finished in 0.2 seconds.
Both chains finished successfully.
Mean chain execution time: 0.2 seconds.
Total execution time: 0.2 seconds.
b_a (true) | -1.125 | 1.872 | 0.395 | -0.32 | -0.822 |
posterior_mean | -1.120 | 1.880 | 0.383 | -0.32 | -0.822 |

b_b (true) | 0.314 | -0.314 |
posterior_mean | 0.313 | -0.313 |

b_ab (true) | -0.534 | -0.082 | 0.315 | 0.005 | 0.295 | 0.534 | 0.082 | -0.315 | -0.005 | -0.295 |
posterior_mean | -0.532 | -0.083 | 0.312 | 0.004 | 0.300 | 0.532 | 0.083 | -0.312 | -0.004 | -0.300 |
What is it with kronecker(Q_a, Q_b)?
The problem with kronecker(Q_a, Q_b) was that I was using a columnwise definition for the interaction argument passed to the data generation function. However, kronecker(Q_a, Q_b) varies b first rather than a, and that is incompatible with the original specification of the interaction argument. After fixing that up, I can use kronecker as shown below. Also note that I now transpose the return value from the create_sum_zero_matrix function so that the matrix flattens with c in the correct (rowwise) order.
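A quick sketch of the ordering convention:
# kronecker varies its second argument fastest:
kronecker(c(1, 2, 3), c(1, 10))
# 1 10 2 20 3 30 -> the second argument's index moves fastest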
Revised data generation function
get_data_alt <- function(
N = 100,
K_a = 3,
K_b = 2,
b0 = 1,
# effects are sum to zero
b_a = c(0, -1, 1),
b_b = c(-0.6, 0.6),
# rowwise
b_ab = c(-0.05, 0.05, 0.15, -0.15, -0.1, 0.1)
){
stopifnot(all.equal(0, sum(b_a),
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(0, sum(b_b),
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(0, sum(b_ab),
tolerance = sqrt(.Machine$double.eps)))
# cols and rows should sum to zero otherwise following will not work
b_ab_matrix <- matrix(b_ab, nrow = K_a, ncol = K_b, byrow = T)
stopifnot(all.equal(colSums(b_ab_matrix), rep(0, K_b),
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(rowSums(b_ab_matrix), rep(0, K_a),
tolerance = sqrt(.Machine$double.eps)))
a = sample(1:K_a, size = N, replace = T)
b = sample(1:K_b, size = N, replace = T)
d <- data.table(a, b)
d[, b0 := b0]
d[, b_a := b_a[a]]
d[, b_b := b_b[b]]
# column wise indexing into b_ab
# d[, ix_b_ab := a + (K_a * (b - 1))]
# rowwise indexing into b_ab
d[, ix_b_ab := ((a - 1) * K_b) + b]
d[, b_ab := b_ab[ix_b_ab]]
d[, eta := b0 + b_a + b_b + b_ab]
# e.g. (for the default arguments above)
# unique(d[order(b, a)])
#        a     b    b0   b_a   b_b  b_ab   eta
#    <int> <int> <num> <num> <num> <num> <num>
# 1:     1     1     1     0  -0.6 -0.05  0.35
# 2:     2     1     1    -1  -0.6  0.15 -0.45
# 3:     3     1     1     1  -0.6 -0.10  1.30
# 4:     1     2     1     0   0.6  0.05  1.65
# 5:     2     2     1    -1   0.6 -0.15  0.45
# 6:     3     2     1     1   0.6  0.10  2.70
cols_a <- paste0("a", 1:K_a)
cols_b <- paste0("b", 1:K_b)
cols_c <- unique(d[order(b, a), paste0("a",a,"b",b)])
d[, y := rbinom(N, 1, plogis(eta))]
# saturated design matrix
X_a <- diag(K_a)
colnames(X_a) <- cols_a
X_b <- diag(K_b)
colnames(X_b) <- cols_b
X_ab <- diag(K_a * K_b)
colnames(X_ab) <- cols_c
# correlation matrix to enforce sum to zero
S_a <- diag(K_a) - (1 / K_a )
S_b <- diag(K_b) - (1 / K_b )
S_ab <- kronecker(S_a, S_b)
# should be rank 2 for a 1:3, b 1:2
# pracma::Rank(S_ab)
# decomposition eigenvectors
Q_a <- eigen(S_a)$vectors[, 1:(K_a - 1)]
Q_b <- eigen(S_b)$vectors[, 1:(K_b - 1)]
Q_ab <- kronecker(Q_a, Q_b)
# transformed pars
b_a_s <- t(Q_a) %*% b_a
b_b_s <- t(Q_b) %*% b_b
b_ab_s <- t(Q_ab) %*% b_ab
# full rank design
X_a_s <- X_a %*% Q_a
X_b_s <- X_b %*% Q_b
X_ab_s <- X_ab %*% Q_ab
# Check
stopifnot(all.equal(as.numeric(X_a_s %*% b_a_s), b_a,
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(as.numeric(X_b_s %*% b_b_s), b_b,
tolerance = sqrt(.Machine$double.eps)))
stopifnot(all.equal(as.numeric(X_ab_s %*% b_ab_s), b_ab,
tolerance = sqrt(.Machine$double.eps)))
# build the design matrix up incrementally
X_full_s <- cbind(
X_a_s[rep(1:K_a, length = K_a * K_b), ],
X_b_s[rep(1:K_b, each = K_a), ]
)
for(j in 1:(K_b-1)){
for(i in 1:(K_a-1)){
X_full_s <- cbind(X_full_s,
X_full_s[, i] * X_full_s[, ((K_a - 1) + j)])
}
}
X_full_s
# round(X_full_s, 3)
list(
d = d, b0 = b0,
b_a = b_a, b_b = b_b, b_ab = b_ab,
b_a_s = b_a_s, b_b_s = b_b_s, b_ab_s = b_ab_s,
K_a = K_a, K_b = K_b,
X_a = X_a, X_a_s = X_a_s,
X_b = X_b, X_b_s = X_b_s,
X_ab = X_ab, X_ab_s = X_ab_s,
X_full_s = X_full_s,
S_a = S_a, S_b = S_b, S_ab = S_ab,
Q_a = Q_a, Q_b = Q_b, Q_ab = Q_ab
)
}
l1 <- get_data_alt(
N = 1e6, K_a = 3, K_b = 2,
b0 = 1,
b_a = create_sum_zero_vec(3, seed = 22),
b_b = create_sum_zero_vec(2, seed = 23),
b_ab = as.numeric(c(t(create_sum_zero_matrix(3, 2, seed = 24))))
)
d_smry <- l1$d[, .(y = sum(y), n = .N), keyby = .(b, a)]
ld <- list(
N = nrow(d_smry), y = d_smry$y, n = d_smry$n,
# columnwise
# trt = cbind(d_smry$a, d_smry$b, d_smry$a + (l1$K_a * (d_smry$b - 1))),
# rowwise
trt = cbind(d_smry$a, d_smry$b, ((d_smry$a - 1) * l1$K_b) + d_smry$b),
nrXades = nrow(l1$X_a_s), ncXades = ncol(l1$X_a_s),
Xades = l1$X_a_s, sa = rep(1, ncol(l1$X_a_s)),
nrXbdes = nrow(l1$X_b_s), ncXbdes = ncol(l1$X_b_s),
Xbdes = l1$X_b_s, sb = rep(1, ncol(l1$X_b_s)),
nrXabdes = nrow(l1$X_ab_s), ncXabdes = ncol(l1$X_ab_s),
Xabdes = l1$X_ab_s, sab = rep(1, ncol(l1$X_ab_s)),
prior_only = 0
)
f1 <- m1$sample(
ld, iter_warmup = 1000, iter_sampling = 3000,
parallel_chains = 2, chains = 2, refresh = 0, show_exceptions = F,
max_treedepth = 10)
Running MCMC with 2 parallel chains...
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Both chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.2 seconds.
b_a (true) | -1.506 | 1.492 | 0.014 |
posterior_mean | -1.507 | 1.486 | 0.021 |

b_b (true) | 0.314 | -0.314 |
posterior_mean | 0.311 | -0.311 |

b_ab (true) | 0.039 | -0.039 | -0.136 | 0.136 | 0.097 | -0.097 |
posterior_mean | 0.041 | -0.041 | -0.134 | 0.134 | 0.093 | -0.093 |