Simulation results 6

Sequential design with early stopping (restricted action set) - risk based

Published

June 20, 2025

Modified

June 3, 2025

Libraries and globals

source("./R/init.R")



source("./R/util.R")
source("./R/data.R")
log_info("Called simulation-results 6 notebook")

# path prefixes depend on whether the working directory is the project root
if(basename(getwd()) == "roadmap-sim"){
  prefix_cfg <- "./etc/sim06/"
  prefix_stan <- "./stan"
  prefix_fig <- "./fig"
} else {
  prefix_cfg <- "../etc/sim06/"
  prefix_stan <- "../stan"
  prefix_fig <- "../fig"
}

Load simulation results

# Each input file corresponds to the results from a single simulation
# scenario/configuration.
# Load all the files into a single list.

# files of interest
sim_lab <- "sim06-06"
dir_sim <- file.path("data", sim_lab)

flist <- list.files(dir_sim, pattern = "sim06")
toks <- list()
l <- list()
for(i in seq_along(flist)){
  # simulation results for a single scenario
  l[[i]] <- qs::qread(file.path(dir_sim, flist[i]))
  # tokens of the file name, split on '-' and '.'
  toks[[i]] <- unlist(tstrsplit(flist[i], "[-.]"))
}

Introduction

Data generation

Data is generated based on the empirical distributions obtained mostly from the PIANO study.

We simulate silo membership from a multinomial distribution with probabilities 0.3, 0.5 and 0.2 for early, late and chronic. Conditional on silo membership, we simulate site of infection from a multinomial distribution with probabilities 0.4, 0.6 (knee, hip | early), 0.7, 0.3 (knee, hip | late) and 0.5, 0.5 (knee, hip | chronic).
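As a minimal base-R sketch of these first two steps (assuming a hypothetical `n_sim` participants and an arbitrary seed; the notebook's own data-generation code is not shown here), silo can be drawn with `sample()` and joint then drawn from a Bernoulli whose probability of hip is looked up from each unit's silo:

```r
set.seed(1)
n_sim <- 10000  # hypothetical number of participants

# silo membership: 1 = early, 2 = late, 3 = chronic
p_silo <- c(early = 0.3, late = 0.5, chronic = 0.2)
silo <- sample(1:3, size = n_sim, replace = TRUE, prob = p_silo)

# site of infection given silo: one row per silo, columns (knee, hip)
p_jnt <- rbind(
  early   = c(knee = 0.4, hip = 0.6),
  late    = c(knee = 0.7, hip = 0.3),
  chronic = c(knee = 0.5, hip = 0.5)
)
# joint: 1 = knee, 2 = hip; Pr(hip) is taken from each unit's silo row
jnt <- 1L + rbinom(n_sim, size = 1, prob = p_jnt[silo, "hip"])

# empirical proportions, to compare with the specified probabilities
prop.table(table(silo))
prop.table(table(silo, jnt), margin = 1)
```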

For the surgical domain, we simulate preference towards revision from a multinomial distribution with probabilities 0.66, 0.33 (rev(1), rev(2) | early), 0.3, 0.7 (rev(1), rev(2) | late), 0.25, 0.75 (rev(1), rev(2) | chronic).

We simulate surgical intervention from a multinomial distribution with probabilities 0.85, 0.15 (DAIR, revision | early), 0.5, 0.5 (DAIR, revision | late), 0.2, 0.8 (DAIR, revision | chronic). Revision is subsequently decomposed into one and two-stage via the preferences.
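A sketch of the preference and surgery steps, again in base R with a hypothetical `n_sim` and seed. Two assumptions are worth flagging: the early-silo preference probabilities (0.66, 0.33) are renormalised to sum to one, and "decomposed via the preferences" is read here as the revision type being exactly the preferred type, which the description above does not spell out.

```r
set.seed(2)
n_sim <- 10000  # hypothetical number of participants
silo <- sample(1:3, n_sim, replace = TRUE, prob = c(0.3, 0.5, 0.2))

# preference by silo, columns (rev(1), rev(2)); rows renormalised to sum to 1
p_pref <- rbind(c(0.66, 0.33), c(0.3, 0.7), c(0.25, 0.75))
p_pref <- p_pref / rowSums(p_pref)
# surgery by silo, columns (DAIR, revision)
p_surg <- rbind(c(0.85, 0.15), c(0.5, 0.5), c(0.2, 0.8))

# pref_rev: 1 = one-stage, 2 = two-stage
pref_rev <- 1L + rbinom(n_sim, size = 1, prob = p_pref[silo, 2])
# TRUE if revision (of either type) is allocated
revision <- rbinom(n_sim, size = 1, prob = p_surg[silo, 2]) == 1
# d1: 1 = DAIR, 2 = rev(1), 3 = rev(2); assumed: revision type = preferred type
d1 <- ifelse(revision, 1L + pref_rev, 1L)

# surgery allocation by silo (row proportions)
prop.table(table(silo, d1), margin = 1)
```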

A silo-specific index can be computed as \(d_1 + (K_{d_1} (s - 1))\) where \(d_1 \in \{1, 2, 3\}\) is the treatment allocation (DAIR, rev(1), rev(2)), \(K_{d_1} = 3\) is the number of surgical treatment types and \(s \in \{1, 2, 3\}\) is the silo membership (early, late, chronic).
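A quick check of the index formula, using a hypothetical helper `silo_surg_index()`: with \(K_{d_1} = 3\) the nine (surgery, silo) combinations map onto 1 to 9 without collisions. For example, rev(1) in the late silo is 2 + 3 × 1 = 5 and rev(2) in the chronic silo is 3 + 3 × 2 = 9.

```r
# hypothetical helper: silo-specific surgical index d1 + K * (s - 1)
silo_surg_index <- function(d1, s, K = 3L) d1 + K * (s - 1L)

# all (surgery, silo) combinations: rows DAIR, rev(1), rev(2); columns silos
idx <- outer(1:3, 1:3, silo_surg_index)
dimnames(idx) <- list(c("DAIR", "rev(1)", "rev(2)"),
                      c("early", "late", "chronic"))
idx
```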

For the antibiotic duration domain, we fix all participants that have received DAIR or two-stage revision as having non-randomised treatment. For all those receiving one-stage, we simulate antibiotic duration intervention from a multinomial distribution with probabilities 0.3, 0.35, 0.35 (non-rand, 12 weeks, 6 weeks | rev(1)). That is, 70% of the eligible set get randomised treatment and the split between 12 weeks and 6 weeks is 1:1.

For the extended prophylaxis domain, we fix all participants that have received DAIR or one-stage revision as having non-randomised treatment. For all those receiving two-stage revision, we simulate extended prophylaxis intervention from a multinomial distribution with probabilities 0.1, 0.45, 0.45 (non-rand, none, 12 weeks | rev(2)). That is, 90% of the eligible set get randomised treatment and the split between no extended prophylaxis and 12 weeks is 1:1.

For the antibiotic choice domain, we simulate the intervention from a multinomial distribution with probabilities 0.4, 0.3, 0.3 (non-rand, no-rifampicin, rifampicin) unconditional on silo membership. That is, 60% of the eligible set get randomised treatment and the split between no-rifampicin and rifampicin is 1:1.
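The three allocations for the antibiotic domains can be sketched as follows, assuming the integer coding 1 = non-randomised and 2, 3 = the randomised arms in the order listed above (consistent with the `d2 %in% 2:3`-style filters used later in this notebook). The surgery allocation `d1` is drawn uniformly here purely as a hypothetical stand-in for the surgical-domain step.

```r
set.seed(3)
n_sim <- 10000  # hypothetical number of participants
# hypothetical stand-in for the surgery step: 1 = DAIR, 2 = rev(1), 3 = rev(2)
d1 <- sample(1:3, n_sim, replace = TRUE)

# d2, antibiotic duration (rev(1) only): 1 = non-rand, 2 = 12 wk, 3 = 6 wk
d2 <- rep(1L, n_sim)
d2[d1 == 2] <- sample(1:3, sum(d1 == 2), replace = TRUE, prob = c(0.3, 0.35, 0.35))

# d3, extended prophylaxis (rev(2) only): 1 = non-rand, 2 = none, 3 = 12 wk
d3 <- rep(1L, n_sim)
d3[d1 == 3] <- sample(1:3, sum(d1 == 3), replace = TRUE, prob = c(0.1, 0.45, 0.45))

# d4, antibiotic choice (all units): 1 = non-rand, 2 = no rifampicin, 3 = rifampicin
d4 <- sample(1:3, n_sim, replace = TRUE, prob = c(0.4, 0.3, 0.3))

table(d1, d2)
table(d1, d3)
```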

As the trial progresses, decisions may be made which would lead to some allocations being shut off.

The true log-odds of response for each subject is calculated as the sum of the parameters indexed by the levels generated above. Treatment success is simulated as a Bernoulli random variable with probability equal to the inverse logit of the linear predictor. To speed up model fitting, we aggregate the number of successes and the number of trials by covariate group, which gives the equivalent binomial representation.
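A minimal sketch of the outcome step, restricted to silo and joint terms with hypothetical parameter values: the linear predictor is built by indexing parameter vectors with each unit's levels, outcomes are drawn with `plogis()` as the inverse logit, and the data are then collapsed to successes `y` and trials `N` per covariate group with `aggregate()`. The column names mirror the stored trial data (`d_all`), but the code itself is illustrative only.

```r
set.seed(4)
n_sim <- 2000  # hypothetical number of participants
d <- data.frame(
  silo = sample(1:3, n_sim, replace = TRUE, prob = c(0.3, 0.5, 0.2)),
  jnt  = sample(1:2, n_sim, replace = TRUE)
)
# hypothetical parameters; first level of each vector is the zero reference
mu     <- 0.7
b_silo <- c(0, -0.3, -0.2)
b_jnt  <- c(0, 0.4)

# true log-odds: sum of the parameters indexed by each unit's levels
d$eta <- mu + b_silo[d$silo] + b_jnt[d$jnt]
# Bernoulli outcome with success probability inverse-logit(eta)
d$y <- rbinom(n_sim, size = 1, prob = plogis(d$eta))
d$N <- 1L

# one row per covariate group: y = successes, N = trials
d_agg <- aggregate(cbind(y, N) ~ silo + jnt, data = d, FUN = sum)
d_agg
```

The aggregation loses nothing for inference on the regression parameters: the binomial log-likelihood of the grouped counts differs from the summed Bernoulli log-likelihood only by the constant \(\sum \log \binom{N}{y}\).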

Model

We adopt a revised model where we allow for silo-specific effects in the surgical domain.

If, for whatever reason, the early or chronic silo shows variation in the surgical domain effects, adjustment for silo alone is inadequate to account for this and we end up with a contaminated version of the surgical domain parameters. Most other aspects of the model remain the same as in the previous simulation. The model is used to compute unit-level risk (probability of treatment success), and decisions are assessed on the risk difference for the intervention comparisons within each domain.

For the simulations, we have a single, multivariable logistic regression model with a linear predictor that incorporates all domains and is specified as follows:

\[ \begin{aligned} Y &\sim \text{Binomial}(n, p) \\ \text{logit}(p) &= \mu + \beta_{s} + \beta_{j} + \beta_{p} \\ &\quad + \beta_{d1[k_{d1}, s]} + \beta_{d2[k_{d2}]} + \beta_{d3[k_{d3}]} + \beta_{d4[k_{d4}]} \end{aligned} \]

for each distinct covariate grouping where:

\(\mu\) represents an overall reference level from which all covariates deviate
\(\beta_{s}\) represents the silo deviations, which can be thought of as a seriousness of disease adjustment. We fix the first parameter (early) in this vector to zero for identifiability and the components are for early, late and chronic silo membership.
\(\beta_{j}\) represents the joint deviations, accounting for heterogeneity in outcome due to site of infection. Again, we fix the first parameter (knee) in this vector to zero for identifiability and the components are knee and hip.
\(\beta_{p}\) represents the (pre-revealed) preference adjustment: the preferred revision type assuming revision is allocated, included irrespective of silo and of what the assignment ultimately turned out to be. This accounts for heterogeneity in outcome due to clinical preference for revision type, which can be thought of as expert elicitation on aspects of the patient state and clinical experience. Again, we fix the first parameter (preference for one-stage revision) in this vector to zero for identifiability and the components are preference for one-stage or two-stage. Assuming revision is assigned, we would expect the original preference to align with the type of revision received, but nothing enforces this. Arguably, this term could be restricted to the late-acute cohort, but we would need to extend the vector of parameters to accommodate this. The preference indicators are also used to compute the sample weights for aggregating one-stage and two-stage revision into a single overall revision effect.
\(\beta_{d1[k_{d1}, s]}\) represents the silo-specific deviations associated with the surgery type. This accounts for heterogeneity in outcome due to surgery type, with the late-acute deviations taken as the randomised comparison of DAIR vs revision after the agreed weighting is applied. Again, we fix the first parameter (DAIR in the early cohort) in this vector to zero for identifiability and the components are DAIR, rev(1) and rev(2) for each silo. The reason for the silo-specific context is that if revision effects exist in one silo but not another, then without this conditioning we will end up with biased inference. For example, if (for whatever reason) there is a revision effect in the early silo but not in the late-acute cohort, then without this level of adjustment we would end up reporting a revision effect when we shouldn’t; adjustment for silo doesn’t protect us from this, even though silo is perfectly collinear with a unit receiving non-randomised or randomised surgical treatment.
\(\beta_{d2[k_{d2}]}\) represents the deviations associated with backbone antibiotic duration. This accounts for heterogeneity in outcome due to the assigned backbone antibiotic duration. The first parameter (non-randomised treatment) in this vector is set to zero for identifiability and the components are non-randomised, 12 weeks and 6 weeks. The term is included in the model irrespective of what surgery type was received, but only units receiving randomised and revealed one-stage revision are captured within the 12 week/6 week comparison. The other units contribute to the non-randomised set (which is essentially a nuisance variable that we don’t particularly care about). We are assuming that there is no silo-specific (or any other) heterogeneity for this comparison.
\(\beta_{d3[k_{d3}]}\) represents the deviations associated with extended prophylaxis duration. This accounts for heterogeneity in outcome due to the assigned extended prophylaxis duration. The first parameter (non-randomised treatment) in this vector is set to zero for identifiability and the components are non-randomised, no ext-proph and 12 weeks. The term is included in the model irrespective of what surgery type was received, but only units receiving randomised and revealed two-stage revision are captured within the none/12 week comparison. The other units contribute to the non-randomised set (which again is a nuisance variable that we don’t particularly care about). We are assuming that there is no silo-specific heterogeneity for this comparison.
\(\beta_{d4[k_{d4}]}\) represents the deviations associated with antibiotic choice. This accounts for heterogeneity in outcome due to the assigned antibiotic choice. The first parameter (non-randomised treatment) in this vector is set to zero for identifiability and the components are non-randomised, no rifampicin and rifampicin. The term is included for all units for which rifampicin may be indicated. The other units contribute to the non-randomised set. We are assuming that there is no silo-specific heterogeneity for this comparison.
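To make the indexing in the linear predictor concrete, here is a sketch with hypothetical parameter values (not the values used to simulate any scenario). The silo-specific surgical term \(\beta_{d1[k_{d1}, s]}\) is stored as a 3 × 3 matrix indexed by (surgery type, silo), and every other term is a vector whose first element is the zero reference level; `lin_pred()` is an illustrative helper, not a function from this project.

```r
# hypothetical parameter values; index 1 of each vector is the zero reference
mu     <- 0.7
b_silo <- c(early = 0, late = -0.3, chronic = -0.2)
b_jnt  <- c(knee = 0, hip = 0.4)
b_pref <- c(rev1 = 0, rev2 = 0.2)
# surgical effects by (surgery type, silo); only [DAIR, early] is fixed at 0
b_d1 <- matrix(c( 0.0, -0.1, -0.2,
                  0.1,  0.5,  0.3,
                  0.1,  0.5,  0.3),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("DAIR", "rev(1)", "rev(2)"), names(b_silo)))
b_d2 <- c(0, 0.1, 0.2)   # non-rand, 12 weeks, 6 weeks
b_d3 <- c(0, 0.1, 0.3)   # non-rand, none, 12 weeks
b_d4 <- c(0, 0.05, 0.1)  # non-rand, no rifampicin, rifampicin

# logit(p) for covariate groups given integer level indices (vectorised)
lin_pred <- function(s, j, p, d1, d2, d3, d4) {
  eta <- mu + b_silo[s] + b_jnt[j] + b_pref[p] +
    b_d1[cbind(d1, s)] + b_d2[d2] + b_d3[d3] + b_d4[d4]
  unname(eta)
}

# late silo, hip, prefers two-stage, receives rev(2), non-randomised
# duration, 12 weeks extended prophylaxis, rifampicin
eta <- lin_pred(s = 2, j = 2, p = 2, d1 = 3, d2 = 1, d3 = 3, d4 = 3)
plogis(eta)
```

For this covariate group the terms sum to 1.9 on the log-odds scale, so the probability of treatment success is `plogis(1.9)`, about 0.87.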

The simulation model isn’t particularly complicated, but the way that terms enter the model is convoluted, so understanding the dependencies between terms, and consequently their interpretation, is fairly challenging. For the actual trial analysis, the model will be revised to the equivalent Bernoulli likelihood on unit-level data and extended to account for site, randomisation period and prognostic covariates. Bar the surgical domain, for which ‘by silo’ deviations are implicit in the existing parameterisation, no further interactions are included. However, given that heterogeneity by site of infection (joint) is of interest (primarily in the surgical domain), the analysis plan will include a specification to account for this (essentially an additional set of surgical domain parameters, each of which could be partially pooled if that aligns with the prior belief structure).

Decision

At each interim, we assess the posterior and, if a decision threshold is met, we act. For example, if a superiority decision is reached in one of the domains for which this decision type is relevant, then we consider that domain resolved and all subsequent participants are assigned the superior intervention. We can (and presently do) continue to update the posterior inference for a stopped comparison at subsequent interim analyses, until all questions have been answered in all domains, at which point the trial stops.
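The "once resolved, stays resolved" logic can be sketched in base R with hypothetical per-interim decision flags for four domains (rows are interims). The cumulative flag is the same `cumsum(...) > 0` idea used in the summary code later in this notebook; a domain's stopping interim is its first `TRUE`, and the trial stops at the first interim where every domain is resolved.

```r
# hypothetical decision flags: rows = interims 1-5, columns = domains;
# TRUE when that domain's stopping rule is met at that interim
dec <- cbind(
  d1 = c(FALSE, FALSE, TRUE,  FALSE, TRUE),
  d2 = c(FALSE, FALSE, FALSE, FALSE, FALSE),
  d3 = c(FALSE, TRUE,  FALSE, FALSE, FALSE),
  d4 = c(FALSE, FALSE, FALSE, TRUE,  TRUE)
)
# a domain stays resolved from its first TRUE onwards
resolved <- apply(dec, 2, function(x) cumsum(x) > 0)
# first interim at which each domain is resolved (NA = never)
first_stop <- apply(dec, 2, function(x) match(TRUE, x))
# trial stops at the first interim where every domain is resolved
# (NA = the trial runs to the maximum sample size)
trial_stop <- match(TRUE, rowSums(resolved) == ncol(dec))
first_stop
trial_stop
```

Here domain 2 is never resolved, so `trial_stop` is `NA`, i.e. the trial runs to the maximum sample size.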

Superiority and non-inferiority are applicable to some domains and not others; however, we define reference and threshold values for all domains, just in case. Decisions are made with respect to the average within any ‘random-effect’ terms that might exist within the model.

For the superiority decision, a reference value of 0 was used and the probability thresholds were:

0.92 for surgical domain
0.95 for antibiotic duration domain
0.95 for extended prophylaxis domain
0.995 for antibiotic choice domain

There was no particular reason for the choice of these thresholds other than that they lead to the preferred operating characteristics and give nominal control of the type I assertion probabilities.

For the futility decision (in relation to superiority) a reference value of 0.05 was used and the probability thresholds were:

0.3 for surgical domain
0.25 for antibiotic duration domain
0.25 for extended prophylaxis domain
0.25 for antibiotic choice domain

The above means, for example, that if the probability that the risk difference is greater than 0.05 is less than 0.3 in the surgical domain comparison, then we say the superiority goal is futile.
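Given posterior draws of a domain's risk difference, both rules reduce to comparing Monte Carlo tail probabilities with the thresholds. The sketch below uses the surgical domain values as defaults; the helper `decide_sup()` and the normal "posterior" draws are hypothetical stand-ins, and the risk difference is assumed to be oriented so that positive values favour the intervention.

```r
# hypothetical helper; defaults are the surgical-domain values above
decide_sup <- function(rd, ref_sup = 0, thresh_sup = 0.92,
                       ref_fut = 0.05, thresh_fut = 0.3) {
  c(sup     = mean(rd > ref_sup) > thresh_sup,  # Pr(RD > 0) > 0.92
    fut_sup = mean(rd > ref_fut) < thresh_fut)  # Pr(RD > 0.05) < 0.3
}

set.seed(5)
# stand-in posterior draws of the risk difference
res_a <- decide_sup(rnorm(4000, mean = 0.08, sd = 0.04))
res_b <- decide_sup(rnorm(4000, mean = 0.00, sd = 0.04))
res_a
res_b
```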

For the non-inferiority (NI) decision, a reference value of -0.05 was used and the probability thresholds were:

0.975 for surgical domain
0.925 for antibiotic duration domain
0.975 for extended prophylaxis domain
0.975 for antibiotic choice domain

The above means, for example, that if the probability that the risk difference is greater than -0.05 is greater than 0.925, in the antibiotic duration domain, then we will say the intervention is non-inferior.

The futility decision (in relation to non-inferiority) has a reference value of 0 and the probability thresholds were:

0.25 for surgical domain
0.1 for antibiotic duration domain
0.25 for extended prophylaxis domain
0.25 for antibiotic choice domain

This means, for example, that if the probability that the risk difference is greater than 0 is less than 0.1, in the antibiotic duration domain, then we say the non-inferiority goal is futile.
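The NI rule and its futility counterpart follow the same pattern. This sketch uses the antibiotic duration thresholds as defaults; `decide_ni()` and the normal draws are hypothetical, and the risk difference is again assumed to be oriented so that positive values favour the intervention.

```r
# hypothetical helper; defaults are the antibiotic duration values above
decide_ni <- function(rd, ref_ni = -0.05, thresh_ni = 0.925,
                      ref_fut = 0, thresh_fut = 0.1) {
  c(ni     = mean(rd > ref_ni) > thresh_ni,    # Pr(RD > -0.05) > 0.925
    fut_ni = mean(rd > ref_fut) < thresh_fut)  # Pr(RD > 0) < 0.1
}

set.seed(6)
# stand-in posterior draws of the risk difference
res_a <- decide_ni(rnorm(4000, mean = 0.00, sd = 0.02))
res_b <- decide_ni(rnorm(4000, mean = -0.06, sd = 0.02))
res_a
res_b
```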

Prior

The priors were as follows:

Reference log-odds of response: logistic distribution, mean 0.7 and scale 0.7
Silo effects: normal distribution, mean 0 and scale 1
Joint effects: normal distribution, mean 0 and scale 1
Preference effects: normal distribution, mean 0 and scale 1
Treatment effects: normal distribution, mean 0 and scale 1

The prior predictive distribution of the mean probability of response across the covariate groupings covers the entire support of the probability scale, is centred around 0.6 and has first and third quartiles of 0.4 and 0.8.
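A reduced prior predictive sketch, under stated assumptions: it covers only the silo × joint groups with every other term at its reference level, and weights those groups equally (the notebook does not say how groupings are weighted). Its quartiles therefore need not match the values quoted above; it only shows how draws from the priors propagate to the probability scale.

```r
set.seed(7)
n_draw <- 5000  # number of prior draws

# reference log-odds ~ logistic(location 0.7, scale 0.7)
mu <- rlogis(n_draw, location = 0.7, scale = 0.7)
# silo (3 levels) and joint (2 levels) effects ~ normal(0, 1), first level 0
b_silo <- cbind(0, matrix(rnorm(n_draw * 2), ncol = 2))
b_jnt  <- cbind(0, rnorm(n_draw))

# hypothetical covariate groups: every silo x joint combination, equally weighted
grp <- expand.grid(silo = 1:3, jnt = 1:2)

# mean probability of response across the groups, one value per prior draw
p_bar <- vapply(seq_len(n_draw), function(i) {
  mean(plogis(mu[i] + b_silo[i, grp$silo] + b_jnt[i, grp$jnt]))
}, numeric(1))

quantile(p_bar, c(0.25, 0.5, 0.75))
```

Adding the preference and treatment terms, or weighting the groups by their simulated frequencies, would bring this closer to the quantity described above.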

Simulation results

For this set of simulations, the number of simulated trials per scenario was 1000 and the simulation label is sim06-06.

Cumulative probability of each decision type

# Cumulative probability of decisions:

# Traverse the list of simulation results and for each one summarise the 
# cumulative probability of each decision type.

i <- 1
d_tbl_1 <- data.table()

# For each scenario that was simulated
for(i in 1:length(l)){
  
  # extract the decision matrix - sim, analysis, quantity, domain level decision
  d_dec_1 <- copy(l[[i]]$d_decision)
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
  
  # number of enrolments at each interim (interim sample size sequence)
  d_N <- data.table(analys = seq_along(l_cfg$N_pt), N = l_cfg$N_pt)
  
  # long version of decisions
  d_dec_2 <- melt(d_dec_1, 
                  id.vars = c("sim", "analys", "quant"), 
                  variable.name = "domain")

  # Should be right, but just in case...
  if(any(is.na(d_dec_2$value))){
    message("Some of the decision values are NA in index ", i, " file ", flist[i])
    d_dec_2[is.na(value), value := FALSE]
  }
  d_dec_2[, domain := as.numeric(gsub("d", "", domain))]
  
  # Domains 1, 3 and 4 will stop for superiority or futility for superiority.
  # Domain 2 will stop for NI or futility for NI.
  # No other stopping rules apply and so we only evaluate the operating 
  # characteristics on these, i.e. we do not care about the results for the 
  # cumulative probability of NI for domains 1, 3 and 4 because we would never
  # stop for this. 
  d_dec_2 <- rbind(
    d_dec_2[domain %in% c(1, 3, 4) & quant %in% c("sup", "fut_sup")],
    d_dec_2[domain %in% c(2) & quant %in% c("ni", "fut_ni")]
  )
  
  
  # flag, by sim, decision type and domain, whether each decision has been
  # reached at or before each interim
  d_dec_2[, value := as.logical(cumsum(value)>0), keyby = .(sim, quant, domain)]
  
  d_dec_2 <- merge(d_dec_2, d_N, by = "analys")
  
  # cumulative proportion for which each decision quantity has been met by 
  # analysis and domain
  d_dec_cumprob <- d_dec_2[, .(pr_val = mean(value)), keyby = .(analys, N, quant, domain)]
  
  d_tbl_1 <- rbind(
    d_tbl_1,
    cbind(scenario = i, desc = l_cfg$desc, d_dec_cumprob)
  )

}

Expected enrolment progression

# Expected sample size

# Similar to above but focus on expected sample size

# Traverse the list of simulation results and for each one summarise the 
# sample sizes at which stopping for a domain occurs for any reason.

# All we are trying to get to is the expected sample size by domain and 
# are not really interested in what decision was made. The cumulative prob
# of each decision type is computed previously.

i <- 2
d_tbl_2 <- data.table()

for(i in 1:length(l)){
  
  # extract the decision matrix - sim, analysis, quantity, domain level decision
  d_dec_1 <- copy(l[[i]]$d_decision)
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
  # interim looks
  d_N <- data.table(analys = seq_along(l_cfg$N_pt), N = l_cfg$N_pt)
  
  # long version of decision
  d_dec_2 <- melt(d_dec_1, 
                  id.vars = c("sim", "analys", "quant"), 
                  variable.name = "domain")
  # Should be right, but just in case...
  if(any(is.na(d_dec_2$value))){
    message("Some of the decision values are NA in index ", i, " file ", flist[i])
    d_dec_2[is.na(value), value := FALSE]
  }
  d_dec_2[, domain := as.numeric(gsub("d", "", domain))]
  d_dec_2 <- merge(d_dec_2, d_N, by = "analys")
  
  # First instance of any decision rule being hit by sim and domain.
  
  # Domains 1, 3 and 4 will stop for superiority or futility for superiority.
  # Domain 2 will stop for NI or futility for NI.
  
  # Sometimes a decision rule will not be hit for a domain and it will continue
  # to the max sample size. We will deal with that in a minute.
  d_dec_stop <- rbind(
    d_dec_2[
      domain %in% c(1, 3, 4) & value == T & quant %in% c("sup", "fut_sup"), 
      .SD[1], keyby = .(sim, domain)],
    d_dec_2[
      domain %in% c(2) & value == T & quant %in% c("ni", "fut_ni"), 
      .SD[1], keyby = .(sim, domain)]
  )
  setnames(d_dec_stop, "N", "N_stopped")
  setnames(d_dec_stop, "value", "stopped_early")
  
  # Add in any rows for which no early stopping happened
  d_dec_stop <- merge(d_dec_stop[, .SD, .SDcols = !c("analys")], 
                      # all combinations of sim and domain
                      unique(d_dec_2[, .(sim, domain)]),
                      by = c("sim", "domain"), all.y = T)
  
  # If domain or trial not stopped then record as having run to the 
  # maximum sample size with no decision made.
  d_dec_stop[is.na(N_stopped), N_stopped := max(d_N$N)]
  d_dec_stop[is.na(stopped_early), stopped_early := F]
  # So now we know where every domain was stopped and the reason for stopping
  # within each sim. Great.
  d_dec_stop[is.na(quant), quant := "null"]
  
  d_tbl_2 <- rbind(
    d_tbl_2,
    cbind(
      scenario = i, 
      desc = l_cfg$desc, 
      d_dec_stop
      )
  )

}

Distributions of posterior means (unconditional)

# Distribution of posterior means for parameters of interest.

# Some simulated trials will have stopped prior to the maximum sample size and
# these will have NA for their posterior means. If you were to summarise these 
# posterior means, they would be conditional on the trial having 'survived' 
# until the relevant interim. This means that you have missing data at later 
# interims, which creates a selection bias in that your selection of sims at any
# given interim are not a random sample, but rather a sample conditioned on the 
# stopping rules. 

# If you do not account for this in some way then a summary can be either 
# optimistic or pessimistic depending on how the stopping rules interact 
# with the data. Here we try to account for this missingness by imputing the 
# missing posterior means with locf within each simulation.
# Note that this is really only a partial 'fix' to get a sense of whether 
# our estimates are representative of the parameter values we used to simulate
# the data.

i <- 1
d_tbl_3 <- data.table()

for(i in 1:length(l)){
  
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
  # params
  d_pars <- copy(l[[i]]$d_post_smry_1)
  d_pars <- d_pars[par %in% c("lor", "rd")]
  
  # interim looks
  d_N <- data.table(analys = seq_along(l_cfg$N_pt), N = l_cfg$N_pt)
  
  d_pars <- dcast(d_pars, sim + id_analys + domain ~ par, value.var = c("mu", "se"))
  
  # locf
  d_pars[, `:=`(mu_lor = nafill(mu_lor, type = "locf"),
                mu_rd = nafill(mu_rd, type = "locf"),
                se_lor = nafill(se_lor, type = "locf"),
                se_rd = nafill(se_rd, type = "locf")
                ), 
         keyby = .(sim, domain)]
  setnames(d_pars, "id_analys", "analys")
  
  d_pars <- merge(d_pars, d_N, by = "analys")
  
  d_tbl_3 <- rbind(
    d_tbl_3,
    cbind(
      scenario = i, desc = l_cfg$desc,
      d_pars[, .(analys, sim, domain, N, mu_lor, mu_rd, se_lor, se_rd)]
      )
  )

}

Number of participants for each randomised comparison over time

# Cumulative number of participants in each randomised arm, by domain and interim

i <- 1
d_N_by_arm <- data.table()

for(i in 1:length(l)){
  
  # extract the decision matrix - sim, analysis, quantity, domain level decision
  d_dec_1 <- copy(l[[i]]$d_decision)
  # long version of decisions
  d_dec_2 <- melt(
    d_dec_1, id.vars = c("sim", "analys", "quant"), variable.name = "domain")
  # Should be right, but just in case...
  if(any(is.na(d_dec_2$value))){
    message("Some of the decision values are NA in index ", i, " file ", flist[i])
    d_dec_2[is.na(value), value := FALSE]
  }
  d_dec_2[, domain := as.numeric(gsub("d", "", domain))]
  # Domains 1, 3 and 4 will stop for superiority or futility for superiority.
  # Domain 2 will stop for NI or futility for NI.
  # No other stopping rules apply and so we only evaluate the operating 
  # characteristics on these, i.e. we do not care about the results for the 
  # cumulative probability of NI for domains 1, 3 and 4 because we would never
  # stop for this. 
  d_dec_2 <- rbind(
    d_dec_2[domain %in% c(1, 3, 4) & quant %in% c("sup", "fut_sup")],
    d_dec_2[domain %in% c(2) & quant %in% c("ni", "fut_ni")]
  )
  
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
  # interim looks
  d_enrolment <- data.table(analys = seq_along(l_cfg$N_pt), N_enrol = l_cfg$N_pt)
  # observed trial data sets
  d_tru <- copy(l[[i]]$d_all)
  setnames(d_tru, "id_analys", "analys")
  
  
  # late acute, surgical arms
  # *** silo isn't included in the keyby because we want to average across 
  # the entire trial population, not silo specific sample sizes.
  d_1_tmp <- d_tru[
    silo == 2, .(N = sum(N), domain = 1), keyby = .(sim, analys, d1)]
  d_1_tmp[, N := cumsum(N), keyby = .(sim, d1)]
  setnames(d_1_tmp, old = "d1", "arm")
  d_1_tmp <- merge(
    d_1_tmp, 
    CJ(sim = 1:max(d_tru$sim), analys = seq_along(l_cfg$N_pt), arm = 1:3, domain = 1), 
    by = c("sim", "analys", "arm", "domain"),
    all.y = T)
  d_1_tmp[analys == 1 & is.na(N), N := 0]
  d_1_tmp[, N := nafill(N, type = "locf"), keyby = .(sim, arm)]
  setkey(d_1_tmp, sim, analys, arm)
  
  # one-stage revision, ab duration randomised arms
  d_2_tmp <- d_tru[
    d1 == 2 & d2 %in% 2:3, 
    .(N = sum(N), domain = 2), keyby = .(sim, analys, d2)]
  d_2_tmp[, N := cumsum(N), keyby = .(sim, d2)]
  setnames(d_2_tmp, old = "d2", "arm")
  d_2_tmp <- merge(
    d_2_tmp, 
    CJ(sim = 1:max(d_tru$sim), analys = seq_along(l_cfg$N_pt), arm = 2:3, domain = 2), 
    by = c("sim", "analys", "arm", "domain"),
    all.y = T)
  d_2_tmp[analys == 1 & is.na(N), N := 0]
  d_2_tmp[, N := nafill(N, type = "locf"), keyby = .(sim, arm)]
  setkey(d_2_tmp, sim, analys, arm)
  
  # two-stage revision, extended prophy randomised arms
  d_3_tmp <- d_tru[
    d1 == 3 & d3 %in% 2:3, 
    .(N = sum(N), domain = 3), keyby = .(sim, analys, d3)]
  d_3_tmp[, N := cumsum(N), keyby = .(sim, d3)]
  setnames(d_3_tmp, old = "d3", "arm")
  d_3_tmp <- merge(
    d_3_tmp, 
    CJ(sim = 1:max(d_tru$sim), analys = seq_along(l_cfg$N_pt), arm = 2:3, domain = 3), 
    by = c("sim", "analys", "arm", "domain"),
    all.y = T)
  d_3_tmp[analys == 1 & is.na(N), N := 0]
  d_3_tmp[, N := nafill(N, type = "locf"), keyby = .(sim, arm)]
  setkey(d_3_tmp, sim, analys, arm)
  
  # ab choice randomised arms
  d_4_tmp <- d_tru[
    d4 %in% 2:3, .(N = sum(N), domain = 4), keyby = .(sim, analys, d4)]
  d_4_tmp[, N := cumsum(N), keyby = .(sim, d4)]
  setnames(d_4_tmp, old = "d4", "arm")
  d_4_tmp <- merge(
    d_4_tmp, 
    CJ(sim = 1:max(d_tru$sim), analys = seq_along(l_cfg$N_pt), arm = 2:3, domain = 4), 
    by = c("sim", "analys", "arm", "domain"),
    all.y = T)
  d_4_tmp[analys == 1 & is.na(N), N := 0]
  d_4_tmp[, N := nafill(N, type = "locf"), keyby = .(sim, arm)]
  setkey(d_4_tmp, sim, analys, arm)
  
  d_arm_tmp <- rbind(d_1_tmp, d_2_tmp, d_3_tmp, d_4_tmp)

  
  d_arm_tmp[, `:=`(
    scenario = i, desc = l_cfg$desc
  )]
  
  d_arm_tmp <- merge(
    d_arm_tmp,
    d_enrolment,
    by = "analys")
  
  d_N_by_arm <- rbind(d_N_by_arm, d_arm_tmp)

}

Summaries of empirical probability of treatment success

i <- 1
d_tbl_4 <- data.table()

for(i in 1:length(l)){
  
  # extract the decision matrix - sim, analysis, quantity, domain level decision
  d_dec_1 <- copy(l[[i]]$d_decision)
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
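  # interim looks (d_N is needed below when merging in the decisions)
  d_enrolment <- data.table(analys = seq_along(l_cfg$N_pt), N_enrol = l_cfg$N_pt)
  d_N <- data.table(analys = seq_along(l_cfg$N_pt), N = l_cfg$N_pt)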
  # observed data
  d_all <- copy(l[[i]]$d_all)
  setnames(d_all, "id_analys", "analys")
  
  d_all <- merge(d_all, d_enrolment , by = "analys")
  # long version of decision
  d_dec_2 <- melt(
    d_dec_1, id.vars = c("sim", "analys", "quant"), variable.name = "domain")
  # Should be right, but just in case...
  if(any(is.na(d_dec_2$value))){
    message("Some of the decision values are NA in index ", i, " file ", flist[i])
    d_dec_2[is.na(value), value := FALSE]
  }
  d_dec_2[, domain := as.numeric(gsub("d", "", domain))]
  d_dec_2 <- merge(d_dec_2, d_N, by = "analys")
  
  # First instance of any decision rule being hit by sim and domain.
  
  # Domains 1, 3 and 4 will stop for superiority or futility for superiority.
  # Domain 2 will stop for NI or futility for NI.
  
  # Sometimes a decision rule will not be hit for a domain and it will continue
  # to the max sample size. We will deal with that in a minute.
  d_dec_stop <- rbind(
    d_dec_2[
      domain %in% c(1, 3, 4) & value == T & quant %in% c("sup", "fut_sup"), 
      .SD[1], keyby = .(sim, domain)],
    d_dec_2[
      domain %in% c(2) & value == T & quant %in% c("ni", "fut_ni"), 
      .SD[1], keyby = .(sim, domain)]
  )
  setnames(d_dec_stop, "N", "N_stopped")
  setnames(d_dec_stop, "value", "stopped_early")
  # Add in any rows for which no early stopping happened
  d_dec_stop <- merge(
    d_dec_stop[, .SD, .SDcols = !c("analys")], 
    # all combinations of sim and domain
    unique(d_dec_2[, .(sim, domain)]),
    by = c("sim", "domain"), all.y = T)
  # If domain or trial not stopped then record as having run to the 
  # maximum sample size with no decision made.
  d_dec_stop[is.na(N_stopped), N_stopped := max(d_N$N)]
  d_dec_stop[is.na(stopped_early), stopped_early := F]
  # So now we know where every domain was stopped and the reason for stopping
  # within each sim. Great.
  d_dec_stop[is.na(quant), quant := "null"]
  
  d_domain_1 <- d_all[
    silo == 2 & d1 %in% 1:3, .(sim, analys, silo, jnt, pref_rev, arm = d1, y, N, N_enrol)]
  d_domain_1 <- merge(d_domain_1, d_dec_stop[domain == 1], by = c("sim"))
  d_domain_1 <- d_domain_1[N_enrol <= N_stopped, ]
  d_domain_1 <- d_domain_1[, .(domain = 1, y = sum(y), N = sum(N)), keyby = .(arm)]
  d_domain_1[, p_hat := y / N]
  
  d_domain_2 <- d_all[
    d1 == 2 & d2 %in% 2:3, .(sim, analys, silo, jnt, pref_rev, arm = d2, y, N, N_enrol)]
  d_domain_2 <- merge(d_domain_2, d_dec_stop[domain == 2], by = c("sim"))
  d_domain_2 <- d_domain_2[N_enrol <= N_stopped, ]
  d_domain_2 <- d_domain_2[, .(domain = 2, y = sum(y), N = sum(N)), keyby = .(arm)]
  d_domain_2[, p_hat := y / N]
  
  d_domain_3 <- d_all[
    d1 == 3 & d3 %in% 2:3, .(sim, analys, silo, jnt, pref_rev, arm = d3, y, N, N_enrol)]
  d_domain_3 <- merge(d_domain_3, d_dec_stop[domain == 3], by = c("sim"))
  d_domain_3 <- d_domain_3[N_enrol <= N_stopped, ]
  d_domain_3 <- d_domain_3[, .(domain = 3, y = sum(y), N = sum(N)), keyby = .(arm)]
  d_domain_3[, p_hat := y / N]
  
  d_domain_4 <- d_all[
    d4 %in% 2:3, .(sim, analys, silo, jnt, pref_rev, arm = d4, y, N, N_enrol)]
  d_domain_4 <- merge(d_domain_4, d_dec_stop[domain == 4], by = c("sim"))
  d_domain_4 <- d_domain_4[N_enrol <= N_stopped, ]
  d_domain_4 <- d_domain_4[, .(domain = 4, y = sum(y), N = sum(N)), keyby = .(arm)]
  d_domain_4[, p_hat := y / N]
  
  d_domain <- rbind(
    d_domain_1, d_domain_2, d_domain_3, d_domain_4
  )
  
  d_tbl_4 <- rbind(
    d_tbl_4, 
    cbind(scenario = i, desc = l_cfg$desc, d_domain)
  )
  
}

Probability of triggering decision

Table 1 shows the cumulative probability of a superiority decision across each of the scenarios simulated (the same information is shown in Figure 1). Operating characteristics are shown only for the domains to which the superiority rule applies (surgical, extended prophylaxis and antibiotic choice), with the cumulative probability of futility (for superiority) in parentheses.

Superiority decision - tabulated
Superiority decision - visualisation


d_tbl_1_cur <- d_tbl_1[quant %in% c("sup", "fut_sup") & domain %in% c(1, 3, 4), .SD]
setorderv(d_tbl_1_cur, cols = c("scenario", "domain", "analys", "quant"), 
          order = c(1, 1, 1, -1))
d_tbl_1_cur <- dcast(d_tbl_1_cur, scenario + desc + domain ~ quant + N, value.var = "pr_val")
d_tbl_1_cur <- d_tbl_1_cur[, .SD, .SDcols = !c("scenario")]

d_tbl_1_cur[, domain := factor(domain, 
                               levels = c(1, 3, 4), 
                               labels = c("Surgical", "AB Ext-proph", "AB Choice"))]

g_tbl <- d_tbl_1_cur |> 
  gt(groupname_col = "desc") |> 
  gt::text_transform(
    locations = cells_row_groups(),
    fn = function(x) {
      lapply(x, function(x) {
        gt::md(paste0("*", x, "*"))
      })
    }
  ) |>
  cols_align(
    columns = 1,
    align = "left"
  )  |> 
  cols_align(
    columns = 2:ncol(d_tbl_1_cur),
    align = "center"
  )  |> 
  cols_merge(
    columns = c("sup_500", "fut_sup_500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )   |> 
  cols_merge(
    columns = c("sup_1000", "fut_sup_1000"
                ),
    pattern = "<<{1}>><< ({2})>>"
  ) |> 
  cols_merge(
    columns = c("sup_1500", "fut_sup_1500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )  |> 
  cols_merge(
    columns = c("sup_2000", "fut_sup_2000"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )  |> 
  cols_merge(
    columns = c("sup_2500", "fut_sup_2500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  ) |>
  tab_spanner(
    label = md("Cumulative probability of **superiority** (futility) decision"),
    columns = 3:ncol(d_tbl_1_cur)
  )  |>
  cols_label(
    domain = "Domain",
    sup_500 = html("500"),
    sup_1000 = html("1000"),
    sup_1500 = html("1500"),
    sup_2000 = html("2000"),
    sup_2500 = html("2500")
  ) |>
  tab_options(
    table.font.size = "70%"
  ) |>
  fmt_number(decimals = 3, drop_trailing_zeros = TRUE)

g_tbl

	Cumulative probability of superiority (futility) decision by total enrolment
Domain	500	1000	1500	2000	2500
Null effect in all domains +silo specific effects
Surgical	0.03 (0.603)	0.046 (0.749)	0.06 (0.807)	0.066 (0.839)	0.071 (0.849)
AB Ext-proph	0.05 (0.5)	0.086 (0.642)	0.092 (0.721)	0.103 (0.764)	0.109 (0.812)
AB Choice	0.005 (0.597)	0.012 (0.811)	0.016 (0.892)	0.017 (0.934)	0.017 (0.959)
Moderate (OR 1.75) surgical revision effect (one and two-stage) +silo specific effects
Surgical	0.453 (0.073)	0.646 (0.098)	0.736 (0.11)	0.77 (0.115)	0.791 (0.122)
AB Ext-proph	0.047 (0.505)	0.074 (0.706)	0.098 (0.801)	0.108 (0.84)	0.113 (0.865)
AB Choice	0.003 (0.61)	0.01 (0.813)	0.012 (0.883)	0.016 (0.929)	0.018 (0.951)
Moderate (OR 1.75) surgical revision effect (one-stage) +silo specific effects
Surgical	0.124 (0.35)	0.203 (0.461)	0.232 (0.502)	0.25 (0.532)	0.261 (0.554)
AB Ext-proph	0.053 (0.475)	0.082 (0.65)	0.097 (0.734)	0.108 (0.781)	0.122 (0.819)
AB Choice	0.006 (0.632)	0.007 (0.818)	0.014 (0.903)	0.015 (0.942)	0.016 (0.965)
Moderate (OR 1.75) surgical revision effect (two-stage) +silo specific effects
Surgical	0.229 (0.219)	0.345 (0.3)	0.405 (0.336)	0.446 (0.35)	0.471 (0.357)
AB Ext-proph	0.041 (0.503)	0.073 (0.674)	0.093 (0.765)	0.102 (0.821)	0.108 (0.848)
AB Choice	0.005 (0.602)	0.011 (0.81)	0.013 (0.903)	0.013 (0.939)	0.014 (0.968)
Moderate (OR 1.75) antibiotic duration 6wk effect +silo specific effects
Surgical	0.041 (0.546)	0.063 (0.723)	0.071 (0.789)	0.076 (0.816)	0.076 (0.834)
AB Ext-proph	0.051 (0.484)	0.074 (0.643)	0.089 (0.723)	0.102 (0.769)	0.111 (0.797)
AB Choice	0.002 (0.622)	0.007 (0.816)	0.01 (0.893)	0.011 (0.94)	0.012 (0.961)
Moderate (OR 1.75) antibiotic ext-proph 12wk effect +silo specific effects
Surgical	0.055 (0.547)	0.077 (0.706)	0.092 (0.771)	0.097 (0.808)	0.1 (0.819)
AB Ext-proph	0.396 (0.056)	0.62 (0.075)	0.735 (0.083)	0.796 (0.091)	0.842 (0.093)
AB Choice	0.009 (0.629)	0.011 (0.814)	0.011 (0.889)	0.015 (0.942)	0.016 (0.958)
Moderate (OR 1.75) antibiotic choice rifampicin effect +silo specific effects
Surgical	0.035 (0.613)	0.049 (0.756)	0.057 (0.808)	0.06 (0.848)	0.066 (0.862)
AB Ext-proph	0.044 (0.52)	0.062 (0.657)	0.082 (0.729)	0.092 (0.777)	0.104 (0.817)
AB Choice	0.317 (0.031)	0.688 (0.036)	0.883 (0.038)	0.945 (0.038)	0.957 (0.038)
Moderate (OR 1.75) effects in all domains +silo specific effects
Surgical	0.51 (0.072)	0.692 (0.089)	0.758 (0.099)	0.777 (0.101)	0.793 (0.102)
AB Ext-proph	0.339 (0.109)	0.654 (0.129)	0.798 (0.137)	0.844 (0.139)	0.858 (0.14)
AB Choice	0.278 (0.045)	0.65 (0.051)	0.836 (0.053)	0.907 (0.056)	0.936 (0.056)

Table 1: Cumulative probability of a superiority decision (futility in parentheses) by each interim analysis, with interims indexed by total enrolment


d_fig <- d_tbl_1[quant %in% c("sup", "fut_sup") & domain %in% c(1, 3, 4), .SD]
setorderv(d_fig, cols = c("scenario", "domain", "analys", "quant"), 
          order = c(1, 1, 1, -1))

d_fig[, domain := factor(domain, 
                         levels = c(1, 3, 4), 
                         labels = c("Surgical", "AB Ext-proph", "AB Choice"))]

d_fig[, quant := factor(quant, 
                        levels = c("sup", "fut_sup"), 
                        labels = c("Superiority", "Futility"))]

d_fig[, desc := factor(desc, 
                        levels = unique(d_fig$desc), 
                        labels = unique(d_fig$desc))]


ggplot(d_fig, aes(x = N, y = pr_val, group = quant, col = quant)) +
  geom_line(lwd = 0.25) +
  scale_y_continuous("") +
  scale_color_discrete("") +
  # facet_grid(desc ~ domain) + 
  # facet_grid2(desc ~ domain, render_empty = FALSE)
  ggh4x::facet_grid2(desc ~ domain , 
             labeller = labeller(desc = label_wrap_gen(35)), 
             scales = "free",
             axes = "y",
             independent = "y") +
  # theme_minimal() +
  theme(
    legend.position = "bottom",
    strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
    axis.ticks = element_blank(),
    strip.text.x.top = element_text(size = 4),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1),
    axis.text.y = element_text(size = 4), 
    axis.text.x = element_text(size = 4), 
    axis.title.x = element_text(size = 5),
    legend.text = element_text(size = 4)
  )

Figure 1: Cumulative probabilities for superiority assessments

Table 2 shows the cumulative probability of a non-inferiority (NI) decision, with the cumulative probability of futility in parentheses; Figure 2 shows the same information. Results are shown only for the domain in which NI is evaluated (AB Duration).

The label “Null effect in all domains” is a misnomer with regard to the NI assessment. For an NI decision the relevant null is an effect located at the NI margin, whereas this scenario sets all effects to zero, so the true effect lies on the favourable side of the margin. An NI decision rate above the nominal type-I assertion probability is therefore expected.
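The inflation can be seen with a simplified normal approximation. This is a sketch under assumptions of our own, not the trial's decision rule: a flat prior, a normally distributed log-OR estimate, a hypothetical NI margin `delta` on the log-odds scale, a hypothetical standard error `se` and a decision threshold of 0.975. None of these values come from the simulation configuration.

```r
# Probability of declaring non-inferiority (NI) under a normal approximation.
# Hypothetical assumptions (not the simulation's actual rule):
#   - the estimated log-OR ~ N(true_lor, se^2);
#   - with a flat prior the posterior is N(estimate, se^2);
#   - NI is declared when Pr(log-OR > -delta | data) > threshold,
#     where delta > 0 is the NI margin on the log-odds scale.
pr_ni_decision <- function(true_lor, delta, se, threshold = 0.975) {
  z <- qnorm(threshold)
  # Pr(log-OR > -delta | data) > threshold  <=>  estimate > -delta + z * se
  1 - pnorm((-delta + z * se - true_lor) / se)
}

delta <- log(1.5)  # hypothetical NI margin (OR of 1/1.5)
se    <- 0.25      # hypothetical standard error of the log-OR estimate

# True effect exactly at the NI margin: decision rate equals 1 - threshold.
pr_ni_decision(true_lor = -delta, delta = delta, se = se)
# True effect of zero (the "null effect" scenario): decision rate is higher.
pr_ni_decision(true_lor = 0, delta = delta, se = se)
```

Under these hypothetical values the first call gives the nominal 0.025 and the second roughly 0.37; the gap grows as `se` shrinks, which is consistent with the increasing NI probabilities across interims in the null scenario.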

Non-inferiority - tabulated
Non-inferiority - visualisation


d_tbl_1_cur <- d_tbl_1[quant %in% c("ni", "fut_ni") & domain %in% c(2), .SD]
setorderv(d_tbl_1_cur, cols = c("scenario", "domain", "analys", "quant"), 
          order = c(1, 1, 1, -1))
d_tbl_1_cur <- dcast(d_tbl_1_cur, scenario + desc + domain ~ quant + N, value.var = "pr_val")
d_tbl_1_cur <- d_tbl_1_cur[, .SD, .SDcols = !c("scenario")]

d_tbl_1_cur[, domain := factor(domain, 
                               levels = c(2), 
                               labels = c("AB Duration"))]

g_tbl <- d_tbl_1_cur |> 
  gt(groupname_col = "desc") |> 
  gt::text_transform(
    locations = cells_row_groups(),
    fn = function(x) {
      lapply(x, function(x) {
        gt::md(paste0("*", x, "*"))
      })
    }
  ) |>
  cols_align(
    columns = 1,
    align = "left"
  )  |> 
  cols_align(
    columns = 2:ncol(d_tbl_1_cur),
    align = "center"
  )  |> 
  cols_merge(
    columns = c("ni_500", "fut_ni_500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )   |> 
  cols_merge(
    columns = c("ni_1000", "fut_ni_1000"
                ),
    pattern = "<<{1}>><< ({2})>>"
  ) |> 
  cols_merge(
    columns = c("ni_1500", "fut_ni_1500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )  |> 
  cols_merge(
    columns = c("ni_2000", "fut_ni_2000"
                ),
    pattern = "<<{1}>><< ({2})>>"
  )  |> 
  cols_merge(
    columns = c("ni_2500", "fut_ni_2500"
                ),
    pattern = "<<{1}>><< ({2})>>"
  ) |>
  tab_spanner(
    label = md("Cumulative probability of **NI** (futility) decision"),
    columns = 3:ncol(d_tbl_1_cur)
  )  |>
  cols_label(
    domain = "Domain",
    ni_500 = html("500"),
    ni_1000 = html("1000"),
    ni_1500 = html("1500"),
    ni_2000 = html("2000"),
    ni_2500 = html("2500")
  ) |>
  tab_options(
    table.font.size = "70%"
  ) |>
  fmt_number(decimals = 3, drop_trailing_zeros = TRUE)

g_tbl

Cumulative probability of NI (futility) decision, by total enrolment at interim
Domain	500	1000	1500	2000	2500
Null effect in all domains +silo specific effects
AB Duration	0.148 (0.082)	0.236 (0.122)	0.3 (0.159)	0.35 (0.181)	0.394 (0.194)
Moderate (OR 1.75) surgical revision effect (one and two-stage) +silo specific effects
AB Duration	0.153 (0.082)	0.286 (0.138)	0.386 (0.17)	0.451 (0.199)	0.504 (0.221)
Moderate (OR 1.75) surgical revision effect (one-stage) +silo specific effects
AB Duration	0.173 (0.096)	0.283 (0.141)	0.34 (0.17)	0.387 (0.187)	0.433 (0.203)
Moderate (OR 1.75) surgical revision effect (two-stage) +silo specific effects
AB Duration	0.151 (0.084)	0.261 (0.127)	0.344 (0.163)	0.406 (0.186)	0.458 (0.2)
Moderate (OR 1.75) antibiotic duration 6wk effect +silo specific effects
AB Duration	0.445 (0.008)	0.659 (0.01)	0.781 (0.012)	0.845 (0.014)	0.889 (0.015)
Moderate (OR 1.75) antibiotic ext-proph 12wk effect +silo specific effects
AB Duration	0.155 (0.088)	0.256 (0.14)	0.314 (0.163)	0.356 (0.178)	0.402 (0.199)
Moderate (OR 1.75) antibiotic choice rifampicin effect +silo specific effects
AB Duration	0.157 (0.083)	0.251 (0.135)	0.315 (0.169)	0.366 (0.191)	0.412 (0.203)
Moderate (OR 1.75) effects in all domains +silo specific effects
AB Duration	0.432 (0.016)	0.711 (0.016)	0.849 (0.018)	0.926 (0.018)	0.947 (0.018)

Table 2: Cumulative probability of an NI decision (futility in parentheses) by each interim analysis, with interims indexed by total enrolment


d_fig <- d_tbl_1[quant %in% c("ni", "fut_ni") & domain %in% c(2), .SD]
setorderv(d_fig, cols = c("scenario", "domain", "analys", "quant"), 
          order = c(1, 1, 1, -1))

d_fig[, domain := factor(domain, 
                               levels = c(2), 
                               labels = c("AB Duration"))]

d_fig[, quant := factor(quant, 
                        levels = c("ni", "fut_ni"), 
                        labels = c("NI", "Futility for NI"))]

d_fig[, desc := factor(desc, 
                        levels = unique(d_fig$desc), 
                        labels = unique(d_fig$desc))]


ggplot(d_fig, aes(x = N, y = pr_val, group = quant, col = quant)) +
  geom_line(lwd = 0.25) +
  scale_y_continuous("", breaks = seq(0, 1, by = 0.2)) +
  scale_color_discrete("") +
  # facet_grid(desc ~ domain, space = "free") +
  ggh4x::facet_grid2(desc ~ domain , 
             labeller = labeller(desc = label_wrap_gen(35)), 
             scales = "free",
             axes = "y",
             independent = "y") +
  
  # fixed 4 cm panel width; `respect` must be named, otherwise TRUE is matched to `rows`
  ggh4x::force_panelsizes(cols = unit(4, "cm"), respect = TRUE) +
  # facet_manual(
  #   . ~ desc, design = matrix(1:17, ncol = 1),
  #   widths = unit(3, "cm")
  # ) +
  # theme_minimal() +
  theme(
    legend.position = "bottom",
    strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
    axis.ticks = element_blank(),
    strip.text.x.top = element_text(size = 4),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1),
    axis.text.y = element_text(size = 4), 
    axis.text.x = element_text(size = 4), 
    axis.title.x = element_text(size = 5),
    legend.text = element_text(size = 4)
  )

Figure 2: Cumulative probabilities for NI assessments

Sample sizes

Table 3 shows the expected (mean over simulations) total enrolment at which a stopping decision is made in each domain, where simulations that never reach a decision are assigned the maximum sample size of 2500. This is the total number of participants enrolled in the trial at that point, not the number contributing to inference within the domain.
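A minimal sketch of this calculation for a single domain, using hypothetical decision indicators (`decided`) rather than the simulation output. The interim schedule and the maximum of 2500 match Tables 1 and 3; everything else is illustrative.

```r
# Interim schedule (total enrolment) and maximum sample size
interims <- c(500, 1000, 1500, 2000, 2500)
n_max    <- 2500

# Hypothetical decision indicators: one row per simulated trial,
# one column per interim (TRUE once a stopping decision has been made)
decided <- rbind(
  c(FALSE, TRUE,  TRUE,  TRUE,  TRUE),   # decision at 1000
  c(FALSE, FALSE, FALSE, FALSE, FALSE),  # no decision, runs to the maximum
  c(TRUE,  TRUE,  TRUE,  TRUE,  TRUE)    # decision at 500
)

# Enrolment at the first interim with a decision, or n_max if there is none
n_stopped <- apply(decided, 1, function(d) {
  k <- which(d)[1]  # NA when no interim has a decision
  if (is.na(k)) n_max else interims[k]
})

n_stopped        # 1000 2500 500
mean(n_stopped)  # expected enrolment to hit a stopping rule
```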

Figure 3 shows the number of participants entering into each of the randomised comparisons by domain and scenario.

The expected cumulative sample sizes are calculated by extracting the number of participants entering each randomised comparison (i.e. restricted to the relevant strata) at each interim, taking the cumulative sum of these over time and averaging over simulations. For example, the expected values for domain 1 (surgical) are based on participants in the late acute silo who receive a randomised surgical intervention. Similarly, the expected values for domain 2 (antibiotic duration) are based on participants across all silos who received one-stage revision and then received one of the randomised treatment allocations (i.e. were not in the non-randomised treatment group for any reason).
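A base-R sketch of the cumulative calculation. The data frame is hypothetical and only loosely modelled on `d_N_by_arm`; `n_new` is an illustrative name for the number of new participants entering the comparison (already restricted to the eligible stratum) between interims.

```r
# Hypothetical new entrants to one domain's randomised comparison,
# by simulated trial, interim (total enrolment) and arm
d <- data.frame(
  sim     = rep(1:2, each = 6),
  N_enrol = rep(rep(c(500, 1000, 1500), each = 2), times = 2),
  arm     = rep(c(1, 2), times = 6),
  n_new   = c(40, 42, 38, 41, 45, 39,
              35, 37, 44, 40, 41, 43)
)

# Cumulative sum over interims within each simulated trial and arm
d <- d[order(d$sim, d$arm, d$N_enrol), ]
d$N <- ave(d$n_new, d$sim, d$arm, FUN = cumsum)

# Expected cumulative size per arm at each enrolment level
mu <- aggregate(N ~ arm + N_enrol, data = d, FUN = mean)
mu
```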

Once a decision is made in a domain, subsequent enrolments are assigned to the relevant arm, so the allocation is expected to deviate from 1:1. For example, if a superiority decision is made for domain 1 (and the trial is still ongoing), all subsequent participants are assigned to the superior intervention. If decisions are made for all research questions, the trial stops early. To avoid the bias of averaging only over the simulated trials that are still running, we use last observation carried forward (LOCF) to propagate each simulation's values through to the maximum number of enrolments.
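A minimal LOCF helper, shown on a hypothetical arm count from a simulated trial that stopped after the 1000 interim. Missing values after stopping are replaced by the last observed value so that the trial still contributes to the average at later enrolment levels.

```r
# Last observation carried forward (LOCF) for a numeric vector
locf <- function(x) {
  idx <- cumsum(!is.na(x))  # position of the most recent observed value
  idx[idx == 0] <- NA       # nothing to carry forward before the first value
  x[!is.na(x)][idx]
}

# Hypothetical cumulative arm count at interims 500, ..., 2500;
# the trial stopped after the 1000 interim
n_arm1 <- c(120, 250, NA, NA, NA)
locf(n_arm1)  # 120 250 250 250 250
```

Since `data.table` is already loaded in this notebook, `nafill(x, type = "locf")` provides the same behaviour for numeric columns.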

Enrolments to stopping
Sample size of randomised comparisons


d_tbl_2_cur <- d_tbl_2[, .(N_mu = mean(N_stopped)), keyby = .(scenario, desc, domain)]
d_tbl_2_cur <- dcast(d_tbl_2_cur, scenario + desc ~ domain, value.var = "N_mu")
d_tbl_2_cur <- d_tbl_2_cur[, .SD, .SDcols = !c("scenario")]

g_tbl <- d_tbl_2_cur |> 
  gt(groupname_col = "desc") |> 
  gt::text_transform(
    locations = cells_row_groups(),
    fn = function(x) {
      lapply(x, function(x) {
        gt::md(paste0("*", x, "*"))
      })
    }
  ) |>
  cols_align(
    columns = 1,
    align = "left"
  )  |> 
  cols_align(
    columns = 2:ncol(d_tbl_2_cur),
    align = "center"
  )  |> 
  tab_spanner(
    label = html("Expected number of total enrolments to hit stopping rule by domain"),
    columns = 2:ncol(d_tbl_2_cur)
  )  |>
  cols_label(
    `1` = "Surgical",
    `2` = "AB Duration",
    `3` = "AB Ext-proph",
    `4` = "AB Choice"
  ) |>
  tab_options(
    table.font.size = "70%"
  ) |>
  fmt_number(decimals = 0, drop_trailing_zeros = TRUE)

g_tbl

Expected number of total enrolments to hit stopping rule by domain
Surgical	AB Duration	AB Ext-proph	AB Choice
Null effect in all domains +silo specific effects
906	1,711	1,024	859
Moderate (OR 1.75) surgical revision effect (one and two-stage) +silo specific effects
1,004	1,568	912	862
Moderate (OR 1.75) surgical revision effect (one-stage) +silo specific effects
1,179	1,614	1,012	832
Moderate (OR 1.75) surgical revision effect (two-stage) +silo specific effects
1,198	1,640	971	852
Moderate (OR 1.75) antibiotic duration 6wk effect +silo specific effects
938	1,114	1,034	850
Moderate (OR 1.75) antibiotic ext-proph 12wk effect +silo specific effects
927	1,675	1,076	842
Moderate (OR 1.75) antibiotic choice rifampicin effect +silo specific effects
888	1,666	1,020	1,012
Moderate (OR 1.75) effects in all domains +silo specific effects
954	1,007	938	1,062

Table 3: Expected number of enrolments to hit any stopping rule (including reaching maximum sample size)


d_fig_1 <- d_N_by_arm[, .(mu_N = mean(N)), keyby = .(desc, domain, arm, N_enrol)]
d_fig_1[, desc := factor(
  desc,
  levels = unique(d_N_by_arm$desc))]
d_fig_1[
  , domain := factor(
    domain,
    levels = 1:4,
    labels = c("Surgical", "AB Duration", "AB Ext-proph", "AB Choice"))]

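# Combined revision arm for the surgical domain: sum participants on one-stage
# (arm 2) and two-stage (arm 3) revision and label the total as arm 4
d_fig_2 <- d_N_by_arm[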
  domain == 1 & arm %in% 2:3, 
  .(N = sum(N), arm = 4), 
  keyby = .(analys, sim, domain, scenario, desc, N_enrol)]
d_fig_2 <- d_fig_2[, .(mu_N = mean(N)), keyby = .(desc, domain, arm, N_enrol)]

d_fig_2[, desc := factor(
  desc,
  levels = unique(d_N_by_arm$desc))]
d_fig_2[
  , domain := factor(
    domain,
    levels = 1:4,
    labels = c("Surgical", "AB Duration", "AB Ext-proph", "AB Choice"))]
  
d_fig_1[, arm := factor(arm)]
ggplot(d_fig_1, aes(x = N_enrol, y = mu_N, col = arm)) +
  geom_line(lwd = 0.2) +
  geom_point(size = 0.4) +
  geom_line(
    data = d_fig_2, 
    aes(x = N_enrol, y = mu_N), lwd = 0.2, lty = 2, inherit.aes = FALSE) +
  scale_x_continuous("") +
  scale_y_continuous("Expected number of participants") +
  scale_color_discrete("Treatment arm: ") +
  ggh4x::facet_grid2(
    desc ~ domain, 
    labeller = labeller(desc = label_wrap_gen(35)), 
    scales = "free_y", independent = "y") +
  # theme_minimal() +
  theme(
    legend.position = "bottom",
    strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
    axis.ticks = element_blank(),
    strip.text.x.top = element_text(size = 4),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1),
    axis.title.y=element_text(size = 5),
    axis.text.y = element_text(size = 4), 
    axis.text.x = element_text(size = 4), 
    axis.title.x = element_text(size = 5),
    legend.text = element_text(size = 4)
  )

Figure 3: Expected number of participants on domain arms

Parameter estimation

Figure 4 shows the median and the 2.5% and 97.5% quantiles of the posterior mean odds ratios across simulations, by domain and scenario. The estimates are unconditional: they ignore the stopping rule and propagate estimates forward with LOCF, so that we are working from a random sample of simulations (albeit with static imputation) rather than a dependent sample.

Figure 5 shows the median and the 2.5% and 97.5% quantiles of the posterior mean risk differences across simulations, by domain and scenario. On the risk scale, the odds ratios correspond to effects of the order of 10 to 20 percentage points, depending on the silo, domain and so on.
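The dependence on the baseline can be illustrated by converting an odds ratio into a risk difference at a few baseline probabilities of treatment success. The baseline values below are illustrative only and are not the silo-specific risks used in the simulation.

```r
# Risk difference implied by an odds ratio `or` at baseline success risk `p0`
or_to_rd <- function(p0, or) {
  odds1 <- or * p0 / (1 - p0)  # intervention odds = comparator odds x OR
  p1 <- odds1 / (1 + odds1)    # back-transform odds to a probability
  p1 - p0
}

# OR of 1.75 (the "moderate" effect) at illustrative baseline risks
round(or_to_rd(p0 = c(0.5, 0.6, 0.7), or = 1.75), 3)  # 0.136 0.124 0.103
```

The same odds ratio maps to a smaller absolute effect as the baseline risk moves away from 0.5, which is one reason the risk-scale effects differ across silos and domains.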

The distribution of posterior means in the AB duration domain is wide. Even when there is no true effect, posterior mean risk differences as large as \(\pm 0.2\) can occur.

Treatment effects - odds ratios
Treatment effects - risk difference


# ggplot2::theme_update(text = element_text(size = 8))
# ggplot2::theme_update(legend.position = "bottom")
# # ggplot2::theme_update(legend.title = element_blank())
# ggplot2::theme_update(axis.text.x = element_text(size = 8))
# ggplot2::theme_update(axis.text.y = element_text(size = 8))

d_fig <- d_tbl_3[,
                 .(or = median(exp(mu_lor)),
                   q_025 = quantile(exp(mu_lor), probs = 0.025),
                   q_975 = quantile(exp(mu_lor), probs = 0.975)), 
                 keyby = .(scenario, desc, analys, domain, N)]
# setorderv(d_fig, cols = "scenario", order = -1L)
d_fig[, desc := factor(desc, levels = unique(d_fig$desc))]
d_fig[, domain := factor(domain, 
                         levels = 1:4, 
                         labels = c("Surgical", "AB Duration", "AB Ext-proph", "AB Choice"))]

# d_tbl_3[scenario == 8, range(lor)]

ggplot(data = d_fig,  
       aes(x = N, y = or)) +
  geom_line(lwd = 0.25) +
  geom_errorbar(aes(ymin = q_025, ymax = q_975), lwd = 0.25, width = 50) +
  scale_x_continuous("") +
  scale_y_continuous("OR") +
  ggh4x::facet_grid2(desc ~ domain , 
             labeller = labeller(desc = label_wrap_gen(35)), 
             scales = "free",
             axes = "y",
             independent = "y") +
  # theme_minimal() +
  theme(text = element_text(size = 6),
        strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
        strip.text.x = element_text(angle = 0, size = 5),
        axis.ticks = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust=1, size = 5),
        axis.text.y = element_text(size = 5),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1))

Figure 4: Median value of posterior means for odds-ratio treatment effects by domain and simulation scenario


d_fig <- d_tbl_3[,
                 .(rd = median(mu_rd),
                   q_025 = quantile(mu_rd, probs = 0.025),
                   q_975 = quantile(mu_rd, probs = 0.975)), 
                 keyby = .(scenario, desc, analys, domain, N)]
# setorderv(d_fig, cols = "scenario", order = -1L)
d_fig[, desc := factor(desc, levels = unique(d_fig$desc))]
d_fig[, domain := factor(domain, 
                         levels = 1:4, 
                         labels = c("Surgical", "AB Duration", "AB Ext-proph", "AB Choice"))]

# d_tbl_3[scenario == 8, range(lor)]

ggplot(data = d_fig,  
       aes(x = N, y = rd)) +
  geom_line(lwd = 0.25) +
  geom_errorbar(aes(ymin = q_025, ymax = q_975), lwd = 0.25, width = 50) +
  scale_x_continuous("") +
  scale_y_continuous("Risk difference") +
  ggh4x::facet_grid2(desc ~ domain , 
             labeller = labeller(desc = label_wrap_gen(35)), 
             scales = "free",
             axes = "y",
             independent = "y") +
  # theme_minimal() +
  theme(text = element_text(size = 6),
        strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
        strip.text.x = element_text(angle = 0, size = 5),
        axis.ticks = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust=1, size = 5),
        axis.text.y = element_text(size = 5),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1))

Figure 5: Median value of posterior means for risk difference treatment effects by domain and simulation scenario

Proportion with treatment success

Table 4 shows the empirical proportion of participants with treatment success within the subset contributing to each randomised comparison, averaged over the simulations. For example, the values for the surgical domain show the empirical proportion with treatment success within the late acute silo by treatment arm (1: DAIR, 2: rev(1), 3: rev(2)).
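A minimal sketch of the averaging, using hypothetical participant-level outcomes that are already restricted to one domain's randomised comparison. The proportion is computed within each simulated trial and then averaged, so each simulation carries equal weight regardless of how many participants it contributed.

```r
# Hypothetical outcomes (1 = treatment success) for two simulated trials
d <- data.frame(
  sim     = c(1, 1, 1, 1, 2, 2, 2, 2),
  arm     = c(1, 1, 2, 2, 1, 1, 2, 2),
  success = c(1, 0, 1, 1, 0, 1, 1, 0)
)

# Proportion with success within each simulated trial and arm ...
p_sim <- aggregate(success ~ sim + arm, data = d, FUN = mean)
# ... then averaged over simulations
p_hat <- aggregate(success ~ arm, data = p_sim, FUN = mean)
p_hat  # arm 1: 0.5, arm 2: 0.75
```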


d_tbl_4_cur <- dcast(d_tbl_4, scenario + desc + domain ~ arm, value.var = "p_hat")
d_tbl_4_cur[, domain := factor(
  domain, 
  levels = c(1, 2, 3, 4), 
  labels = c("Surgical", "AB Duration",  "AB Ext-proph", "AB Choice"))]
d_tbl_4_cur <- d_tbl_4_cur[, .(desc, domain, `1`, `2`, `3`)]

g_tbl <- d_tbl_4_cur |> 
  gt(groupname_col = "desc") |> 
  gt::text_transform(
    locations = cells_row_groups(),
    fn = function(x) {
      lapply(x, function(x) {
        gt::md(paste0("*", x, "*"))
      })
    }
  ) |>
  cols_align(
    columns = 1,
    align = "left"
  )  |> 
  cols_align(
    columns = 2:ncol(d_tbl_4_cur),
    align = "center"
  )  |> 
  cols_label(
    domain = "Domain",
    `1` = "Arm 1",
    `2` = "Arm 2",
    `3` = "Arm 3"
  )  |>
  tab_spanner(
    label = html("Empirical risk by domain and treatment arm"),
    columns = 2:ncol(d_tbl_4_cur)
  ) |>
  tab_options(
    table.font.size = "70%"
  ) |>
  fmt_number(decimals = 2, drop_trailing_zeros = TRUE) |>
  sub_missing(
    columns = everything(),
    rows = everything(),
    missing_text = "-"
  )

g_tbl

Empirical risk by domain and treatment arm
Domain	Arm 1	Arm 2	Arm 3
Null effect in all domains +silo specific effects
Surgical	0.59	0.62	0.58
AB Duration	-	0.74	0.74
AB Ext-proph	-	0.68	0.68
AB Choice	-	0.66	0.66
Moderate (OR 1.75) surgical revision effect (one and two-stage) +silo specific effects
Surgical	0.57	0.72	0.68
AB Duration	-	0.76	0.76
AB Ext-proph	-	0.72	0.72
AB Choice	-	0.69	0.69
Moderate (OR 1.75) surgical revision effect (one-stage) +silo specific effects
Surgical	0.57	0.72	0.56
AB Duration	-	0.77	0.77
AB Ext-proph	-	0.66	0.66
AB Choice	-	0.66	0.66
Moderate (OR 1.75) surgical revision effect (two-stage) +silo specific effects
Surgical	0.57	0.6	0.68
AB Duration	-	0.7	0.7
AB Ext-proph	-	0.72	0.72
AB Choice	-	0.67	0.67
Moderate (OR 1.75) antibiotic duration 6wk effect +silo specific effects
Surgical	0.59	0.67	0.58
AB Duration	-	0.71	0.81
AB Ext-proph	-	0.68	0.68
AB Choice	-	0.67	0.67
Moderate (OR 1.75) antibiotic ext-proph 12wk effect +silo specific effects
Surgical	0.59	0.62	0.63
AB Duration	-	0.74	0.74
AB Ext-proph	-	0.66	0.77
AB Choice	-	0.67	0.67
Moderate (OR 1.75) antibiotic choice rifampicin effect +silo specific effects
Surgical	0.63	0.65	0.61
AB Duration	-	0.77	0.77
AB Ext-proph	-	0.71	0.71
AB Choice	-	0.64	0.75
Moderate (OR 1.75) effects in all domains +silo specific effects
Surgical	0.6	0.78	0.75
AB Duration	-	0.77	0.85
AB Ext-proph	-	0.72	0.82
AB Choice	-	0.69	0.79

Table 4: Empirical proportion with treatment success by domain and treatment arm within the randomised comparisons, averaged over simulations

Fraction of uncertainty resolved

Figure 6 shows the median and the 2.5% and 97.5% quantiles (across simulations) of the fraction of uncertainty resolved, defined as

\[ \begin{aligned} \text{Fraction \ Resolved} = 1 - \frac{Var(\beta_{post})}{Var(\beta_{pri})} \end{aligned} \]

where \(Var(\beta_{post})\) and \(Var(\beta_{pri})\) are the posterior and prior variances of the relevant treatment-effect log-odds ratio. The measure simply compares the posterior variance with the prior variance. When the posterior is based on negligible data, its variance is similar to that of the prior and the fraction resolved is close to zero. A low fraction resolved (e.g. less than 0.5) suggests that any decision was made with a substantial amount of uncertainty remaining (the posterior has not moved far from the prior), whereas values close to one indicate that most of the prior uncertainty has been resolved.
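A small numerical illustration of the definition. The prior standard deviation of 1 and the posterior standard deviations are hypothetical; the point is that both quantities enter as variances, so standard deviations must be squared.

```r
# Fraction of uncertainty resolved: 1 - Var(posterior) / Var(prior).
# Inputs are standard deviations, so they are squared to give variances.
frac_resolved <- function(sd_post, sd_pri) {
  1 - sd_post^2 / sd_pri^2
}

# Hypothetical prior SD of 1 on the log-OR and three posterior SDs
frac_resolved(sd_post = c(1, 0.5, 0.25), sd_pri = 1)  # 0, 0.75, 0.9375
```

Halving the posterior standard deviation already resolves three quarters of the prior variance, so values below 0.5 indicate that the posterior SD is still more than about 70% of the prior SD.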

As in the preceding figures, decisions made in the AB duration domain are subject to a substantial amount of residual uncertainty.


l_cfg <- copy(l[[1]]$cfg)

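# Prior variance of the treatment log-OR (b_trt[2] taken as the prior SD)
v0 <- (l_cfg$pri$b_trt[2])^2

d_fig <- copy(d_tbl_3)
# Fraction resolved = 1 - Var_post / Var_pri; se_lor is a posterior SD, so square it
d_fig[, fr_unc := 1 - (se_lor^2 / v0)]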

d_fig <- d_fig[,
                 .(fr_unc = median(fr_unc),
                   q_025 = quantile(fr_unc, probs = 0.025),
                   q_975 = quantile(fr_unc, probs = 0.975)), 
                 keyby = .(scenario, desc, analys, domain, N)]
# setorderv(d_fig, cols = "scenario", order = -1L)
d_fig[, desc := factor(desc, levels = unique(d_fig$desc))]
d_fig[, domain := factor(domain, 
                         levels = 1:4, 
                         labels = c("Surgical", "AB Duration", "AB Ext-proph", "AB Choice"))]

# d_tbl_3[scenario == 8, range(lor)]

ggplot(data = d_fig,  
       aes(x = N, y = fr_unc)) +
  geom_line(lwd = 0.25) +
  geom_errorbar(aes(ymin = q_025, ymax = q_975), lwd = 0.25, width = 50) +
  scale_x_continuous("") +
  scale_y_continuous("Fraction of uncertainty resolved (1-(V_post/V_pri))") +
  ggh4x::facet_grid2(desc ~ domain , 
             labeller = labeller(desc = label_wrap_gen(35)), 
             scales = "free",
             axes = "y",
             independent = "y") +
  # theme_clean() +
  # theme_minimal() +
  theme(text = element_text(size = 6),
        strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
        strip.text.x = element_text(angle = 0, size = 5),
        axis.ticks = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust=1, size = 5),
        axis.text.y = element_text(size = 5),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1))

Figure 6: Median values and quantiles for fraction of uncertainty resolved