Simulation results 5

Sequential design with early stopping (restricted action set) - risk based

Published

June 20, 2025

Modified

May 13, 2025

Load simulation results
# Each input file corresponds to the results from a single simulation
# scenario/configuration.
# Load all the files into a single list.

library(data.table)  # tstrsplit, melt, dcast, CJ, nafill etc. used throughout
library(ggplot2)     # figures

# files of interest
sim_lab <- "sim05-15"

flist <- list.files(paste0("data/", sim_lab), pattern = "sim05")
toks <- list()
l <- list()
for(i in seq_along(flist)){
  l[[i]] <- qs::qread(file.path(paste0("data/", sim_lab), flist[i]))
  toks[[i]] <-  unlist(tstrsplit(flist[i], "[-.]"))
}

Introduction

In the present design, the model is used to compute unit-level risk (probability of response) and decisions are assessed on the risk-difference scale for the intervention comparisons by domain. The log-odds scale has limited interpretability for clinical users and is inconsistent in absolute terms across strata with varying baseline rates. It is also useful to explore what an absolute perspective on effectiveness translates to in terms of operating characteristics. Simulation parameters continue to be expressed as log-odds-ratios, which keeps the simulation process simple. Results are presented on both the odds and risk scales. The aim is to build intuition about, and transparency around, the magnitude of the effects we are contemplating, and to determine whether these are reasonable assumptions.
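To make the relationship concrete, a log-odds-ratio can be mapped to a risk difference for any assumed baseline response probability. A minimal sketch (the numbers are arbitrary illustrations, not simulation parameters):

p0  <- 0.65                        # assumed baseline probability of response
lor <- 0.4                         # assumed effect on the log-odds scale
rd  <- plogis(qlogis(p0) + lor) - p0
rd                                 # approx 0.08 absolute risk difference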

We have a single multivariable logistic regression model. One set of parameters accounts for clinician preference for revision type in order to achieve conditional exchangeability across groups. At each interim, we assess the posterior and, if a decision threshold is met, we act. For example, if a superiority decision is reached in a domain for which this decision type is relevant, then we consider that domain dealt with and all subsequent participants are assigned to receive the superior intervention. We continue to update the posterior for the stopped comparisons at subsequent interim analyses until all questions have been answered in all domains, at which point trial recruitment stops. A sketch of this mechanism is given below.
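A minimal sketch of the sequential mechanism just described (update_posterior and assess_decision are hypothetical placeholders, and dat and N_pt stand for the accumulating trial data and the interim schedule; the actual implementation lives in the simulation code loaded above):

active <- rep(TRUE, 4)                      # domains with open questions
for (k in seq_along(N_pt)) {                # interim looks at N_pt enrolments
  fit <- update_posterior(dat[1:N_pt[k]])   # refit the joint logistic model
  for (j in which(active)) {
    dec <- assess_decision(fit, domain = j) # sup/fut_sup or ni/fut_ni
    if (dec != "none") {
      # question answered: fix allocation for this domain from here on,
      # e.g. assign all subsequent participants to the superior arm
      active[j] <- FALSE
    }
  }
  if (!any(active)) break                   # all questions answered -> stop
}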

The priors were as follows (a quick prior-predictive check is sketched after the list):

  • Reference log-odds of response: logistic distribution, mean 0.7 and scale 0.7
  • Silo effects: normal distribution, mean 0 and scale 1
  • Joint effects: normal distribution, mean 0 and scale 1
  • Preference effects: normal distribution, mean 0 and scale 1
  • Treatment effects: normal distribution, mean 0 and scale 1
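What these priors imply on more interpretable scales can be simulated directly. A sketch (parameter names are illustrative):

set.seed(1)
b0 <- rlogis(1e5, location = 0.7, scale = 0.7)  # reference log-odds of response
quantile(plogis(b0), c(0.025, 0.5, 0.975))      # implied prior on response risk
b_trt <- rnorm(1e5, mean = 0, sd = 1)           # treatment effect (log-OR) prior
quantile(exp(b_trt), c(0.025, 0.5, 0.975))      # implied prior on odds ratio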

Superiority and non-inferiority are applicable to some domains and not others; however, we define reference and threshold values for all domains in case they are needed.

For the superiority decision, a reference value of 0 was used and the probability thresholds were:

  • 0.92 for surgical domain
  • 0.95 for antibiotic duration domain
  • 0.95 for extended prophylaxis domain
  • 0.995 for antibiotic choice domain

For the futility decision (in relation to superiority), a reference value of 0.05 was used and the probability thresholds were:

  • 0.3 for surgical domain
  • 0.25 for antibiotic duration domain
  • 0.25 for extended prophylaxis domain
  • 0.25 for antibiotic choice domain

The above means, for example, that if the probability that the risk difference is greater than 0.05 is less than 0.3 in the surgical domain comparison, then we say the superiority goal is futile.

For the non-inferiority (NI) decision, a reference value of -0.05 was used and the probability thresholds were:

  • 0.975 for surgical domain
  • 0.925 for antibiotic duration domain
  • 0.975 for extended prophylaxis domain
  • 0.975 for antibiotic choice domain

The above means, for example, that if the probability that the risk difference is greater than -0.05 is greater than 0.925 in the antibiotic duration domain, then we say the intervention is non-inferior.

For the futility decision (in relation to non-inferiority), a reference value of 0 was used and the probability thresholds were:

  • 0.25 for surgical domain
  • 0.1 for antibiotic duration domain
  • 0.25 for extended prophylaxis domain
  • 0.25 for antibiotic choice domain

This means, for example, that if the probability that the risk difference is greater than 0 is less than 0.1 in the antibiotic duration domain, then we say the non-inferiority goal is futile. A sketch combining all four rules follows.
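Putting the four rules together, a minimal sketch of the decision logic, assuming rd_draws is a vector of posterior draws of the risk difference for a single comparison (reference values and thresholds as listed above):

decide <- function(rd_draws, domain) {
  if (domain %in% c(1, 3, 4)) {
    # superiority and futility (w.r.t. superiority) rules
    thr_sup <- c(`1` = 0.92, `3` = 0.95, `4` = 0.995)[as.character(domain)]
    thr_fut <- c(`1` = 0.30, `3` = 0.25, `4` = 0.25)[as.character(domain)]
    if (mean(rd_draws > 0) > thr_sup)    return("sup")
    if (mean(rd_draws > 0.05) < thr_fut) return("fut_sup")
  } else {
    # domain 2: non-inferiority and futility (w.r.t. NI) rules
    if (mean(rd_draws > -0.05) > 0.925)  return("ni")
    if (mean(rd_draws > 0) < 0.10)       return("fut_ni")
  }
  "none"
}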

Figure 1 illustrates the superiority rules based on possible scenarios for the assessment of the posterior risk difference in an arbitrary domain where superiority is evaluated. The approach, reference values and thresholds apply to all domains where superiority is assessed.

Figure 1: Visualisation of decision rule scenarios for superiority

Analogously, Figure 2 illustrates the non-inferiority rule based on possible scenarios for the assessment of the posterior risk difference in an arbitrary domain where non-inferiority is evaluated. The approach, reference values and thresholds apply to all domains.

Figure 2: Visualisation of decision rule scenarios for non-inferiority

For this set of simulations, the number of simulated trials per scenario was 1000 and the simulation label is sim05-15.

Simulation results

Cumulative probability of each decision type
# Cumulative probability of decisions:

# Traverse the list of simulation results and for each one summarise the 
# cumulative probability of each decision type.

d_tbl_1 <- data.table()

# For each scenario that was simulated
for(i in seq_along(l)){
  
  # extract the decision matrix - sim, analysis, quantity, domain level decision
  d_dec_1 <- copy(l[[i]]$d_decision)
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
  
  # number of enrolments at each interim (interim sample size sequence)
  d_N <- data.table(analys = seq_along(l_cfg$N_pt), N = l_cfg$N_pt)
  
  # long version of decisions
  d_dec_2 <- melt(d_dec_1, 
                  id.vars = c("sim", "analys", "quant"), 
                  variable.name = "domain")

  # Should be right, but just in case...
  if(any(is.na(d_dec_2$value))){
    message("Some of the decision values are NA in index ", i, " file ", flist[i])
    d_dec_2[is.na(value), value := FALSE]
  }
  d_dec_2[, domain := as.numeric(gsub("d", "", domain))]
  
  # Domains 1, 3 and 4 will stop for superiority or futility for superiority.
  # Domain 2 will stop for NI or futility for NI.
  # No other stopping rules apply and so we only evaluate the operating 
  # characteristics on these, i.e. we do not care about the results for the 
  # cumulative probability of NI for domains 1, 3 and 4 because we would never
  # stop for this. 
  d_dec_2 <- rbind(
    d_dec_2[domain %in% c(1, 3, 4) & quant %in% c("sup", "fut_sup")],
    d_dec_2[domain %in% c(2) & quant %in% c("ni", "fut_ni")]
  )
  
  
  # compute the cumulative instances of a decision being made by sim, each 
  # decision type and by parameter
  d_dec_2[, value := as.logical(cumsum(value)>0), keyby = .(sim, quant, domain)]
  
  d_dec_2 <- merge(d_dec_2, d_N, by = "analys")
  
  # cumulative proportion for which each decision quantity has been met by 
  # analysis and domain
  d_dec_cumprob <- d_dec_2[, .(pr_val = mean(value)), keyby = .(analys, N, quant, domain)]
  
  
  
  d_tbl_1 <- rbind(
    d_tbl_1,
    cbind(scenario = i, desc = l_cfg$desc, d_dec_cumprob)
  )

}
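From d_tbl_1, the cumulative probability of, say, a superiority decision at the final analysis can then be tabulated by scenario and domain. A sketch of how Table 1 is assembled (assuming all scenarios share the same interim schedule):

dcast(
  d_tbl_1[quant == "sup" & analys == max(analys)],
  scenario + desc ~ domain, value.var = "pr_val"
)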
Expected sample size
# Expected sample size

# Similar to above but focus on expected sample size

# Traverse the list of simulation results and for each one summarise the 
# sample sizes at which stopping for a domain occurs for any reason.

# All we are trying to get to is the expected sample size by domain and 
# are not really interested in what decision was made. The cumulative prob
# of each decision type is computed previously.

d_tbl_2 <- data.table()

for(i in seq_along(l)){
  
  # extract the decision matrix - sim, analysis, quantity, domain level decision
  d_dec_1 <- copy(l[[i]]$d_decision)
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
  
  # interim looks
  d_N <- data.table(analys = seq_along(l_cfg$N_pt), N = l_cfg$N_pt)
  
  # long version of decision
  d_dec_2 <- melt(d_dec_1, 
                  id.vars = c("sim", "analys", "quant"), 
                  variable.name = "domain")

  # Should be right, but just in case...
  if(any(is.na(d_dec_2$value))){
    message("Some of the decision values are NA in index ", i, " file ", flist[i])
    d_dec_2[is.na(value), value := FALSE]
  }
  d_dec_2[, domain := as.numeric(gsub("d", "", domain))]
  d_dec_2 <- merge(d_dec_2, d_N, by = "analys")
  
  # First instance of any decision rule being hit by sim and domain.
  
  # Domains 1, 3 and 4 will stop for superiority or futility for superiority.
  # Domain 2 will stop for NI or futility for NI.
  
  # Sometimes a decision rule will not be hit for a domain and it will continue
  # to the max sample size. We will deal with that in a minute.
  d_dec_stop <- rbind(
    d_dec_2[
      domain %in% c(1, 3, 4) & value == T & quant %in% c("sup", "fut_sup"), 
      .SD[1], keyby = .(sim, domain)],
    d_dec_2[
      domain %in% c(2) & value == T & quant %in% c("ni", "fut_ni"), 
      .SD[1], keyby = .(sim, domain)]
  )
  setnames(d_dec_stop, "N", "N_stopped")
  setnames(d_dec_stop, "value", "stopped_early")
  
  
  # Add in any rows for which no early stopping happened
  d_dec_stop <- merge(d_dec_stop[, .SD, .SDcols = !c("analys")], 
                      # all combinations of sim and domain
                      unique(d_dec_2[, .(sim, domain)]),
                      by = c("sim", "domain"), all.y = T)
  
  # If domain or trial not stopped then record as having run to the 
  # maximum sample size with no decision made.
  d_dec_stop[is.na(N_stopped), N_stopped := max(d_N$N)]
  d_dec_stop[is.na(stopped_early), stopped_early := F]
  # So now we know where every domain was stopped and the reason for stopping
  # within each sim. Great.
  d_dec_stop[is.na(quant), quant := "null"]
  
  d_tbl_2 <- rbind(
    d_tbl_2,
    cbind(
      scenario = i, 
      desc = l_cfg$desc, 
      d_dec_stop
      )
  )

}
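The expected sample size per domain then follows as the mean over simulations of the enrolment count at which the domain stopped (or the maximum, if no rule was ever triggered), e.g.:

d_tbl_2[, .(E_N_stopped = mean(N_stopped),
            pr_stop_early = mean(stopped_early)),
        keyby = .(scenario, desc, domain)]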
Distributions of posterior means (unconditional)
# Distribution of posterior means for parameters of interest.

# Some simulated trials will have stopped prior to the maximum sample size and
# these will have NA for their posterior means. If you were to summarise these 
# posterior means, they would be conditional on the trial having 'survived' 
# until the relevant interim. This means that you have missing data at later 
# interims, which creates a selection bias in that your selection of sims at any
# given interim are not a random sample, but rather a sample conditioned on the 
# stopping rules. 
# If you do not account for this in some way then a summary can be either 
# optimistic or pessimistic depending on how the stopping rules interact 
# with the data. Here we try to account for this missingness by imputing the 
# missing posterior means with locf within each simulation.
# Note that this is really only a partial 'fix' to get a sense of whether 
# our estimates are representative of the parameter values we used to simulate
# the data.

d_tbl_3 <- data.table()

for(i in seq_along(l)){
  
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
  # params
  d_pars <- copy(l[[i]]$d_post_smry_1)
  d_pars <- d_pars[par %in% c("lor", "rd")]
  
  # interim looks
  d_N <- data.table(analys = seq_along(l_cfg$N_pt), N = l_cfg$N_pt)
  
  d_pars <- dcast(d_pars, sim + id_analys + domain ~ par, value.var = c("mu", "se"))
  
  # locf
  d_pars[, `:=`(mu_lor = nafill(mu_lor, type = "locf"),
                mu_rd = nafill(mu_rd, type = "locf"),
                se_lor = nafill(se_lor, type = "locf"),
                se_rd = nafill(se_rd, type = "locf")
                ), 
         keyby = .(sim, domain)]
  setnames(d_pars, "id_analys", "analys")
  #
  
  d_pars <- merge(d_pars, d_N, by = "analys")
  
  d_tbl_3 <- rbind(
    d_tbl_3,
    cbind(
      scenario = i, desc = l_cfg$desc,
      d_pars[, .(analys, sim, domain, N, mu_lor, mu_rd, se_lor, se_rd)]
      )
  )

}
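The summaries underlying Figures 6 and 7 are then the median and 95% quantiles of the (LOCF-imputed) posterior means by scenario, domain and analysis, e.g. for the risk difference:

d_tbl_3[, .(mu_rd_med = median(mu_rd),
            q_025 = quantile(mu_rd, prob = 0.025),
            q_975 = quantile(mu_rd, prob = 0.975)),
        keyby = .(scenario, desc, domain, analys, N)]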
Distributions of posterior means (conditional)
# Distribution of posterior means for parameters of interest.

# By way of contrast to the above, here we summarise based on the stopping
# rules. That is, the posterior means that are included in the summary are only
# those that 'survived' until the relevant interim. This is simply to give you 
# a sense of how extreme the selection effects associated with stopping can be.

d_tbl_4 <- data.table()

for(i in seq_along(l)){
  
  # extract the decision matrix - sim, analysis, quantity, domain level decision
  d_dec_1 <- copy(l[[i]]$d_decision)
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
  
  bd1 <- unlist(l_cfg$d1)
  bd2 <- unlist(l_cfg$d2)
  bd3 <- unlist(l_cfg$d3)
  bd4 <- unlist(l_cfg$d4)
  
  d_tru <- copy(l[[i]]$d_all)
  setnames(d_tru, "id_analys", "analys")
  
  d_tru <- dcast(
    d_tru[, .(N = sum(N)), keyby = .(sim, analys, pref_rev)],
    sim + analys ~ pref_rev, value.var = "N"
  )
  colnames(d_tru) <- c("sim", "analys", "N1", "N2")
  d_tru[, `:=`(prop_p1 = N1 / (N1 + N2), 
               prop_p2 = N2 / (N1 + N2))]
  d_tru[, db1 := as.numeric(
    (d_tru$prop_p1 * bd1[2] + d_tru$prop_p2 * bd1[3]) - bd1[1])]
  d_tru[, db2 := as.numeric(bd2[3] - bd2[2])]
  d_tru[, db3 := as.numeric(bd3[3] - bd3[2])]
  d_tru[, db4 := as.numeric(bd4[3] - bd4[2])]
  d_tru[, `:=`(
    N1 = NULL, N2 = NULL, prop_p1 = NULL, prop_p2 = NULL
  )]
  d_tru <- melt(
    d_tru,
    measure.vars = c("db1", "db2", "db3", "db4"),
    value.name = "lor_tru", variable.name = "domain"
  )
  d_tru[, domain := as.numeric(gsub("db", "", domain))]

  # params
  d_pars <- copy(l[[i]]$d_post_smry_1)
  d_pars <- d_pars[par %in% c("lor", "rd")]
  
  # interim looks
  d_N <- data.table(analys = seq_along(l_cfg$N_pt), N = l_cfg$N_pt)
  
  # long version of decision - assessment of each quantity by sim, domain and analysis
  d_dec_2 <- melt(d_dec_1, 
                  id.vars = c("sim", "analys", "quant"), 
                  variable.name = "domain")

  # Should be right, but just in case...
  if(any(is.na(d_dec_2$value))){
    message("Some of the decision values are NA in index ", i, " file ", flist[i])
    d_dec_2[is.na(value), value := FALSE]
  }
  d_dec_2[, domain := as.numeric(gsub("d", "", domain))]
  d_dec_2 <- merge(d_dec_2, d_N, by = "analys")
  
  # First instance of any decision rule being hit by sim and domain.
  
  # Domains 1, 3 and 4 will stop for superiority or futility for superiority.
  # Domain 2 will stop for NI or futility for NI.
  
  # Sometimes a decision rule will not be hit for a domain and it will continue
  # to the max sample size. We will deal with that in a minute.
  d_dec_stop <- rbind(
    # value being set to TRUE indicates we hit a stopping run
    d_dec_2[
      domain %in% c(1, 3, 4) & value == T & quant %in% c("sup", "fut_sup"), 
      .SD[1], keyby = .(sim, domain)],
    d_dec_2[
      domain %in% c(2) & value == T & quant %in% c("ni", "fut_ni"), 
      .SD[1], keyby = .(sim, domain)]
  )
  setnames(d_dec_stop, "N", "N_stopped")
  setnames(d_dec_stop, "value", "stopped_early")
  
  # Add in the domains where we did not stop early
  d_dec_stop <- merge(d_dec_stop[, .SD, .SDcols = !c("analys")], 
                      # all combinations of sim and domain
                      unique(d_dec_2[, .(sim, domain)]),
                      by = c("sim", "domain"), all.y = T)
  
  # If domain or trial not stopped then record as having run to the 
  # maximum sample size with no decision made.
  d_dec_stop[is.na(N_stopped), N_stopped := max(d_N$N)]
  d_dec_stop[is.na(stopped_early), stopped_early := F]
  # So now we know where every domain was stopped for any reason 
  # within each sim. Great.
  
  # Add in the other looks at the data so that we have a complete set
  d_dec_stop <- merge(
    d_dec_stop,
    unique(d_dec_2[, .(analys, sim, domain, N)]),
    by = c("sim", "domain"),
    all.y = T
  )
  
  # Merge in the parameter estimates (posterior means)
  d_dec_stop <- merge(
    d_dec_stop, 
    dcast(d_pars, sim + id_analys + domain ~ par, value.var = "mu"),
    by.x = c("analys", "sim", "domain"),
    by.y = c("id_analys", "sim", "domain")
  )
  
  # drop records from analyses after the domain had stopped
  # (their posterior means are NA)
  d_dec_stop <- d_dec_stop[!is.na(lor)]
  
  d_dec_stop <- merge(
    d_dec_stop,
    d_tru,
    by = c("sim", "analys", "domain")
  )
  
  d_tbl_4 <- rbind(
    d_tbl_4,
    cbind(
      scenario = i, desc = l_cfg$desc,
      d_dec_stop[, .(sim, analys, domain, quant, N, lor, rd, stopped_early, N_stopped)]
      )
  )

}
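To see the selection effect, the same summary can be restricted to the simulations in which a given rule was actually triggered, e.g. the posterior means at the point of a superiority decision:

d_tbl_4[quant == "sup",
        .(lor_med = median(lor), rd_med = median(rd), n_dec = .N),
        keyby = .(scenario, domain, analys, N)]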
Number of participants for each randomised comparison over time
# Cumulative number of participants randomised to each arm, by domain.

d_N_by_arm <- data.table()

for(i in seq_along(l)){
  
  # extract the decision matrix - sim, analysis, quantity, domain level decision
  d_dec_1 <- copy(l[[i]]$d_decision)
  # long version of decisions
  d_dec_2 <- melt(
    d_dec_1, id.vars = c("sim", "analys", "quant"), variable.name = "domain")
  # Should be right, but just in case...
  if(any(is.na(d_dec_2$value))){
    message("Some of the decision values are NA in index ", i, " file ", flist[i])
    d_dec_2[is.na(value), value := FALSE]
  }
  d_dec_2[, domain := as.numeric(gsub("d", "", domain))]
  # Domains 1, 3 and 4 will stop for superiority or futility for superiority.
  # Domain 2 will stop for NI or futility for NI.
  # No other stopping rules apply and so we only evaluate the operating 
  # characteristics on these, i.e. we do not care about the results for the 
  # cumulative probability of NI for domains 1, 3 and 4 because we would never
  # stop for this. 
  d_dec_2 <- rbind(
    d_dec_2[domain %in% c(1, 3, 4) & quant %in% c("sup", "fut_sup")],
    d_dec_2[domain %in% c(2) & quant %in% c("ni", "fut_ni")]
  )
  
  # config for scenario
  l_cfg <- copy(l[[i]]$cfg)
  # interim looks
  d_enrolment <- data.table(analys = seq_along(l_cfg$N_pt), N_enrol = l_cfg$N_pt)
  # observed trial data sets
  d_tru <- copy(l[[i]]$d_all)
  setnames(d_tru, "id_analys", "analys")
  
  # late acute, surgical arms
  # *** silo isn't included in the keyby because we want to average across 
  # the entire trial population, not silo specific sample sizes.
  d_1_tmp <- d_tru[
    silo == 2, .(N = sum(N), domain = 1), keyby = .(sim, analys, d1)]
  d_1_tmp[, N := cumsum(N), keyby = .(sim, d1)]
  setnames(d_1_tmp, old = "d1", "arm")
  d_1_tmp <- merge(
    d_1_tmp, 
    CJ(sim = 1:max(d_tru$sim), analys = 1:5, arm = 1:3, domain = 1), 
    by = c("sim", "analys", "arm", "domain"),
    all.y = T)
  d_1_tmp[, N := nafill(N, type = "locf"), keyby = .(sim, arm)]
  setkey(d_1_tmp, sim, analys, arm)
  
  # one-stage revision, ab duration randomised arms
  d_2_tmp <- d_tru[
    d1 == 2 & d2 %in% 2:3, 
    .(N = sum(N), domain = 2), keyby = .(sim, analys, d2)]
  d_2_tmp[, N := cumsum(N), keyby = .(sim, d2)]
  setnames(d_2_tmp, old = "d2", "arm")
  d_2_tmp <- merge(
    d_2_tmp, 
    CJ(sim = 1:max(d_tru$sim), analys = 1:5, arm = 2:3, domain = 2), 
    by = c("sim", "analys", "arm", "domain"),
    all.y = T)
  d_2_tmp[analys == 1 & is.na(N), N := 0]
  d_2_tmp[, N := nafill(N, type = "locf"), keyby = .(sim, arm)]
  setkey(d_2_tmp, sim, analys, arm)
  
  # two-stage revision, extended prophy randomised arms
  d_3_tmp <- d_tru[
    d1 == 3 & d3 %in% 2:3, 
    .(N = sum(N), domain = 3), keyby = .(sim, analys, d3)]
  d_3_tmp[, N := cumsum(N), keyby = .(sim, d3)]
  setnames(d_3_tmp, old = "d3", "arm")
  d_3_tmp <- merge(
    d_3_tmp, 
    CJ(sim = 1:max(d_tru$sim), analys = 1:5, arm = 2:3, domain = 3), 
    by = c("sim", "analys", "arm", "domain"),
    all.y = T)
  d_3_tmp[analys == 1 & is.na(N), N := 0]
  d_3_tmp[, N := nafill(N, type = "locf"), keyby = .(sim, arm)]
  setkey(d_3_tmp, sim, analys, arm)
  
  # ab choice randomised arms
  d_4_tmp <- d_tru[
    d4 %in% 2:3, .(N = sum(N), domain = 4), keyby = .(sim, analys, d4)]
  d_4_tmp[, N := cumsum(N), keyby = .(sim, d4)]
  setnames(d_4_tmp, old = "d4", "arm")
  d_4_tmp <- merge(
    d_4_tmp, 
    CJ(sim = 1:max(d_tru$sim), analys = 1:5, arm = 2:3, domain = 4), 
    by = c("sim", "analys", "arm", "domain"),
    all.y = T)
  d_4_tmp[analys == 1 & is.na(N), N := 0]
  d_4_tmp[, N := nafill(N, type = "locf"), keyby = .(sim, arm)]
  setkey(d_4_tmp, sim, analys, arm)
  
  d_arm_tmp <- rbind(d_1_tmp, d_2_tmp, d_3_tmp, d_4_tmp)
  
  d_arm_tmp[, `:=`(
    scenario = i, desc = l_cfg$desc
  )]
  
  d_arm_tmp <- merge(
    d_arm_tmp,
    d_enrolment,
    by = "analys")
  
  d_N_by_arm <- rbind(d_N_by_arm, d_arm_tmp)

}
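Figure 5 is then based on the expected number of participants in each arm at each interim, averaged over simulations:

d_N_by_arm[, .(E_N = mean(N)),
           keyby = .(scenario, desc, domain, arm, analys, N_enrol)]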

Probability of triggering decision

Table 1 shows the cumulative probability of a superiority decision across each of the scenarios simulated (the same information is shown in Figure 3). Operating characteristics are shown only for the relevant domains and the futility of a superiority decision is included in parentheses.

In the first eight scenarios, the surgical effects are simulated assuming equal effect sizes across all silos. In the second set of scenarios, silo-specific heterogeneity is introduced. Under silo-specific heterogeneity, there is inflation of the type-I assertion probability in the surgical domain.

Table 2 shows the cumulative probability of a non-inferiority decision with futility shown in parentheses (the same information is shown in Figure 4). The results are only shown for the domains for which non-inferiority is evaluated.

The “Null effect in all domains” scenario is a bit of a misnomer: the true null with respect to the NI decision would sit at the NI margin, whereas this scenario sets all effects to zero. Some inflation over the usual type-I assertion probability is therefore to be expected.

Sample size of randomised comparisons

Table 3 shows the number of enrolments at the point when any stopping decision is made (or when the maximum sample size of 2500 is reached). This is the total number of enrolments expected to have occurred, not the number contributing to the inference in a given domain.

Figure 5 shows the number of participants entering into each of the randomised comparisons by domain and scenario.

The expected cumulative sample sizes are calculated by extracting the number of participants entering into each randomised comparison (i.e. restricted to the relevant strata) and taking the cumulative sum of these over time. For example, the domain 1 (surgical) expected values are based on the participants in the late acute silo that receive randomised surgical intervention. Similarly, the domain 2 (antibiotic duration) expected values are based on the participants across all silos that received one-stage revision and then receive one of the randomised treatment allocations (i.e. are not in the non-randomised treatment group for any reason).

If a decision was made in a domain, then subsequent enrolments would be assigned to the relevant arm and so the allocation would be expected to deviate away from 1:1. For example, if a superiority decision was made for domain 1 (and the trial was still ongoing) then all subsequent participants are assigned to the superior intervention. Finally, if a decision was made for all research questions, the trial stops early. To avoid any bias in the calculation, we use LOCF to propagate the expectations forward through to the maximum number of enrolments.

Parameter estimation

Figure 6 shows the median value and 95% quantiles of the posterior mean odds ratios obtained from the simulations by each domain and scenario. The estimates are unconditional in that they ignore the stopping rules and propagate estimates forward with LOCF, so that we are working from a random sample of simulations (albeit with static imputation) rather than a sample selected by the stopping rules.

Figure 7 shows the median value and 95% quantiles for the posterior mean risk differences obtained from the simulations for each domain and scenario. The transformation to a risk difference suggests that the odds ratios amount to effects in the order of 10-20% on the risk scale, dependent on the silo, domain etc.

The AB duration domain has high variance in the distribution of posterior means that we anticipate observing. That is, even when there is no true effect, we might see posterior means as large as \(\pm 0.2\) on the absolute risk scale.

Fraction of uncertainty resolved

Figure 8 shows the median value and 95% quantiles for the fraction of uncertainty resolved based on

\[ \text{Fraction Resolved} = 1 - \frac{\operatorname{Var}(\beta_{\text{post}})}{\operatorname{Var}(\beta_{\text{pri}})} \]

where \(Var(\beta_{post})\) and \(Var(\beta_{pri})\) are the variances of the posterior and prior beliefs, respectively, for the relevant log-odds ratio for the treatment effect. This is simply a way to compare the prior and posterior variance. When the posterior is based on negligible data, its variance will be similar to that of the prior and the fraction resolved will be very small. A low fraction resolved (e.g. less than 0.5) suggests that any decision made was taken with a substantial amount of uncertainty remaining (you did not move far from your prior belief), whereas values close to unity suggest that most of the uncertainty has been resolved.
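As a worked example, with a prior scale of 1 (prior variance 1) and a posterior standard deviation of 0.3, the fraction resolved is

1 - 0.3^2 / 1^2   # = 0.91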

  1. What is obvious from the plots above is also obvious here: the decisions made in the AB duration domain are subject to a substantial amount of uncertainty.
l_cfg <- copy(l[[1]]$cfg)

# prior variance for the treatment effect (scale squared)
v0 <- (l_cfg$pri$b_trt[2])^2

d_fig <- copy(d_tbl_3)
# fraction resolved = 1 - Var(post)/Var(pri); se_lor is taken to be the
# posterior sd, so it is squared to give the posterior variance
d_fig[, fr_unc := 1 - (se_lor^2/v0)]

d_fig <- d_fig[,
                 .(fr_unc = median(fr_unc),
                   q_025 = quantile(fr_unc, prob = 0.025),
                   q_975 = quantile(fr_unc, prob = 0.975)), 
                 keyby = .(scenario, desc, analys, domain, N)]
d_fig[, desc := factor(desc, levels = unique(d_fig$desc))]
d_fig[, domain := factor(domain, 
                         levels = 1:4, 
                         labels = c("Surgical", "AB Duration", "AB Ext-proph", "AB Choice"))]

ggplot(data = d_fig,  
       aes(x = N, y = fr_unc)) +
  geom_line(lwd = 0.25) +
  geom_errorbar(aes(ymin = q_025, ymax = q_975), lwd = 0.25, width = 50) +
  scale_x_continuous("") +
  scale_y_continuous("Fraction of uncertainty resolved (1-(V_post/V_pri))") +
  ggh4x::facet_grid2(desc ~ domain , 
             labeller = labeller(desc = label_wrap_gen(35)), 
             scales = "free",
             axes = "y",
             independent = "y") +
  theme(text = element_text(size = 6),
        strip.text.y.right = element_text(angle = 0,
                                      hjust = 0,
                                      vjust = 0.2,
                                      size = 4),
        strip.text.x = element_text(angle = 0, size = 5),
        axis.ticks = element_blank(),
        axis.text.x = element_text(angle = 45, vjust = 1, hjust=1, size = 5),
        axis.text.y = element_text(size = 5),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_line(color = "grey",
                                  linewidth = 0.1,
                                  linetype = 1))
Figure 8: Median values and quantiles for fraction of uncertainty resolved