Population structure

Published

April 16, 2024

Modified

April 11, 2024

The following details the assumptions made about the key design variables. The outcome model is specified separately.

Silo

The total sample size is divided across the silos in proportion to the values described in the table below. Each simulated dataset will vary somewhat from these proportions due to the stochastic nature of the data generation process.

Silo categories (\(\pi\) denotes probability of membership)
Silo \(\pi\)
early 0.3
late 0.5
chronic 0.2
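
As a minimal sketch of why the realised proportions differ from the targets, the base R snippet below draws silo membership for a hypothetical sample. The seed, the sample size and the object names are illustrative assumptions and are not taken from the package.

```r
# Illustrative only: draw silo membership with the probabilities tabulated above.
set.seed(1)                                   # arbitrary seed (assumption)
p_silo <- c(early = 0.3, late = 0.5, chronic = 0.2)
n <- 2500                                     # assumed sample size
silo <- sample(names(p_silo), n, replace = TRUE, prob = p_silo)

# Realised proportions are close to, but not exactly equal to, the targets.
obs <- prop.table(table(factor(silo, levels = names(p_silo))))
round(rbind(target = p_silo, realised = as.numeric(obs)), 3)
```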

Site of infection (joint)

Each silo comprises patients assumed to have a primary infection in either the knee or the hip (not both). The assumed proportions of infections in each joint, by silo, are shown below.

Site of infection (\(\pi\) denotes probability of site infection conditional on silo membership)
Silo Joint \(\pi\)
early knee 0.4
early hip 0.6
late knee 0.7
late hip 0.3
chronic knee 0.5
chronic hip 0.5
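
The conditional structure above can be sketched in base R by first drawing the silo and then drawing the joint with a silo-specific probability. This is an illustration under assumed names, seed and sample size, not the package implementation.

```r
# Illustrative only: draw joint conditional on silo using the table above.
set.seed(2)                                            # arbitrary seed (assumption)
p_silo <- c(early = 0.3, late = 0.5, chronic = 0.2)
p_hip  <- c(early = 0.6, late = 0.3, chronic = 0.5)    # P(hip | silo)
n <- 5000                                              # assumed sample size
silo <- sample(names(p_silo), n, replace = TRUE, prob = p_silo)
hip  <- rbinom(n, 1, p_hip[silo])                      # 1 = hip, 0 = knee

# Realised P(hip | silo), to compare against p_hip.
round(tapply(hip, factor(silo, levels = names(p_silo)), mean), 2)

# Implied marginal probability of hip: sum over silos of P(silo) * P(hip | silo).
p_hip_marginal <- sum(p_silo * p_hip)
p_hip_marginal
```

Under these assumptions the implied marginal probability of a hip infection is 0.3 × 0.6 + 0.5 × 0.3 + 0.2 × 0.5 = 0.43.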

Randomisation into surgery domain (A)

Only the late-stage infection silo is randomised to surgery type. In the other silos, surgery is assigned by clinician/patient preference.

Surgery domain (A)

Only late stage patients enter the surgery domain, where they are randomised 1:1 to DAIR or revision. For patients randomised to revision, the clinician selects the type of revision to perform.

Early stage patients are not randomised and are assumed to mostly receive DAIR, although they may have any form of surgery. Chronic stage patients are not randomised and are assumed to receive DAIR, one-stage or two-stage revision based on clinician assessment.

Intended surgery

The planned surgery will be based on a range of factors, such as clinician preference and hospital protocols.

For early stage infection patients, we assume the proportions of DAIR, one-stage and two-stage surgery to be 90%, 10% and 0%, respectively.

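For late stage infection patients randomised to revision, we assume that the preferences for one-stage and two-stage are 30% and 70%, respectively. Late stage infection patients randomised to DAIR have DAIR as their intended surgery. For the small fraction of late stage infection patients who do not enter the randomisation (2% in the data generation function shown below), we assume that the preferences for DAIR, one-stage and two-stage are 20%, 24% and 56%, respectively.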

For chronic stage infection patients, we assume that the preferences for DAIR, one-stage and two-stage are 20%, 20% and 60%, respectively.
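
The 30% and 70% quoted for late stage patients randomised to revision are consistent with the 24% and 56% late stage preferences for one-stage and two-stage after excluding DAIR and renormalising. The check below is a sketch in base R; the object names are illustrative.

```r
# Late stage preferences over DAIR, one-stage and two-stage (from the text above).
p_late <- c(dair = 0.20, one = 0.24, two = 0.56)

# Condition on revision (exclude DAIR) and renormalise so the result sums to one.
p_rev <- p_late[c("one", "two")] / sum(p_late[c("one", "two")])
p_rev
```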

Randomisation into duration domain (B)

Entry into domain B is dependent on the surgery that was actually received.

Surgery received is dependent on whether surgery was randomised (as in late infection patients) and what the surgical preference is (as in chronic infection patients).

The surgery received may deviate from the original randomised allocation or preferred approach (in the case where randomisation was not applicable).

Duration domain (B)

Within the duration domain, patients are allocated to long versus short duration, the meaning of which differs between one-stage and two-stage surgery. Patients receiving DAIR are assumed to have a 12 week duration (not randomised). The default randomisation is 1:1 for long versus short within surgery type.
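
A minimal base R sketch of this allocation rule follows. The mix of surgery types, the seed and the labels are hypothetical and chosen only to illustrate that DAIR patients are excluded from randomisation while one-stage and two-stage patients are allocated 1:1.

```r
# Illustrative only: allocate duration within surgery type.
set.seed(3)                                      # arbitrary seed (assumption)
surg <- sample(c("dair", "one", "two"), 1000, replace = TRUE,
               prob = c(0.4, 0.2, 0.4))          # hypothetical surgery mix

in_b <- surg %in% c("one", "two")                # only revision patients enter domain B
dur  <- rep("not randomised", length(surg))      # DAIR: fixed duration, no allocation
dur[in_b] <- sample(c("long", "short"), sum(in_b), replace = TRUE,
                    prob = c(0.5, 0.5))          # 1:1 within surgery type

table(surg, dur)
```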

Randomisation into antibiotic choice domain (C)

The data generating process assumes that 60% of the total sample enter this domain at random, independently of risk factors.

Antibiotic choice domain (C)

Domain C has two treatment options, no rif and rif, which are allocated 1:1.
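
Combining the 60% entry assumption with 1:1 allocation implies that about 0.6 × 0.5 = 0.3 of the total sample is allocated to rif. The base R sketch below illustrates this; the seed, sample size and object names are assumptions for illustration.

```r
# Illustrative only: entry into domain C followed by 1:1 allocation.
set.seed(4)                                   # arbitrary seed (assumption)
n    <- 10000                                 # assumed sample size
in_c <- rbinom(n, 1, 0.6)                     # 1 = enters domain C
rif  <- in_c * rbinom(n, 1, 0.5)              # 1 = allocated rif; 0 if not in domain C

round(c(enter           = mean(in_c),
        rif_overall     = mean(rif),
        rif_given_entry = mean(rif[in_c == 1])), 3)
```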

Encoded specification

The above specification is bundled into an R package (roadmap.data) for consistent data generation for ROADMAP.

roadmap.data::get_pop_spec()
$r_silo
  early    late chronic 
    0.3     0.5     0.2 

$r_joint
        knee hip
early    0.4 0.6
late     0.7 0.3
chronic  0.5 0.5

$r_a
$r_a$early
[1] NA

$r_a$late
dair  rev 
 0.5  0.5 

$r_a$chronic
[1] NA


$r_a_q
$r_a_q$early
dair  one  two 
 0.9  0.1  0.0 

$r_a_q$late
dair  one  two 
0.20 0.24 0.56 

$r_a_q$chronic
dair  one  two 
 0.2  0.2  0.6 


$r_b
$r_b$dair
[1] NA

$r_b$one
 long short 
  0.5   0.5 

$r_b$two
 long short 
  0.5   0.5 


$r_c
norif   rif 
  0.5   0.5 

The following function simulates the design matrix.

roadmap.data::get_design
function (N = 2500, pop_spec = NULL, idx_s = 1) 
{
    if (is.null(pop_spec)) {
        pop_spec <- get_pop_spec()
    }
    l <- sample(0:2, N, replace = T, prob = pop_spec$r_silo)
    l1 = as.numeric(l == 1)
    l2 = as.numeric(l == 2)
    j <- rbinom(N, 1, pop_spec$r_joint[l + 1, 2])
    if (all(is.na(unlist(pop_spec$r_a)))) {
        er <- rep(0, N)
    }
    else {
        er <- as.numeric(l == 1)
        i_rec <- as.logical(rbinom(er[l == 1], 1, 0.02))
        er[l == 1][i_rec] <- 0
    }
    r <- rep(NA, N)
    for (i in 1:length(pop_spec$r_a)) {
        z <- pop_spec$r_a[[i]]
        if (all(!is.na(z))) {
            r[l == (i - 1)] <- sample(0:(length(z) - 1), sum(l == 
                (i - 1)), TRUE, z)
        }
        else {
            r[l == (i - 1)] <- 0
        }
    }
    r[er == 0] <- 0
    dtmp <- data.table(cbind(l, er, r, srp = rep(NA, N)))
    for (i in 1:length(pop_spec$r_a_q)) {
        z <- pop_spec$r_a_q[[i]]
        dtmp[l == i - 1 & er == 0, `:=`(srp, sample(0:(length(z) - 
            1), .N, TRUE, z))]
        dtmp[l == i - 1 & er == 1 & r == 0, `:=`(srp, 0)]
        dtmp[l == i - 1 & er == 1 & r == 1, `:=`(srp, sample(1:(length(z) - 
            1), .N, TRUE, z[2:length(z)]/sum(z[2:length(z)])))]
    }
    srp <- dtmp$srp
    dtmp <- data.table(srp, ed = rep(0, N))
    if (all(!is.na(pop_spec$r_b$one))) {
        dtmp[srp == 1, `:=`(ed, 1)]
    }
    if (all(!is.na(pop_spec$r_b$two))) {
        dtmp[srp == 2, `:=`(ed, 1)]
    }
    ed <- dtmp$ed
    dtmp <- data.table(cbind(srp, ed, d = rep(NA, N)))
    for (i in 1:length(pop_spec$r_b)) {
        z <- pop_spec$r_b[[i]]
        if (all(!is.na(z))) {
            dtmp[srp == (i - 1), `:=`(d, sample(0:(length(z) - 
                1), .N, TRUE, z))]
        }
        else {
            dtmp[srp == (i - 1), `:=`(d, 0)]
        }
    }
    d <- dtmp$d
    if (all(is.na(pop_spec$r_c))) {
        ef <- rep(0, N)
        f <- rep(0, N)
    }
    else {
        ef <- rbinom(N, 1, 0.6)
        f <- as.numeric((ef == 1) * rbinom(N, 1, pop_spec$r_c))
    }
    D <- data.table(l, l1, l2, j, er, ed, ef, erx = 1 - er, edx = 1 - 
        ed, efx = 1 - ef, r, srp, srp0 = as.numeric(srp == 0), 
        srp1 = as.numeric(srp == 1), srp2 = as.numeric(srp == 
            2), d, f)
    D[, `:=`(id, idx_s:(N + idx_s - 1))]
    setcolorder(D, "id")
    D
}
<bytecode: 0x1319e28b8>
<environment: namespace:roadmap.data>
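
Reading the function above, the planned surgery for a late stage patient is a mixture over three routes: randomised to DAIR, randomised to revision, or not entering randomisation (the hard-coded 0.02). The sketch below works through the implied probabilities in base R. It is an interpretation of the printed code, with illustrative object names, rather than output from the package.

```r
# Values taken from the specification and the function printed above.
p_enter <- 1 - 0.02                                # late stage patients entering randomisation
p_rand  <- c(dair = 0.5, rev = 0.5)                # r_a$late
p_pref  <- c(dair = 0.20, one = 0.24, two = 0.56)  # r_a_q$late

# Revision arm: preferences over one-stage/two-stage, renormalised to sum to one.
p_rev <- p_pref[c("one", "two")] / sum(p_pref[c("one", "two")])

# Mix over: randomised to DAIR, randomised to revision, not randomised.
p_srp <- c(
  dair = p_enter * p_rand[["dair"]] + (1 - p_enter) * p_pref[["dair"]],
  one  = p_enter * p_rand[["rev"]] * p_rev[["one"]] + (1 - p_enter) * p_pref[["one"]],
  two  = p_enter * p_rand[["rev"]] * p_rev[["two"]] + (1 - p_enter) * p_pref[["two"]]
)
round(p_srp, 4)
```

Under this reading, the implied late stage probabilities of planned DAIR, one-stage and two-stage surgery are approximately 0.494, 0.152 and 0.354.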

The data generation assumptions imply distinct patient groups on which we would observe the outcome. The outcome is known to be heterogeneous across these groups, yet the stated goal is to aggregate measures of effect (odds ratios) across all of them, e.g. no rif vs rif.

d <- roadmap.data::get_design()
head(d)
   id l l1 l2 j er ed ef erx edx efx r srp srp0 srp1 srp2 d f
1:  1 2  0  1 1  0  1  0   1   0   1 0   2    0    0    1 1 0
2:  2 1  1  0 0  1  0  1   0   1   0 0   0    1    0    0 0 0
3:  3 1  1  0 0  1  0  1   0   1   0 0   0    1    0    0 0 1
4:  4 0  0  0 1  0  0  1   1   1   0 0   0    1    0    0 0 1
5:  5 1  1  0 0  1  0  0   0   1   1 0   0    1    0    0 0 0
6:  6 0  0  0 1  0  0  1   1   1   0 0   0    1    0    0 0 0
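
To make the notion of patient groups concrete, the sketch below re-enters a few columns from the six rows printed above and counts the distinct combinations. The choice of columns and the object names are illustrative assumptions. A real analysis would apply the same idea to the full design.

```r
# First six rows of the design shown above, restricted to a few columns.
d6 <- data.frame(
  l   = c(2, 1, 1, 0, 1, 0),   # silo: 0 = early, 1 = late, 2 = chronic
  srp = c(2, 0, 0, 0, 0, 0),   # planned surgery: 0 = DAIR, 1 = one-stage, 2 = two-stage
  d   = c(1, 0, 0, 0, 0, 0),   # duration allocation (coded as in the function above)
  f   = c(0, 0, 1, 1, 0, 0)    # antibiotic choice allocation
)

# Count patients in each distinct combination of the design variables.
d6$n <- 1
grp <- aggregate(n ~ l + srp + d + f, data = d6, FUN = sum)
grp
```

Even these six rows fall into five distinct groups, which illustrates how quickly the number of groups grows relative to the sample.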