Design discussion (part 1)

Published

February 2, 2024

Modified

May 6, 2025

Warning: package 'cmdstanr' was built under R version 4.4.3

This provides some technical background with the goal of aiding insight into the limitations and challenges associated with the present design.

Complete design

Conceptually, roadmap is somewhat nested within an approximation of a complete design, see below. We have ignored the choice domain since it’s not relevant to the discussion here, neither are variations in participation and deviations from assignment. The boxes denote nodes where assignment is under statistical control, the circles are self selected.

The complete design as presented still has its quirks. There are multiple version of treatment under revision and these are self selected rather than randomised. Similarly with dair, we expect 12 weeks duration will be the recommended norm, but duration is ultimately self-selected for this group. Aspirationally, we posit 12 and 6 weeks duration following dair per the revision units.

While the duration has been specified as a unified intervention. The figure decomposes the duration domain into duration of antibiotic following the first and second stage operations as this aligns more closely with the interventional procedures. The dashed lines indicate paths which are not available in the current design.

flowchart TD
  Z[Surgery] --> T(Revision type selected) --> W[1st stage duration] --> X[2nd stage duration]
  A1[DAIR] --> A5((DAIR))
  A5 --> B0((12w))
  A5 -.-> B1((6w))

  A2[Revision] --> A3((One-stage))
  A2 --> A4((Two-stage))

  A3 --> B2[12w]
  A3 --> B3[6w]
  A4 --> B4[12w]
  A4 -.-> B5[6w]

  B2 --- STOP[ ]
  B2 --- STOP2[ ]
  B4 --> C1[12w]
  B4 --> C2[7d]
  B5 -.-> C3[12w]
  B5 -.-> C4[7d]

  linkStyle 5 stroke: red;
  linkStyle 11 stroke: red;
  linkStyle 16 stroke: red;
  linkStyle 17 stroke: red;
  linkStyle 12 fill:#FFFFFF00, stroke: #FFFFFF00
  linkStyle 13 fill:#FFFFFF00, stroke: #FFFFFF00
  style STOP  fill:#FFFFFF00, stroke:#FFFFFF00;
  style STOP2  fill:#FFFFFF00, stroke:#FFFFFF00;

Neither nested, nor fractional properly characterise the nature of the roadmap design since some cells are self-selected and others fall outsize the specified set of factor levels. For want of a better descriptor, I will just say that roadmap approximately sits within the complete design.

flowchart TD
  Z[Surgery] --> T(Revision type selected) --> W[1st stage duration] --> X[2nd stage duration]
  A1[DAIR] --> B1((DAIR))
  B1 --> C1((Usual care))

  A2[Revision] --> B2((One-stage))
  A2 --> B3((Two-stage))

  B2 --> C2[12w]
  B2 --> C3[6w]
  B3 --> C4((Usual care))
  C4 --> D1[12w]
  C4 --> D2[7d]

  C2 --- STOP[ ]
  C3 --- STOP2[ ]

  linkStyle 12 fill:#FFFFFF00, stroke: #FFFFFF00
  linkStyle 13 fill:#FFFFFF00, stroke: #FFFFFF00
  style STOP  fill:#FFFFFF00, stroke:#FFFFFF00;
  style STOP2  fill:#FFFFFF00, stroke:#FFFFFF00;

In all cases in the above Usual care following the first operation equates to 12 weeks but that there is some flexibility in this and additionally, a duration of 12 weeks may be precluded by logistical considerations. For example, if the second stage of a two-stage unit occurs within 12 weeks of the first operation then it is not logically possible to assign 12 weeks worth of antibiotic.

Nevertheless, tentatively, we might fill in usual care with 12 weeks to achieve the following structure.

flowchart TD
  Z[Surgery] --> T(Revision type selected) --> W[1st stage duration] --> X[2nd stage duration]
  A1[DAIR] --> B1((DAIR))
  B1 --> C1((12w))

  A2[Revision] --> B2((One-stage))
  A2 --> B3((Two-stage))

  B2 --> C2[12w]
  B2 --> C3[6w]
  B3 --> C4((12w))
  C4 --> D1[12w]
  C4 --> D2[7d]

  C2 --- STOP[ ]
  C3 --- STOP2[ ]

  linkStyle 12 fill:#FFFFFF00, stroke: #FFFFFF00
  linkStyle 13 fill:#FFFFFF00, stroke: #FFFFFF00
  style STOP  fill:#FFFFFF00, stroke:#FFFFFF00;
  style STOP2  fill:#FFFFFF00, stroke:#FFFFFF00;

Now, under the full design model, there would be 8 treatment cells of interest.

DAIR with 12 weeks
DAIR with 6 weeks
Revision, one-stage with 12 weeks
Revision, one-stage with 6 weeks
Revision, two-stage with 12 weeks following 1st, 12 weeks following 2nd
Revision, two-stage with 12 weeks following 1st, 7 days following 2nd
Revision, two-stage with 6 weeks following 1st, 12 weeks following 2nd
Revision, two-stage with 6 weeks following 1st, 7 days following 2nd

But in the current roadmap setup we are reduced to 5 treatment:

DAIR with 12 weeks
Revision, one-stage with 12 weeks
Revision, one-stage with 6 weeks
Revision, two-stage with 12 weeks following 1st, 12 weeks following 2nd
Revision, two-stage with 12 weeks following 1st, 7 days following 2nd

However, for 1, 4 and 5, 12 weeks following the first surgery might not be an entirely reasonable assumption. For now, we will put that thought on hold.

From the current design, although there are 5 cells, there are 3 comparisons which are of interest:

DAIR with 12 weeks vs. revision with 12 weeks following 1st and X (could be 12w, 7d, or a mixture) days following 2nd if two-stage
Revision, one stage with 12 weeks vs. with 6 weeks
Revision, two stage with 12 weeks after 1st and 2nd vs. 7 days after 2nd.

If 1 is what is meant by “Revision vs DAIR” then it is not entirely clear who would find this useful in practice since there is potential for substantial variation in the effect dependent on which components are included.

A saturated model for the full design might be

\[ \begin{aligned} \eta = &\beta_0 + \beta_1r_{\text{one-stage}} + \beta_2 r_{\text{two-stage}} + \beta_3 d_{\text{1, 6 weeks}} \\ &+ \beta_4 r_{\text{one-stage}}d_{\text{1, 6 weeks}} + \beta_5 r_{\text{two-stage}}d_{\text{1, 6 weeks}} \\ &+ \beta_6 r_{\text{two-stage}} d_{2,\text{7 days}} + \beta_7 r_{\text{two-stage}}d_{\text{1, 6 weeks}} d_{2,\text{7 days}} \end{aligned} \]

where the reference group is “DAIR with 12 weeks” and variations in participation are ignored, ditto with respect to the consideration of any level of interactions.

Under the current roadmap design, only a reduced model can be considered. A saturated model for the reduced design could be:

\[ \eta = \alpha_0 + \alpha_1r_{\text{one-stage}} + \alpha_2 r_{\text{two-stage}} + \alpha_3 r_{\text{one-stage}} d_{\text{1, 6 weeks}} + \beta_4 r_{\text{two-stage}} d_{2,\text{7 days}} \]

where the reference group is “DAIR with 12 weeks”.

A reduced model could also be considered under the full design. For example, if we assumed that the effect of first-stage duration was constant across surgery types (which it seems we don’t), and assume that second stage duration is constant across first-stage duration types, then the model would be

\[ \eta = \beta_0 + \beta_1r_{\text{one-stage}} + \beta_2 r_{\text{two-stage}} + \beta_3 d_{\text{1, 6 weeks}} + \beta_6 r_{\text{two-stage}} d_{2,\text{7 days}} \]

i.e. assuming that \(\beta_4 = \beta_5 = \beta_7\) from the earlier model are all set to zero.

This is basically equivalent to the reduced design model. The exception being that the reduced design is estimating \(\alpha_3\), the effect of duration following one-stage revision specifically. The full design would by default estimate the effect of duration using data from all surgery types, \(\beta_3\), assuming that the effect is constant across types, or additionally could target surgery specific interactions.

Why do we only consider the reduced design? Presumably, this is because of the results from other studies that DAIR with 6 weeks is inferior to DAIR with 12 weeks and that in two-stage revision, 6 weeks is considered inferior to 12 weeks. We are investigating 6w vs 12w for one-stage under the assumption that there is effect heterogenity of duration conditional on surgery type. So, inferiority of 6w amongst DAIR/two-stage is insufficient evidence to assume inferiority of 6w in one-stage.

Associational effects

One of the questions that has been raised relates to how it is that the surgical and duration domain are effectively coupled by design thereby necessitating a perspective on combined interventional strategies. To understand this it is worth stepping back for a moment and thinking about the structure and implications of the models we are using and how they are used to characterise measures of association.

Start by thinking about the representation of a linear regression versus a logistic regression and what they are aiming to achieve. This is easiest way to do so via an example so assume we have two independent factors that are predictive of independent continuous and binary outcomes. Assume that we are interested in the effect of A but B is still predictive of the outcome. In other words, assume something analogous to a randomised clinical trial setup where A corresponds to our randomised intervention set (here just control vs test) and B is some prognostic baseline characteristic (age, sex etc). For a continuous outcome, adopt a linear model of the form \(\mathbb{E}[Y | A,B] = \mu = \alpha_0 + \alpha_1 A + \alpha_2 B\) with \(Y\) normally distributed. This model encapsulates our assumptions (unknowable in real life) for the data generating process. For the binary outcome adopt a logistic model of the form \(g(\mathbb{E}[W | A,B]) = \eta = \beta_0 + \beta_1 A + \beta_2 B\) with \(W\) is bernoulli \(\eta\) being log-odds of response and \(g\) being the logit link function such that the conditional mean of the response is now given by applying the inverse link \(\mathbb{E}[W | A,B] = g^{-1}(\beta_0 + \beta_1 A + \beta_2 B)\) with \(g^{-1}(z) = \frac{1}{1+\exp(-z)} = \frac{\exp(z)}{1 + \exp(z)} = \text{expit}(z)\). This latter representation does make sense for the dichotomous response since what we are interested in is the probability of the event and that happens to equate to the expectation, e.g. the conditional expectation of \(W\), given \(X = x\):

\[ \begin{aligned} \mathbb{E}(W | X = x) &= \sum_{w \in \mathcal{W} } w Pr(W=w | X = x) \\ &= 0 Pr(W = 0 | X = x) + 1 Pr(W = 1 | X = x) \\ &= Pr(W = 1 | X = x) \end{aligned} \]

In other words, the conditional mean equates to the conditional probability; \(\mathbb{E}[W | A, B] = g^{-1}(\beta_0 + \beta_1 A + \beta_2 B) = Pr(W = 1 | A, B)\).

Now, instead of B mapping in the logistic regression relating to the presence/absence of a baseline characteristic, think of it as a second set of interventions so that we are now describing a factorial setup (conveniently ignoring the possibility of interactions). Both A and B remain independent dichotomous random variables. In a complete factorial experiment we would randomise units to all combinations of A and B and usually take parameter estimates as the effects (e.g. log-odds-ratios or odds ratios) of interest. The exponentiated \(\beta_1\) is the multiplicative change in the odds of response associated with moving from the control to the intervention group, with the important caveat that we are holding all other terms constant.

In case you need to see how this works out -

Recall that the logistic regression gives us the probability of the outcome via the inverse logit transformation: \(Pr(W = 1 | A, B) = g^{-1}(\beta_0 + \beta_1 A + \beta_2 B)\). The odds ratio for a unit change in A (i.e. a shift from control to test) is defined as:

\[ \begin{aligned} OR(a + 1, a | b) &= \frac{\frac{g^{-1}(\beta_0 + \beta_1 (a+1) + \beta_2 b)}{1-g^{-1}(\beta_0 + \beta_1 (a+1) + \beta_2 b)}}{\frac{g^{-1}(\beta_0 + \beta_1 a + \beta_2 b)}{1-g^{-1}(\beta_0 + \beta_1 a + \beta_2 b)}} \\ &= \frac{\exp(\beta_0 + \beta_1 (a+1) + \beta_2 b)}{\exp(\beta_0 + \beta_1 a + \beta_2 b)} \\ &= \exp(\beta_1) \end{aligned} \]

The \(\beta_2 b\) term cancels out so long as whatever b was, was the same for both groups, which is what the holding all other terms constant part relates to.

Another way to get to this is to take the difference between the two logged values. We know that when we take the difference between two logged values (e.g. the log-odds of response in the treatment vs control group) this equates to the log of their ratio, hence \(\beta_1\) being referred to as the log-odds-ratio.

\[ \begin{aligned} log(\phi) - \log(\psi) = \log(\frac{\phi}{\psi}) \end{aligned} \]

All of the above should be comfortably familiar, but the point was to be explicit for how things are in the context of a conditional logistic regression model and a complete factorial experiment where we have the full set of parameters irrespective of what combination of interventions we are considering.

Now imagine a situation where we remove the secondary intervention factor (B) for one level of the primary intervention factor (A). This is analogous to a simplified version of the current roadmap design where A is representing the surgical domain and B the duration domain. Here we are only thinking about dair vs one-stage revision for the surgical interventions and 12 vs 6 weeks duration following surgery, but only for those having the one-stage procedure.

Given the above setup, we might choose to model the groups independently as we do not intervene on antibiotic duration for the dair group and therefore the system has a distinct data generation processes for each group.

For dair we could model and estimate the log-odds of response in isolation, say \(\theta\) estimated from the cohort that receive dair. We take it for granted that the dair intervention is well defined in all its supportive components, which may include some flexibility in the duration of antibiotics received but it isn’t something we have control over via randomisation. Based on the literature it seems like there might be variation in the response under dair conditional on duration but again we are putting that thought on hold for now.

For the one-stage group, we might model

\[ \begin{aligned} \zeta = \gamma_0 + \gamma_1 B \end{aligned} \]

where the first term is the log-odds of response for those receiving one-stage revision and the second term relates to how the duration of antibiotics impacts that response. Based on the earlier details for how to derive the log-odds-ratio, we take the difference between the two groups:

\[ \begin{aligned} \log(OR) = (\gamma_0 + \gamma_1 B) - \theta \end{aligned} \]

Clearly, we are left with the situation where we need to make some decision on what to do with B since it hasn’t dropped out of the calculations as it did under the complete design setup discussed previously. Specifically, if we set B = 0 then we get one answer for the effect of one-stage vs dair, namely \(\gamma_0 - \theta\) and if we set B = 1 then we get another answer \(\gamma_0+\gamma_1 - \theta\). Furthermore, it doesn’t matter whether we use independent models or one large joint model, you still have to make a decision or an assumption about how to handle B. We haven’t got a single parameter to represent the odds-ratio and if such a parameter existed it would probably have a fairly unusual interpretation.

One response to the question of what to do with B is to average over B and to adopt a marginal view of the measure of association. This necessitates some assumption regarding the distribution of B. In a complete factorial 2x2 design where A and B are randomised 1:1, the distribution of B is known. However, this probably doesn’t reflect the distribution of B in the population.

Consequences of adjustment

Another thing to be aware of is the concept of collapsibility. Formally, measures of association are said to be collapsible if the marginal view equates to a weighted average of the strata specific effects. In a factorial design, for the comparison of treatments in factor A, units can be grouped in stratum based on the assignment of factor B. Odds-ratios are generally not collapsible and so this should probably be a consideration when using odds-ratios as the population-level summary measure for an estimand.

As an example, and with reference to the earlier model, assume that the OR for the treatment group in A relative to the control is 0.5 and for B the effect is 0.1. Additionally, assume that the probability of response in the strata that receive the reference level for both A and B is 0.5. The following simulates a large sample to simply demonstrate the occurrence of differing views of association dependent on the model that is adopted and thus the need to be clear in what we are trying to target in terms of the causal estimand of interest.

library(data.table)
# Large sample size
N <- 1e6

# 1:1 Randomisation of A and B
a <- rbinom(N, 1, 0.5)
b <- rbinom(N, 1, 0.5)

# No interaction by design
eta <- 0 + log(0.5)*a + log(0.1)*b
y <- rbinom(N, 1, plogis(eta))

d <- data.table(a, b, eta, y)

# Multivariable logistic regression
l1 <- glm(y ~ a + b, data = d, family = binomial)

# Conditional probabilities for each treatment group
d_pred <- CJ(a=0:1,b=0:1)
d_pred[, p := predict(l1, type = "response", newdata = d_pred)]

# Observed distribution of b (should be 0.5)
p_b_obs <- mean(b)

# Standardisation to the marginal distribution of b
d_pred[b == 1, p_b := p_b_obs]
d_pred[b == 0, p_b := 1-p_b_obs]
p_a_0 <- d_pred[a == 0, sum(p * p_b)]
p_a_1 <- d_pred[a == 1, sum(p * p_b)]

# Univariate logistic regression
l2 <- glm(y ~ a, data = d, family = binomial)

# Measure of association (all odds-ratios)

# Conditional measures
a_cond <- unname(exp(coef(l1)[2]))
# Marginal odds ratio computed from standardisation 
a_marg_std <- p_a_1/(1-p_a_1) / (p_a_0/(1-p_a_0))
# Marginal odds ratio computed from univariate regression
a_marg_reg <- unname(exp(coef(l2)[2]))

c(a_cond = a_cond, a_marg_std = a_marg_std, a_marg_reg = a_marg_reg)

    a_cond a_marg_std a_marg_reg 
 0.5052205  0.5661782  0.5661979

The conditional esitmates give the strata level odds-ratios that we anticipate, however, the marginal odds-ratios are a reflection the observed distribution of B. Clearly, this has implications for the interpretation and generalisability of the results.

Causal ambiguity

None of the above has considered the causal overlay, which is perhaps the primary motivation for running a randomised clinical trial - we want to be able to say that a change in the response was caused by an intervention rather than simply being associated with it. Some background on a popular causal framework follows.

The Neyman-Rubin (causal) model introudces potential outcomes as a means to specify the causal effects. For a treatment with two levels (e.g. control = 0 vs active = 1), the outcome has two possible (potential) values \(Y_i(0)\) and \(Y_i(1)\) for each experimental unit, \(i\). That is, a hypothetical world is posed where the outcomes under both the control and active treatment are available to us. Two assumptions are implicit in the definition of potential outcomes:

no interference - the potential outcome for unit \(i\) does not depend on any other units’ treatment
consistency¹ - if the treatment is \(T\) then the observed outcome is equal to the potential outcome under \(T\)

Using this notation we can define a unit level effect as say the difference between the potential outcomes \(\tau_i = Y_i(1) - Y_i(0)\). The problem is that we can never observe both the response to both treatment levels on a single unit. We therefore resort to group level effects for a collection of units by defining an average treatment effect \(\tau = \mathbb{E}(Y_i(1) - Y_i(0))\). Unlike the unit level effect, the average treatment effect can be identified under certain conditions and then estimated from the observed data.

One statistical solution² to the identification problem above (identification is the technical term for being able to produce a statistical estimand for a given causal estimand) is to randomly assign treatments to the experimental units so that the probability of assignment does not depend on anything other than the flip of a coin. This implies that the potential outcomes will be independent of the treatment which then means that the expected value of a potential outcome is equal to the expected value of the potential outcome conditioned on the treatment status, e.g. \(\mathbb{E}(Y(0)) = \mathbb{E}(Y(0) | T = 0)\). Formally, this is referred to as the exchaneability assumption and if the consistency assumption also holds then we can resolve the causal quantity into something we can observe and estimate, e.g. \(\mathbb{E}(Y(0) | T = 0) = \mathbb{E}(Y | T = 0)\). Of course, given that the conditional expection is defined as \(\mathbb{E}(Y | T = 0) = \sum_{y\in \mathcal{Y}} y Pr(Y | T = 0) = \sum_{y\in \mathcal{Y}} y \frac{Pr(Y , T = 0)}{Pr(T = 0)}\), there needs to be a non-zero probability of assigning \(T\), otherwise the quantities that we require will not be defined mathematically. Models will work around this, but we are taking a leap of faith (sometimes an unjustifiable one) in order to believe what they are telling us. This final assumption is known as positivity - we need to have a non-zero probability of treatment assignment for all treatments.

So, if our trial has a single domain A which is simply looking at dair (0) vs revision (1) and the outcome were binary then we might posit the causal effect of interest (causal estimand) as the ATE. For a binary outcome amounts to a risk difference, but we are not restricted to this estimand specification, it is just convenient and simple for the purposes of the discussion. Assuming that our modelling approach remains logistic regression we would have

\[ \begin{aligned} \tau &= \mathbb{E}(Y(1)) - \mathbb{E}(Y(0)) \\ &= \mathbb{E}(Y(1)|A = 1) - \mathbb{E}(Y(0)|A = 0) \\ &= \mathbb{E}(Y|A = 1) - \mathbb{E}(Y|A = 0) \\ &= g^{-1}(\beta_0 + \beta_1) - g^{-1}(\beta_0) \end{aligned} \]

which we can work with so long as the above identification assumptions are met. This would mean that what we get from our model can be given a causal interpretation.

If we have two interventional factors, e.g. surgical (A) and duration (B) then we have a larger number of causal estimands to consider under a complete 2x2 factorial design. The potential outcome for a given unit is now usually specified as \(Y_i(a_i, b_i)\) where \(a_i\) and \(b_i\) constitute specific levels of a treatment combination for unit \(i\). For concreteness, take the levels for A to be dair and one-stage and the levels for B to be 12wks and 6wks so that \(Y(1,0)\) would denote the potential outcome under dair and 12wks. A consequence of this setup is that we can no longer meaningfully discuss \(Y(A)\) in isolation as it omits a component of the treatment and is therefore inconsistent in the sense that \(Y(1)\) might refer to either \(Y(1,0)\) or \(Y(1,1)\).

For the factorial design, the exchangeability/unconfoundedness assumption required for identification needs to consider both factors, i.e. we need \((Y(a,b))_{a\in\mathcal{A},b\perp\mathcal{B}} \perp (A,B)\) which can (again) be achieved via randomisation. The final assumption of positivity is also extended so that we require a non-zero probability of assignment for all treatment combinations \(Pr(A = a, B = b) > 0\) for all \(a \in \mathcal{A}\) and \(b \in \mathcal{B}\). If you are going to adhere strictly to the framework, then I think this final point rules out the use of fractional factorial designs.

As mentioned above, with the factorial design we can consider a variety of causal estimands. In our case, these might hypothetically include:

the effect of one-stage vs dair under 12wks duration
the effect of one-stage vs dair under 6wks duration
the effect of one-stage vs dair under usual practice for duration
the effect of one-stage + 12wks vs dair + 6wks

For the sake of argument, pick (1) and use an analogous approach to earlier to identify the effect:

\[ \begin{aligned} \tau &= \mathbb{E}(Y(1,0)) - \mathbb{E}(Y(0,0)) \\ &= \mathbb{E}(Y(1,0)|A = 1,B=0) - \mathbb{E}(Y(0,0)|A = 0,B=0) \\ &= \mathbb{E}(Y|A = 1,B=0) - \mathbb{E}(Y|A = 0,B=0) \\ &= g^{-1}(\beta_0 + \beta_1) - g^{-1}(\beta_0) \end{aligned} \]

using the same model specification (\(\mathbb{E}[W | A,B] = g^{-1}(\beta_0 + \beta_1 A + \beta_2 B)\)) as was used earlier and where \(\tau\) now represents a combination effect. However, while \(Y(A)\) is ill-defined in the factorial setting, we can still address effects within a single factor through marginalisation:

\[ \begin{aligned} \psi &= \mathbb{E}_B[\mathbb{E}[(Y(1,B) - Y(0,B)]] \end{aligned} \]

where \(\psi\) is the marginal effect for the selected levels of \(A\) although this still clearly depends on the distribution of B and therefore might not generalise well to the super population if the measure of association is non-collapsible (see earlier). The central point that I am trying to get across here is that under a complete factorial design we can provide a clear specification of the causal estimands, we can identify the associated statistical estimands and this allows us to estimate effects of interest.

I do not believe that we have this level of clarity or unification within roadmap. Returning to the current simplified roadmap setup where we have dair vs one-stage and for one-stage we have 12 vs 6 wks, we have

dair + 12 wks: \(Y(0,0)\)
one + 12 wks: \(Y(1,0)\)
one + 6 wks: \(Y(1,1)\)

\(Y(0,1)\) is an impossibility by design even though it exists in principle. This looks somewhat like a violation in the positivity assumption but arguably we can work around it. Additionally, \(Y(0,0)\) is actually never formally assigned to any unit. What we have is \(Y(0,U)\) where \(U\) denotes whatever usual practice is, or perhaps a recommendation of usual practice. While we expect (and might assume) that \(Y(0,U) := Y(0,0)\) but that does not appear to be guaranteed. Again, we can potentially make assumptions that might help us with all of the above within the context of a model but I am currently uncertain as to whether the causal interpretation can truly be resolved under the present specification.

The above simplification is obviously (and purposefully a bit inaccurate) as we currently specify that the surgical domain randomises dair vs revision where the revision intervention is then selected to be either one-stage or two-stage by a clinician based on unobserved variables.

Considering a design where only the surgical domain was included, this would also seem to violate the consistency assumption. Recall that consistency connects the potential outcome with the observed data. It is defined as \(Y_i = Y_i(x)\) if \(x = x_i\); in words - the observed outcome for a unit equals the potential outcome for that unit with the intervention level set to the exposure that the unit received. However, given that what we observe under revision is consequence of either a one-stage or two-stage procedure, then there are two distinct values that singular \(Y_i(1)\) could take on, either \(Y_{one}\), the observed outcome under a one-stage revision or \(Y_{two}\), the observed outcome under a two-stage revision. The argument against this would be that the meaning of revision simply maps to a one or two-stage procedure and the distinction does not matter.

Summary

These notes are an attempt to provide some background and clarifications as well as formalise some of the thinking. Hopefully they can provide something of a common understanding in relation to some of the unusual aspects of the design.

Footnotes

Has no relation with statistical consistency.↩︎
Holland (1986) talks about others.↩︎