The theory of directed acyclic graphs (DAGs) provides a graphical notation and a non-parametric probabilistic terminology to describe and evaluate causal relationships [11]. The use of DAGs in epidemiology is emerging [12], and it is especially helpful when there are multiple potential confounders [12, 13] that may introduce systematic bias [10, 14]. In DAGs, confounding associations between two variables may arise from unblocked backdoor paths [13], which can be identified graphically because the two variables share parent nodes. With a formal definition of the backdoor path, for instance, the DAG framework provides a general explanation of Simpson's paradox [15], in which the estimated association reverses its sign in disaggregated subsets compared with the whole population. As a framework, DAGs supply analytical tools to decide which adjustments are mandatory (to anticipate a non-causal sign reversal) and which covariates should be omitted (to estimate the causal effect), thereby enforcing the elicitation of qualitative causal assumptions [11, 12, 14].
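As a minimal numerical sketch of the sign reversal described above, the following Python snippet uses purely hypothetical counts (not data from this study) for a binary exposure X, a binary outcome Y and a binary confounder Z that opens a backdoor path X←Z→Y. The names `COUNTS` and `risk_difference` are illustrative assumptions.

```python
# Hypothetical counts (illustration only, not study data).
# Key: (z, x) -> (number of subjects, number with outcome Y = 1)
COUNTS = {
    (0, 0): (400, 80),
    (0, 1): (100, 15),
    (1, 0): (100, 70),
    (1, 1): (400, 260),
}


def risk_difference(cells):
    """Risk of Y in the exposed minus risk of Y in the unexposed.

    `cells` maps x (0 or 1) to a pair (n, events).
    """
    n0, y0 = cells[0]
    n1, y1 = cells[1]
    return y1 / n1 - y0 / n0


# Stratum-specific estimates: stratifying on Z blocks the backdoor path.
stratum_rd = {
    z: risk_difference({x: COUNTS[(z, x)] for x in (0, 1)})
    for z in (0, 1)
}

# Pooled estimate: Z is ignored, so the backdoor path stays open.
pooled = {
    x: tuple(sum(COUNTS[(z, x)][i] for z in (0, 1)) for i in (0, 1))
    for x in (0, 1)
}
pooled_rd = risk_difference(pooled)

for z, rd in stratum_rd.items():
    print(f"Z = {z}: risk difference = {rd:+.2f}")
print(f"Pooled: risk difference = {pooled_rd:+.2f}")
```

With these counts the risk difference is negative within each stratum of Z and positive in the pooled table, because the exposed subjects are concentrated in the stratum with the higher baseline risk.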
A hypothetical DAG model with a latent variable was conceived to evaluate the influence of various types of covariates on the focal association. Initially, we drew the main causal path from exposure to outcome. The DAG in Figure 1 starts from infection by SARS-CoV-2 (exposure E), which in some cases leads to ‘Moderate-to-severe inflammation due to COVID-19’ (MSIC, a hypothetical latent variable; E→MSIC). That inflammation causes two outcomes (mutually dependent relationship, H←MSIC→B): (H) the hospitalisation decision; and (B = {B1,…,Bk}) blood tests measured at hospital admission. The blood tests are selected according to the strength of their association with hospitalisation. The focal outcomes under investigation are hospitalisation (H) and blood tests (B).
Initial hypothetical directed acyclic diagram with the main causal path of a moderate-to-severe COVID-19 inflammation (MSIC), one risk factor (RF3) and one confounder (BOC1) of the focal outcomes (H and B1). Legend: MSIC is a latent variable (unmeasured); outcomes are H: hospitalisation (H = {regular ward, semi-intensive care, ICU}); and B: blood test (B = {B1}).
Considering the initial DAG plausible, we hypothesised candidate covariates that are parents of the variables and may open back-door paths. Figure 1 shows one risk factor (RF3) and one confounder (BOC1). Figure 2 enhances the initial DAG with potential risk factors, confounders of the focal association and other covariates. Risk factors contribute directly to the development of COVID-19 inflammation (RF = {RF1,…,RFL}, mutual causation relationships (RFi→MSIC←RFj)), and they can also affect other variables. Figure 2 also distinguishes the covariates in terms of their confounding potential on the association between H and B. Covariates that affect both focal outcomes are identified as Both-Outcomes-Confounders (BOC = {BOC1,…,BOCm}), as they are correlated with the focal outcomes but not with COVID-19; covariates that affect only one outcome are identified as Single-Outcome-Covariates (SOC = {SOC1,…,SOCn}). These covariates are not exhaustive; they are meant to generate causal graph criteria for handling confounding factors.
Hypothetical directed acyclic diagram of a COVID-19 inflammation causal path with risk factors, confounders and other covariates. Legend: Exposure = SARS-CoV-2 (E) (acute respiratory syndrome coronavirus 2); outcomes are H: hospitalisation (H = {regular ward, semi-intensive care, ICU}), and B: blood tests (B = {B1,…,BK}); Covariates are RF: risk factor (RF = {RF1,…,RF4A, RF4B,RF5}), SOC: single outcome covariate (SOC = {SOC1,…,SOC5}) and BOC: both outcomes confounder (BOC = {BOC1,BOC2}).
Causal relationships in DAGs are defined with the do(.) operator, which performs a theoretical intervention by holding constant the value of a chosen variable [11, 16]. The association caused by COVID-19 inflammation can be understood as a comparison of the conditional probabilities of hospitalisation (H) given a set of blood tests (B) under intervention with SARS-CoV-2 infection (do(SARS-CoV-2 = 1)) and under intervention without infection (do(SARS-CoV-2 = 0)):
P(H|B = b,do(SARS-CoV-2 = 1)) versus P(H|B = b’,do(SARS-CoV-2 = 0))
where P(H|B = b,do(SARS-CoV-2 = 1)) represents the population distribution of H (hospitalisation) given a set of blood tests equal to b if everyone in the population had been infected with SARS-CoV-2, and P(H|B = b’,do(SARS-CoV-2 = 0)) represents the corresponding distribution, given blood tests equal to b’, if no one in the population had been infected. Of interest is the comparison of these probability distributions under each intervention.
The interventions with do(.) generate two modified DAGs:
The intervention do(SARS-CoV-2 = 0) eliminates all arrows directed towards SARS-CoV-2 and towards MSIC (Fig. 3). Ignoring the floating covariates, there are single-arrow covariates pointing to hospitalisation (RF3, RF4A, SOC1, SOC3) and to blood tests (RF4B, SOC2, SOC4), and fork covariates pointing to both outcomes (BOC1, BOC2, RF5).
Modified directed acyclic diagram with intervention at no exposure (do(SARS-CoV-2 = 0)) to evaluate the influence of covariates on the focal outcomes (H and B). Legend: Exposure = SARS-CoV-2 (E) (acute respiratory syndrome coronavirus 2); Outcomes are H: hospitalisation (H = {regular ward, semi-intensive care, ICU}), and B: blood tests (B = {B1,…,BK}); Covariates are RF: risk factor (RF = {RF1,…,RF4A, RF4B,RF5}), SOC: single outcome covariate (SOC = {SOC1,…,SOC5}) and BOC: both outcomes confounder (BOC = {BOC1,BOC2}).
Similarly, the modified graph for do(SARS-CoV-2 = 1) is obtained from the former by adding single arrows from RF1 and RF2 to MSIC, and by converting RF3, RF4A, RF4B and RF5 to fork types with additional arrows directed to MSIC.
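The two modified graphs can be sketched in Python as a simple edge-list manipulation. The edge list below is an assumption reconstructed from the textual description of the figures (the figures themselves are not reproduced here); SOC5 and any arrows into SARS-CoV-2 are omitted because their connections are not described in the text. The names `EDGES`, `do_exposure` and `group_by_targets` are hypothetical.

```python
# Assumed edge list (parent, child), reconstructed from the text only.
EDGES = [
    ("E", "MSIC"), ("MSIC", "H"), ("MSIC", "B"),
    ("RF1", "MSIC"), ("RF2", "MSIC"),
    ("RF3", "MSIC"), ("RF3", "H"),
    ("RF4A", "MSIC"), ("RF4A", "H"),
    ("RF4B", "MSIC"), ("RF4B", "B"),
    ("RF5", "MSIC"), ("RF5", "H"), ("RF5", "B"),
    ("SOC1", "H"), ("SOC3", "H"),
    ("SOC2", "B"), ("SOC4", "B"),
    ("BOC1", "H"), ("BOC1", "B"),
    ("BOC2", "H"), ("BOC2", "B"),
]

CORE = {"E", "MSIC", "H", "B"}
COVARIATES = sorted({node for edge in EDGES for node in edge} - CORE)


def do_exposure(edges, infected):
    """Return the edge list of the modified graph.

    do(E = 1): remove arrows pointing into E.
    do(E = 0): as described in the text, also remove arrows pointing into MSIC.
    """
    removed_targets = {"E"} if infected else {"E", "MSIC"}
    return [(u, v) for u, v in edges if v not in removed_targets]


def group_by_targets(edges):
    """Group covariates by the set of nodes they point to."""
    groups = {}
    for cov in COVARIATES:
        targets = frozenset(v for u, v in edges if u == cov)
        groups.setdefault(targets, []).append(cov)
    return groups


negative = group_by_targets(do_exposure(EDGES, infected=False))
positive = group_by_targets(do_exposure(EDGES, infected=True))

for label, groups in (("do(E = 0)", negative), ("do(E = 1)", positive)):
    print(label)
    for targets, covs in groups.items():
        print("  ->", sorted(targets) or "floating", ":", covs)
```

Under these assumptions, the grouping for do(E = 0) reproduces the single-arrow and fork covariates listed above, with RF1 and RF2 left floating; for do(E = 1), every risk factor gains an arrow to MSIC.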
As most covariates are either unmeasured or unknown, the effect of their absence can be evaluated using the d-separation concept [11]. This concept seeks to separate (make independent) two focal sets of variables by blocking the paths through common causal ancestors (back-door paths) and by avoiding statistical control for common causal descendants [11]. In contrast, to preserve the association between the descendants of MSIC (Fig. 2), the focal outcomes (H and B) must remain d-connected (dependent on each other only through MSIC), while their relations with other covariates (which may introduce systematic bias) should be d-separated (conditionally independent). Figure 3, for the negative stratum, shows the confounders that may introduce systematic bias into both outcomes: BOC1, BOC2 and RF5. The influence of these confounders on the focal association can be estimated with the modified model in the negative stratum. A strong association between the outcomes in the absence of infection can be due to these confounders and would suggest efforts to measure and control for them (as they have to be d-separated). Another pragmatic possibility is to exclude the noisy exams affected by these confounders. The other covariates have single arrows or affect only one outcome (H or B); their absence should not be critical because they are likely to be discarded due to poor discriminative performance.
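The d-separation argument can be checked mechanically. The sketch below is a generic test based on the moralised ancestral graph, not the authors' software. It assumes an edge list reconstructed from the textual description of the figures (SOC5 omitted), and it assumes that in the negative stratum MSIC is absent and can therefore be dropped from the graph. The names `EDGES`, `NEGATIVE` and `d_separated` are hypothetical.

```python
from itertools import combinations

# Assumed edge list (parent, child), reconstructed from the text only.
EDGES = [
    ("E", "MSIC"), ("MSIC", "H"), ("MSIC", "B"),
    ("RF1", "MSIC"), ("RF2", "MSIC"),
    ("RF3", "MSIC"), ("RF3", "H"),
    ("RF4A", "MSIC"), ("RF4A", "H"),
    ("RF4B", "MSIC"), ("RF4B", "B"),
    ("RF5", "MSIC"), ("RF5", "H"), ("RF5", "B"),
    ("SOC1", "H"), ("SOC3", "H"), ("SOC2", "B"), ("SOC4", "B"),
    ("BOC1", "H"), ("BOC1", "B"), ("BOC2", "H"), ("BOC2", "B"),
]

# Negative stratum: assumption that MSIC is absent without infection,
# so every arrow touching MSIC is dropped.
NEGATIVE = [(u, v) for u, v in EDGES if "MSIC" not in (u, v)]


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by the set `given` in the DAG."""
    given = set(given)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)

    # 1. Keep only x, y, the conditioning set and all their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralise: link each child to its parents and "marry" co-parents.
    adj = {node: set() for node in keep}
    for child in keep:
        pa = parents.get(child, set())
        for p in pa:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pa, 2):
            adj[p].add(q)
            adj[q].add(p)

    # 3. Search for an undirected path from x to y that avoids `given`.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return True


confounders = {"BOC1", "BOC2", "RF5"}
print("Negative stratum, no adjustment:", d_separated(NEGATIVE, "H", "B", set()))
print("Negative stratum, adjusted:", d_separated(NEGATIVE, "H", "B", confounders))
print("Full graph, adjusted:", d_separated(EDGES, "H", "B", confounders))
```

Under these assumptions, H and B are d-connected in the negative stratum until BOC1, BOC2 and RF5 are all controlled for, whereas in the full graph they remain d-connected through the latent MSIC even after that adjustment, which is the association the model aims to preserve.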