P(Y | do(T=t)) = ∑_z P(Y | T=t, Z=z) P(Z=z) (valid when Z blocks every backdoor path from T to Y and contains no descendant of T).
Y = α + τ·T + γ′Z_pre + IndustryFE + DateFE + ε
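To make the adjustment formula above concrete, here is a minimal sketch that evaluates it by hand on a tiny discrete example. All probabilities are made up for illustration: Z = 1 marks a small-cap stock, T = 1 marks high factor exposure, and Y = 1 marks outperformance over the next period. None of these numbers come from data.

```python
# Hypothetical probability tables, chosen only to illustrate the formula.
p_z = {0: 0.6, 1: 0.4}                  # P(Z=z)
p_t1_given_z = {0: 0.2, 1: 0.7}         # P(T=1 | Z=z): small caps load on T more often
p_y1_given_tz = {                       # P(Y=1 | T=t, Z=z), keyed by (t, z)
    (0, 0): 0.40, (1, 0): 0.45,
    (0, 1): 0.55, (1, 1): 0.60,
}

def p_y1_do(t):
    """Backdoor adjustment: sum_z P(Y=1 | T=t, Z=z) * P(Z=z)."""
    return sum(p_y1_given_tz[(t, z)] * p_z[z] for z in p_z)

def p_y1_given_t(t):
    """Observational P(Y=1 | T=t): weights z by P(Z=z | T=t) instead of P(Z=z)."""
    p_t_given_z = {z: p_t1_given_z[z] if t == 1 else 1 - p_t1_given_z[z] for z in p_z}
    p_t = sum(p_t_given_z[z] * p_z[z] for z in p_z)
    return sum(p_y1_given_tz[(t, z)] * p_t_given_z[z] * p_z[z] / p_t for z in p_z)

causal_effect = p_y1_do(1) - p_y1_do(0)          # adjusted contrast
naive_gap = p_y1_given_t(1) - p_y1_given_t(0)    # raw conditional contrast

print(round(naive_gap, 3), round(causal_effect, 3))  # 0.125 0.05
```

In this toy setup the raw gap (0.125) is more than twice the adjusted effect (0.05), because high-T names are disproportionately small caps. Only the weighting by P(Z=z) rather than P(Z=z | T=t) differs between the two functions.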
Think of d-separation as the "traffic rules" for a causal map, i.e. a directed acyclic graph (DAG). These rules determine which paths are open for statistical association to flow along and which are blocked. A quant's goal is to measure the causal effect of a factor (T) on returns (Y), and d-separation provides the core rules for deciding what to condition on to achieve this.
Structure: A → M → B
Traffic Rule: Information flows from A to B. However, if you control for the variable in the middle (M), the path becomes blocked.
Quant Example: `Momentum → Institutional Flow → Returns`
To estimate the total effect of the Momentum factor, you must not control for the mediator `Institutional Flow`. The moment you do, you block the path through which momentum affects returns via institutions. What remains is at most the direct effect, so the total effect is misstated (understated when the mediated channel works in the same direction as the direct one).
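The chain rule can be checked on simulated data. The sketch below assumes a pure linear chain with hypothetical coefficients (0.8 from momentum to flow, 0.5 from flow to returns) and no direct momentum effect, so the total effect is 0.8 × 0.5 = 0.4. The helper names are illustrative, and the controlled coefficient is obtained by residualizing on the mediator (the Frisch-Waugh-Lovell approach) rather than with a regression library.

```python
import random

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x, with an intercept."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def partial_slope(x, y, control):
    """Coefficient on x in y ~ x + control, via Frisch-Waugh-Lovell."""
    return slope(residuals(control, x), residuals(control, y))

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000

# Assumed data-generating process: Momentum -> Flow -> Returns, nothing else.
momentum = [rng.gauss(0, 1) for _ in range(n)]
flow = [0.8 * m + rng.gauss(0, 1) for m in momentum]
returns = [0.5 * f + rng.gauss(0, 1) for f in flow]

total_effect = slope(momentum, returns)                  # close to 0.4 in large samples
with_mediator = partial_slope(momentum, returns, flow)   # close to 0: path is blocked

print(f"total effect: {total_effect:.3f}, controlling for flow: {with_mediator:.3f}")
```

Under these assumptions, adding `flow` as a control drives the momentum coefficient toward zero even though momentum genuinely moves returns.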
Structure: A ← Z → B
Traffic Rule: A common cause Z creates a flow of information between A and B, resulting in a spurious correlation. This path can only be blocked by controlling for Z.
Quant Example: `Value Factor (HML) ← Company Size → Returns` (Crucial!)
Company `Size` (Z) affects both value metrics (A) and future `Returns` (B) (e.g., the size effect). Without controlling for `Size`, you can't know if the observed performance is due to the value factor itself or just the size effect. To close this "backdoor path," you must control for `Size`.
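A small simulation illustrates the fork. The coefficients here are hypothetical: size lowers both the value score (-0.7) and returns (-0.5), and the value factor has an assumed true effect of 0.2 on returns. The two-regressor fit solves the 2×2 normal equations directly so the sketch needs only the standard library.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ols_one(y, x):
    """Slope of y ~ 1 + x."""
    x, y = demean(x), demean(y)
    return dot(x, y) / dot(x, x)

def ols_two(y, x1, x2):
    """Coefficients (b1, b2) of y ~ 1 + x1 + x2 via the 2x2 normal equations."""
    y, x1, x2 = demean(y), demean(x1), demean(x2)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)  # fixed seed so the run is reproducible
n = 20_000

# Assumed data-generating process: Value <- Size -> Returns, plus Value -> Returns.
size = [rng.gauss(0, 1) for _ in range(n)]
value = [-0.7 * s + rng.gauss(0, 1) for s in size]
returns = [0.2 * v - 0.5 * s + rng.gauss(0, 1) for v, s in zip(value, size)]

naive = ols_one(returns, value)               # backdoor through Size left open
adjusted, _ = ols_two(returns, value, size)   # backdoor closed by including Size

print(f"naive: {naive:.3f}, controlling for size: {adjusted:.3f}")
```

With these made-up coefficients the naive slope lands near 0.43, roughly double the assumed true effect, while the size-adjusted slope recovers a value close to 0.2.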
Structure: A → C ← B
Traffic Rule: This path is naturally blocked. However, the moment you control for the collider (C), or for any descendant of C, the blocked path opens up, creating a spurious correlation between A and B.
Quant Example: `High Risk → Analyst Coverage ← Institutional Ownership`
If you filter your data to analyze "only stocks covered by analysts," you are conditioning on the collider `Analyst Coverage` (C). Within this sample, a low-risk stock is likely to have had high institutional ownership, since otherwise it would probably not have attracted coverage. This creates a spurious relationship between risk and ownership within your sample, a classic case of selection bias. Never control for a collider.
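The selection effect can be reproduced in a few lines. This sketch assumes risk and institutional ownership are independent standard normals, and that a stock gets coverage when their sum plus some noise exceeds a threshold. The threshold (1.0) and noise level (0.5) are arbitrary choices for illustration.

```python
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2)  # fixed seed so the run is reproducible
n = 20_000

# Assumed data-generating process: Risk -> Coverage <- Ownership.
risk = [rng.gauss(0, 1) for _ in range(n)]
ownership = [rng.gauss(0, 1) for _ in range(n)]   # independent of risk by construction
covered = [r + o + rng.gauss(0, 0.5) > 1.0 for r, o in zip(risk, ownership)]

corr_full = corr(risk, ownership)

# Filtering on coverage is the same as conditioning on the collider.
risk_cov = [r for r, c in zip(risk, covered) if c]
own_cov = [o for o, c in zip(ownership, covered) if c]
corr_covered = corr(risk_cov, own_cov)

print(f"full sample: {corr_full:.3f}, covered only: {corr_covered:.3f}")
```

In the full sample the correlation stays near zero, while the covered-only subsample shows a clearly negative correlation that exists purely because of the filter.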
d-separation provides a systematic answer to the question, "Which variables should I include in my regression?" The recipe is: 1) find and close all backdoor paths (control for the common cause in each fork), 2) do not block the intermediate steps of the path you want to measure (leave mediators in chains uncontrolled), and 3) avoid creating new spurious paths (never control for colliders, including through sample filters).
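The three traffic rules can be written as a small path checker. The sketch below is a simplified illustration, not a full d-separation algorithm: it tests a single path that you supply, whereas a complete check would enumerate every path between the two variables. The function names and the example graph (built from the three quant examples, plus an assumed direct `Value → Returns` edge) are hypothetical.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following directed edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def path_is_blocked(edges, path, conditioned):
    """Apply the chain, fork, and collider rules to each interior node of `path`."""
    edge_set = set(edges)
    conditioned = set(conditioned)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is conditioned on.
            opened = mid in conditioned or bool(descendants(edges, mid) & conditioned)
            if not opened:
                return True
        elif mid in conditioned:
            # A chain or fork node blocks once it is conditioned on.
            return True
    return False

# Directed edges as (cause, effect) pairs.
edges = [
    ("Momentum", "Flow"), ("Flow", "Returns"),         # chain
    ("Size", "Value"), ("Size", "Returns"),            # fork
    ("Value", "Returns"),                              # assumed direct effect
    ("Risk", "Coverage"), ("Ownership", "Coverage"),   # collider
]

print(path_is_blocked(edges, ["Momentum", "Flow", "Returns"], {"Flow"}))     # True
print(path_is_blocked(edges, ["Value", "Size", "Returns"], set()))           # False
print(path_is_blocked(edges, ["Value", "Size", "Returns"], {"Size"}))        # True
print(path_is_blocked(edges, ["Risk", "Coverage", "Ownership"], set()))      # True
print(path_is_blocked(edges, ["Risk", "Coverage", "Ownership"], {"Coverage"}))  # False
```

Read against the recipe: the second and third calls show the backdoor through `Size` being closed, the first shows a mediator being wrongly blocked, and the last shows a collider being opened by conditioning on it.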