Causal DAG Framework for Factor Analysis

What is a DAG?

A DAG (Directed Acyclic Graph) is a causal map made of arrows pointing from cause to effect. 'Directed' means every arrow has a direction, and 'Acyclic' means that no sequence of arrows leads back to the node where it started.

We use it to decide what to adjust for (control) to remove confounding, and what not to adjust for to avoid introducing bias (e.g., mediators, colliders).

e.g., Value ← Size → Returns: Size is a confounder, so controlling for Size removes the spurious part of the Value-Returns relationship.
e.g., Risk Premium → Coverage ← Inst. Flow: Coverage is a collider, so controlling for it creates a spurious correlation between Risk Premium and Inst. Flow.
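
As a rough illustration, the two example graphs above can be written down as an edge list and checked for the 'acyclic' property. This is only a sketch: the names `EDGES`, `parents_of`, and `is_acyclic` are hypothetical helpers invented here, and the check relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Edges taken from the two examples above, written as (cause, effect).
EDGES = [
    ("Size", "Value"),
    ("Size", "Returns"),
    ("Risk Premium", "Coverage"),
    ("Inst. Flow", "Coverage"),
]

def parents_of(edges):
    """Map every node to the set of its direct causes."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(cause, set())
        parents.setdefault(effect, set()).add(cause)
    return parents

def is_acyclic(edges):
    """Return True if the edges form a DAG (no directed cycle)."""
    try:
        # static_order() is lazy; list() forces it and raises CycleError
        # if some arrow sequence leads back to its starting node.
        list(TopologicalSorter(parents_of(edges)).static_order())
        return True
    except CycleError:
        return False

print(is_acyclic(EDGES))                         # True
print(is_acyclic(EDGES + [("Returns", "Size")])) # False: Size -> Returns -> Size
```

Adding the arrow Returns → Size closes a loop, so the second graph is no longer a valid causal map.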

Pearl's d-separation: A Quant Guide

Think of d-separation as the "traffic rules for a causal map (DAG)." These rules determine which paths are 'open' for information (correlation) to flow and which are 'blocked.' A quant's goal is to measure the pure causal effect of a factor (T) on returns (Y), and d-separation provides the core rules to achieve this.

🚦 The Three Core Traffic Rules

1. Chains - The Mediator Path ⛓️

Structure: A → M → B

Traffic Rule: Information flows from A to B. However, if you control for the variable in the middle (M), the path becomes blocked.

Quant Example: `Momentum → Institutional Flow → Returns`
To estimate the 'total effect' of the Momentum factor, you must not control for the intermediary `Institutional Flow`. The moment you do, you block the path through which momentum affects returns via institutions, so the regression understates the total effect. In this pure chain, the Momentum coefficient shrinks toward zero.
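
The chain rule can be checked on simulated data. The sketch below is a toy, not an empirical result: the coefficients (0.8 and 0.5), the noise levels, and the helper names `slope`, `residuals`, and `controlled_slope` are all assumptions made for illustration. It uses only the standard library, and "controlling for" a variable is implemented by residualizing on it (the Frisch-Waugh-Lovell approach).

```python
import random

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing its linear fit on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def controlled_slope(x, y, z):
    """Slope of y on x while controlling for z (Frisch-Waugh-Lovell)."""
    return slope(residuals(z, x), residuals(z, y))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Assumed toy chain: Momentum -> Institutional Flow -> Returns.
momentum = [rng.gauss(0, 1) for _ in range(n)]
flow = [0.8 * m + rng.gauss(0, 1) for m in momentum]
returns = [0.5 * f + rng.gauss(0, 1) for f in flow]

# True total effect in this toy model is 0.8 * 0.5 = 0.4.
total_effect = slope(momentum, returns)
# Controlling for the mediator blocks the only path; the true value is 0.
blocked = controlled_slope(momentum, returns, flow)

print(f"Flow not controlled: {total_effect:.2f}")
print(f"Flow controlled:     {blocked:.2f}")
```

With the mediator left alone, the estimate should land near the built-in total effect of 0.4. With `flow` controlled, it should fall to roughly zero.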

2. Forks - The Confounder Path 🍴

Structure: A ← Z → B

Traffic Rule: A common cause Z creates a flow of information between A and B, resulting in a spurious correlation. This path can only be blocked by controlling for Z.

Quant Example: `Value Factor (HML) ← Company Size → Returns` (Crucial!)
Company `Size` (Z) affects both value metrics (A) and future `Returns` (B) (e.g., the size effect). Without controlling for `Size`, you can't know if the observed performance is due to the value factor itself or just the size effect. To close this "backdoor path," you must control for `Size`.
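
A small simulation can make the backdoor path visible. Everything numeric here is assumed for illustration: a true Value effect of 0.2, a Size effect of -0.7 on Value and -0.5 on Returns, and unit-variance noise. The helpers `slope`, `residuals`, and `controlled_slope` are hypothetical names, and controlling for `size` is done by residualizing on it.

```python
import random

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing its linear fit on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def controlled_slope(x, y, z):
    """Slope of y on x while controlling for z (Frisch-Waugh-Lovell)."""
    return slope(residuals(z, x), residuals(z, y))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Assumed toy fork: Size drives both the value metric and returns.
size = [rng.gauss(0, 1) for _ in range(n)]
value = [-0.7 * s + rng.gauss(0, 1) for s in size]
# The true causal effect of value on returns is set to 0.2 by construction.
returns = [0.2 * v - 0.5 * s + rng.gauss(0, 1) for v, s in zip(value, size)]

naive = slope(value, returns)                      # backdoor path left open
adjusted = controlled_slope(value, returns, size)  # backdoor path closed

print(f"Size ignored:    {naive:.2f}")
print(f"Size controlled: {adjusted:.2f}")
```

In this setup the naive coefficient mixes the value effect with the size effect and comes out well above 0.2. The size-adjusted coefficient should sit close to the 0.2 that was built in.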

3. Colliders - The Selection Bias Trap 💥

Structure: A → C ← B

Traffic Rule: This path is naturally blocked. However, the moment you control for the collider (C), or for a variable caused by it, the blocked path opens up, creating a spurious correlation between A and B.

Quant Example: `High Risk → Analyst Coverage ← Institutional Ownership`
If you filter your data to analyze "only stocks covered by analysts," you are controlling for the collider `Analyst Coverage` (C). Within this sample, a low-risk stock is likely to have had high institutional ownership, because something had to earn it coverage. This creates a fake negative relationship between risk and ownership inside your sample, a classic case of selection bias. Never control for a collider, and remember that filtering a sample on it counts as controlling.
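
The selection effect described above can be reproduced with synthetic data. This is a sketch under stated assumptions: risk and ownership are drawn independently, and a stock is "covered" when risk + ownership + noise exceeds an arbitrary threshold of 1.0. The threshold, the additive coverage rule, and the helper `corr` are all invented for illustration.

```python
import random

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Risk and ownership are independent by construction.
risk = [rng.gauss(0, 1) for _ in range(n)]
ownership = [rng.gauss(0, 1) for _ in range(n)]

# Assumed coverage rule: both causes push a stock toward being covered.
covered = [r + o + rng.gauss(0, 1) > 1.0 for r, o in zip(risk, ownership)]

full_sample = corr(risk, ownership)

# Filtering on the collider: keep only the covered stocks.
risk_cov = [r for r, c in zip(risk, covered) if c]
own_cov = [o for o, c in zip(ownership, covered) if c]
covered_only = corr(risk_cov, own_cov)

print(f"all stocks:     {full_sample:.2f}")
print(f"covered stocks: {covered_only:.2f}")
```

Across all stocks the correlation should be near zero, while inside the covered subsample it should turn clearly negative even though neither variable affects the other.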

Conclusion: The Quant's Recipe

d-separation provides a systematic answer to the question, "Which variables should I include in my regression?" The recipe is: 1) Find and close all backdoor paths (control for forks), 2) Do not block the intermediate steps of the path you want to measure (be careful with chains), and 3) Avoid the trap of creating new spurious paths (never control for colliders).
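
The recipe can be sketched as a small role classifier over an edge list. This is a simplification, not a full d-separation or backdoor-criterion implementation: it only flags common causes of the treatment and outcome, mediators on directed paths between them, and common effects of both. The graph and the names `reachable` and `classify` are hypothetical, and the edges merely combine the examples above.

```python
def reachable(graph, start):
    """All nodes reachable from `start` by following `graph` links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def classify(edges, treatment, outcome):
    """Sort nodes into control / mediator / collider roles (simplified)."""
    children, parents = {}, {}
    for cause, effect in edges:
        children.setdefault(cause, set()).add(effect)
        parents.setdefault(effect, set()).add(cause)

    desc_t = reachable(children, treatment)   # everything the treatment causes
    desc_y = reachable(children, outcome)     # everything the outcome causes
    anc_t = reachable(parents, treatment)     # causes of the treatment
    anc_y = reachable(parents, outcome)       # causes of the outcome
    # Causes of the outcome along paths that avoid the treatment itself.
    parents_no_t = {k: v - {treatment} for k, v in parents.items()}
    anc_y_no_t = reachable(parents_no_t, outcome)

    return {
        "control": anc_t & anc_y_no_t,   # forks: close the backdoor
        "mediators": desc_t & anc_y,     # chains: leave alone for total effect
        "colliders": desc_t & desc_y,    # common effects: do not control
    }

# Hypothetical graph for illustration only.
EDGES = [
    ("Size", "Value"), ("Size", "Returns"),          # fork
    ("Value", "Flow"), ("Flow", "Returns"),          # chain
    ("Value", "Coverage"), ("Returns", "Coverage"),  # collider
]

roles = classify(EDGES, treatment="Value", outcome="Returns")
print(roles)
```

For this toy graph the classifier puts `Size` in the control set, `Flow` among the mediators, and `Coverage` among the colliders. Graphs with colliders that are not descendants of the treatment need a proper d-separation check instead.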
