The symptoms are consistent:

  1. A scheduler that lags as DAGs multiply
  2. Workers running out of memory mid-task
  3. Task durations that swing unpredictably
  4. Retries that repeat hours of computation

This isn’t a scaling issue. It’s a boundary violation.

Airflow Is a Control Plane

Airflow exists to:

  1. Schedule work
  2. Order dependencies across systems
  3. Retry failures
  4. Surface state and alerts

It does not exist to:

  1. Transform data
  2. Hold datasets in memory
  3. Act as a compute engine

When orchestration and compute share the same layer, they compete for resources.

That competition is where systems degrade.

DAGs Should Describe Flow — Nothing Else

A DAG answers:

What runs, and in what order?

Not:

How does the data get processed?

Once you embed logic inside DAGs:

  1. Orchestration and business logic become inseparable
  2. Testing requires a running scheduler
  3. Workers carry the compute load
  4. Every retry reruns the heavy work

Clean systems separate:

  1. Flow: what runs, when, and in what order
  2. Compute: how the data is actually processed
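That separation can be sketched without any Airflow imports. Plain functions stand in for task callables, and the bucket names and run IDs are invented; the point is only what crosses the task boundary:

```python
# Sketch: downstream tasks receive references, never rows.
# Plain functions stand in for Airflow task callables; the S3
# paths and run_id values are illustrative.

def extract(run_id: str) -> str:
    # The heavy extract happens in an external system; the task only
    # returns the URI that job wrote to (a small string, safe to pass
    # between tasks).
    return f"s3://lake/raw/{run_id}/events.parquet"

def transform(input_uri: str, run_id: str) -> str:
    # A compute engine reads input_uri; the orchestrator never
    # touches the data itself.
    assert input_uri.startswith("s3://")
    return f"s3://lake/curated/{run_id}/events.parquet"

raw_uri = extract("2024-06-01")
curated_uri = transform(raw_uri, "2024-06-01")
```

Everything that moves between tasks here is a short string. The data stays where it was produced.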

The Patterns That Cause Most Failures

  1. Pandas transformations inside PythonOperator
  2. Datasets serialized through XCom
  3. Heavy queries and imports at DAG parse time
  4. API pagination loops written as task code

These are not edge cases. This is how most Airflow systems fail.
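One such pattern, the task that computes in place, sketched as a plain function (field names and values are invented; a PythonOperator would wrap a callable shaped like this):

```python
# Anti-pattern sketch: the computation lives inside the task itself.
# No Airflow imports; the shape of the work is what matters.

def transform_orders(rows: list) -> list:
    # Every row sits in the Airflow worker's memory, the CPU cost
    # lands on the orchestration layer, and a retry repeats the
    # entire computation from scratch.
    return [
        {"order_id": r["order_id"], "total": r["qty"] * r["price"]}
        for r in rows
    ]

rows = [{"order_id": 1, "qty": 3, "price": 2.5}]
totals = transform_orders(rows)
```

At toy scale this is harmless. At production scale, the worker is now a compute engine with none of a compute engine's guarantees.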

Failure Modes (They Compound Fast)

  1. Workers killed mid-task by the OOM killer
  2. Oversized XCom payloads straining the metadata database
  3. Long-running tasks starving the scheduler
  4. Retries cascading into duplicate writes

Failures don’t originate in SQL. They emerge at system boundaries.

The Correct Model

Airflow should coordinate work, not perform it.

  1. Trigger Spark / dbt / containerized jobs
  2. Wait for completion
  3. Pass references (IDs, URIs), not data

Airflow becomes thin, predictable, and stable.
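The three steps above reduce to a single polling loop. The `JobClient` here is a hypothetical stand-in for a Spark, dbt, or container job API (it is wired to report success on the third poll so the sketch runs on its own):

```python
import time

# Sketch only: JobClient is a hypothetical stand-in for a real
# Spark / dbt / container job service.
class JobClient:
    def __init__(self):
        self._polls = 0

    def submit(self, job_name: str, params: dict) -> str:
        # Trigger the external job; return an opaque job ID, not data.
        return f"job-{job_name}-001"

    def status(self, job_id: str) -> str:
        # Simulated: reports RUNNING twice, then SUCCEEDED.
        self._polls += 1
        return "SUCCEEDED" if self._polls >= 3 else "RUNNING"

    def output_uri(self, job_id: str) -> str:
        return f"s3://warehouse/outputs/{job_id}/"

def run_job(client, job_name, params, poll_seconds=0.01):
    job_id = client.submit(job_name, params)       # 1. trigger
    while client.status(job_id) == "RUNNING":      # 2. wait for completion
        time.sleep(poll_seconds)
    return client.output_uri(job_id)               # 3. pass a reference, not data

uri = run_job(JobClient(), "daily_agg", {"ds": "2024-06-01"})
```

The orchestrator holds a job ID and a URI. Nothing heavier ever enters its process.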

What Improves in Production

  1. The scheduler stays responsive
  2. Retries become cheap and safe
  3. Failures isolate to one layer
  4. Compute scales without scaling Airflow

Not because of better tooling. Because responsibilities are separated correctly.

The System Model

Think in layers:

  1. Orchestration: Airflow decides what runs and when
  2. Compute: Spark, dbt, or containerized jobs do the work
  3. Storage: object stores and warehouse tables hold the data

If these blur, the system becomes fragile.

Final Take

Most Airflow issues are self-inflicted.

Not because Airflow is limited, but because it’s forced to do work it was never designed for.

If your DAGs are executing real computation, you don’t have a pipeline problem. You have a system design problem.

One Rule

If a task:

  1. Loads significant data into memory
  2. Transforms records directly
  3. Has a runtime that scales with data volume

It does not belong in Airflow.