Idempotency: Tasks should produce the same result if run multiple times. Use INSERT OVERWRITE or delete-then-insert patterns. Never append blindly, because a retry or rerun would then duplicate rows.
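As a rough illustration of the delete-then-insert pattern, the sketch below uses Python's built-in sqlite3 module in place of a real warehouse. The table name daily_events, its columns, and the helper load_partition are hypothetical and chosen only for this example.

```python
import sqlite3


def load_partition(conn, ds, rows):
    """Replace all rows for one date partition in a single transaction."""
    # Hypothetical helper: the connection context manager commits on
    # success and rolls back if either statement raises.
    with conn:
        conn.execute("DELETE FROM daily_events WHERE date = ?", (ds,))
        conn.executemany(
            "INSERT INTO daily_events (date, user_id, clicks) VALUES (?, ?, ?)",
            [(ds, user_id, clicks) for user_id, clicks in rows],
        )


# In-memory database standing in for the real target table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_events (date TEXT, user_id INTEGER, clicks INTEGER)"
)

rows = [(1, 10), (2, 5)]
load_partition(conn, "2024-01-01", rows)
load_partition(conn, "2024-01-01", rows)  # simulated retry of the same run

row_count = conn.execute("SELECT COUNT(*) FROM daily_events").fetchone()[0]
```

Because the delete and the insert share one transaction, a rerun for the same date leaves the table in the same state as the first run instead of doubling the rows.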
Templating: Use Jinja templates for dynamic values:
query = "SELECT * FROM events WHERE date = '{{ ds }}'"
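{{ ds }} resolves to the run's logical date (formerly called the execution date) as a YYYY-MM-DD string, so a backfill for a past date queries that date's data rather than today's.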
Avoid top-level code: The scheduler re-executes each DAG file every time it parses it, which happens frequently. Heavy imports or database connections at module level run on every parse and slow down scheduling for all DAGs.
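One way to keep parsing cheap is to defer expensive work into the function a task calls. The sketch below is framework-free and assumes a hypothetical task callable count_events; sqlite3 stands in for whatever heavy library or database client the real task would use.

```python
# Module level: only cheap constants. This part runs on every parse.
DB_PATH = ":memory:"  # hypothetical connection target


def count_events(db_path=DB_PATH):
    """Hypothetical task callable: all expensive work happens at run time."""
    # Deferred import: paid only when the task executes, not at parse time.
    import sqlite3

    # The connection is opened inside the function, never at module level.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER)")
        conn.execute("INSERT INTO events VALUES (1)")
        return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    finally:
        conn.close()
```

Importing this module opens no connection; the database is touched only when count_events is actually called by a running task.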
Keep DAGs simple: Complex logic belongs in called functions, not DAG definitions.