Description
Apache Airflow version
3.1.7
If "Other Airflow 3 version" selected, which one?
No response
What happened?
DAG-level on_failure_callback does not fire in Airflow 3.1.7 with KubernetesExecutor.
Task-level on_failure_callback (via default_args) works correctly — it fires on the worker pod.
Observations
-
DagRun reaches
failedstate — confirmed viaSELECT state FROM dag_run. Thelast_scheduling_decisionandend_dateare within milliseconds of each other. -
has_on_failure_callbackis present in serialized data — confirmed viaSELECT data::text LIKE '%has_on_failure_callback%' FROM serialized_dag. -
Scheduler DEBUG logs show
"callback is empty"for every DagRun — this is the log from_send_dag_callbacks_to_processorwhencallback_to_runisNone, meaningupdate_state()returned no callback. -
DAG processor logs show no callback-related activity — only routine DAG file parsing (
Setting next_dagrun ... to None).
What you think should happen instead?
When on_failure_callback is set on a DAG and the DagRun fails, the callback should fire. This worked in Airflow 2.x.
How to reproduce
- Create a DAG with
on_failure_callbackset at the DAG level:
from datetime import datetime
from airflow.sdk import DAG
from airflow.providers.standard.operators.python import PythonOperator
def my_callback(context):
print("DAG failure callback fired")
def task_that_fails():
raise ValueError("Intentional error")
with DAG(
dag_id="test-dag-level-callback",
schedule=None,
start_date=datetime(2025, 1, 1),
catchup=False,
on_failure_callback=[my_callback],
) as dag:
PythonOperator(
task_id="task_that_fails",
python_callable=task_that_fails,
)
-
Trigger the DAG. The task fails, the DagRun is marked as
failed. -
The DAG-level
on_failure_callbacknever fires. No output in scheduler logs, DAG processor logs, or task logs. -
Enable DEBUG logging on the scheduler (
AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG) and search for"callback is empty"— it appears for every DagRun. -
For comparison, move the callback to
default_args:
with DAG(
dag_id="test-task-level-callback",
schedule=None,
start_date=datetime(2025, 1, 1),
catchup=False,
default_args={"on_failure_callback": [my_callback]},
) as dag:
...
This works — the task-level callback fires on the worker pod.
Operating System
Linux
Versions of Apache Airflow Providers
apache-airflow 3.1.7
apache-airflow-core 3.1.7
apache-airflow-providers-amazon 9.21.0
apache-airflow-providers-celery 3.15.2
apache-airflow-providers-cncf-kubernetes 10.12.3
apache-airflow-providers-common-compat 1.13.0
apache-airflow-providers-common-io 1.7.1
apache-airflow-providers-common-messaging 2.0.2
apache-airflow-providers-common-sql 1.30.4
apache-airflow-providers-docker 4.5.2
apache-airflow-providers-elasticsearch 6.4.4
apache-airflow-providers-fab 3.2.0
apache-airflow-providers-ftp 3.14.1
apache-airflow-providers-git 0.2.2
apache-airflow-providers-google 19.5.0
apache-airflow-providers-grpc 3.9.2
apache-airflow-providers-hashicorp 4.4.3
apache-airflow-providers-http 5.6.4
apache-airflow-providers-microsoft-azure 12.10.3
apache-airflow-providers-mysql 6.4.2
apache-airflow-providers-odbc 4.11.1
apache-airflow-providers-openlineage 2.10.1
apache-airflow-providers-postgres 6.5.3
apache-airflow-providers-redis 4.4.2
apache-airflow-providers-sendgrid 4.2.1
apache-airflow-providers-sftp 5.7.0
apache-airflow-providers-slack 9.6.2
apache-airflow-providers-smtp 2.4.2
apache-airflow-providers-snowflake 6.9.0
apache-airflow-providers-ssh 4.3.1
apache-airflow-providers-standard 1.11.0
apache-airflow-task-sdk 1.1.7
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct