Airflow parallelism

parallelism: not a very descriptive name. The description says it sets the maximum number of task instances for the Airflow installation, which is a bit ambiguous: if I have two hosts running Airflow workers, I’d have Airflow installed on two hosts, so that should be two installations. Based on context, though, ‘per installation’ here means ‘per Airflow state database’. I’d name this max_active_tasks.

dag_concurrency: Despite the name, based on the comment this is actually the task concurrency, and it’s per worker. I’d name this max_active_tasks_for_worker (per_worker would suggest that it’s a global setting for workers, but I think you can have workers with different values set for this).

max_active_runs_per_dag: This one’s kinda alright, but since it seems to be just a default value for the matching DAG kwarg, it might be nice to reflect that in the name, something like default_max_active_runs_for_dags.

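To make the three settings above concrete, here is a minimal sketch that reads them from an airflow.cfg-style fragment using Python's standard configparser. It assumes the settings sit in the [core] section, as they do in Airflow 1.x config files; the values and the helper name read_concurrency_settings are illustrative only, not recommendations or part of Airflow.

```python
import configparser

# Illustrative airflow.cfg fragment. The values are examples only.
SAMPLE_CFG = """
[core]
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16
"""


def read_concurrency_settings(cfg_text):
    """Parse the three concurrency-related settings from config text.

    Hypothetical helper for illustration; it is not an Airflow API.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    core = parser["core"]
    return {
        "parallelism": core.getint("parallelism"),
        "dag_concurrency": core.getint("dag_concurrency"),
        "max_active_runs_per_dag": core.getint("max_active_runs_per_dag"),
    }


settings = read_concurrency_settings(SAMPLE_CFG)
for name, value in settings.items():
    print(f"{name} = {value}")
```

Seeing the three names side by side shows the problem: nothing in them says which limit applies to the whole installation, which to a worker, and which is only a default for DAGs.
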
So let’s move on to the DAG kwargs:

concurrency: Again, having a general name like this, coupled with the fact that concurrency is used for something different elsewhere, makes this pretty confusing. I’d call this max_active_tasks.

max_active_runs: This one sounds alright to me.
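
The relationship between the max_active_runs kwarg and the max_active_runs_per_dag setting can be sketched in plain Python. This assumes the reading given above, namely that the config value only acts as a default when the DAG does not set its own. The helper resolve_max_active_runs and the dag_kwargs dictionary are hypothetical illustrations, not Airflow code.

```python
def resolve_max_active_runs(dag_kwarg, config_default):
    """Return the DAG-level value if one was given, else the config default.

    Hypothetical helper: it models the assumed fallback, not Airflow internals.
    """
    return config_default if dag_kwarg is None else dag_kwarg


# Keyword arguments as they might be passed to a DAG (illustrative values).
dag_kwargs = {
    "dag_id": "example_dag",
    "concurrency": 4,          # task-level limit for this DAG
    "max_active_runs": None,   # not set, so the config default applies
}

CONFIG_DEFAULT = 16  # stands in for max_active_runs_per_dag

effective = resolve_max_active_runs(dag_kwargs["max_active_runs"], CONFIG_DEFAULT)
print(f"effective max_active_runs: {effective}")
```

If the fallback does work this way, a name like default_max_active_runs_for_dags would make the relationship obvious without reading the code.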

source: https://issues.apache.org/jira/browse/AIRFLOW-57


max_threads gives the user some control over CPU usage. It sets how many threads the scheduler runs in parallel to schedule DAGs.
