ansible/awx

acquire_lock: TOCTOU race condition on PROJECTS_ROOT with multiple task replicas

Open

#16447 opened on May 7, 2026

community, good first issue

Description

Bug Summary

When running AWX with multiple task replicas (>1), some jobs fail immediately with FileExistsError on /var/lib/awx/projects when several jobs are triggered in parallel. The root cause is a TOCTOU (time-of-check to time-of-use) race condition in acquire_lock().

AWX Version

24.6.1 (also present in latest devel branch as of 2026-05-05)

Steps to Reproduce

  1. Deploy AWX with replicas: 3 and a RWX PVC for projects (CephFS/NFS)
  2. Trigger 2+ jobs simultaneously targeting different projects
  3. Observe immediate failure on some jobs

Error

File "awx/main/tasks/jobs.py", line 379, in acquire_lock
    os.mkdir(settings.PROJECTS_ROOT)
FileExistsError: [Errno 17] File exists: '/var/lib/awx/projects'

Root Cause

In awx/main/tasks/jobs.py, the acquire_lock() function uses a non-atomic check-then-act pattern on PROJECTS_ROOT:

# Current code - TOCTOU race condition
if not os.path.exists(settings.PROJECTS_ROOT):
    os.mkdir(settings.PROJECTS_ROOT)

With multiple task pods running concurrently against the same shared volume, several pods can pass the os.path.exists() check before any of them creates the directory. The first os.mkdir() call then succeeds and every later one raises FileExistsError.
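The interleaving can be reproduced deterministically outside AWX. The following is a minimal sketch, not the AWX code: `buggy_ensure_dir` and its `between_check_and_create` hook are hypothetical names, and the hook stands in for a second replica creating the directory in the window between the check and the `os.mkdir()` call.

```python
import errno
import os
import tempfile


def buggy_ensure_dir(path, between_check_and_create=None):
    """Check-then-act pattern from the report, with a test hook in the race window."""
    if not os.path.exists(path):
        # Hypothetical hook: simulates another replica winning the race here.
        if between_check_and_create is not None:
            between_check_and_create()
        os.mkdir(path)


def demo_race():
    """Return the errno raised when the directory appears after the check."""
    with tempfile.TemporaryDirectory() as base:
        root = os.path.join(base, "projects")
        try:
            # The "other replica" creates the directory after our check passed.
            buggy_ensure_dir(root, between_check_and_create=lambda: os.mkdir(root))
        except FileExistsError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    # Prints True when the simulated race raised FileExistsError (EEXIST).
    print(demo_race() == errno.EEXIST)
```

In a real deployment the window is only a few microseconds wide, which is why the failure appears intermittently and only under parallel load.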

Note: the per-project locking mechanism using fcntl.lockf() is correctly implemented and unaffected by this bug.

Proposed Fix

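Replace the check-then-act pattern with a single os.makedirs() call that tolerates an existing directory:

# Fix - idempotent, safe under concurrent creation
os.makedirs(settings.PROJECTS_ROOT, exist_ok=True)

This is a one-line fix. With exist_ok=True, the call succeeds if the directory already exists, including when another pod creates it at the same moment, so the FileExistsError no longer occurs. Unlike os.mkdir(), os.makedirs() also creates any missing parent directories.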
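As a rough check of the proposed fix, the sketch below has several threads call `os.makedirs(..., exist_ok=True)` on the same path at the same moment and collects any exceptions. Threads in one process are only an approximation of separate pods on a shared volume, and `ensure_dir`, `run_concurrently`, and `demo_fix` are hypothetical helper names, not part of AWX.

```python
import os
import tempfile
import threading


def ensure_dir(path):
    """Proposed fix: succeed whether or not the directory already exists."""
    os.makedirs(path, exist_ok=True)


def run_concurrently(path, workers=8):
    """Release all workers at once and return any OSError they raised."""
    barrier = threading.Barrier(workers)
    errors = []

    def worker():
        barrier.wait()  # line the threads up to maximise contention
        try:
            ensure_dir(path)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


def demo_fix():
    """Return (errors, directory_exists) after a concurrent run."""
    with tempfile.TemporaryDirectory() as base:
        root = os.path.join(base, "projects")
        errors = run_concurrently(root)
        return errors, os.path.isdir(root)


if __name__ == "__main__":
    print(demo_fix())
```

An equivalent alternative would be to keep `os.mkdir()` and wrap it in `try`/`except FileExistsError`; `os.makedirs(..., exist_ok=True)` is simply the shorter spelling.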

Workaround

Reduce task replicas to 1. This eliminates the race condition but removes task HA.

Additional Context

  • Confirmed present in devel branch as of 2026-05-05
  • PVC access mode: ReadWriteMany (CephFS)
  • Operator version: 2.19.1
  • The bug is triggered even when parallel jobs target different projects, since all jobs pass through this PROJECTS_ROOT check before reaching their individual project lock path
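The ordering described in the last point can be illustrated with a simplified sketch. This is not the AWX implementation: the function names and the lock-file naming are assumptions made for illustration, and `fcntl` is only available on Unix-like systems. It shows that every job touches the shared root directory first and only then takes its own per-project lock.

```python
import fcntl
import os
import tempfile


def acquire_project_lock(projects_root, project_name):
    """Ensure the shared root exists, then take an exclusive per-project lock."""
    # Step 1: executed by every job, whichever project it targets.
    os.makedirs(projects_root, exist_ok=True)
    # Step 2: per-project lock file (hypothetical naming scheme).
    lock_path = os.path.join(projects_root, f".{project_name}.lock")
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)
    except OSError:
        os.close(fd)
        raise
    return fd


def release_project_lock(fd):
    """Release the lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        root = os.path.join(base, "projects")
        fd_a = acquire_project_lock(root, "project_a")
        fd_b = acquire_project_lock(root, "project_b")
        release_project_lock(fd_a)
        release_project_lock(fd_b)
```

Because step 1 is common to all jobs, a failure there blocks jobs that would never have contended for the same project lock.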

AWX version

24.6.1

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

no

Ansible version

No response

Operating system

No response

Web browser

No response

Steps to reproduce

  1. Deploy AWX with replicas: 3 on Kubernetes
  2. Configure a RWX PVC for projects storage (CephFS or NFS)
  3. Create 2+ job templates pointing to different projects
  4. Trigger all jobs simultaneously (e.g. via scheduled jobs or API calls at the same time)
  5. Observe that some jobs fail immediately before playbook execution
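Step 4 can be scripted. The sketch below launches several job templates at roughly the same time through the AWX REST API using a thread pool. The base URL, template IDs, and token are placeholders to fill in for your environment, the helper names are hypothetical, and Bearer-token authentication is assumed.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def launch_url(base_url, template_id):
    """Build the launch endpoint URL for one job template."""
    return f"{base_url.rstrip('/')}/api/v2/job_templates/{template_id}/launch/"


def launch_job(base_url, token, template_id):
    """POST to the launch endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        launch_url(base_url, template_id),
        data=b"{}",
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def launch_all(template_ids, launcher):
    """Call launcher(template_id) for every ID in parallel, one thread each."""
    with ThreadPoolExecutor(max_workers=len(template_ids)) as pool:
        return list(pool.map(launcher, template_ids))


if __name__ == "__main__":
    # Placeholder values: replace with a real AWX URL, token, and template IDs.
    launcher = partial(launch_job, "https://awx.example.com", "REPLACE_ME")
    print(launch_all([1, 2, 3], launcher))
```

Each template should point to a different project so that any failure can be attributed to the shared PROJECTS_ROOT check rather than to per-project lock contention.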

Expected Behavior

All jobs should start normally regardless of how many task replicas are running or how many jobs are triggered simultaneously.

Actual Behavior

Some jobs fail immediately with:

File "awx/main/tasks/jobs.py", line 379, in acquire_lock
    os.mkdir(settings.PROJECTS_ROOT)
FileExistsError: [Errno 17] File exists: '/var/lib/awx/projects'

The failure rate increases with the number of task replicas and the number of simultaneous jobs.

Expected results

All jobs start normally; no FileExistsError is raised from acquire_lock().

Actual results

File "awx/main/tasks/jobs.py", line 379, in acquire_lock
    os.mkdir(settings.PROJECTS_ROOT)
FileExistsError: [Errno 17] File exists: '/var/lib/awx/projects'

Additional information

No response
