apache/dolphinscheduler

[Improvement][Scheduler] Rerun workflow instance should follow the specified workerGroup parameter

Open

#17794 opened on Dec 14, 2025

Labels: backend, help wanted, improvement

Description

Search before asking

  • I have searched the issues and found no similar feature request.

Description

Rerunning a workflow instance currently ignores the workerGroup set in its startup parameters. Tasks are assigned to any idle worker instead of the specified worker group. This breaks resource isolation and scheduling rules, and makes it impossible to control which nodes execute the tasks during a rerun.

Issue Description

When a workflow instance is rerun, the system does not follow the workerGroup specified in the startup parameters and instead assigns the task to any idle worker node. This violates the expected resource isolation and scheduling rules, and the task execution environment is no longer guaranteed to be the same for the first run and the rerun.

What version of DolphinScheduler are you using?

Version: 3.3.2

What Operating System are you using?

OS: Debian 12

What happened?

  1. Create a workflow and set a specific workerGroup (e.g., "w1") in the startup parameters when running the workflow for the first time;
  2. The first run correctly executes on the nodes in the specified workerGroup;
  3. When re-running the failed/finished workflow instance (via the "Rerun" button), the system ignores the workerGroup parameter;
  4. The rerun task is assigned to any idle worker node, not to a node in the specified workerGroup.

What you expected to happen?

  1. When re-running a workflow instance, the system should inherit and use the workerGroup parameter specified in the original startup parameters;
  2. The rerun task must be executed only on the nodes in the specified workerGroup, consistent with the first run;
  3. If the specified workerGroup has no idle nodes, the task should wait in the queue instead of being randomly assigned to other worker groups.
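
The expected behavior above can be sketched in a few lines of Java. This is only an illustration, not DolphinScheduler's actual code: the class `RerunDispatchSketch`, its two methods, and the `DEFAULT_GROUP` fallback are hypothetical names introduced here. It assumes the rerun can read the workerGroup stored with the original instance, and that the dispatcher knows which nodes are idle in each group.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the DolphinScheduler code base.
final class RerunDispatchSketch {

    /** Assumed fallback, used only when the original run specified no group. */
    static final String DEFAULT_GROUP = "default";

    /** A rerun inherits the worker group recorded for the original run. */
    static String resolveWorkerGroup(String originalWorkerGroup) {
        return (originalWorkerGroup == null || originalWorkerGroup.isBlank())
                ? DEFAULT_GROUP
                : originalWorkerGroup;
    }

    /**
     * Picks an idle node from the requested group only.
     * An empty result means the task stays queued; it never falls back to another group.
     */
    static Optional<String> pickWorker(String workerGroup,
                                       Map<String, List<String>> idleNodesByGroup) {
        List<String> idle = idleNodesByGroup.getOrDefault(workerGroup, List.of());
        return idle.isEmpty() ? Optional.empty() : Optional.of(idle.get(0));
    }

    public static void main(String[] args) {
        // Group A is busy, group B has idle nodes.
        Map<String, List<String>> idle = Map.of(
                "group A", List.<String>of(),
                "group B", List.of("node3", "node4"));

        String group = resolveWorkerGroup("group A");
        // Prints "<queued>": the task waits for group A instead of running on group B.
        System.out.println(pickWorker(group, idle).orElse("<queued>"));
    }
}
```

The key design choice in this sketch is that `pickWorker` returns an empty `Optional` rather than searching other groups, which corresponds to point 3: wait in the queue when the specified group has no idle node.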

How to reproduce it (as minimally and clearly as possible)?

  1. Prepare a DolphinScheduler cluster with at least two independent worker groups (e.g., group A: node1/node2, group B: node3/node4);
  2. Create a simple test workflow (e.g., a shell task that prints the worker node name);
  3. Submit the workflow instance with startup parameter workerGroup=group A;
  4. Confirm the first run executes on node1/node2 (group A) by checking the task log;
  5. After the instance finishes/fails, click the "Rerun" button to re-execute the instance (without modifying any parameters);
  6. Check the task execution node: the rerun task runs on node3/node4 (group B) instead of group A.
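
The check in steps 4 and 6 can be expressed as a small helper, for example as the core of a regression test. This is a minimal sketch under stated assumptions: `RerunPlacementCheck` and `ranInGroup` are hypothetical names, the group membership is hard-coded from the example cluster in step 1, and the host names are assumed to be read manually from the task logs.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper for checking where a task ran; not part of DolphinScheduler.
final class RerunPlacementCheck {

    /** True when the host that executed the task belongs to the expected worker group. */
    static boolean ranInGroup(String host, String expectedGroup,
                              Map<String, Set<String>> nodesByGroup) {
        return nodesByGroup.getOrDefault(expectedGroup, Set.<String>of()).contains(host);
    }

    public static void main(String[] args) {
        // Cluster layout from step 1 of the reproduction.
        Map<String, Set<String>> nodesByGroup = Map.of(
                "group A", Set.of("node1", "node2"),
                "group B", Set.of("node3", "node4"));

        // First run landed on node1, rerun landed on node3 (the reported behavior).
        System.out.println("first run in group A: " + ranInGroup("node1", "group A", nodesByGroup));
        System.out.println("rerun in group A: " + ranInGroup("node3", "group A", nodesByGroup));
    }
}
```

With the hosts reported in this issue, the first check passes and the second fails, which is the inconsistency being described.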

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
