[Stability][WATCH-001] Plan/search loop may continue after enough information · bytedance/deer-flow#3122


help wanted

Description

Parent stability dashboard: #3107

This issue tracks WATCH-001 from #3107. It is not yet confirmed as a consistent release blocker, but it needs targeted regression testing.

Problem

A normal research prompt can continue issuing search/fetch tool calls after the model has already reasoned that it has enough information to summarize.

Representative prompt:

总结本周体育新闻 ("Summarize this week's sports news")

Observed failure shape:

many web_search and web_fetch calls;
reasoning indicated enough information had been collected;
the model still issued more search/fetch calls;
no final answer was produced before manual interruption;
token usage reached the 200K range.

A later run of the same prompt succeeded with much lower token usage, so this appears intermittent.
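
For targeted regression testing, the failure shape above could be checked mechanically against a recorded run. The sketch below is only an illustration under stated assumptions: the `Step` record, its fields, and both helper functions are hypothetical names, not part of DeerFlow, and the "enough information" flag is assumed to be annotated on the trace by whoever reviews the run. Only the tool names `web_search` and `web_fetch` come from this report.

```python
from dataclasses import dataclass
from typing import Sequence

# Tool names taken from the failure report; everything else is hypothetical.
SEARCH_TOOLS = {"web_search", "web_fetch"}


@dataclass
class Step:
    """One recorded step of a run (hypothetical trace format)."""
    kind: str                     # "reasoning", "tool_call", or "final_answer"
    tool: str = ""                # tool name when kind == "tool_call"
    signals_enough: bool = False  # annotated: reasoning said it had enough info


def extra_calls_after_enough(steps: Sequence[Step]) -> int:
    """Count search/fetch calls issued after the first 'enough information' step."""
    seen_enough = False
    extra = 0
    for step in steps:
        if step.kind == "reasoning" and step.signals_enough:
            seen_enough = True
        elif seen_enough and step.kind == "tool_call" and step.tool in SEARCH_TOOLS:
            extra += 1
    return extra


def looks_like_watch_001(steps: Sequence[Step], max_extra_calls: int = 0) -> bool:
    """True when the run kept searching after 'enough' and never answered."""
    answered = any(s.kind == "final_answer" for s in steps)
    return extra_calls_after_enough(steps) > max_extra_calls and not answered
```

Because the issue is intermittent, a check like this would likely need to run over several recorded attempts of the same prompt rather than a single one.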

Additional evidence

A small local HTML comparison report was generated for this item:

plan-search-loop-token-report.zip

The zip contains the HTML report. It can be downloaded and opened locally to inspect the failing/successful run comparison, the point where the model appeared to have enough information, and the subsequent extra search calls.

Impact

Common research tasks can burn tokens without a final answer.
Users have no clear signal that the agent is looping.

Expected behavior

Once the agent determines it has enough information, it should produce the answer instead of continuing to search.
Tool-call loops should have a convergence or budget guard.
If the agent cannot finish, it should return a clear partial/failure response.
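
One possible shape for the guard described above is sketched here. It is a minimal, hypothetical example rather than DeerFlow's actual agent loop: `LoopGuard`, `run_with_guard`, the default limits, and the `next_action` callback protocol are all assumptions made for illustration. It combines a tool-call budget, a token budget, and a simple convergence check (repeated identical calls), and returns an explicit partial result when any of them trips.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopGuard:
    """Hypothetical guard that stops a tool-call loop on budget or repetition."""
    max_tool_calls: int = 20
    max_tokens: int = 50_000
    max_repeats: int = 2  # identical (tool, query) calls tolerated
    tool_calls: int = 0
    tokens_used: int = 0
    _seen: dict = field(default_factory=dict)

    def record(self, tool: str, query: str, tokens: int) -> None:
        """Account for one completed tool call."""
        self.tool_calls += 1
        self.tokens_used += tokens
        key = (tool, query.strip().lower())
        self._seen[key] = self._seen.get(key, 0) + 1

    def stop_reason(self) -> Optional[str]:
        """Return why the loop must stop, or None if it may continue."""
        if self.tool_calls >= self.max_tool_calls:
            return "tool-call budget exhausted"
        if self.tokens_used >= self.max_tokens:
            return "token budget exhausted"
        if any(n > self.max_repeats for n in self._seen.values()):
            return "repeated identical tool calls (no convergence)"
        return None


def run_with_guard(next_action, guard: LoopGuard) -> dict:
    """Drive the loop until an answer arrives or the guard trips.

    next_action() is an assumed callback returning either
    ("answer", text) or ("tool", name, query, tokens).
    """
    while True:
        reason = guard.stop_reason()
        if reason is not None:
            # Clear partial/failure response instead of looping silently.
            return {"status": "partial", "reason": reason,
                    "tool_calls": guard.tool_calls}
        action = next_action()
        if action[0] == "answer":
            return {"status": "ok", "answer": action[1],
                    "tool_calls": guard.tool_calls}
        _, tool, query, tokens = action
        guard.record(tool, query, tokens)
```

The limits here are placeholders; suitable values would depend on the model and on how many searches a typical research prompt legitimately needs.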
