bytedance/deer-flow

[Stability][WATCH-001] Plan/search loop may continue after enough information

Closed

#3122 opened on May 21, 2026

batch import
help wanted

Description

Parent stability dashboard: #3107

This issue tracks WATCH-001 from #3107. It is not yet confirmed as a consistent release blocker, but it needs targeted regression testing.

Problem

A normal research prompt can continue issuing search/fetch tool calls after the model has already reasoned that it has enough information to summarize.

Representative prompt:

总结本周体育新闻 ("Summarize this week's sports news")

Observed failure shape:

  • many web_search and web_fetch calls;
  • reasoning indicated enough information had been collected;
  • the model still issued more search/fetch calls;
  • no final answer was produced before manual interruption;
  • token usage reached roughly 200K tokens.

A later run of the same prompt succeeded with much lower token usage, so this appears intermittent.

Additional evidence

A small local HTML comparison report was generated for this item:

plan-search-loop-token-report.zip

The zip contains the HTML report. It can be downloaded and opened locally to inspect the failing/successful run comparison, the point where the model appeared to have enough information, and the subsequent extra search calls.

Impact

  • Common research tasks can burn tokens without a final answer.
  • Users have no clear signal that the agent is looping.

Expected behavior

  • Once the agent determines it has enough information, it should produce the answer instead of continuing to search.
  • Tool-call loops should have a convergence or budget guard.
  • If the agent cannot finish, it should return a clear partial/failure response.
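One possible shape for the convergence/budget guard and the partial response is sketched below. This is only an illustration under stated assumptions: `ToolLoopGuard`, `partial_response`, and all thresholds are hypothetical names and values, not part of DeerFlow's existing code, and the real agent loop would need to call `record` after each `web_search`/`web_fetch` call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToolLoopGuard:
    """Hypothetical guard; names and limits are assumptions for illustration."""

    max_tool_calls: int = 20    # hard cap on tool calls per run
    max_tokens: int = 50_000    # hard cap on tokens spent per run
    max_repeats: int = 2        # allowed repeats of the same (tool, query)
    tool_calls: int = 0
    tokens_used: int = 0
    _seen: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def record(self, tool_name: str, query: str, tokens: int) -> None:
        """Account for one completed tool call."""
        self.tool_calls += 1
        self.tokens_used += tokens
        # Normalize case and whitespace so near-identical queries collide.
        key = (tool_name, " ".join(query.lower().split()))
        self._seen[key] = self._seen.get(key, 0) + 1

    def stop_reason(self) -> Optional[str]:
        """Return why the loop should stop, or None if it may continue."""
        if self.tool_calls >= self.max_tool_calls:
            return "tool-call budget exhausted"
        if self.tokens_used >= self.max_tokens:
            return "token budget exhausted"
        if any(count > self.max_repeats for count in self._seen.values()):
            return "repeated identical tool call"
        return None


def partial_response(reason: str, notes: List[str]) -> str:
    """Build a clear partial/failure message instead of ending silently."""
    body = "\n".join(f"- {note}" for note in notes) or "- (no findings collected)"
    return f"Stopped early: {reason}.\nPartial findings:\n{body}"
```

In a loop, the agent would check `guard.stop_reason()` before each new tool call and, when it is not `None`, return `partial_response(...)` with whatever it has gathered so far. A guard like this does not explain why the model keeps searching after reasoning that it has enough information; it only bounds the cost and gives the user an explicit signal.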

Contributor guide