[Stability][WATCH-001] Plan/search loop may continue after enough information
#3122 opened on May 21, 2026
Description
Parent stability dashboard: #3107
This issue tracks WATCH-001 from #3107. It is not yet confirmed as a consistent release blocker, but it needs targeted regression testing.
Problem
A normal research prompt can continue issuing search/fetch tool calls after the model has already reasoned that it has enough information to summarize.
Representative prompt:
总结本周体育新闻 (English: "Summarize this week's sports news")
Observed failure shape:
- many web_search and web_fetch calls;
- reasoning indicated enough information had been collected;
- the model still issued more search/fetch calls;
- no final answer was produced before manual interruption;
- token usage reached roughly 200K tokens.
A later run of the same prompt succeeded with much lower token usage, so this appears intermittent.
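Because the failure is intermittent, a regression test could check recorded runs for the shape described above. The following is a minimal sketch, not the project's actual trace format: the `Event` type, the `enough_info` flag, and `check_trace` are all hypothetical names, and how "reasoning indicated enough information" is detected in a real trace is left as an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical trace event. The real trace schema is not described in this issue.
@dataclass
class Event:
    kind: str                  # "reasoning", "tool_call", or "final_answer"
    name: str = ""             # tool name, for tool_call events
    enough_info: bool = False  # assumed flag: reasoning stated it had enough information

SEARCH_TOOLS = {"web_search", "web_fetch"}

def check_trace(events: List[Event]) -> Dict[str, Union[int, bool]]:
    """Report whether a recorded run matches the WATCH-001 failure shape."""
    # Index of the first reasoning step that signalled sufficiency, if any.
    first_enough = next(
        (i for i, e in enumerate(events) if e.kind == "reasoning" and e.enough_info),
        None,
    )
    extra = 0
    if first_enough is not None:
        # Count search/fetch calls issued after that point.
        extra = sum(
            1
            for e in events[first_enough + 1:]
            if e.kind == "tool_call" and e.name in SEARCH_TOOLS
        )
    has_final = any(e.kind == "final_answer" for e in events)
    return {
        "extra_search_calls": extra,
        "has_final_answer": has_final,
        "matches_watch_001": extra > 0 and not has_final,
    }
```

Running the representative prompt several times and asserting that `matches_watch_001` is false for every run would be one way to catch the intermittent case.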
Additional evidence
A small local HTML comparison report was generated for this item:
plan-search-loop-token-report.zip
The zip contains the HTML report. It can be downloaded and opened locally to inspect the failing/successful run comparison, the point where the model appeared to have enough information, and the subsequent extra search calls.
Impact
- Common research tasks can burn tokens without a final answer.
- Users have no clear signal that the agent is looping.
Expected behavior
- Once the agent determines it has enough information, it should produce the answer instead of continuing to search.
- Tool-call loops should have a convergence or budget guard.
- If the agent cannot finish, it should return a clear partial/failure response.
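One possible shape for the convergence or budget guard is sketched below. It is an illustration under stated assumptions, not the agent's existing code: `LoopGuard`, its limits, and the stop-reason strings are hypothetical, and "convergence" is approximated here as consecutive repeated tool calls, which is only one of several reasonable definitions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class LoopGuard:
    # All limits are placeholder values, not tuned recommendations.
    max_tool_calls: int = 20
    max_tokens: int = 100_000
    max_repeats: int = 3  # consecutive repeated calls tolerated before stopping
    tool_calls: int = 0
    tokens_used: int = 0
    stale_calls: int = 0
    seen: Set[str] = field(default_factory=set)

    def record(self, call_key: str, tokens: int) -> None:
        """Record one tool call; call_key identifies the tool plus its arguments."""
        self.tool_calls += 1
        self.tokens_used += tokens
        if call_key in self.seen:
            self.stale_calls += 1   # repeated call: no new information expected
        else:
            self.seen.add(call_key)
            self.stale_calls = 0    # a new call resets the streak

    def stop_reason(self, enough_info: bool = False) -> Optional[str]:
        """Return why the loop should stop, or None if it may continue."""
        if enough_info:
            return "enough_information"
        if self.tokens_used >= self.max_tokens:
            return "token_budget_exhausted"
        if self.tool_calls >= self.max_tool_calls:
            return "tool_call_budget_exhausted"
        if self.stale_calls >= self.max_repeats:
            return "no_new_information"
        return None
```

In a loop, the agent would call `record` after each tool call and check `stop_reason` before issuing the next one. A result of `"enough_information"` would lead to the final answer, while any other non-`None` reason would lead to a clearly labelled partial or failure response.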