[Stability][BUG-003] Large write_file failures amplify token usage · bytedance/deer-flow#3114


help wanted

Description

Parent stability dashboard: #3107

This issue tracks BUG-003 from #3107.

Problem

When generating a large HTML artifact, write_file can fail because the model output is truncated or tool arguments become incomplete. The failure path can echo large attempted file contents back into the conversation state, causing subsequent model calls to carry much larger context.

Evidence

Source: gateway log, token usage middleware.

LLM token usage: input=29324 output=8192 total=37516
LLM token usage: input=46564 output=8192 total=54756
LLM token usage: input=63274 output=3903 total=67177
LLM token usage: input=71117 output=2682 total=73799
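To make the growth explicit, the four log lines above can be parsed and differenced. This is only a small sketch: the log format is taken from the lines quoted in this issue, while the names `parse_usage`, `input_growth`, and `cumulative_total` are illustrative and not part of deer-flow.

```python
import re

# Log lines copied verbatim from the evidence above.
LOG = """\
LLM token usage: input=29324 output=8192 total=37516
LLM token usage: input=46564 output=8192 total=54756
LLM token usage: input=63274 output=3903 total=67177
LLM token usage: input=71117 output=2682 total=73799
"""

PATTERN = re.compile(r"input=(\d+) output=(\d+) total=(\d+)")


def parse_usage(log: str) -> list[tuple[int, ...]]:
    """Return one (input, output, total) tuple per matching log line."""
    return [tuple(int(g) for g in m.groups()) for m in PATTERN.finditer(log)]


usage = parse_usage(LOG)
inputs = [u[0] for u in usage]

# Growth of the input context between consecutive model calls.
input_growth = [later - earlier for earlier, later in zip(inputs, inputs[1:])]

# Tokens billed across the four calls (each call re-sends the whole context).
cumulative_total = sum(u[2] for u in usage)

print(input_growth)      # [17240, 16710, 7843]
print(cumulative_total)  # 233248
```

Each call re-sends everything accumulated so far, so the four calls together account for 233,248 tokens even though the largest single call used 73,799.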

Source: checkpoint/state inspection of write_file tool messages.

write_file error payload: ~23.7K chars
write_file error payload: ~24.1K chars
write_file error payload: ~10.6K chars

Another observed shape:

Source: checkpoint/state inspection of AI message usage + following write_file tool result.

write_file output=8192 finish_reason=length
write_file missing required path
tool error echoed ~23K chars of attempted HTML content
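One way to catch this shape early is to check the call before executing the tool. The sketch below is hypothetical: it assumes the caller can see the model's `finish_reason` and the parsed tool arguments, and that write_file requires `path` and `content`. The function name and the argument schema are assumptions, not the deer-flow API.

```python
from typing import Any, Optional

# Assumed required arguments for write_file; adjust to the real tool schema.
REQUIRED_ARGS = ("path", "content")


def check_write_file_call(finish_reason: str, args: dict[str, Any]) -> Optional[str]:
    """Return a short rejection message, or None if the call looks complete."""
    if finish_reason == "length":
        # The model ran out of output tokens, so the arguments are likely cut off.
        return "write_file rejected: model output was truncated (finish_reason=length)"
    missing = [name for name in REQUIRED_ARGS if name not in args]
    if missing:
        return "write_file rejected: missing required argument(s): " + ", ".join(missing)
    return None


# The observed shape: output capped at the limit and no path argument.
print(check_write_file_call("length", {"content": "<html>..."}))
print(check_write_file_call("stop", {"content": "<html>..."}))
print(check_write_file_call("stop", {"path": "report.html", "content": "<html></html>"}))
```

Neither rejection message includes the attempted content, so a failed call adds only one short line to the conversation state.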

Suspected mechanism

1. The model tries to generate a large HTML report as one write_file call.
2. Output hits a limit or tool args become incomplete.
3. write_file fails.
4. The tool error includes a large portion of the attempted content.
5. That large error becomes part of conversation state.
6. The next LLM call has a much larger input context.
7. The agent retries with another writing strategy.

Impact

Token usage can grow from that of a normal large task into a million-token-class run.
Runtime cost becomes hard for users to predict.
Persistence/checkpoint writes also increase.
The final artifact may eventually succeed, but after expensive retries.

Expected behavior

Tool errors should not echo large content arguments back into model context.
Large artifact generation should use a bounded, reliable writing strategy.
If an artifact cannot be written, the error returned to the model should be concise and structured.
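A concise, structured error could look roughly like the following. This is a minimal sketch under stated assumptions: `summarize_args`, `build_tool_error`, the JSON field names, and the 200-character cap are all hypothetical and not taken from the deer-flow codebase.

```python
import json
from typing import Any

# Assumed cap: string arguments longer than this are not echoed back.
MAX_ARG_CHARS = 200


def summarize_args(args: dict[str, Any], limit: int = MAX_ARG_CHARS) -> dict[str, Any]:
    """Replace oversized string arguments with a short size marker."""
    summary: dict[str, Any] = {}
    for key, value in args.items():
        if isinstance(value, str) and len(value) > limit:
            summary[key] = f"<omitted {len(value)} chars>"
        else:
            summary[key] = value
    return summary


def build_tool_error(tool: str, reason: str, args: dict[str, Any]) -> str:
    """Build the error text that is returned to the model."""
    return json.dumps(
        {
            "tool": tool,
            "status": "error",
            "reason": reason,
            "args": summarize_args(args),
        }
    )


# A payload of roughly the size reported in the evidence section.
big_html = "<p>x</p>" * 3000  # 24,000 characters
error = build_tool_error("write_file", "missing required argument: path", {"content": big_html})
print(error)
print(len(error))
```

The model still learns which argument was missing and how large the attempted content was, but the error stays at a few hundred characters regardless of payload size.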

Contributor guide

Tech stack: Python
Domain: backend, performance
Issue type: bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: blocked
Clarity: clear
Prerequisites: Python, LLM token usage, write file tool, error handling
Newbie friendliness: 25
Research direction: Investigate the write_file tool implementation to understand how errors are structured and echoed into conversation state. See the parent issue #3107 for context on the stability dashboard. Use the token usage middleware logs and checkpoint/state inspection to trace the amplification. Consider modifying the tool to truncate large error payloads, or implementing a bounded writing strategy for large artifacts.
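As one possible reading of "bounded writing strategy", the artifact could be written in size-limited chunks, with oversized chunks rejected by a short message. The sketch below is an assumption-laden illustration: `write_chunk`, the `append` flag, and the 8,000-character bound are hypothetical and do not describe the existing write_file tool.

```python
from pathlib import Path

# Assumed per-call bound, chosen to stay well below a typical output limit.
MAX_CHUNK_CHARS = 8_000


def write_chunk(path: Path, chunk: str, append: bool = False) -> str:
    """Write or append one bounded chunk and return a short status line."""
    if len(chunk) > MAX_CHUNK_CHARS:
        # Reject without echoing the chunk itself.
        return (
            f"error: chunk has {len(chunk)} chars, "
            f"limit is {MAX_CHUNK_CHARS}; split it and retry"
        )
    mode = "a" if append else "w"
    with path.open(mode, encoding="utf-8") as handle:
        handle.write(chunk)
    return f"ok: wrote {len(chunk)} chars to {path.name}"


def split_into_chunks(text: str, size: int = MAX_CHUNK_CHARS) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i : i + size] for i in range(0, len(text), size)]
```

A caller would write the first chunk with `append=False` and the rest with `append=True`, so a failure partway through costs at most one bounded chunk of wasted output rather than the whole report.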