bytedance/deer-flow

[Stability][BUG-003] Large write_file failures amplify token usage

Open

#3114 opened on May 21, 2026

help wanted

Description

Parent stability dashboard: #3107

This issue tracks BUG-003 from #3107.

Problem

When the agent generates a large HTML artifact, write_file can fail because the model output is truncated or the tool arguments arrive incomplete. The failure path can then echo large portions of the attempted file content back into conversation state, so every subsequent model call carries a much larger input context.

Evidence

Source: gateway log, token usage middleware.

LLM token usage: input=29324 output=8192 total=37516
LLM token usage: input=46564 output=8192 total=54756
LLM token usage: input=63274 output=3903 total=67177
LLM token usage: input=71117 output=2682 total=73799
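
The per-call growth can be read directly off these four lines. A minimal sketch, assuming the log format is exactly as shown above; `parse_usage` and `input_deltas` are hypothetical helper names for illustration, not part of deer-flow.

```python
import re

# The four gateway log lines quoted above, verbatim.
LOG = """\
LLM token usage: input=29324 output=8192 total=37516
LLM token usage: input=46564 output=8192 total=54756
LLM token usage: input=63274 output=3903 total=67177
LLM token usage: input=71117 output=2682 total=73799
"""

USAGE_RE = re.compile(r"input=(\d+) output=(\d+) total=(\d+)")


def parse_usage(log):
    """Return a list of (input, output, total) integer tuples, one per log line."""
    return [tuple(int(g) for g in m.groups()) for m in USAGE_RE.finditer(log)]


def input_deltas(rows):
    """Return the growth in input tokens between consecutive calls."""
    inputs = [row[0] for row in rows]
    return [later - earlier for earlier, later in zip(inputs, inputs[1:])]


rows = parse_usage(LOG)
print(input_deltas(rows))               # [17240, 16710, 7843]
print(sum(total for _, _, total in rows))  # 233248
```

Input context grows by 41,793 tokens across the four calls, and the calls together account for 233,248 tokens. Each delta includes the previous assistant output as well as the tool result, so the deltas alone do not isolate the size of the echoed error.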

Source: checkpoint/state inspection of write_file tool messages.

write_file error payload: ~23.7K chars
write_file error payload: ~24.1K chars
write_file error payload: ~10.6K chars

Another observed shape:

Source: checkpoint/state inspection of AI message usage + following write_file tool result.

write_file output=8192 finish_reason=length
write_file missing required path
tool error echoed ~23K chars of attempted HTML content

Suspected mechanism

  1. The model tries to generate a large HTML report as one write_file call.
  2. Output hits a limit or tool args become incomplete.
  3. write_file fails.
  4. The tool error includes a large portion of the attempted content.
  5. That large error becomes part of conversation state.
  6. The next LLM call has a much larger input context.
  7. The agent retries with another writing strategy.
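
The amplification in steps 4 to 7 can be illustrated with a toy model. This is only a sketch of the suspected mechanism, not a measurement: `simulate_inputs` is a hypothetical helper, the starting context and output size are taken from the first log line, and the two error sizes (9000 and 50 tokens) are assumed values chosen for illustration.

```python
def simulate_inputs(base_input, output_tokens, error_tokens, calls):
    """Return the input size of each call, assuming every failed attempt
    appends its own output plus its tool error to the conversation state."""
    inputs = []
    context = base_input
    for _ in range(calls):
        inputs.append(context)
        context += output_tokens + error_tokens
    return inputs


# Assumption: the echoed error costs about 9000 tokens per failed attempt.
echoed = simulate_inputs(29324, 8192, 9000, calls=4)
# Assumption: a concise structured error costs about 50 tokens.
bounded = simulate_inputs(29324, 8192, 50, calls=4)

print(echoed, sum(echoed))
print(bounded, sum(bounded))
```

Because every call resends the whole context, billed input is the sum over all calls. A fixed increment per retry therefore makes cumulative input grow roughly quadratically with the number of retries, which is why shrinking the per-failure payload matters more as retries accumulate.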

Impact

  • Token usage can grow from that of a normal large task to a million-token-class run.
  • Runtime cost becomes hard for users to predict.
  • Persistence and checkpoint writes grow as well, because the oversized tool messages are stored in state.
  • The final artifact may eventually succeed, but after expensive retries.

Expected behavior

  • Tool errors should not echo large content arguments back into model context.
  • Large artifact generation should use a bounded, reliable writing strategy.
  • If an artifact cannot be written, the error returned to the model should be concise and structured.
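
One possible shape for such an error is sketched below. It assumes the tool layer can intercept the failure before the message is added to state; `summarize_tool_error`, the field names, and both character caps are hypothetical and do not reflect deer-flow's actual API.

```python
import json

MAX_PREVIEW_CHARS = 80   # assumed cap on how much of any argument is shown
MAX_ERROR_CHARS = 200    # assumed cap on the error text itself


def summarize_tool_error(tool_name, args, error):
    """Build a bounded, structured error message for the model.

    Long arguments are replaced by their length plus a short preview, and the
    error text is capped too, since validation errors may quote their input.
    """
    safe_args = {}
    for key, value in args.items():
        text = value if isinstance(value, str) else json.dumps(value, default=str)
        if len(text) > MAX_PREVIEW_CHARS:
            safe_args[key] = {
                "length_chars": len(text),
                "preview": text[:MAX_PREVIEW_CHARS],
            }
        else:
            safe_args[key] = value
    return json.dumps(
        {
            "tool": tool_name,
            "status": "error",
            "error": str(error)[:MAX_ERROR_CHARS],
            "args": safe_args,
        },
        default=str,
    )


# Usage: a ~24K-character content argument with no path, as in the evidence above.
big_html = "<html>" + "x" * 24000
message = summarize_tool_error(
    "write_file", {"content": big_html}, "missing required field: path"
)
print(len(message))
```

The message stays a few hundred characters long regardless of the size of the attempted content, while still telling the model which argument was oversized and which field was missing.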

Contributor guide