bytedance/deer-flow
[Stability][BUG-005] Active run token and raw stream observability is insufficient
Closed
#3116 opened on May 21, 2026
help wanted
Description
Parent stability dashboard: #3107
This issue tracks BUG-005 from #3107.
Problem
During a long run, checkpoint message metadata can already contain substantial token usage while the run row/API still shows zero totals.
Evidence
Source: database inspection while the run was still active.
runs.status=running
runs.total_tokens=0
runs.llm_call_count=0
runs.message_count=0
At the same time, checkpoint state already summed to hundreds of thousands of tokens.
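For anyone reproducing the comparison, here is a minimal sketch of how the checkpoint-side total could be computed. It assumes the checkpoint messages have already been loaded as plain dicts and that AI messages carry a LangChain-style `usage_metadata` mapping (`input_tokens`, `output_tokens`, `total_tokens`). The function name `sum_checkpoint_tokens` and the dict shape are illustrative, not part of deer-flow.

```python
from typing import Any, Iterable, Mapping


def sum_checkpoint_tokens(messages: Iterable[Mapping[str, Any]]) -> dict[str, int]:
    """Aggregate usage from checkpoint messages (hypothetical helper).

    Assumes each message is a dict and that LLM responses include a
    `usage_metadata` mapping; messages without it contribute zero tokens.
    """
    totals = {"total_tokens": 0, "llm_call_count": 0, "message_count": 0}
    for message in messages:
        totals["message_count"] += 1
        usage = message.get("usage_metadata") or {}
        if not usage:
            continue
        # Approximation: treat every message with usage data as one LLM call.
        totals["llm_call_count"] += 1
        tokens = usage.get("total_tokens")
        if tokens is None:
            # Fall back to input + output when no total is recorded.
            tokens = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
        totals["total_tokens"] += int(tokens)
    return totals
```

Comparing this result against `runs.total_tokens`, `runs.llm_call_count`, and `runs.message_count` for the same run shows the gap described above.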
The raw stream was also not durably available:
Source: gateway log, run worker stream-mode setup.
'events' stream_mode not supported in gateway (requires astream_events + checkpoint callbacks). Skipping.
Actual stream modes:
['messages', 'custom', 'updates', 'values']
run_events was memory-backed, and no durable run event rows were available after the fact.
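To illustrate what a durable alternative might look like, the sketch below appends each streamed chunk to a SQLite table keyed by run and sequence number. This is not the gateway's actual storage layer: the class name `SqliteRunEventStore`, the column layout, and the choice of SQLite are all assumptions made for the example.

```python
import json
import sqlite3
from typing import Any


class SqliteRunEventStore:
    """Hypothetical durable run-event store; schema is illustrative only."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path for real durability; ":memory:" is only for demos/tests.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS run_events ("
            " run_id TEXT NOT NULL,"
            " seq INTEGER NOT NULL,"
            " stream_mode TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " PRIMARY KEY (run_id, seq))"
        )
        self._conn.commit()

    def append(self, run_id: str, stream_mode: str, payload: Any) -> int:
        """Persist one stream chunk and return its per-run sequence number."""
        row = self._conn.execute(
            "SELECT COALESCE(MAX(seq), 0) + 1 FROM run_events WHERE run_id = ?",
            (run_id,),
        ).fetchone()
        seq = row[0]
        self._conn.execute(
            "INSERT INTO run_events (run_id, seq, stream_mode, payload)"
            " VALUES (?, ?, ?, ?)",
            # default=str keeps non-JSON-native values from raising.
            (run_id, seq, stream_mode, json.dumps(payload, default=str)),
        )
        self._conn.commit()
        return seq

    def read(self, run_id: str) -> list[tuple[int, str, Any]]:
        """Return all persisted events for a run in emission order."""
        rows = self._conn.execute(
            "SELECT seq, stream_mode, payload FROM run_events"
            " WHERE run_id = ? ORDER BY seq",
            (run_id,),
        ).fetchall()
        return [(seq, mode, json.loads(payload)) for seq, mode, payload in rows]
```

A run worker could call `append(run_id, mode, chunk)` for each chunk from the supported stream modes, so the rows remain available after the run finishes. Committing on every append is simple but may be too slow for high-volume streams; batching would be a reasonable refinement.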
Impact
- Operators cannot monitor runaway token cost from normal run records while the run is active.
- After-the-fact debugging depends on checkpoints/logs rather than a durable raw event stream.
- It is difficult to tell whether a long-running task is healthy, stuck, or burning budget.
Expected behavior
- Active runs should expose current token/LLM-call/message counters.
- Raw stream or equivalent trace should be optionally persisted for debugging long tasks.
- Cost visibility should not require manually inspecting checkpoint internals.
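One possible shape for the first expectation is an in-run tracker that accumulates counters after each LLM call and periodically flushes them to the run record. The sketch below is only an illustration under assumptions: `RunUsageTracker`, `RunCounters`, and the `flush` callback are hypothetical names, and the callback stands in for whatever actually updates the `runs` row.

```python
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class RunCounters:
    total_tokens: int = 0
    llm_call_count: int = 0
    message_count: int = 0


class RunUsageTracker:
    """Hypothetical tracker that pushes live counters to a run record."""

    def __init__(self, flush: Callable[[dict], None], flush_every: int = 1) -> None:
        self.counters = RunCounters()
        self._flush = flush              # e.g. a function that updates the runs row
        self._flush_every = max(1, flush_every)
        self._pending = 0                # LLM calls recorded since the last flush

    def record_llm_call(self, total_tokens: int, messages: int = 1) -> None:
        """Record one completed LLM call and flush if the threshold is reached."""
        self.counters.total_tokens += total_tokens
        self.counters.llm_call_count += 1
        self.counters.message_count += messages
        self._pending += 1
        if self._pending >= self._flush_every:
            self.flush()

    def flush(self) -> None:
        """Write a snapshot of the current counters through the callback."""
        self._flush(asdict(self.counters))
        self._pending = 0
```

With `flush_every=1` the run row would reflect usage after every call; a larger value trades freshness for fewer writes. Calling `flush()` once more at run completion would ensure the final totals are recorded.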