TOML delegator filters (just/mise/task) truncate wrapped tool output, hiding test failures
#1065 opened on Apr 7, 2026
Summary
Built-in TOML filters for task-runner delegators — just, mise, task, and the in-flight poe (#1062) — apply max_lines = 50 to the wrapped tool's raw output. For tasks that wrap pytest / cargo test / eslint / pyright / etc., this routinely truncates the summary line and failure tracebacks — exactly the part the LLM needs to act on.
In some cases the filter also performs zero useful stripping on the wrapped output and only contributes the 50-line cap, making it a pure regression vs. running with RTK_NO_TOML=1.
Concrete repro: rtk just test wrapping pytest
Given a `Justfile` like:

```
test:
    pytest -n auto -m 'not integration and not flaky'
```
Real pytest output for a project with a few hundred tests and one failure:
```
============================= test session starts ==============================
platform darwin -- Python 3.13.x, pytest-8.x.x, pluggy-1.x.x
rootdir: /Users/me/proj
configfile: pyproject.toml
plugins: xdist-3.x, asyncio-0.21, mock-3.14, ...
8 workers [342 items]
........................................ [ 11%]
........................................ [ 23%]
........................................ [ 35%]
... (35+ more lines of progress dots) ...
=================================== FAILURES ===================================
___________________________ test_pacing_calculation ____________________________
[20-line traceback]
========================== short test summary info =============================
FAILED tests/budget_pacer/test_pacer.py::test_pacing_calculation - AssertionError
======================= 1 failed, 341 passed in 14.52s =========================
```
After `rtk just test` is processed by `src/filters/just.toml`:
- `strip_ansi`: ✓
- `strip_lines_matching`: matches nothing in pytest output (the patterns only match `just --list`'s own help banner)
- `truncate_lines_at = 150`: harmless
- `max_lines = 50`: keeps the header (~6 lines) plus ~44 lines of progress dots, then cuts off
- The LLM never sees `FAILURES`, the traceback, the `FAILED` line, or the `1 failed, 341 passed` summary
The result is worse than no filter at all: pytest looks like it hung mid-run instead of failing. The agent has no signal to act on.
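To make the failure mode concrete, here is a minimal Rust model of the four stages listed above, applied in order. It is a sketch, not RTK's implementation: the `FilterSpec` struct is hypothetical and only mirrors the TOML keys, substring matching stands in for the real regex patterns, and the ANSI stripping handles only simple CSI sequences.

```rust
/// Hypothetical mirror of the TOML keys in src/filters/just.toml.
/// Substring patterns stand in for the real regexes.
struct FilterSpec {
    strip_ansi: bool,
    strip_lines_matching: Vec<&'static str>,
    truncate_lines_at: usize,
    max_lines: usize,
}

/// Remove simple CSI escape sequences such as "\x1b[31m".
fn strip_ansi_codes(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\x1b' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter/intermediate bytes up to and including the final byte.
            while let Some(&n) = chars.peek() {
                chars.next();
                if ('\x40'..='\x7e').contains(&n) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Apply the stages in order. Note that `max_lines` keeps the FIRST lines,
/// which is exactly where pytest puts its least useful output.
fn apply(spec: &FilterSpec, raw: &str) -> Vec<String> {
    raw.lines()
        .map(|l| if spec.strip_ansi { strip_ansi_codes(l) } else { l.to_string() })
        .filter(|l| !spec.strip_lines_matching.iter().any(|p| l.contains(*p)))
        .map(|l| l.chars().take(spec.truncate_lines_at).collect::<String>())
        .take(spec.max_lines)
        .collect()
}
```

Fed the pytest output above with `max_lines: 50`, this returns the header and progress dots only; the `FAILED` and summary lines sit past the cap and are dropped.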
Why this happens
When `rtk just test` falls into `run_fallback` (`src/main.rs:1054`), RTK selects the first TOML filter whose `match_command` regex matches, which here is `^just\b`. The TOML pipeline (`src/core/toml_filter.rs`) then applies generic line filtering with no awareness that `just test` actually executed pytest underneath. There is no routing from a delegator's output back to the wrapped tool's dedicated filter.
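The selection step can be modeled as a first-match scan. In this sketch the `FILTERS` table and the `matches_word_prefix` helper are hypothetical; `matches_word_prefix("just", cmd)` approximates the regex `^just\b` without pulling in the `regex` crate. What it illustrates: selection looks only at the command line the agent typed, so nothing about `pytest` ever reaches the chosen pipeline.

```rust
/// Approximates `^<word>\b`: the command starts with `word`, and the next
/// character (if any) is not a word character.
fn matches_word_prefix(word: &str, cmd: &str) -> bool {
    match cmd.strip_prefix(word) {
        Some(rest) => rest
            .chars()
            .next()
            .map_or(true, |c| !(c.is_alphanumeric() || c == '_')),
        None => false,
    }
}

/// Hypothetical filter table: (filter name, leading word of match_command).
const FILTERS: &[(&str, &str)] = &[("mise", "mise"), ("just", "just"), ("task", "task")];

/// The first filter whose pattern matches wins; later entries are never consulted.
fn find_first_match(cmd: &str) -> Option<&'static str> {
    FILTERS
        .iter()
        .find(|(_, word)| matches_word_prefix(word, cmd))
        .map(|(name, _)| *name)
}
```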
Affected filters
| File | `max_lines` | Stripping useful for wrapped output? |
|---|---|---|
| `src/filters/just.toml` | 50 | No: patterns only match `just --list` |
| `src/filters/mise.toml` | 50 | Partial: strips `mise install` noise, useless for `mise run <task>` |
| `src/filters/task.toml` | 50 | Partial: strips `task: [name] cmd` headers |
| `src/filters/poe.toml` (#1062) | 50 | Partial: strips `Poe =>` headers |
All four follow the same delegator pattern and share the same architectural problem. poe is on track to inherit it via #1062.
Proposed direction (sketch, not prescriptive)
A delegator filter type that, after stripping the wrapper's own preamble, routes the remaining stdout through whatever filter would have applied to the wrapped command. Concretely for rtk just test running pytest:
- TOML filter matches `^just\b`
- Strip `just`-specific preamble (currently a no-op)
- Detect or declare the wrapped tool (`pytest`)
- Re-apply RTK's pytest filter (Rust module, `src/cmds/python/`) to the remaining output
- Return final filtered output
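A minimal Rust sketch of such a delegator filter type, under stated assumptions: the `OutputFilter` trait, the `DelegatorFilter` struct, and the `PytestSummary` stand-in are all hypothetical and do not correspond to existing RTK types. `PytestSummary` is a toy that keeps only failure and summary lines; the real pytest filter lives in `src/cmds/python/`.

```rust
/// Hypothetical common interface for anything that compresses output.
trait OutputFilter {
    fn filter(&self, output: &str) -> String;
}

/// Toy stand-in for a dedicated wrapped-tool filter: keeps FAILED lines and
/// the final summary, drops progress dots.
struct PytestSummary;

impl OutputFilter for PytestSummary {
    fn filter(&self, output: &str) -> String {
        output
            .lines()
            .filter(|l| l.starts_with("FAILED") || l.contains(" passed") || l.contains(" failed"))
            .collect::<Vec<_>>()
            .join("\n")
    }
}

/// Strips the delegator's own preamble lines, then hands the remaining
/// output to the wrapped tool's filter instead of capping it.
struct DelegatorFilter {
    preamble_prefixes: Vec<String>,
    downstream: Box<dyn OutputFilter>,
}

impl OutputFilter for DelegatorFilter {
    fn filter(&self, output: &str) -> String {
        let body: Vec<&str> = output
            .lines()
            .filter(|l| !self.preamble_prefixes.iter().any(|p| l.starts_with(p.as_str())))
            .collect();
        self.downstream.filter(&body.join("\n"))
    }
}
```

Because `DelegatorFilter` itself implements `OutputFilter`, the existing TOML pipeline could in principle be plugged in as `downstream` when no dedicated filter is known, which keeps today's behavior as the default.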
Recommended primary approach: parse the project file
The cleanest path is parsing the delegator's own project file to resolve task → wrapped command before the task runs. Each delegator has exactly one well-known config file format:
| Delegator | Project file | Where the wrapped command lives |
|---|---|---|
| `just` | `Justfile` (or `justfile`, `.justfile`) | recipe body lines |
| `mise` | `.mise.toml` / `mise.toml` / `.config/mise.toml` | `[tasks.<name>] run = "..."` |
| `task` | `Taskfile.yml` / `Taskfile.yaml` | `tasks.<name>.cmds` |
| `poe` (#1062) | `pyproject.toml` | `[tool.poe.tasks] <name> = "..."` or `{cmd = "..."}` |
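As an illustration of how small the `just` case can be, here is a hedged sketch of resolving a recipe name to its body lines. It assumes a simple Justfile: recipe headers start in column 0, body lines are indented, `name := value` lines are assignments, and a leading `@` (just's echo-suppression prefix) is dropped. It deliberately ignores attributes, shebang recipes, line continuations, `import`/`mod`, and headers whose parameter defaults contain `:`; finding `justfile`/`.justfile` is left to the caller. `resolve_just_recipe` is a hypothetical helper, and returning `None` is the signal to fall back to today's behavior.

```rust
/// Hypothetical helper: return the body lines of `recipe` from Justfile source.
/// Returns None if the recipe is not found (caller falls back to today's behavior).
fn resolve_just_recipe(source: &str, recipe: &str) -> Option<Vec<String>> {
    let mut lines = source.lines();
    // Find the header line, e.g. "test:" or "test arg: build".
    loop {
        let line = lines.next()?;
        if line.starts_with(char::is_whitespace) || line.starts_with('#') {
            continue;
        }
        let Some((head, after)) = line.split_once(':') else { continue };
        if after.starts_with('=') {
            continue; // `name := value` assignment, not a recipe
        }
        if head.split_whitespace().next() == Some(recipe) {
            break;
        }
    }
    // Collect indented body lines until the next unindented, non-blank line.
    let mut body = Vec::new();
    for line in lines {
        if line.trim().is_empty() {
            continue;
        }
        if !line.starts_with(char::is_whitespace) {
            break;
        }
        let cmd = line.trim().trim_start_matches('@');
        body.push(cmd.to_string());
    }
    Some(body)
}
```

The `mise`, `task`, and `poe` equivalents would need TOML or YAML parsing rather than line scanning, but follow the same shape: look up the task by name, return its command strings, or return nothing and fall back.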
Why this is the right primary path:
- Unambiguous and authoritative. The project file is the source of truth for what a task does. No guessing, no parsing tool stdout, no relying on the wrapper to echo the command.
- Resolved before execution. RTK can decide which downstream filter to apply before spawning the child, which means it can pick the right `Stdio` strategy (streamed vs. buffered) based on the wrapped tool. This also fixes the streaming-server problem (`uvicorn --reload` etc.) for free: if the resolved command is a streaming server, skip TOML buffering.
- No new TOML schema. No need to add `wraps = ...` to every filter or every task. Filters stay declarative and small.
- Resilient to missing project files. If parsing fails or the file isn't found, fall back to the current line-stripping behavior, so there is strictly no regression.
- Works for chained commands. `just lint-fix` → `ruff check --fix && ruff format` resolves to two commands; RTK picks the dominant filter (or applies them sequentially) instead of giving up.
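The chained-command and streaming points can be sketched together. Every name here is hypothetical, and the sketch makes three loud assumptions: commands are split naively on `&&` (no quoting, `||`, or `;` handling); "dominant" is taken to mean the first segment with a dedicated filter; and `dedicated_filter` and `is_streaming_server` are illustrative lookups, not RTK's actual filter list.

```rust
/// How the child's output should be captured.
#[derive(Debug, PartialEq)]
enum OutputMode {
    Buffered, // collect everything, then filter
    Streamed, // pass through as it arrives (long-running servers)
}

/// Illustrative mapping from program name to a dedicated filter.
fn dedicated_filter(program: &str) -> Option<&'static str> {
    match program {
        "pytest" => Some("pytest"),
        "cargo" => Some("cargo"),
        "ruff" => Some("ruff"),
        _ => None,
    }
}

/// Placeholder heuristic for servers that never exit on their own.
fn is_streaming_server(segment: &str) -> bool {
    segment.starts_with("uvicorn") && segment.contains("--reload")
}

/// Split a resolved recipe line on `&&`, pick the first segment with a
/// dedicated filter, and choose an output mode for the whole command.
fn plan(resolved: &str) -> (Option<&'static str>, OutputMode) {
    let segments: Vec<&str> = resolved.split("&&").map(str::trim).collect();
    let mode = if segments.iter().any(|s| is_streaming_server(s)) {
        OutputMode::Streamed
    } else {
        OutputMode::Buffered
    };
    let filter = segments
        .iter()
        .filter_map(|s| s.split_whitespace().next())
        .find_map(dedicated_filter);
    (filter, mode)
}
```

The mode is decided before spawning, so it can be translated into the corresponding `std::process::Stdio` setup for the child; that translation is omitted here.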
Sketch of the flow for rtk just test:
1. run_fallback sees `just test`
2. Match TOML filter `^just\b` (existing behavior)
3. NEW: parse ./Justfile, find recipe `test`, extract command line `pytest -n auto -m '...'`
4. NEW: classify the resolved command via the existing `find_matching_filter` / Clap dispatch
5. NEW: if a Rust filter matches (`pytest` → src/cmds/python/), spawn `just test` and pipe output through that filter instead of the generic `just` TOML pipeline
6. NEW: if no specific filter matches, fall back to the current `just` TOML pipeline (today's behavior)
The new logic is additive: existing filter behavior is the fallback, so it's a strict improvement.
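Steps 3 to 6 above reduce to an `Option` chain in which every failure falls through to today's pipeline. The `Route` enum and the `resolve`/`classify` parameters are hypothetical stand-ins for project-file parsing and the existing `find_matching_filter` / Clap dispatch; they are passed in as closures so the sketch stays self-contained.

```rust
/// Where the delegator's output should go.
#[derive(Debug, PartialEq)]
enum Route {
    /// A dedicated Rust filter for the wrapped tool (e.g. pytest).
    Dedicated(&'static str),
    /// Today's behavior: the delegator's own TOML pipeline.
    TomlFallback,
}

/// Each step may fail (no project file, unknown task, no specific filter);
/// any failure yields TomlFallback, so the result is never worse than today.
fn route<R, C>(task: &str, resolve: R, classify: C) -> Route
where
    R: Fn(&str) -> Option<String>,
    C: Fn(&str) -> Option<&'static str>,
{
    resolve(task)
        .and_then(|cmd| classify(&cmd))
        .map(Route::Dedicated)
        .unwrap_or(Route::TomlFallback)
}
```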
Alternative detection paths (for reference, not recommended as primary)
- Parse the delegator's own stdout. Some delegators echo the command they're about to run (`task: [build] go build ./...`, `Poe => pytest ...`). This works after the fact but can't inform `Stdio` choices, and echo behavior varies (`just` prints recipe lines to stderr rather than stdout, and not at all for `@`-prefixed lines).
- Declarative TOML config. Add `wraps = "pytest"` per filter. This requires authors to manually map every task and doesn't scale across projects with custom recipes.
- Output-format heuristics. Sniff for `===== test session starts =====` etc. Fragile and order-dependent.
These can supplement the primary path (e.g. as fallbacks for unparseable project files), but should not be the main mechanism.
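If echo parsing is kept as a supplementary fallback, it can be as simple as recognizing the two formats quoted above. `echoed_command` is a hypothetical helper that recovers the command from `task: [name] cmd` and `Poe => cmd` lines; as noted, it only works after the child has already started, so it cannot influence the `Stdio` choice.

```rust
/// Hypothetical fallback: recover the wrapped command from a delegator's echo line.
fn echoed_command(line: &str) -> Option<&str> {
    if let Some(rest) = line.strip_prefix("task: [") {
        // "task: [build] go build ./..." -> "go build ./..."
        let (_, cmd) = rest.split_once("] ")?;
        return Some(cmd.trim());
    }
    // "Poe => pytest -q" -> "pytest -q"
    line.strip_prefix("Poe => ").map(str::trim)
}
```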
Other open questions
- Where does routing live? A second pass through `find_matching_filter` after parsing the project file would suffice, but it breaks the current "one filter per command" mental model in `src/core/toml_filter.rs`. It probably wants its own dispatch helper.
- Caching. Parsing the project file on every invocation adds startup cost. A cheap mtime-based cache (parse once, invalidate when the file changes) should keep RTK under its <10ms startup target.
- Trust boundary. `Justfile` / `Taskfile` / `.mise.toml` / `pyproject.toml` are checked into the repo and are not under RTK's existing `.rtk/filters.toml` trust gate. Reading them to decide which RTK filter to apply is safe (no execution, no `replace`/`match_output` rules), but worth noting in `SECURITY.md` so it's intentional.
- Interaction with the rewrite hook. `just`/`mise`/`task`/`poe` aren't in `RULES` (`src/discover/rules.rs`), so they're never auto-rewritten; they're only invoked when an agent explicitly types `rtk just test`. The fix should not change rewrite behavior.
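The mtime-based cache could look roughly like this. It is a sketch under stated assumptions, not a proposal for RTK's actual storage layout: `cached_parse` is hypothetical, the cache is a single file whose first line holds the project file's mtime (nanoseconds since the Unix epoch, from `std::fs::Metadata::modified()`) and whose remainder holds the parsed result, and the parser is passed in as a closure. Persisting to disk matters because each `rtk` invocation is a fresh process.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical disk cache keyed on the project file's mtime.
/// A missing cache file or a mismatched mtime triggers a re-parse.
fn cached_parse<F>(project: &Path, cache: &Path, parse: F) -> io::Result<String>
where
    F: Fn(&str) -> String,
{
    let mtime = fs::metadata(project)?
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?
        .as_nanos()
        .to_string();

    // Cache hit: the stored mtime matches the file's current mtime.
    if let Ok(contents) = fs::read_to_string(cache) {
        if let Some((stamp, parsed)) = contents.split_once('\n') {
            if stamp == mtime {
                return Ok(parsed.to_string());
            }
        }
    }

    // Cache miss: parse the project file and rewrite the cache.
    let parsed = parse(&fs::read_to_string(project)?);
    fs::write(cache, format!("{mtime}\n{parsed}"))?;
    Ok(parsed)
}
```

On a hit this costs one `stat` and one small read, which is the property the startup budget needs; mtime granularity on some filesystems means very fast successive edits could be missed, so a content hash is a possible (slower) alternative.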
Interim workarounds (please document if delegator routing is out of scope)
- Don't go through the delegator. Call the wrapped tool directly: `rtk pytest -n auto -m '...'` instead of `rtk just test`. This is the right answer today but isn't documented anywhere, and agents will reach for the task-runner alias.
- Project-local override in `.rtk/filters.toml` with a much higher `max_lines` (e.g. 500). This trades meaningful compression for output that is at least not actively harmful, and requires a per-user `rtk trust`.
- Bypass. `RTK_NO_TOML=1 rtk just test` or `rtk proxy just test`.
Why this matters
The whole point of RTK is to give agents useful compressed output. Truncating before the FAILED summary line silently inverts that goal — the agent sees a passing-looking truncated run and moves on, when in reality the build is broken. This is the worst failure mode for an LLM proxy.
Happy to take a stab at the fix if there's agreement on direction (declarative wraps = ... config seems lowest-risk to me). Flagging the architectural question first since it affects four filters and would shape how future delegator filters are written.
cc related: #1062 (poe filter) — would inherit the same fix for free.