TOML delegator filters (just/mise/task) truncate wrapped tool output, hiding test failures · rtk-ai/rtk#1065


Labels: area:cli, bug, effort-large, filter-quality, help wanted, priority:high

Repository metrics

Stars: (48,085 stars)
PR merge metrics: (Avg merge 11d 1h) (45 merged PRs in 30d)

Description

Summary

Built-in TOML filters for task-runner delegators — just, mise, task, and the in-flight poe (#1062) — apply max_lines = 50 to the wrapped tool's raw output. For tasks that wrap pytest / cargo test / eslint / pyright / etc., this routinely truncates the summary line and failure tracebacks — exactly the part the LLM needs to act on.

In some cases the filter also performs zero useful stripping on the wrapped output and only contributes the 50-line cap, making it a pure regression vs. running with RTK_NO_TOML=1.

Concrete repro: `rtk just test` wrapping pytest

Given a Justfile like:

test:
    pytest -n auto -m 'not integration and not flaky'

Real pytest output for a project with a few hundred tests and one failure:

============================= test session starts ==============================
platform darwin -- Python 3.13.x, pytest-8.x.x, pluggy-1.x.x
rootdir: /Users/me/proj
configfile: pyproject.toml
plugins: xdist-3.x, asyncio-0.21, mock-3.14, ...
8 workers [342 items]
........................................                                 [ 11%]
........................................                                 [ 23%]
........................................                                 [ 35%]
... (35+ more lines of progress dots) ...
=================================== FAILURES ===================================
___________________________ test_pacing_calculation ____________________________
[20-line traceback]
========================== short test summary info =============================
FAILED tests/budget_pacer/test_pacer.py::test_pacing_calculation - AssertionError
======================= 1 failed, 341 passed in 14.52s =========================

After rtk just test is processed by src/filters/just.toml:

strip_ansi ✓
strip_lines_matching — matches nothing in pytest output (the patterns only match just --list's own help banner)
truncate_lines_at = 150 — harmless
max_lines = 50 — keeps the header (~6 lines) + ~44 lines of progress dots, then cuts off
The LLM never sees FAILURES, the traceback, the FAILED line, or the 1 failed, 341 passed summary

The result is worse than no filter at all: pytest looks like it hung mid-run instead of failing. The agent has no signal to act on.
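The truncation can be reproduced in isolation. The sketch below is not RTK's implementation: `cap_lines` and `fake_pytest_output` are hypothetical helpers, and the cap is assumed to keep the head of the output, as described above.

```rust
/// Keep only the first `max_lines` lines (assumed head-style cap, hypothetical helper).
fn cap_lines(output: &str, max_lines: usize) -> String {
    output.lines().take(max_lines).collect::<Vec<_>>().join("\n")
}

/// Build a pytest-shaped transcript: 6 header lines, `progress` lines of dots,
/// then the failure section and the summary line.
fn fake_pytest_output(progress: usize) -> String {
    let mut lines: Vec<String> = vec![
        "===== test session starts =====".to_string(),
        "platform darwin -- Python 3.13.x".to_string(),
        "rootdir: /Users/me/proj".to_string(),
        "configfile: pyproject.toml".to_string(),
        "plugins: xdist-3.x".to_string(),
        "8 workers [342 items]".to_string(),
    ];
    // Progress dots fill the rest of the 50-line budget.
    lines.extend((0..progress).map(|_| ".".repeat(40)));
    lines.push("===== FAILURES =====".to_string());
    lines.push(
        "FAILED tests/budget_pacer/test_pacer.py::test_pacing_calculation - AssertionError"
            .to_string(),
    );
    lines.push("===== 1 failed, 341 passed in 14.52s =====".to_string());
    lines.join("\n")
}
```

With 44 progress lines, `cap_lines(&fake_pytest_output(44), 50)` ends on a line of dots, and everything from FAILURES onward is gone.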

Why this happens

When rtk just test falls into run_fallback (src/main.rs:1054), the first TOML filter whose match_command regex hits is selected — ^just\b. The TOML pipeline (src/core/toml_filter.rs) then applies generic line filtering with no awareness that just test actually executed pytest underneath. There is no routing from a delegator's output back to the wrapped tool's dedicated filter.
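A rough model of that selection step, assuming first-match-wins over an ordered list. `TomlFilter`, `command_matches`, and `select_filter` are illustrative names, not RTK's types, and the `^<program>\b` regex is emulated with a prefix check to keep the sketch dependency-free.

```rust
/// Illustrative stand-in for a loaded TOML filter (not RTK's actual struct).
struct TomlFilter {
    name: &'static str,
    /// Models a `match_command` regex of the form `^<program>\b`.
    program: &'static str,
    max_lines: usize,
}

fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Emulates `^<program>\b`: the command starts with the program name and the
/// next character (if any) is not a word character.
fn command_matches(filter: &TomlFilter, command: &str) -> bool {
    match command.strip_prefix(filter.program) {
        Some(rest) => !rest.chars().next().map_or(false, is_word_char),
        None => false,
    }
}

/// First match wins. Only the outer command line is inspected, so the wrapped
/// tool (pytest, eslint, ...) never influences the choice.
fn select_filter<'a>(filters: &'a [TomlFilter], command: &str) -> Option<&'a TomlFilter> {
    filters.iter().find(|f| command_matches(f, command))
}
```

Under this model, `just test` selects the `just` filter and its 50-line cap. Nothing in the input to `select_filter` mentions pytest, so no pytest-aware filter can be chosen.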

Affected filters

| File | `max_lines` | Stripping useful for wrapped output? |
| --- | --- | --- |
| `src/filters/just.toml` | 50 | No: patterns only match `just --list` |
| `src/filters/mise.toml` | 50 | Partial: strips `mise install` noise, useless for `mise run <task>` |
| `src/filters/task.toml` | 50 | Partial: strips `task: [name] cmd` headers |
| `src/filters/poe.toml` (#1062) | 50 | Partial: strips `Poe =>` headers |

All four follow the same delegator pattern and share the same architectural problem. poe is on track to inherit it via #1062.

Proposed direction (sketch, not prescriptive)

A delegator filter type that, after stripping the wrapper's own preamble, routes the remaining stdout through whatever filter would have applied to the wrapped command. Concretely for rtk just test running pytest:

TOML filter matches ^just\b
Strip just-specific preamble (currently a no-op)
Detect or declare the wrapped tool (pytest)
Re-apply RTK's pytest filter (Rust module, src/cmds/python/) to the remaining output
Return final filtered output
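The five steps above could be sketched as follows. Everything here is hypothetical: `OutputFilter`, `pytest_like_filter`, and `run_delegator_pipeline` are toy stand-ins, and the real pytest filter in src/cmds/python/ will have a different interface.

```rust
/// Hypothetical filter signature: raw text in, compressed text out.
type OutputFilter = fn(&str) -> String;

/// Toy stand-in for a pytest-aware filter: keep failure and summary lines,
/// drop progress dots. Not RTK's real pytest filter.
fn pytest_like_filter(output: &str) -> String {
    output
        .lines()
        .filter(|l| l.starts_with("FAILED") || l.contains(" failed") || l.contains(" passed"))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Generic fallback: a 50-line head cap, mirroring today's behavior.
fn generic_cap(output: &str) -> String {
    output.lines().take(50).collect::<Vec<_>>().join("\n")
}

/// Map a wrapped program name to its dedicated filter, if one exists.
fn filter_for(program: &str) -> Option<OutputFilter> {
    match program {
        "pytest" => Some(pytest_like_filter as OutputFilter),
        _ => None,
    }
}

/// Strip the delegator's own preamble, then route the remaining output
/// through the wrapped tool's filter, falling back to the generic cap.
fn run_delegator_pipeline(
    raw: &str,
    preamble_prefixes: &[&str],
    wrapped_program: Option<&str>,
) -> String {
    let body = raw
        .lines()
        .filter(|l| !preamble_prefixes.iter().any(|p| l.starts_with(p)))
        .collect::<Vec<_>>()
        .join("\n");
    match wrapped_program.and_then(filter_for) {
        Some(filter) => filter(&body),
        None => generic_cap(&body),
    }
}
```

How `wrapped_program` gets populated (step 3) is the open design question. The sketch takes it as an input so that the routing logic stays independent of the detection mechanism.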

Recommended primary approach: parse the project file

The cleanest path is parsing the delegator's own project file to resolve task → wrapped command before the task runs. Each delegator has exactly one well-known config file format:

| Delegator | Project file | Where the wrapped command lives |
| --- | --- | --- |
| `just` | `Justfile` (or `justfile`, `.justfile`) | recipe body lines |
| `mise` | `.mise.toml` / `mise.toml` / `.config/mise.toml` | `[tasks.<name>] run = "..."` |
| `task` | `Taskfile.yml` / `Taskfile.yaml` | `tasks.<name>.cmds` |
| `poe` (#1062) | `pyproject.toml` | `[tool.poe.tasks] <name> = "..."` or `{cmd = "..."}` |

Why this is the right primary path:

Unambiguous and authoritative. The project file is the source of truth for what a task does. No guessing, no parsing tool stdout, no relying on the wrapper to echo the command.
Resolved before execution. RTK can decide which downstream filter to apply before spawning the child, which means it can pick the right Stdio strategy (streamed vs. buffered) based on the wrapped tool. This also fixes the streaming-server problem (uvicorn --reload etc.) for free — if the resolved command is a streaming server, skip TOML buffering.
No new TOML schema. No need to add wraps = ... to every filter or every task. Filters stay declarative and small.
Resilient to missing project files. If parsing fails or the file isn't found, fall back to the current line-stripping behavior — strictly no regression.
Works for chained commands. just lint-fix → ruff check --fix && ruff format resolves to two commands; RTK picks the dominant filter (or applies sequentially) instead of giving up.
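Two of the points above (the pre-execution Stdio decision and chained commands) can be illustrated with a small sketch. The names `OutputStrategy`, `split_chain`, and `choose_strategy` are made up for this example, the streaming list contains only the `uvicorn` case mentioned above, and the naive `&&` split ignores quoting and other shell operators.

```rust
#[derive(Debug, PartialEq)]
enum OutputStrategy {
    /// Pass output through as it arrives (long-running servers).
    Streamed,
    /// Collect output, then filter it as a whole.
    Buffered,
}

/// Assumption for this sketch: programs whose output should never be buffered.
const STREAMING_PROGRAMS: &[&str] = &["uvicorn"];

/// Naive split of a resolved command line on `&&`.
/// Does not handle quoting, `;`, `||`, or pipes.
fn split_chain(command: &str) -> Vec<&str> {
    command
        .split("&&")
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect()
}

/// First whitespace-separated token of a command, i.e. the program name.
fn program_of(command: &str) -> Option<&str> {
    command.split_whitespace().next()
}

/// Decide the strategy before spawning: stream if any link in the chain is a
/// known streaming program, otherwise buffer so a filter can see everything.
fn choose_strategy(resolved_command: &str) -> OutputStrategy {
    let streams = split_chain(resolved_command)
        .into_iter()
        .filter_map(program_of)
        .any(|program| STREAMING_PROGRAMS.contains(&program));
    if streams {
        OutputStrategy::Streamed
    } else {
        OutputStrategy::Buffered
    }
}
```

For the `just lint-fix` example, `split_chain("ruff check --fix && ruff format")` yields the two `ruff` invocations, and both would route to the same downstream filter.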

Sketch of the flow for rtk just test:

1. run_fallback sees `just test`
2. Match TOML filter `^just\b` (existing behavior)
3. NEW: parse ./Justfile, find recipe `test`, extract command line `pytest -n auto -m '...'`
4. NEW: classify the resolved command via the existing `find_matching_filter` / Clap dispatch
5. NEW: if a Rust filter matches (`pytest` → src/cmds/python/), spawn `just test` and pipe output through that filter instead of the generic `just` TOML pipeline
6. NEW: if no specific filter matches, fall back to the current `just` TOML pipeline (today's behavior)

The new logic is additive: existing filter behavior is the fallback, so it's a strict improvement.
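Step 3 is the only part that needs new parsing code. Below is a minimal, dependency-free sketch of what resolving a recipe might look like. `resolve_just_recipe` and `wrapped_program` are hypothetical names, and the parser deliberately ignores most of just's grammar (dependencies, variables, interpolation, line continuations, shebang recipes, imports), so a real implementation would need to be more careful.

```rust
/// Return the body lines of `recipe` from Justfile text, or None if absent.
/// Simplified on purpose: a header is any unindented `name ...:` line, and
/// the body is the run of indented lines that follows it.
pub fn resolve_just_recipe(justfile: &str, recipe: &str) -> Option<Vec<String>> {
    let mut lines = justfile.lines().peekable();
    while let Some(line) = lines.next() {
        // Skip blanks, comments, and body lines of other recipes.
        if line.is_empty() || line.starts_with(char::is_whitespace) || line.starts_with('#') {
            continue;
        }
        let (head, rest) = match line.split_once(':') {
            Some(parts) => parts,
            None => continue,
        };
        if rest.starts_with('=') {
            continue; // `name := value` is a variable assignment, not a recipe
        }
        // The first token is the recipe name; a leading `@` marks a quiet recipe.
        let name = head.split_whitespace().next().unwrap_or("").trim_start_matches('@');
        if name != recipe {
            continue;
        }
        let mut body = Vec::new();
        while let Some(&next) = lines.peek() {
            if next.trim().is_empty() {
                lines.next(); // tolerate blank lines inside the body
                continue;
            }
            if !next.starts_with(char::is_whitespace) {
                break; // next unindented line ends the recipe
            }
            // Drop just's per-line `@` (quiet) and `-` (ignore errors) prefixes.
            let cmd = next.trim().trim_start_matches(|c: char| c == '@' || c == '-');
            body.push(cmd.to_string());
            lines.next();
        }
        return Some(body);
    }
    None
}

/// Program name of the first body line, used to pick a downstream filter.
pub fn wrapped_program(body: &[String]) -> Option<&str> {
    body.first()?.split_whitespace().next()
}
```

For the Justfile in the repro, `resolve_just_recipe(text, "test")` returns the single `pytest -n auto -m '...'` line and `wrapped_program` yields `pytest`. A `None` result maps onto step 6, the existing TOML pipeline.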

Alternative detection paths (for reference, not recommended as primary)

Parse the delegator's own stdout. Some delegators echo the command they're about to run (task: [build] go build ./..., Poe => pytest ...). Works after the fact but can't inform Stdio choices, and not all delegators echo (just doesn't by default).
Declarative TOML config. Add wraps = "pytest" per filter. Requires authors to manually map every task — doesn't scale across projects with custom recipes.
Output-format heuristics. Sniff for ===== test session starts ===== etc. Fragile and order-dependent.

These can supplement the primary path (e.g. as fallbacks for unparseable project files), but should not be the main mechanism.
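As an illustration of the first alternative, a fallback that recovers the wrapped command from an echoed line might look like this. The two prefixes are taken from the examples above (`task: [name] cmd` and `Poe => cmd`); the function names are hypothetical, and actual echo formats may vary by delegator version and flags.

```rust
/// Extract the wrapped command from a single echoed line, if the line matches
/// one of the two echo formats quoted in this issue.
fn command_from_echo(line: &str) -> Option<&str> {
    if let Some(rest) = line.strip_prefix("Poe => ") {
        return Some(rest.trim());
    }
    if let Some(rest) = line.strip_prefix("task: [") {
        // `rest` looks like `build] go build ./...`
        let (_task_name, cmd) = rest.split_once("] ")?;
        return Some(cmd.trim());
    }
    None
}

/// Scan captured output for the first echoed command.
/// Returns None for delegators that do not echo (for example just by default).
fn first_echoed_command(output: &str) -> Option<&str> {
    output.lines().find_map(command_from_echo)
}
```

Because this only works on output that has already been captured, it can select a filter after the fact but cannot inform the streamed-vs-buffered decision.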

Interim workarounds (please document if delegator routing is out of scope)

Don't go through the delegator. Call the wrapped tool directly: rtk pytest -n auto -m '...' instead of rtk just test. This is the right answer today but isn't documented anywhere — agents will reach for the task-runner alias.
Project-local override in .rtk/filters.toml with much higher max_lines (e.g. 500). Trades meaningful compression for not-actively-harmful, requires per-user rtk trust.
Bypass. RTK_NO_TOML=1 just test or rtk proxy just test.

Why this matters

The whole point of RTK is to give agents useful compressed output. Truncating before the FAILED summary line silently inverts that goal: the agent sees a truncated run with no failure markers and moves on, when in reality the build is broken. This is the worst failure mode for an LLM proxy.

Happy to take a stab at the fix if there's agreement on direction. Project-file parsing is the approach recommended above; a declarative wraps = ... config seems the lowest-risk stopgap to me if parsing is considered too invasive. Flagging the architectural question first since it affects four filters and would shape how future delegator filters are written.

cc related: #1062 (poe filter) — would inherit the same fix for free.

Contributor guide

Research direction: Trace the filter selection and output processing pipeline in src/filters/ and src/core/toml_filter.rs. Implement a mechanism to parse delegator project files (Justfile, etc.) to identify the wrapped tool and apply its dedicated filter.
Tech stack: rust
Domain: cli, tooling, devops
Issue type: Bug
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Rust, Git, CLI development
Newbie friendliness: 30