Hook adoption (18%) + document-quality concerns + scope-limit-to-coding proposal from a document-production workload · rtk-ai/rtk#1698

(1 comment) (0 reactions) (0 assignees) Rust (2,914 forks) batch import

area:config, area:docs, enhancement, help wanted, priority:medium

Repository metrics

Stars: 48,085
PR merge metrics: average merge time 11d 1h; 45 PRs merged in 30 days

Description

Context

rtk discover over the last 30 days reports only 18.0% adoption (3,636 of 20,236 Bash commands routed through RTK), with ~857K tokens of recoverable savings sitting unrewritten. After looking at the unhandled-command list, I think there are actually three distinct issues bundled here, and the third one is the one I'd most like maintainer guidance on.

Issue 1 — hook-rewriting blind spots

The "missed savings" table is dominated by commands RTK already handles:

| Command | Count | RTK equivalent | Est. savings |
| --- | ---: | --- | ---: |
| `grep -n` | 3,241 | `rtk grep` | ~516K tokens |
| `ls "/abs/path"` | 2,225 | `rtk ls` | ~160K tokens |
| `cat "/abs/path"` | 248 | `rtk read` | ~89K tokens |
| `find "/abs/path"` | 547 | `rtk find` | ~62K tokens |
| `git log` | 244 | `rtk git` | ~14K tokens |
| `gh pr` | 102 | `rtk gh` | ~10K tokens |
| `wc -l` | 153 | `rtk wc` | ~3.5K tokens |
| Total | 6,765 | - | ~857K tokens |
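For scale, the headline numbers can be combined with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the figures quoted in this issue; the "ceiling" assumes every command counted in the table's Total row would be rewritten, which is an optimistic assumption rather than anything rtk discover reports.

```python
# Figures copied from the rtk discover output quoted in this issue.
routed = 3_636            # Bash commands already routed through RTK
total = 20_236            # all Bash commands in the 30-day window
missed_supported = 6_765  # "Total" row of the missed-savings table
missed_tokens = 857_000   # approximate recoverable tokens

adoption = routed / total
# Optimistic ceiling: every missed-but-supported command gets rewritten.
ceiling = (routed + missed_supported) / total
tokens_per_missed = missed_tokens / missed_supported

print(f"current adoption:        {adoption:.1%}")
print(f"ceiling with hook fixes: {ceiling:.1%}")
print(f"tokens per missed call:  {tokens_per_missed:.0f}")
```

Even under that assumption the ceiling lands at roughly 51%, so about half of the commands in this workload fall outside what RTK handles at all.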

Two patterns dominate the unrewritten commands and I suspect they're hook regex blind spots:

Quoted absolute paths: ls "/Users/imcapple/o...", cat "/Users/imcapple/...", find "/Users/imcapple/...". The hook's command-rewrite regex may not be matching when the first arg is a quoted absolute path.
Heredocs and compound forms: python3 << 'EOF', \\npython3, echo "===. These are commands wrapped in shell constructs (heredocs, line continuations, here-strings) that the hook's matcher appears to miss.
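To illustrate how a quoted first argument could slip past a matcher, here is a minimal sketch. The `NAIVE_LS` pattern is hypothetical and is not rtk's actual regex (rtk is written in Rust, and its real matcher was not inspected here); it only shows the class of bug suspected above, next to a tokenise-first alternative built on Python's `shlex`.

```python
import re
import shlex
from typing import Optional

# Hypothetical pattern, NOT rtk's real one: it only accepts an unquoted
# first argument, so a leading double quote makes the match fail.
NAIVE_LS = re.compile(r"^ls\s+(?P<path>[\w./~-]+)\s*$")


def naive_rewrite(command: str) -> Optional[str]:
    """Rewrite `ls PATH` to `rtk ls PATH` using the naive regex."""
    m = NAIVE_LS.match(command)
    return f"rtk ls {m.group('path')}" if m else None


def tokenised_rewrite(command: str) -> Optional[str]:
    """Apply shell quoting rules first, then match on argv[0]."""
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes (e.g. a truncated `echo "===`): leave it alone.
        return None
    if not argv or argv[0] != "ls":
        return None
    # shlex.join re-quotes arguments only where the shell needs it.
    return shlex.join(["rtk", *argv])


print(naive_rewrite('ls "/Users/imcapple/test"'))
print(tokenised_rewrite('ls "/Users/imcapple/test"'))
```

The first call returns `None` because of the quote character, while the second yields a rewritten command. Whether the real hook fails for this reason is exactly what the diagnostic is meant to establish.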

A diagnostic the maintainers can run:

echo '{"tool_name":"Bash","tool_input":{"command":"ls \"/Users/imcapple/test\""}}' | rtk hook claude

If the output isn't rewritten to rtk ls ..., that's the bug.
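The same diagnostic can be run over several command shapes without hand-escaping the JSON. The sketch below assumes the payload shape shown in the one-liner above and the `rtk hook claude` invocation from this issue; the `cat` and `find` cases and the helper names are illustrative additions, and the script only prints payloads if `rtk` is not on PATH.

```python
import json
import shutil
import subprocess

# First case is the one from the issue; the others are illustrative variants.
CASES = [
    'ls "/Users/imcapple/test"',
    'cat "/Users/imcapple/test"',
    'find "/Users/imcapple/test"',
    "ls /Users/imcapple/test",  # unquoted control case
]


def build_payload(command: str) -> str:
    """Serialise one Bash tool call; json.dumps handles the quote escaping."""
    return json.dumps({"tool_name": "Bash", "tool_input": {"command": command}})


def run_hook(command: str) -> str:
    """Pipe one payload into `rtk hook claude` and return its stdout."""
    result = subprocess.run(
        ["rtk", "hook", "claude"],
        input=build_payload(command),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    have_rtk = shutil.which("rtk") is not None
    for case in CASES:
        if have_rtk:
            print(f"{case!r} -> {run_hook(case)!r}")
        else:
            print(build_payload(case))
```

Comparing the quoted cases against the unquoted control should show whether quoting alone changes the hook's behaviour.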

Issue 2 — top unhandled commands and the workload-shape question

python3                   4614    python3 ~/.claude/skills/.../course_tools.py ...
python3 <<                 226    python3 << 'EOF' ...
stat                        45    stat -f '%m %N' ...
git checkout                37    git checkout -b ...
unzip                       19    unzip -o -q "/Users/imcapple/obsidian/..."
pandoc                      16    pandoc -t markdown "M3_L3.1_..." 
gh search                   11    gh search issues ...
git tag                     10    git tag --list --sort=-v:refname

Most of these (python3 script invocations against the Obsidian vault, pandoc conversions on book chapters, unzip of vault archives, heredocs operating on course content) are document production, not code work. That makes me wonder whether the 18% adoption rate reflects not just a hook-coverage gap but also the nature of the work.

Issue 3 (the big one) — document-quality concern from compression-by-default

I have concerns about the quality of the documents being emitted under RTK's compression. For pure git operations, build/test runs, and short-output commands, RTK's filtering is genuinely lossless-for-purpose. For prose / lesson content / quiz items / manuscript text, it's a riskier optimization with real failure modes:

rtk read strips content. The agent treats the filtered view as the source of truth; an author would treat dropped footnotes / callouts / exact phrasing / transition paragraphs as load-bearing.
rtk grep groups and truncates (default ~200 results / ~25 per file). A consistency check across a 10-volume series may silently miss material the agent should have seen — and then write something that contradicts what it didn't see.
rtk ls collapses listings. Fine for code repos. For a vault where filenames carry content (lesson IDs, version markers, dates), losing detail in the listing means the agent may misidentify the next file to edit.
The compression is invisible to the agent. It doesn't know it received a filtered view, so it has no instinct to re-fetch unfiltered. There's no signal that filtering happened.
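A toy model makes the truncation risk concrete. This is not rtk's filter: `cap_results` is a hypothetical function that applies a per-file cap and a total cap, with the ~25 and ~200 figures mentioned above used as assumed defaults, and the input is synthetic.

```python
from collections import defaultdict


def cap_results(matches, per_file=25, total=200):
    """Keep at most `per_file` hits per path and `total` hits overall.

    `matches` is a list of (path, line) tuples in output order.
    Returns (kept, dropped_count).
    """
    kept = []
    seen = defaultdict(int)
    for path, line in matches:
        if len(kept) >= total:
            break
        if seen[path] >= per_file:
            continue
        seen[path] += 1
        kept.append((path, line))
    return kept, len(matches) - len(kept)


# Synthetic series: 10 volumes, 40 hits each, listed volume by volume.
matches = [(f"vol{v}.md", f"hit {i}") for v in range(1, 11) for i in range(40)]
kept, dropped = cap_results(matches)
files_seen = sorted({path for path, _ in kept})
print(len(kept), dropped, len(files_seen))
```

With these assumed caps, half of the hits are dropped and the last two volumes never appear in the output. An agent that only receives `kept` has no way to tell, which is the gap described above.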

This isn't speculation about my workload — I run a 10-volume crypto book series + paid courses, and most agent cycles are spent reading, editing, and emitting prose. The 18% adoption number may be a feature, not a bug, given the workload mix.

Proposed direction: scope-limit RTK rather than compound the rewriting

Rather than push for the hook to rewrite more commands (which would compound the document-quality risk), the right move may be to limit the RTK whitelist to GitHub and coding-related tasks and exclude document generation/review entirely. Concretely:

Keep RTK on: git, gh, build/test commands (npm, cargo, pytest), short-output structural commands like wc, du.
Exclude RTK from: cat/rtk read, head/tail, grep against vault paths, ls against vault paths, find against vault paths, python3 invocations whose first argument is a script under a vault directory or whose argv contains a vault path.
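The keep/exclude split above can be expressed as a small decision function. This is a sketch of the proposed policy only, not of anything rtk implements: the set names, `should_rewrite`, and the single vault root are illustrative, and "eligible" here means "the hook may rewrite it", not that an RTK equivalent exists.

```python
import shlex
from pathlib import PurePosixPath

# Command classes taken from the two lists above.
CODING = {"git", "gh", "npm", "cargo", "pytest", "wc", "du"}
ALWAYS_EXCLUDED = {"cat", "head", "tail"}
PATH_SENSITIVE = {"grep", "ls", "find", "python3"}

# Illustrative vault root, based on the paths quoted in this issue.
VAULT_ROOTS = [PurePosixPath("/Users/imcapple/obsidian")]


def touches_vault(argv):
    """True if any absolute-path argument is a vault root or lies under one."""
    for arg in argv[1:]:
        if not arg.startswith("/"):
            continue
        path = PurePosixPath(arg)
        if any(path == root or root in path.parents for root in VAULT_ROOTS):
            return True
    return False


def should_rewrite(command: str) -> bool:
    """Decide whether a command is eligible for RTK rewriting."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unparseable: be conservative and pass through
    if not argv:
        return False
    name = argv[0]
    if name in CODING:
        return True
    if name in ALWAYS_EXCLUDED:
        return False
    if name in PATH_SENSITIVE:
        return not touches_vault(argv)
    return False


for cmd in ["git log", 'ls "/Users/imcapple/obsidian/notes"', "ls /tmp"]:
    print(cmd, "->", should_rewrite(cmd))
```

Tokenising before classifying means quoted vault paths are recognised the same way as unquoted ones.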

Possible implementations to consider:

Path-based exclusion in the hook config — e.g. [hooks] exclude_paths = ["/Users/imcapple/obsidian/**", "/Users/imcapple/.claude/skills/**"] that bypasses rewriting for any command whose argv contains a matching path. This is the cleanest fit for my workload.
Per-tool opt-out via [hooks] exclude_commands = ["cat", "ls", "grep", "find"] — if this already exists in 0.38+ / 0.39 RC, document it more prominently.
Scope-aware adoption profile — a --scope=code-only or --scope=docs-permitted flag for rtk init that picks defaults appropriate to the user's workload mix (similar to how some linters ship workload presets).
Per-command-class filter aggression — rtk read against ~/.claude/skills/** could pass through unfiltered while rtk read against /usr/lib/** keeps current aggression.
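As a rough sketch of how path rules might select a filter level, the snippet below maps glob patterns to an aggression setting. Everything here is hypothetical: the `RULES` table, the `"none"` and `"default"` level names, and the treatment of a trailing `/**` are assumptions for illustration, not existing rtk configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: first matching pattern wins.
RULES = [
    ("/Users/imcapple/obsidian/**", "none"),
    ("/Users/imcapple/.claude/skills/**", "none"),
    ("/usr/lib/**", "default"),
]


def glob_match(path: str, pattern: str) -> bool:
    """Match a path against a pattern; a trailing `/**` means 'this
    directory and everything below it'."""
    if pattern.endswith("/**"):
        root = pattern[:-3]
        return path == root or path.startswith(root + "/")
    return fnmatchcase(path, pattern)


def aggression_for(path: str, rules=RULES, fallback: str = "default") -> str:
    """Return the filter level for a path, or the fallback if no rule matches."""
    for pattern, level in rules:
        if glob_match(path, pattern):
            return level
    return fallback


print(aggression_for("/Users/imcapple/obsidian/vol1/ch1.md"))
print(aggression_for("/usr/lib/python3/os.py"))
```

Handling the `/**` suffix by prefix comparison keeps a sibling directory such as `/Users/imcapple/obsidian-backup` from matching the vault rule by accident.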

What I'd find useful from maintainers

Confirm or deny the quoted-absolute-path blind spot in the hook regex (testable via the diagnostic above).
Document the existing exclude_commands / path-filtering options if they're already implemented in 0.38+ / 0.39 RC. A conversation in another session referenced [hooks] exclude_commands, but I haven't found it in the published docs.
Consider whether the python3 family of commands (4,614 script invocations + 226 heredocs in my data) warrants a dedicated wrapper, a configurable opt-in, or simply explicit guidance that this class of command is out-of-scope.
Guidance on the document-production workload trade-off: is the right answer "scope down RTK", "tune filters per-file-type", or "different filter aggression by command class"?

Filed as a single issue rather than three

Filing this as one because the three threads are interconnected: the missed savings and the unhandled commands are both about adoption, but the document-quality concern is about whether more adoption is even desirable for my workload. The right answer may be "yes, fix the regex so the hook catches more — AND add a path-based opt-out so vault paths bypass filtering entirely." Splitting into separate issues would lose that connection.

Happy to split if maintainers prefer.

Reproduction context: discover output captured 2026-05-04 from a heavy document-production workload (Obsidian vault, course-builder skill, multi-volume book series). Latest rtk version known to be in flight at time of filing: 0.38.0 (released April 29) with 0.39 RCs in development.

Contributor guide

Research direction: Reproduce the hook regex blind spot using the diagnostic command provided. Then explore the hook configuration code to understand how to implement exclude paths or exclude commands. Review the existing documentation for the hook's filtering options.
Tech stack: Rust
Domain: CLI, developer experience
Issue type: Research
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Rust, Git
Beginner-friendly: 70