Hook adoption (18%) + document-quality concerns + scope-limit-to-coding proposal from a document-production workload · rtk-ai/rtk#1698

1 comment · 0 reactions · 0 assignees · Rust · 2,914 forks · batch import

Labels: area:config, area:docs, enhancement, help wanted, priority:medium

Repository metrics

Stars: 48,085
PR merge metrics: average time to merge 11d 1h; 45 PRs merged in the last 30d

Description

Context

rtk discover over the last 30 days reports only 18.0% adoption (3,636 of 20,236 Bash commands routed through RTK), with ~857K tokens of recoverable savings sitting unrewritten. After looking at the unhandled-command list, I think there are actually three distinct issues bundled here, and the third one is the one I'd most like maintainer guidance on.

Issue 1 — hook-rewriting blind spots

The "missed savings" table is dominated by commands RTK already handles:

| Command | Count | RTK Equivalent | Est. Savings |
| --- | --- | --- | --- |
| `grep -n` | 3,241 | `rtk grep` | ~516K tokens |
| `ls "/abs/path"` | 2,225 | `rtk ls` | ~160K tokens |
| `cat "/abs/path"` | 248 | `rtk read` | ~89K tokens |
| `find "/abs/path"` | 547 | `rtk find` | ~62K tokens |
| `git log` | 244 | `rtk git` | ~14K tokens |
| `gh pr` | 102 | `rtk gh` | ~10K tokens |
| `wc -l` | 153 | `rtk wc` | ~3.5K tokens |
| Total | 6,765 | - | ~857K tokens |

Two patterns dominate the unrewritten commands and I suspect they're hook regex blind spots:

Quoted absolute paths: ls "/Users/imcapple/o...", cat "/Users/imcapple/...", find "/Users/imcapple/...". The hook's command-rewrite regex may not be matching when the first arg is a quoted absolute path.
Heredocs and compound forms: python3 << 'EOF', \\npython3, echo "=== — commands wrapped in shell constructs (heredocs, line continuations, here-strings) the hook's matcher commonly misses.
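
To make the suspected quoted-path blind spot concrete, here is a minimal sketch of how a prefix-style regex could miss a quoted first argument while a tokenizing matcher would not. The pattern `NAIVE_LS` and both function names are hypothetical illustrations; I have not read the hook's actual matcher, so this only shows one plausible failure shape.

```python
import re
import shlex

# Hypothetical naive matcher (NOT RTK's real regex): it expects the first
# argument to start with a bare path or flag character, so an argument that
# begins with a double quote does not match.
NAIVE_LS = re.compile(r"^ls\s+[\w./~-]")


def naive_matches(command: str) -> bool:
    """Return True if the hypothetical prefix regex recognises the command."""
    return NAIVE_LS.match(command) is not None


def tokenized_matches(command: str, program: str = "ls") -> bool:
    """Tokenize with shell quoting rules, then compare the program name."""
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes, e.g. a truncated command string.
        return False
    return bool(argv) and argv[0] == program


if __name__ == "__main__":
    for cmd in ('ls /Users/imcapple/test', 'ls "/Users/imcapple/test"'):
        print(cmd, naive_matches(cmd), tokenized_matches(cmd))
```

If the real matcher behaves like the naive variant, the quoted form would fall through unrewritten even though the unquoted form is caught.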

A diagnostic the maintainers can run:

echo '{"tool_name":"Bash","tool_input":{"command":"ls \"/Users/imcapple/test\""}}' | rtk hook claude

If the output isn't rewritten to rtk ls ..., that's the bug.
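
For running the same diagnostic across several command shapes, a small payload generator may help. This is a sketch: `hook_payload` and `CASES` are names I made up, and the `cat` and `find` cases simply reuse the test path from the `ls` diagnostic above. It only builds the JSON event; it does not assume anything about what `rtk hook claude` prints.

```python
import json


def hook_payload(command: str) -> str:
    """Build the JSON event used in the diagnostic for one Bash command."""
    return json.dumps({"tool_name": "Bash", "tool_input": {"command": command}})


# Illustrative cases: the same path with and without quoting, across the
# commands that appear unrewritten in the discover output.
CASES = [
    'ls /Users/imcapple/test',
    'ls "/Users/imcapple/test"',
    'cat "/Users/imcapple/test"',
    'find "/Users/imcapple/test"',
]

if __name__ == "__main__":
    for command in CASES:
        print(hook_payload(command))
```

Each printed line can be piped into `rtk hook claude` one at a time; comparing the quoted and unquoted `ls` results should show whether quoting alone changes the outcome.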

Issue 2 — top unhandled commands and the workload-shape question

python3                   4614    python3 ~/.claude/skills/.../course_tools.py ...
python3 <<                 226    python3 << 'EOF' ...
stat                        45    stat -f '%m %N' ...
git checkout                37    git checkout -b ...
unzip                       19    unzip -o -q "/Users/imcapple/obsidian/..."
pandoc                      16    pandoc -t markdown "M3_L3.1_..." 
gh search                   11    gh search issues ...
git tag                     10    git tag --list --sort=-v:refname

Most of these (python3 script invocations against the Obsidian vault, pandoc conversions on book chapters, unzip of vault archives, heredocs operating on course content) are document production, not code work. That makes me wonder whether the 18% adoption rate isn't just a hook-coverage gap — it's also reflecting the nature of the work.

Issue 3 (the big one) — document-quality concern from compression-by-default

I have concerns about the quality of the documents being emitted under RTK's compression. For pure git operations, build/test runs, and short-output commands, RTK's filtering is genuinely lossless-for-purpose. For prose / lesson content / quiz items / manuscript text, it's a riskier optimization with real failure modes:

rtk read strips content. The agent treats the filtered view as the source of truth; an author would treat dropped footnotes / callouts / exact phrasing / transition paragraphs as load-bearing.
rtk grep groups and truncates (default ~200 results / ~25 per file). A consistency check across a 10-volume series may silently miss material the agent should have seen — and then write something that contradicts what it didn't see.
rtk ls collapses listings. Fine for code repos. For a vault where filenames carry content (lesson IDs, version markers, dates), losing detail in the listing means the agent may misidentify the next file to edit.
The compression is invisible to the agent. It doesn't know it received a filtered view, so it has no instinct to re-fetch unfiltered. There's no signal that filtering happened.
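
To illustrate the last point, here is a sketch of a capped grep-style result set that reports what it dropped. The caps mirror the approximate defaults mentioned above (about 25 per file, about 200 total); the function names and the `[filtered: ...]` marker format are hypothetical and are not RTK's implementation.

```python
from collections import defaultdict


def cap_matches(matches, per_file=25, total=200):
    """Cap (path, line) matches per file and overall; return (kept, dropped)."""
    kept = []
    per_file_count = defaultdict(int)
    for path, line in matches:
        if len(kept) >= total or per_file_count[path] >= per_file:
            continue  # over a cap: this match is omitted
        per_file_count[path] += 1
        kept.append((path, line))
    return kept, len(matches) - len(kept)


def render(matches, **caps):
    """Render kept matches and append an explicit marker when any were cut."""
    kept, dropped = cap_matches(matches, **caps)
    lines = [f"{path}: {line}" for path, line in kept]
    if dropped:
        # Hypothetical marker so the reader knows to re-run unfiltered.
        lines.append(f"[filtered: {dropped} of {len(matches)} matches omitted]")
    return "\n".join(lines)
```

With a trailing marker like this, an agent doing a cross-volume consistency check at least has a cue that the view is partial and can re-fetch the full output.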

This isn't speculation about my workload — I run a 10-volume crypto book series + paid courses, and most agent cycles are spent reading, editing, and emitting prose. The 18% adoption number may be a feature, not a bug, given the workload mix.

Proposed direction: scope-limit RTK rather than compound the rewriting

Rather than push for the hook to rewrite more commands (which would compound the document-quality risk), the right move may be to limit the RTK whitelist to GitHub and coding-related tasks and exclude document generation/review entirely. Concretely:

Keep RTK on: git, gh, build/test commands (npm, cargo, pytest), short-output structural commands like wc, du.
Exclude RTK from: cat/rtk read, head/tail, grep against vault paths, ls against vault paths, find against vault paths, python3 invocations whose first argument is a script under a vault directory or whose argv contains a vault path.

Possible implementations to consider:

Path-based exclusion in the hook config — e.g. [hooks] exclude_paths = ["/Users/imcapple/obsidian/**", "/Users/imcapple/.claude/skills/**"] that bypasses rewriting for any command whose argv contains a matching path. This is the cleanest fit for my workload.
Per-tool opt-out via [hooks] exclude_commands = ["cat", "ls", "grep", "find"] — if this already exists in 0.38+ / 0.39 RC, document it more prominently.
Scope-aware adoption profile — a --scope=code-only or --scope=docs-permitted flag for rtk init that picks defaults appropriate to the user's workload mix (similar to how some linters ship workload presets).
Per-command-class filter aggression — rtk read against ~/.claude/skills/** could pass through unfiltered while rtk read against /usr/lib/** keeps current aggression.
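
As a rough sketch of the path-based exclusion idea, the check could tokenize the command and test each argv token against the configured globs. Everything here is an assumption for illustration: `should_bypass` is a made-up name, `exclude_paths` is the proposed (not confirmed) config key, and `fnmatch` semantics are used, where `*` also matches `/`, so `**` behaves like a recursive wildcard.

```python
import fnmatch
import os
import shlex


def should_bypass(command: str, exclude_paths: list[str]) -> bool:
    """Return True if any argv token matches one of the excluded globs."""
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unparseable command: fail open and leave it unrewritten.
        return True
    patterns = [os.path.expanduser(glob) for glob in exclude_paths]
    for token in argv:
        path = os.path.expanduser(token)
        if any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns):
            return True
    return False


# The globs proposed in the first option above.
EXCLUDE = [
    "/Users/imcapple/obsidian/**",
    "/Users/imcapple/.claude/skills/**",
]
```

Failing open on unparseable input is a deliberate choice in this sketch: for a document workload, an unfiltered read costs tokens, while a wrongly filtered read can cost correctness.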

What I'd find useful from maintainers

Confirm or deny the quoted-absolute-path blind spot in the hook regex (testable via the diagnostic above).
Document the existing exclude_commands / path-filtering options if they're already implemented in 0.38+ / 0.39 RC. A conversation in another session referenced [hooks] exclude_commands, but I haven't found it in the published docs.
Consider whether the python3 family of commands (4,614 script invocations plus 226 heredoc invocations in my data) warrants a dedicated wrapper, a configurable opt-in, or simply explicit guidance that this class of command is out-of-scope.
Guidance on the document-production workload trade-off: is the right answer "scope down RTK", "tune filters per-file-type", or "different filter aggression by command class"?

Filed as a single issue rather than three

Filing this as one because the three threads are interconnected: the missed savings and the unhandled commands are both about adoption, but the document-quality concern is about whether more adoption is even desirable for my workload. The right answer may be "yes, fix the regex so the hook catches more — AND add a path-based opt-out so vault paths bypass filtering entirely." Splitting into separate issues would lose that connection.

Happy to split if maintainers prefer.

Reproduction context: discover output captured 2026-05-04 from a heavy document-production workload (Obsidian vault, course-builder skill, multi-volume book series). Latest rtk version known to be in flight at time of filing: 0.38.0 (released April 29) with 0.39 RCs in development.

Contributor guide

Research direction: Reproduce the hook regex blind spot using the diagnostic command provided. Then explore the hook configuration code to understand how to implement exclude paths or exclude commands. Review the existing documentation for hook filtering options.
Tech stack: Rust
Domain: CLI, developer experience
Issue type: Research
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Rust, Git
Newcomer friendliness: 70