NF01: rtk-rewrite.sh hook adds ~56ms overhead on Apple Silicon (vs documented <5ms)
#1 375 ouverte le 18 avr. 2026
Métriques du dépôt
- Stars
- (48 085 stars)
- Métriques de merge PR
- (Merge moyen 11j 1h) (45 PRs mergées en 30 j)
Description
Summary
On Apple Silicon (M-series), rtk-rewrite.sh hook invocations measure ~56 ms average / ~84 ms p95 per call in production use, versus the documented "<5 ms overhead" target. A 100-invocation benchmark collected during a downstream project's enforcement phase identifies the root cause as a subprocess fork of rtk for command-prefix classification inside the hook.
The latency is imperceptible in interactive use (dwarfed by typical bash wall-time) but violates the documented NF01 budget and adds up across high-frequency sessions (~5,000 hook firings / day → ~4.7 min/day of classification overhead).
Environment
- rtk version: 0.35.0
- Platform: Apple Silicon (M-series), macOS 14+
- Invocation:
PreToolUse:Bashhook (Claude Code harness) piping stdin viartk-rewrite.sh - Shell: zsh / bash — fresh subshell per invocation
Measurement
100-invocation benchmark:
avg: 56.14 ms
p50: 54.9 ms
p95: ~84 ms (estimated)
p99: ~120 ms
Harness forks a fresh bash per invocation. In-session runtime (no bash re-fork) is ~30–40 ms lower because shell startup amortizes, but the rtk-side cost remains ~30–40 ms per call.
Suspected root cause
rtk-rewrite.sh forks rtk as a subprocess to classify the incoming command prefix. On Apple Silicon, cold-start of the rtk binary accounts for the bulk of the 56 ms — process setup + dylib resolution + arg parse before any classification logic runs.
Proposed upstream mitigations (any one is sufficient)
- In-shell prefix parser — ship a small bash/sh function covering the top ~10 prefixes (git, grep, find, ls, cat, head, tail, wc, curl, docker) that short-circuits without forking
rtk. Falls through tortksubprocess for the long tail. - Persistent classifier daemon — optional
rtk-daemonbinding to a unix socket;rtk-rewrite.shusesnc -Ufor classification, falls back to subprocess fork if daemon is absent. Preserves zero-install default. - Split classifier binary — ship
rtk-classifyas a focused, smaller binary without the full CLI surface.
Why this matters at scale
At ~5,000 hook firings / day, 56 ms/call × 5,000 = ~4.7 minutes/day of latency accrued purely from classification subprocess forks. Perceptible only on tight loops (shell-pipe-heavy scripts that trigger hooks repeatedly), but measurable.
Out of scope for this issue
- Hook correctness and rewrite accuracy — functioning correctly, no regression.
rtkstandalone invocation latency (rtk git status, etc.) — not measured here, may share root cause.
Downstream disposition
We've accepted the deviation internally by revising our target from p95 < 25 ms → p95 < 100 ms, and deferred a local prefix-parser optimization pending upstream response — we don't want to ship a divergent rtk-rewrite.sh that would later conflict with the upstream fix.
Happy to contribute a PR on any of the three mitigation paths if a maintainer signals preference.