rtk-ai/rtk

NF01: rtk-rewrite.sh hook adds ~56ms overhead on Apple Silicon (vs documented <5ms)

Open

#1375, opened on 18 Apr. 2026

Labels: area:cli, bug, effort-medium, enhancement, help wanted, platform:macos, priority:medium

Description

Summary

On Apple Silicon (M-series), rtk-rewrite.sh hook invocations measure ~56 ms average / ~84 ms p95 per call in production use, versus the documented "<5 ms overhead" target. A 100-invocation benchmark collected during a downstream project's enforcement phase points to the likely root cause: the hook forks rtk as a subprocess for command-prefix classification.

The latency is imperceptible in interactive use (dwarfed by typical bash wall-time) but violates the documented NF01 budget and adds up across high-frequency sessions (~5,000 hook firings / day → ~4.7 min/day of classification overhead).

Environment

  • rtk version: 0.35.0
  • Platform: Apple Silicon (M-series), macOS 14+
  • Invocation: PreToolUse:Bash hook (Claude Code harness) piping stdin via rtk-rewrite.sh
  • Shell: zsh / bash — fresh subshell per invocation

Measurement

100-invocation benchmark:

avg:  56.14 ms
p50:  54.9  ms
p95:  ~84   ms (estimated)
p99:  ~120  ms

Harness forks a fresh bash per invocation. In-session runtime (no bash re-fork) is ~30–40 ms lower because shell startup amortizes, but the rtk-side cost remains ~30–40 ms per call.
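
For anyone reproducing these figures, the summary statistics can be computed from raw samples with a small helper. This is a minimal sketch, not the script used for the benchmark above: `summarize_ms` is a hypothetical name, it expects one latency in milliseconds per line on stdin, and it assumes nearest-rank percentiles, since the method behind the reported p50/p95 is not stated here.

```shell
# Summarize per-call latencies (one value in ms per line on stdin).
# Hypothetical helper; percentiles use the nearest-rank method (an assumption).
summarize_ms() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no samples"; exit 1 }
      p50 = int((NR * 50 + 99) / 100)   # ceil(NR * 0.50)
      p95 = int((NR * 95 + 99) / 100)   # ceil(NR * 0.95)
      printf "n=%d avg=%.2f p50=%s p95=%s\n", NR, sum / NR, v[p50], v[p95]
    }'
}
```

Usage: `summarize_ms < samples.txt`, where `samples.txt` (a placeholder name) holds the per-invocation timings collected by whatever harness wraps the hook.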

Suspected root cause

rtk-rewrite.sh forks rtk as a subprocess to classify the incoming command prefix. On Apple Silicon, cold-start of the rtk binary accounts for the bulk of the 56 ms — process setup + dylib resolution + arg parse before any classification logic runs.

Proposed upstream mitigations (any one is sufficient)

  1. In-shell prefix parser — ship a small bash/sh function covering the top ~10 prefixes (git, grep, find, ls, cat, head, tail, wc, curl, docker) that short-circuits without forking rtk. Falls through to rtk subprocess for the long tail.
  2. Persistent classifier daemon — optional rtk-daemon binding to a unix socket; rtk-rewrite.sh uses nc -U for classification, falls back to subprocess fork if daemon is absent. Preserves zero-install default.
  3. Split classifier binary — ship rtk-classify as a focused, smaller binary without the full CLI surface.
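
To make mitigation 1 concrete, here is a rough sketch of what an in-shell fast path could look like. It is illustrative only: `rtk_fast_prefix` is a hypothetical function name, not part of rtk or the current rtk-rewrite.sh, and it only answers "is the first word one of the ten listed prefixes?" without reproducing whatever classification rtk itself performs.

```shell
# Hypothetical fast path: succeed (and print the prefix) when the command's
# first word is one of the ten prefixes listed in mitigation 1, so the caller
# can skip the rtk fork. Everything else returns 1 and falls through.
rtk_fast_prefix() {
  local cmd first
  cmd="${1#"${1%%[![:space:]]*}"}"   # trim leading whitespace
  first="${cmd%%[[:space:]]*}"       # first whitespace-delimited word
  case "$first" in
    git|grep|find|ls|cat|head|tail|wc|curl|docker)
      printf '%s\n' "$first"
      return 0 ;;
    *)
      return 1 ;;   # long tail: caller falls back to the rtk subprocess
  esac
}
```

The sketch uses only parameter expansion and `case`, so it adds no fork of its own. It deliberately ignores leading environment assignments (`FOO=1 git ...`), wrappers such as `sudo`, and compound commands; those would take the existing subprocess path.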

Why this matters at scale

At ~5,000 hook firings / day, 56 ms/call × 5,000 = 280 s ≈ 4.7 minutes/day of latency accrued purely from classification subprocess forks. Perceptible only on tight loops (shell-pipe-heavy scripts that trigger hooks repeatedly), but measurable.
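
The estimate is easy to re-run for other call volumes or per-call costs. A small sketch, with `daily_overhead_min` as a hypothetical helper name and the two inputs taken from the figures in this issue:

```shell
# Daily overhead in minutes for a given per-call cost (ms) and calls per day.
# Hypothetical helper for re-running the estimate with other inputs.
daily_overhead_min() {
  awk -v ms="$1" -v calls="$2" 'BEGIN { printf "%.2f\n", ms * calls / 1000 / 60 }'
}

daily_overhead_min 56 5000   # prints 4.67
```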

Out of scope for this issue

  • Hook correctness and rewrite accuracy — functioning correctly, no regression.
  • rtk standalone invocation latency (rtk git status, etc.) — not measured here, may share root cause.

Downstream disposition

We've accepted the deviation internally by revising our target from p95 < 25 ms → p95 < 100 ms, and deferred a local prefix-parser optimization pending upstream response — we don't want to ship a divergent rtk-rewrite.sh that would later conflict with the upstream fix.

Happy to contribute a PR on any of the three mitigation paths if a maintainer signals preference.
