Repository metrics

Stars: 48,085
PR merge metrics: average time to merge 11d 1h; 45 PRs merged in the last 30 days

Description

Empirical benchmark: 5 repos, 2,100 measurements — how do actual savings compare to claims?

Related: #590, #538, #545, #827

Context

RTK is a genuinely useful tool — ls, docker ps, and verbose git output compress well, and the Rust implementation is fast. Impressive work for a small team shipping at this pace.

This issue isn't about whether RTK has value (it does), but about closing the gap between what the README promises and what users actually experience. We ran a benchmark to quantify that gap — and we'd like to help close it.

Setup

RTK v0.33.1 was tested across 5 repos (from 83 to about 74.8K files; Shell/Bash/Python), covering 9 command categories with 10 iterations each, in 3 independent runs (700 measurements per run, 2,100 in total). Results are deterministic: 699 of 700 ops showed less than 10 bytes of variance within a run, and cross-run results are byte-identical when repo state is unchanged.
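The measurement described above can be sketched roughly as follows. This is not the actual `bench.sh`; it is a minimal Python illustration that assumes savings are computed from stdout byte counts of `command` versus `rtk command`. The names `savings_pct`, `run_bytes`, and `measure` are hypothetical.

```python
import subprocess
from statistics import median
from typing import Callable, Sequence


def savings_pct(raw_bytes: float, rtk_bytes: float) -> float:
    """Percentage of bytes saved; negative when the wrapped output is larger."""
    if raw_bytes == 0:
        return 0.0
    return 100.0 * (raw_bytes - rtk_bytes) / raw_bytes


def run_bytes(argv: Sequence[str]) -> int:
    """Run a command and return the size of its stdout in bytes (assumption:
    only stdout is counted)."""
    result = subprocess.run(list(argv), capture_output=True, check=False)
    return len(result.stdout)


def measure(
    argv: Sequence[str],
    iterations: int = 10,
    runner: Callable[[Sequence[str]], int] = run_bytes,
) -> dict:
    """Compare `argv` against `rtk argv` over several iterations."""
    raw = [runner(list(argv)) for _ in range(iterations)]
    wrapped = [runner(["rtk", *argv]) for _ in range(iterations)]
    return {
        # Median keeps a single noisy iteration from skewing the figure.
        "savings_pct": savings_pct(median(raw), median(wrapped)),
        # Largest byte spread across iterations, as a determinism check.
        "max_spread_bytes": max(max(raw) - min(raw), max(wrapped) - min(wrapped)),
    }
```

With `rtk` installed, `measure(["git", "status"])` would return the savings percentage and the byte spread for that command in the current directory. The `runner` parameter exists only so the logic can be exercised without `rtk` on the PATH.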

Results

| Category | N | Actual | Claimed | Notes |
|---|---|---|---|---|
| `ls` | 100 | 72% | 80% | Matches well, especially `ls -laR` on larger repos |
| `git-log` | 150 | 98% | 80% | Truncation rather than compression (544K→1.8K) |
| `git-status` | 100 | 46% | 80% | Verbose format compresses; `-s`/`--porcelain` pass through |
| `git-diff` | 100 | 20% | 75% | Full diff compresses somewhat; `--stat` passes through |
| `docker` | 20 | 38% | 80% | `docker ps` = 75%; `docker ps -a` = 0% |
| `tree` | 50 | 4% | 80% | Minimal compression |
| `cat` | 50 | 0% | 70% | Passes through unchanged |
| `grep` | 100 | -0% | 80% | Slightly larger output (RTK overhead) |
| `ruff` | 30 | -1% | 80% | Slightly larger output |

Also tested separately: `pytest --co -q` = 0% (185,962 B unchanged; claimed savings of 90%).

The gap in the aggregate claim

The README's savings table includes several categories that show no compression in practice. These account for over half the claimed total:

| Category | Claimed saved/session | Measured |
|---|---|---|
| cat/read (20x) | 28,000 | 0% |
| grep/rg (8x) | 12,800 | -0% |
| pytest (4x) | 7,200 | 0% |
| ruff (3x) | 2,400 | slightly negative |
| Subtotal | 50,400 (54% of total claim) | ~0 |
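The subtotal can be checked with a few lines of arithmetic. The sketch below assumes that "(20x)" and similar mean invocations per session, which the table does not state explicitly. The implied total is only approximate because the 54% figure is rounded.

```python
# (invocations per session, claimed units saved per session), from the table above.
# Reading "(20x)" as invocations per session is an assumption.
claimed = {
    "cat/read": (20, 28_000),
    "grep/rg": (8, 12_800),
    "pytest": (4, 7_200),
    "ruff": (3, 2_400),
}

# Sum of claimed savings for categories that measured ~0% in the benchmark.
subtotal = sum(saved for _, saved in claimed.values())

# Claimed savings per single invocation of each command.
per_call = {name: saved / calls for name, (calls, saved) in claimed.items()}

# Total claim implied by "54% of total claim" (approximate: 54% is rounded).
implied_total = subtotal / 0.54

print(subtotal)              # 50400
print(per_call)              # cat/read 1400.0, grep/rg 1600.0, pytest 1800.0, ruff 800.0
print(round(implied_total))  # 93333
```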

The overall 72% grand total in our benchmark is real, but it's driven almost entirely by git log truncation and ls -laR on a large repo — not broad compression across all categories.
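To illustrate why a grand total can look much better than the typical category, the sketch below contrasts an unweighted mean with a byte-weighted total. It assumes the 72% figure is computed over total bytes, which is not stated above. The `table_actual` values come from the results table, and the 544K→1.8K sizes come from the `git-log` row. The `cat` and `grep` sizes in `ops` are made up for illustration.

```python
from statistics import mean


def pct_saved(raw: int, wrapped: int) -> float:
    """Percentage of bytes saved for one operation or one total."""
    return 100.0 * (raw - wrapped) / raw


# "Actual" column of the results table, one value per category.
table_actual = [72, 98, 46, 20, 38, 4, 0, 0, -1]
unweighted_table_mean = mean(table_actual)  # about 30.8, far below 72

# Illustrative ops: (name, raw bytes, bytes after rtk).
# Only the git log sizes are taken from the table; the rest are hypothetical.
ops = [
    ("git log", 544_000, 1_800),
    ("cat", 20_000, 20_000),
    ("grep", 5_000, 5_010),
]

total_raw = sum(raw for _, raw, _ in ops)
total_wrapped = sum(wrapped for _, _, wrapped in ops)

# One large truncated output dominates the byte-weighted figure...
byte_weighted = pct_saved(total_raw, total_wrapped)
# ...while the per-operation mean shows most commands save nothing.
per_op_mean = mean([pct_saved(raw, wrapped) for _, raw, wrapped in ops])
```

Here `byte_weighted` comes out above 95% while `per_op_mean` is about 33%. This is the same pattern as a 72% grand total alongside categories that measure near zero.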

Environment & Repos

RTK 0.33.1, Linux 6.8.0-1044-azure x86_64, Bash 5.2.37
Direct CLI (command vs rtk command) — no CC hook involved

| Repo | Files | Size | Language |
|---|---|---|---|
| dotfiles | 83 | 568K | Shell |
| gha-github-mirror-action | 194 | 1.4M | Bash/YAML |
| ralph-template | 1,732 | 12M | Bash/Python |
| so101-biolab-automation | 8,371 | 365M | Python |
| Agents-eval | 74,785 | 8.2G | Python |

Per-repo detail tables and raw output files available on request. The benchmark script (bench.sh) can be shared for independent reproduction.

How we can help

We'd be happy to contribute to closing the gap between claims and reality — either by improving the docs or the tool itself:

README accuracy: We can PR an updated savings table with measured ranges per command, replacing the single-point estimates. Honest numbers build more trust than optimistic ones.
Benchmark script as CI: Our bench.sh could be adapted into a regression test that runs on tagged releases, keeping the README table in sync with actual performance.
Compression for passthrough commands: cat, grep, and tree currently pass through unchanged. If there's interest, we can explore compression strategies for these (e.g., grep result deduplication, tree structure compaction, cat truncation with line-count summaries).
ruff/pytest rewrite rules: These currently add overhead or pass through. Happy to help design rewrite rules — we have deep experience with Python tooling output formats.
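As one possible starting point for the `cat` idea above, here is a minimal sketch of truncation with a line-count summary. It is written in Python for brevity and is not RTK's implementation. The function name, the default `head`/`tail` values, and the marker format are all hypothetical.

```python
def truncate_with_summary(text: str, head: int = 20, tail: int = 5) -> str:
    """Keep the first `head` and last `tail` lines, replacing the middle
    with a one-line summary of how many lines were dropped."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        # Short files pass through unchanged.
        return text
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted, {len(lines)} total] ..."
    # Slice from an explicit index so tail=0 yields an empty tail.
    kept = lines[:head] + [marker] + lines[len(lines) - tail:]
    return "\n".join(kept) + "\n"
```

For a 100-line file, `truncate_with_summary(text, head=3, tail=2)` returns six lines: the first three, a marker reporting 95 omitted lines, and the last two. Whether dropping the middle of a file is acceptable depends on how the output is consumed, so a real rule would likely need an opt-out.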

RTK does real work for the commands it supports. The headline just outpaced the feature coverage — which is understandable at this growth pace. We'd rather help fix it than just point it out.

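Contributor guide

Investigation approach: Reproduce the tests to verify the benchmark results. Then update the README savings table with measured ranges per command, and consider providing the benchmark script as CI.
Tech stack: Rust
Area: CLI, tooling
Issue type: Investigation
Difficulty: 3
Estimated time: 1-2 days
Activity: Active
Clarity: Clear
Prerequisites: Rust, Git, Benchmarking
Beginner-friendliness: 50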