Repository metrics

Stars: 48,085
PR merge metrics: average time to merge 11d 1h; 45 PRs merged in the last 30 days

Description

Empirical benchmark: 5 repos, 2,100 measurements — how do actual savings compare to claims?

Related: #590, #538, #545, #827

Context

RTK is a genuinely useful tool — ls, docker ps, and verbose git output compress well, and the Rust implementation is fast. Impressive work for a small team shipping at this pace.

This issue isn't about whether RTK has value (it does), but about closing the gap between what the README promises and what users actually experience. We ran a benchmark to quantify that gap — and we'd like to help close it.

Setup

RTK v0.33.1 was tested across 5 repos (83 to 74,785 files; Shell, Bash, and Python), covering 9 command categories with 10 iterations each, in 3 independent runs (2,100 measurements in total, 700 per run). Results are effectively deterministic: within each run, 699 of 700 measurements varied by less than 10 bytes, and cross-run results are byte-identical when repo state is unchanged.
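The per-operation saving reported below can be sketched as a byte comparison between a command's output and the output of the same command prefixed with `rtk`. This is a minimal sketch and not the actual bench.sh: the helper names `output_size`, `savings_pct`, and `measure` are hypothetical, and it assumes savings are computed on stdout bytes.

```python
import subprocess


def output_size(argv):
    """Return the number of stdout bytes produced by running argv."""
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, check=False
    )
    return len(result.stdout)


def savings_pct(raw_bytes, rtk_bytes):
    """Percentage of bytes saved; negative when the RTK output is larger."""
    if raw_bytes == 0:
        return 0.0  # nothing to compress, so report no saving
    return 100.0 * (raw_bytes - rtk_bytes) / raw_bytes


def measure(argv):
    """Compare `command` with `rtk command` (assumes rtk is on PATH)."""
    raw = output_size(argv)
    compressed = output_size(["rtk", *argv])
    return savings_pct(raw, compressed)
```

For example, `measure(["git", "status"])` would return the saving for one invocation in the current repo. Averaging over several iterations and repos is left out to keep the sketch short.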

Results

Category	N (per run)	Measured savings	Claimed savings	Notes
`ls`	100	72%	80%	Matches well, especially `ls -laR` on larger repos
`git-log`	150	98%	80%	Truncation rather than compression — 544K→1.8K
`git-status`	100	46%	80%	Verbose format compresses; `-s`/`--porcelain` pass through
`git-diff`	100	20%	75%	Full diff compresses somewhat; `--stat` passes through
`docker`	20	38%	80%	`docker ps` = 75%; `docker ps -a` = 0%
`tree`	50	4%	80%	Minimal compression
`cat`	50	0%	70%	Passes through unchanged
`grep`	100	-0%	80%	Slightly larger output (RTK overhead)
`ruff`	30	-1%	80%	Slightly larger output

Also tested separately: pytest --co -q showed 0% savings (185,962 B unchanged; claimed 90%).

The gap in the aggregate claim

The README's savings table includes several categories that show no compression in practice. These account for over half the claimed total:

Category	Claimed savings per session	Measured savings
cat/read (20x)	28,000	0%
grep/rg (8x)	12,800	-0%
pytest (4x)	7,200	0%
ruff (3x)	2,400	slightly negative
Subtotal	50,400 (54% of total claim)	~0
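The subtotal and its share can be rechecked with a few lines of arithmetic. The four claimed figures are copied from the table above. The README's overall total is not stated in this issue, so the sketch back-computes it from the quoted 54% share; treat that value as an approximation.

```python
# Claimed savings per session for the categories that measured ~0%
# (figures copied from the table above).
claimed = {
    "cat/read": 28_000,
    "grep/rg": 12_800,
    "pytest": 7_200,
    "ruff": 2_400,
}

subtotal = sum(claimed.values())

# Assumption: 54% is the share quoted in the table; the total claim is
# derived from it rather than read from the README.
quoted_share = 0.54
implied_total = subtotal / quoted_share

print(f"subtotal: {subtotal:,}")
print(f"implied total claim: about {implied_total:,.0f}")
```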

The overall 72% grand total in our benchmark is real, but it's driven almost entirely by git log truncation and ls -laR on a large repo — not broad compression across all categories.
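To illustrate how a grand total can look strong while most categories barely compress, the sketch below contrasts an unweighted mean of the per-category percentages from the results table with a byte-weighted total. The byte counts passed to `weighted_savings` are illustrative only: the 544,000 to 1,800 pair approximates the git log figure quoted in the results, and the 10,000-byte passthrough is hypothetical.

```python
from statistics import mean

# Measured savings per category, in percent (from the results table;
# grep's "-0%" is entered as 0).
measured = {
    "ls": 72, "git-log": 98, "git-status": 46, "git-diff": 20, "docker": 38,
    "tree": 4, "cat": 0, "grep": 0, "ruff": -1,
}

unweighted = mean(measured.values())


def weighted_savings(pairs):
    """Byte-weighted savings in percent over (raw_bytes, rtk_bytes) pairs."""
    raw = sum(r for r, _ in pairs)
    out = sum(o for _, o in pairs)
    return 100.0 * (raw - out) / raw


# Illustrative inputs: one large truncated output, one hypothetical passthrough.
example = [(544_000, 1_800), (10_000, 10_000)]

print(f"unweighted category mean: {unweighted:.1f}%")
print(f"byte-weighted example: {weighted_savings(example):.1f}%")
```

A single very large output dominates the byte-weighted figure, which is why reporting per-command ranges alongside any grand total gives a clearer picture.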

Environment & Repos

RTK 0.33.1, Linux 6.8.0-1044-azure x86_64, Bash 5.2.37
Direct CLI comparison (command vs rtk command), with no Claude Code (CC) hook involved

Repo	Files	Size	Language
dotfiles	83	568K	Shell
gha-github-mirror-action	194	1.4M	Bash/YAML
ralph-template	1,732	12M	Bash/Python
so101-biolab-automation	8,371	365M	Python
Agents-eval	74,785	8.2G	Python

Per-repo detail tables and raw output files available on request. The benchmark script (bench.sh) can be shared for independent reproduction.

How we can help

We'd be happy to contribute to closing the gap between claims and reality — either by improving the docs or the tool itself:

README accuracy: We can PR an updated savings table with measured ranges per command, replacing the single-point estimates. Honest numbers build more trust than optimistic ones.
Benchmark script as CI: Our bench.sh could be adapted into a regression test that runs on tagged releases, keeping the README table in sync with actual performance.
Compression for passthrough commands: cat, grep, and tree currently pass through unchanged. If there's interest, we can explore compression strategies for these (e.g., grep result deduplication, tree structure compaction, cat truncation with line-count summaries).
ruff/pytest rewrite rules: These currently add overhead or pass through. Happy to help design rewrite rules — we have deep experience with Python tooling output formats.
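The CI idea above could start as a simple comparison of measured against claimed savings. The sketch below is hypothetical and is not part of RTK or bench.sh: the 10-point `TOLERANCE` and the function name `find_gaps` are assumptions, while the numbers are copied from the results table.

```python
# (measured %, claimed %) per category, copied from the results table.
RESULTS = {
    "ls": (72, 80),
    "git-log": (98, 80),
    "git-status": (46, 80),
    "git-diff": (20, 75),
    "docker": (38, 80),
    "tree": (4, 80),
    "cat": (0, 70),
    "grep": (0, 80),
    "ruff": (-1, 80),
}

TOLERANCE = 10  # assumed allowance, in percentage points


def find_gaps(results, tolerance=TOLERANCE):
    """Return categories whose measured savings trail the claim by more than tolerance."""
    return sorted(
        name
        for name, (measured, claimed) in results.items()
        if claimed - measured > tolerance
    )


for name in find_gaps(RESULTS):
    measured, claimed = RESULTS[name]
    print(f"{name}: measured {measured}% vs claimed {claimed}%")
```

In a release pipeline, a non-empty result from `find_gaps` could fail the job or open a reminder to update the README table. The right tolerance is a project decision.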

RTK does real work for the commands it supports. The headline just outpaced the feature coverage — which is understandable at this growth pace. We'd rather help fix it than just point it out.

Contributor guide

Research direction: Verify the benchmark results by reproducing the tests. Then update the savings table in the README with the measured ranges for each command, and consider contributing a benchmark script as CI.
Tech stack: Rust
Domain: CLI tooling
Issue type: Research
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Rust, Git, Benchmarking
Beginner-friendly: 50