perf(tracking): enabled tracking adds ~7ms synchronous SQLite write to every command's hot path · rtk-ai/rtk#2208


Labels: area:performance, enhancement, help wanted, priority:high

Repository metrics

Stars: (48,085 stars)
PR merge metrics: (Avg merge 11d 1h) (45 merged PRs in 30d)

Description

Summary

With tracking enabled, every recorded command (read, ls, grep, git, …) spends ~7 ms in TimedExecution::track() on a synchronous SQLite write, while the actual work (read + filter + output) is <0.05 ms. The write dominates per-call latency on fsync-expensive filesystems.

Adjacent but distinct from existing work:

#1375 — hook-layer cold-start (~56 ms); explicitly scopes standalone-command latency out.
#1987 (open, not merged) and #1176 (open) — both touch this write path but only add ways to disable tracking (tracking.enabled gate / --no-track). Neither addresses why tracking is slow when enabled, which is what this issue is about.

Core claim — reproducible on stock rtk, no source changes

track() is called by read/ls/grep/… but not by --version. Diffing the two isolates the tracking cost from process spawn (stock 0.40.0, best-of-5 × 100):

rtk --version (no track): 0.83 ms
rtk read      (+track)  : 7.97 ms
net track cost          : 7.14 ms
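
The differential measurement above can be sketched roughly as follows. This is not the harness from the gist. It reads "best-of-5 × 100" as five rounds of 100 spawns each, keeping the fastest round's mean, which is an assumption. The helper names (`mean_per_run`, `best_of`, `spawn_once`, `net_cost`) are hypothetical.

```rust
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Mean wall time per invocation over `runs` calls of `op` (`runs` must be > 0).
fn mean_per_run<F: FnMut()>(runs: u32, mut op: F) -> Duration {
    let start = Instant::now();
    for _ in 0..runs {
        op();
    }
    start.elapsed() / runs
}

/// Smallest per-run mean across `rounds` rounds of `runs` invocations each.
fn best_of<F: FnMut()>(rounds: u32, runs: u32, mut op: F) -> Duration {
    (0..rounds)
        .map(|_| mean_per_run(runs, &mut op))
        .min()
        .expect("rounds must be > 0")
}

/// Spawn `program` with `args` once and wait for it, discarding its output.
fn spawn_once(program: &str, args: &[&str]) {
    let _ = Command::new(program)
        .args(args)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status();
}

/// Net tracking cost: tracked command minus the untracked baseline.
fn net_cost(tracked: Duration, baseline: Duration) -> Duration {
    tracked.saturating_sub(baseline)
}
```

A caller would measure `best_of(5, 100, || spawn_once("rtk", &["--version"]))` as the baseline, do the same for a tracked command such as `rtk read` on some file, and pass both results to `net_cost`. Taking the minimum across rounds is one common way to suppress scheduler noise; it does not remove systematic costs such as fsync.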

PRAGMA sweep — requires the patch in the gist

Net track cost (read − --version) across journal/sync settings, via an RTK_JOURNAL/RTK_SYNC env switch added to Tracker::new():

journal_mode  synchronous     net track cost  DB-corruption risk
WAL           FULL (current)  7.26 ms         none
WAL           NORMAL          6.77 ms         none
DELETE        NORMAL          4.53 ms         none
TRUNCATE      NORMAL          4.48 ms         none
MEMORY        OFF             0.29 ms         corruption on crash/power-loss
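
For readers who do not want to open the gist, an env switch of this kind could look roughly like the sketch below. It is not the actual patch: only the variable names RTK_JOURNAL and RTK_SYNC and the WAL/FULL default come from this issue, while the function names and the allow-list approach are assumptions. The returned string would be passed to whatever batch-execute call the SQLite binding used by Tracker::new() provides.

```rust
/// Allow-listed values; anything else falls back to the default.
const JOURNAL_MODES: [&str; 4] = ["WAL", "DELETE", "TRUNCATE", "MEMORY"];
const SYNC_MODES: [&str; 3] = ["FULL", "NORMAL", "OFF"];

/// Normalize `raw` and return the matching allow-listed value, else `default`.
fn pick<'a>(raw: Option<&str>, allowed: &[&'a str], default: &'a str) -> &'a str {
    raw.map(|s| s.trim().to_ascii_uppercase())
        .and_then(|s| allowed.iter().copied().find(|a| *a == s))
        .unwrap_or(default)
}

/// Build the PRAGMA batch for a journal/sync pair (defaults: WAL + FULL).
fn pragma_batch(journal: Option<&str>, sync: Option<&str>) -> String {
    let j = pick(journal, &JOURNAL_MODES, "WAL");
    let s = pick(sync, &SYNC_MODES, "FULL");
    format!("PRAGMA journal_mode={j}; PRAGMA synchronous={s};")
}

/// Read the two switches from the environment (hypothetical entry point).
fn pragma_batch_from_env() -> String {
    let j = std::env::var("RTK_JOURNAL").ok();
    let s = std::env::var("RTK_SYNC").ok();
    pragma_batch(j.as_deref(), s.as_deref())
}
```

Because only allow-listed values are ever interpolated, an unexpected environment value cannot inject SQL into the PRAGMA string; it silently falls back to the current behaviour.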

In-process Instant probing puts ~all of track()'s time in the SQLite connection-open + INSERT-commit path; read/filter/output stay <0.05 ms every run.
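
A minimal sketch of what such Instant probing might look like, assuming a small helper that is marked after each phase. The `Probe` type and the phase names in the usage note are hypothetical, not rtk code.

```rust
use std::time::{Duration, Instant};

/// Records the elapsed time between consecutive `mark` calls.
struct Probe {
    last: Instant,
    phases: Vec<(&'static str, Duration)>,
}

impl Probe {
    fn start() -> Self {
        Probe { last: Instant::now(), phases: Vec::new() }
    }

    /// Close the current phase under `name` and start timing the next one.
    fn mark(&mut self, name: &'static str) {
        let now = Instant::now();
        self.phases.push((name, now - self.last));
        self.last = now;
    }

    /// Sum of all recorded phases.
    fn total(&self) -> Duration {
        self.phases.iter().map(|(_, d)| *d).sum()
    }

    /// Fraction of the total spent in phases called `name` (0.0 if none).
    fn share(&self, name: &str) -> f64 {
        let total = self.total().as_secs_f64();
        if total == 0.0 {
            return 0.0;
        }
        let phase: Duration = self
            .phases
            .iter()
            .filter(|(n, _)| *n == name)
            .map(|(_, d)| *d)
            .sum();
        phase.as_secs_f64() / total
    }
}
```

In a command's run path one would call `mark("read")`, `mark("filter")`, `mark("output")` and `mark("track")` after the respective steps, then print `share("track")` to stderr to see how much of the in-process time the tracking write accounts for.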

Repro harness + patch: https://gist.github.com/coseto6125/19855fc80e113bd5772dbb2fef263077

Notes / caveats

Measured on WSL2, where fsync is unusually expensive. On native Linux / APFS the absolute numbers are likely lower — independent reproduction on a native FS would be valuable before treating 7 ms as universal.
Per-call cost is independent of DB size (22 MB vs a fresh 52 KB DB → same ~8 ms wall), so it's the per-call open→write→sync cycle, not query/scan cost.
The persisted data (token-savings stats) is non-critical and reconstructible — rtk reads none of it for correctness.

Why config/PRAGMA tuning isn't enough

synchronous=NORMAL is within noise (7.26 ms → 6.77 ms under WAL). The only setting reaching sub-millisecond cost (MEMORY+OFF) risks DB corruption and is unacceptable. Safe PRAGMA changes cap at a ~38% reduction (7.26 ms → 4.48 ms with TRUNCATE + NORMAL).

Suggested direction

The order-of-magnitude win needs an architectural change, not a PRAGMA: keep the synchronous disk write off the hot path, e.g. buffer to an append-only file and batch-import on rtk gain, hand the write to a background writer thread, or amortize one connection across a session. Since #1987 / #1176 already refactor record() / track(), folding batched/async writes into that work would close this in the same pass.
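
To make the first option concrete, here is a minimal sketch of the append-only buffer plus batch import, using only the standard library. Everything in it is an assumption for illustration: the `Record` fields, the tab-separated line format, the `.importing` staging name and the function names are hypothetical, and the `import` callback stands in for a single-transaction SQLite insert.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical record shape; the real tracked fields may differ.
#[derive(Debug, PartialEq)]
struct Record {
    command: String,
    input_tokens: u64,
    output_tokens: u64,
}

/// Hot path: append one tab-separated line. No SQLite, no explicit fsync.
fn append_record(log: &Path, command: &str, input_tokens: u64, output_tokens: u64) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(log)?;
    // Keep the record on one line so it can be parsed back unambiguously.
    let safe = command.replace(&['\t', '\n'][..], " ");
    let line = format!("{safe}\t{input_tokens}\t{output_tokens}\n");
    file.write_all(line.as_bytes())
}

/// Parse one line; malformed lines yield None and are skipped by the caller.
fn parse_line(line: &str) -> Option<Record> {
    let mut parts = line.split('\t');
    let command = parts.next()?.to_string();
    let input_tokens = parts.next()?.parse().ok()?;
    let output_tokens = parts.next()?.parse().ok()?;
    Some(Record { command, input_tokens, output_tokens })
}

/// Cold path (e.g. at the start of `rtk gain`): move the log aside, parse it,
/// and hand the whole batch to `import`. Returns the number of records imported.
fn drain_records<F>(log: &Path, import: F) -> io::Result<usize>
where
    F: FnOnce(&[Record]) -> io::Result<()>,
{
    // Rename first so later appends go to a fresh log while we import.
    let staging = log.with_extension("importing");
    match fs::rename(log, &staging) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(0),
        Err(e) => return Err(e),
    }
    let text = fs::read_to_string(&staging)?;
    let records: Vec<Record> = text.lines().filter_map(parse_line).collect();
    import(&records)?;
    // Delete the staging file only after the import succeeded.
    fs::remove_file(&staging)?;
    Ok(records.len())
}
```

The sketch trades durability for latency: a crash can lose the most recent lines, a writer that opened the log just before the rename can lose a record, and a staging file left behind by a failed import would be overwritten on the next drain. That seems acceptable only because the stats are described above as non-critical; whether the append is actually cheaper than the SQLite write on a given filesystem would still need to be measured.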

Contributor guide

Research direction: Profile the synchronous SQLite write in track() and explore batching, async writes, or an append-only file to move the write off the hot path.
Tech stack: rust
Domain: cli
Issue type: Performance
Difficulty: 4
Estimated time: 3-5 days
Activity status: Active
Clarity: Clear
Prerequisites: Rust, SQLite
Newbie friendliness: 30