rtk-ai/rtk

feat(rewrite): canonical command digests — equivalence-aware hashing for dedup and caching

Open

#1054 opened on Apr 6, 2026

Labels: area:cli, effort-large, enhancement, help wanted, priority:medium

Description

Problem

rtk rewrite normalizes command names but doesn't normalize flags or produce structured output. This means semantically equivalent commands get different representations:

$ rtk rewrite "grep -rn pattern src/"
rtk grep -rn pattern src/

$ rtk rewrite "rg -n pattern src/"
rtk grep -n pattern src/
# Different strings despite being semantically identical

Same for:

  • git log --oneline vs git log --pretty=oneline --abbrev-commit
  • cat foo.txt vs head foo.txt vs tail foo.txt (all just "read file")

Proposed Feature

Add rtk canonicalize (or extend rtk rewrite --format json) that outputs a structured canonical form with a deterministic digest. Equivalent commands produce the same digest.

$ rtk canonicalize "grep -rn pattern src/"
{
  "tool": "grep",
  "flags": {"line-number": ""},
  "args": ["pattern", "src/"],
  "digest": "deaa1527537114cf"
}

$ rtk canonicalize "rg -n pattern src/"
{
  "tool": "grep",
  "flags": {"line-number": ""},
  "args": ["pattern", "src/"],
  "digest": "deaa1527537114cf"
}
# Same digest as the grep invocation above
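
For discussion, here is a minimal Rust sketch of how the structured form and its digest could fit together. The `Canonical` struct, its `encode` layout, and the field separators are hypothetical, not existing rtk code. The digest uses 64-bit FNV-1a purely as a std-only stand-in so the sketch has no dependencies; the proof of concept uses SHA-256, so the values produced here will not match the digests quoted in this issue.

```rust
use std::collections::BTreeMap;

/// Hypothetical canonical form mirroring the JSON fields shown above.
#[derive(Debug, Clone, PartialEq)]
struct Canonical {
    tool: String,
    // BTreeMap iterates in key order, so the order in which flags were
    // written on the command line cannot influence the encoding.
    flags: BTreeMap<String, String>,
    args: Vec<String>,
}

impl Canonical {
    /// Deterministic text encoding: tool, then sorted flags, then args,
    /// separated by control characters unlikely to appear in shell tokens.
    fn encode(&self) -> String {
        let mut out = String::new();
        out.push_str(&self.tool);
        for (key, value) in &self.flags {
            out.push('\x1f');
            out.push_str(key);
            out.push('=');
            out.push_str(value);
        }
        for arg in &self.args {
            out.push('\x1e');
            out.push_str(arg);
        }
        out
    }

    /// 64-bit FNV-1a rendered as 16 hex characters.
    /// Stand-in only: swap in SHA-256 for a real implementation.
    fn digest(&self) -> String {
        let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
        for byte in self.encode().bytes() {
            hash ^= byte as u64;
            hash = hash.wrapping_mul(0x100000001b3); // FNV prime
        }
        format!("{:016x}", hash)
    }
}
```

Because the flags live in a sorted map, two commands that differ only in flag order encode to the same string and therefore share a digest.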

Normalization rules

  • Tool aliases: cat/head/tail → read, rg/ag → grep, fd → find
  • Flag canonicalization: short → long form (-n → --line-number), sorted by key
  • Combined flag expansion: -rn → -r + -n
  • Tool-specific: grep -r stripped (canonical grep is recursive), git log --oneline → --format=oneline
  • Sensitive masking: API keys and long tokens replaced with [MASKED]
  • Chain/pipe decomposition: && and | parsed into separate canonical segments
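
The alias, flag-expansion, and tool-specific rules above could be sketched in Rust roughly as follows. This assumes the input is already tokenized and that every short flag is a boolean switch. The `long_flag` table is hypothetical: only `-n → --line-number` comes from this issue, and the other entries are illustrative. The `--` end-of-options marker and flags that take a separate value are not handled.

```rust
use std::collections::BTreeMap;

/// Map a tool alias to its canonical name (aliases from the list above).
fn canonical_tool(name: &str) -> &str {
    match name {
        "cat" | "head" | "tail" => "read",
        "rg" | "ag" => "grep",
        "fd" => "find",
        other => other,
    }
}

/// Hypothetical short-to-long table for the canonical grep tool.
fn long_flag(tool: &str, short: char) -> Option<&'static str> {
    match (tool, short) {
        ("grep", 'n') => Some("line-number"),
        ("grep", 'r') => Some("recursive"),
        ("grep", 'i') => Some("ignore-case"),
        _ => None,
    }
}

/// Normalize tokens into (canonical tool, sorted flags, positional args).
fn normalize(tokens: &[&str]) -> (String, BTreeMap<String, String>, Vec<String>) {
    let tool = canonical_tool(tokens.first().copied().unwrap_or("")).to_string();
    let mut flags: BTreeMap<String, String> = BTreeMap::new();
    let mut args: Vec<String> = Vec::new();
    for tok in tokens.iter().skip(1) {
        if let Some(long) = tok.strip_prefix("--") {
            // Long flag, optionally written as --key=value.
            let (key, value) = long.split_once('=').unwrap_or((long, ""));
            flags.insert(key.to_string(), value.to_string());
        } else if let Some(shorts) = tok.strip_prefix('-') {
            // Combined short flags: -rn expands to -r and -n.
            for c in shorts.chars() {
                let key = long_flag(&tool, c)
                    .map(str::to_string)
                    .unwrap_or_else(|| c.to_string());
                flags.insert(key, String::new());
            }
        } else {
            args.push(tok.to_string());
        }
    }
    // Tool-specific rule: canonical grep is recursive, so drop the flag.
    if tool == "grep" {
        flags.remove("recursive");
    }
    (tool, flags, args)
}
```

With these rules, `normalize(&["grep", "-rn", "pattern", "src/"])` and `normalize(&["rg", "-n", "pattern", "src/"])` return equal tuples, which is the property the digest relies on.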

Use Cases

  1. Caching: Same digest = same command = cache hit. Agents re-reading the same file across turns get instant results.
  2. Telemetry dedup: Group execution events by canonical digest instead of raw strings. "How often do agents run grep?" works across rg/ag/grep variants.
  3. Loop detection: Two semantically identical commands with different syntax get the same fingerprint, catching loops that raw string comparison misses.
  4. Compression routing: Knowing the canonical tool lets you pick the right RTK filter even for aliased commands.
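
To make the caching and loop-detection use cases concrete, here is a small Rust sketch keyed on the digest string. `DigestCache`, `get_or_run`, and `looks_like_loop` are hypothetical names, and cache invalidation (for example, when a file changes between turns) is deliberately left out.

```rust
use std::collections::HashMap;

/// Cache keyed by canonical digest rather than by the raw command string.
#[derive(Default)]
struct DigestCache {
    outputs: HashMap<String, String>,
    seen: HashMap<String, u32>,
}

impl DigestCache {
    /// Return the cached output for a digest, running `exec` only on a miss.
    /// Every call is counted, whether it hits or misses.
    fn get_or_run<F: FnOnce() -> String>(&mut self, digest: &str, exec: F) -> String {
        *self.seen.entry(digest.to_string()).or_insert(0) += 1;
        self.outputs
            .entry(digest.to_string())
            .or_insert_with(exec)
            .clone()
    }

    /// Report a possible loop once a digest has been requested
    /// at least `threshold` times.
    fn looks_like_loop(&self, digest: &str, threshold: u32) -> bool {
        self.seen.get(digest).copied().unwrap_or(0) >= threshold
    }
}
```

Since `grep -rn X .` and `rg -n X .` would arrive here with the same digest, the second call is a cache hit and also counts toward the loop threshold, which raw string comparison would miss.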

Proof of Concept

We built this in Go as the canon package in the Chitin kernel. Working equivalence tests:

cat foo.txt  ≡ head foo.txt  ≡ tail foo.txt   → digest 95d0e907bc6c155e
grep -rn X . ≡ rg -n X .                      → digest deaa1527537114cf
git log --oneline ≡ git log --pretty=oneline   → digest bd750a13fbce75f7

14 tests covering equivalence classes, chain/pipe parsing, env var prefixes, sensitive masking, and JSON round-tripping.
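
As a rough idea of what a Rust port of the chain/pipe, env-prefix, and masking pieces might look like, here is a simplified sketch. All three function names are hypothetical, quoting is ignored (a real tokenizer must respect it), `||` is not distinguished from `|`, and the 20-character masking threshold is an assumption rather than the rule used in the Go package, so long file names could be masked by mistake.

```rust
/// Split a command line on `&&` and `|` into trimmed, non-empty segments.
fn split_segments(line: &str) -> Vec<String> {
    line.split("&&")
        .flat_map(|part| part.split('|'))
        .map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
        .collect()
}

/// Separate leading NAME=value assignments from the command tokens.
/// Assignments are only recognized before the first command token.
fn strip_env_prefix(segment: &str) -> (Vec<String>, Vec<String>) {
    let mut env: Vec<String> = Vec::new();
    let mut cmd: Vec<String> = Vec::new();
    for tok in segment.split_whitespace() {
        let is_assignment = cmd.is_empty()
            && tok.split_once('=').map_or(false, |(name, _)| {
                !name.is_empty()
                    && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
            });
        if is_assignment {
            env.push(tok.to_string());
        } else {
            cmd.push(tok.to_string());
        }
    }
    (env, cmd)
}

/// Replace long token-like arguments with [MASKED].
fn mask_sensitive(token: &str) -> String {
    let token_like = token.len() >= 20
        && token
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    if token_like {
        "[MASKED]".to_string()
    } else {
        token.to_string()
    }
}
```

Each segment returned by `split_segments` would then be canonicalized on its own, so a pipeline yields one canonical form per stage.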

Happy to port to Rust if there's interest. The core is ~400 lines: tokenizer + flag expander + tool alias map + normalizer + SHA-256 digest.

Relation to Existing Issues

  • Extends #154 (migrate rewrite to Rust) with structured output
  • Addresses part of #820 (rewrite normalization) at the flag level
  • Complements #569 (distill/compress) since canonical tool knowledge enables schema-aware compression
