Prompt Replay Cache to Reduce Redundant Model Calls · google-gemini/gemini-cli#21570

Labels: area/agent, help wanted, kind/enhancement, kind/feature, priority/p3, status/bot-triaged

Description

What would you like to be added?

Introduce a Prompt Replay Cache mechanism that stores responses for previously executed prompts and reuses them when the same prompt is issued again within the same project context.

Currently, identical prompts always trigger a new model request even if the same prompt was executed moments earlier. A caching layer would allow the CLI to check if a prompt has already been processed and return the cached response instead of making another API call.

Proposed workflow:

User Prompt
→ Check local cache
→ If cached response exists → return cached result
→ If not → call the model → store response in cache

Example cache structure:

.cache/prompt-cache.json

Each entry could store:

prompt hash
original prompt
response
timestamp
project path (optional for project scoping)
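
As a rough illustration of the entry fields above, the cache file could be modeled as a JSON map from prompt hash to entry. This is only a sketch: the names `PromptCacheEntry`, `PromptCacheFile`, `loadCache`, and `saveCache` are hypothetical and do not exist in gemini-cli, and storing the timestamp as epoch milliseconds is an assumption.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical entry shape mirroring the fields listed in the issue.
interface PromptCacheEntry {
  promptHash: string;
  prompt: string;
  response: string;
  timestamp: number; // assumed: milliseconds since the Unix epoch
  projectPath?: string; // optional, for project scoping
}

// Assumed file layout: a map from prompt hash to entry.
type PromptCacheFile = Record<string, PromptCacheEntry>;

function loadCache(file: string): PromptCacheFile {
  if (!existsSync(file)) return {};
  try {
    return JSON.parse(readFileSync(file, "utf8")) as PromptCacheFile;
  } catch {
    // A corrupt or partially written cache file is treated as empty.
    return {};
  }
}

function saveCache(file: string, cache: PromptCacheFile): void {
  // Create the cache directory (e.g. .cache/) if it does not exist yet.
  mkdirSync(dirname(file), { recursive: true });
  writeFileSync(file, JSON.stringify(cache, null, 2), "utf8");
}
```

Treating an unreadable file as an empty cache keeps a damaged cache from blocking a prompt; the worst case is one extra model call.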

This would be implemented as a lightweight cache layer before the model invocation step.

Why is this needed?

Currently, repeated prompts cause repeated API calls, which leads to:

Increased latency for users
Unnecessary API usage
Higher operational costs
Repeated computation for identical queries

During development workflows, users often repeat prompts such as:

explaining a file
summarizing code
debugging errors

A prompt replay cache would significantly improve the developer experience by making repeated interactions faster while reducing unnecessary load on the model API.

This feature also helps improve CLI responsiveness during iterative workflows.

Additional context

Possible implementation considerations:

Generate a hash from the prompt + project directory to uniquely identify cache entries
Store cache files in a local directory (e.g., .cache/)
Implement cache expiration (TTL) or LRU eviction to prevent unbounded growth
Optionally log cache hits/misses in debug mode
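
The considerations above could be combined roughly as follows. This is a minimal in-memory sketch, not a proposed final design: `cacheKey` and `PromptCache` are hypothetical names, SHA-256 is one possible hash, and the default limits (500 entries, one hour TTL) are placeholder assumptions.

```typescript
import { createHash } from "node:crypto";

// Hash the project path and prompt together. The NUL separator keeps
// different (path, prompt) pairs from concatenating to the same input.
function cacheKey(prompt: string, projectPath: string): string {
  return createHash("sha256")
    .update(projectPath)
    .update("\0")
    .update(prompt)
    .digest("hex");
}

interface Entry {
  response: string;
  timestamp: number;
}

// Hypothetical store with TTL expiry and LRU eviction. A Map iterates in
// insertion order, so its first key is the least recently used entry.
class PromptCache {
  private entries = new Map<string, Entry>();
  private maxEntries: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxEntries = 500, ttlMs = 60 * 60 * 1000, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() - entry.timestamp > this.ttlMs) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.response;
  }

  set(key: string, response: string): void {
    this.entries.delete(key);
    this.entries.set(key, { response, timestamp: this.now() });
    // Evict least recently used entries until the size limit is met.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }
}
```

The injectable `now` function exists only to make expiry testable without waiting for real time to pass.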

Example pseudo-code:

const key = hash(prompt + projectPath);

if (cache.has(key)) {
  return cache.get(key);
}

const response = await callModel(prompt);
cache.set(key, response);
return response;
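
A runnable version of this pseudo-code might wrap the model call in a caching function. The sketch below is an assumption-laden illustration: `withPromptCache` and `ModelCall` are hypothetical names, `callModel` stands in for whatever function actually invokes the model, and the cache here lives in memory only.

```typescript
import { createHash } from "node:crypto";

// Placeholder type for the function that actually calls the model.
type ModelCall = (prompt: string) => Promise<string>;

// Wraps a model call so that repeated prompts within the same project
// are answered from the cache instead of triggering a new request.
function withPromptCache(
  callModel: ModelCall,
  projectPath: string,
  debug = false,
): ModelCall {
  const cache = new Map<string, string>();

  return async (prompt: string): Promise<string> => {
    const key = createHash("sha256")
      .update(`${projectPath}\0${prompt}`)
      .digest("hex");

    const cached = cache.get(key);
    if (cached !== undefined) {
      if (debug) console.error(`[prompt-cache] hit ${key.slice(0, 8)}`);
      return cached;
    }

    if (debug) console.error(`[prompt-cache] miss ${key.slice(0, 8)}`);
    const response = await callModel(prompt);
    cache.set(key, response);
    return response;
  };
}
```

With this shape, a caller would build the wrapper once, for example `const cachedCall = withPromptCache(callModel, process.cwd(), true)`, and use `cachedCall(prompt)` wherever the model was previously called directly.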

This feature would be relatively lightweight to implement and could significantly improve performance for repeated prompts.

Contributor guide

Research direction: Implement a caching layer that stores and retrieves model responses based on a hash of the prompt and project path. Use a local cache file with optional TTL.
Tech stack: TypeScript, Node.js
Domain: cli
Issue type: Feature
Difficulty: 3
Estimated time: Half day
Activity status: Active
Clarity: Clear
Prerequisites: Node.js, Git
Newbie friendliness: 80