google-gemini/gemini-cli

Prompt Replay Cache to Reduce Redundant Model Calls

Open

#21570 opened on Mar 7, 2026

View on GitHub
 (4 comments) (0 reactions) (0 assignees)TypeScript (103,992 stars) (13,657 forks)batch import
area/agenthelp wantedkind/enhancementkind/featurepriority/p3status/bot-triaged

Description

What would you like to be added?

Introduce a Prompt Replay Cache mechanism that stores responses for previously executed prompts and reuses them when the same prompt is issued again within the same project context.

Currently, identical prompts always trigger a new model request even if the same prompt was executed moments earlier. A caching layer would allow the CLI to check if a prompt has already been processed and return the cached response instead of making another API call.

Proposed workflow:

User Prompt
→ Check local cache
→ If cached response exists → return cached result
→ If not → call the model → store response in cache

Example cache structure:

.cache/ prompt-cache.json

Each entry could store:

  • prompt hash
  • original prompt
  • response
  • timestamp
  • project path (optional for project scoping)

This would be implemented as a lightweight cache layer before the model invocation step.

Why is this needed?

Currently, repeated prompts cause repeated API calls, which leads to:

  • Increased latency for users
  • Unnecessary API usage
  • Higher operational costs
  • Repeated computation for identical queries

During development workflows, users often repeat prompts such as:

  • explaining a file
  • summarizing code
  • debugging errors

A prompt replay cache would significantly improve the developer experience by making repeated interactions faster while reducing unnecessary load on the model API.

This feature also helps improve CLI responsiveness during iterative workflows.

Additional context

Possible implementation considerations:

  • Generate a hash from the prompt + project directory to uniquely identify cache entries
  • Store cache files in a local directory (e.g., .cache/)
  • Implement cache expiration (TTL) or LRU eviction to prevent unbounded growth
  • Optionally log cache hits/misses in debug mode

Example pseudo-code:

const key = hash(prompt + projectPath);

if (cache.has(key)) { return cache.get(key); }

const response = await callModel(prompt); cache.set(key, response);

This feature would be relatively lightweight to implement and could significantly improve performance for repeated prompts.

Contributor guide