google-gemini/gemini-cli

Prompt Replay Cache to Reduce Redundant Model Calls

Open

#21570 opened on March 7, 2026

4 comments · 0 reactions · 0 assignees · TypeScript · 103,992 stars · 13,657 forks
Labels: area/agent, help wanted, kind/enhancement, kind/feature, priority/p3, status/bot-triaged

Description

What would you like to be added?

Introduce a Prompt Replay Cache mechanism that stores responses for previously executed prompts and reuses them when the same prompt is issued again within the same project context.

Currently, identical prompts always trigger a new model request even if the same prompt was executed moments earlier. A caching layer would allow the CLI to check if a prompt has already been processed and return the cached response instead of making another API call.

Proposed workflow:

User Prompt
→ Check local cache
→ If cached response exists → return cached result
→ If not → call the model → store response in cache
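A minimal TypeScript sketch of the workflow above, assuming the model call can be passed in as an async function. `ModelCall`, `ReplayCache`, and `ask` are hypothetical names, not part of gemini-cli's API; for brevity this cache lives in memory and is keyed by the raw prompt, with no project scoping or persistence.

```typescript
// Hypothetical signature for a model call; not gemini-cli's actual client API.
type ModelCall = (prompt: string) => Promise<string>;

// Read-through cache: look up the prompt first, call the model only on a miss,
// and store the response so the next identical prompt is answered locally.
class ReplayCache {
  private readonly entries = new Map<string, string>();
  private readonly callModel: ModelCall;
  hits = 0;
  misses = 0;

  constructor(callModel: ModelCall) {
    this.callModel = callModel;
  }

  async ask(prompt: string): Promise<string> {
    const cached = this.entries.get(prompt);
    if (cached !== undefined) {
      this.hits++; // cache hit: no API request
      return cached;
    }
    this.misses++; // cache miss: fall through to the model
    const response = await this.callModel(prompt);
    this.entries.set(prompt, response);
    return response;
  }
}
```

Counting hits and misses inside the wrapper also gives a natural place for the optional debug-mode logging of cache hits/misses.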

Example cache structure:

.cache/prompt-cache.json

Each entry could store:

  • prompt hash
  • original prompt
  • response
  • timestamp
  • project path (optional for project scoping)
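One possible shape for an entry with the fields listed above, sketched in TypeScript with Node's built-in `node:crypto` module. The field names, `promptHash`, `makeEntry`, and `toJson` are illustrative assumptions. The key hashes the project path and the prompt with a NUL separator in between; since file paths cannot contain NUL, two different (path, prompt) pairs cannot produce the same hashed string.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry layout mirroring the fields listed above.
interface PromptCacheEntry {
  promptHash: string;
  prompt: string;
  response: string;
  timestamp: number; // milliseconds since the Unix epoch
  projectPath?: string; // optional, for project scoping
}

// SHA-256 over "<projectPath>\0<prompt>"; the NUL separator keeps
// ("a", "bc") and ("ab", "c") from hashing to the same key.
function promptHash(prompt: string, projectPath = ""): string {
  return createHash("sha256").update(`${projectPath}\0${prompt}`).digest("hex");
}

function makeEntry(prompt: string, response: string, projectPath?: string): PromptCacheEntry {
  return {
    promptHash: promptHash(prompt, projectPath),
    prompt,
    response,
    timestamp: Date.now(),
    projectPath,
  };
}

// Serialize entries keyed by hash, e.g. as the contents of .cache/prompt-cache.json.
function toJson(entries: PromptCacheEntry[]): string {
  const byHash: Record<string, PromptCacheEntry> = {};
  for (const e of entries) byHash[e.promptHash] = e;
  return JSON.stringify(byHash, null, 2);
}
```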

This would be implemented as a lightweight cache layer before the model invocation step.

Why is this needed?

Currently, repeated prompts cause repeated API calls, which leads to:

  • Increased latency for users
  • Unnecessary API usage
  • Higher operational costs
  • Repeated computation for identical queries

During development workflows, users often repeat prompts such as:

  • explaining a file
  • summarizing code
  • debugging errors

A prompt replay cache would make these repeated interactions faster, since a cache hit returns without a network round trip, and would reduce unnecessary load on the model API.

This feature also helps improve CLI responsiveness during iterative workflows.

Additional context

Possible implementation considerations:

  • Generate a hash from the prompt + project directory to uniquely identify cache entries
  • Store cache files in a local directory (e.g., .cache/)
  • Implement cache expiration (TTL) or LRU eviction to prevent unbounded growth
  • Optionally log cache hits/misses in debug mode
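A sketch combining the TTL and LRU policies, assuming in-process storage and an injectable clock for testing. `TtlLruCache`, `maxEntries`, and `ttlMs` are hypothetical names. It relies on JavaScript's `Map` iterating keys in insertion order: re-inserting a key on every read moves it to the end, so the first key is always the least recently used.

```typescript
// Hypothetical cache with time-based expiry (TTL) and size-bounded LRU eviction.
class TtlLruCache<V> {
  private readonly map = new Map<string, { value: V; storedAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.map.get(key);
    if (hit === undefined) return undefined;
    if (this.now() - hit.storedAt > this.ttlMs) {
      this.map.delete(key); // expired: treat as a miss
      return undefined;
    }
    this.map.delete(key); // move to the most-recently-used position
    this.map.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, { value, storedAt: this.now() });
    // Evict from the front (least recently used) until within the size bound.
    for (const oldest of this.map.keys()) {
      if (this.map.size <= this.maxEntries) break;
      this.map.delete(oldest);
    }
  }

  get size(): number {
    return this.map.size;
  }
}
```

Expiry here is measured from write time rather than last read, so even frequently reused entries are eventually refreshed from the model.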

Example pseudo-code:

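const key = hash(`${projectPath}\0${prompt}`); // separator keeps (path, prompt) pairs distinct

const cached = cache.get(key);
if (cached !== undefined) { return cached; } // hit: skip the API call

const response = await callModel(prompt); // miss: call the model
cache.set(key, response);
return response;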

This feature would be relatively lightweight to implement and could significantly improve performance for repeated prompts.

