Prompt Replay Cache to Reduce Redundant Model Calls
#21570 opened on Mar 7, 2026
Description
What would you like to be added?
Introduce a Prompt Replay Cache mechanism that stores responses for previously executed prompts and reuses them when the same prompt is issued again within the same project context.
Currently, identical prompts always trigger a new model request even if the same prompt was executed moments earlier. A caching layer would allow the CLI to check if a prompt has already been processed and return the cached response instead of making another API call.
Proposed workflow:
User Prompt
→ Check local cache
→ If cached response exists → return cached result
→ If not → call the model → store response in cache
Example cache structure:
.cache/prompt-cache.json
Each entry could store:
- prompt hash
- original prompt
- response
- timestamp
- project path (optional for project scoping)
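One possible shape for these entries, sketched in TypeScript to match the pseudo-code style used in this issue. The names `PromptCacheEntry`, `promptHash`, and `makeEntry` are hypothetical, and SHA-256 via Node's `node:crypto` is an assumed choice of hash. The key is derived from a JSON-encoded `[projectPath, prompt]` pair rather than plain string concatenation, so field boundaries stay unambiguous:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one entry in .cache/prompt-cache.json.
interface PromptCacheEntry {
  promptHash: string;   // SHA-256 hex digest identifying the entry
  prompt: string;       // original prompt text, kept for inspection/debugging
  response: string;     // model response returned on a cache hit
  timestamp: number;    // creation time in ms since the Unix epoch
  projectPath?: string; // optional project scoping
}

// Derive a stable key. JSON-encoding the pair keeps the boundary between
// projectPath and prompt explicit, so ("ab", "c") and ("a", "bc") cannot
// collide the way plain concatenation could.
function promptHash(prompt: string, projectPath = ""): string {
  return createHash("sha256")
    .update(JSON.stringify([projectPath, prompt]))
    .digest("hex");
}

// Build an entry ready to be written to the cache file.
function makeEntry(prompt: string, response: string, projectPath?: string): PromptCacheEntry {
  return {
    promptHash: promptHash(prompt, projectPath),
    prompt,
    response,
    timestamp: Date.now(),
    projectPath,
  };
}
```

Storing the original prompt alongside its hash is not needed for lookups, but it makes the cache file human-readable when debugging.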
This would be implemented as a lightweight cache layer before the model invocation step.
Why is this needed?
Currently, repeated prompts cause repeated API calls, which leads to:
- Increased latency for users
- Unnecessary API usage
- Higher operational costs
- Repeated computation for identical queries
During development workflows, users often repeat prompts such as:
- explaining a file
- summarizing code
- debugging errors
A prompt replay cache would improve the developer experience by serving repeated prompts from local storage instead of waiting on a model round-trip, while reducing unnecessary load on the model API.
This feature also helps improve CLI responsiveness during iterative workflows.
Additional context
Possible implementation considerations:
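- Generate a hash from the prompt + project directory to uniquely identify cache entries (ideally also including any file contents or model settings sent with the prompt, so that editing a file invalidates stale entries)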
- Store cache files in a local directory (e.g., .cache/)
- Implement cache expiration (TTL) or LRU eviction to prevent unbounded growth
- Optionally log cache hits/misses in debug mode
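A minimal in-memory sketch of the TTL, LRU, and debug-logging options listed above, assuming a Node/TypeScript CLI. `PromptReplayCache` and its `maxEntries`, `ttlMs`, `debug`, and `now` options are hypothetical names, and the defaults are illustrative only. LRU order relies on JavaScript `Map` iterating in insertion order, so re-inserting a key on every read moves it to the most-recently-used end:

```typescript
// Hypothetical options; defaults below are illustrative, not recommendations.
interface CacheOptions {
  maxEntries?: number; // LRU capacity
  ttlMs?: number;      // time-to-live per entry, in milliseconds
  debug?: boolean;     // log hits/misses when true
  now?: () => number;  // injectable clock in ms (defaults to Date.now)
}

class PromptReplayCache {
  private readonly entries = new Map<string, { response: string; storedAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly debug: boolean;
  private readonly now: () => number;

  constructor(opts: CacheOptions = {}) {
    this.maxEntries = opts.maxEntries ?? 500;
    this.ttlMs = opts.ttlMs ?? 60 * 60 * 1000; // one hour
    this.debug = opts.debug ?? false;
    this.now = opts.now ?? (() => Date.now());
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      this.log("miss", key);
      return undefined;
    }
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(key); // expired: evict and treat as a miss
      this.log("miss", key);
      return undefined;
    }
    // Re-insert so the key moves to the most-recently-used end of the Map.
    this.entries.delete(key);
    this.entries.set(key, entry);
    this.log("hit", key);
    return entry.response;
  }

  set(key: string, response: string): void {
    this.entries.delete(key); // refresh position if the key already exists
    this.entries.set(key, { response, storedAt: this.now() });
    // Evict from the least-recently-used end (first key in iteration order).
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next();
      if (oldest.done) break;
      this.entries.delete(oldest.value);
    }
  }

  get size(): number {
    return this.entries.size;
  }

  private log(kind: "hit" | "miss", key: string): void {
    if (this.debug) console.debug(`[prompt-cache] ${kind} ${key.slice(0, 12)}`);
  }
}
```

Injecting `now` lets expiry be tested without waiting on the real clock; a production CLI would simply omit it.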
Example pseudo-code:
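const key = hash(JSON.stringify([projectPath, prompt])); // explicit field boundaries avoid key collisions
const cached = cache.get(key);
if (cached !== undefined) { return cached; } // cache hit: skip the API call
const response = await callModel(prompt); // cache miss: call the model
cache.set(key, response);
return response;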
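A runnable version of the pseudo-code above, assuming Node.js and a `.cache/prompt-cache.json` file under the project directory. `cachedCall`, `loadCache`, `saveCache`, and the on-disk format are hypothetical; `callModel` stands in for whatever function the CLI actually uses to issue the request. For brevity this sketch omits TTL/LRU eviction and does not guard against concurrent writers:

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

type ModelCall = (prompt: string) => Promise<string>;

// Hypothetical on-disk format: key -> { response, timestamp }.
type CacheFile = Record<string, { response: string; timestamp: number }>;

function loadCache(file: string): CacheFile {
  return existsSync(file) ? (JSON.parse(readFileSync(file, "utf8")) as CacheFile) : {};
}

function saveCache(file: string, cache: CacheFile): void {
  mkdirSync(dirname(file), { recursive: true }); // create .cache/ on first write
  writeFileSync(file, JSON.stringify(cache, null, 2));
}

// Read-through wrapper: return a cached response if present; otherwise call
// the model, persist the result, and return it.
async function cachedCall(
  prompt: string,
  projectPath: string,
  callModel: ModelCall,
  file = join(projectPath, ".cache", "prompt-cache.json"),
): Promise<string> {
  const key = createHash("sha256").update(JSON.stringify([projectPath, prompt])).digest("hex");
  const cache = loadCache(file);
  const hit = cache[key];
  if (hit !== undefined) return hit.response; // cache hit: no API call
  const response = await callModel(prompt);   // cache miss: call the model
  cache[key] = { response, timestamp: Date.now() };
  saveCache(file, cache);
  return response;
}
```

Because the wrapper only needs a `(prompt) => Promise<string>` function, it can sit in front of the existing model invocation step without changing that step.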
This feature would be relatively lightweight to implement, and for prompts that hit the cache it would skip the model call entirely, removing that request's API latency and cost.