kvcache-ai/Mooncake

[Feature Request]: Tiered Caching for Mooncake Store

Open

#954 opened on Oct 23, 2025

good first issue

Description

Describe your feature request

Tiered Caching for Mooncake Store

The design is based on a two-level metadata management strategy to ensure high performance and scalability by minimizing communication with the central Master node.


1. Core Concepts & Goals

  • Goal: To leverage a hierarchy of storage media (GPU VRAM, DRAM, SSD, etc.) across multiple nodes to create a large, fast, and cost-effective distributed cache.

  • Two-Level Metadata:

    • Global Metadata (Master Node): The Master node will only track the coarse-grained location of each cache key, i.e., which NodeID currently owns it. This keeps the Master lightweight and reduces its role as a bottleneck.
    • Local Metadata (Worker Nodes): Each worker node will be responsible for managing the precise location of its own segments within its local storage tiers (e.g., Tier 0: VRAM, Tier 1: DRAM).
  • Synchronous Communication (Initial Phase): To ensure simplicity and correctness in the initial implementation, all metadata updates that require cross-node communication (i.e., changing a key's owner NodeID) will be done synchronously via blocking RPC calls. This guarantees strong consistency at the cost of higher latency during inter-node evictions.

  • Path to Asynchronicity: The architecture will be designed with a future transition to asynchronous communication in mind. The synchronous communication points will be encapsulated, allowing them to be replaced with an asynchronous, queue-based mechanism in a later phase without a major system redesign.
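
One way to read the "encapsulated synchronous communication points" idea is as a small interface that hides how an owner update reaches the Master. The sketch below is only an illustration under stated assumptions: OwnerUpdateChannel, SyncOwnerUpdateChannel, QueuedOwnerUpdateChannel, the Rpc callback type, and the choice of a 32-bit NodeID are all hypothetical names that do not come from the issue or from the Mooncake codebase.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using NodeID = std::uint32_t;  // assumption: the issue does not define NodeID

// Hypothetical seam: the worker only talks to this interface, so the
// transport behind it can change without touching the eviction logic.
class OwnerUpdateChannel {
public:
    // Stand-in for the real RPC stub; returns true when the Master acks.
    using Rpc = std::function<bool(const std::string& key, NodeID new_owner)>;
    virtual ~OwnerUpdateChannel() = default;
    virtual bool UpdateOwner(const std::string& key, NodeID new_owner) = 0;
};

// Initial phase: the call returns only after the Master has answered.
class SyncOwnerUpdateChannel final : public OwnerUpdateChannel {
public:
    explicit SyncOwnerUpdateChannel(Rpc rpc) : rpc_(std::move(rpc)) {}
    bool UpdateOwner(const std::string& key, NodeID new_owner) override {
        return rpc_(key, new_owner);  // blocks for the duration of the RPC
    }
private:
    Rpc rpc_;
};

// Possible later phase: updates are queued and sent when Flush() runs.
class QueuedOwnerUpdateChannel final : public OwnerUpdateChannel {
public:
    explicit QueuedOwnerUpdateChannel(Rpc rpc) : rpc_(std::move(rpc)) {}
    bool UpdateOwner(const std::string& key, NodeID new_owner) override {
        pending_.emplace_back(key, new_owner);
        return true;  // accepted locally, not yet visible at the Master
    }
    // Sends queued updates in order and returns how many were acknowledged.
    // Failed updates are dropped here; a real design would need retries.
    std::size_t Flush() {
        std::size_t acked = 0;
        for (const auto& [key, owner] : pending_) {
            if (rpc_(key, owner)) ++acked;
        }
        pending_.clear();
        return acked;
    }
private:
    Rpc rpc_;
    std::vector<std::pair<std::string, NodeID>> pending_;
};
```

With the queued variant the Master can briefly point at the old owner, so the strong consistency guaranteed by the synchronous phase would no longer hold without extra handling.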

2. System Components

2.1. MasterService

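  • Maintains a global map from each key to its owner node: std::unordered_map<key, NodeID> global_segment_locator_. Because std::unordered_map is not thread-safe on its own, concurrent access to it must be synchronized (e.g., with a lock).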
  • Exposes RPC endpoints for clients to query the NodeID for a given key.
  • Exposes an internal RPC endpoint for worker nodes to synchronously update the NodeID of a key during an inter-node eviction.
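
A minimal sketch of the Master-side state, assuming string keys, a 32-bit NodeID, and a reader/writer lock around the map. The class name MasterServiceSketch and the method signatures are hypothetical; only global_segment_locator_ and the GetLocation / UpdateOwner operation names come from this proposal, and the RPC layer is left out entirely.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

using NodeID = std::uint32_t;  // assumption: the issue does not define NodeID

class MasterServiceSketch {
public:
    // Client-facing query: which node currently owns this key, if any?
    std::optional<NodeID> GetLocation(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mu_);  // many readers at once
        auto it = global_segment_locator_.find(key);
        if (it == global_segment_locator_.end()) return std::nullopt;
        return it->second;
    }

    // Worker-facing update used during an inter-node eviction.
    // Inserts the key if it is new, otherwise overwrites the owner.
    void UpdateOwner(const std::string& key, NodeID new_owner) {
        std::unique_lock<std::shared_mutex> lock(mu_);  // exclusive writer
        global_segment_locator_[key] = new_owner;
    }

private:
    mutable std::shared_mutex mu_;
    std::unordered_map<std::string, NodeID> global_segment_locator_;
};
```

A single lock over the whole map is the simplest correct option; sharding the map by key hash would be a natural follow-up if lock contention on the Master becomes measurable.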

2.2. TieredCacheBackend (Worker Node Component)

  • Manages a list of CacheTier objects, ordered from fastest to slowest.
  • Implements the core logic for Put, Get, and Delete operations.
  • Handles Intra-Node Movement: When a Segment is moved between tiers within the same node (e.g., VRAM -> DRAM), it updates its own local metadata. No communication with the Master is required.
  • Handles Inter-Node Eviction (Synchronous):
    1. When a cached Segment needs to be evicted to another node (e.g., Node A -> Node B), Node A's TieredCacheBackend transfers the data to Node B via the TransferEngine.
    2. After successful data transfer, Node A makes a blocking RPC call to the Master: UpdateOwner(key, NewNodeID).
    3. The Master updates its global_segment_locator_ and returns an acknowledgment.
    4. Only after receiving the acknowledgment does the Evict operation complete on Node A.
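
The four eviction steps above can be sketched as one function whose ordering is the important part: transfer first, owner update second, completion last. EvictionHooks, EvictResult, and EvictToNode are hypothetical names, and the three callbacks are stand-ins for the TransferEngine call, the blocking UpdateOwner RPC, and local cleanup. Releasing the local copy only after the acknowledgment is an assumption; the proposal does not say when Node A frees its copy.

```cpp
#include <cstdint>
#include <functional>
#include <string>

using NodeID = std::uint32_t;  // assumption: the issue does not define NodeID

enum class EvictResult { kOk, kTransferFailed, kOwnerUpdateFailed };

// Hypothetical stand-ins for the real components.
struct EvictionHooks {
    // Step 1: copy the data to the target node (TransferEngine in the proposal).
    std::function<bool(const std::string& key, NodeID target)> transfer;
    // Steps 2-3: blocking UpdateOwner(key, NewNodeID); true means the Master acked.
    std::function<bool(const std::string& key, NodeID new_owner)> update_owner;
    // Assumed final step: release the local copy on the evicting node.
    std::function<void(const std::string& key)> drop_local;
};

EvictResult EvictToNode(const std::string& key, NodeID target,
                        const EvictionHooks& hooks) {
    if (!hooks.transfer(key, target)) {
        return EvictResult::kTransferFailed;  // Master never hears about it
    }
    if (!hooks.update_owner(key, target)) {
        // Data exists on both nodes, but the Master still names the old owner,
        // so the local copy is kept and lookups keep working.
        return EvictResult::kOwnerUpdateFailed;
    }
    hooks.drop_local(key);  // step 4: eviction completes only after the ack
    return EvictResult::kOk;
}
```

Keeping the local copy until the Master acknowledges means a failed RPC leaves a stray copy on the target node rather than a key the Master cannot locate.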

2.3. CacheTier (Interface)

  • The abstract base class (cache_tier.h) defining the contract for all storage tiers.
  • Concrete implementations will include: GpuTier, DramTier, SsdTier, and remote proxy tiers like RemoteGpuTier.
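
The proposal does not list the methods of the tier interface, so the following is only a guess at its shape. CacheTierSketch and InMemoryTier are hypothetical names, values are modelled as std::string for brevity, and capacity is counted in bytes of value data. InMemoryTier is a toy stand-in for something like DramTier, not a real implementation of it.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical contract that every storage tier would implement.
class CacheTierSketch {
public:
    virtual ~CacheTierSketch() = default;
    virtual bool Put(const std::string& key, std::string value) = 0;
    virtual std::optional<std::string> Get(const std::string& key) const = 0;
    virtual bool Delete(const std::string& key) = 0;
    virtual std::size_t Capacity() const = 0;
    virtual std::size_t Used() const = 0;
};

// Toy tier backed by a hash map, with a fixed byte budget.
class InMemoryTier final : public CacheTierSketch {
public:
    explicit InMemoryTier(std::size_t capacity_bytes) : capacity_(capacity_bytes) {}

    // Returns false when the value does not fit; the caller would then
    // demote or evict something and retry.
    bool Put(const std::string& key, std::string value) override {
        auto it = data_.find(key);
        const std::size_t old_size = (it == data_.end()) ? 0 : it->second.size();
        if (used_ - old_size + value.size() > capacity_) return false;
        used_ = used_ - old_size + value.size();
        data_[key] = std::move(value);
        return true;
    }

    std::optional<std::string> Get(const std::string& key) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

    // Returns true if the key was present and has been removed.
    bool Delete(const std::string& key) override {
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        used_ -= it->second.size();
        data_.erase(it);
        return true;
    }

    std::size_t Capacity() const override { return capacity_; }
    std::size_t Used() const override { return used_; }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::unordered_map<std::string, std::string> data_;
};
```

Exposing Capacity() and Used() on the interface would let the backend decide when to demote data without knowing anything about the underlying medium.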

2.4. Client (Client class)

  • Get Workflow:
    1. Sends a GetLocation(key) request to the MasterService.
    2. Master replies with the owner NodeID.
    3. Client connects directly to the owner node and issues a GetData(key) request.
    4. The worker node's TieredCacheBackend finds the data in one of its local tiers, promotes it to the highest tier if necessary, and returns the data to the client.
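
The Get workflow can be sketched end to end with two small functions: one for the worker-side lookup with promotion (step 4) and one for the client-side two-hop read (steps 1 to 3). All names here (TierMap, LookupAndPromote, ClientRpcs, ClientGet) are hypothetical. Tiers are modelled as plain hash maps ordered fastest first, capacity checks are ignored, and the RPCs are replaced by callbacks.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using NodeID = std::uint32_t;  // assumption: the issue does not define NodeID
using TierMap = std::unordered_map<std::string, std::string>;

// Worker side: search tiers fastest-first. A hit below tier 0 is moved up
// to tier 0; this touches local metadata only, so the Master is not involved.
std::optional<std::string> LookupAndPromote(std::vector<TierMap>& tiers,
                                            const std::string& key) {
    for (std::size_t i = 0; i < tiers.size(); ++i) {
        auto it = tiers[i].find(key);
        if (it == tiers[i].end()) continue;
        std::string value = it->second;  // copy before modifying the maps
        if (i != 0) {
            tiers[0][key] = value;
            tiers[i].erase(it);
        }
        return value;
    }
    return std::nullopt;  // key is not stored on this node
}

// Stand-ins for the two RPCs the client issues.
struct ClientRpcs {
    std::function<std::optional<NodeID>(const std::string& key)> get_location;
    std::function<std::optional<std::string>(NodeID owner, const std::string& key)> get_data;
};

// Client side: ask the Master who owns the key, then read from that node.
std::optional<std::string> ClientGet(const std::string& key, const ClientRpcs& rpcs) {
    const std::optional<NodeID> owner = rpcs.get_location(key);
    if (!owner) return std::nullopt;    // the Master does not know the key
    return rpcs.get_data(*owner, key);  // direct request to the owner node
}
```

The Master only appears in the first hop, which matches the goal of keeping it lightweight: the payload itself never passes through it.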

3. Architectural Diagram
