Description
Describe your feature request
Tiered Caching for Mooncake Store
The design is based on a two-level metadata management strategy to ensure high performance and scalability by minimizing communication with the central Master node.
1. Core Concepts & Goals
- Goal: To leverage a hierarchy of storage media (GPU VRAM, DRAM, SSD, etc.) across multiple nodes to create a large, fast, and cost-effective distributed cache.
- Two-Level Metadata:
  - Global Metadata (Master Node): The Master node will only track the coarse-grained location of each cache `key`, i.e., which `NodeID` currently owns it. This keeps the Master lightweight and reduces its role as a bottleneck.
  - Local Metadata (Worker Nodes): Each worker node will be responsible for managing the precise location of its own segments within its local storage tiers (e.g., `Tier 0: VRAM`, `Tier 2: DRAM`).
- Synchronous Communication (Initial Phase): To ensure simplicity and correctness in the initial implementation, all metadata updates that require cross-node communication (i.e., changing a `key`'s owner `NodeID`) will be done synchronously via blocking RPC calls. This guarantees strong consistency at the cost of higher latency during inter-node evictions.
- Path to Asynchronicity: The architecture will be designed with a future transition to asynchronous communication in mind. The synchronous communication points will be encapsulated, allowing them to be replaced with an asynchronous, queue-based mechanism in a later phase without a major system redesign.
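To make the "encapsulated synchronous communication point" idea concrete, here is a minimal sketch. The names `OwnerUpdater`, `BlockingOwnerUpdater`, and `FakeMaster` are hypothetical and not part of this proposal, `NodeID` is assumed to be an integer type, and the RPC is simulated by an in-process call.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using NodeID = std::uint32_t;  // assumption: the issue does not define NodeID

// Hypothetical seam: the backend only sees this interface, so the blocking
// implementation could later be swapped for a queue-based asynchronous one.
class OwnerUpdater {
public:
    virtual ~OwnerUpdater() = default;
    // Returns true once the Master has acknowledged the new owner.
    virtual bool UpdateOwner(const std::string& key, NodeID new_owner) = 0;
};

// Stand-in for the Master; a real deployment would sit behind an RPC stub.
struct FakeMaster {
    std::unordered_map<std::string, NodeID> owners;

    bool HandleUpdateOwner(const std::string& key, NodeID new_owner) {
        owners[key] = new_owner;
        return true;  // the return value plays the role of the acknowledgment
    }
};

// Initial-phase behavior: the call returns only after the Master has replied.
class BlockingOwnerUpdater : public OwnerUpdater {
public:
    explicit BlockingOwnerUpdater(FakeMaster& master) : master_(master) {}

    bool UpdateOwner(const std::string& key, NodeID new_owner) override {
        return master_.HandleUpdateOwner(key, new_owner);  // simulated blocking RPC
    }

private:
    FakeMaster& master_;
};
```

With this shape, a later asynchronous phase would only need another `OwnerUpdater` implementation that enqueues the update instead of waiting for the reply.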
2. System Components
2.1. MasterService
- Maintains a global, concurrent map: `std::unordered_map<key, NodeID> global_segment_locator_`.
- Exposes RPC endpoints for clients to query the `NodeID` for a given `key`.
- Exposes an internal RPC endpoint for worker nodes to synchronously update the `NodeID` of a `key` during an inter-node eviction.
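A rough sketch of what the `MasterService` state and its two endpoints might look like. It assumes string keys and an integer `NodeID`, and it leaves out the RPC layer. Since `std::unordered_map` is not safe for concurrent writes on its own, the sketch guards it with a `std::shared_mutex`; that locking scheme is an assumption here, not something this proposal fixes.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

using NodeID = std::uint32_t;  // assumption: the issue does not define NodeID

class MasterService {
public:
    // Client-facing query: which node currently owns `key`?
    std::optional<NodeID> GetLocation(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // many readers at once
        auto it = global_segment_locator_.find(key);
        if (it == global_segment_locator_.end()) return std::nullopt;
        return it->second;
    }

    // Worker-facing update, called synchronously during an inter-node eviction.
    // Returning from this call acts as the acknowledgment.
    bool UpdateOwner(const std::string& key, NodeID new_owner) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive writer
        global_segment_locator_[key] = new_owner;
        return true;
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, NodeID> global_segment_locator_;
};
```

A reader-writer lock fits the expected access pattern, where client lookups far outnumber ownership changes, but a sharded map would be an equally valid choice.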
2.2. TieredCacheBackend (Worker Node Component)
- Manages a list of `CacheTier` objects, ordered from fastest to slowest.
- Implements the core logic for `Put`, `Get`, and `Delete` operations.
- Handles Intra-Node Movement: When a `Segment` is moved between tiers within the same node (e.g., VRAM -> DRAM), it updates its own local metadata. No communication with the Master is required.
- Handles Inter-Node Eviction (Synchronous):
  - When a cache entry needs to be evicted to another node (e.g., Node A -> Node B), Node A's `TieredCacheBackend` transfers the data to Node B via the `TransferEngine`.
  - After successful data transfer, Node A makes a blocking RPC call to the Master: `UpdateOwner(key, NewNodeID)`.
  - The Master updates its `global_segment_locator_` and returns an acknowledgment.
  - Only after receiving the acknowledgment does the `Evict` operation complete on Node A.
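The inter-node eviction steps above can be sketched as follows. Everything is simulated in-process: `Node`, `Master`, `Transfer`, and `EvictToNode` are hypothetical stand-ins, and `Transfer` is not the real `TransferEngine` API. The rollback on a failed acknowledgment and the removal of Node A's local copy at the end are assumptions that this proposal does not spell out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using NodeID = std::uint32_t;  // assumption: the issue does not define NodeID
using Bytes = std::vector<char>;

// Stand-in for the Master's ownership table.
struct Master {
    std::unordered_map<std::string, NodeID> owners;
    bool UpdateOwner(const std::string& key, NodeID new_owner) {
        owners[key] = new_owner;
        return true;  // acknowledgment
    }
};

// Stand-in for a worker node's local storage.
struct Node {
    NodeID id;
    std::unordered_map<std::string, Bytes> store;
};

// Hypothetical substitute for the TransferEngine: copies the bytes across.
bool Transfer(const Node& src, Node& dst, const std::string& key) {
    auto it = src.store.find(key);
    if (it == src.store.end()) return false;
    dst.store[key] = it->second;
    return true;
}

bool EvictToNode(Node& src, Node& dst, Master& master, const std::string& key) {
    // Step 1: move the data first, so the new owner can serve it.
    if (!Transfer(src, dst, key)) return false;
    // Steps 2-3: blocking ownership update; returns once the Master has acked.
    if (!master.UpdateOwner(key, dst.id)) {
        dst.store.erase(key);  // assumption: undo the copy if the update fails
        return false;
    }
    // Step 4: only now does the eviction complete on the source node.
    src.store.erase(key);  // assumption: the source drops its copy
    return true;
}
```

Ordering the transfer before the ownership update means a client that looks up the key never gets pointed at a node that does not yet hold the data.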
2.3. CacheTier (Interface)
- The abstract base class (`cache_tier.h`) defining the contract for all storage tiers.
- Concrete implementations will include: `GpuTier`, `DramTier`, `SsdTier`, and remote proxy tiers like `RemoteGpuTier`.
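As an illustration of the kind of contract `cache_tier.h` could define, here is a minimal sketch with one in-memory implementation. The method set (`Put`, `Get`, `Delete`) mirrors the operations named for `TieredCacheBackend`; the exact signatures, the `Bytes` alias, and the byte-capacity check are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Bytes = std::vector<char>;

// Hypothetical shape of the CacheTier contract.
class CacheTier {
public:
    virtual ~CacheTier() = default;
    virtual bool Put(const std::string& key, Bytes value) = 0;
    virtual std::optional<Bytes> Get(const std::string& key) const = 0;
    virtual bool Delete(const std::string& key) = 0;
};

// Toy DRAM tier: a hash map with a byte budget.
class DramTier : public CacheTier {
public:
    explicit DramTier(std::size_t capacity_bytes) : capacity_(capacity_bytes) {}

    bool Put(const std::string& key, Bytes value) override {
        auto it = data_.find(key);
        const std::size_t old_size = (it == data_.end()) ? 0 : it->second.size();
        const std::size_t new_used = used_ - old_size + value.size();
        if (new_used > capacity_) return false;  // caller must evict first
        used_ = new_used;
        data_[key] = std::move(value);
        return true;
    }

    std::optional<Bytes> Get(const std::string& key) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

    bool Delete(const std::string& key) override {
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        used_ -= it->second.size();
        data_.erase(it);
        return true;
    }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::unordered_map<std::string, Bytes> data_;
};
```

A `Put` that fails on a full tier gives the backend a natural hook for deciding whether to demote something to the next tier or evict to another node.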
2.4. Client (Client class)
- Get Workflow:
  - Sends a `GetLocation(key)` request to the `MasterService`.
  - Master replies with the owner `NodeID`.
  - Client connects directly to the owner node and issues a `GetData(key)` request.
  - The worker node's `TieredCacheBackend` finds the data in one of its local tiers, promotes it to the highest tier if necessary, and returns the data to the client.
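The Get workflow above, sketched end to end with in-process stand-ins. `Master`, `Worker`, and `ClientGet` are hypothetical names, the network hops are plain function calls, and each tier is modeled as a simple map with index 0 as the fastest tier. Removing the entry from the slower tier on promotion is an assumption; the proposal could equally keep a copy there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using NodeID = std::uint32_t;  // assumption: the issue does not define NodeID

// Stand-in for the MasterService lookup endpoint.
struct Master {
    std::unordered_map<std::string, NodeID> owners;
    std::optional<NodeID> GetLocation(const std::string& key) const {
        auto it = owners.find(key);
        if (it == owners.end()) return std::nullopt;
        return it->second;
    }
};

// Stand-in for a worker node; tiers[0] is the fastest tier.
struct Worker {
    std::vector<std::unordered_map<std::string, std::string>> tiers;

    std::optional<std::string> GetData(const std::string& key) {
        for (std::size_t i = 0; i < tiers.size(); ++i) {
            auto it = tiers[i].find(key);
            if (it == tiers[i].end()) continue;
            std::string value = it->second;
            if (i != 0) {
                // Promotion is purely local: no message to the Master.
                tiers[i].erase(it);
                tiers[0][key] = value;
            }
            return value;
        }
        return std::nullopt;
    }
};

std::optional<std::string> ClientGet(const Master& master,
                                     std::unordered_map<NodeID, Worker>& workers,
                                     const std::string& key) {
    auto owner = master.GetLocation(key);     // step 1-2: ask the Master
    if (!owner) return std::nullopt;
    auto worker = workers.find(*owner);       // step 3: go to the owner node
    if (worker == workers.end()) return std::nullopt;
    return worker->second.GetData(key);       // step 4: tier lookup + promotion
}
```

The Master is touched once per read and never learns which tier holds the data, which is the point of keeping fine-grained metadata on the workers.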
3. Architectural Diagram