[ML] Report actual memory usage for trained model deployments in TrainedModelSizeStats · elastic/elasticsearch#139233

6 comments · 0 reactions · 0 assignees · Java · 25,882 forks

Labels: :ml, >enhancement, Team:ML, good first issue

Repository metrics

Stars: 76,700
PR merge metrics: average merge time 2d; 1,000 PRs merged in the last 30d

Description

Extend TrainedModelSizeStats to include the actual OS-reported (on Linux) memory usage for trained model deployments (pytorch_inference process), instead of relying solely on estimated memory values.

Background

Currently, TrainedModelSizeStats reports:

- model_size_bytes - the size of the model definition
- required_native_memory_bytes - an estimated memory requirement calculated from the model definition length

This estimated value is computed in TransportGetTrainedModelsStatsAction.java:

long estimatedMemoryUsageBytes = totalDefinitionLength > 0L
    ? StartTrainedModelDeploymentAction.estimateMemoryUsageBytes(
        model.getModelId(),
        totalDefinitionLength,
        model.getPerDeploymentMemoryBytes(),
        model.getPerAllocationMemoryBytes(),
        numberOfAllocations
    )
    : 0L;

For anomaly detection jobs, PR #131981 (corresponding to ml-cpp#2846) added actual OS memory reporting via getrusage RSS values. This provides much more accurate information about real memory consumption.

Proposed Changes

Add new fields to TrainedModelSizeStats:
- runtime_native_memory_bytes - current resident set size as reported by the OS
- max_runtime_native_memory_bytes - peak resident set size as reported by the OS
Update the Java side to consume these values from the pytorch_inference native process output (requires corresponding ml-cpp changes).
Update the stats retrieval logic in TransportGetTrainedModelsStatsAction and related classes to populate and return these values for running deployments.
Consider backward compatibility: The estimated required_native_memory_bytes should remain for deployments that haven't reported actual usage yet, or for models that aren't currently deployed.
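
The backward-compatibility point above can be illustrated with a minimal sketch. SizeStatsSketch is a hypothetical stand-in, not the real TrainedModelSizeStats class; it assumes the two new fields are nullable so they can be absent for models that are not deployed or have not reported yet, and bestKnownMemoryBytes() is an invented helper name.

```java
import java.util.OptionalLong;

// Hypothetical stand-in for TrainedModelSizeStats; names and layout are assumptions.
final class SizeStatsSketch {
    private final long modelSizeBytes;
    private final long requiredNativeMemoryBytes;   // estimate, always available
    private final Long runtimeNativeMemoryBytes;    // null until the process reports it
    private final Long maxRuntimeNativeMemoryBytes; // null until the process reports it

    SizeStatsSketch(long modelSizeBytes, long requiredNativeMemoryBytes,
                    Long runtimeNativeMemoryBytes, Long maxRuntimeNativeMemoryBytes) {
        this.modelSizeBytes = modelSizeBytes;
        this.requiredNativeMemoryBytes = requiredNativeMemoryBytes;
        this.runtimeNativeMemoryBytes = runtimeNativeMemoryBytes;
        this.maxRuntimeNativeMemoryBytes = maxRuntimeNativeMemoryBytes;
    }

    long modelSizeBytes() {
        return modelSizeBytes;
    }

    long requiredNativeMemoryBytes() {
        return requiredNativeMemoryBytes;
    }

    // Empty when no OS-reported value has been received yet.
    OptionalLong runtimeNativeMemoryBytes() {
        return runtimeNativeMemoryBytes == null
            ? OptionalLong.empty()
            : OptionalLong.of(runtimeNativeMemoryBytes);
    }

    OptionalLong maxRuntimeNativeMemoryBytes() {
        return maxRuntimeNativeMemoryBytes == null
            ? OptionalLong.empty()
            : OptionalLong.of(maxRuntimeNativeMemoryBytes);
    }

    // Prefer the OS-reported value; fall back to the estimate otherwise.
    long bestKnownMemoryBytes() {
        return runtimeNativeMemoryBytes != null ? runtimeNativeMemoryBytes : requiredNativeMemoryBytes;
    }
}
```

With this shape, existing consumers that only read the estimate keep working, while new consumers can check whether an actual value is present before using it.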

Files likely to be modified

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/TrainedModelSizeStats.java - Add new fields
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportGetTrainedModelsStatsAction.java - Populate actual memory values
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/process/PyTorchResultProcessor.java - Process memory stats from native process
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/results/PyTorchResult.java - Parse memory stats if included in results
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/deployment/DeploymentManager.java - Store/expose memory stats
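
One possible way to store and expose the reported values is sketched below. DeploymentMemoryTracker, Snapshot and onMemoryReport are hypothetical names, not existing Elasticsearch APIs. The sketch assumes that one thread (for example the result processor) writes reports while another (the stats action) reads them, and that the peak should never decrease between reports.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the most recent memory report of one deployment.
final class DeploymentMemoryTracker {

    // Immutable pair of values so readers always see a consistent snapshot.
    static final class Snapshot {
        final long residentBytes;
        final long peakResidentBytes;

        Snapshot(long residentBytes, long peakResidentBytes) {
            this.residentBytes = residentBytes;
            this.peakResidentBytes = peakResidentBytes;
        }
    }

    private final AtomicReference<Snapshot> latest = new AtomicReference<>();

    // Called when the native process reports its memory usage.
    void onMemoryReport(long residentBytes, long peakResidentBytes) {
        Snapshot incoming = new Snapshot(residentBytes, peakResidentBytes);
        latest.accumulateAndGet(incoming, (prev, next) -> {
            if (prev == null) {
                return next;
            }
            // Assumption: keep the highest peak seen so far.
            long peak = Math.max(prev.peakResidentBytes, next.peakResidentBytes);
            return new Snapshot(next.residentBytes, peak);
        });
    }

    // Returns null until the first report arrives, so callers can fall back to the estimate.
    Snapshot latestOrNull() {
        return latest.get();
    }
}
```

Holding both values in a single immutable object avoids a reader observing a current value from one report alongside a peak from another.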

API Changes

The GET _ml/trained_models/<model_id>/_stats API response would include additional fields:

{
  "model_size_stats": {
    "model_size_bytes": 438123456,
    "required_native_memory_bytes": 876246912,
    "runtime_native_memory_bytes": 892452864,
    "max_runtime_native_memory_bytes": 923845632
  }
}

Benefits

Provides accurate, real-time memory usage information for capacity planning
Helps users understand actual vs. estimated memory consumption
Enables better monitoring and alerting based on real memory footprint
Aligns trained model deployment monitoring with anomaly detection job monitoring

Dependencies

This issue depends on the corresponding ml-cpp changes elastic/ml-cpp#2885 to report actual memory usage from the pytorch_inference process.

Contributor guide

Research direction: Explore how the pytorch_inference native process reports memory usage via getrusage, then examine how the Java side can consume these values, starting with PyTorchResultProcessor and TrainedModelSizeStats.
Tech stack: Java
Domain: backend, api, machine learning
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Java, Machine Learning concepts
Newbie friendliness: 30
