elastic/elasticsearch

[ML] Report actual memory usage for trained model deployments in TrainedModelSizeStats

Open

#139233 opened on Dec 9, 2025

Labels: :ml, >enhancement, Team:ML, good first issue

Description

Extend TrainedModelSizeStats to report the actual memory usage of trained model deployments (the pytorch_inference process), as reported by the OS on Linux, instead of relying solely on estimated memory values.

Background

Currently, TrainedModelSizeStats reports:

  • model_size_bytes - the size of the model definition
  • required_native_memory_bytes - an estimated memory requirement calculated from the model definition length, the model's per-deployment and per-allocation memory values, and the number of allocations

This estimated value is computed in TransportGetTrainedModelsStatsAction.java:

long estimatedMemoryUsageBytes = totalDefinitionLength > 0L
    ? StartTrainedModelDeploymentAction.estimateMemoryUsageBytes(
        model.getModelId(),
        totalDefinitionLength,
        model.getPerDeploymentMemoryBytes(),
        model.getPerAllocationMemoryBytes(),
        numberOfAllocations
    )
    : 0L;

For anomaly detection jobs, PR #131981 (corresponding to ml-cpp#2846) added reporting of actual OS memory usage, based on the resident set size (RSS) values returned by getrusage. These values reflect real memory consumption much more accurately than an estimate does.

Proposed Changes

  1. Add new fields to TrainedModelSizeStats:

    • runtime_native_memory_bytes - current resident set size as reported by the OS
    • max_runtime_native_memory_bytes - peak resident set size as reported by the OS

  2. Update the Java side to consume these values from the pytorch_inference native process output (requires corresponding ml-cpp changes).

  3. Update the stats retrieval logic in TransportGetTrainedModelsStatsAction and related classes to populate and return these values for running deployments.

  4. Consider backward compatibility: the estimated required_native_memory_bytes should remain, since it is the only figure available for deployments that have not yet reported actual usage and for models that are not currently deployed.
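
The fallback described in item 4 could be sketched roughly as follows. This is a minimal illustration, not the Elasticsearch implementation: the class `MemoryStatsSketch`, the `SizeStats` record, and the `bestKnownMemoryBytes` helper are hypothetical names, and modelling the new fields as `OptionalLong` is only an assumption about how "not yet reported" might be represented.

```java
import java.util.OptionalLong;

/** Hypothetical sketch; names and types are assumptions, not the real classes. */
final class MemoryStatsSketch {

    /** Hypothetical holder for the existing and the proposed fields. */
    record SizeStats(
        long modelSizeBytes,
        long requiredNativeMemoryBytes,            // existing estimate, always present
        OptionalLong runtimeNativeMemoryBytes,     // proposed: current RSS, empty until reported
        OptionalLong maxRuntimeNativeMemoryBytes   // proposed: peak RSS, empty until reported
    ) {}

    /** Prefer the OS-reported value; fall back to the estimate when none has been reported. */
    static long bestKnownMemoryBytes(SizeStats stats) {
        return stats.runtimeNativeMemoryBytes().orElse(stats.requiredNativeMemoryBytes());
    }

    public static void main(String[] args) {
        SizeStats running = new SizeStats(
            438123456L, 876246912L, OptionalLong.of(892452864L), OptionalLong.of(923845632L));
        SizeStats notDeployed = new SizeStats(
            438123456L, 876246912L, OptionalLong.empty(), OptionalLong.empty());

        System.out.println(bestKnownMemoryBytes(running));     // prints 892452864
        System.out.println(bestKnownMemoryBytes(notDeployed)); // prints 876246912
    }
}
```

Keeping the estimate as a separate, always-populated field means existing consumers of required_native_memory_bytes are unaffected, while new consumers can opt in to the measured values when they exist.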

Files likely to be modified

  • x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/TrainedModelSizeStats.java - Add new fields
  • x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportGetTrainedModelsStatsAction.java - Populate actual memory values
  • x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/process/PyTorchResultProcessor.java - Process memory stats from native process
  • x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/results/PyTorchResult.java - Parse memory stats if included in results
  • x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/deployment/DeploymentManager.java - Store/expose memory stats

API Changes

The GET _ml/trained_models/<model_id>/_stats API response would include additional fields:

{
  "model_size_stats": {
    "model_size_bytes": 438123456,
    "required_native_memory_bytes": 876246912,
    "runtime_native_memory_bytes": 892452864,
    "max_runtime_native_memory_bytes": 923845632
  }
}
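
One way the response could stay backward compatible is to emit the new fields only when the native process has reported them. The sketch below illustrates that idea with plain Java collections. It uses the field names listed under Proposed Changes; the class `SizeStatsJsonSketch` and its methods are hypothetical, and it deliberately avoids the real Elasticsearch serialization classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.stream.Collectors;

/** Hypothetical sketch of optional-field rendering; not the real serialization code. */
final class SizeStatsJsonSketch {

    /** Collect the fields in output order, skipping runtime values that were never reported. */
    static Map<String, Long> toFields(
        long modelSizeBytes,
        long requiredNativeMemoryBytes,
        OptionalLong runtimeNativeMemoryBytes,
        OptionalLong maxRuntimeNativeMemoryBytes
    ) {
        Map<String, Long> fields = new LinkedHashMap<>();
        fields.put("model_size_bytes", modelSizeBytes);
        fields.put("required_native_memory_bytes", requiredNativeMemoryBytes);
        runtimeNativeMemoryBytes.ifPresent(v -> fields.put("runtime_native_memory_bytes", v));
        maxRuntimeNativeMemoryBytes.ifPresent(v -> fields.put("max_runtime_native_memory_bytes", v));
        return fields;
    }

    /** Render the numeric fields as a compact JSON object. */
    static String toJson(Map<String, Long> fields) {
        return fields.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\":" + e.getValue())
            .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        // A model that is not deployed: only the existing fields appear.
        System.out.println(toJson(toFields(438123456L, 876246912L,
            OptionalLong.empty(), OptionalLong.empty())));
        // A running deployment that has reported its memory usage.
        System.out.println(toJson(toFields(438123456L, 876246912L,
            OptionalLong.of(892452864L), OptionalLong.of(923845632L))));
    }
}
```

With this shape, clients that predate the change see exactly the response they see today for models that are not deployed.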

Benefits

  • Provides accurate, real-time memory usage information for capacity planning
  • Helps users understand actual vs. estimated memory consumption
  • Enables better monitoring and alerting based on real memory footprint
  • Aligns trained model deployment monitoring with anomaly detection job monitoring

Dependencies

This issue depends on the corresponding ml-cpp changes (elastic/ml-cpp#2885), which report actual memory usage from the pytorch_inference process.
