[ML] Report actual memory usage for trained model deployments in TrainedModelSizeStats
#139233 opened on Dec 9, 2025
Description
Extend `TrainedModelSizeStats` to include the actual memory usage reported by the OS (on Linux) for trained model deployments, that is, the `pytorch_inference` process, instead of relying solely on estimated values.
Background
Currently, `TrainedModelSizeStats` reports:
- `model_size_bytes`: the size of the model definition
- `required_native_memory_bytes`: an estimated memory requirement calculated from the model definition length, the per-deployment and per-allocation memory overheads, and the number of allocations
This estimated value is computed in TransportGetTrainedModelsStatsAction.java:
```java
long estimatedMemoryUsageBytes = totalDefinitionLength > 0L
    ? StartTrainedModelDeploymentAction.estimateMemoryUsageBytes(
        model.getModelId(),
        totalDefinitionLength,
        model.getPerDeploymentMemoryBytes(),
        model.getPerAllocationMemoryBytes(),
        numberOfAllocations
    )
    : 0L;
```
For anomaly detection jobs, PR #131981 (corresponding to ml-cpp#2846) added actual OS memory reporting via getrusage RSS values. This provides much more accurate information about real memory consumption.
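On Linux, `getrusage` exposes only the peak resident set size (`ru_maxrss`, in kilobytes); the current RSS is usually read from `/proc/<pid>/status`, whose `VmRSS` and `VmHWM` lines give the current and peak resident set size in kB. The sketch below is an illustration only, not the ml-cpp implementation: it parses those two lines from the text of a status file and converts them to bytes, the unit the proposed fields would use. The class and method names are hypothetical, and the sample input is made up.

```java
import java.util.OptionalLong;

/** Hypothetical helper: extracts current and peak RSS from /proc/<pid>/status text. */
public final class ProcStatusRss {

    /** Current (VmRSS) and peak (VmHWM) resident set size, in bytes. */
    public record RssBytes(long current, long peak) {}

    /** Returns the value of a "Key:   <n> kB" line converted to bytes, if the key is present. */
    static OptionalLong kbFieldAsBytes(String statusText, String key) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith(key + ":")) {
                // e.g. "VmRSS:\t  871536 kB" -> ["871536", "kB"]
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                // /proc reports these values in units of 1024 bytes
                return OptionalLong.of(Long.parseLong(parts[0]) * 1024L);
            }
        }
        return OptionalLong.empty();
    }

    /** Parses both values; fails if either line is missing. */
    public static RssBytes parse(String statusText) {
        OptionalLong current = kbFieldAsBytes(statusText, "VmRSS");
        OptionalLong peak = kbFieldAsBytes(statusText, "VmHWM");
        if (current.isEmpty() || peak.isEmpty()) {
            throw new IllegalArgumentException("VmRSS or VmHWM not found in status text");
        }
        return new RssBytes(current.getAsLong(), peak.getAsLong());
    }

    public static void main(String[] args) {
        // Made-up sample in the /proc/<pid>/status format
        String sample = "Name:\tpytorch_inference\nVmHWM:\t  902193 kB\nVmRSS:\t  871536 kB\n";
        RssBytes rss = parse(sample);
        System.out.println(rss.current() + " " + rss.peak());
    }
}
```

In a real deployment the native process would do this measurement itself and send the numbers to Java; parsing on the Java side is shown here only to make the current-versus-peak distinction concrete.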
Proposed Changes
- Add new fields to `TrainedModelSizeStats`:
  - `runtime_native_memory_bytes`: current resident set size as reported by the OS
  - `max_runtime_native_memory_bytes`: peak resident set size as reported by the OS
- Update the Java side to consume these values from the `pytorch_inference` native process output (requires corresponding ml-cpp changes).
- Update the stats retrieval logic in `TransportGetTrainedModelsStatsAction` and related classes to populate and return these values for running deployments.
- Consider backward compatibility: the estimated `required_native_memory_bytes` should remain for deployments that haven't reported actual usage yet, and for models that aren't currently deployed.
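The backward-compatibility point above can be sketched as a value type in which the estimate is always populated and the OS-reported values are nullable until a running deployment has reported them. This is a minimal, hypothetical sketch, not the real `TrainedModelSizeStats` class; the record name and the `bestAvailableNativeMemoryBytes` fallback rule are assumptions for illustration.

```java
/**
 * Hypothetical sketch of the extended size stats. The estimate is always
 * populated; the OS-reported values are null until a running deployment
 * has reported them, so existing consumers of the estimate keep working.
 */
public record SizeStatsSketch(
        long modelSizeBytes,
        long requiredNativeMemoryBytes,   // existing estimate, unchanged
        Long runtimeNativeMemoryBytes,    // current RSS, null if not reported
        Long maxRuntimeNativeMemoryBytes  // peak RSS, null if not reported
) {
    /** True once the native process has reported both OS values. */
    public boolean hasRuntimeStats() {
        return runtimeNativeMemoryBytes != null && maxRuntimeNativeMemoryBytes != null;
    }

    /** Measured peak when available, otherwise fall back to the estimate. */
    public long bestAvailableNativeMemoryBytes() {
        return hasRuntimeStats() ? maxRuntimeNativeMemoryBytes : requiredNativeMemoryBytes;
    }

    public static void main(String[] args) {
        var notDeployed = new SizeStatsSketch(438123456L, 876246912L, null, null);
        var running = new SizeStatsSketch(438123456L, 876246912L, 892452864L, 923845632L);
        System.out.println(notDeployed.bestAvailableNativeMemoryBytes()); // 876246912
        System.out.println(running.bestAvailableNativeMemoryBytes());     // 923845632
    }
}
```

Using the peak rather than the current value for the fallback is one possible choice for capacity planning, since the peak bounds what the process has actually needed so far.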
Files likely to be modified
- `x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/TrainedModelSizeStats.java`: add the new fields
- `x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportGetTrainedModelsStatsAction.java`: populate the actual memory values
- `x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/process/PyTorchResultProcessor.java`: process memory stats from the native process
- `x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/results/PyTorchResult.java`: parse memory stats if included in results
- `x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/deployment/DeploymentManager.java`: store and expose memory stats
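Results from the native process are handled on a result-processing thread, while the stats API reads them from a different thread. One way to store the reported values, sketched below under that assumption, is to publish each report as an immutable snapshot through an `AtomicReference`, so a reader never sees the current RSS from one report paired with the peak from another. The class, record, and method names are hypothetical and do not correspond to existing `DeploymentManager` or `PyTorchResultProcessor` methods.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical per-deployment holder for the most recent OS memory report. */
public final class NativeMemoryStatsHolder {

    /** One report from the native process: current and peak RSS, in bytes. */
    public record Snapshot(long rssBytes, long maxRssBytes) {}

    // null until the native process has sent its first report
    private final AtomicReference<Snapshot> latest = new AtomicReference<>();

    /** Called on the result-processing thread for each memory report. */
    public void onMemoryReport(long rssBytes, long maxRssBytes) {
        if (rssBytes < 0 || maxRssBytes < 0) {
            throw new IllegalArgumentException("memory values must be non-negative");
        }
        // Publish both values together so readers never mix two reports.
        latest.set(new Snapshot(rssBytes, maxRssBytes));
    }

    /** Called by the stats action; null means "not reported yet". */
    public Snapshot latestOrNull() {
        return latest.get();
    }

    public static void main(String[] args) {
        NativeMemoryStatsHolder holder = new NativeMemoryStatsHolder();
        System.out.println(holder.latestOrNull()); // null before the first report
        holder.onMemoryReport(892452864L, 923845632L);
        System.out.println(holder.latestOrNull());
    }
}
```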
API Changes
The GET _ml/trained_models/<model_id>/_stats API response would include additional fields:
```json
{
  "model_size_stats": {
    "model_size_bytes": 438123456,
    "required_native_memory_bytes": 876246912,
    "runtime_native_memory_bytes": 892452864,
    "max_runtime_native_memory_bytes": 923845632
  }
}
```
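Elasticsearch renders responses with `XContentBuilder`; the plain-Java sketch below uses a `LinkedHashMap` as a stand-in to show one possible rule for the response shape: the OS-reported fields are omitted, rather than written as `null` or `0`, until a running deployment has reported them, so responses for undeployed models look the same as today. Field names follow the Proposed Changes list; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: build the model_size_stats object for the _stats response. */
public final class ModelSizeStatsJson {

    public static Map<String, Object> modelSizeStats(long modelSizeBytes, long requiredNativeMemoryBytes,
                                                     Long runtimeBytes, Long maxRuntimeBytes) {
        Map<String, Object> stats = new LinkedHashMap<>(); // keeps field order stable
        stats.put("model_size_bytes", modelSizeBytes);
        stats.put("required_native_memory_bytes", requiredNativeMemoryBytes);
        // Only emit the OS-reported values once the native process has reported them.
        if (runtimeBytes != null) {
            stats.put("runtime_native_memory_bytes", runtimeBytes);
        }
        if (maxRuntimeBytes != null) {
            stats.put("max_runtime_native_memory_bytes", maxRuntimeBytes);
        }
        return stats;
    }

    public static void main(String[] args) {
        // Running deployment that has reported its memory usage
        System.out.println(modelSizeStats(438123456L, 876246912L, 892452864L, 923845632L));
        // Model that is not deployed: only the existing fields appear
        System.out.println(modelSizeStats(438123456L, 876246912L, null, null));
    }
}
```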
Benefits
- Provides accurate, real-time memory usage information for capacity planning
- Helps users understand actual vs. estimated memory consumption
- Enables better monitoring and alerting based on real memory footprint
- Aligns trained model deployment monitoring with anomaly detection job monitoring
Dependencies
This issue depends on the corresponding ml-cpp changes in elastic/ml-cpp#2885, which report actual memory usage from the `pytorch_inference` process.