domain/deep-learning, enhancement, help wanted
Description
Willingness to contribute
No. I cannot contribute this feature at this time.
Proposal Summary
- Introduce support for logging events and stages during a run.
- Display them as timeline annotations on the system metrics plots (e.g., CPU and GPU usage).
Motivation
MLflow currently lacks the ability to visualise when key pipeline stages occur relative to resource usage. This feature would enable easier debugging, profiling, and optimisation.
Details
Stage 1: Manual Event/Stage Logging API
Add a new function such as:
mlflow.log_event(
    name: str,
    time: Optional[float] = None,
    start_time: Optional[float] = None,
)
- If only `name` is given: logs a point-in-time event, using the current timestamp.
- If `time` is provided: logs an event at the specified time.
- If `start_time` is provided: logs a stage (duration) from `start_time` until `time` (which can be provided, but defaults to the current timestamp).
Stage 2: Timeline Annotations in the UI
Enhance the System Metrics tab with annotated overlays:
- Events: vertical lines at the logged timestamp, with hover text showing the event name.
- Stages: shaded regions between `start_time` and `time`, labeled with the stage name.
- Consider legends and filters to show/hide particular events/stages.
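The mapping from logged records to overlays could look roughly like the sketch below. It is written in Python only to illustrate the data transformation; the actual UI is a JavaScript front end, and the `(name, time, start_time)` record shape, the `to_annotations` helper, and the annotation dictionary keys are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical record shape: (name, time, start_time); start_time is None for events.
Record = Tuple[str, float, Optional[float]]


def to_annotations(
    records: Iterable[Record],
    hidden: Optional[Set[str]] = None,
) -> List[Dict]:
    """Turn logged records into plot-annotation specs, skipping hidden names."""
    hidden = hidden or set()
    annotations: List[Dict] = []
    for name, time, start_time in records:
        if name in hidden:
            continue  # filtered out by the user via the legend/filter
        if start_time is None:
            # Event: a vertical line at a single timestamp.
            annotations.append({"kind": "event", "x": time, "label": name})
        else:
            # Stage: a shaded region spanning start_time..time.
            annotations.append(
                {"kind": "stage", "x0": start_time, "x1": time, "label": name}
            )
    return annotations
```

A plotting layer would then draw each `"event"` entry as a vertical line and each `"stage"` entry as a shaded band on the shared time axis of the system metrics charts.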
Stage 3: Optional Automatic Logging
Offer an optional mode where MLflow automatically logs high-level stages (e.g. data loading, model fitting) using a profiler, without explicit calls:
- No manual logging required by the user.
- Logging remains non-recursive (top-level stages only).
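One way the non-recursive behaviour might be prototyped is with the standard library's `sys.setprofile` hook, tracking call depth and recording only calls made directly from the profiled block. This is an assumption about a possible approach, not a description of any MLflow feature; `TopLevelStageProfiler` is a hypothetical name, and the sketch only observes the current thread.

```python
import sys
import time
from typing import List, Optional, Tuple


class TopLevelStageProfiler:
    """Hypothetical helper: records only top-level Python calls as stages."""

    def __init__(self) -> None:
        # Each stage is (function name, start timestamp, end timestamp).
        self.stages: List[Tuple[str, float, float]] = []
        self._depth = 0
        self._current: Optional[Tuple[str, float]] = None

    def _profile(self, frame, event, arg) -> None:
        # Only Python-level call/return events are used; C calls are ignored.
        if event == "call":
            self._depth += 1
            if self._depth == 1:
                self._current = (frame.f_code.co_name, time.time())
        elif event == "return":
            if self._depth == 1 and self._current is not None:
                name, start = self._current
                self.stages.append((name, start, time.time()))
                self._current = None
            # Clamp at zero: the return from __enter__ has no matching call.
            self._depth = max(0, self._depth - 1)

    def __enter__(self) -> "TopLevelStageProfiler":
        sys.setprofile(self._profile)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        sys.setprofile(None)
        return False


def _read_rows():          # nested helper, should not appear as a stage
    return list(range(3))


def load_data():
    return _read_rows()


def fit_model(rows):
    return sum(rows)


with TopLevelStageProfiler() as profiler:
    rows = load_data()
    fit_model(rows)
```

After the block, `profiler.stages` holds one entry each for `load_data` and `fit_model` but none for `_read_rows`. Each entry could then be forwarded to the proposed logging call as a stage.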
What component(s) does this bug affect?
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- area/docs: MLflow documentation pages
- area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/prompt: MLflow prompt engineering features, prompt templates, and prompt management
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
- area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
What language(s) does this bug affect?
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations