mlflow/mlflow

[FR] Event & Stage Logging with Timeline Annotations

Open

#16892 opened on July 25, 2025

domain/deep-learning, enhancement, help wanted

Description

Willingness to contribute

No. I cannot contribute this feature at this time.

Proposal Summary

  • Introduce support for logging events and stages during a run.
  • Display them as timeline annotations on the system metrics plots (e.g., CPU and GPU usage).

Motivation

MLflow currently lacks the ability to visualise when key pipeline stages occur relative to resource usage. This feature would enable easier debugging, profiling, and optimisation.

Details

Stage 1: Manual Event/Stage Logging API

Add a new function such as:

mlflow.log_event(
  name: str,
  time: Optional[float] = None,
  start_time: Optional[float] = None,
)

  • If only name is given: logs a point-in-time event at the current timestamp.
  • If time is provided: logs an event at the specified time.
  • If start_time is provided: logs a stage (a duration) running from start_time to time, where time defaults to the current timestamp if omitted.

Stage 2: Timeline Annotations in the UI

Enhance the System Metrics tab with annotated overlays:

  • Events: vertical lines at the logged timestamp, with hover text showing event name.
  • Stages: shaded regions between start_time and time, labeled with stage name.
  • Consider legends and filters to show/hide particular events/stages.
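
As a rough illustration of the overlay rules above, the sketch below turns logged entries into plot annotations and applies a show/hide filter. The dictionary layout (`kind`, `x`, `x0`, `x1`, `hover`, `label`) and the function name `build_annotations` are hypothetical and do not correspond to any existing MLflow or charting-library schema.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (name, time, start_time); start_time is None for point-in-time events.
Entry = Tuple[str, float, Optional[float]]


def build_annotations(
    entries: Iterable[Entry],
    hidden: Optional[Set[str]] = None,
) -> List[Dict[str, object]]:
    """Map entries to overlay descriptors, skipping names the user has hidden."""
    hidden = hidden or set()
    annotations: List[Dict[str, object]] = []
    for name, end, start in entries:
        if name in hidden:
            continue  # filtered out via the legend or a filter control
        if start is None:
            # Event: vertical line at the timestamp, name shown on hover.
            annotations.append({"kind": "vline", "x": end, "hover": name})
        else:
            # Stage: shaded region from start_time to time, labeled with the name.
            annotations.append(
                {"kind": "span", "x0": start, "x1": end, "label": name}
            )
    return annotations


# Usage: one stage and one event, with the event hidden by a filter.
entries = [("data loading", 12.0, 3.0), ("checkpoint saved", 20.0, None)]
visible = build_annotations(entries, hidden={"checkpoint saved"})
```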

Stage 3: Optional Automatic Logging

Offer an optional mode where MLflow automatically logs high-level stages (e.g. data loading, model fitting) using a profiler, without explicit calls:

  • No manual logging required by the user.
  • Logging remains non-recursive (top-level stages only).
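
One possible way to get non-recursive automatic stages is sketched below using Python's standard `sys.setprofile` hook. The proposal only says "using a profiler", so this mechanism is an assumption, and the helper name `run_with_top_level_stages` is hypothetical. It records only Python functions called directly by the entry point and ignores anything deeper.

```python
import sys
import time
from typing import Callable, List, Tuple

Stage = Tuple[str, float, float]  # (function name, start, end)


def run_with_top_level_stages(entry_point: Callable[[], object]) -> List[Stage]:
    """Run entry_point and time each Python function it calls directly."""
    stages: List[Stage] = []
    depth = 0          # 1 = entry_point itself, 2 = its direct callees
    started_at = 0.0

    def profiler(frame, event, arg):
        nonlocal depth, started_at
        if event == "call":
            depth += 1
            if depth == 2:
                started_at = time.time()
        elif event == "return":
            if depth == 2:
                # A direct callee finished: record it as one stage.
                stages.append((frame.f_code.co_name, started_at, time.time()))
            depth -= 1
        # "c_call"/"c_return" events (built-ins) are deliberately ignored.

    sys.setprofile(profiler)
    try:
        entry_point()
    finally:
        sys.setprofile(None)  # always remove the hook, even on errors
    return stages


# Usage with a toy pipeline; helper() is nested and therefore not logged.
def helper():
    return [1, 2, 3]

def load_data():
    return helper()

def fit_model():
    return sum(helper())

def pipeline():
    load_data()
    fit_model()

stages = run_with_top_level_stages(pipeline)
```

This sketch only profiles the current thread and treats every direct callee as a stage, so a real implementation would likely need a way to decide which functions count as stages.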

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/prompt: MLflow prompt engineering features, prompt templates, and prompt management
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
