meilisearch/MeiliSearch

Missing progress trace in facet post-processing stats

Open

#5654 opened on Jun 11, 2025

View on GitHub
 (5 comments) (0 reactions) (0 assignees)Rust (20,887 stars) (733 forks)batch import
good first issuemaintenance

Description

When analyzing the batch processing stats, we noticed a significant discrepancy in the timing traces for facet post-processing. Specifically, while we have detailed timings for individual operations, there's a large gap in the total time that's unaccounted for:

"processing tasks > indexing > post processing facets > strings incremental": "169.89ms",
"processing tasks > indexing > post processing facets > numbers incremental": "957.28ms",
"processing tasks > indexing > post processing facets > facet search": "60.29µs",
"processing tasks > indexing > post processing facets": "7.10s",
  {
    "details": {
      "receivedDocuments": 5311,
      "indexedDocuments": 5311
    },
    "stats": {
      "totalNbTasks": 10,
      "status": {
        "succeeded": 10
      },
      "types": {
        "documentAdditionOrUpdate": 10
      },
      "indexUids": {
        "binderProducts_production": 10
      },
      "progressTrace": {
        "processing tasks > retrieving config": "145.87µs",
        "processing tasks > computing document changes > preparing payloads > payload": "23.60ms",
        "processing tasks > computing document changes > preparing payloads": "295.15ms",
        "processing tasks > computing document changes": "295.17ms",
        "processing tasks > indexing > extracting documents > document": "5.85ms",
        "processing tasks > indexing > extracting documents": "5.85ms",
        "processing tasks > indexing > extracting facets > document": "2.13ms",
        "processing tasks > indexing > extracting facets": "2.18ms",
        "processing tasks > indexing > merging facet caches": "627.02ms",
        "processing tasks > indexing > extracting words > document": "2.36ms",
        "processing tasks > indexing > extracting words": "2.41ms",
        "processing tasks > indexing > merging word caches > word docids": "451.67ms",
        "processing tasks > indexing > merging word caches > word field id docids": "363.96ms",
        "processing tasks > indexing > merging word caches > exact word docids": "545.45µs",
        "processing tasks > indexing > merging word caches > word position docids": "356.16ms",
        "processing tasks > indexing > merging word caches > field id word count docids": "144.38ms",
        "processing tasks > indexing > merging word caches": "1.32s",
        "processing tasks > indexing > extracting word proximity > document": "911.13µs",
        "processing tasks > indexing > extracting word proximity": "990.90µs",
        "processing tasks > indexing > merging word proximity": "206.09ms",
        "processing tasks > indexing > waiting for database writes": "970.33ms",
        "processing tasks > indexing > waiting for extractors": "61.66µs",
        "processing tasks > indexing > writing embeddings to database": "106.89µs",
        "processing tasks > indexing > post processing facets > strings incremental": "169.89ms",
        "processing tasks > indexing > post processing facets > numbers incremental": "957.28ms",
        "processing tasks > indexing > post processing facets > facet search": "60.29µs",
        "processing tasks > indexing > post processing facets": "7.10s",
        "processing tasks > indexing > post processing words > word fst": "1.18s",
        "processing tasks > indexing > post processing words > word prefix docids": "830.83ms",
        "processing tasks > indexing > post processing words > exact word prefix docids": "16.01ms",
        "processing tasks > indexing > post processing words > word prefix field id docids": "1.31s",
        "processing tasks > indexing > post processing words > word prefix position docids": "1.35s",
        "processing tasks > indexing > post processing words": "4.69s",
        "processing tasks > indexing > finalizing > committing": "5.60s",
        "processing tasks > indexing > finalizing > computing stats": "10.41ms",
        "processing tasks > indexing > finalizing": "5.65s",
        "processing tasks > indexing": "21.06s",
        "processing tasks": "21.35s",
        "writing tasks to disk > task": "987.33µs",
        "writing tasks to disk": "1.07ms"
      },
      "writeChannelCongestion": {
        "attempts": 12411,
        "blocking_attempts": 0,
        "blocking_ratio": 0
      },
      "internalDatabaseSizes": {
        "fieldIdDocidFacetF64S": "12.23 GiB (+8 KiB)",
        "fieldIdDocidFacetStrings": "48.35 GiB (+32 KiB)",
        "documents": "44.91 GiB (+20 KiB)"
      }
    },
    "duration": "PT21.353340990S",
  }

As shown above, while the total time for post-processing facets is 7.10s, the sum of the traced operations (strings incremental + numbers incremental + facet search) only accounts for about 1.13s. This means approximately 5.97s of processing time is not being traced.

Technical Details

Looking at the codebase:

  1. The post-processing facets step is defined in milli/src/update/new/steps.rs with the following stages:

    • StringsBulk
    • StringsIncremental
    • NumbersBulk
    • NumbersIncremental
    • FacetSearch
  2. The main facet processing happens in:

  3. All these files have tracing instrumentation, but there seems to be a gap in capturing all the processing time.

  4. this missing gap may come from the facet data preparation

Impact

This missing trace makes it difficult to:

  1. Identify performance bottlenecks in facet processing
  2. Debug performance issues when they occur
  3. Understand where the majority of the processing time is spent

Proposed Solution

We should:

  1. Add additional tracing points to capture all facet processing operations
  2. Ensure all major operations within the facet post-processing step are properly instrumented

Contributor guide