opensearch-project/OpenSearch

[Feature Request] Paginate snapshot indices status fetching

Open

#16985 opened on Jan 9, 2025

19 comments, 0 reactions, 1 assignee. Java, 8,123 stars, 1,505 forks.
Labels: Storage:Snapshots, enhancement, good first issue

Description

Is your feature request related to a problem? Please describe

Our customers depend on the snapshot status API to access information about snapshot indices, such as store size and number of docs. When the specified snapshots are not currently running, TransportSnapshotStatusAction uses a single Generic thread to retrieve the repository data, snapshot information, snapshot index metadata, and shard snapshot status. As a result, when a snapshot contains a large number of indices, the action takes a very long time to complete.

For one snapshot with more than 15,000 shards, fetching the snapshot status took 8 minutes.

Describe the solution you'd like

Provide a new API (_snapshot/{repository}/{snapshot}/_list/indices) to paginate the status of snapshot indices, as we did in #14258. The new API works only for indices belonging to a specific snapshot. Since the order of indices in SnapshotInfo is fixed, we can simply use from + size to paginate. If the specified snapshot is still running, the pagination parameters have no effect.
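The from + size slicing described above can be sketched as follows. This is only an illustration of the idea, not the proposed implementation: `SnapshotIndexPager` and `page` are hypothetical names, and a plain list of index names stands in for the ordered indices held by SnapshotInfo.

```java
import java.util.List;

// Hypothetical helper, not part of OpenSearch: slices an ordered list of
// index names with from + size, the way the proposed API could page results.
final class SnapshotIndexPager {

    static List<String> page(List<String> orderedIndices, int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        // Clamp both bounds so an out-of-range request yields an empty page
        // instead of an IndexOutOfBoundsException.
        int start = Math.min(from, orderedIndices.size());
        int end = (int) Math.min((long) start + size, orderedIndices.size());
        return List.copyOf(orderedIndices.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> indices = List.of("logs-1", "logs-2", "logs-3", "logs-4", "logs-5");
        System.out.println(page(indices, 0, 2));  // [logs-1, logs-2]
        System.out.println(page(indices, 4, 2));  // [logs-5]
        System.out.println(page(indices, 10, 2)); // []
    }
}
```

Because the index order is stable, repeated calls with increasing from values walk the snapshot without skipping or repeating indices, and only the shard statuses of the requested page would need to be loaded.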

Related component

Storage:Snapshots

Describe alternatives you've considered

We considered using the snapshot thread pool to parallelize fetching the status of snapshot indices. However, the snapshot thread pool might be blocked by long-running tasks. Moreover, the maximum number of threads in the snapshot thread pool is only 5, so the speedup may be limited.
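For comparison, the parallel alternative could look roughly like the sketch below. It uses a plain `java.util.concurrent` fixed pool rather than OpenSearch's own thread pool, and `BoundedStatusFetcher`, `fetchAll`, and the `fetchOne` function are hypothetical stand-ins for the per-index status lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch, not OpenSearch code: fetches one result per index
// on a pool capped at maxThreads, preserving the input order.
final class BoundedStatusFetcher {

    static <T> List<T> fetchAll(List<String> indices, Function<String, T> fetchOne, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (String index : indices) {
                Callable<T> task = () -> fetchOne.apply(index);
                futures.add(pool.submit(task));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get()); // waits for each task in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching status", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("status fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With at most 5 threads the best case is a 5x speedup, so even under ideal conditions an 8-minute request would still take roughly 1.6 minutes, and any long-running task occupying the pool would delay it further.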

Additional context

No response
