quickwit-oss/tantivy

Generate meaningful SegmentIDs instead of pure random

Open

#971 opened on January 7, 2021

(5 comments) (0 reactions) (0 assignees) Rust (8,354 stars) (499 forks)
Labels: batch import, enhancement, good first issue, high priority, quickwit

Description

Is your feature request related to a problem? Please describe.

Related to #969. I would like to suggest a cheap feature that will help with debugging in the future. SegmentIDs are currently generated randomly, which wastes 16 bytes that could otherwise be used to embed debugging info.

Describe the solution you'd like

Generate a SegmentID containing the following info:

  • timestamp of segment creation
  • segment origin (merging or writing new data)
  • hash of hostname, probably useful for those who will implement sharding/replication paired with Tantivy.

Additionally, we should ensure that enough randomness is left to avoid any possibility of collisions on the one hand, and that names are not too long, to avoid metadata bloat, on the other. (I am not sure the latter is a real concern: as far as I know, the number of segments is supposed to be relatively low by design.)
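
To make the proposal concrete, here is a minimal sketch of one way the 16 bytes could be split. Everything in it is an assumption for illustration, not tantivy's API: the `SegmentOrigin` enum, the function names, and the layout itself (6 bytes of millisecond timestamp, 1 byte of origin, 2 bytes of hostname hash, 7 bytes of randomness). The random bytes are passed in by the caller so the sketch needs no external crate, and the hostname hash is a hand-rolled FNV-1a so that the value does not depend on the Rust version.

```rust
/// Hypothetical origin tag; not a type defined by tantivy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SegmentOrigin {
    Write = 0,
    Merge = 1,
}

/// 32-bit FNV-1a over the hostname bytes, XOR-folded down to 16 bits.
fn host_hash16(hostname: &str) -> u16 {
    let mut h: u32 = 0x811c_9dc5; // FNV-1a 32-bit offset basis
    for b in hostname.bytes() {
        h ^= b as u32;
        h = h.wrapping_mul(0x0100_0193); // FNV 32-bit prime
    }
    ((h >> 16) ^ (h & 0xffff)) as u16
}

/// Assumed layout of the 16 bytes:
///   [0..6)   creation time in milliseconds, big-endian (low 48 bits)
///   [6]      origin (0 = write, 1 = merge)
///   [7..9)   16-bit hostname hash, big-endian
///   [9..16)  56 random bits supplied by the caller
fn encode_segment_id(
    ts_millis: u64,
    origin: SegmentOrigin,
    hostname: &str,
    random: [u8; 7],
) -> [u8; 16] {
    let mut id = [0u8; 16];
    // Drop the two most significant bytes of the u64 to keep 48 bits.
    id[..6].copy_from_slice(&ts_millis.to_be_bytes()[2..]);
    id[6] = origin as u8;
    id[7..9].copy_from_slice(&host_hash16(hostname).to_be_bytes());
    id[9..].copy_from_slice(&random);
    id
}

/// Recover the creation timestamp (milliseconds) from an id.
fn decode_timestamp_millis(id: &[u8; 16]) -> u64 {
    let mut buf = [0u8; 8];
    buf[2..].copy_from_slice(&id[..6]);
    u64::from_be_bytes(buf)
}

/// Recover the origin tag, or None if the byte is not a known value.
fn decode_origin(id: &[u8; 16]) -> Option<SegmentOrigin> {
    match id[6] {
        0 => Some(SegmentOrigin::Write),
        1 => Some(SegmentOrigin::Merge),
        _ => None,
    }
}
```

With this layout the ID stays at 16 bytes, so metadata does not grow. Because the timestamp is stored big-endian in the leading bytes, IDs created at different milliseconds sort by creation time when compared byte-wise. A collision would require the same millisecond, origin, and hostname hash plus identical 56 random bits; whether that margin is sufficient is a judgment call for the maintainers.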
