Fail index analyzer that contains a graph token filter · elastic/elasticsearch#24396

3 comments | 0 reactions | 0 assignees | Java | 25,882 forks

Labels: :Search Relevance/Analysis, >enhancement, Team:Search Relevance, help wanted

Repository metrics

Stars: 76,700
PR merge metrics: average merge time 2 days; 1,000 PRs merged in the last 30 days

Description

Currently it is possible to use a synonym_graph or a word_delimiter_graph token filter in an analyzer that is applied at index time. However, these filters can produce side paths that break the positions stored in the index and make phrase query matching impossible on the field. The flatten_graph token filter is meant to handle this situation, but it can only flatten the graph, which is itself a lossy operation. So whether or not the user adds a flatten_graph filter at the end of the analyzer, the positions of the terms in the index will not be accurate. Instead, we could detect these situations and fail the mapping if a graph filter is used in an index analyzer. This would allow us to remove the flatten_graph filter and would also help users avoid shooting themselves in the foot. Here is a hopefully exhaustive list of the components that should be affected (mostly token filters, plus two tokenizers):

synonym_graph_filter
word_delimiter_graph_filter
shingles (only when output_unigram:true or min_size < max_size)
cjk (only when output_unigram:true)
ngram tokenizer (only when min_gram < max_gram)
common_gram
kuromoji_tokenizer (only when nbest_cost or nbest_example > 1)
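The position problem described in the issue can be illustrated without Lucene. The following is a simplified pure-Java model, not Lucene's API: each token is a hypothetical `Token(term, position, positionLength)` record, the "index" keeps only the start position of each term and drops the span, and a naive phrase matcher requires consecutive positions. The input `ny city` and the synonym rule `ny, new york` are made-up examples of the kind of graph a synonym graph filter emits.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GraphPositionsDemo {
    // Simplified token: term, start position, and number of positions it spans.
    public record Token(String term, int position, int positionLength) {}

    // Builds term -> positions postings; positionLength is deliberately dropped.
    public static Map<String, Set<Integer>> index(List<Token> tokens) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (Token t : tokens) {
            postings.computeIfAbsent(t.term(), k -> new TreeSet<>()).add(t.position());
        }
        return postings;
    }

    // Naive exact phrase match: term i must occur at start + i.
    public static boolean phraseMatches(Map<String, Set<Integer>> postings, List<String> phrase) {
        for (int start : postings.getOrDefault(phrase.get(0), Set.of())) {
            boolean matched = true;
            for (int i = 1; i < phrase.size() && matched; i++) {
                matched = postings.getOrDefault(phrase.get(i), Set.of()).contains(start + i);
            }
            if (matched) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical graph for "ny city" with the synonym rule "ny, new york".
    public static List<Token> graphTokens() {
        return List.of(
                new Token("new", 0, 1),
                new Token("ny", 0, 2),   // side path spanning "new york"
                new Token("york", 1, 1),
                new Token("city", 2, 1));
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = index(graphTokens());
        System.out.println(phraseMatches(postings, List.of("ny", "city")));          // false
        System.out.println(phraseMatches(postings, List.of("new", "york", "city"))); // true
    }
}
```

`ny` and `city` are adjacent in the original text, yet the phrase `ny city` does not match: the side path pushed `city` to position 2, while `ny` was recorded only at position 0 and its span of two positions was lost.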

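Contributor guide

Investigation approach: In the Elasticsearch mapping parsing code, find where the token filters of an index analyzer are validated. Add a check for the listed graph token filters (synonym graph, word delimiter graph, shingles with output_unigram:true or min_size < max_size, and so on), and when any of them is present, fail mapping creation with a clear error message.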
Tech stack: Java
Area: backend
Issue type: feature
Difficulty: 2
Estimated time: 1-3 hours
Activity: active
Clarity: clear
Prerequisites: Git, Java
Beginner-friendliness: 75
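The check suggested in the investigation approach might look roughly like the following pure-Java sketch. Everything in it is hypothetical: `Component`, `producesGraph` and `validateIndexAnalyzer` are not Elasticsearch classes or methods, and a real check would live wherever the mapping code resolves index analyzers. The type names and setting keys (`output_unigrams`, `min_shingle_size`, `max_shingle_size`, `min_gram`, `max_gram`) and their defaults are assumed from the Elasticsearch reference documentation, so they differ slightly from the shorthand in the issue's list; the kuromoji_tokenizer rule is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IndexAnalyzerCheck {
    // Hypothetical model of one analyzer component (tokenizer or token filter).
    public record Component(String type, Map<String, Object> settings) {}

    // Types the issue lists as producing a graph unconditionally (type names assumed).
    private static final Set<String> ALWAYS_GRAPH =
            Set.of("synonym_graph", "word_delimiter_graph", "common_grams");

    private static boolean bool(Map<String, Object> s, String key, boolean dflt) {
        Object v = s.get(key);
        return v == null ? dflt : Boolean.parseBoolean(v.toString());
    }

    private static int integer(Map<String, Object> s, String key, int dflt) {
        Object v = s.get(key);
        return v == null ? dflt : Integer.parseInt(v.toString());
    }

    // Conditions follow the issue's list; defaults are assumed, kuromoji_tokenizer is not covered.
    public static boolean producesGraph(Component c) {
        if (ALWAYS_GRAPH.contains(c.type())) {
            return true;
        }
        Map<String, Object> s = c.settings();
        switch (c.type()) {
            case "shingle":
                return bool(s, "output_unigrams", true)
                        || integer(s, "min_shingle_size", 2) < integer(s, "max_shingle_size", 2);
            case "cjk_bigram":
                return bool(s, "output_unigrams", false);
            case "ngram":
                return integer(s, "min_gram", 1) < integer(s, "max_gram", 2);
            default:
                return false;
        }
    }

    // Meant to be called only for a field's index-time analyzer; search analyzers are not checked.
    public static void validateIndexAnalyzer(String field, String analyzerName, List<Component> chain) {
        List<String> offending = new ArrayList<>();
        for (Component c : chain) {
            if (producesGraph(c)) {
                offending.add(c.type());
            }
        }
        if (!offending.isEmpty()) {
            throw new IllegalArgumentException("analyzer [" + analyzerName + "] used at index time on field ["
                    + field + "] contains components that produce token graphs: " + offending);
        }
    }

    public static void main(String[] args) {
        List<Component> plain = List.of(
                new Component("standard", Map.of()),
                new Component("lowercase", Map.of()));
        validateIndexAnalyzer("title", "plain", plain); // no graph components, no exception

        List<Component> withSynonyms = List.of(
                new Component("standard", Map.of()),
                new Component("synonym_graph", Map.of()),
                new Component("shingle", Map.of("output_unigrams", false, "max_shingle_size", 3)));
        try {
            validateIndexAnalyzer("title", "my_synonyms", withSynonyms);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints an error for `my_synonyms` that names both `synonym_graph` and `shingle` (the latter because 2 < 3 for the shingle sizes). Collecting every offending component before throwing lets the message report all problems at once instead of only the first one.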
