apache/lucene
Create a simple "real world" regexp benchmark [LUCENE-9986]
Open
#11025 opened on June 2, 2021
good first issue, legacy-jira-priority:Major, type:enhancement
Description
For issues like #11022, where we are struggling to decide which low-level optimizations to make to our (complicated!) determinize method, it would really help to have a large, real-world corpus of regexps for measuring the performance of our automata operations, such as the CPU time and heap required to parse a regexp and determinize it.
Does anyone know of such an existing, hopefully compatibly licensed, corpus?
Probably we would add these benchmarks to luceneutil.
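To make the idea concrete, here is a rough sketch of what a per-regexp measurement loop might look like. Everything in it is an assumption for illustration: the class name `RegexpBench`, the `measure` helper, and the three-pattern corpus are hypothetical, and `java.util.regex.Pattern.compile` is only a stand-in so the sketch runs without Lucene on the classpath. In Lucene the compile step would presumably parse with `RegExp` and then determinize the resulting automaton; the exact signatures vary by version, so that part is left as a pluggable function.

```java
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical harness, not part of luceneutil.
final class RegexpBench {

    /** Elapsed time and approximate heap growth for compiling one regexp. */
    static final class Result {
        final String regexp;
        final long nanos;
        final long heapDeltaBytes;

        Result(String regexp, long nanos, long heapDeltaBytes) {
            this.regexp = regexp;
            this.nanos = nanos;
            this.heapDeltaBytes = heapDeltaBytes;
        }
    }

    /** Heap currently in use by the JVM, in bytes. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Times one call to the supplied compile step and records the change in
     * used heap. The heap figure is a coarse estimate: System.gc() is only a
     * hint, and the delta can even be negative if a collection runs mid-call.
     */
    static Result measure(String regexp, Function<String, Object> compile) {
        System.gc();
        long heapBefore = usedHeap();
        long start = System.nanoTime();
        Object compiled = compile.apply(regexp);
        long nanos = System.nanoTime() - start;
        long heapAfter = usedHeap();
        if (compiled == null) {
            throw new IllegalStateException("compile step returned null for: " + regexp);
        }
        return new Result(regexp, nanos, heapAfter - heapBefore);
    }

    public static void main(String[] args) {
        // Placeholder corpus; a real benchmark would load a large, licensed corpus.
        List<String> corpus = List.of(
            "[a-z]+@[a-z]+\\.com",
            "(foo|bar)*baz",
            "\\d{4}-\\d{2}-\\d{2}");
        for (String re : corpus) {
            // Stand-in compile step; swap in the Lucene parse + determinize call here.
            Result r = measure(re, Pattern::compile);
            System.out.printf("%-24s %10d ns %10d bytes%n", r.regexp, r.nanos, r.heapDeltaBytes);
        }
    }
}
```

A single-shot timing like this is noisy; a real benchmark would need warmup iterations and repeated runs, and a more reliable heap measurement than `Runtime` deltas.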
Migrated from LUCENE-9986 by Michael McCandless (@mikemccand)