apache/lucene
Create a simple "real world" regexp benchmark [LUCENE-9986]
Open
#11,025 opened on Jun 2, 2021
good first issue, legacy-jira-priority:Major, type:enhancement
Description
For issues like #11022, where we are struggling to decide which low-level optimizations to make to our (complicated!) determinize method, it would really help to have a large, real-world corpus of regexps for measuring the performance of our automata operations, such as the CPU time and heap required to parse and determinize each regexp.
Does anyone know of such an existing, hopefully compatibly licensed, corpus?
Probably we would add these benchmarks to luceneutil.
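As a rough illustration of the kind of harness this could be, here is a minimal Python sketch that records time and peak heap per regexp. Everything in it is an assumption for illustration only: the names `Measurement`, `measure`, and `run_benchmark` are hypothetical, the four patterns are made-up placeholders rather than a real-world corpus, and Python's `re.compile` is only a stand-in for the step under test (it is a backtracking engine, not Lucene's parse-and-determinize path). A real benchmark would pass in a `compile_fn` that drives the Lucene code instead.

```python
import re
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Measurement:
    pattern: str
    seconds: float         # wall-clock time spent inside compile_fn
    peak_bytes: int        # peak Python heap traced while compile_fn ran
    error: Optional[str]   # exception text if the pattern was rejected


def measure(pattern: str, compile_fn: Callable[[str], object]) -> Measurement:
    """Run compile_fn once on pattern, recording time and peak traced memory."""
    re.purge()  # clear re's compiled-pattern cache; only matters for the stand-in
    tracemalloc.start()
    start = time.perf_counter()
    error = None
    try:
        compile_fn(pattern)
    except Exception as exc:  # keep going so one bad regexp does not end the run
        error = f"{type(exc).__name__}: {exc}"
    seconds = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return Measurement(pattern, seconds, peak, error)


def run_benchmark(
    patterns: Iterable[str],
    compile_fn: Callable[[str], object] = re.compile,  # stand-in, not Lucene
) -> List[Measurement]:
    """Measure every pattern in the corpus with the same compile_fn."""
    return [measure(p, compile_fn) for p in patterns]


if __name__ == "__main__":
    # Placeholder patterns, not a real-world corpus.
    corpus = [r"[a-z]+@[a-z]+\.com", r"(a|b)*abb", r"\d{4}-\d{2}-\d{2}", r"(unclosed"]
    for m in run_benchmark(corpus):
        status = m.error or "ok"
        print(f"{m.pattern!r}: {m.seconds * 1e6:.1f} us, {m.peak_bytes} bytes, {status}")
```

Note that tracing allocations slows the timed call, so a more careful version would probably time and measure memory in separate passes and repeat each pattern several times.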
Migrated from LUCENE-9986 by Michael McCandless (@mikemccand)