apache/lucene

Create a simple "real world" regexp benchmark [LUCENE-9986]

Open

#11025 created on June 2, 2021

6 comments · 0 reactions · 1 assignee · Java · 2,179 stars · 879 forks · batch import
good first issue · legacy-jira-priority:Major · type:enhancement

Description

For issues like #11022, where we are struggling to decide which low-level optimizations to make in our (complicated!) determinize method, it would really help to have a large, real-world corpus of regexps for evaluating the performance of our automata operations, such as the CPU time and heap required to parse a regexp and determinize it.

Does anyone know of such an existing, hopefully compatibly licensed, corpus?

Probably we would add these benchmarks to luceneutil.
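
As a rough illustration of the measurements described above, here is a minimal sketch of a per-regexp timing loop. Everything in it is an assumption made for illustration: the class `RegexpBench`, its `measure` method, and the three sample patterns are hypothetical, and `java.util.regex.Pattern.compile` is only a stand-in so the sketch runs without Lucene on the classpath. A real benchmark would instead pass a function that parses with Lucene's `RegExp` and determinizes the resulting automaton. The heap figure is a coarse `Runtime`-based delta, not a precise allocation count.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical harness: times a "compile" step per regexp and records a rough heap delta. */
public class RegexpBench {

  /** Best-of-N elapsed time and the largest observed heap growth for one regexp. */
  public static final class Result {
    public final String regexp;
    public final long bestNanos;
    public final long heapDeltaBytes;

    public Result(String regexp, long bestNanos, long heapDeltaBytes) {
      this.regexp = regexp;
      this.bestNanos = bestNanos;
      this.heapDeltaBytes = heapDeltaBytes;
    }
  }

  /** Currently used heap as reported by the JVM; coarse, and affected by GC timing. */
  static long usedHeap() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  /**
   * Runs the given compile function `iters` times on one regexp.
   * The compile function is the pluggable part: here a stand-in, in a real
   * benchmark the parse-and-determinize step under test.
   */
  public static Result measure(String regexp, Function<String, Object> compile, int iters) {
    long best = Long.MAX_VALUE;
    long heapDelta = 0; // negative deltas (GC ran mid-measurement) are clamped to zero
    for (int i = 0; i < iters; i++) {
      System.gc(); // only a hint to the JVM, so heap numbers stay approximate
      long heapBefore = usedHeap();
      long start = System.nanoTime();
      Object compiled = compile.apply(regexp);
      long elapsed = System.nanoTime() - start;
      long heapAfter = usedHeap();
      if (compiled == null) {
        throw new IllegalStateException("compile returned null for: " + regexp);
      }
      best = Math.min(best, elapsed);
      heapDelta = Math.max(heapDelta, heapAfter - heapBefore);
    }
    return new Result(regexp, best, heapDelta);
  }

  public static void main(String[] args) {
    // Placeholder patterns; the issue asks for a real-world corpus instead.
    List<String> corpus = List.of("[a-z]+@[a-z]+\\.com", "(foo|bar)*baz", "\\d{4}-\\d{2}-\\d{2}");
    for (String re : corpus) {
      Result r = measure(re, java.util.regex.Pattern::compile, 5);
      System.out.printf("%-24s best=%d ns heapDelta=%d bytes%n", r.regexp, r.bestNanos, r.heapDeltaBytes);
    }
  }
}
```

Taking the best of several runs reduces noise from JIT warmup and GC pauses. For trustworthy numbers a dedicated harness such as JMH would likely be a better fit than hand-rolled timing.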


Migrated from LUCENE-9986 by Michael McCandless (@mikemccand)
