confluentinc/ksql

ksqlDB should optimize pull queries for streams for time ranges

Open

#9,181 建立於 2022年6月7日

在 GitHub 查看
 (1 留言) (0 反應) (0 負責人)Java (5,739 star) (1,048 fork)batch import
enhancementgood first issueperformancequery-engine

描述

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Currently, ksqlDB causes a full topic scan whenever performing a pull query over a stream. This is inefficient when looking up specific sets of data, but necessary due to how pull queries are implemented over streams.

Describe the solution you'd like A clear and concise description of what you want to happen.

Ideally, ksqlDB should be able to perform optimizations on the pull query to make it more performant according to a defined time range of the query. For example:

-- Should only scan from 1654618081 
SELECT * FROM STREAM WHERE ROWTIME > 1654618081;

-- Should only scan between 1654618081 and 1654618080
SELECT * FROM STREAM WHERE ROWTIME < 1654618081 AND ROWTIME > 1654618080 ;

-- Should only scan to 1654618081
SELECT * FROM STREAM WHERE ROWTIME < 1654618081;

This should be possible given Kafka allows to seek to an offset according to their timestamp (this optimization may not be possible with user-defined custom ROWTIMEs).

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

No real alternative here.

Additional context Add any other context or screenshots about the feature request here.

貢獻者指南