confluentinc/ksql

ksqlDB should optimize pull queries for streams for time ranges

Open

#9181 opened on June 7, 2022

1 comment · 0 reactions · 0 assignees · Java · 5,739 stars · 1,048 forks
Labels: enhancement, good first issue, performance, query-engine

Description

Is your feature request related to a problem? Please describe.

Currently, every pull query over a stream triggers a full scan of the underlying topic. This is inefficient when looking up a specific subset of the data, but it is unavoidable given how pull queries over streams are currently implemented.

Describe the solution you'd like

Ideally, ksqlDB should use the time range in a pull query's WHERE clause to narrow the scan to the matching part of the topic. For example:

-- ROWTIME is in epoch milliseconds; my_stream is a placeholder stream name.

-- Should only scan from 1654618081000 onward
SELECT * FROM my_stream WHERE ROWTIME > 1654618081000;

-- Should only scan between 1654618080000 and 1654618081000
SELECT * FROM my_stream WHERE ROWTIME > 1654618080000 AND ROWTIME < 1654618081000;

-- Should only scan up to 1654618081000
SELECT * FROM my_stream WHERE ROWTIME < 1654618081000;
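Each of the queries above reduces to a half-open window [start, end) over ROWTIME. As a minimal sketch, assuming the WHERE clause is a plain AND of ROWTIME comparisons against literals, a planner could derive that window as shown here. RowtimeScanRange, Bound and fromConjunction are hypothetical names, not ksqlDB's planner API; a real implementation would presumably fall back to a full scan for OR, NOT, or any predicate it cannot bound.

```java
/**
 * Hypothetical sketch: derives a half-open ROWTIME scan window [start, end)
 * from a conjunction (AND) of "ROWTIME <op> literal" comparisons.
 * Not ksqlDB's planner API; OR, NOT and other columns are out of scope.
 */
public final class RowtimeScanRange {

  /** Comparison operators supported in a "ROWTIME <op> literal" predicate. */
  public enum Op { GT, GE, LT, LE, EQ }

  /** One "ROWTIME <op> value" comparison; value is in epoch milliseconds. */
  public static final class Bound {
    final Op op;
    final long value;

    public Bound(Op op, long value) {
      this.op = op;
      this.value = value;
    }
  }

  /** Inclusive lower bound of the window (Long.MIN_VALUE means unbounded). */
  public final long start;
  /** Exclusive upper bound of the window (Long.MAX_VALUE means unbounded). */
  public final long end;

  private RowtimeScanRange(long start, long end) {
    this.start = start;
    this.end = end;
  }

  /** True when no integer timestamp satisfies every comparison, so nothing needs scanning. */
  public boolean isEmpty() {
    return start >= end;
  }

  /** Saturating increment so v + 1 cannot overflow; the Long.MAX_VALUE edge value is ignored. */
  private static long next(long v) {
    return v == Long.MAX_VALUE ? v : v + 1;
  }

  /** Intersects the windows implied by each comparison, starting from an unbounded window. */
  public static RowtimeScanRange fromConjunction(Bound... bounds) {
    long start = Long.MIN_VALUE;
    long end = Long.MAX_VALUE;
    for (Bound b : bounds) {
      switch (b.op) {
        case GT: start = Math.max(start, next(b.value)); break;
        case GE: start = Math.max(start, b.value); break;
        case LT: end = Math.min(end, b.value); break;
        case LE: end = Math.min(end, next(b.value)); break;
        case EQ:
          start = Math.max(start, b.value);
          end = Math.min(end, next(b.value));
          break;
        default: throw new IllegalArgumentException("unsupported operator: " + b.op);
      }
    }
    return new RowtimeScanRange(start, end);
  }

  public static void main(String[] args) {
    // ROWTIME > 1654618080000 AND ROWTIME < 1654618081000
    RowtimeScanRange r = fromConjunction(
        new Bound(Op.GT, 1654618080000L), new Bound(Op.LT, 1654618081000L));
    System.out.println(r.start + " .. " + r.end); // prints 1654618080001 .. 1654618081000
  }
}
```

A window with start >= end, such as the one for ROWTIME > 5 AND ROWTIME < 6, contains no integer timestamps, so in this sketch the query could return immediately without touching the topic.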

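This should be possible because Kafka can look up, per partition, the earliest offset whose timestamp is at or after a given time (KafkaConsumer#offsetsForTimes) and seek to it. The optimization may not be possible when ROWTIME comes from a user-defined timestamp column, since the seek relies on the Kafka record timestamps.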
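The seek can be illustrated with a small in-memory model of one partition. This Java sketch is hypothetical: TimeBoundedScan, offsetForTime and scan are illustrative names, the linear search only mimics the documented result of KafkaConsumer#offsetsForTimes (the earliest offset whose timestamp is greater than or equal to the target), and it assumes ROWTIME equals the Kafka record timestamp, which is exactly the assumption that breaks when ROWTIME comes from a user-defined timestamp column.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory sketch of a time-bounded scan over one partition.
 * recordTimestamps.get(i) is the Kafka record timestamp (epoch ms) at offset i,
 * and ROWTIME is assumed to equal that record timestamp.
 */
public final class TimeBoundedScan {

  /**
   * Mimics the documented semantics of KafkaConsumer#offsetsForTimes for one
   * partition: the earliest offset whose timestamp is >= target, or -1 if none.
   * A linear search stands in for the broker's time-index lookup.
   */
  public static long offsetForTime(List<Long> recordTimestamps, long target) {
    for (int offset = 0; offset < recordTimestamps.size(); offset++) {
      if (recordTimestamps.get(offset) >= target) {
        return offset;
      }
    }
    return -1;
  }

  /** Returns the offsets of records whose timestamp lies in [startInclusive, endExclusive). */
  public static List<Integer> scan(List<Long> recordTimestamps, long startInclusive, long endExclusive) {
    List<Integer> matches = new ArrayList<>();
    long from = offsetForTime(recordTimestamps, startInclusive);
    if (from < 0) {
      return matches; // every record is older than the lower bound: nothing to read
    }
    // Every record before 'from' is older than startInclusive, so skipping them is safe.
    // Records after 'from' are still filtered: timestamps need not be ordered, so the
    // upper bound cannot end the scan early unless timestamps are non-decreasing.
    for (int offset = (int) from; offset < recordTimestamps.size(); offset++) {
      long ts = recordTimestamps.get(offset);
      if (ts >= startInclusive && ts < endExclusive) {
        matches.add(offset);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Offset 5 carries an out-of-order timestamp (250 arrives after 500).
    List<Long> ts = List.of(100L, 200L, 300L, 400L, 500L, 250L);
    System.out.println(offsetForTime(ts, 250L)); // prints 2
    System.out.println(scan(ts, 250L, 450L));    // prints [2, 3, 5]
  }
}
```

Seeking to the lower bound skips the older part of the partition without missing matches. Stopping at the upper bound as well would require non-decreasing timestamps, which Kafka does not guarantee for producer-assigned (CreateTime) timestamps; in the usage example, offset 5 would be missed by a scan that stopped at offset 4.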

Describe alternatives you've considered

No real alternative here.


