ksqlDB should optimize pull queries for streams for time ranges · confluentinc/ksql#9181

(1 comment) (0 reactions) (0 assignees) · Java · (5,739 stars) · (1,048 forks)

enhancement, good first issue, performance, query-engine

Description

Is your feature request related to a problem? Please describe.

Currently, ksqlDB performs a full topic scan whenever it executes a pull query over a stream. This is inefficient when looking up a specific subset of the data, but it is unavoidable given how pull queries over streams are implemented today.

Describe the solution you'd like

Ideally, ksqlDB should optimize a pull query that restricts ROWTIME to a time range, so that it scans only the part of the topic that the range covers. For example:

-- Should only scan from 1654618081 
SELECT * FROM STREAM WHERE ROWTIME > 1654618081;

-- Should only scan between 1654618080 and 1654618081
SELECT * FROM STREAM WHERE ROWTIME < 1654618081 AND ROWTIME > 1654618080;

-- Should only scan to 1654618081
SELECT * FROM STREAM WHERE ROWTIME < 1654618081;

This should be possible because Kafka lets a consumer look up offsets by record timestamp and seek to them (this optimization may not be possible for streams that use a user-defined custom ROWTIME).
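As a rough illustration of the first step, the ROWTIME comparisons in a WHERE clause can be folded into a single inclusive range before any reading starts. The sketch below is not ksqlDB code: `RowtimeBounds`, its `and` method, and the string operator names are hypothetical, and it assumes ROWTIME values are plain `long` timestamps combined only with AND.

```java
/** Hypothetical helper: the inclusive ROWTIME range a pull query has to scan. */
final class RowtimeBounds {
    // Inclusive bounds; Long.MIN_VALUE / Long.MAX_VALUE mean "unbounded".
    final long lower;
    final long upper;

    private RowtimeBounds(long lower, long upper) {
        this.lower = lower;
        this.upper = upper;
    }

    static RowtimeBounds unbounded() {
        return new RowtimeBounds(Long.MIN_VALUE, Long.MAX_VALUE);
    }

    private static RowtimeBounds empty() {
        return new RowtimeBounds(1L, 0L); // lower > upper marks an empty range
    }

    /** True when no timestamp can satisfy all comparisons seen so far. */
    boolean isEmpty() {
        return lower > upper;
    }

    /** Narrows the range with one comparison, e.g. and(">", 1654618081L). */
    RowtimeBounds and(String op, long ts) {
        switch (op) {
            case ">":
                // Strict bound: the first admissible timestamp is ts + 1.
                return ts == Long.MAX_VALUE ? empty() : narrow(ts + 1, upper);
            case ">=":
                return narrow(ts, upper);
            case "<":
                // Strict bound: the last admissible timestamp is ts - 1.
                return ts == Long.MIN_VALUE ? empty() : narrow(lower, ts - 1);
            case "<=":
                return narrow(lower, ts);
            case "=":
                return narrow(ts, ts);
            default:
                throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }

    // Intersects the current range with [newLower, newUpper].
    private RowtimeBounds narrow(long newLower, long newUpper) {
        return new RowtimeBounds(Math.max(lower, newLower), Math.min(upper, newUpper));
    }

    public static void main(String[] args) {
        // WHERE ROWTIME > 1654618081: only a lower bound is known.
        RowtimeBounds after = unbounded().and(">", 1654618081L);
        System.out.println(after.lower + " .. " + after.upper);
    }
}
```

A lower bound tells the reader where to start, an upper bound tells it where it may stop, and an empty range means the topic does not need to be read at all.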

Describe alternatives you've considered

No real alternative here.

Additional context

Contributor guide

Tech stack: java, sql, kafka
Domain: data, performance, backend
Issue type: feature
Difficulty: 4
Estimated time: 3-5 days
Activity status: fresh
Clarity: mostly clear
Prerequisites: Kafka fundamentals, ksqlDB pull queries, Java, stream processing concepts
Beginner friendliness: 20
Research direction: Investigate how ksqlDB currently executes pull queries over streams, particularly in the pull query executor. The goal is to change how the Kafka topic is read so that it uses KafkaConsumer's offsetsForTimes method to seek based on ROWTIME conditions (e.g., WHERE ROWTIME > timestamp). This requires understanding the query plan transformation and the Kafka stream reader. Review the existing filtering optimizations in the ksql engine and consider implementing a time range predicate pushdown.
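KafkaConsumer's offsetsForTimes returns, for each partition, the earliest offset whose timestamp is greater than or equal to the requested one, or nothing if no such record exists. The sketch below imitates that lookup on an in-memory array so the idea can be tried without a broker. It is only an illustration: `TimestampSeekSketch` and its methods are hypothetical names, and it assumes a single partition whose timestamps never decrease with offset, which a real topic does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical in-memory model of one partition: index = offset, value = timestamp. */
final class TimestampSeekSketch {

    /** Earliest offset whose timestamp is >= target, or empty if there is none. */
    static OptionalLong firstOffsetAtOrAfter(long[] timestamps, long target) {
        int lo = 0;
        int hi = timestamps.length; // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (timestamps[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < timestamps.length ? OptionalLong.of(lo) : OptionalLong.empty();
    }

    /** Offsets whose timestamp lies in [lower, upper], found without a full scan. */
    static List<Long> scan(long[] timestamps, long lower, long upper) {
        List<Long> offsets = new ArrayList<>();
        OptionalLong start = firstOffsetAtOrAfter(timestamps, lower);
        if (!start.isPresent()) {
            return offsets; // every record is older than the lower bound
        }
        for (int offset = (int) start.getAsLong(); offset < timestamps.length; offset++) {
            if (timestamps[offset] > upper) {
                break; // safe only because this model keeps timestamps sorted
            }
            offsets.add((long) offset);
        }
        return offsets;
    }

    public static void main(String[] args) {
        long[] timestamps = {10L, 20L, 20L, 30L, 40L};
        System.out.println(scan(timestamps, 20L, 30L));
    }
}
```

In a real implementation the original WHERE filter would presumably still need to run on every record read, since timestamps within a partition can arrive out of order. Seeking would then only narrow where reading starts, and stopping early at the upper bound would need extra care.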