ksqlDB should optimize pull queries for streams for time ranges · confluentinc/ksql#9181

Repository metrics

Stars: 5,739
PR merge metrics: average merge time 4d 20h; 6 PRs merged in the last 30 days

Description

Is your feature request related to a problem? Please describe.

Currently, ksqlDB performs a full topic scan whenever it runs a pull query over a stream. This is inefficient when looking up a specific subset of the data, but it follows from how pull queries over streams are implemented.

Describe the solution you'd like

Ideally, when a pull query restricts ROWTIME to a time range, ksqlDB should use that range to scan only the relevant part of the topic. For example:

-- Should only scan from 1654618081 
SELECT * FROM STREAM WHERE ROWTIME > 1654618081;

-- Should only scan between 1654618080 and 1654618081
SELECT * FROM STREAM WHERE ROWTIME < 1654618081 AND ROWTIME > 1654618080;

-- Should only scan to 1654618081
SELECT * FROM STREAM WHERE ROWTIME < 1654618081;

This should be possible because Kafka lets a consumer look up an offset by record timestamp and seek to it (this optimization may not be possible when ROWTIME comes from a user-defined custom timestamp).
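To make the idea concrete, here is a minimal sketch of a timestamp-based seek, not ksqlDB's implementation. It simulates a single partition as an array of record timestamps indexed by offset. The class `TimeRangeScanSketch` and its methods are hypothetical names introduced for illustration. In a real consumer the lookup would go through `KafkaConsumer.offsetsForTimes`, which returns the earliest offset whose timestamp is greater than or equal to the requested one, followed by `seek`. The sketch also assumes timestamps never decrease within the partition; without that assumption the early exit on the upper bound would not be safe.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: one partition modeled as record timestamps indexed by offset. */
final class TimeRangeScanSketch {

    /**
     * Returns the earliest offset whose timestamp is >= target, or -1 if there is none.
     * ASSUMPTION: timestamps are non-decreasing by offset, so binary search is valid.
     */
    static int offsetForTime(long[] timestamps, long target) {
        int lo = 0;
        int hi = timestamps.length; // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (timestamps[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < timestamps.length ? lo : -1;
    }

    /** Returns the offsets of records with lowerExclusive < timestamp < upperExclusive. */
    static List<Integer> scan(long[] timestamps, long lowerExclusive, long upperExclusive) {
        List<Integer> matches = new ArrayList<>();
        if (lowerExclusive == Long.MAX_VALUE) {
            return matches; // nothing can be greater than Long.MAX_VALUE
        }
        // For integer timestamps, "ts > lower" is the same as "ts >= lower + 1".
        int start = offsetForTime(timestamps, lowerExclusive + 1);
        if (start < 0) {
            return matches;
        }
        for (int offset = start; offset < timestamps.length; offset++) {
            if (timestamps[offset] >= upperExclusive) {
                break; // safe only under the ordering assumption above
            }
            matches.add(offset);
        }
        return matches;
    }

    public static void main(String[] args) {
        long[] timestamps = {1654618078L, 1654618079L, 1654618080L, 1654618081L, 1654618082L};
        // WHERE ROWTIME > 1654618080: the scan starts at offset 3 instead of offset 0.
        System.out.println(scan(timestamps, 1654618080L, Long.MAX_VALUE));
        // WHERE ROWTIME < 1654618081: the scan stops before offset 3.
        System.out.println(scan(timestamps, Long.MIN_VALUE, 1654618081L));
    }
}
```

Passing `Long.MIN_VALUE` or `Long.MAX_VALUE` stands in for a missing bound. A lower bound alone already avoids reading the head of the topic, which is the case the first query in the issue describes.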

Describe alternatives you've considered

No real alternative here.

Additional context

Contributor guide

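Investigation approach: Investigate how ksqlDB currently executes pull queries, in particular inside the pull query executor. The goal is to change how the Kafka topic is read so that the reader seeks based on ROWTIME conditions (e.g. WHERE ROWTIME > timestamp) using the KafkaConsumer offsetsForTimes method. This requires understanding query plan transformation and the Kafka stream reader. Review the existing filter optimization code in the ksql engine and consider implementing pushdown of time-range predicates.
Tech stack: Java
Area: database, backend
Issue type: feature
Difficulty: 3
Estimated time: 1-2 days
Activity: active
Clarity: clear
Prerequisites: Kafka, Java
Beginner friendliness: 60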
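
As a rough illustration of the time-range predicate pushdown mentioned in the investigation approach, the sketch below folds the ROWTIME conjuncts of a WHERE clause into a single inclusive timestamp range that a reader could use to choose its start and stop positions. All types here (`RowtimeBoundsSketch`, `Comparison`, `Op`, `TimeRange`) are hypothetical and are not ksqlDB planner classes. The sketch assumes the WHERE clause has already been split into simple `column op literal` conjuncts joined by AND.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hypothetical types, not ksqlDB planner classes. */
final class RowtimeBoundsSketch {

    enum Op { GT, GTE, LT, LTE }

    /** One AND-ed conjunct of a WHERE clause, shaped as: column op literal. */
    static final class Comparison {
        final String column;
        final Op op;
        final long value;

        Comparison(String column, Op op, long value) {
            this.column = column;
            this.op = op;
            this.value = value;
        }
    }

    /** Inclusive timestamp range; Long.MIN_VALUE and Long.MAX_VALUE mean "unbounded". */
    static final class TimeRange {
        final long startInclusive;
        final long endInclusive;

        TimeRange(long startInclusive, long endInclusive) {
            this.startInclusive = startInclusive;
            this.endInclusive = endInclusive;
        }

        boolean isEmpty() {
            return startInclusive > endInclusive;
        }
    }

    /** Intersects all ROWTIME comparisons; other columns are left for the normal filter. */
    static TimeRange extract(List<Comparison> conjuncts) {
        long start = Long.MIN_VALUE;
        long end = Long.MAX_VALUE;
        for (Comparison c : conjuncts) {
            if (!"ROWTIME".equalsIgnoreCase(c.column)) {
                continue; // not a time predicate, so it cannot narrow the scan
            }
            switch (c.op) {
                case GT:
                    if (c.value == Long.MAX_VALUE) {
                        return new TimeRange(1L, 0L); // unsatisfiable, return an empty range
                    }
                    start = Math.max(start, c.value + 1);
                    break;
                case GTE:
                    start = Math.max(start, c.value);
                    break;
                case LT:
                    if (c.value == Long.MIN_VALUE) {
                        return new TimeRange(1L, 0L); // unsatisfiable, return an empty range
                    }
                    end = Math.min(end, c.value - 1);
                    break;
                case LTE:
                    end = Math.min(end, c.value);
                    break;
            }
        }
        return new TimeRange(start, end);
    }

    public static void main(String[] args) {
        // WHERE ROWTIME > 1654618081
        TimeRange r = extract(Arrays.asList(new Comparison("ROWTIME", Op.GT, 1654618081L)));
        System.out.println(r.startInclusive + " .. " + r.endInclusive);
    }
}
```

An empty range lets the executor return no rows without reading the topic at all. With integer timestamps, the issue's second example (`ROWTIME < 1654618081 AND ROWTIME > 1654618080`) reduces to exactly that case, because no integer lies strictly between the two bounds.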
