rapidsai/cudf

[FEA] Add a method to check if row group stats are available

Open

#17864 opened on January 29, 2025

Labels: 0 - Backlog, good first issue

Description

Is your feature request related to a problem? Please describe.
Currently, filter_row_groups in predicate_pushdown.cpp has no way to check whether row group statistics are available. As a result, we can never set num_surviving_row_groups.after_stats_filter to std::nullopt when stats are unavailable.
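To make the desired behavior concrete, here is a minimal C++ sketch of how the stats-filter count could be reported once an availability check exists. Only the after_stats_filter name comes from this issue; the surviving_row_groups struct and the report_stats_filter function are hypothetical and do not reflect cudf's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical, simplified stand-in for the per-stage row group counts.
// Only the after_stats_filter member name is taken from the issue.
struct surviving_row_groups {
  std::size_t total = 0;
  std::optional<std::size_t> after_stats_filter;  // nullopt: stats filter not applicable
};

// Sketch: record the stats-filter result only when statistics were actually
// available; otherwise leave the field as std::nullopt so callers can tell
// "no stats" apart from "stats present but nothing was pruned".
surviving_row_groups report_stats_filter(std::size_t total,
                                         bool stats_available,
                                         std::size_t num_after_filter)
{
  surviving_row_groups counts;
  counts.total = total;
  if (stats_available) { counts.after_stats_filter = num_after_filter; }
  return counts;
}
```

With this shape, a caller seeing after_stats_filter == total knows the filter ran but pruned nothing, while a missing value means the filter never had stats to work with.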

Describe the solution you'd like
If possible, we should find a relatively cheap way (cheaper than building the full stats table) to detect when row group stats are unavailable.
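One possible cheap check, sketched here under assumptions: instead of building the statistics table, scan the row group footer metadata and test whether the column chunks referenced by the filter carry both min_value and max_value. The column_chunk_stats and row_group_meta types and the any_usable_stats function are hypothetical simplifications, not cudf's Parquet metadata structs. Real Parquet Statistics also contain the deprecated min/max fields and null_count, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified view of Parquet column chunk statistics.
// In the Parquet format, min_value/max_value are optional byte strings.
struct column_chunk_stats {
  std::optional<std::vector<std::uint8_t>> min_value;
  std::optional<std::vector<std::uint8_t>> max_value;
};

// Hypothetical row group metadata: one stats entry per leaf column.
struct row_group_meta {
  std::vector<column_chunk_stats> columns;
};

// Sketch: returns true if at least one row group has both min and max for at
// least one column the filter references. Only metadata presence is checked,
// so no statistics table is materialized.
bool any_usable_stats(std::vector<row_group_meta> const& row_groups,
                      std::vector<std::size_t> const& filter_columns)
{
  for (auto const& rg : row_groups) {
    for (auto col : filter_columns) {
      if (col >= rg.columns.size()) { continue; }  // column absent: no stats
      auto const& stats = rg.columns[col];
      if (stats.min_value.has_value() && stats.max_value.has_value()) { return true; }
    }
  }
  return false;
}
```

The sketch treats stats as available if any row group has usable min/max for any referenced column, since that is the minimum needed for the stats filter to prune anything. A stricter policy, such as requiring stats in every row group, is an equally plausible design choice.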

Describe alternatives you've considered
We currently build the entire statsAST table in predicate_pushdown and try to filter row groups with it. If no row groups are filtered out, we can't tell whether that is because the filter was ineffective or because the stats are missing.
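The ambiguity can be illustrated with a toy version of stats-based pruning for an equality predicate. This is a hypothetical C++ sketch (int_stats and count_surviving are illustrative names, not cudf code): row groups without statistics must be kept conservatively, so an input with no stats and an input whose stats never exclude anything both return the total row group count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-row-group min/max for a single int64 column;
// nullopt means the file carried no statistics for that row group.
struct int_stats {
  std::optional<std::int64_t> min;
  std::optional<std::int64_t> max;
};

// Sketch of stats-based pruning for the predicate `col == value`: a row group
// is kept unless its statistics prove that no row can match. Row groups with
// missing stats are always kept.
std::size_t count_surviving(std::vector<int_stats> const& row_groups, std::int64_t value)
{
  std::size_t kept = 0;
  for (auto const& rg : row_groups) {
    bool const has_stats = rg.min.has_value() && rg.max.has_value();
    if (!has_stats || (*rg.min <= value && value <= *rg.max)) { ++kept; }
  }
  return kept;
}
```

For example, with value 5, two row groups with ranges [0, 9] and [10, 19] leave one survivor, while both [0, 9], [0, 9] and two row groups with no stats leave two, so the surviving count alone cannot tell the last two cases apart.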

Additional context
Originally posted by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/17594#discussion_r1932968249
