rapidsai/cudf

[FEA] Add a method to check if row group stats are available

Open

#17864 opened on January 29, 2025

(0 comments) (0 reactions) (0 assignees) C++ (6,000 stars) (735 forks)
Labels: 0 - Backlog, good first issue

Description

Is your feature request related to a problem? Please describe.
Currently, there is no method in predicate_pushdown.cpp (filter_row_groups) to check whether row group statistics are available. As a result, num_surviving_row_groups.after_stats_filter can never be set to std::nullopt when statistics are unavailable.

Describe the solution you'd like
If possible, we should find a relatively cheap way to determine whether row group statistics are unavailable.
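One possibly cheap approach is to inspect only the file's footer metadata for min/max statistics on the columns referenced by the filter, before building the statsAST table. The sketch below assumes Parquet-style metadata, where min/max statistics per column chunk are optional. The types `column_chunk_stats` and `row_group_meta` and the functions `has_usable_stats` and `any_row_group_has_stats` are hypothetical simplifications for illustration, not cudf's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of per-column-chunk statistics from the
// footer. A missing std::optional means the writer did not store that value.
struct column_chunk_stats {
  std::optional<std::string> min_value;  // encoded minimum, if written
  std::optional<std::string> max_value;  // encoded maximum, if written
};

// Hypothetical row group metadata: one stats entry per column chunk.
struct row_group_meta {
  std::vector<column_chunk_stats> columns;
};

// True if this row group has both min and max for every filter column.
// Out-of-range column indices are treated as "no statistics".
bool has_usable_stats(row_group_meta const& rg, std::vector<std::size_t> const& filter_columns)
{
  for (auto col : filter_columns) {
    if (col >= rg.columns.size()) { return false; }
    auto const& s = rg.columns[col];
    if (!s.min_value.has_value() || !s.max_value.has_value()) { return false; }
  }
  return true;
}

// True if at least one row group could be tested against the filter.
// Only metadata is read: O(row_groups * filter_columns), no stats table built.
bool any_row_group_has_stats(std::vector<row_group_meta> const& row_groups,
                             std::vector<std::size_t> const& filter_columns)
{
  for (auto const& rg : row_groups) {
    if (has_usable_stats(rg, filter_columns)) { return true; }
  }
  return false;
}
```

The "any row group" semantics treat statistics as available if at least one row group can be evaluated; a stricter variant could require every row group to have them. Which one fits depends on how after_stats_filter is meant to be interpreted.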

Describe alternatives you've considered
We currently build the entire statsAST table in predicate_pushdown and try to filter row groups with it. If no row groups are filtered out, we cannot tell whether this is because the filter was ineffective or because the statistics are missing.
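The ambiguity can be illustrated with a toy filter. Row groups without statistics must be kept, so the number of surviving row groups is identical whether statistics are missing or the filter simply prunes nothing. The sketch below assumes a single integer column and an equality predicate, and tracks whether any statistics were actually evaluated, reporting std::nullopt otherwise. `int_range`, `stats_filter_result`, and `filter_equals` are hypothetical names; only the `after_stats_filter` field name mirrors the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical min/max range for one integer column in one row group.
struct int_range {
  int min;
  int max;
};

struct stats_filter_result {
  std::vector<std::size_t> surviving;             // indices of kept row groups
  std::optional<std::size_t> after_stats_filter;  // nullopt if no stats were usable
};

// Sketch of pushing "col == value" down to row group statistics.
// std::nullopt in `stats` models a row group with no statistics.
stats_filter_result filter_equals(std::vector<std::optional<int_range>> const& stats, int value)
{
  stats_filter_result result;
  bool any_stats = false;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (!stats[i].has_value()) {
      result.surviving.push_back(i);  // cannot prune without statistics
      continue;
    }
    any_stats = true;
    // Keep the row group if `value` may lie within [min, max].
    if (stats[i]->min <= value && value <= stats[i]->max) { result.surviving.push_back(i); }
  }
  // Report a count only if at least one row group was actually tested.
  if (any_stats) { result.after_stats_filter = result.surviving.size(); }
  assert(result.surviving.size() <= stats.size());
  return result;
}
```

With two row groups lacking statistics, `filter_equals` keeps both and leaves `after_stats_filter` empty; with ranges {0, 10} and {3, 7} and value 5, it also keeps both but reports 2, so the two cases become distinguishable.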

Additional context
Originally posted by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/17594#discussion_r1932968249
