rapidsai/cudf

[FEA] Add a method to check if row group stats are available

Open

#17864 opened on Jan 29, 2025

Labels: 0 - Backlog, good first issue

Description

Is your feature request related to a problem? Please describe.

Currently, filter_row_groups in predicate_pushdown.cpp has no way to check whether row group stats are available. As a result, we can never set num_surviving_row_groups.after_stats_filter to std::nullopt when stats are unavailable.

Describe the solution you'd like

If possible, we should find a relatively cheap way to detect that row group stats aren't available.

Describe alternatives you've considered

We currently build the entire statsAST table in predicate_pushdown and try to filter row groups with it. If no row groups are filtered out, we can't tell whether that was due to an ineffective filter or to missing stats.
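One possible shape for such a check, as a rough sketch only: scan the footer metadata for the columns the filter references before building the stats table, and report std::nullopt when none of them carry min/max values. The types and function names below (column_chunk_stats, row_group_meta, any_stats_available, count_after_stats_filter) are hypothetical stand-ins and not the actual cudf structures or API; whether this is cheap enough in the real reader is an open question.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for Parquet footer metadata.
// These are NOT the cudf types; they only model "min/max may be absent".
struct column_chunk_stats {
  std::optional<std::string> min_value;
  std::optional<std::string> max_value;
};

struct row_group_meta {
  std::vector<column_chunk_stats> columns;  // one entry per column chunk
};

// Returns true if at least one row group has both min and max stats for at
// least one of the columns referenced by the filter. This only inspects
// metadata, so it does not need the full stats table to be built.
bool any_stats_available(std::vector<row_group_meta> const& row_groups,
                         std::vector<std::size_t> const& filter_columns)
{
  return std::any_of(row_groups.begin(), row_groups.end(), [&](auto const& rg) {
    return std::any_of(filter_columns.begin(), filter_columns.end(), [&](std::size_t col) {
      return col < rg.columns.size() && rg.columns[col].min_value.has_value() &&
             rg.columns[col].max_value.has_value();
    });
  });
}

// Hypothetical wrapper: yields std::nullopt when stats are missing, otherwise
// the number of row groups that survived the stats-based filter.
std::optional<std::size_t> count_after_stats_filter(
  std::vector<row_group_meta> const& row_groups,
  std::vector<std::size_t> const& filter_columns,
  std::size_t num_surviving)
{
  if (!any_stats_available(row_groups, filter_columns)) { return std::nullopt; }
  return num_surviving;
}
```

With this shape, a caller can tell "stats were present but the filter pruned nothing" (a value equal to the input row group count) apart from "stats were missing" (std::nullopt).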

Additional context

Originally posted by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/17594#discussion_r1932968249
