pola-rs/polars

scan_delta file skip predicate not working for bool dtype

Open

#26,290 created on January 26, 2026

(7 comments) (1 reaction) (0 assignees) Rust (38,496 stars) (2,826 forks)
Labels: A-io-delta, A-io-parquet, P-low, enhancement, good first issue, performance, python, upstream issue

Description

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

tmp_path = "./tmp"
df = pl.DataFrame(
    {
        "p": [10, 10, 20, 20],
        "a": [1, 2, 3, None],
        "b": [False, False, True, None]
    }
)

df.write_delta(
    tmp_path,
    delta_write_options={"partition_by": "p"},
)

# file skipping works (1 / 2 files skipped)
expr = pl.col.a.is_null()
out = pl.scan_delta(tmp_path).filter(expr).collect()

# file skipping works (1 / 2 files skipped)
expr = pl.col.b.is_null()
out = pl.scan_delta(tmp_path).filter(expr).collect()

# file skipping does not work (0 / 2 files skipped)
expr = pl.col.b == pl.lit(False)
out = pl.scan_delta(tmp_path).filter(expr).collect()

Log output

$ POLARS_VERBOSE=1 pp issue_bool_mre.py 2>&1  | grep skipping
initialize_scan_predicate: Predicate pushdown allows skipping 1 / 2 files
initialize_scan_predicate: Predicate pushdown allows skipping 1 / 2 files
initialize_scan_predicate: Predicate pushdown allows skipping 0 / 2 files

Issue description

Equality predicates on boolean columns are not used for file skipping during predicate pushdown on delta files.

Expected behavior

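There is no clear reason why this would not be supported: the delta log records min/max values and null counts for the boolean column `b`, just as it does for the integer column `a`. Snapshot of the delta JSON file with statistics: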

{"add":{"path":"p=10/part-00000-059a7e15-37b4-450c-acf9-c52cb10d4c59-c000.snappy.parquet","partitionValues":{"p":"10"},"size":688,"modificationTime":1769434619583,"dataChange":true,"stats":"{\"numRecords\":2,\"minValues\":{\"a\":1,\"b\":false},\"maxValues\":{\"b\":false,\"a\":2},\"nullCount\":{\"a\":0,\"b\":0}}","tags":null,"baseRowId":null,"defaultRowCommitVersion":null,"clusteringProvider":null}}

{"add":{"path":"p=20/part-00000-d0d5a29b-6f86-44d5-ae3a-794d33c73da2-c000.snappy.parquet","partitionValues":{"p":"20"},"size":677,"modificationTime":1769434619583,"dataChange":true,"stats":"{\"numRecords\":2,\"minValues\":{\"b\":true,\"a\":3},\"maxValues\":{\"b\":true,\"a\":3},\"nullCount\":{\"b\":1,\"a\":1}}","tags":null,"baseRowId":null,"defaultRowCommitVersion":null,"clusteringProvider":null}}

To be confirmed.
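
To illustrate why the statistics above look sufficient, here is a minimal sketch in plain Python of the usual min/max range test applied to `b == False`. It is not how Polars implements pushdown; `can_skip_eq` and `FILE_STATS` are hypothetical names used only for this example, and the stats strings are copied from the two `add` actions above.

```python
import json

# Per-file "stats" strings copied from the two "add" actions in the delta log.
FILE_STATS = {
    "p=10": '{"numRecords":2,"minValues":{"a":1,"b":false},"maxValues":{"b":false,"a":2},"nullCount":{"a":0,"b":0}}',
    "p=20": '{"numRecords":2,"minValues":{"b":true,"a":3},"maxValues":{"b":true,"a":3},"nullCount":{"b":1,"a":1}}',
}


def can_skip_eq(stats_json: str, column: str, literal) -> bool:
    """Hypothetical helper: True if no row in the file can satisfy `column == literal`."""
    stats = json.loads(stats_json)
    lo = stats.get("minValues", {}).get(column)
    hi = stats.get("maxValues", {}).get(column)
    if lo is None or hi is None:
        # Missing statistics: the file has to be read.
        return False
    # Python orders False < True, so the ordinary range test also covers booleans.
    # Null rows never satisfy an equality predicate, so they do not affect the result.
    return literal < lo or literal > hi


skippable = [name for name, s in FILE_STATS.items() if can_skip_eq(s, "b", False)]
print(f"skipping {len(skippable)} / {len(FILE_STATS)} files: {skippable}")
```

Under these assumptions, the `p=20` file (min and max of `b` both `true`) could be skipped for `b == False`, which would match the `1 / 2` result seen for the `is_null()` predicates.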

Installed versions

Latest main (803b8e4cb1), post 1.37.1
