pola-rs/polars
scan_delta file skip predicate not working for bool dtype
Open
#26290, opened on Jan 26, 2026
Labels: A-io-delta, A-io-parquet, P-low, enhancement, good first issue, performance, python, upstream issue
Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl

tmp_path = "./tmp"
df = pl.DataFrame(
    {
        "p": [10, 10, 20, 20],
        "a": [1, 2, 3, None],
        "b": [False, False, True, None],
    }
)
df.write_delta(
    tmp_path,
    delta_write_options={"partition_by": "p"},
)

# filter works (integer column, null check)
expr = pl.col.a.is_null()
out = pl.scan_delta(tmp_path).filter(expr).collect()

# filter works (Boolean column, null check)
expr = pl.col.b.is_null()
out = pl.scan_delta(tmp_path).filter(expr).collect()

# filter does not work (Boolean column, equality)
expr = pl.col.b == pl.lit(False)
out = pl.scan_delta(tmp_path).filter(expr).collect()
Log output
$ POLARS_VERBOSE=1 pp issue_bool_mre.py 2>&1 | grep skipping
initialize_scan_predicate: Predicate pushdown allows skipping 1 / 2 files
initialize_scan_predicate: Predicate pushdown allows skipping 1 / 2 files
initialize_scan_predicate: Predicate pushdown allows skipping 0 / 2 files
Issue description
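Equality predicates on Boolean columns are not used to skip files during predicate pushdown for Delta tables. pl.col.b == pl.lit(False) skips 0 / 2 files, although the p=20 file only contains true and null values in b. By contrast, pl.col.b.is_null() on the same column does skip 1 / 2 files.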
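As a cross-check of the log output, the sketch below applies the usual min/max/null-count pruning rules by hand to the per-file statistics from the Delta log snapshot further down, copied here as Python dicts. The rules and the helper names can_skip_is_null, can_skip_eq and count_skipped are illustrative assumptions, not Polars' actual implementation. It also assumes that Boolean statistics order false before true.

```python
# Per-file statistics for the two Parquet files, copied by hand from the
# Delta log "add" actions in this issue (numRecords, min/max, nullCount).
FILE_STATS = {
    "p=10": {
        "numRecords": 2,
        "minValues": {"a": 1, "b": False},
        "maxValues": {"a": 2, "b": False},
        "nullCount": {"a": 0, "b": 0},
    },
    "p=20": {
        "numRecords": 2,
        "minValues": {"a": 3, "b": True},
        "maxValues": {"a": 3, "b": True},
        "nullCount": {"a": 1, "b": 1},
    },
}


def can_skip_is_null(stats: dict, col: str) -> bool:
    """Hypothetical rule: col.is_null() cannot match a file without nulls."""
    return stats["nullCount"][col] == 0


def can_skip_eq(stats: dict, col: str, value) -> bool:
    """Hypothetical rule: col == value cannot match a file whose column is
    entirely null or whose [min, max] range excludes value.

    Assumes Booleans are ordered False < True, as Python orders them.
    """
    if stats["nullCount"][col] == stats["numRecords"]:
        return True  # all nulls: the equality never evaluates to True
    lo, hi = stats["minValues"][col], stats["maxValues"][col]
    return value < lo or value > hi


def count_skipped(rule) -> int:
    """Number of files that the given rule would allow skipping."""
    return sum(rule(stats) for stats in FILE_STATS.values())


if __name__ == "__main__":
    n = len(FILE_STATS)
    cases = {
        "a.is_null()": lambda s: can_skip_is_null(s, "a"),
        "b.is_null()": lambda s: can_skip_is_null(s, "b"),
        "b == False": lambda s: can_skip_eq(s, "b", False),
    }
    for label, rule in cases.items():
        print(f"{label}: skip {count_skipped(rule)} / {n} files")
```

Under these rules, all three predicates allow skipping 1 / 2 files. For b == False, the skipped file is p=20, where min = max = true. The log instead reports 0 / 2 for the Boolean equality.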
Expected behavior
There is no obvious reason why this should not be supported, since the Delta log already stores min/max/null-count statistics for b. Snapshot of the Delta log JSON file with the statistics:
{"add":{"path":"p=10/part-00000-059a7e15-37b4-450c-acf9-c52cb10d4c59-c000.snappy.parquet","partitionValues":{"p":"10"},"size":688,"modificationTime":1769434619583,"dataChange":true,"stats":"{\"numRecords\":2,\"minValues\":{\"a\":1,\"b\":false},\"maxValues\":{\"b\":false,\"a\":2},\"nullCount\":{\"a\":0,\"b\":0}}","tags":null,"baseRowId":null,"defaultRowCommitVersion":null,"clusteringProvider":null}}
{"add":{"path":"p=20/part-00000-d0d5a29b-6f86-44d5-ae3a-794d33c73da2-c000.snappy.parquet","partitionValues":{"p":"20"},"size":677,"modificationTime":1769434619583,"dataChange":true,"stats":"{\"numRecords\":2,\"minValues\":{\"b\":true,\"a\":3},\"maxValues\":{\"b\":true,\"a\":3},\"nullCount\":{\"b\":1,\"a\":1}}","tags":null,"baseRowId":null,"defaultRowCommitVersion":null,"clusteringProvider":null}}
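In these entries the stats field is itself a JSON document stored as a string, so it has to be decoded a second time. The sketch below reads the two add actions above (trimmed to path, partitionValues and stats) and extracts the statistics of one column. The helper name column_stats is hypothetical and is not part of Polars or deltalake.

```python
import json

# The two "add" actions from the Delta log above, trimmed to path,
# partitionValues and stats. Raw strings keep the \" escapes of the
# embedded stats document intact.
LOG_LINES = [
    r'{"add":{"path":"p=10/part-00000-059a7e15-37b4-450c-acf9-c52cb10d4c59-c000.snappy.parquet","partitionValues":{"p":"10"},"stats":"{\"numRecords\":2,\"minValues\":{\"a\":1,\"b\":false},\"maxValues\":{\"b\":false,\"a\":2},\"nullCount\":{\"a\":0,\"b\":0}}"}}',
    r'{"add":{"path":"p=20/part-00000-d0d5a29b-6f86-44d5-ae3a-794d33c73da2-c000.snappy.parquet","partitionValues":{"p":"20"},"stats":"{\"numRecords\":2,\"minValues\":{\"b\":true,\"a\":3},\"maxValues\":{\"b\":true,\"a\":3},\"nullCount\":{\"b\":1,\"a\":1}}"}}',
]


def column_stats(lines: list[str], col: str) -> dict:
    """Return {path: (min, max, null_count, num_records)} for one column.

    The action is parsed first, then its "stats" string is parsed again.
    Min/max are None when the statistics omit them (e.g. all-null column).
    """
    out = {}
    for line in lines:
        action = json.loads(line)["add"]
        stats = json.loads(action["stats"])  # second decode: stats is a string
        out[action["path"]] = (
            stats["minValues"].get(col),
            stats["maxValues"].get(col),
            stats["nullCount"][col],
            stats["numRecords"],
        )
    return out


if __name__ == "__main__":
    for path, (lo, hi, nulls, n) in column_stats(LOG_LINES, "b").items():
        print(f"{path}: min={lo} max={hi} nulls={nulls}/{n}")
```

For column b this gives (False, False, 0, 2) for the p=10 file and (True, True, 1, 2) for the p=20 file. The statistics alone are therefore enough to rule out b == False for p=20.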
To be confirmed.
Installed versions
Latest main (803b8e4cb1), post 1.37.1