pola-rs/polars
scan_delta file skip predicate not working for bool dtype
Open
#26290 opened on Jan 26, 2026
Labels: A-io-delta, A-io-parquet, P-low, enhancement, good first issue, performance, python, upstream issue
Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl

tmp_path = "./tmp"

df = pl.DataFrame(
    {
        "p": [10, 10, 20, 20],
        "a": [1, 2, 3, None],
        "b": [False, False, True, None],
    }
)
df.write_delta(
    tmp_path,
    delta_write_options={"partition_by": "p"},
)

expr = pl.col.a.is_null()
out = pl.scan_delta(tmp_path).filter(expr).collect()

# file skipping works (1 / 2 files skipped)
expr = pl.col.b.is_null()
out = pl.scan_delta(tmp_path).filter(expr).collect()

# file skipping does not work (0 / 2 files skipped)
expr = pl.col.b == pl.lit(False)
out = pl.scan_delta(tmp_path).filter(expr).collect()
Log output
$ POLARS_VERBOSE=1 pp issue_bool_mre.py 2>&1 | grep skipping
initialize_scan_predicate: Predicate pushdown allows skipping 1 / 2 files
initialize_scan_predicate: Predicate pushdown allows skipping 1 / 2 files
initialize_scan_predicate: Predicate pushdown allows skipping 0 / 2 files
Issue description
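Equality comparisons on boolean columns do not trigger file skipping in predicate pushdown for Delta tables. With `pl.col.b == pl.lit(False)`, 0 of 2 files are skipped, whereas the `is_null()` predicates on `a` and `b` each skip 1 of 2 files.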
Expected behavior
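There is no clear reason why this would not be supported. The Delta log records min/max statistics for the boolean column `b`, and the file under `p=20` has `min = max = true`, so it cannot contain a row with `b == False` and could be skipped. Snapshot of the Delta JSON log file with statistics: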
{"add":{"path":"p=10/part-00000-059a7e15-37b4-450c-acf9-c52cb10d4c59-c000.snappy.parquet","partitionValues":{"p":"10"},"size":688,"modificationTime":1769434619583,"dataChange":true,"stats":"{\"numRecords\":2,\"minValues\":{\"a\":1,\"b\":false},\"maxValues\":{\"b\":false,\"a\":2},\"nullCount\":{\"a\":0,\"b\":0}}","tags":null,"baseRowId":null,"defaultRowCommitVersion":null,"clusteringProvider":null}}
{"add":{"path":"p=20/part-00000-d0d5a29b-6f86-44d5-ae3a-794d33c73da2-c000.snappy.parquet","partitionValues":{"p":"20"},"size":677,"modificationTime":1769434619583,"dataChange":true,"stats":"{\"numRecords\":2,\"minValues\":{\"b\":true,\"a\":3},\"maxValues\":{\"b\":true,\"a\":3},\"nullCount\":{\"b\":1,\"a\":1}}","tags":null,"baseRowId":null,"defaultRowCommitVersion":null,"clusteringProvider":null}}
To be confirmed.
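To make the expectation concrete, here is a minimal sketch of the skip decision using only the `stats` strings from the two `add` actions above. It does not reflect how Polars or delta-rs implement pushdown. `can_skip_for_equality` and `FILE_STATS` are hypothetical names introduced for illustration, and the sketch assumes the usual boolean ordering `false < true`.

```python
import json

# Per-file statistics copied from the "stats" field of the two "add" actions above.
FILE_STATS = {
    "p=10": '{"numRecords":2,"minValues":{"a":1,"b":false},"maxValues":{"b":false,"a":2},"nullCount":{"a":0,"b":0}}',
    "p=20": '{"numRecords":2,"minValues":{"b":true,"a":3},"maxValues":{"b":true,"a":3},"nullCount":{"b":1,"a":1}}',
}


def can_skip_for_equality(stats_json: str, column: str, value) -> bool:
    """Return True if min/max statistics prove no row satisfies `column == value`.

    Hypothetical helper for illustration only; not part of Polars or delta-rs.
    """
    stats = json.loads(stats_json)
    lo = stats.get("minValues", {}).get(column)
    hi = stats.get("maxValues", {}).get(column)
    if lo is None or hi is None:
        # Missing statistics: nothing can be proven, so the file must be read.
        return False
    # Python orders False < True, so the same range check works for booleans.
    # Null rows never satisfy an equality predicate, so nullCount is irrelevant here.
    return value < lo or value > hi


skippable = {
    name: can_skip_for_equality(stats, "b", False)
    for name, stats in FILE_STATS.items()
}
print(skippable)  # {'p=10': False, 'p=20': True}
```

Under these assumptions, only the `p=20` file is provably free of matches for `b == False`, which would correspond to skipping 1 / 2 files instead of the 0 / 2 reported in the log.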
Installed versions
Latest main (803b8e4cb1), post 1.37.1