delta-io/delta

fix: Java Kernel data skipping uses case-sensitive column matching

Open

#6,247 created on March 11, 2026

2 comments, 0 reactions, 0 assignees. Scala, 8,807 stars, 2,100 forks.
Labels: bug, good first issue

Description

Delta column names are case-insensitive per the protocol spec ("All column names must be unique regardless of casing"). Delta Spark uses equalsIgnoreCase when resolving predicate column references against the table schema in the data skipping path (via findNestedFieldIgnoreCase).

However, Java Kernel's StatsSchemaHelper uses case-sensitive matching. The Column class uses Arrays.equals(names, other.getNames()) for equality, and the HashMap lookups in StatsSchemaHelper.getLogicalToPhysicalColumnAndDataType() are therefore case-sensitive. This means a predicate like col > 5 will fail to match a schema column named Col, and data skipping will not be applied.
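The lookup miss described above can be illustrated in isolation. The following is a minimal sketch, not Kernel code: ColumnKey is a hypothetical stand-in that, like the Column class described in this issue, bases equality on Arrays.equals over the name parts, and the map stands in for the schema maps built by StatsSchemaHelper.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class CaseSensitiveLookupDemo {
    // Hypothetical stand-in for a column reference; NOT the Kernel Column class.
    // Equality is exact (case-sensitive) over the name parts, as the issue describes.
    static final class ColumnKey {
        private final String[] names;

        ColumnKey(String... names) {
            this.names = names.clone();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ColumnKey && Arrays.equals(names, ((ColumnKey) o).names);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(names);
        }
    }

    // Returns the data type registered for ref, or null when the lookup misses.
    static String lookup(Map<ColumnKey, String> schema, ColumnKey ref) {
        return schema.get(ref);
    }

    public static void main(String[] args) {
        Map<ColumnKey, String> schema = new HashMap<>();
        schema.put(new ColumnKey("Col"), "integer");

        // Same casing as the schema: the entry is found.
        System.out.println(lookup(schema, new ColumnKey("Col")));
        // Predicate written as "col": the lookup misses and returns null,
        // which is the point at which skipping would be silently dropped.
        System.out.println(lookup(schema, new ColumnKey("col")));
    }
}
```

The query still returns correct results in this situation; the cost is that files which could have been pruned are read anyway.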

Steps to reproduce

  1. Create a Delta table with a column named Value (mixed case)
  2. Query with a predicate using a differently-cased column name, e.g., value > 100
  3. Data skipping will not be applied because the column lookup fails

Expected behavior

Case-insensitive column matching in the data skipping path, consistent with Delta Spark which uses equalsIgnoreCase in findNestedFieldIgnoreCase.
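One possible direction, shown only as a sketch under assumptions, is to fold the case of every name part before it is used as a map key. FoldedColumnKey and lookup are hypothetical names introduced for illustration; they are not part of the Kernel API, and the actual fix may be structured differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CaseInsensitiveLookupSketch {
    // Hypothetical key type: stores a lower-cased copy of the column path and
    // uses it for equals/hashCode, so "Value" and "value" map to the same key.
    static final class FoldedColumnKey {
        private final String[] folded;

        FoldedColumnKey(String... names) {
            folded = new String[names.length];
            for (int i = 0; i < names.length; i++) {
                // Locale.ROOT keeps the folding independent of the JVM default locale.
                folded[i] = names[i].toLowerCase(Locale.ROOT);
            }
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FoldedColumnKey
                && Arrays.equals(folded, ((FoldedColumnKey) o).folded);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(folded);
        }
    }

    // Resolves a predicate column path against the schema map, ignoring case.
    static String lookup(Map<FoldedColumnKey, String> schema, String... path) {
        return schema.get(new FoldedColumnKey(path));
    }

    public static void main(String[] args) {
        Map<FoldedColumnKey, String> schema = new HashMap<>();
        schema.put(new FoldedColumnKey("Value"), "long");
        schema.put(new FoldedColumnKey("Nested", "Field"), "string");

        System.out.println(lookup(schema, "value"));           // long
        System.out.println(lookup(schema, "NESTED", "field")); // string
        System.out.println(lookup(schema, "missing"));         // null
    }
}
```

Because the protocol requires column names to be unique regardless of casing, folding the keys should not make two distinct schema columns collide. The original field names would still need to be kept alongside the folded keys wherever the exact name is required.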

Relevant code

  • kernel-api/src/main/java/io/delta/kernel/internal/skipping/StatsSchemaHelper.java: builds column maps using exact field names, so its HashMap lookups are case-sensitive
  • kernel/expressions/Column.java: equals() uses Arrays.equals(names, other.getNames()) (case-sensitive)
