delta-io/delta

[Kernel][Data skipping] Add the STARTS_WITH expression and support data skipping for it

Open

#2539 opened on Jan 18, 2024

Labels: enhancement, good first issue, kernel

Description

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

Currently Kernel supports a limited set of expressions. We should 1) add the STARTS_WITH expression and 2) use file statistics to prune files based on the expression.

Motivation

Better file pruning: queries that filter string columns by prefix can skip files whose statistics rule out a match.

Further details

This means we should:

  1. Add STARTS_WITH to the Kernel Predicate expressions and support it in the kernel-defaults project
  2. Generate a data skipping filter for it according to the same rules we use in delta-spark
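
To make step 2 concrete, here is a minimal sketch of one possible skipping rule, assuming each file carries min/max string statistics for the column. The idea is to truncate both stats to the prefix length and keep the file only if the prefix falls between them. The class and method names (`StartsWithSkipping`, `mightContainPrefix`) are hypothetical and not part of the Kernel API, and the sketch compares strings with Java's `String.compareTo` (UTF-16 code units), which may differ from the ordering used when the statistics were written. The actual rule should be taken from delta-spark.

```java
// Hypothetical helper, not part of Delta Kernel: illustrates min/max-based
// file pruning for a STARTS_WITH(column, prefix) predicate.
final class StartsWithSkipping {
    private StartsWithSkipping() {}

    /** Returns the first n characters of s, or s itself if it is shorter. */
    public static String truncate(String s, int n) {
        return s.length() <= n ? s : s.substring(0, n);
    }

    /**
     * Returns false only when no value in [min, max] can start with prefix,
     * i.e. when the file can be safely skipped. Returns true otherwise,
     * including when statistics are missing.
     */
    public static boolean mightContainPrefix(String min, String max, String prefix) {
        if (min == null || max == null) {
            return true; // no stats: the file cannot be skipped
        }
        int n = prefix.length();
        // Truncation preserves (non-strict) ordering, so any value v with
        // min <= v <= max that starts with prefix implies
        // truncate(min) <= prefix <= truncate(max).
        return truncate(min, n).compareTo(prefix) <= 0
            && truncate(max, n).compareTo(prefix) >= 0;
    }
}
```

For example, with `min = "apple"` and `max = "banana"`, the prefix `"app"` keeps the file, while the prefix `"c"` lets it be skipped. The check is conservative: it can keep files that contain no match, but it should never skip a file that does.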

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
