delta-io/delta

[Kernel][Data skipping] Add the STARTS_WITH expression and support data skipping for it

Open

#2539 opened on January 18, 2024

8 comments · 0 reactions · 1 assignee · Scala · 8,807 stars · 2,100 forks
Labels: enhancement, good first issue, kernel

Description

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

Currently Kernel supports a limited set of expressions. We should 1) add the STARTS_WITH expression and 2) use file statistics to prune files based on the expression.

Motivation

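Better file pruning: with data skipping for STARTS_WITH, Kernel can use per-file statistics to skip files that cannot contain a matching value.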

Further details

This means we should:

  1. Add STARTS_WITH to the Kernel Predicate and support it in the kernel-defaults project.
  2. Generate a data skipping filter according to the same rules we use in delta-spark.
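For anyone picking this up, here is a rough sketch of what the skipping rule in item 2 could look like. It assumes the delta-spark rule treats STARTS_WITH like an equality test against the min/max statistics truncated to the prefix length, which is my reading and should be checked against the delta-spark source. It is written in Python only to illustrate the logic. The function name `may_contain_prefix` and its parameters are hypothetical and not part of any Kernel API.

```python
from typing import Optional


def may_contain_prefix(
    prefix: str,
    min_value: Optional[str],
    max_value: Optional[str],
) -> bool:
    """Return False only when min/max stats prove no value starts with `prefix`.

    Hypothetical helper for illustration; not a Delta Kernel API.
    Assumed rule: substring(min, 0, len) <= prefix <= substring(max, 0, len).
    """
    # Without both statistics the file cannot be ruled out, so keep it.
    if min_value is None or max_value is None:
        return True

    n = len(prefix)
    # Truncating to the prefix length preserves lexicographic order, so any
    # value v in [min, max] that starts with `prefix` forces
    # min[:n] <= prefix <= max[:n].
    return min_value[:n] <= prefix <= max_value[:n]


# Each tuple is (min, max) for a hypothetical file's string column.
files = {
    "f1": ("apple", "apricot"),
    "f2": ("banana", "cherry"),
    "f3": (None, None),  # stats missing: must be kept
}
kept = [name for name, (lo, hi) in files.items() if may_contain_prefix("ap", lo, hi)]
```

The check is conservative: a file that passes may still contain no matching rows, but a file that fails cannot contain any, so it is safe to skip. Like the equality case, the filter is applied only to the positive form of the predicate; negating it does not give a valid skipping condition.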

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
