bazelbuild/bazel

Feature request: add include/exclude path filters to repository_ctx.extract and repository_ctx.download_and_extract

Open

#28858 opened on Mar 2, 2026

Labels: P2, help wanted, team-ExternalDeps, team-Rules-API, type: feature request

Description

Description of the feature request:

Add include/exclude filtering semantics to both:

  • repository_ctx.extract(...)
  • repository_ctx.download_and_extract(...)

Today these APIs extract the full archive. For large upstream source archives, this is expensive even when only a small subset is needed.

Proposed direction (one possible shape):

  • include = [] (optional list of glob patterns)
  • exclude = [] (optional list of glob patterns)
  • default behavior unchanged when both are empty
  • exclude takes precedence over include
  • matching is against archive entry paths

This mirrors existing archive tooling behavior (for example bsdtar --include/--exclude).
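The matching rules listed above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not Bazel code: the function name `should_extract` and the sample entry paths are hypothetical, and Python's `fnmatchcase` stands in for whatever glob dialect Bazel would adopt (note that its `*` also matches `/`, which a real implementation might not want).

```python
from fnmatch import fnmatchcase

# Hypothetical call shape inside a repository rule (these parameters do NOT
# exist today; they are what this issue proposes):
#   repository_ctx.download_and_extract(url = ..., include = [...], exclude = [...])


def should_extract(entry_path, include=(), exclude=()):
    """Return True if an archive entry would be extracted under the proposed rules."""
    # Exclude takes precedence over include.
    if any(fnmatchcase(entry_path, pattern) for pattern in exclude):
        return False
    # An empty include list means "everything", so the default is unchanged.
    if not include:
        return True
    return any(fnmatchcase(entry_path, pattern) for pattern in include)


# Illustrative archive entry paths (made up for this sketch).
entries = [
    "pkg-1.0/llvm/lib/a.cpp",
    "pkg-1.0/llvm/test/a_test.cpp",
    "pkg-1.0/clang/lib/b.cpp",
    "pkg-1.0/flang/lib/c.cpp",
]
include = ["*/llvm/*", "*/clang/*"]
exclude = ["*/test/*"]

selected = [e for e in entries if should_extract(e, include, exclude)]
```

Here `selected` keeps the `llvm` and `clang` library files, drops the `flang` entry because it matches no include pattern, and drops the `llvm/test` entry because exclude wins even though it also matches an include pattern.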

Which category does this issue belong to?

External Dependency, Rules API, Performance

What underlying problem are you trying to solve with this feature?

In repository rules, sometimes we must download a large upstream archive but only need a few subtrees.

Real case: LLVM source release archive (llvm-project-21.1.8.src.tar.xz).

  • Full extraction: ~16.24s wall time locally
  • Selective extraction (4 subprojects): ~1.89s wall time locally (about 8.6x faster)

Measured locally:

Full extraction:

  • user: 10.40s
  • sys: 5.47s
  • wall: 16.24s

Selective extraction:

  • user: 0.77s
  • sys: 1.45s
  • wall: 1.89s

On cloud runners the gap is larger. Example trace on a 2-core runner:

  • Extracting llvm-project-21.1.8.src.tar.xz 519s

The current workaround is to repack and rehost trimmed archives, but that is rigid and adds maintenance overhead. I would rather pay for the full download once and then extract only the paths that are needed.
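For illustration, the "download once, extract only the needed subtrees" idea can be sketched outside Bazel with Python's standard `tarfile` module. This is not how the timings above were measured and not a proposed implementation; `extract_subtrees`, `demo`, and the tiny archive built in `demo` are hypothetical names and data used only for this example.

```python
import io
import os
import tarfile
import tempfile


def extract_subtrees(archive_path, dest, wanted_prefixes):
    """Extract only members located under one of the wanted directory prefixes."""
    prefixes = tuple(p.rstrip("/") + "/" for p in wanted_prefixes)
    os.makedirs(dest, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gz, bz2, xz) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        members = [m for m in tar if m.name.startswith(prefixes)]
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, members=members, **kwargs)
    return [m.name for m in members]


def demo():
    """Build a tiny archive, extract two of its three subtrees, report results."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "demo.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            for name in ("src/llvm/a.txt", "src/clang/b.txt", "src/flang/c.txt"):
                data = name.encode()
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        out = os.path.join(tmp, "out")
        names = extract_subtrees(archive, out, ["src/llvm", "src/clang"])
        # Collect what actually landed on disk, as forward-slash relative paths.
        extracted = sorted(
            os.path.relpath(os.path.join(d, f), out).replace(os.sep, "/")
            for d, _, files in os.walk(out)
            for f in files
        )
        return names, extracted


if __name__ == "__main__":
    print(demo())
```

A built-in filter on `repository_ctx.extract` would make this kind of selection available directly in repository rules, without shelling out to an external tool or maintaining a trimmed copy of the archive.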

Which operating system are you running Bazel on?

osx, linux, windows

What is the output of bazel info release?

9.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

Have you found anything relevant by searching the web?

bsdtar supports include/exclude archive entry filtering (--include, --exclude)

Any other information, logs, or outputs that you want to share?

If this direction is acceptable, I can try to put together a PR with:

  • API proposal for both methods
  • tests for include-only, exclude-only, include+exclude precedence
  • docs updates for matching semantics and defaults
