Feature request: add include/exclude path filters to repository_ctx.extract and repository_ctx.download_and_extract
#28858 opened on Mar 2, 2026
Description
Description of the feature request:
Add include/exclude filtering semantics to both:
- repository_ctx.extract(...)
- repository_ctx.download_and_extract(...)
Today these APIs extract the full archive. For large upstream source archives, this is expensive even when only a small subset is needed.
Proposed direction (one possible shape):
- include = [] (optional list of glob patterns)
- exclude = [] (optional list of glob patterns)
- default behavior unchanged when both are empty
- exclude takes precedence over include
- matching is against archive entry paths
This mirrors existing archive tooling behavior (for example bsdtar --include/--exclude).
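To make the proposed semantics concrete, here is a minimal Python sketch of the matching rule. It is an illustration only, not Bazel's implementation: the function name `should_extract` is hypothetical, and it assumes `fnmatch`-style globs in which `*` also matches `/`. An actual design would need to pin down the glob dialect.

```python
from fnmatch import fnmatchcase


def should_extract(entry_path, include=(), exclude=()):
    """Decide whether an archive entry passes the include/exclude filters.

    Hypothetical helper; assumes fnmatch-style globs where '*' also matches '/'.
    """
    # exclude takes precedence over include.
    if any(fnmatchcase(entry_path, pattern) for pattern in exclude):
        return False
    # An empty include list keeps today's behavior: everything is extracted.
    if not include:
        return True
    return any(fnmatchcase(entry_path, pattern) for pattern in include)
```

With both lists empty every entry is extracted, so existing callers would see no change in behavior.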
Which category does this issue belong to?
External Dependency, Rules API, Performance
What underlying problem are you trying to solve with this feature?
In repository rules, sometimes we must download a large upstream archive but only need a few subtrees.
Real case: LLVM source release archive (llvm-project-21.1.8.src.tar.xz).
- Full extraction: ~16.24s locally
- Selective extraction (4 subprojects): ~1.89s locally
Measured locally:
Full extraction:
- user: 10.40s
- sys: 5.47s
- wall: 16.24s
Selective extraction:
- user: 0.77s
- sys: 1.45s
- wall: 1.89s
On cloud runners the gap is larger. Example trace on a 2-core runner:
Extracting llvm-project-21.1.8.src.tar.xz 519s
The current workaround is to repack and rehost trimmed archives, but that is rigid and adds maintenance overhead. I would rather pay the download cost once and then extract only the needed paths.
Which operating system are you running Bazel on?
osx, linux, windows
What is the output of bazel info release?
9.0.0
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD ?
Have you found anything relevant by searching the web?
bsdtar supports include/exclude archive entry filtering (--include, --exclude)
Any other information, logs, or outputs that you want to share?
If this direction is acceptable, I can work on a PR with:
- API proposal for both methods
- tests for include-only, exclude-only, include+exclude precedence
- docs updates for matching semantics and defaults