sgl-project/sglang
[Feature] Unified JIT / Precompilation Cache Directory
Open
#19,612 opened on Mar 1, 2026
good first issue
Description
Checklist
- If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- Please use English. Otherwise, it will be closed.
Motivation
Summary
Request: Unify all JIT and precompilation cache paths under a single configurable root so that users and operators can manage cache location, size, and persistence in one place. The current situation is fragmented across multiple env vars and default paths (including /tmp), which makes it hard to reason about where compiled artifacts live and to reuse caches across runs or machines.
Current State (Fragmented)
1. Triton JIT cache
| What | Env var / source | Default / behavior |
|---|---|---|
| Direct `@triton.jit` (e.g. allocator, MoE kernels) | `TRITON_CACHE_DIR` (Triton runtime) | Triton default: `~/.triton/cache`; often overwritten by Inductor to a path under `/tmp/torchinductor_*` |
| SGLang override | `SGLANG_TRITON_CACHE_DIR` (if implemented) | e.g. `~/.triton/cache` or `~/.cache/triton` |
- Set only in some entry paths (engine, gRPC launcher, scheduler process); custom/PD launchers may not set it.
- When PyTorch Inductor runs first, it can set `TRITON_CACHE_DIR` to its own subdir, so later Triton JIT (including allocator) writes under `/tmp`.
2. PyTorch Inductor (torch.compile)
| What | Env var / source | Default / behavior |
|---|---|---|
| Inductor cache | `TORCHINDUCTOR_CACHE_DIR` (PyTorch) | `/tmp/torchinductor_<user>` (or similar) |
| SGLang override | `SGLANG_TORCHINDUCTOR_CACHE_DIR` (if implemented) | e.g. `~/.cache/sglang/inductor` |
- Default lives in `/tmp`, so the cache is often non-persistent and can conflict with other users on shared nodes.
3. DeepGEMM JIT cache
| What | Env var / source | Default / behavior |
|---|---|---|
| DeepGEMM cache | `SGLANG_DG_CACHE_DIR` / `DG_JIT_CACHE_DIR` | `~/.cache/deep_gemm` |
- Set in `layers/deep_gemm_wrapper/compile_utils.py`; separate from Triton/Inductor.
Problems
- No single root: Triton, Inductor, SGLang torch_compile, and DeepGEMM each have their own env var or default; some write to `/tmp`, others to `~/.cache/...`.
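To see the fragmentation on a given machine, a small diagnostic like the one below can list which of the cache-related env vars from the tables above are set and which of them point under `/tmp`. This is only a sketch: `report_cache_env` and `vars_under_tmp` are hypothetical helper names, not existing SGLang functions, and the script only reads the environment; it does not detect defaults applied inside Triton, PyTorch, or DeepGEMM.

```python
import os

# Env vars named in the tables above. They are only read here, never set.
CACHE_ENV_VARS = (
    "TRITON_CACHE_DIR",
    "SGLANG_TRITON_CACHE_DIR",
    "TORCHINDUCTOR_CACHE_DIR",
    "SGLANG_TORCHINDUCTOR_CACHE_DIR",
    "SGLANG_DG_CACHE_DIR",
    "DG_JIT_CACHE_DIR",
)


def report_cache_env(env=None):
    """Map each known cache env var to its value, or None when unset."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in CACHE_ENV_VARS}


def vars_under_tmp(report):
    """Return the names of vars whose value is /tmp or a path below it."""
    return sorted(
        name
        for name, value in report.items()
        if value and (value == "/tmp" or value.startswith("/tmp/"))
    )


if __name__ == "__main__":
    current = report_cache_env()
    for name, value in current.items():
        print(f"{name} = {value if value is not None else '<unset>'}")
    print("under /tmp:", vars_under_tmp(current))
```

Running this in each SGLang process (engine, scheduler, custom launcher) would show whether they agree on cache locations.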
Proposal
1. Introduce a single JIT cache root
- New env var: `SGLANG_JIT_CACHE_ROOT` (or `SGLANG_CACHE_ROOT` if we want to align with the existing `SGLANG_CACHE_ROOT` in custom_all_reduce_utils).
- Default: `~/.cache/sglang` (or `$XDG_CACHE_HOME/sglang` when set).
- Semantics: All JIT/precompilation caches that SGLang controls should live under this root in fixed subdirs.
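The root lookup described above could be sketched roughly as follows. This is a minimal illustration, not existing SGLang code: `resolve_jit_cache_root` is a hypothetical helper, and the precedence (explicit `SGLANG_JIT_CACHE_ROOT`, then `$XDG_CACHE_HOME/sglang`, then `~/.cache/sglang`) is an assumption about how the proposal would be read.

```python
import os
from pathlib import Path


def resolve_jit_cache_root(env=None):
    """Pick the cache root: explicit var, then XDG, then ~/.cache/sglang.

    The lookup order is an assumption made for this sketch.
    """
    env = os.environ if env is None else env

    explicit = env.get("SGLANG_JIT_CACHE_ROOT")
    if explicit:
        return Path(explicit).expanduser()

    xdg = env.get("XDG_CACHE_HOME")
    if xdg:
        return Path(xdg).expanduser() / "sglang"

    return Path.home() / ".cache" / "sglang"


if __name__ == "__main__":
    print(resolve_jit_cache_root())
```

Taking `env` as a parameter keeps the function easy to test without mutating the real process environment.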
2. Standard layout under the root
Suggested subdirs (all under SGLANG_JIT_CACHE_ROOT):
| Subdir | Purpose | Maps from |
|---|---|---|
| `triton/` | Triton JIT (direct `@triton.jit` and, when possible, Triton used by Inductor) | `TRITON_CACHE_DIR` |
| `inductor/` | PyTorch Inductor (torch.compile) | `TORCHINDUCTOR_CACHE_DIR` |
| `torch_compile/` | SGLang torch.compile cache (hash-based, when using SGLangBackend) | `SGLANG_CACHE_DIR` + `torch_compile_cache` |
| `deep_gemm/` | DeepGEMM JIT | `SGLANG_DG_CACHE_DIR` / `DG_JIT_CACHE_DIR` |
- If we keep backward compatibility, existing env vars (`SGLANG_TRITON_CACHE_DIR`, `SGLANG_TORCHINDUCTOR_CACHE_DIR`, `SGLANG_CACHE_DIR`, `SGLANG_DG_CACHE_DIR`) could override the default subdir path when set; otherwise they are derived as `{SGLANG_JIT_CACHE_ROOT}/{subdir}`.
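One possible shape for the layout and the backward-compatible overrides is sketched below. The helper names (`resolve_cache_dirs`, `export_upstream_env`) and the `LEGACY_OVERRIDES` table are hypothetical. Two readings are assumptions rather than settled design: that `SGLANG_CACHE_DIR` + `torch_compile_cache` means a `torch_compile_cache` subdirectory of `SGLANG_CACHE_DIR`, and that a user-provided `TRITON_CACHE_DIR` or `TORCHINDUCTOR_CACHE_DIR` should be left untouched.

```python
import os
from pathlib import Path

# subdir -> (legacy env var, suffix appended to that var's value).
# The "torch_compile_cache" suffix is an assumed reading of the table above.
LEGACY_OVERRIDES = {
    "triton": ("SGLANG_TRITON_CACHE_DIR", ""),
    "inductor": ("SGLANG_TORCHINDUCTOR_CACHE_DIR", ""),
    "torch_compile": ("SGLANG_CACHE_DIR", "torch_compile_cache"),
    "deep_gemm": ("SGLANG_DG_CACHE_DIR", ""),
}


def resolve_cache_dirs(root, env=None):
    """Return {subdir: path}, preferring a legacy env var when it is set."""
    env = os.environ if env is None else env
    dirs = {}
    for subdir, (var, suffix) in LEGACY_OVERRIDES.items():
        override = env.get(var)
        if override:
            # An empty suffix leaves the override path unchanged.
            dirs[subdir] = Path(override).expanduser() / suffix
        else:
            dirs[subdir] = Path(root) / subdir
    return dirs


def export_upstream_env(dirs, env):
    """Point Triton and Inductor at the resolved dirs unless already set.

    Using setdefault (rather than overwriting) is a design assumption.
    """
    env.setdefault("TRITON_CACHE_DIR", str(dirs["triton"]))
    env.setdefault("TORCHINDUCTOR_CACHE_DIR", str(dirs["inductor"]))


if __name__ == "__main__":
    resolved = resolve_cache_dirs(Path.home() / ".cache" / "sglang")
    export_upstream_env(resolved, os.environ)
    for name, path in resolved.items():
        print(f"{name}: {path}")
```

For this to take effect, the export step would presumably need to run early in every entry path (including custom/PD launchers), before Triton or Inductor read their env vars.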
CC @Fridge003 @hnyls2002
Related resources
No response