sgl-project/sglang

[Feature] Unified JIT / Precompilation Cache Directory

Open

Opened on Mar 1, 2026

good first issue

Description

Checklist

Motivation

Summary

Request: Unify all JIT and precompilation cache paths under a single configurable root so that users and operators can manage cache location, size, and persistence in one place. The current situation is fragmented across multiple env vars and default paths (including /tmp), which makes it hard to reason about where compiled artifacts live and to reuse caches across runs or machines.

Current State (Fragmented)

1. Triton JIT cache

What | Env var / source | Default / behavior
Direct @triton.jit (e.g. allocator, MoE kernels) | TRITON_CACHE_DIR (Triton runtime) | Triton default: ~/.triton/cache; often overwritten by Inductor to a path under /tmp/torchinductor_*
SGLang override | SGLANG_TRITON_CACHE_DIR (if implemented) | e.g. ~/.triton/cache or ~/.cache/triton
  • The SGLang override is applied only in some entry paths (engine, gRPC launcher, scheduler process); custom/PD launchers may not apply it.
  • When PyTorch Inductor runs first, it can set TRITON_CACHE_DIR to its own subdirectory, so Triton JIT kernels compiled afterwards (including the allocator's) are written under /tmp.

2. PyTorch Inductor (torch.compile)

What | Env var / source | Default / behavior
Inductor cache | TORCHINDUCTOR_CACHE_DIR (PyTorch) | /tmp/torchinductor_<user> (or similar)
SGLang override | SGLANG_TORCHINDUCTOR_CACHE_DIR (if implemented) | e.g. ~/.cache/sglang/inductor
  • The default lives in /tmp, so the cache is often non-persistent and can conflict with other users on shared nodes.

3. DeepGEMM JIT cache

What | Env var / source | Default / behavior
DeepGEMM cache | SGLANG_DG_CACHE_DIR / DG_JIT_CACHE_DIR | ~/.cache/deep_gemm
  • Configured in layers/deep_gemm_wrapper/compile_utils.py, separately from the Triton and Inductor caches.
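
To see the fragmentation on a given machine, a small diagnostic can report where each cache would currently write. This is a hypothetical sketch, not part of SGLang: the helper names (effective_cache_dirs, under_tempdir) are made up, the env var names and fallback defaults come from the tables above, the Inductor fallback is approximated as <tempdir>/torchinductor_<user>, and SGLANG_DG_CACHE_DIR is assumed to take precedence over DG_JIT_CACHE_DIR. It only inspects the environment it is given, so it cannot see a TRITON_CACHE_DIR that Inductor sets later at runtime.

```python
import getpass
import os
import tempfile
from pathlib import Path
from typing import Dict, Mapping, Optional


def _first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the value of the first non-empty variable in `names`, else None."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def _user() -> str:
    try:
        return getpass.getuser()
    except Exception:  # e.g. no USER/LOGNAME and no passwd entry in a container
        return "unknown"


def effective_cache_dirs(env: Mapping[str, str]) -> Dict[str, str]:
    """Where each JIT cache would write, given `env` (hypothetical helper)."""
    home = Path.home()
    tmp = Path(tempfile.gettempdir())
    return {
        "triton": _first_set(env, "TRITON_CACHE_DIR")
        or str(home / ".triton" / "cache"),
        # Approximation of PyTorch's per-user default under the temp dir.
        "inductor": _first_set(env, "TORCHINDUCTOR_CACHE_DIR")
        or str(tmp / f"torchinductor_{_user()}"),
        # Assumed precedence: the SGLang variable before DeepGEMM's own.
        "deep_gemm": _first_set(env, "SGLANG_DG_CACHE_DIR", "DG_JIT_CACHE_DIR")
        or str(home / ".cache" / "deep_gemm"),
    }


def under_tempdir(path: str) -> bool:
    """True if `path` resolves to somewhere inside the system temp dir."""
    tmp = Path(tempfile.gettempdir()).resolve()
    try:
        Path(path).resolve().relative_to(tmp)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    for name, path in effective_cache_dirs(os.environ).items():
        flag = "  <- under temp dir, may not persist" if under_tempdir(path) else ""
        print(f"{name:<10} {path}{flag}")
```

Running it on a node where Inductor's default is in effect would flag the inductor entry as living under the temp directory.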

Problems

  1. No single root: Triton, Inductor, SGLang torch_compile, and DeepGEMM each have their own env var or default path; some write to /tmp, others to ~/.cache/....

Proposal

1. Introduce a single JIT cache root

  • New env var: SGLANG_JIT_CACHE_ROOT (or SGLANG_CACHE_ROOT, if we want to align with the existing SGLANG_CACHE_ROOT in custom_all_reduce_utils).
  • Default: ~/.cache/sglang (or $XDG_CACHE_HOME/sglang when set).
  • Semantics: All JIT/precompilation caches that SGLang controls should live under this root in fixed subdirs.
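
A minimal sketch of how the proposed root could be resolved, assuming the precedence described above: the explicit variable, then $XDG_CACHE_HOME/sglang, then ~/.cache/sglang. The function name resolve_jit_cache_root is hypothetical, and the sketch uses SGLANG_JIT_CACHE_ROOT although the final variable name is still open. Relative XDG_CACHE_HOME values are ignored, as the XDG Base Directory specification asks.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_jit_cache_root(env: Mapping[str, str]) -> Path:
    """Hypothetical resolver for the proposed single JIT cache root."""
    # 1. An explicit root always wins ("~" is expanded against the real HOME).
    explicit = env.get("SGLANG_JIT_CACHE_ROOT")
    if explicit:
        return Path(explicit).expanduser()
    # 2. $XDG_CACHE_HOME/sglang, but only for absolute values (XDG spec).
    xdg = env.get("XDG_CACHE_HOME")
    if xdg and os.path.isabs(xdg):
        return Path(xdg) / "sglang"
    # 3. Fallback: ~/.cache/sglang.
    return Path.home() / ".cache" / "sglang"
```

Taking a mapping instead of reading os.environ directly keeps the function easy to test; callers would pass os.environ.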

2. Standard layout under the root

Suggested subdirs (all under SGLANG_JIT_CACHE_ROOT):

Subdir | Purpose | Maps from
triton/ | Triton JIT (direct @triton.jit and, when possible, Triton used by Inductor) | TRITON_CACHE_DIR
inductor/ | PyTorch Inductor (torch.compile) | TORCHINDUCTOR_CACHE_DIR
torch_compile/ | SGLang torch.compile cache (hash-based, when using SGLangBackend) | SGLANG_CACHE_DIR + torch_compile_cache
deep_gemm/ | DeepGEMM JIT | SGLANG_DG_CACHE_DIR / DG_JIT_CACHE_DIR
  • If we keep backward compatibility, the existing env vars (SGLANG_TRITON_CACHE_DIR, SGLANG_TORCHINDUCTOR_CACHE_DIR, SGLANG_CACHE_DIR, SGLANG_DG_CACHE_DIR) could override the default subdir path when set; otherwise each path is derived as {SGLANG_JIT_CACHE_ROOT}/{subdir}.
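
The layout and the backward-compatibility rule could live in one function that every entry point (engine, gRPC launcher, scheduler, custom/PD launchers) calls before any JIT compilation. This is a sketch under assumptions: resolve_cache_dirs and _LAYOUT are hypothetical names, "SGLANG_CACHE_DIR + torch_compile_cache" is read as the subdirectory {SGLANG_CACHE_DIR}/torch_compile_cache, and the resolved paths are exported to TRITON_CACHE_DIR, TORCHINDUCTOR_CACHE_DIR and DG_JIT_CACHE_DIR so the tools pick them up. The root argument is the path already resolved from SGLANG_JIT_CACHE_ROOT.

```python
from pathlib import Path
from typing import Dict, MutableMapping, Optional, Tuple

# Hypothetical layout: subdir -> (legacy SGLang override var, suffix appended
# to that override, env var the third-party tool reads itself, or None).
_LAYOUT: Dict[str, Tuple[str, str, Optional[str]]] = {
    "triton": ("SGLANG_TRITON_CACHE_DIR", "", "TRITON_CACHE_DIR"),
    "inductor": ("SGLANG_TORCHINDUCTOR_CACHE_DIR", "", "TORCHINDUCTOR_CACHE_DIR"),
    "torch_compile": ("SGLANG_CACHE_DIR", "torch_compile_cache", None),
    "deep_gemm": ("SGLANG_DG_CACHE_DIR", "", "DG_JIT_CACHE_DIR"),
}


def resolve_cache_dirs(
    root: Path, env: MutableMapping[str, str], export: bool = True
) -> Dict[str, Path]:
    """Return {subdir: path}; optionally export paths to the tools' env vars."""
    dirs: Dict[str, Path] = {}
    for subdir, (legacy_var, suffix, tool_var) in _LAYOUT.items():
        legacy = env.get(legacy_var)
        if legacy:
            # Backward compatibility: an explicitly set legacy var wins.
            path = Path(legacy).expanduser()
            if suffix:
                path = path / suffix
        else:
            # Default: a fixed subdirectory under the unified root.
            path = root / subdir
        dirs[subdir] = path
        if export and tool_var is not None:
            # Point the tool at the resolved dir, replacing any earlier value.
            env[tool_var] = str(path)
    return dirs


if __name__ == "__main__":
    demo_env = {"SGLANG_DG_CACHE_DIR": "/data/dg"}
    for name, path in resolve_cache_dirs(Path("/data/sglang"), demo_env).items():
        print(f"{name:<14} {path}")
    print(demo_env["TRITON_CACHE_DIR"])  # /data/sglang/triton
```

Overwriting the tool variables (rather than using setdefault) makes the unified root authoritative at the time of the call. It does not stop a library from changing TRITON_CACHE_DIR later in the same process, so the Inductor case described under "Current State" would still need separate handling.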

CC @Fridge003 @hnyls2002

Related resources

No response
