[Feature] Unified JIT / Precompilation Cache Directory · sgl-project/sglang#19612


good first issue

Repository metrics

Stars: (28,442 stars)
PR merge metrics: (Avg merge 2d 1h) (1,000 merged PRs in 30d)

Description

Checklist

If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
Please use English. Otherwise, it will be closed.

Motivation

Summary

Request: Unify all JIT and precompilation cache paths under a single configurable root so that users and operators can manage cache location, size, and persistence in one place. The current situation is fragmented across multiple env vars and default paths (including /tmp), which makes it hard to reason about where compiled artifacts live and to reuse caches across runs or machines.

Current State (Fragmented)

1. Triton JIT cache

What	Env var / source	Default / behavior
Direct `@triton.jit` (e.g. allocator, MoE kernels)	`TRITON_CACHE_DIR` (Triton runtime)	Triton default: `~/.triton/cache`; often overwritten by Inductor to a path under `/tmp/torchinductor_*`
SGLang override	`SGLANG_TRITON_CACHE_DIR` (if implemented)	e.g. `~/.triton/cache` or `~/.cache/triton`

The SGLang override is applied only in some entry paths (engine, gRPC launcher, scheduler process); custom/PD launchers may not set it.
When PyTorch Inductor runs first, it can set TRITON_CACHE_DIR to its own subdirectory, so later Triton JIT compilation (including the allocator kernels) writes under /tmp.

2. PyTorch Inductor (torch.compile)

What	Env var / source	Default / behavior
Inductor cache	`TORCHINDUCTOR_CACHE_DIR` (PyTorch)	`/tmp/torchinductor_<user>` (or similar)
SGLang override	`SGLANG_TORCHINDUCTOR_CACHE_DIR` (if implemented)	e.g. `~/.cache/sglang/inductor`

The default lives in /tmp, so the cache is often lost between runs and can collide with other users' caches on shared nodes.

3. DeepGEMM JIT cache

What	Env var / source	Default / behavior
DeepGEMM cache	`SGLANG_DG_CACHE_DIR` / `DG_JIT_CACHE_DIR`	`~/.cache/deep_gemm`

Set in layers/deep_gemm_wrapper/compile_utils.py, independently of the Triton and Inductor caches.

Problems

No single root: Triton, Inductor, SGLang torch_compile, and DeepGEMM each have their own env or default; some write to /tmp, others to ~/.cache/....
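To make the fragmentation concrete, here is a minimal sketch that reports where each cache would land today, using only the env vars and defaults listed in the tables above. The helper names (effective_cache_dirs, under_tmp) are hypothetical, and the defaults are copied from this issue rather than read from Triton, PyTorch, or DeepGEMM at runtime, so treat the output as an approximation.

```python
import os
from pathlib import Path, PurePosixPath


def effective_cache_dirs(env, home, user):
    """Approximate each component's cache dir from the tables in this issue.

    Hypothetical helper: the fallback paths are the defaults quoted above,
    not values queried from the libraries themselves.
    """
    home = PurePosixPath(home)
    return {
        "triton": PurePosixPath(
            env.get("TRITON_CACHE_DIR") or home / ".triton" / "cache"
        ),
        "inductor": PurePosixPath(
            env.get("TORCHINDUCTOR_CACHE_DIR") or f"/tmp/torchinductor_{user}"
        ),
        "deep_gemm": PurePosixPath(
            env.get("SGLANG_DG_CACHE_DIR")
            or env.get("DG_JIT_CACHE_DIR")
            or home / ".cache" / "deep_gemm"
        ),
    }


def under_tmp(path):
    """True when the path sits under /tmp and is therefore likely non-persistent."""
    return PurePosixPath(path).parts[:2] == ("/", "tmp")


if __name__ == "__main__":
    user = os.environ.get("USER", "unknown")
    for name, path in effective_cache_dirs(os.environ, Path.home(), user).items():
        note = "  (under /tmp)" if under_tmp(path) else ""
        print(f"{name:10s} {path}{note}")
```

With no env vars set, this reports three unrelated locations, one of them under /tmp, which is the situation the proposal below is meant to remove.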

Proposal

1. Introduce a single JIT cache root

New env var: SGLANG_JIT_CACHE_ROOT (or SGLANG_CACHE_ROOT if we want to align with existing SGLANG_CACHE_ROOT in custom_all_reduce_utils).
Default: ~/.cache/sglang (or $XDG_CACHE_HOME/sglang when set).
Semantics: All JIT/precompilation caches that SGLang controls should live under this root in fixed subdirs.
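The root resolution described above could look roughly like this. It is a sketch, not existing SGLang code: the function name resolve_jit_cache_root is hypothetical, and the precedence (explicit env var, then $XDG_CACHE_HOME/sglang, then ~/.cache/sglang) is one reading of the proposed default.

```python
import os
from pathlib import Path


def resolve_jit_cache_root(env=None):
    """Pick the unified cache root (hypothetical helper for this proposal)."""
    env = os.environ if env is None else env

    # 1. An explicit SGLANG_JIT_CACHE_ROOT always wins.
    explicit = env.get("SGLANG_JIT_CACHE_ROOT")
    if explicit:
        return Path(explicit).expanduser()

    # 2. Otherwise follow XDG_CACHE_HOME when it is set.
    xdg = env.get("XDG_CACHE_HOME")
    if xdg:
        return Path(xdg) / "sglang"

    # 3. Fall back to the proposed default under the user's home directory.
    return Path.home() / ".cache" / "sglang"


if __name__ == "__main__":
    print(resolve_jit_cache_root())
```

Passing the environment as a mapping keeps the function easy to test without mutating os.environ.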

2. Standard layout under the root

Suggested subdirs (all under SGLANG_JIT_CACHE_ROOT):

Subdir	Purpose	Maps from
`triton/`	Triton JIT (direct `@triton.jit` and, when possible, Triton used by Inductor)	`TRITON_CACHE_DIR`
`inductor/`	PyTorch Inductor (torch.compile)	`TORCHINDUCTOR_CACHE_DIR`
`torch_compile/`	SGLang torch.compile cache (hash-based, when using SGLangBackend)	`SGLANG_CACHE_DIR` + `torch_compile_cache`
`deep_gemm/`	DeepGEMM JIT	`SGLANG_DG_CACHE_DIR` / `DG_JIT_CACHE_DIR`

If we keep backward compatibility, existing env vars (SGLANG_TRITON_CACHE_DIR, SGLANG_TORCHINDUCTOR_CACHE_DIR, SGLANG_CACHE_DIR, SGLANG_DG_CACHE_DIR) could override the corresponding subdir path when set; otherwise each path is derived as {SGLANG_JIT_CACHE_ROOT}/{subdir}.
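A sketch of the layout and override rule, under stated assumptions: resolve_cache_dirs and export_upstream_env are hypothetical names, the torch_compile override is read as $SGLANG_CACHE_DIR/torch_compile_cache (my interpretation of the "Maps from" column), and whether to overwrite an already-set TRITON_CACHE_DIR or TORCHINDUCTOR_CACHE_DIR is left open by the proposal.

```python
from pathlib import Path

# Subdir under the root -> legacy SGLang env var that may override it.
# These names come from the proposal; the mapping itself is an assumption.
OVERRIDES = {
    "triton": "SGLANG_TRITON_CACHE_DIR",
    "inductor": "SGLANG_TORCHINDUCTOR_CACHE_DIR",
    "deep_gemm": "SGLANG_DG_CACHE_DIR",
}


def resolve_cache_dirs(root, env):
    """Return {subdir: path}, honoring legacy overrides when they are set."""
    root = Path(root)
    dirs = {}
    for subdir, override_var in OVERRIDES.items():
        override = env.get(override_var)
        dirs[subdir] = Path(override) if override else root / subdir

    # Assumed reading: the legacy location is $SGLANG_CACHE_DIR/torch_compile_cache.
    legacy = env.get("SGLANG_CACHE_DIR")
    dirs["torch_compile"] = (
        Path(legacy) / "torch_compile_cache" if legacy else root / "torch_compile"
    )
    return dirs


def export_upstream_env(dirs, env):
    """Point the upstream libraries at the resolved dirs.

    This sketch overwrites any existing values; a real implementation might
    prefer env.setdefault to respect a user's explicit setting.
    """
    env["TRITON_CACHE_DIR"] = str(dirs["triton"])
    env["TORCHINDUCTOR_CACHE_DIR"] = str(dirs["inductor"])
    env["DG_JIT_CACHE_DIR"] = str(dirs["deep_gemm"])


if __name__ == "__main__":
    example_env = {"SGLANG_DG_CACHE_DIR": "/fast/dg"}
    resolved = resolve_cache_dirs("/data/sglang-cache", example_env)
    export_upstream_env(resolved, example_env)
    for name, path in resolved.items():
        print(f"{name:14s} {path}")
```

For this to take effect in practice, the export would need to run in every entry path (including custom/PD launchers) before Triton or Inductor first compiles anything.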

CC @Fridge003 @hnyls2002

Related resources

No response

Contributor guide

Research direction: Implement a unified JIT cache root environment variable (e.g., SGLANG_JIT_CACHE_ROOT) and update all relevant cache paths (Triton, Inductor, DeepGEMM) to use subdirectories under this root. Ensure backward compatibility by allowing existing env vars to override subdirectory defaults.
Tech stack: python
Domain: backend, build system, developer experience
Issue type: Feature
Difficulty: 3
Estimated time: Half day
Activity status: Active
Clarity: Clear
Prerequisites: Python, Git
Newbie friendliness: 60