[Feature] Add Multi LoRA Support · sgl-project/sglang-jax#311

Description

Motivation

We plan to add LoRA (Low-Rank Adaptation) support to the SGLang JAX framework. LoRA is one of the most widely used PEFT (Parameter-Efficient Fine-Tuning) methods.

From the inference perspective, serving multiple LoRA adapters from a single inference service lets them share one copy of the base model, which can significantly improve efficiency and reduce costs. From the post-training perspective, LoRA needs far fewer resources than full fine-tuning, because optimizer state is kept only for the small adapter matrices rather than for every model parameter. This makes post-training experiments much more efficient.

Road Map

The first version, "Initial Support Multi-LoRA" (https://github.com/sgl-project/sglang-jax/pull/435), is already merged.

Design Docs [Docs] Multi LoRA Design Documentation #321
Release Initial Code @JamesBrianD https://github.com/sgl-project/sglang-jax/pull/432
- New interface and new configuration
- Request pipeline with lora_path
Core Components https://github.com/sgl-project/sglang-jax/pull/435
- Implement LoRAAdapter @pathfinder-pf
- Implement lora_layer with JAX-native kernel @aolemila
- LoRAManager Implementation @pathfinder-pf
- Implement prepare_lora_batch(), forward_batch() @JamesBrianD
JIT Precompilation @aolemila
- LoRA-specific precompilation
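
As a rough illustration of the "request pipeline with lora_path" and batch-preparation items above, the sketch below maps each request's `lora_path` to an integer slot in a fixed-size adapter table. Everything here is an assumption made for illustration and is not taken from the sglang-jax code: the names `build_slot_table` and `prepare_adapter_indices`, the example adapter paths, and the convention that slot 0 means "no adapter".

```python
import jax.numpy as jnp


def build_slot_table(adapter_paths):
    """Assign each loaded adapter a slot, starting at 1.

    Slot 0 is reserved (by assumption) for requests without an adapter.
    """
    return {path: slot for slot, path in enumerate(adapter_paths, start=1)}


def prepare_adapter_indices(request_lora_paths, slot_table):
    """Map each request's lora_path (or None) to an int32 slot index."""
    slots = [0 if path is None else slot_table[path] for path in request_lora_paths]
    return jnp.asarray(slots, dtype=jnp.int32)


# Hypothetical adapter paths, used only for this example.
slot_table = build_slot_table(["adapters/sql", "adapters/chat"])

# Three requests in one batch: two use adapters, one uses the base model only.
indices = prepare_adapter_indices(["adapters/chat", None, "adapters/sql"], slot_table)
```

Passing the adapter choice as an integer array rather than as Python strings keeps the input shapes and dtypes of a jitted forward function the same across batches. This is one plausible reason to pair such a step with precompilation.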

Base Kernel
- BGMV @aolemila
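
For context, BGMV (batched gather matrix-vector multiplication) applies a different adapter to each row of a batch: it gathers that row's low-rank matrices by index, then runs a "shrink" and an "expand" product. The following is a minimal reference sketch in plain JAX, not the kernel tracked in this issue. The function name `bgmv_reference`, the stacked weight layout, the `scaling` argument, and the all-zero slot 0 for "no adapter" are all assumptions.

```python
import jax
import jax.numpy as jnp


@jax.jit
def bgmv_reference(x, a_stack, b_stack, indices, scaling):
    """Per-row LoRA delta: scaling * (x[i] @ A[indices[i]]) @ B[indices[i]].

    x:        (batch, hidden)
    a_stack:  (num_slots, hidden, rank)
    b_stack:  (num_slots, rank, out_dim)
    indices:  (batch,) int32 slot index per row
    """
    a = a_stack[indices]  # gather -> (batch, hidden, rank)
    b = b_stack[indices]  # gather -> (batch, rank, out_dim)
    shrink = jnp.einsum("bh,bhr->br", x, a)       # project down to rank
    expand = jnp.einsum("br,bro->bo", shrink, b)  # project back up
    return scaling * expand


key = jax.random.PRNGKey(0)
kx, ka, kb = jax.random.split(key, 3)
num_slots, hidden, rank, out_dim = 3, 8, 2, 8

x = jax.random.normal(kx, (4, hidden))
# Slot 0 is zeroed so rows without an adapter get a zero delta (assumption).
a_stack = jax.random.normal(ka, (num_slots, hidden, rank)).at[0].set(0.0)
b_stack = jax.random.normal(kb, (num_slots, rank, out_dim)).at[0].set(0.0)
indices = jnp.array([1, 0, 2, 1], dtype=jnp.int32)

delta = bgmv_reference(x, a_stack, b_stack, indices, 1.0)
```

The result `delta` would be added to the base projection's output. A production kernel would typically avoid materializing the gathered `(batch, hidden, rank)` tensors, which this reference version does for clarity.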

More test cases
- LoRA Layers test
- LoRA Base Accuracy test (sgl_jax vs. bonsai + qwix)
- BGMV test
- Multi-host LoRA test

We warmly welcome more suggestions and contributions from the community.
If you have ideas for a more detailed feature breakdown, design improvements, performance optimizations, or implementation strategies, feel free to share them in this issue or open a PR/discussion.
Contributions of all kinds, including design proposals, code, documentation, and benchmarking, are greatly appreciated.

Contributor guide

Research direction: Focus on the remaining subtasks such as 'OpenAI compatible API' or 'LoRA usage documentation'. Start by understanding the existing multi LoRA implementation in the codebase, particularly the LoRAAdapter and LoRAManager classes, and then contribute to a specific subtask.
Tech stack: Python
Domain: backend
Issue type: Feature
Difficulty: 4
Estimated time: Over 1 week
Activity status: Active
Clarity: Clear
Prerequisites: GitPython
Newbie friendliness: 20