sgl-project/sglang-jax

[Feature] Add Multi LoRA Support

Open

#311 opened on Nov 4, 2025

Labels: Feature, collaboration, help wanted, lora

Description

Motivation

We plan to add LoRA (Low-Rank Adaptation) support to the SGLang JAX framework. LoRA is one of the most important PEFT (Parameter-Efficient Fine-Tuning) methods.

From the inference perspective, serving multiple LoRA adapters from a single inference service can significantly improve efficiency and reduce costs. From the post-training perspective, LoRA greatly reduces resource requirements compared with full fine-tuning (no optimizer state proportional to the total parameter count is needed), which makes post-training experiments much more efficient.

Road Map

The first version, "Initial Support Multi-LoRA" (https://github.com/sgl-project/sglang-jax/pull/435), has already been merged.

  • Base Kernel
    • BGMV @aolemila
  • Support dynamic LoRA @JamesBrianD
  • Support static LoRA @aolemila
  • OpenAI compatible API
  • LoRA usage documentation
  • Advanced Pallas kernel
    • SGMV
    • BGMV
  • LRU eviction policy in LoRA Manager
  • More test cases
    • LoRA Layers test
    • LoRA Base Accuracy test (sgl_jax vs. bonsai + qwix)
    • BGMV test
    • Multi-host LoRA test
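
For readers unfamiliar with the BGMV item above: BGMV (batched gather matrix-vector multiplication) applies, for each token in a batch, the LoRA matrices of whichever adapter that token is assigned to. The block below is a minimal pure-Python sketch of those semantics, the kind of slow reference a kernel test could compare against. It is not the sglang-jax implementation: the function name `bgmv_reference`, the argument layout, and the convention that an adapter id of -1 means "no adapter" are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Return m @ v for a row-major matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def bgmv_reference(
    x: List[Vector],         # [num_tokens][hidden_in]
    lora_a: List[Matrix],    # [num_adapters][rank][hidden_in]
    lora_b: List[Matrix],    # [num_adapters][hidden_out][rank]
    adapter_ids: List[int],  # [num_tokens]; -1 = no adapter (assumed convention)
    scaling: float = 1.0,
) -> List[Vector]:
    """Per-token LoRA delta: scaling * B[id] @ (A[id] @ x)."""
    hidden_out = len(lora_b[0])
    out: List[Vector] = []
    for token, adapter_id in zip(x, adapter_ids):
        if adapter_id < 0:
            # Token uses the base model only, so its LoRA delta is zero.
            out.append([0.0] * hidden_out)
            continue
        shrunk = matvec(lora_a[adapter_id], token)     # gather A, project down to rank
        expanded = matvec(lora_b[adapter_id], shrunk)  # gather B, project back up
        out.append([scaling * v for v in expanded])
    return out
```

The returned delta would be added to the base layer's output. An optimized kernel performs the same gather-then-multiply per token but batches the work on the accelerator instead of looping in Python.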

We warmly welcome more suggestions and contributions from the community.
If you have ideas for a more detailed feature breakdown, design improvements, performance optimizations, or implementation strategies, feel free to share them in this issue or open a PR/discussion.
Contributions of all kinds — including design proposals, code, documentation, and benchmarking — are greatly appreciated.
