Feature Request: Mapping model name to LoRA config · ggml-org/llama.cpp#11031


Labels: enhancement, good first issue, server

Description

Prerequisites

I am running the latest code. Mention the version if possible as well.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I came across this idea while working on #10994.

The idea is that the server maintains a mapping from model names to LoRA configs, for example:

{
    "llama-base":               [{"id": 0, "scale": 0.0}, {"id": 1, "scale": 0.0}],
    "llama-story":              [{"id": 0, "scale": 1.0}, {"id": 1, "scale": 0.0}],
    "llama-abliteration":       [{"id": 0, "scale": 0.0}, {"id": 1, "scale": 1.0}],
    "llama-story-abliteration": [{"id": 0, "scale": 0.5}, {"id": 1, "scale": 0.5}]
}
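A minimal sketch of how such a mapping could be loaded and validated, written in Python for brevity rather than as server code. The function name `parse_lora_presets` is hypothetical, and the rules it enforces (non-negative integer `id`, numeric `scale`) are assumptions drawn from the example above, not from the llama.cpp source.

```python
import json


def parse_lora_presets(text):
    """Parse a JSON object mapping model names to lists of {"id", "scale"} entries.

    Hypothetical helper: the validation rules are assumptions based on the
    example mapping in this issue.
    """
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("top-level value must be a JSON object")

    presets = {}
    for name, entries in raw.items():
        if not isinstance(entries, list):
            raise ValueError(f"{name}: expected a list of adapter entries")
        parsed = []
        for entry in entries:
            if not isinstance(entry, dict):
                raise ValueError(f"{name}: each adapter entry must be an object")
            adapter_id = entry.get("id")
            scale = entry.get("scale")
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(adapter_id, bool) or not isinstance(adapter_id, int) or adapter_id < 0:
                raise ValueError(f"{name}: 'id' must be a non-negative integer")
            if isinstance(scale, bool) or not isinstance(scale, (int, float)):
                raise ValueError(f"{name}: 'scale' must be a number")
            parsed.append({"id": adapter_id, "scale": float(scale)})
        presets[name] = parsed
    return presets
```

Validating the whole mapping up front means a malformed entry is reported before any request is served, instead of surfacing on the first request that names the affected model.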

Then a user can switch between configurations by setting the `model` field in the request, for example:

# first user:
{
    "model": "llama-story-abliteration",
    "messages": [
        {"role": "user", "content": "Write a NSFW story"}
    ]
}

# second user:
{
    "model": "llama-base",
    "messages": [
        {"role": "user", "content": "Is this NSFW?"}
    ]
}
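The lookup described above can be sketched as follows, again in Python for illustration only. It assumes that the per-request adapter list lives in a field named `lora` with the same shape as the mapping values; that field name, the `resolve_lora` helper, and the fallback behaviour are hypothetical choices, not confirmed server behaviour.

```python
import copy

# Hypothetical presets, copied from the example mapping in this issue.
LORA_PRESETS = {
    "llama-base": [{"id": 0, "scale": 0.0}, {"id": 1, "scale": 0.0}],
    "llama-story": [{"id": 0, "scale": 1.0}, {"id": 1, "scale": 0.0}],
    "llama-abliteration": [{"id": 0, "scale": 0.0}, {"id": 1, "scale": 1.0}],
    "llama-story-abliteration": [{"id": 0, "scale": 0.5}, {"id": 1, "scale": 0.5}],
}


def resolve_lora(request, presets=LORA_PRESETS):
    """Return a copy of the request with a "lora" list filled in from its model name.

    Assumptions (not taken from the server code):
    - an explicit "lora" field in the request takes precedence over the preset;
    - an unknown model name leaves the request unchanged.
    """
    resolved = copy.deepcopy(request)
    if "lora" in resolved:
        return resolved
    name = resolved.get("model")
    if name in presets:
        # Deep-copy so later edits to the request cannot mutate the shared presets.
        resolved["lora"] = copy.deepcopy(presets[name])
    return resolved


first = resolve_lora({"model": "llama-story-abliteration", "messages": []})
second = resolve_lora({"model": "llama-base", "messages": []})
```

Here `first["lora"]` holds both adapters at scale 0.5 and `second["lora"]` holds both at scale 0.0, so two users sharing one server get different adapter settings purely from the model name.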

Motivation

N/A

Possible Implementation

No response

Contributor guide

Tech stack: C++, REST API
Domain: backend, API, machine learning
Issue type: feature
Difficulty: 3
Estimated time: 3-5 days
Activity status: fresh
Clarity: clear
Prerequisites: Experience with llama.cpp server; understanding of LoRA configs
Newbie friendliness: 25
Research direction: Research the current LoRA adapter loading mechanism in llama.cpp's server code. Review the implementation in the server (likely in `examples/server/server.cpp` or similar) that handles model loading. Also examine PR #10994 for context on potential integration points. Investigate how to map model names to LoRA configurations and modify the server's API to accept a model field.