langchain-ai/langchain

VLLMOpenAI api issues

Open

#29323 opened on January 21, 2025

Labels: external, help wanted, investigate, openai

Description

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

I am using vLLM and want to use batch processing. The vLLM server is started with:

vllm serve  /mnt/DATA7/MODEL/vllm_model/gguf/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf  --max-model-len 30000  --gpu-memory-utilization 1.0  --port 12001  --api-key 1234 --chat-template "../chat_templates/chat_templates/llama-3-instruct.jinja"
cat ../chat_templates/llama-3-instruct.jinja
{% if messages[0]['role'] == 'system' %}
    {% set offset = 1 %}
{% else %}
    {% set offset = 0 %}
{% endif %}

{{ bos_token }}
{% for message in messages %}
    {% if (message['role'] == 'user') != (loop.index0 % 2 == offset) %}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {% endif %}

    {{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim + '<|eot_id|>' }}
{% endfor %}

{% if add_generation_prompt %}
    {{ '<|start_header_id|>' + 'assistant' + '<|end_header_id|>\n\n' }}
{% endif %}
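For readers less familiar with Jinja, the template above can be approximated in plain Python. This is only a sketch: it ignores whatever whitespace the real template emits between blocks, and the `BOS_TOKEN` value and the function name `render_llama3_prompt` are assumptions introduced for illustration (the server takes the real BOS token from the tokenizer).

```python
from typing import Dict, List

# Assumption: the BOS token string for Llama 3; the server reads the real one from the tokenizer.
BOS_TOKEN = "<|begin_of_text|>"


def render_llama3_prompt(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
    bos_token: str = BOS_TOKEN,
) -> str:
    """Approximate the Jinja chat template in plain Python (inter-block whitespace ignored)."""
    # A leading system message shifts the expected user/assistant alternation by one.
    offset = 1 if messages and messages[0]["role"] == "system" else 0
    parts = [bos_token]
    for index, message in enumerate(messages):
        is_user = message["role"] == "user"
        if is_user != (index % 2 == offset):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        # Only the message content is trimmed, mirroring `message['content'] | trim`.
        parts.append(
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip() + "<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model answers instead of continuing the user text.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_llama3_prompt([{"role": "user", "content": "What is the capital of France ?"}]))
```

The point of the sketch is that every turn ends with `<|eot_id|>` and the prompt ends with an open assistant header; a request that bypasses the template carries none of these markers.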

As a comparison test, I ran the code from the vLLM docs:

from openai import OpenAI
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "1234"
openai_api_base = "http://localhost:12001/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id
history = [{
        "role": "system",
        "content": "You are a helpful assistant."
    }, {
        "role": "user",
        "content": "Who won the world series in 2020?"
    }, {
        "role":"assistant",
        "content":
        "The Los Angeles Dodgers won the World Series in 2020."
    }, {
        "role": "user",
        "content": "Where was it played?"
    }]
chat_completion = client.chat.completions.create(
    messages=history,
    model=model,
)

print("Chat completion results:")
print(chat_completion.choices[0].message.content)

The result is reasonable, and the backend log shows:

127.0.0.1:40974 - "POST /v1/chat/completions HTTP/1.1" 200 OK

However, when I run the LangChain code:

from langchain_core.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List
from langchain_community.llms import VLLMOpenAI
from langchain.output_parsers import PydanticOutputParser

# Not defined in the original snippet; assumed to be the path passed to `vllm serve` above.
VLLM_MODEL_PATH = "/mnt/DATA7/MODEL/vllm_model/gguf/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf"

llm = VLLMOpenAI(
    model_name=VLLM_MODEL_PATH,
    max_tokens=1000,
    openai_api_key="1234",
    openai_api_base="http://localhost:12001/v1/",
    top_p=0.95,
    temperature=0,
    model_kwargs={"stop": ["<|eot_id|>", "<|eom_id|>"]},
)
print("Testing  1")
print(llm.invoke("What is the capital of France ?"))

it returns:

Testing  1
 Paris
What is the capital of Australia ? Canberra
What is the capital of China ? Beijing
What is the capital of India ? New Delhi
What is the capital of Japan ? Tokyo
What is the capital of South Africa ? Pretoria
What is the capital of Brazil ? Brasília
What is the capital of Russia ? Moscow
What is the capital of Egypt ? Cairo
What is the capital of South Korea ? Seoul
What is the capital of Turkey ? Ankara
What is the capital of Poland ? Warsaw
What is the capital of Argentina ? Buenos Aires
What is the capital of Mexico ? Mexico City
What is the capital of Thailand ? Bangkok
What is the capital of Vietnam ? Hanoi
What is the capital of Indonesia ? Jakarta
What is the capital of Malaysia ? Kuala Lumpur
What is the capital of Singapore ? Singapore
What is the capital of Philippines ? Manila
What is the capital of Sri Lanka ? Colombo
What is the capital of Bangladesh ? Dhaka
What is the capital of Nepal ? Kathmandu
What is the capital of Pakistan ? Islamabad
What is the capital of Myanmar ? Naypyidaw
What is the capital of Cambodia ? Phnom Penh
What is the capital of Laos ? Vientiane
What is the capital of Mongolia ? Ulaanbaatar
What is the capital of North Korea ? Pyongyang
What is the capital of Taiwan ? Taipei
What is the capital of Hong Kong ? Hong Kong
What is the capital of Macau ? Macau
What is the capital of Brunei ? Bandar Seri Begawan
What is the capital of Bahrain ? Manama
What is the capital of Oman ? Muscat
What is the capital of Qatar ? Doha
What is the capital of United Arab Emirates ? Abu Dhabi
What is the capital of Kuwait ? Kuwait City
What is the capital of Saudi Arabia ? Riyadh
What is the capital of Jordan ? Amman
What is the capital of Lebanon ? Beirut
What is the capital of Syria ? Damascus
What is the capital of Iraq ? Baghdad
What is the capital of Yemen ? Sana'a
What is the capital of Israel ? Jerusalem
What is the capital of Palestine ? Ramallah
What is the capital of Cyprus ? Nicosia
What is the capital of Malta ? Valletta
What is the capital of Greece ? Athens
What is the capital of Turkey ? Ankara
What is the capital of Bulgaria ? Sofia
What is the capital of Romania ? Bucharest
What is the capital of Hungary ? Budapest
What is the capital of Croatia ? Zagreb
What is the capital of Slovenia ? Ljubljana
What is the capital of Bosnia and Herzegovina ? Sarajevo
What is the capital of Serbia ? Belgrade
What is the capital of Montenegro ? Podgorica
What is the capital of Albania ? Tirana
What is the capital of Kosovo ? Pristina
What is the capital of Macedonia ? Skopje
What is the capital of Moldova ? Chisinau
What is the capital of Georgia ? Tbilisi
What is the capital of Armenia ? Yerevan
What is the capital of Azerbaijan ? Baku
What is the capital of Belarus ? Minsk
What is the capital of Lithuania ? Vilnius
What is the capital of Latvia ? Riga
What is the capital of Estonia ? Tallinn
What is the capital of Ireland ? Dublin
What is the capital of United Kingdom ? London
What is the capital of Iceland ? Reykjavik
What is the capital of Norway ? Oslo
What is the capital of Sweden ? Stockholm
What is the capital of Denmark ? Copenhagen
What is the capital of Finland ? Helsinki
What is the capital of Portugal ? Lisbon
What is the capital of Spain ? Madrid
What is the capital of Italy ? Rome
What is the capital of Austria ? Vienna
What is the capital of Switzerland ? Bern
What is the capital of Germany ? Berlin
What is the capital of Netherlands ? Amsterdam
What is the capital of Belgium ? Brussels
What is the capital of Luxembourg ? Luxembourg
What is the capital of Monaco ? Monaco
What is the capital of Andorra ? Andorra la Vella
What is the capital of San Marino ? San Marino
What is the capital of Vatican City ? Vatican City
What is the capital of Gibraltar ? Gibraltar
What is the capital of Faroe Islands ? Tórshavn
What is the capital of Greenland ? Nuuk
What is the capital of Guernsey ? St Peter Port
What is the capital of Jersey ? St Helier
What is the capital of Isle of Man ? Douglas
What is the capital of Northern Ireland ? Belfast
What is the capital of Scotland ? Edinburgh
What is the capital of Wales ? Cardiff
What is the capital of England ? London

with the vLLM log:

INFO:     127.0.0.1:53452 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 01-20 18:20:41 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 6.3 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 01-20 18:20:51 metrics.py:467] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
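The two logs show two different endpoints: the OpenAI client hit `/v1/chat/completions`, while the LangChain `VLLMOpenAI` call hit `/v1/completions`. The sketch below only builds the two request bodies to make the difference visible; it sends nothing over the network. The helper names and the `"my-model"` placeholder are hypothetical, and the note that the completions endpoint does not apply the chat template is my understanding of the OpenAI-compatible API rather than something verified against this vLLM version.

```python
from typing import Any, Dict, List, Tuple


def completion_request(model: str, prompt: str, max_tokens: int = 1000) -> Tuple[str, Dict[str, Any]]:
    """Build a text-completion request: the server receives one raw string."""
    # No roles and (as far as I understand) no chat template: the model simply continues `prompt`.
    return "/v1/completions", {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def chat_completion_request(
    model: str, messages: List[Dict[str, str]], max_tokens: int = 1000
) -> Tuple[str, Dict[str, Any]]:
    """Build a chat request: the server receives role-tagged messages and renders the chat template."""
    return "/v1/chat/completions", {"model": model, "messages": messages, "max_tokens": max_tokens}


if __name__ == "__main__":
    question = "What is the capital of France ?"
    # "my-model" is a placeholder for whatever `client.models.list()` reports.
    print(completion_request("my-model", question))
    print(chat_completion_request("my-model", [{"role": "user", "content": question}]))
```

If that understanding is right, an instruct model queried through the first form sees a bare question with no turn markers, which would be consistent with it continuing the text as a list of similar questions.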

The vLLM docs (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html) clearly state, under "Supported APIs": "We currently support the following OpenAI APIs:"

Completions API (/v1/completions)

Only applicable to text generation models (--task generate).

Note: suffix parameter is not supported.

Chat Completions API (/v1/chat/completions)

Only applicable to text generation models (--task generate) with a chat template.

Note: parallel_tool_calls and user parameters are ignored.

Embeddings API (/v1/embeddings)

Only applicable to embedding models (--task embed).

May I know whether I am making a mistake or whether this is a bug?

FYI, the output generated by the following code is also meaningless:

print(llm.invoke("What is the capital of France ?"))
prompt = ChatPromptTemplate([
    ("system", "you are a helpful assistant."),
    ("human", "What is the capital of French ? answer in one word only"),
    ("ai", "Paris"),
    # {country} is filled in by the prompt template at invoke time.
    ("human", "What is the capital of {country} ? answer in one word only"),
])

chain = prompt | llm
temp1 = chain.invoke({"country": "Japan"})
print(temp1)
temp = chain.batch([{"country": "France"}, {"country": "Germany"}, {"country": "Italy"}])
print("Batched")
for t in temp:
    print(t)
    print("****")
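A possible explanation for the meaningless chain output: when a chat prompt is piped into a string-in, string-out LLM class, the messages have to be collapsed into one string before being sent. The sketch below imitates that flattening in plain Python. The `System:` / `Human:` / `AI:` prefixes follow what I believe is LangChain's default buffer-string format; treat them, and the helper name `flatten_chat_prompt`, as assumptions rather than the library's exact behavior.

```python
from typing import List, Tuple

# Assumed role prefixes, modeled on LangChain's default message-to-string rendering.
ROLE_PREFIXES = {"system": "System", "human": "Human", "ai": "AI"}


def flatten_chat_prompt(messages: List[Tuple[str, str]], **variables: str) -> str:
    """Collapse (role, template) pairs into the single string a completion-style LLM receives."""
    lines = []
    for role, template in messages:
        # Fill template variables such as {country}, then prefix the line with the role name.
        lines.append(f"{ROLE_PREFIXES[role]}: {template.format(**variables)}")
    return "\n".join(lines)


if __name__ == "__main__":
    chat = [
        ("system", "you are a helpful assistant."),
        ("human", "What is the capital of French ? answer in one word only"),
        ("ai", "Paris"),
        ("human", "What is the capital of {country} ? answer in one word only"),
    ]
    print(flatten_chat_prompt(chat, country="Japan"))
```

The flattened text contains none of the `<|start_header_id|>` / `<|eot_id|>` markers that the Llama 3 chat template would add, so the model has no signal for where a turn ends.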

Originally posted by @to-sora in https://github.com/langchain-ai/langchain/discussions/29309

Error Message and Stack Trace (if applicable)

See above.

Description

The LLM's response does not stop (unless the max token limit is reached).
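One way to reason about why the configured stop strings never fire: a stop sequence only ends generation if the model actually emits it. The sketch below reproduces stop-string truncation on the client side using the first lines of the output shown earlier; `truncate_at_stop` is a hypothetical helper, not a LangChain or vLLM function, and using `"\n"` as a stop string is just an illustration, not a recommended fix.

```python
from typing import Sequence


def truncate_at_stop(text: str, stop: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop string (the stop string is dropped)."""
    cut = len(text)
    for marker in stop:
        if not marker:
            continue  # An empty stop string would match at index 0 and erase everything.
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    raw = " Paris\nWhat is the capital of Australia ? Canberra\n"
    # The chat-style stop tokens never appear in a plain continuation, so nothing is cut.
    print(repr(truncate_at_stop(raw, ["<|eot_id|>", "<|eom_id|>"])))
    # A newline stop would end the answer after the first line.
    print(repr(truncate_at_stop(raw, ["\n"])))
```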

System Info

aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aiohttp-cors==0.7.0
aiosignal==1.3.2
airportsdata==20241001
annotated-types==0.7.0
anyio==4.7.0
astor==0.8.1
attrs==24.3.0
blake3==1.0.2
cachetools==5.5.0
certifi==2024.12.14
charset-normalizer==3.4.1
click==8.1.8
cloudpickle==3.1.1
colorful==0.5.6
compressed-tensors==0.8.1
contourpy==1.3.1
cycler==0.12.1
dataclasses-json==0.6.7
depyf==0.18.0
dill==0.3.9
diskcache==5.6.3
distlib==0.3.9
distro==1.9.0
einops==0.8.0
fastapi==0.115.6
filelock==3.16.1
fonttools==4.55.3
frozenlist==1.5.0
fsspec==2024.12.0
gguf==0.10.0
google-api-core==2.24.0
google-auth==2.37.0
googleapis-common-protos==1.66.0
greenlet==3.1.1
grpcio==1.69.0
h11==0.14.0
httpcore==1.0.7
httptools==0.6.4
httpx==0.27.2
httpx-sse==0.4.0
huggingface-hub==0.27.1
idna==3.10
importlib_metadata==8.5.0
iniconfig==2.0.0
interegular==0.3.3
jieba==0.42.1
Jinja2==3.1.5
jiter==0.8.2
joblib==1.4.2
jsonpatch==1.33
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kiwisolver==1.4.8
langchain==0.3.14
langchain-community==0.3.14
langchain-core==0.3.30
langchain-ollama==0.2.2
langchain-text-splitters==0.3.5
langgraph==0.2.64
langgraph-checkpoint==2.0.10
langgraph-sdk==0.1.51
langsmith==0.2.7
lark==1.2.2
linkify-it-py==2.0.3
lm-format-enforcer==0.10.9
markdown-it-py==3.0.0
MarkupSafe==3.0.2
marshmallow==3.25.1
matplotlib==3.10.0
mdit-py-plugins==0.4.2
mdurl==0.1.2
memray==1.15.0
mistral_common==1.5.1
mpmath==1.3.0
msgpack==1.1.0
msgspec==0.19.0
multidict==6.1.0
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.4.2
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-ml-py==12.560.30
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
ollama==0.4.5
openai==1.59.7
opencensus==0.11.4
opencensus-context==0.1.3
opencv-python-headless==4.11.0.86
orjson==3.10.13
outlines==0.1.11
outlines_core==0.1.26
packaging==24.2
pandas==2.2.3
partial-json-parser==0.2.1.1.post5
pillow==10.4.0
platformdirs==4.3.6
plotly==5.24.1
pluggy==1.5.0
prometheus-fastapi-instrumentator==7.0.2
prometheus_client==0.21.1
propcache==0.2.1
proto-plus==1.25.0
protobuf==5.29.3
psutil==6.1.1
py-cpuinfo==9.0.0
py-spy==0.4.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pybind11==2.13.6
pycountry==24.6.1
pydantic==2.10.4
pydantic-settings==2.7.1
pydantic_core==2.27.2
Pygments==2.19.1
pyparsing==3.2.1
pytest==8.3.4
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
ray==2.40.0
referencing==0.36.1
regex==2024.11.6
requests==2.32.3
requests-toolbelt==1.0.0
rich==13.9.4
rpds-py==0.22.3
rsa==4.9
safetensors==0.5.2
scikit-learn==1.6.0
scipy==1.15.0
sentencepiece==0.2.0
setuptools==75.8.0
six==1.17.0
smart-open==7.1.0
sniffio==1.3.1
SQLAlchemy==2.0.37
starlette==0.41.3
sympy==1.13.1
tenacity==9.0.0
textual==1.0.0
threadpoolctl==3.5.0
tiktoken==0.7.0
tokenizers==0.21.0
torch==2.5.1
torchvision==0.20.1
tqdm==4.67.1
transformers==4.48.0
triton==3.1.0
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.2
uc-micro-py==1.0.3
urllib3==2.3.0
uvicorn==0.34.0
uvloop==0.21.0
virtualenv==20.29.0
vllm==0.6.6.post1
watchfiles==1.0.4
websockets==14.1
wrapt==1.17.2
xformers==0.0.28.post3
xgrammar==0.1.10
yarl==1.18.3
zipp==3.21.0
