langchain-ai/langchain

openai: intermittent `LengthFinishReasonError` in `AzureChatOpenAI`

Open

#30924 opened on Apr 18, 2025

Labels: bug, external, help wanted, investigate, openai

Description

Checked other resources

  • I added a very descriptive title to this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Example Code


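import asyncio

from langchain_core.messages import SystemMessage
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import AzureChatOpenAI

model = AzureChatOpenAI(
    azure_deployment="gpt-4o",
    api_key=...,
    api_version=...,
    azure_endpoint=...,
    temperature=0.3,
    max_retries=3,
    max_tokens=None,  # no explicit cap on completion length
    timeout=None,
)

system_prompt = "..."
user_prompts = [ ... ]

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=system_prompt),
        HumanMessagePromptTemplate.from_template("{input}"),
    ]
)
chain = prompt | model | JsonOutputParser()


async def main():
    # Send every prompt through the chain, at most 20 requests in flight.
    return await chain.abatch(
        [
            {
                "input": user_prompt,
            }
            for user_prompt in user_prompts
        ],
        config={
            "max_concurrency": 20,
        },
    )


responses = asyncio.run(main())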

Error Message and Stack Trace (if applicable)

[2025-04-18 18:17:26.906][SpawnProcess-768][44173][46d32417-2680-4b6a-8005-65d532070441][ERROR][core.services.llm:77] An unexpected error occurred in async batch processing: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=238, total_tokens=16622, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
Traceback (most recent call last):
  File "/usr/core/services/llm.py", line 71, in abatch
    responses = await chain.abatch(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3331, in abatch
    inputs = await step.abatch(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5498, in abatch
    return await self.bound.abatch(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 905, in abatch
    return await gather_with_concurrency(configs[0].get("max_concurrency"), *coros)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/utils.py", line 75, in gather_with_concurrency
    return await asyncio.gather(*(gated_coro(semaphore, c) for c in coros))
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/utils.py", line 57, in gated_coro
    return await coro
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 902, in ainvoke
    return await self.ainvoke(input, config, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 353, in ainvoke
    llm_result = await self.agenerate_prompt(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 905, in agenerate_prompt
    return await self.agenerate(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 863, in agenerate
    raise exceptions[0]
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 1033, in _agenerate_with_cache
    result = await self._agenerate(
  File "/usr/local/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 1129, in _agenerate
    response = await self.root_async_client.beta.chat.completions.parse(
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 437, in parse
    return await self._post(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1767, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1461, in request
    return await self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1564, in _request
    return await self._process_response(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1661, in _process_response
    return await api_response.parse()
  File "/usr/local/lib/python3.10/site-packages/openai/_response.py", line 432, in parse
    parsed = self._options.post_parser(parsed)
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 431, in parser
    return _parse_chat_completion(
  File "/usr/local/lib/python3.10/site-packages/openai/lib/_parsing/_completions.py", line 72, in parse_chat_completion
    raise LengthFinishReasonError(completion=chat_completion)
openai.LengthFinishReasonError: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=238, total_tokens=16622, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))

Description

Hi team,

I recently migrated our client from ChatOpenAI to AzureChatOpenAI, and since the migration I’ve been seeing intermittent LengthFinishReasonError exceptions.

According to the LangSmith traces, each call used between 1,000 and 1,500 tokens in total (prompt + completion), always well below 10,000. That is far less than the total_tokens value of 16,622 reported in the stack trace where the error is raised.
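As a debugging aid, the usage figures can be pulled out of the logged error message and compared with the numbers seen in LangSmith. The sketch below uses only the standard library. `parse_usage` and `SUSPECTED_OUTPUT_CAP` are hypothetical names, and treating 16,384 as the deployment's output-token cap is an assumption based only on the value in the error message, not something the service has confirmed.

```python
import re

# Assumption: 16384 is the completion-token ceiling for this deployment,
# inferred only from the completion_tokens value in the logged error.
SUSPECTED_OUTPUT_CAP = 16384


def parse_usage(message: str) -> dict[str, int]:
    """Extract the top-level token counters from a CompletionUsage repr."""
    usage = {}
    for key in ("completion_tokens", "prompt_tokens", "total_tokens"):
        # "key=" followed by digits; "completion_tokens_details=" does not match.
        match = re.search(rf"\b{key}=(\d+)", message)
        if match:
            usage[key] = int(match.group(1))
    return usage


# Shortened copy of the message from the log above.
message = (
    "Could not parse response content as the length limit was reached - "
    "CompletionUsage(completion_tokens=16384, prompt_tokens=238, "
    "total_tokens=16622, completion_tokens_details=CompletionTokensDetails("
    "accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, "
    "rejected_prediction_tokens=0))"
)

usage = parse_usage(message)
hit_cap = usage["completion_tokens"] >= SUSPECTED_OUTPUT_CAP
print(usage, hit_cap)
```

In the logged error the prompt is small (238 tokens) and the completion accounts for almost all of the 16,622 total, so the failing call looks quite different from the 1,000 to 1,500 token calls visible in the traces.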

Interestingly, when the issue occurs, the request seems to hang for about 3 minutes, even though the output appears in the LangSmith trace fairly quickly (within 10 to 20 seconds).

It seems the error is being raised even though we are nowhere near the model's token limit. Any insight into what could be causing this, or how to debug it further, would be appreciated.
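One way to narrow this down may be to find out which inputs fail instead of losing the whole batch to the first exception. The traceback shows `abatch` going through `gather_with_concurrency`, which awaits `asyncio.gather` behind a semaphore. The sketch below is a standard-library stand-in for that pattern, not LangChain's actual code: `fake_call`, `gated`, and `run_batch` are hypothetical names, and the failure is simulated. It passes `return_exceptions=True` so each failure comes back in place next to the successful results.

```python
import asyncio


async def fake_call(prompt: str) -> str:
    """Simulated model call; fails for one input to mimic the intermittent error."""
    await asyncio.sleep(0)
    if prompt == "bad":
        raise RuntimeError("length limit was reached")
    return prompt.upper()


async def gated(semaphore: asyncio.Semaphore, prompt: str) -> str:
    # Limit how many calls run at once, like max_concurrency in the chain config.
    async with semaphore:
        return await fake_call(prompt)


async def run_batch(prompts: list[str], max_concurrency: int) -> list:
    semaphore = asyncio.Semaphore(max_concurrency)
    # return_exceptions=True keeps results aligned with inputs and returns
    # exceptions as values instead of raising the first one.
    return await asyncio.gather(
        *(gated(semaphore, p) for p in prompts), return_exceptions=True
    )


prompts = ["a", "bad", "c"]
results = asyncio.run(run_batch(prompts, max_concurrency=2))
# Pair each input with its result to see which ones failed.
failed = [p for p, r in zip(prompts, results) if isinstance(r, Exception)]
print(failed)
```

On the real chain, `chain.abatch(..., return_exceptions=True)` should give the same per-input view, which would make it possible to log the exact prompts that trigger `LengthFinishReasonError`.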

System Info

langchain_core: 0.3.54
langchain: 0.3.23
langsmith: 0.3.32
langchain_google_cloud_sql_pg: 0.13.0
langchain_google_vertexai: 2.0.20
langchain_openai: 0.3.14
langchain_text_splitters: 0.3.8
langgraph_sdk: 0.1.61
