openai: intermittent `LengthFinishReasonError` in `AzureChatOpenAI` · langchain-ai/langchain#30924


bug, external, help wanted, investigate, openai

Description

Checked other resources

I added a very descriptive title to this issue.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Example Code


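from langchain_core.messages import SystemMessage
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import AzureChatOpenAI

model = AzureChatOpenAI(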
    azure_deployment="gpt-4o",
    api_key=...,
    api_version=...,
    azure_endpoint=...,
    temperature=0.3,
    max_retries=3,
    max_tokens=None,
    timeout=None,
)

system_prompt = "..."
user_prompts = [ ... ]

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=system_prompt),
        HumanMessagePromptTemplate.from_template("{input}"),
    ]
)
chain = prompt | model | JsonOutputParser()
responses = await chain.abatch(
    [
        {
            "input": user_prompt,
        }
        for user_prompt in user_prompts
    ],
    config={
        "max_concurrency": 20,
    },
)

Error Message and Stack Trace (if applicable)

[2025-04-18 18:17:26.906][SpawnProcess-768][44173][46d32417-2680-4b6a-8005-65d532070441][ERROR][core.services.llm:77] An unexpected error occurred in async batch processing: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=238, total_tokens=16622, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
Traceback (most recent call last):
  File "/usr/core/services/llm.py", line 71, in abatch
    responses = await chain.abatch(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3331, in abatch
    inputs = await step.abatch(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5498, in abatch
    return await self.bound.abatch(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 905, in abatch
    return await gather_with_concurrency(configs[0].get("max_concurrency"), *coros)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/utils.py", line 75, in gather_with_concurrency
    return await asyncio.gather(*(gated_coro(semaphore, c) for c in coros))
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/utils.py", line 57, in gated_coro
    return await coro
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 902, in ainvoke
    return await self.ainvoke(input, config, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 353, in ainvoke
    llm_result = await self.agenerate_prompt(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 905, in agenerate_prompt
    return await self.agenerate(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 863, in agenerate
    raise exceptions[0]
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 1033, in _agenerate_with_cache
    result = await self._agenerate(
  File "/usr/local/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 1129, in _agenerate
    response = await self.root_async_client.beta.chat.completions.parse(
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 437, in parse
    return await self._post(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1767, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1461, in request
    return await self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1564, in _request
    return await self._process_response(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1661, in _process_response
    return await api_response.parse()
  File "/usr/local/lib/python3.10/site-packages/openai/_response.py", line 432, in parse
    parsed = self._options.post_parser(parsed)
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 431, in parser
    return _parse_chat_completion(
  File "/usr/local/lib/python3.10/site-packages/openai/lib/_parsing/_completions.py", line 72, in parse_chat_completion
    raise LengthFinishReasonError(completion=chat_completion)
openai.LengthFinishReasonError: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=238, total_tokens=16622, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))

Description

Hi team,

I recently migrated our client from `ChatOpenAI` to `AzureChatOpenAI`, and since the migration I’ve been encountering intermittent `LengthFinishReasonError` exceptions.

According to the LangSmith traces, each call used between 1,000 and 1,500 tokens in total (prompt + completion), and the total was always well below 10,000 tokens. That is far less than the `total_tokens=16622` reported in the stack trace where the error is raised.

Interestingly, when the issue occurs, the request appears to hang for about 3 minutes, even though the output shows up in the LangSmith trace within 10 to 20 seconds.

It seems like the error is being thrown even though we're not approaching the model's token limit. Any insights into what could be causing this or how to further debug it would be appreciated.
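
For debugging, one option is to collect exceptions per input, so that a single failing call does not abort the whole batch and the offending prompts can be inspected. Below is a minimal sketch using only the standard library. `LengthLimitError` is a hypothetical stand-in for `openai.LengthFinishReasonError`, and `run_batch` and its `call` argument are illustrative names, not LangChain APIs.

```python
import asyncio


class LengthLimitError(Exception):
    """Hypothetical stand-in for openai.LengthFinishReasonError."""


async def run_batch(call, inputs, max_concurrency=20):
    """Await call(item) for each input and split the results into successes and failures."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def gated(item):
        # Limit the number of calls in flight, like max_concurrency in the chain config.
        async with semaphore:
            return await call(item)

    # return_exceptions=True returns exceptions as values instead of raising the first one.
    results = await asyncio.gather(
        *(gated(item) for item in inputs), return_exceptions=True
    )

    ok, failed = [], []
    for item, result in zip(inputs, results):
        if isinstance(result, Exception):
            failed.append((item, result))
        else:
            ok.append((item, result))
    return ok, failed
```

With LangChain itself, passing `return_exceptions=True` to `chain.abatch(...)` should give a similar effect: failed inputs come back as exception objects in the result list rather than raising.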

System Info

langchain_core: 0.3.54
langchain: 0.3.23
langsmith: 0.3.32
langchain_google_cloud_sql_pg: 0.13.0
langchain_google_vertexai: 2.0.20
langchain_openai: 0.3.14
langchain_text_splitters: 0.3.8
langgraph_sdk: 0.1.61

Contributor guide

Tech stack: python
Domain: backend, api
Issue type: bug
Difficulty: 4
Estimated time: 1-2 days
Activity status: fresh
Clarity: mostly clear
Prerequisites: Python, langchain, Azure OpenAI API
Newbie friendliness: 20
Research direction: Investigate the intermittent `LengthFinishReasonError` in `AzureChatOpenAI`. Focus on the token limit handling in the OpenAI response parsing, specifically in `openai/lib/_parsing/_completions.py`. Check whether the error is due to a mismatch between the model's max tokens and the actual response. Look at LangSmith traces to correlate timing and token counts. Consider adding retry logic or error handling for `LengthFinishReasonError` in the async batch processing path.
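
The retry suggestion above could look roughly like the following. This is a minimal sketch using only the standard library, not a proposed patch. `LengthLimitError` is a hypothetical stand-in for `openai.LengthFinishReasonError`, and `call_with_retry`, `attempts`, and `delay` are illustrative names.

```python
import asyncio


class LengthLimitError(Exception):
    """Hypothetical stand-in for openai.LengthFinishReasonError."""


async def call_with_retry(call, item, retry_on=LengthLimitError, attempts=3, delay=0.0):
    """Await call(item), retrying on `retry_on` up to `attempts` times in total."""
    last_error = None
    for attempt in range(attempts):
        try:
            return await call(item)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff between attempts; delay=0.0 retries immediately.
                await asyncio.sleep(delay * (2 ** attempt))
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

A retry only helps if the runaway completion is genuinely intermittent. If a given prompt reliably drives the model to the output cap, each retry will consume the full completion budget again, so capping `max_tokens` explicitly may be worth testing alongside it.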