openai: intermittent `LengthFinishReasonError` in `AzureChatOpenAI`
#30924 opened on Apr 18, 2025
Description
Checked other resources
- I added a very descriptive title to this issue.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
- I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Example Code
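from langchain_core.messages import SystemMessage
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import AzureChatOpenAI

model = AzureChatOpenAI(
    azure_deployment="gpt-4o",
    api_key=...,
    api_version=...,
    azure_endpoint=...,
    temperature=0.3,
    max_retries=3,
    max_tokens=None,  # no explicit cap on completion length
    timeout=None,
)
system_prompt = "..."
user_prompts = [ ... ]
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(content=system_prompt),
        HumanMessagePromptTemplate.from_template("{input}"),
    ]
)
chain = prompt | model | JsonOutputParser()
# Runs inside an async function; up to 20 requests are in flight at once.
responses = await chain.abatch(
    [
        {
            "input": user_prompt,
        }
        for user_prompt in user_prompts
    ],
    config={
        "max_concurrency": 20,
    },
)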
Error Message and Stack Trace (if applicable)
[2025-04-18 18:17:26.906][SpawnProcess-768][44173][46d32417-2680-4b6a-8005-65d532070441][ERROR][core.services.llm:77] An unexpected error occurred in async batch processing: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=238, total_tokens=16622, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
Traceback (most recent call last):
  File "/usr/core/services/llm.py", line 71, in abatch
    responses = await chain.abatch(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3331, in abatch
    inputs = await step.abatch(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5498, in abatch
    return await self.bound.abatch(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 905, in abatch
    return await gather_with_concurrency(configs[0].get("max_concurrency"), *coros)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/utils.py", line 75, in gather_with_concurrency
    return await asyncio.gather(*(gated_coro(semaphore, c) for c in coros))
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/utils.py", line 57, in gated_coro
    return await coro
  File "/usr/local/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 902, in ainvoke
    return await self.ainvoke(input, config, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 353, in ainvoke
    llm_result = await self.agenerate_prompt(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 905, in agenerate_prompt
    return await self.agenerate(
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 863, in agenerate
    raise exceptions[0]
  File "/usr/local/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 1033, in _agenerate_with_cache
    result = await self._agenerate(
  File "/usr/local/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 1129, in _agenerate
    response = await self.root_async_client.beta.chat.completions.parse(
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 437, in parse
    return await self._post(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1767, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1461, in request
    return await self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1564, in _request
    return await self._process_response(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1661, in _process_response
    return await api_response.parse()
  File "/usr/local/lib/python3.10/site-packages/openai/_response.py", line 432, in parse
    parsed = self._options.post_parser(parsed)
  File "/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py", line 431, in parser
    return _parse_chat_completion(
  File "/usr/local/lib/python3.10/site-packages/openai/lib/_parsing/_completions.py", line 72, in parse_chat_completion
    raise LengthFinishReasonError(completion=chat_completion)
openai.LengthFinishReasonError: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=238, total_tokens=16622, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
Description
Hi team,
I recently migrated our client from ChatOpenAI to AzureChatOpenAI, and since the migration, I’ve been encountering intermittent LengthFinishReasonError exceptions.
According to the LangSmith traces, each call used between 1,000 and 1,500 tokens in total (prompt + completion), always well below 10,000. That is far below the total_tokens=16622 (completion_tokens=16384) reported in the stack trace where the error is raised.
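For comparison with the LangSmith numbers, the usage figures can be pulled straight out of the error message. This is only a minimal sketch: it assumes the message keeps the `CompletionUsage(...)` repr format shown in the stack trace, and `parse_usage` is a hypothetical helper, not part of any library.

```python
import re

# Error text copied from the stack trace (the *_details fields are trimmed).
ERROR_MESSAGE = (
    "Could not parse response content as the length limit was reached - "
    "CompletionUsage(completion_tokens=16384, prompt_tokens=238, total_tokens=16622)"
)


def parse_usage(message: str) -> dict[str, int]:
    """Extract the three top-level token counts from a CompletionUsage repr.

    Hypothetical helper: relies on the repr format seen in this issue.
    """
    pattern = r"\b(completion_tokens|prompt_tokens|total_tokens)=(\d+)"
    return {name: int(value) for name, value in re.findall(pattern, message)}


usage = parse_usage(ERROR_MESSAGE)
# Fraction of the total that was generated output rather than prompt.
completion_share = usage["completion_tokens"] / usage["total_tokens"]
print(usage, f"completion share: {completion_share:.1%}")
```

The prompt accounts for only 238 tokens, so the failing request is almost entirely a 16,384-token completion. Since 16384 is 2**14, it looks more like an output cap being hit than a natural stopping point, though that is an assumption on my part.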
Interestingly, when the issue occurs, the requests seem to hang for about 3 minutes, although the output appears in the LangSmith trace fairly quickly (within 10 to 20 seconds).
It seems the error is thrown even though we are nowhere near the model's token limit. Any insight into what could be causing this, or how to debug it further, would be appreciated.
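One way to narrow this down might be to keep the batch alive and collect the failing inputs instead of letting the first exception abort everything. `abatch` accepts `return_exceptions=True`, which returns exceptions in place of results. The sketch below uses stand-in data so it runs without any API access, and `split_batch_results` is a hypothetical helper, not a LangChain API.

```python
from typing import Any


def split_batch_results(
    inputs: list[dict[str, Any]], results: list[Any]
) -> tuple[list[Any], list[tuple[dict[str, Any], Exception]]]:
    """Separate successful outputs from (input, exception) pairs."""
    successes: list[Any] = []
    failures: list[tuple[dict[str, Any], Exception]] = []
    for batch_input, result in zip(inputs, results):
        if isinstance(result, Exception):
            failures.append((batch_input, result))
        else:
            successes.append(result)
    return successes, failures


# Stand-in data. With the real chain, `results` would come from:
#   await chain.abatch(inputs, config={"max_concurrency": 20}, return_exceptions=True)
inputs = [{"input": "a"}, {"input": "b"}]
results = [{"answer": 1}, RuntimeError("length limit was reached")]

successes, failures = split_batch_results(inputs, results)
for failed_input, error in failures:
    # Log which prompt failed so it can be replayed on its own.
    print(type(error).__name__, failed_input)
```

If the collected failures are `openai.LengthFinishReasonError` instances, their `completion` attribute should hold the truncated response, which would show what the model was generating when it hit the limit.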
System Info
langchain_core: 0.3.54
langchain: 0.3.23
langsmith: 0.3.32
langchain_google_cloud_sql_pg: 0.13.0
langchain_google_vertexai: 2.0.20
langchain_openai: 0.3.14
langchain_text_splitters: 0.3.8
langgraph_sdk: 0.1.61