langchain-ai/langchain

String Evaluation with labelled criteria Scoring is wrong

Open

#31870 opened on Jul 4, 2025

7 comments, 0 reactions, 1 assignee. Python, 136,758 stars, 22,617 forks.
Labels: external, help wanted, investigate, langchain-classic

Description

Checked other resources

  • I added a very descriptive title to this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Example Code

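from langchain.evaluation import EvaluatorType, load_evaluator


# The wrapper name is a placeholder; the original snippet showed only the function body.
def run_labeled_criteria(llm, param_criteria, param_prediction, param_input, param_reference):
    # Build an LLM-graded evaluator that checks the prediction against a reference.
    evaluator = load_evaluator(EvaluatorType.LABELED_CRITERIA, llm=llm, criteria=param_criteria)
    eval_result = evaluator.evaluate_strings(
        prediction=param_prediction,
        input=param_input,
        reference=param_reference,
    )
    return eval_result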

Error Message and Stack Trace (if applicable)

Reasoning

{'reasoning': 'Step-by-step reasoning:\n\n1. Understand the user objective: \n The objective is to "Set the flight dates for any dates which has 5 days of difference in departure and return dates." \n The expected steps are: \n - Select a date in the departure date picker \n - Select a date in the return date picker \n\n2. Analyze the actual steps performed in the submission: \n - "Close the login popup" \n - "Click on the return date field to open the date picker." \n - "Click on 10th July 2025 to select the return date." \n\n3. Check if the submission fulfills the objective: \n - The submission only interacts with the return date picker and selects a return date. \n - There is no mention or action related to selecting a departure date. \n - The objective requires setting both departure and return dates with a 5-day difference. \n - The submission does not set the departure date at all, so it is incomplete. \n\n4. Check correctness and factual accuracy: \n - The steps performed are factually correct actions but incomplete relative to the objective. \n - Closing the login popup is not part of the objective but may be a prerequisite; this is acceptable if it does not interfere. \n - Selecting only the return date without selecting the departure date means the objective is not fully met. \n\n5. Conclusion: \n - The submission does not meet the criteria of correctness because it fails to perform all required actions (missing departure date selection). \n - Therefore, the submission is incomplete and does not fulfill the objective as specified.\n\nY', 'value': 'Y', 'score': 1}

Description

  • I expected N here, but received Y (with 'score': 1).
  • The reasoning clearly states that the submission does not meet the criteria, yet the evaluation still passed.
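
To make the mismatch easier to detect while this is investigated, here is a minimal sketch of a post-check. It assumes the verdict is the last token of the grader output, as in the 'reasoning' string above. The helpers split_verdict and is_contradictory and the phrase list NEGATIVE_MARKERS are hypothetical and are not part of LangChain; the sketch does not claim to reproduce how LangChain's own parser works.

```python
import re

# Hypothetical helper, not LangChain code: assumes the grader writes its
# reasoning first and ends the output with a standalone Y or N.
def split_verdict(raw_output: str):
    text = raw_output.strip()
    match = re.search(r"\b([YN])\s*$", text)
    if match is None:
        return text, None, None
    verdict = match.group(1)
    reasoning = text[: match.start()].strip()
    return reasoning, verdict, 1 if verdict == "Y" else 0

# Illustrative phrases only; a real check would need a broader list.
NEGATIVE_MARKERS = ("does not meet", "does not fulfill", "is incomplete", "fails to")

def is_contradictory(raw_output: str) -> bool:
    # Flag outputs whose reasoning reads as a failure but whose verdict is Y.
    reasoning, verdict, _ = split_verdict(raw_output)
    lowered = reasoning.lower()
    return verdict == "Y" and any(marker in lowered for marker in NEGATIVE_MARKERS)

# Shortened excerpt of the conclusion from the reasoning reported above.
sample = (
    "5. Conclusion: \n - The submission does not meet the criteria of correctness "
    "because it fails to perform all required actions.\n\nY"
)
print(split_verdict(sample)[1:])
print(is_contradictory(sample))
```

In the reported output the model's own text ends with Y after a negative conclusion, so a check like this would flag the result for review instead of silently accepting 'score': 1.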

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 24.0.0: Mon Aug 12 20:49:48 PDT 2024; root:xnu-11215.1.10~2/RELEASE_ARM64_T8103
Python Version: 3.9.6 (default, Aug 9 2024, 14:24:13) [Clang 16.0.0 (clang-1600.0.26.3)]

Package Information

langchain_core: 0.3.56
langchain: 0.3.24
langchain_community: 0.3.23
langsmith: 0.2.10
langchain_anthropic: 0.3.10
langchain_openai: 0.2.14
langchain_text_splitters: 0.3.8

Optional packages not installed

langserve

Other Dependencies

aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
anthropic<1,>=0.49.0: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
httpx: 0.27.0
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.45: Installed. No version info available.
langchain-core<1.0.0,>=0.3.51: Installed. No version info available.
langchain-core<1.0.0,>=0.3.55: Installed. No version info available.
langchain-core<1.0.0,>=0.3.56: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-perplexity;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.8: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.24: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith<0.4,>=0.1.125: Installed. No version info available.
langsmith<0.4,>=0.1.17: Installed. No version info available.
numpy>=1.26.2;: Installed. No version info available.
numpy>=2.1.0;: Installed. No version info available.
openai: 1.59.4
orjson: 3.10.3
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.10.4
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3.0.0,>=2.5.2;: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic<3.0.0,>=2.7.4;: Installed. No version info available.
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
tiktoken: 0.7.0
typing-extensions>=4.7: Installed. No version info available.
zstandard: Installed. No version info available.
