Wrong prioritization of languages · scrapinghub/dateparser#770

(6 comments) (0 reactions) (1 assignee) Python (2,318 stars) (443 forks) batch import

good first issue

Description

I think there is something wrong with how dateparser prioritizes languages: adding 'en', even in the last position, breaks the parsing of dates that were parsed correctly when English was not in the list.

import dateparser
dateparser.parse("11/12", languages=['en'])
Out[3]: datetime.datetime(2020, 11, 12, 0, 0)

This is right.

dateparser.parse("11/12", languages=['es'])
Out[4]: datetime.datetime(2020, 12, 11, 0, 0)

This is also right, because the standard in Spain is DD/MM. But now, if we add English to the languages list in the last position...

dateparser.parse("11/12", languages=['es', 'en'])
Out[5]: datetime.datetime(2020, 11, 12, 0, 0)

It was parsed as in English, even though Spanish is first in the list of languages. This is unexpected to me; I would have expected Spanish to take priority.
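To make the expected behavior concrete, here is a minimal standard-library sketch of order-respecting parsing. It is not dateparser's implementation: the `DATE_FORMATS` table and the `parse_in_priority_order` helper are hypothetical names made up for illustration, and the year is fixed to 2020 to match the outputs above.

```python
from datetime import datetime

# Hypothetical per-language date orders (an assumption for illustration,
# not dateparser's real locale data).
DATE_FORMATS = {
    "en": "%m/%d",  # month first
    "es": "%d/%m",  # day first
}


def parse_in_priority_order(text, languages, year=2020):
    """Try each language in list order and return the first successful parse."""
    for lang in languages:
        fmt = DATE_FORMATS.get(lang)
        if fmt is None:
            continue  # unknown language in this toy table: skip it
        try:
            # Append the year so strptime always sees a complete date.
            return datetime.strptime(f"{text}/{year}", fmt + "/%Y")
        except ValueError:
            continue  # this language cannot parse the text: try the next one
    return None


print(parse_in_priority_order("11/12", ["es", "en"]))
```

Under this sketch, `['es', 'en']` yields 11 December because Spanish is tried first, and English is only consulted when Spanish fails to parse the string.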

Contributor guide

Tech stack: python
Domain: backend
Issue type: bug
Difficulty: 3
Estimated time: 1-2 days
Activity status: stale
Clarity: clear
Prerequisites: Basic Python, understanding of the dateparser library, knowledge of locale date formats
Beginner friendliness: 50
Research direction: Investigate the language prioritization logic in the dateparser source code, particularly in the language detection module. Check how the 'languages' parameter is processed and where the order might be lost. The fix should ensure that the order of languages in the list is respected, with the first language having the highest priority. Relevant files likely include 'dateparser/parser.py' or similar.
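As an aid for that investigation, the sketch below shows one generic way a priority order can be lost in Python code: passing the list through a `set`. Whether dateparser does anything like this is an assumption, not something the issue confirms, and both helper names are hypothetical.

```python
def dedupe_unordered(languages):
    """Remove duplicates via set(); the caller's priority order is NOT guaranteed."""
    return list(set(languages))


def dedupe_ordered(languages):
    """Remove duplicates while keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(languages))


requested = ["es", "en", "es"]
print(dedupe_ordered(requested))            # order of first appearance is kept
print(sorted(dedupe_unordered(requested)))  # same members, but order had to be re-imposed
```

When reading the source, any step that sorts, groups, or de-duplicates the language codes is a candidate for the place where the caller's order stops being respected.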