Wrong prioritization of languages · scrapinghub/dateparser#770

6 comments · 0 reactions · 1 assignee · Python · 2,318 stars · 443 forks

good first issue

Description

I think something is wrong with how dateparser prioritizes languages: adding 'en', even in the last position, changes the result for dates that were parsed correctly when English was not in the list.

import dateparser
dateparser.parse("11/12", languages=['en'])
Out[3]: datetime.datetime(2020, 11, 12, 0, 0)

This is right.

dateparser.parse("11/12", languages=['es'])
Out[4]: datetime.datetime(2020, 12, 11, 0, 0)

This is also right, because the standard in Spain is DD/MM. But now, if we add English to the languages list in the last position...

dateparser.parse("11/12", languages=['es', 'en'])
Out[5]: datetime.datetime(2020, 11, 12, 0, 0)

The date was parsed as in English, even though Spanish comes first in the list of languages. This is unexpected to me; I would have expected Spanish to be prioritized instead.

Contributor guide

Tech stack: python
Area: backend
Issue type: bug
Difficulty: 3
Estimated time: 1-2 days
Activity: stale
Clarity: clear
Prerequisites: Basic Python; understanding of the dateparser library; knowledge of locale date formats
Beginner friendliness: 50
Investigation approach: Investigate the language prioritization logic in the dateparser source code, particularly in the language detection module. Check how the 'languages' parameter is processed and where the order might be lost. The fix should ensure that the order of languages in the list is respected, with the first language having the highest priority. Relevant files likely include 'dateparser/parser.py' or similar.
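
The expected behavior, where the first language in the list wins whenever it can parse the input, can be sketched with the standard library alone. This is only an illustration of the ordering rule, not dateparser's actual implementation: the `DATE_FORMATS` table and the `parse_with_priority` helper are hypothetical names, and the fixed year of 2020 simply mirrors the outputs shown in the report.

```python
from datetime import datetime

# Hypothetical table: the day/month order each language conventionally uses.
# Illustrative data only, not dateparser's internal representation.
DATE_FORMATS = {
    "en": "%m/%d",  # English (US): month first
    "es": "%d/%m",  # Spanish: day first
}


def parse_with_priority(text, languages, year=2020):
    """Try each language in list order and return the first successful parse.

    Returns a (language, datetime) tuple, or None if no language matches.
    """
    for lang in languages:
        fmt = DATE_FORMATS.get(lang)
        if fmt is None:
            continue  # unknown language in this sketch: skip it
        try:
            # Append the year so strptime receives a complete date.
            parsed = datetime.strptime(f"{text}/{year}", f"{fmt}/%Y")
        except ValueError:
            continue  # this language cannot read the string: try the next one
        return lang, parsed
    return None


print(parse_with_priority("11/12", ["es", "en"]))  # Spanish is tried first
print(parse_with_priority("11/12", ["en", "es"]))  # English is tried first
print(parse_with_priority("25/12", ["en", "es"]))  # falls back to Spanish
```

With this rule, a later language only comes into play when every earlier one fails, which is the behavior the report expected from `languages=['es', 'en']`.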