psf/requests-html
When requesting a page that is ISO-8859-1 encoded, HTML is still interpreted as UTF-8
Open
#442 opened on Jan 27, 2021
help wanted
Description
When requesting a page that is ISO-8859-1 encoded:
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://gerda.geus.dk/Gerda/Search')
>>> r.encoding
'ISO-8859-1'
>>> r.html.default_encoding
'ISO-8859-1'
>>> r.html.encoding
'utf8'
>>> r.html.find("option")[-1].text
'Bygge-anl�g'
Expected behavior:
>>> r.html.find("option")[-1].text
'Bygge-anlæg'
As far as I can see, there are two problems:
- r.html.encoding is incorrectly set
- r.html.element (the PyQuery instance) does not take encoding into account at all but just assumes UTF-8
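
The symptom can be reproduced without any network access or requests-html itself. The following is a minimal sketch using only the Python standard library. It assumes the garbled text comes from ISO-8859-1 bytes being decoded as UTF-8 with invalid bytes replaced. The byte string is a hypothetical stand-in for the real response body, and the snippet does not claim to show what the library does internally.

```python
# Hypothetical stand-in for the response body: the option label from the
# report, encoded the way the server declares it (ISO-8859-1).
raw = "Bygge-anlæg".encode("iso-8859-1")

# Decoding with the declared encoding recovers the expected text.
expected = raw.decode("iso-8859-1")

# Decoding the same bytes as UTF-8 cannot interpret the lone 0xE6 byte.
# With errors="replace" that byte becomes U+FFFD (the replacement character).
garbled = raw.decode("utf-8", errors="replace")

print(expected)
print(garbled)
```

If that assumption holds, any fix has to make sure the encoding reported by r.encoding is the one actually used when the document is parsed.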