Labels: :Core/Infra/REST API, >enhancement, Team:Core/Infra, help wanted, team-discuss, triaged
Description
In https://github.com/elastic/elasticsearch/pull/22691#discussion_r96935452, I pointed out that our code currently ignores the charset parameter of the Content-Type header and that we should look into this. The javadocs of JsonFactory describe how different charsets are handled:
Encoding is auto-detected from contents according to JSON
specification recommended mechanism. Json specification
supports only UTF-8, UTF-16 and UTF-32 as valid encodings,
so auto-detection implemented only for this charsets.
For other charsets use {@link #createParser(java.io.Reader)}.
Unfortunately, not all clients stick to the Unicode encodings; I have seen some send data as ISO-8859-1. I think we should consider parsing the charset from the Content-Type header when it is available and handling it appropriately: failing if we cannot support the charset, converting the content, creating the parser differently, etc.
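
To make the proposal concrete, here is a minimal sketch of what parsing the charset could look like, using only the JDK. The class `ContentTypeCharset` and its methods are hypothetical names for illustration and are not existing Elasticsearch code. Falling back to UTF-8 when no charset is given is also an assumption; a real implementation might instead keep the byte-based parser path so that Jackson's auto-detection still applies.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: reads the charset parameter of a Content-Type header value. */
public final class ContentTypeCharset {

    private ContentTypeCharset() {}

    /**
     * Returns the charset named in the header value, or empty if there is none.
     * Throws IllegalArgumentException if a charset is named but this JVM cannot
     * support it, so the caller can reject the request.
     * Simplification: parameters are split on ';' without handling quoted
     * values that themselves contain ';'.
     */
    public static Optional<Charset> parseCharset(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes, e.g. charset="utf-8".
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Optional.of(Charset.forName(value));
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                throw new IllegalArgumentException("Unsupported charset in Content-Type: " + value, e);
            }
        }
        return Optional.empty();
    }

    /** Decodes the body with the declared charset (assumed fallback: UTF-8). */
    public static String decode(byte[] body, String contentType) {
        Charset charset = parseCharset(contentType).orElse(StandardCharsets.UTF_8);
        return new String(body, charset);
    }

    /** Wraps the body in a Reader that decodes with the declared charset (assumed fallback: UTF-8). */
    public static Reader openReader(InputStream body, String contentType) {
        Charset charset = parseCharset(contentType).orElse(StandardCharsets.UTF_8);
        return new InputStreamReader(body, charset);
    }
}
```

The `Reader` returned by `openReader` could then be passed to `JsonFactory#createParser(java.io.Reader)`, which is the route the javadoc above recommends for other charsets. Whether to reject unsupported charsets or convert the content up front remains open for discussion.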