elastic/elasticsearch-dsl-py

Critical parts of the API are undocumented

Open

#1312 opened on Feb 13, 2020

Labels: Area: Documentation, Priority: Medium, good first issue

Description

My first tasks, when I started working with Elasticsearch-DSL, were to create a couple of custom field-types - one which was stored as a keyword but mapped to an instance of an Enum class, and another which was stored as a list of keywords but mapped to a Python set - and implement a document class that required some inter-field validation (e.g. start_date < end_date).
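For illustration, here is a minimal stdlib-only sketch of the two conversions described above. It deliberately does not subclass Field, since which Field method should host each conversion is exactly what the documentation does not say. The Color enum and all four helper names are hypothetical.

```python
from enum import Enum


class Color(Enum):  # hypothetical enum, for illustration only
    RED = "red"
    GREEN = "green"


def enum_to_keyword(value):
    """Python -> stored form: an Enum member becomes its string value."""
    return value.value if isinstance(value, Enum) else value


def keyword_to_enum(enum_cls, data):
    """Stored form -> Python; tolerates data that is already an Enum member."""
    return data if isinstance(data, enum_cls) else enum_cls(data)


def set_to_keywords(values):
    """Python set -> list of keywords (sorted only to make the output stable)."""
    return sorted(values)


def keywords_to_set(data):
    """List of keywords (or an existing set) -> Python set."""
    return set(data)
```

Both "to Python" helpers accept input that is already in Python form, because it was not clear from the documentation whether a field's deserialization hook can be called on already-deserialized data.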

In order to do this, I had to figure out:

  • What do the different methods on Field do, and what are their relationships to each other?
    • What's the difference between _serialize() and serialize()? Which should I be overriding?
    • Based on the name, _deserialize()/deserialize() convert data from Elasticsearch-format to Python-format. Can I assume they'll never be called on already-deserialized data? If not, why not?
    • When is clean() used? Does it have to deal with Elasticsearch-format data, Python-format data, or both? Is it just supposed to validate, or is it permitted to actually alter the data?
  • What do Field.name and Field._coerce mean/do?
  • Does default_timezone on a Date field affect naive datetimes when they're saved, or only on load?
    • Assuming it only applies on load, if I create a doc using a naive datetime in a Date field with a default_timezone, save it, load it again, then resave it, will the saved version still have a naive datetime or will it now have a timezone-aware one?
  • Which of Document.clean(), Document.full_clean(), or Document.clean_fields() should I be overriding/extending? In what context are these called?
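To make the default_timezone round-trip question concrete, the sketch below models the "applies only on load" assumption with the standard library alone. It is not a statement of what elasticsearch-dsl actually does, and attach_default_timezone is a hypothetical helper standing in for whatever the Date field does internally.

```python
from datetime import datetime, timezone


def attach_default_timezone(value, default_tz):
    """Return value unchanged if it is already aware, else attach default_tz."""
    if value.tzinfo is None:
        return value.replace(tzinfo=default_tz)
    return value


naive = datetime(2020, 2, 13, 12, 0)

# First "save": a naive datetime serializes without a UTC offset.
stored = naive.isoformat()

# "Load": parse the stored string, then apply the default timezone (assumed step).
loaded = attach_default_timezone(datetime.fromisoformat(stored), timezone.utc)

# Second "save": the value is now aware, so the stored string gains an offset.
resaved = loaded.isoformat()
```

Under this assumption the stored representation changes between the first and second save, which is the kind of behavior a docstring on the Date field should confirm or rule out.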

Now, at this point, I've figured out the answers to all of these questions (I think) by reading the code and experimenting, but I should have been able to find all of this by looking at the documentation. Each of these methods should have a docstring which explains its purpose, contract, and the context in which it is used, while the class members (name, _coerce, etc.) should have a descriptive comment at their point of declaration.
