marshmallow-code/marshmallow

Support streaming dump

Open

#1696 opened on Nov 24, 2020

help wanted

Description

If you pass a generator to a many=True schema, Marshmallow consumes the entire generator into memory before serializing it. Serializing a large collection therefore takes longer and uses more memory than necessary, and can even exceed the system's available memory or the environment's time limits. These concerns are especially common in web services, where Marshmallow is often used to serialize JSON response bodies, web workers often run in memory-constrained environments, and clients or gateways time out if the service takes too long to start streaming a response.

Users can currently work around this with something like the following:

from typing import Iterable
from marshmallow import Schema


def dumps_many(obj: Iterable, schema: Schema):
    """Yield a JSON array in chunks, serializing one item at a time."""
    yield "["
    for index, item in enumerate(obj):
        if index:
            yield ","
        # Override many per call instead of mutating the shared schema;
        # this also handles items that are None.
        yield schema.dumps(item, many=False)
    yield "]"


if __name__ == "__main__":
    import sys
    from marshmallow.fields import Int

    class MySchema(Schema):
        i = Int(required=True)

    obj = (dict(i=i) for i in range(int(sys.argv[1])))
    print(repr("".join(dumps_many(obj, MySchema(many=True)))))


# $ python3 foo.py 0
# '[]'
# $ python3 foo.py 1
# '[{"i": 0}]'
# $ python3 foo.py 2
# '[{"i": 0},{"i": 1}]'
# $ python3 foo.py 9999999999999  # you get the idea
# ...

But it would be great if Marshmallow offered first-class support for this.
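To make the desired behavior concrete, here is a minimal, standard-library-only sketch of the same idea with the per-item serializer passed in as a callable. The names `iter_json_array`, `dump_one`, and `source` are hypothetical and not part of Marshmallow; with Marshmallow, `dump_one` could presumably be the bound `dump` method of a schema created without many=True. The example also checks that the source generator is consumed lazily.

```python
import json
from typing import Any, Callable, Iterable, Iterator


def iter_json_array(
    items: Iterable[Any], dump_one: Callable[[Any], Any]
) -> Iterator[str]:
    """Lazily yield the chunks of a JSON array, one serialized item at a time.

    dump_one is a hypothetical per-item hook that turns an object into
    JSON-compatible data (an assumption for this sketch, not a Marshmallow API).
    """
    yield "["
    for index, item in enumerate(items):
        if index:
            yield ","
        yield json.dumps(dump_one(item))
    yield "]"


consumed = []


def source(n: int) -> Iterator[dict]:
    for i in range(n):
        consumed.append(i)  # record when each item is actually pulled
        yield {"i": i}


chunks = iter_json_array(source(3), dump_one=dict)
first_two = [next(chunks), next(chunks)]  # "[" and the first serialized item
pulled_so_far = list(consumed)            # only item 0 has been pulled here
rest = "".join(chunks)                    # drain the remaining chunks
```

Because each chunk is produced on demand, a caller can start writing output after the first item instead of waiting for the whole collection, which is the property the workaround above relies on.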

It looks like this was briefly discussed in https://github.com/marshmallow-code/marshmallow/pull/1164#issuecomment-473316007, where @deckar01 said:

"We might want to explore streaming with generators in 3.x."

Is now a good time to add this to Marshmallow v3? Could be another really strong reason for v2 users to upgrade.

Thanks for your consideration and for the great work on Marshmallow!
