Description
If you feed a generator into a many=True schema, Marshmallow materializes the entire generator in memory before serializing it. Serializing a large collection therefore takes longer and consumes more memory than necessary, and can even exceed the system's available memory or the environment's time limits. These concerns are especially common in web services: Marshmallow is often used there to serialize JSON response bodies, web workers often run in memory-constrained environments, and clients or gateways will time out if the service takes too long to start streaming a response.
Users can currently work around this with something like the following:
from typing import Iterable, Iterator

from marshmallow import Schema


def dumps_many(obj: Iterable, schema: Schema) -> Iterator[str]:
    # Serialize one item at a time by passing many=False per call, so the
    # schema is never left mutated if the consumer stops iterating early.
    yield "["
    for index, item in enumerate(obj):
        if index:
            yield ","  # separator before every item except the first
        yield schema.dumps(item, many=False)
    yield "]"


if __name__ == "__main__":
    import sys

    from marshmallow.fields import Int

    class MySchema(Schema):
        i = Int(required=True)

    obj = (dict(i=i) for i in range(int(sys.argv[1])))
    print(repr("".join(dumps_many(obj, MySchema(many=True)))))

# $ python3 foo.py 0
# '[]'
# $ python3 foo.py 1
# '[{"i": 0}]'
# $ python3 foo.py 2
# '[{"i": 0},{"i": 1}]'
# $ python3 foo.py 9999999999999 # you get the idea
# ...
But it would be great if Marshmallow offered first-class support for this.
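To make the request more concrete, here is a minimal, library-agnostic sketch of the behavior I have in mind. It uses only the standard library; `stream_json_array`, `dump_one`, and `counting_source` are hypothetical names made up for this illustration, not anything Marshmallow provides. The point is that the first chunk of the response can be emitted before a single item has been pulled from the source.

```python
import json
from typing import Any, Callable, Iterable, Iterator


def stream_json_array(
    items: Iterable[Any],
    dump_one: Callable[[Any], str] = json.dumps,
) -> Iterator[str]:
    """Yield a JSON array chunk by chunk without buffering the items.

    Hypothetical helper for illustration only; not part of Marshmallow.
    """
    yield "["
    for index, item in enumerate(items):
        if index:
            yield ","  # separator before every item except the first
        yield dump_one(item)
    yield "]"


def counting_source(n: int, produced: list) -> Iterator[dict]:
    """Lazily generate n dicts, recording each index as it is produced."""
    for i in range(n):
        produced.append(i)
        yield {"i": i}


produced: list = []
chunks = stream_json_array(counting_source(3, produced))
first_chunk = next(chunks)             # the opening bracket
produced_before_items = len(produced)  # nothing pulled from the source yet
body = first_chunk + "".join(chunks)   # drain the rest of the stream
```

With Marshmallow, `dump_one` could presumably be something like `lambda item: schema.dumps(item, many=False)`, and a web framework's streaming response could consume the resulting iterator directly instead of joining it.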
Looks like this was previously discussed briefly in https://github.com/marshmallow-code/marshmallow/pull/1164#issuecomment-473316007, where @deckar01 said:
We might want to explore streaming with generators in 3.x.
Is now a good time to add this to Marshmallow v3? Could be another really strong reason for v2 users to upgrade.
Thanks for your consideration and for the great work on Marshmallow!