cloudpipe/cloudpickle

Feature request: support for pickling/unpickling interactively defined Cython functions and classes

Open

#502 opened on Feb 16, 2023

help wanted

Description

It would be nice to be able to run interactively defined Cython functions in parallel on dask/ray/spark clusters or joblib/loky process pools.

Such interactively defined functions can typically be created using the %%cython magic in a Jupyter session or by a pyximport call in a Python script (e.g. from the __main__ module).

In both cases, it is quite easy to inspect the callable from Python and identify that its definition comes from such an interactively defined Cython module.
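As a rough illustration of that inspection, here is a minimal sketch. The helper name `backing_extension_file` is hypothetical and not part of cloudpickle. It only checks whether the module that defines the object is backed by a compiled extension file, which is a necessary but not sufficient condition: a real implementation would also need to tell dynamically built Cython modules apart from regular installed extension modules.

```python
import sys
from importlib.machinery import EXTENSION_SUFFIXES


def backing_extension_file(obj):
    """Return the path of the compiled file backing obj's module, or None.

    Hypothetical helper: it looks up the module named by obj.__module__
    and checks whether its __file__ ends with a known extension suffix
    (e.g. ".so" on Linux, ".pyd" on Windows).
    """
    module = sys.modules.get(getattr(obj, "__module__", None) or "")
    path = getattr(module, "__file__", None)
    if path and path.endswith(tuple(EXTENSION_SUFFIXES)):
        return path
    return None
```

For a function compiled by `%%cython` or pyximport, this would be expected to return the path of the `.so` (or `.pyd`) file in the corresponding build folder, and `None` for builtins such as `len` or for plain Python functions.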

A first step would be to add a new assume_shared_dynamic_cython_module_folder=False constructor parameter to the cloudpickle.Pickler class. When set to True, we would not need to ship the backing .so file over the wire (nor rebuild it from source code sent over the wire).

This assumption would hold for single-host parallelization (e.g. joblib/loky or single-host ray/dask setups) or for cluster computing setups where home folders are shared between the driver node and the worker nodes (e.g. an NFS home folder in a JupyterHub + HPC cluster setup).
