DESIGN: Future API - Minimal/Core/Essential API and Extended/Optional API
#172 opened on Oct 25, 2017
Description
The future package receives many interesting and handy feature requests. Some of them are straightforward to implement, whereas others do not necessarily fit straight in. I'm creating this issue to clarify why the latter are not straightforward to implement and what the alternatives going forward are, and to encourage further discussion and ideas.
Minimal Future API (aka Future API)
In its most minimal and essential form, the Future API provides:
- future() - creates a future (on any future backend)
- value() - collects the value of the future (waits for it to resolve if not already done)
- resolved() - checks whether a future is resolved or not
- A future is stateless, i.e. just as with plain R functions, evaluation of a future expression is purely functional without side effects, and the outcome is the value (or a condition) of the evaluated expression.
- The values of futures should not depend on the order in which they are resolved.
On top of this, we have arguments controlling whether the future should be resolved lazily or eagerly, what or how globals are exported, polling and timeout strategies, etc.
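As a minimal sketch of the three core functions and the lazy argument mentioned above, assuming the future package is installed and using the sequential backend only because it needs no setup (the variable names are illustrative):

```r
library(future)
plan(sequential)  # assumption: any other backend should behave the same here

# Create a future, check on it without blocking, then collect its value
f <- future({
  x <- 1:10
  sum(x)
})
is_done <- resolved(f)  # non-blocking check
v <- value(f)           # waits for the future to resolve if needed

# Futures are independent of each other, so the order in which their
# values are collected should not affect the results
fs <- lapply(1:3, function(i) future(i^2))
vals_forward <- vapply(fs, value, numeric(1))
vals_reverse <- rev(vapply(rev(fs), value, numeric(1)))

# lazy = TRUE defers evaluation until the value is first needed
g <- future(Sys.getpid(), lazy = TRUE)
pid <- value(g)
```

Switching to another backend should only require changing the plan() call; the rest of the code is meant to stay the same, which is the point of keeping this API minimal.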
I probably forgot something above, so please feel free to comment.
It is critical that this Minimal Future API can be supported by all future backends (including those that are not yet implemented but may show up in the future). Because of this, the Minimal Future API is limited in what it can provide.
Examples of features that would probably fit in the Minimal Future API, but have not yet been added (unless marked otherwise):
- (since future 1.9.0) Capture and relay standard output (in addition to the value) (Issue #67)
- (since future 1.11.0) Capture and relay conditions (e.g. messages and warnings)
- Capturing of standard error (in addition to the value) (Issue #67): unlikely, at least not until R itself supports it fully
- Benchmarking, e.g. collecting total processing time and memory usage for resolving a future (Issue #59)
  - timestamps (easy)
  - memory (requires non-base R solutions)
- Mirroring of options (Issue #134)
  - manually
  - automatically
- Mirroring of environment variables (e.g. Issue #219)
  - manually
  - automatically
- Just-in-time compilation (Issue #133)
- Memoization (Issue #127)
- Syntax for specifying resources and declaring them optional or mandatory (see below)
- Hook functions called when a future is, say, created, launched, resolved, finalized
- Merging two or more non-launched futures into a single future
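To illustrate why the timestamp part of benchmarking is listed as easy, here is a rough sketch of what a user can already do by hand with the existing API. The helper timed_future() is hypothetical and not part of the future package; it only wraps the expression so that the start and finish times are returned together with the result:

```r
library(future)
plan(sequential)  # assumption: chosen only because it needs no setup

# Hypothetical helper (not part of the future package): evaluates fun()
# inside a future and records when evaluation started and finished
timed_future <- function(fun) {
  future({
    started <- Sys.time()
    result <- fun()
    list(result = result, started = started, finished = Sys.time())
  })
}

f <- timed_future(function() sum(sqrt(1:1e5)))
res <- value(f)
elapsed <- difftime(res$finished, res$started, units = "secs")
```

The timestamps are taken on whichever process evaluates the future, so they exclude the overhead of exporting globals and returning the value. Memory usage has no comparable base-R equivalent, which is why it is listed separately above.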
Optional Future API
Any features related to futures that cannot be supported by all backends belong to what I consider an extended / optional API - let's call it the Optional Future API. Some features may be specific to a single backend, while others may apply to a majority of backends but not all.
Below is a set of features that fit into this category:
- "Passing" existing futures to a new one, e.g. a <- future(1); b <- future(value(a)) - requires b to be able to "communicate" with a (e.g. different machines)
- Suspending/terminating a future currently being evaluated, e.g. suspend(f) (Issue #93)
- Instant forwarding of the future's standard output and standard error streams to the owner process (Issues #141, #171)
- "Monitoring" of a future, e.g. progress updates / progress bars (Issue https://github.com/HenrikBengtsson/doFuture/issues/8)
- Persistent workers, i.e. a future can change the state of an underlying worker that a following future can utilize.
  - efficiency: don't export globals that already exist on the worker (requires a method for asserting identical(local, remote)).
  - this can be for efficiency, e.g. futures that share the same global variables may resolve faster if they are resolved by the same worker (this can be optional, i.e. export global if not already available; think memoization)
  - a future preserves a value for a downstream future (not sure if this fits into the concept of futures, but I'll add it here in case someone has thoughts about this)
- Resource specifications typically seen in HPC environments, e.g. how much memory and wall-time must be available in order to start resolving a specific future. Another example is access to a GPU. The future.batchtools package actually provides a little bit of these features under the hood, but such features are currently experimental and exploratory.
- Other resource specifications, such as only running on the local machine, on the local file system, on a given version of R, access to a certain set of files, and so on.
- ...
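For the first item in the list above, a backend-agnostic workaround is already possible with the minimal API alone: collect the value of the upstream future in the owner process and pass it on as an ordinary global. This is only a sketch of that workaround, not of the "passing" feature itself, and the variable names are illustrative:

```r
library(future)
plan(sequential)  # assumption: chosen only because it needs no setup

a <- future(1)

# Instead of b <- future(value(a)), which requires the process resolving b
# to reach a, resolve a in the owner process first
a_value <- value(a)

# a_value is now a plain R object that is exported like any other global
b <- future(a_value + 1)
b_value <- value(b)
```

The cost is that the owner process has to wait for a before b can be created, which is exactly the kind of synchronization that direct passing of futures would avoid.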
Some of the referenced issues discuss why it's hard to implement these features in a generic fashion such that they would work with all future backends (i.e. why they cannot be added to the Minimal Future API but belong to a set of optional features).