FR Streaming MCMC interface for big models · pyro-ppl/pyro#2843

Repository metrics

Stars: 8,211
PR merge metrics: average time to merge 10 days 19 hours; 1 PR merged in the last 30 days

Description

This issue proposes a streaming architecture for MCMC on models with a large memory footprint.

The problem this addresses is that in models with high-dimensional latents (say, more than 1M latent variables) it becomes difficult to store a list of samples, especially on GPUs with limited memory. The proposed solution is to eagerly compute statistics on the samples as they are drawn, then discard the samples themselves during inference.

@fehiepsi suggested creating a new MCMC class (say StreamingMCMC) with an interface similar to MCMC and still independent of the kernel (using either HMC or NUTS), but built on an internal streaming architecture. Since large models like these usually run on GPU or are otherwise memory constrained, it is reasonable to avoid multiprocessing support in StreamingMCMC.
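
To make the streaming idea concrete, here is a minimal sketch of a kernel-independent sampling loop that hands each sample to a callback and never stores it. This is not Pyro code: `stream_chain`, `make_metropolis_step`, and `RunningMean` are hypothetical names, and a toy random-walk Metropolis step stands in for an HMC or NUTS kernel.

```python
import math
import random


def stream_chain(step, init_state, num_samples, warmup_steps, on_sample):
    """Run one chain; pass each post-warmup sample to on_sample, then drop it."""
    state = init_state
    for _ in range(warmup_steps):
        state = step(state)  # warmup samples are never reported
    for _ in range(num_samples):
        state = step(state)
        on_sample(state)  # the only place a sample is observed
    return state


def make_metropolis_step(log_prob, scale, rng):
    """Toy random-walk Metropolis step (assumed stand-in for HMC/NUTS)."""
    def step(x):
        proposal = x + rng.gauss(0.0, scale)
        log_accept = log_prob(proposal) - log_prob(x)
        if rng.random() < math.exp(min(0.0, log_accept)):
            return proposal
        return x
    return step


class RunningMean:
    """Constant-memory consumer: keeps a count and a sum, not the samples."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def __call__(self, x):
        self.count += 1
        self.total += x

    def get(self):
        return self.total / self.count


# Usage: sample a standard normal target and track only the running mean.
rng = random.Random(0)
step = make_metropolis_step(lambda x: -0.5 * x * x, scale=1.0, rng=rng)
stats = RunningMean()
stream_chain(step, init_state=0.0, num_samples=20000, warmup_steps=1000,
             on_sample=stats)
```

Memory use is independent of `num_samples` because the loop holds only the current state and whatever the callback chooses to keep. The loop never inspects the kernel, which is the sense in which such a class could stay independent of HMC or NUTS.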

Along with the new StreamingMCMC class, I think there should be a set of helpers that incrementally compute statistics from sample streams, e.g. mean, variance, covariance, and r_hat.
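
As an illustration of what such a helper might look like, the sketch below uses Welford's online algorithm to track mean and variance in constant memory. The class name `StreamingMoments` and its methods are hypothetical and are not the `pyro.ops.streaming` API. The example uses plain floats; the same arithmetic should also apply elementwise to tensors.

```python
class StreamingMoments:
    """Welford's online algorithm for mean and unbiased sample variance."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean = self.mean + delta / self.count
        # Uses the deviation from the old mean times the deviation from the new mean.
        self.m2 = self.m2 + delta * (x - self.mean)

    def variance(self):
        if self.count < 2:
            raise ValueError("need at least two samples for a variance")
        return self.m2 / (self.count - 1)


# Usage: feed samples one at a time; each can be discarded after update().
moments = StreamingMoments()
for sample in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    moments.update(sample)
```

Welford's update is preferred over accumulating a sum and a sum of squares because it avoids the catastrophic cancellation that the naive formula suffers when the mean is large relative to the spread.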

Tasks (to be split into multiple PRs)

@mtsokol

#2857 Create a StreamingMCMC class with interface identical to MCMC (except disallowing parallel chains).
#2857 Generalize unit tests of MCMC to parametrize over both MCMC and StreamingMCMC
Add some tests ensuring StreamingMCMC and MCMC perform identical computations, up to numerical precision
Create a tutorial using StreamingMCMC on a big model

@fritzo

#2856 Create streaming helpers for mean, variance, etc.
Add r_hat to pyro.ops.streaming
Add n_eff = ess to pyro.ops.streaming
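
One reason r_hat fits a streaming design is that the classic (non-split) Gelman-Rubin statistic depends only on per-chain sample counts, means, and variances, all of which can be accumulated online. The sketch below assumes equal-length chains and a scalar quantity; the function name `r_hat_from_summaries` is hypothetical, and the helper eventually added to Pyro may use a different variant (for example split r_hat).

```python
import math


def r_hat_from_summaries(num_samples, chain_means, chain_vars):
    """Classic Gelman-Rubin r_hat from per-chain summaries.

    num_samples: samples per chain (assumed equal across chains).
    chain_means: per-chain sample means.
    chain_vars:  per-chain unbiased sample variances.
    """
    num_chains = len(chain_means)
    if num_chains < 2 or num_samples < 2:
        raise ValueError("need at least two chains and two samples per chain")
    # W: average within-chain variance.
    within = sum(chain_vars) / num_chains
    # B / n: variance of the chain means around the grand mean.
    grand_mean = sum(chain_means) / num_chains
    between_over_n = sum((m - grand_mean) ** 2 for m in chain_means) / (num_chains - 1)
    # Pooled estimate of the posterior variance.
    var_hat = (num_samples - 1) / num_samples * within + between_over_n
    return math.sqrt(var_hat / within)


# Usage: two chains that agree, and two chains stuck in different places.
r_agree = r_hat_from_summaries(100, [0.0, 0.0], [1.0, 1.0])
r_disagree = r_hat_from_summaries(100, [0.0, 2.0], [1.0, 1.0])
```

Values near 1 suggest the chains agree, while values well above 1 indicate that the between-chain spread dominates. Effective sample size is harder to stream in the same way because it depends on autocorrelations across lags.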

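Contributor guide

Research direction: Implement a StreamingMCMC class similar to MCMC but without multiprocessing support, and add streaming statistics helpers (mean, variance, covariance, r_hat).
Tech stack: Python, PyTorch
Domain: machine learning
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Python, PyTorch, Probabilistic Programming
Beginner friendliness: 30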
