pytorch/ignite

Support for TorchSnapshot for efficient checkpoint saving and loading

Open

#2752 opened on Oct 24, 2022

enhancement, help wanted

Description

🚀 Feature

TorchSnapshot is a performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind. Compared with torch.save/torch.load, it includes many optimizations that keep memory usage under control and speed up checkpoint writing for DDP-style workloads. For more information, see the README: https://github.com/pytorch/torchsnapshot#why-torchsnapshot

This could be a nice addition to Ignite, similar to the existing Checkpoint handler.
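To make the request concrete, here is a rough sketch of what such a handler might look like. It is not an existing Ignite API: the class name `SnapshotSaver`, its `dirname` argument, the `take_fn` hook, and the `snapshot_<iteration>` naming scheme are all hypothetical. The only TorchSnapshot call it relies on is `Snapshot.take(path=..., app_state=...)`, and it reuses the `to_save` dictionary convention of the existing Checkpoint handler.

```python
import os
from typing import Any, Callable, Dict, Optional


class SnapshotSaver:
    """Hypothetical Ignite handler that saves `to_save` with TorchSnapshot."""

    def __init__(
        self,
        to_save: Dict[str, Any],
        dirname: str,
        take_fn: Optional[Callable[..., Any]] = None,
    ) -> None:
        # `to_save` maps names to objects exposing state_dict()/load_state_dict(),
        # the same convention ignite.handlers.Checkpoint uses.
        self.to_save = to_save
        self.dirname = dirname
        # Hypothetical hook: lets callers swap in another save function.
        self._take_fn = take_fn
        self.last_path: Optional[str] = None

    def __call__(self, engine: Any) -> None:
        take = self._take_fn
        if take is None:
            # Imported lazily so torchsnapshot stays an optional dependency.
            from torchsnapshot import Snapshot

            take = Snapshot.take
        # Assumed naming scheme: one snapshot directory per iteration.
        path = os.path.join(self.dirname, f"snapshot_{engine.state.iteration}")
        take(path=path, app_state=self.to_save)
        self.last_path = path
```

Assuming `trainer`, `model`, and `optimizer` already exist, it could be attached with `trainer.add_event_handler(Events.EPOCH_COMPLETED, SnapshotSaver({"model": model, "optimizer": optimizer}, "/tmp/snapshots"))`, and a run could later be resumed with `Snapshot(path=saver.last_path).restore(app_state=to_save)`. One design point to settle: as far as I understand, in distributed runs `Snapshot.take` is meant to be called on every rank, so the handler would likely have to run on all processes rather than only on rank 0.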

cc @yifuwang
