pytorch/vision
[proposal] Move dataset tar/zip extraction and integrity checking of multiple files to utils.py
Open
#441 opened on Mar 6, 2018
Labels: enhancement, help wanted, needs discussion
Description
Datasets such as CIFAR, MNIST, etc. have download and integrity-checking logic that would be useful when implementing custom user datasets (for example https://github.com/vadimkantorov/metriclearningbench/blob/master/cars196.py, https://github.com/vadimkantorov/metriclearningbench/blob/master/cub2011.py, https://github.com/vadimkantorov/metriclearningbench/blob/master/stanford_online_products.py).
I propose moving this logic into functions in torchvision/datasets/utils.py: downloading and extracting tarballs, zip files, and plain files, and checking integrity by MD5 against a file list when one is provided.
Currently, reusing this logic without duplicating it leads to quirky subclassing of ImageFolder, Cifar10, etc.
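
To make the proposal concrete, here is a rough sketch of what such helpers could look like, using only the standard library. All function names and signatures below (`file_md5`, `check_integrity`, `check_file_list`, `download_file`, `extract_archive`) are hypothetical and chosen for illustration; they are not the existing torchvision API.

```python
import hashlib
import os
import tarfile
import urllib.request
import zipfile


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def check_integrity(path, md5=None):
    """True if the file exists and, when md5 is given, its digest matches."""
    if not os.path.isfile(path):
        return False
    return md5 is None or file_md5(path) == md5


def check_file_list(root, file_list):
    """Check several files at once.

    file_list is an iterable of (relative_path, md5_or_None) pairs
    (a hypothetical format for this sketch).
    """
    return all(
        check_integrity(os.path.join(root, rel_path), md5)
        for rel_path, md5 in file_list
    )


def download_file(url, root, filename=None, md5=None):
    """Download url into root unless a valid copy is already there."""
    os.makedirs(root, exist_ok=True)
    path = os.path.join(root, filename or os.path.basename(url))
    if not check_integrity(path, md5):
        urllib.request.urlretrieve(url, path)
        if not check_integrity(path, md5):
            raise RuntimeError("Integrity check failed for " + path)
    return path


def extract_archive(path, dest=None):
    """Extract a tarball or zip file; dest defaults to the archive's folder.

    Only use this on archives from a trusted source.
    """
    dest = dest or os.path.dirname(os.path.abspath(path))
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as tf:
            tf.extractall(dest)
    elif zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
    else:
        raise ValueError("Unsupported archive format: " + path)
    return dest
```

With helpers along these lines, a custom dataset could call `download_file` and `extract_archive` in its own `download()` method and `check_file_list` in its integrity check, instead of subclassing an existing dataset just to reuse that logic.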