pytorch/vision

[proposal] Move dataset tar/zip extraction and integrity checking of multiple files to utils.py

Open

#441 opened on Mar 6, 2018

Labels: enhancement, help wanted, needs discussion

Description

Datasets such as CIFAR, MNIST, etc. contain download and integrity-checking logic that is useful when reimplementing custom user datasets (for example, https://github.com/vadimkantorov/metriclearningbench/blob/master/cars196.py, https://github.com/vadimkantorov/metriclearningbench/blob/master/cub2011.py and https://github.com/vadimkantorov/metriclearningbench/blob/master/stanford_online_products.py).

I propose moving this logic into reusable functions in torchvision/datasets/utils.py: downloading and extracting tarballs, zip files and plain files, and checking integrity against the MD5 checksums of a file list, if one is provided.
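A minimal sketch of what such helpers might look like, using only the Python standard library. The names `md5sum`, `download_url`, `check_files_integrity` and `extract_archive` are hypothetical illustrations, not an existing torchvision API, and the file list is assumed to be a sequence of `(relative_path, md5_hex)` pairs.

```python
import hashlib
import os
import shutil
import tarfile
import urllib.request
import zipfile


def md5sum(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def download_url(url, path, md5=None):
    """Hypothetical helper: download url to path unless a matching file is already there."""
    if os.path.isfile(path) and (md5 is None or md5sum(path) == md5):
        return path  # already downloaded (and verified, if an MD5 was given)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk instead of reading into memory
    if md5 is not None and md5sum(path) != md5:
        raise RuntimeError("MD5 mismatch for " + path)
    return path


def check_files_integrity(root, file_list=None):
    """Hypothetical helper: check every (relative_path, md5_hex) pair under root.

    Assumed policy: with no file list, only the existence of root is checked.
    """
    if not file_list:
        return os.path.isdir(root)
    for rel_path, expected_md5 in file_list:
        path = os.path.join(root, rel_path)
        if not os.path.isfile(path) or md5sum(path) != expected_md5:
            return False
    return True


def extract_archive(archive_path, dest_dir):
    """Hypothetical helper: extract a zip or (optionally compressed) tar archive into dest_dir."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest_dir)
    elif tarfile.is_tarfile(archive_path):
        # Use the "data" extraction filter (rejects absolute paths, path traversal, etc.)
        # on Python versions that support it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive_path, "r:*") as tf:  # "r:*" auto-detects gz/bz2/xz
            tf.extractall(dest_dir, **kwargs)
    else:
        # Plain (non-archive) files need no extraction and are used as downloaded.
        raise ValueError("Not a zip or tar archive: " + archive_path)
```

A custom dataset such as CUB-200-2011 could then call these functions from its own constructor (download, extract, verify) instead of subclassing an unrelated dataset class just to reuse that logic.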

Currently, avoiding this duplication requires awkward subclassing of ImageFolder, CIFAR10, etc.
