huggingface/datasets

Add video feature

Open

#5225, opened on November 10, 2022

Labels: enhancement, help wanted, vision

Description

Feature request

Add a Video feature to the library so folks can include videos in their datasets.

Motivation

Being able to load Video data would be quite helpful. However, there are some challenges when it comes to videos:

  1. Videos, unlike images, can end up being extremely large files
  2. Often, when training video models, you need to do some very specific sampling. A video might need to be broken down into X clips that are then used for training/inference
  3. Videos have an additional audio stream, which must be accounted for
  4. The feature needs to be able to encode/decode videos (with the right video settings) from bytes.
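
To make point 2 a bit more concrete, here is a rough sketch of one common sampling scheme: picking X evenly spaced, fixed-length clips from a video by frame index. The function name `uniform_clip_spans` and its parameters are made up for illustration and are not part of `datasets` or any other library. It only computes frame ranges; actually decoding those frames is left to whatever video backend ends up being used.

```python
from typing import List, Tuple


def uniform_clip_spans(num_frames: int, clip_len: int, num_clips: int) -> List[Tuple[int, int]]:
    """Return `num_clips` (start, end) frame spans, each `clip_len` frames long.

    Spans are spread evenly across the video and `end` is exclusive.
    Hypothetical helper for illustration only.
    """
    if clip_len <= 0 or num_clips <= 0:
        raise ValueError("clip_len and num_clips must be positive")
    if num_frames < clip_len:
        raise ValueError("video is shorter than a single clip")

    # Latest frame index at which a full clip can still start.
    max_start = num_frames - clip_len

    if num_clips == 1:
        # A single clip is taken from the middle of the video.
        start = max_start // 2
        return [(start, start + clip_len)]

    # Evenly spaced start indices from 0 to max_start (inclusive).
    # Clips overlap when the video is too short to fit them side by side.
    starts = [i * max_start // (num_clips - 1) for i in range(num_clips)]
    return [(s, s + clip_len) for s in starts]


if __name__ == "__main__":
    # A 100-frame video split into 4 clips of 16 frames each.
    print(uniform_clip_spans(num_frames=100, clip_len=16, num_clips=4))
```

Random or temporally jittered sampling is also common in training; the point here is just that the feature would ideally let users decode only the frame ranges they ask for rather than the whole file.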

Your contribution

I did work on this a while back in this (now closed) PR. It used a library I made called encoded_video, which is basically the utils from pytorchvideo, but without the torch dep. It included the ability to read/write from bytes, as we need to do here. We don't want to be using a sketchy library that I made as a dependency in this repo, though.

Would love to use this issue as a place to:

  • brainstorm ideas on how to do this right
  • list ways/examples to work around it for now

CC @sayakpaul @mariosasko @fcakyon
