Labels: enhancement, help wanted, vision
Description
Feature request
Add a Video feature to the library so folks can include videos in their datasets.
Motivation
Being able to load video data would be quite helpful. However, videos come with some challenges:
- Videos, unlike images, can be extremely large files.
- Training video models often requires very specific sampling. A video might need to be broken down into X clips that are then used for training/inference.
- Videos often carry an audio stream as well, which must be accounted for.
- The feature needs to be able to encode/decode videos (with the right video settings) from bytes.
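To make the clip-sampling point concrete, here is a minimal sketch of one common strategy: spreading a fixed number of equal-length clips evenly across a video. The function name `uniform_clip_spans` and its parameters are hypothetical and not part of any existing API. It only computes time spans in seconds; the actual frame decoding would be left to whatever backend ends up being used.

```python
from typing import List, Tuple


def uniform_clip_spans(
    duration: float, clip_duration: float, num_clips: int
) -> List[Tuple[float, float]]:
    """Return `num_clips` (start, end) spans in seconds, spread evenly over the video.

    Hypothetical helper for illustration only; it does not decode any frames.
    """
    if num_clips < 1:
        raise ValueError("num_clips must be at least 1")
    if clip_duration <= 0 or clip_duration > duration:
        raise ValueError("clip_duration must be in (0, duration]")

    # Latest moment a clip can start and still fit inside the video.
    max_start = duration - clip_duration

    if num_clips == 1:
        # A single clip is centred in the video.
        starts = [max_start / 2]
    else:
        # First clip starts at 0, last clip ends exactly at `duration`.
        step = max_start / (num_clips - 1)
        starts = [i * step for i in range(num_clips)]

    return [(start, start + clip_duration) for start in starts]
```

For example, `uniform_clip_spans(10.0, 2.0, 5)` yields five back-to-back two-second spans covering the whole video. Random or overlapping sampling would be equally valid choices, and a real feature would probably need to let users pick.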
Your contribution
I worked on this a while back in this (now closed) PR. It used a library I made called encoded_video, which is basically the utils from pytorchvideo without the torch dependency. It included the ability to read/write from bytes, which is what we need here. That said, we don't want this repo to depend on a sketchy library that I made.
Would love to use this issue as a place to:
- brainstorm ideas on how to do this right
- list ways/examples to work around it for now
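As one possible starting point for the workaround list, here is a rough sketch that keeps each video as raw, undecoded bytes next to its file name, and writes the bytes back to a temporary file whenever a path-based decoder needs them. It uses only the standard library. The `{"path", "bytes"}` layout and both helper names are assumptions made for illustration, not an existing API of this library.

```python
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def encode_video_example(path: str) -> Dict[str, Any]:
    """Read a video file into a {"path", "bytes"} dict without decoding it.

    Hypothetical helper: the dict layout is an assumption for illustration.
    """
    return {"path": os.path.basename(path), "bytes": Path(path).read_bytes()}


def materialize_video(example: Dict[str, Any], directory: Optional[str] = None) -> str:
    """Write the stored bytes to a temporary file and return its path.

    The caller is responsible for deleting the file once decoding is done.
    """
    # Keep the original extension so decoders can guess the container format;
    # fall back to ".mp4" (an arbitrary default) when no name was stored.
    suffix = Path(example.get("path") or "").suffix or ".mp4"
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(example["bytes"])
    return tmp_path
```

This sidesteps decoding entirely, so it does nothing about file size or the audio stream, but it does let video files travel inside a dataset as plain binary columns until a proper feature exists.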
CC @sayakpaul @mariosasko @fcakyon