open-guides/og-aws

S3: Discuss ways to list and tally objects efficiently

Open

#58 opened on Sep 2, 2016

Labels: help wanted, under discussion

Description

Topics:

  • Listing and pagination
  • Need for a multi-threaded crawl over S3 keys, for speed
    • Prefix-based listings, with separators
    • Hash-type prefixes with known alphabet, uniform distribution
  • Possibly: reassigning work between workers; using markers to optimize the crawl when the key alphabet is not known
  • Tallying usage with a map-reduce over keys, propagating each object's usage up through its parent folders
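The following is a minimal Python sketch of the multi-threaded crawl over hash-type prefixes combined with the folder tally. It is illustrative and rests on two assumptions. First, every key starts with characters from a known alphabet, such as hex hash prefixes. Second, `list_prefix` is a caller-supplied function that returns every `(key, size)` pair under one prefix, for example a paginated ListObjects loop bound to one bucket. `list_prefix`, `shard_prefixes`, `tally_shard` and `crawl_and_tally` are all hypothetical names. Each shard is one fixed-length prefix, so shards never overlap. Each thread tallies its own shard (the map step), and the per-shard counters are then merged (the reduce step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Iterable, List, Tuple

# Hypothetical: returns every (key, size_in_bytes) under one prefix,
# e.g. a paginated ListObjects loop bound to a specific bucket.
ListPrefix = Callable[[str], Iterable[Tuple[str, int]]]


def shard_prefixes(alphabet: str, depth: int) -> List[str]:
    """All prefixes of length `depth` over a known key alphabet (disjoint shards)."""
    return ["".join(p) for p in product(alphabet, repeat=depth)]


def tally_shard(list_prefix: ListPrefix, prefix: str) -> Tuple[Counter, Counter]:
    """Map step: credit each object's size and count to every ancestor folder."""
    sizes, counts = Counter(), Counter()
    for key, size in list_prefix(prefix):
        folder = ""  # "" stands for the bucket root
        sizes[folder] += size
        counts[folder] += 1
        # Walk the folder components, leaving out the final object name.
        for part in key.split("/")[:-1]:
            folder += part + "/"
            sizes[folder] += size
            counts[folder] += 1
    return sizes, counts


def crawl_and_tally(list_prefix: ListPrefix, alphabet: str, depth: int = 1,
                    workers: int = 16) -> Tuple[Counter, Counter]:
    """List each shard in a worker thread, then reduce the per-shard tallies.

    Assumes every key starts with `depth` characters drawn from `alphabet`;
    keys outside that space are never visited.
    """
    total_sizes, total_counts = Counter(), Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: tally_shard(list_prefix, p),
                           shard_prefixes(alphabet, depth))
        for sizes, counts in results:  # reduce step: Counter.update adds values
            total_sizes.update(sizes)
            total_counts.update(counts)
    return total_sizes, total_counts
```

Threads are a reasonable fit here because each shard spends most of its time waiting on network round trips. With hex-hashed keys, `depth=2` gives 256 shards, which leaves room to spread uneven shards across the pool.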

API reference (GET Bucket, i.e. List Objects): https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
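The following is a minimal sketch of the pagination loop for this (version 1) List Objects API. It assumes `list_page` accepts the same keyword arguments as boto3's `list_objects` (`Prefix`, `Delimiter`, `Marker`, `MaxKeys`) and returns the same response dict. `iter_listing` is a hypothetical helper name. While `IsTruncated` is true, the next request passes a marker. That marker is `NextMarker` when the response includes it, which S3 does only when a delimiter is set. Otherwise it is the last key of the page. When a delimiter is given, keys below it are rolled up into `CommonPrefixes`, which is how a prefix-based listing walks one "folder" level at a time.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

# A page-fetching callable taking boto3 list_objects-style keyword arguments;
# the bucket is assumed to be bound already (e.g. with functools.partial).
ListPage = Callable[..., Dict[str, Any]]


def iter_listing(list_page: ListPage, prefix: str = "",
                 delimiter: Optional[str] = None,
                 max_keys: int = 1000) -> Iterator[Tuple[str, Any]]:
    """Yield ("key", object_dict) and ("prefix", name) items across all pages."""
    marker: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Prefix": prefix, "MaxKeys": max_keys}
        if delimiter:
            kwargs["Delimiter"] = delimiter
        if marker:
            kwargs["Marker"] = marker
        page = list_page(**kwargs)
        contents = page.get("Contents", [])  # absent when the page has no keys
        for obj in contents:
            yield "key", obj
        for cp in page.get("CommonPrefixes", []):
            yield "prefix", cp["Prefix"]
        if not page.get("IsTruncated"):
            return
        # NextMarker is only returned when a delimiter is set; otherwise the
        # last key of this page is the marker for the next request.
        marker = page.get("NextMarker") or (contents[-1]["Key"] if contents else None)
        if not marker:
            return  # defensive: avoid looping forever on an unexpected response
```

With boto3 this might be called as `iter_listing(functools.partial(s3.list_objects, Bucket="my-bucket"), delimiter="/")`, where `my-bucket` is a placeholder. boto3's paginators (`s3.get_paginator(...)`) wrap the same loop.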
