apple/turicreate

Inefficient implementation of checking whether a path is an S3 directory

Open

#3049 opened on March 14, 2020

Labels: batch import, S3, good first issue


Description

Our current implementation of S3 is_directory exhausts all keys under a given URL (a.k.a. prefix, "folder", or "group") and compares the URL with each returned item, namely every Contents[Key] and CommonPrefixes entry. This is bad practice when the URL contains many keys, because the comparison is linear, O(n).

Moreover, is_directory is used widely across our codebase, so beyond the time complexity, the network latency of listing every key is noticeable.
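To make the cost of the exhaustive check concrete, here is a minimal, self-contained C++ sketch. It does not use the AWS SDK or turicreate's list_objects_impl: FakeS3, ListPage, and is_directory_exhaustive are hypothetical stand-ins, and the in-memory listing only approximates S3 listing semantics (lexicographic key order, a delimiter that rolls keys up into common prefixes, and at most max_keys entries per response; a real S3 list response carries at most 1,000 keys). The requests counter shows that the number of round trips grows with the number of entries matching the prefix, on top of the linear scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one S3 list response.
struct ListPage {
  std::vector<std::string> contents;         // object keys (Contents[].Key)
  std::vector<std::string> common_prefixes;  // keys rolled up at the delimiter
  std::string next_start;                    // empty when the listing is complete
};

// Hypothetical in-memory bucket mimicking the parts of S3 listing we need.
class FakeS3 {
 public:
  explicit FakeS3(std::set<std::string> keys) : keys_(std::move(keys)) {}
  int requests = 0;  // number of simulated round trips

  ListPage list(const std::string& prefix, char delim, std::size_t max_keys,
                const std::string& start) {
    assert(max_keys > 0);
    ++requests;
    ListPage page;
    std::size_t n = 0;
    auto it = keys_.lower_bound(start.empty() ? prefix : start);
    for (; it != keys_.end() && it->compare(0, prefix.size(), prefix) == 0; ++it) {
      std::size_t pos = it->find(delim, prefix.size());
      std::string cp = pos == std::string::npos ? "" : it->substr(0, pos + 1);
      // Keys under an already-emitted common prefix are contiguous; skip them.
      if (!cp.empty() && !page.common_prefixes.empty() &&
          page.common_prefixes.back() == cp) continue;
      if (n == max_keys) { page.next_start = *it; return page; }  // truncated
      if (cp.empty()) page.contents.push_back(*it);
      else page.common_prefixes.push_back(cp);
      ++n;
    }
    return page;
  }

 private:
  std::set<std::string> keys_;
};

// Exhaustive check: fetch every page under `path`, then scan linearly.
bool is_directory_exhaustive(FakeS3& s3, const std::string& path) {
  std::vector<std::string> items;
  std::string start;
  do {
    ListPage page = s3.list(path, '/', 1000, start);
    items.insert(items.end(), page.contents.begin(), page.contents.end());
    items.insert(items.end(), page.common_prefixes.begin(), page.common_prefixes.end());
    start = page.next_start;
  } while (!start.empty());
  // O(n) comparison against every returned Contents key and CommonPrefix.
  return std::find(items.begin(), items.end(), path + "/") != items.end();
}
```

For example, with a directory data/img/ next to 2,500 sibling objects named data/img_<i>.jpg, the listing under the prefix data/img yields 2,501 entries, so this sketch needs three list requests before it can answer.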

A better solution would be:

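  • Send a HEAD request on the URL, assuming it is an object. If it succeeds, the URL is a file, not a directory; if it returns 404, then
  • list objects with a delimiter (--delimiter) and --max-items=1, and check the CommonPrefixes section. list_objects_impl could take an extra parameter that sets the maximum number of keys to return, so that it stops after the first entry instead of exhausting the listing.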
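The two-step check can be sketched against a similar in-memory stand-in. Again, FakeS3, ListPage, and is_directory_fast are hypothetical names, not turicreate or AWS SDK APIs: head() models a HEAD request that succeeds only when an object with exactly that key exists, and the max_keys argument of list() plays the role of the extra parameter suggested for list_objects_impl. One detail is an assumption of this sketch: step 2 lists under path + "/" rather than path. With a limit of one entry, listing under path itself could return a sibling such as data/img.txt, which sorts before data/img/ because '.' precedes '/' in byte order, and the check would then wrongly report that data/img is not a directory.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one S3 list response (pagination omitted).
struct ListPage {
  std::vector<std::string> contents;         // object keys (Contents[].Key)
  std::vector<std::string> common_prefixes;  // keys rolled up at the delimiter
};

// Hypothetical in-memory bucket; counts simulated round trips.
class FakeS3 {
 public:
  explicit FakeS3(std::set<std::string> keys) : keys_(std::move(keys)) {}
  int requests = 0;

  // Models HEAD: succeeds only if an object with exactly this key exists.
  bool head(const std::string& key) {
    ++requests;
    return keys_.count(key) > 0;
  }

  // Models a list request that returns at most max_keys entries.
  ListPage list(const std::string& prefix, char delim, std::size_t max_keys) {
    assert(max_keys > 0);
    ++requests;
    ListPage page;
    std::size_t n = 0;
    for (auto it = keys_.lower_bound(prefix);
         it != keys_.end() && it->compare(0, prefix.size(), prefix) == 0; ++it) {
      std::size_t pos = it->find(delim, prefix.size());
      std::string cp = pos == std::string::npos ? "" : it->substr(0, pos + 1);
      // Keys under an already-emitted common prefix collapse into it.
      if (!cp.empty() && !page.common_prefixes.empty() &&
          page.common_prefixes.back() == cp) continue;
      if (n == max_keys) break;  // a real response would be marked truncated
      if (cp.empty()) page.contents.push_back(*it);
      else page.common_prefixes.push_back(cp);
      ++n;
    }
    return page;
  }

 private:
  std::set<std::string> keys_;
};

// At most two requests, independent of how many keys share the prefix.
bool is_directory_fast(FakeS3& s3, const std::string& path) {
  // Step 1: HEAD the URL as if it were an object; a hit means it is a file.
  if (s3.head(path)) return false;
  // Step 2 (only on a miss, i.e. a 404): one list request capped at one entry.
  // Any object or common prefix under path + "/" means path is a directory.
  ListPage page = s3.list(path + "/", '/', 1);
  return !page.contents.empty() || !page.common_prefixes.empty();
}
```

In this sketch a file costs one round trip, and a directory or a missing path costs exactly two, no matter how many keys live under the prefix. A zero-byte "folder marker" object such as data/empty/ also counts as a directory, since it shows up in Contents.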
