apache/beam

GCSFileSystem requires gcp extra at lookup time while S3FileSystem does not

Open

#37445 opened on Jan 29, 2026

good first issue

Description

FileSystems.get_filesystem() handles missing optional dependencies inconsistently for GCS and S3 paths.

Current Behavior

S3 (without aws extra):

>>> from apache_beam.io import filesystems
>>> filesystems.FileSystems.get_filesystem("s3://blah")
<apache_beam.io.aws.s3filesystem.S3FileSystem at 0x11a0af750>

Returns the filesystem object; validation happens later when the filesystem is actually used.

GCS (without gcp extra):

>>> from apache_beam.io import filesystems
>>> filesystems.FileSystems.get_filesystem("gs://blah")
ValueError: Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp]. Path specified: gs://blah

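Raises immediately: without the gcp extra, the GCSFileSystem module fails to import, so GCSFileSystem is never registered as a FileSystem subclass and get_filesystem() finds no filesystem for the scheme.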
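The lookup failure can be modelled with a small, self-contained sketch. It assumes a simplified version of the mechanism: filesystems are found by walking the subclasses of a base class and matching on `scheme()`, and the GCS class is only defined if a guarded import succeeds. The classes below are stand-ins, not Beam's real ones, and `hypothetical_gcp_client` is an invented module name playing the role of the gcp extra's dependencies. The scheme is `gs`, matching Beam's real GCSFileSystem.

```python
import re


class FileSystem:
    """Hypothetical stand-in for Beam's FileSystem base class."""

    @classmethod
    def scheme(cls):
        raise NotImplementedError

    @classmethod
    def get_all_subclasses(cls):
        # Recursively collect every subclass that has been defined (i.e. imported).
        result = []
        for sub in cls.__subclasses__():
            result.append(sub)
            result.extend(sub.get_all_subclasses())
        return result


class S3FileSystem(FileSystem):
    # Defined unconditionally: its module imports fine without the optional SDK.
    @classmethod
    def scheme(cls):
        return "s3"


try:
    # Hypothetical module standing in for the gcp extra's dependencies.
    import hypothetical_gcp_client  # noqa: F401

    class GCSFileSystem(FileSystem):
        @classmethod
        def scheme(cls):
            return "gs"
except ImportError:
    # The import failed, so GCSFileSystem is never defined and never registered.
    pass


def get_filesystem(path):
    """Simplified lookup: match the path's scheme against registered subclasses."""
    match = re.match(r"(?P<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://", path)
    scheme = match.group("scheme") if match else None
    systems = [fs for fs in FileSystem.get_all_subclasses() if fs.scheme() == scheme]
    if not systems:
        raise ValueError("Unable to get filesystem from specified path: %s" % path)
    return systems[0]()
```

Here `get_filesystem("s3://blah")` returns an `S3FileSystem` instance, while `get_filesystem("gs://blah")` raises `ValueError` because no registered subclass reports the `gs` scheme.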

Proposed Behavior

Both should behave consistently: get_filesystem() should return a GCSFileSystem just as it returns an S3FileSystem, letting callers validate dependencies when the filesystem is actually used rather than at lookup time.
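One way to get the proposed behavior is to register the GCS class unconditionally and defer the dependency check to the first operation that needs it. The sketch below is a hypothetical illustration of that pattern, not Beam's implementation: the class layout, the `open()` method body, and the `hypothetical_gcp_client` module name are all assumptions. It uses `importlib.util.find_spec` to test for the dependency without importing it.

```python
import importlib.util


class FileSystem:
    """Hypothetical stand-in for Beam's FileSystem base class."""

    @classmethod
    def scheme(cls):
        raise NotImplementedError


class GCSFileSystem(FileSystem):
    """Hypothetical GCS filesystem that defers its dependency check to first use."""

    # Hypothetical module name standing in for the gcp extra's client library.
    _REQUIRED_MODULE = "hypothetical_gcp_client"

    @classmethod
    def scheme(cls):
        return "gs"

    def _check_dependencies(self):
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(self._REQUIRED_MODULE) is None:
            raise ImportError(
                "GCSFileSystem requires the gcp extra: pip install apache-beam[gcp]")

    def open(self, path):
        # Validation happens here, at usage time, not when the class is looked up.
        self._check_dependencies()
        raise NotImplementedError("real I/O is out of scope for this sketch")


def get_filesystem(path):
    """Simplified lookup: the class is always registered, so this succeeds."""
    scheme = path.split("://", 1)[0] if "://" in path else None
    for fs in FileSystem.__subclasses__():
        if fs.scheme() == scheme:
            return fs()
    raise ValueError("Unable to get filesystem from specified path: %s" % path)
```

With this layout, `get_filesystem("gs://blah")` returns a `GCSFileSystem`, and only a later call such as `open()` raises `ImportError` naming the missing extra, which mirrors how the S3 lookup behaves in the report.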

Why This Matters

  • Inconsistent API behavior is confusing
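  • Code that handles multiple filesystem types can't handle GCS gracefully: the lookup itself raises a generic ValueError, the same error as for a malformed path, instead of returning a filesystem that reports the missing dependency when used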
  • Dependency validation at usage time (not lookup time) allows for better error handling and lazy loading patterns
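To illustrate the error-handling point, here is a hypothetical caller that treats every scheme the same way once lookup no longer fails on missing extras. Everything in it is invented for the example: `FakeFileSystem`, `fake_get_filesystem`, `check_dependencies()`, and the `INSTALLED_EXTRAS` environment (aws installed, gcp not) are assumptions, not Beam APIs.

```python
# Hypothetical environment for the sketch: the aws extra is installed, gcp is not.
INSTALLED_EXTRAS = {"aws"}


class FakeFileSystem:
    """Hypothetical filesystem whose dependency check runs only when used."""

    def __init__(self, extra):
        self.extra = extra

    def check_dependencies(self):
        if self.extra not in INSTALLED_EXTRAS:
            raise ImportError("pip install apache-beam[%s]" % self.extra)


def fake_get_filesystem(path):
    """Hypothetical lookup with usage-time validation: every known scheme resolves."""
    extras = {"s3": "aws", "gs": "gcp"}
    scheme = path.split("://", 1)[0]
    if scheme not in extras:
        # Only a genuinely unknown scheme fails at lookup time.
        raise ValueError("Unknown scheme in path: %s" % path)
    return FakeFileSystem(extras[scheme])


def usable_paths(paths, get_filesystem=fake_get_filesystem):
    """Split paths into usable ones and ones missing an optional dependency."""
    usable, missing = [], {}
    for path in paths:
        fs = get_filesystem(path)  # never fails because of a missing extra
        try:
            fs.check_dependencies()
            usable.append(path)
        except ImportError as e:
            missing[path] = str(e)
    return usable, missing
```

Because a missing dependency surfaces as `ImportError` at use time, the caller can report it per path, while `ValueError` stays reserved for paths whose scheme is unknown.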

Environment

  • Apache Beam version: 2.70.0
  • Python version: 3.11

Generated by Claude Code, confirmed by @hjtran
