apache/beam

[Feature Request]: Use unique id for Python BigQueryIO

Open

Opened on Aug 15, 2022

Labels: P2, gcp, good first issue, io, new feature, python

Description

What would you like to happen?

A collision can occur when two pipelines launched from templates load data into BigQuery at the same time using the same `temp_location`.
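To illustrate why a shared ID matters, here is a minimal sketch, assuming the staging files for a load are written under a prefix derived from `temp_location` plus a per-job unique ID. The helper `load_job_prefix` and the `bq_load` directory layout are hypothetical and not taken from the Beam source.

```python
import posixpath
import uuid


def load_job_prefix(temp_location: str, unique_id: str) -> str:
    # Hypothetical helper: build the staging prefix a BigQuery load would write under.
    return posixpath.join(temp_location, "bq_load", unique_id)


temp_location = "gs://my-bucket/tmp"  # hypothetical bucket shared by both pipelines

# Two runs that reuse one ID (e.g. baked into a template) share the same prefix.
baked_in_id = uuid.uuid4().hex
run_a = load_job_prefix(temp_location, baked_in_id)
run_b = load_job_prefix(temp_location, baked_in_id)
print(run_a == run_b)  # True: staging files from both runs land in the same place

# With an ID generated per run, the prefixes no longer overlap.
run_c = load_job_prefix(temp_location, uuid.uuid4().hex)
run_d = load_job_prefix(temp_location, uuid.uuid4().hex)
print(run_c == run_d)  # False
```

The point of the sketch is only that the uniqueness of the prefix depends entirely on the uniqueness of the ID, not on `temp_location`.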

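@baeminbo found that the Python SDK already has code to generate a `unique_id` [1], but templates appear to reuse the same UUID across runs, presumably because it is generated at pipeline construction time and is therefore fixed in the template. Java fixed this [2] by moving the UUID generation into a `DoFn` executed by a `ParDo`, so the ID is generated at run time.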
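The difference between construction-time and run-time ID generation can be simulated without Beam. The sketch below is a stdlib-only model, not Beam code: the `Template` class and `launch_ids` helper are hypothetical stand-ins for a template that is built once and launched many times. In a Beam pipeline the run-time variant would presumably be a single-element `beam.Create` followed by a `ParDo` whose `DoFn` yields `uuid.uuid4().hex`, consumed downstream as a side input; that shape is an assumption modeled on the Java fix [2], not the Python SDK's current code.

```python
import uuid
from typing import Callable, List, Optional


class Template:
    """Hypothetical stand-in for a staged template: built once, launched many times."""

    def __init__(self, id_factory: Callable[[], str], generate_at_build_time: bool):
        self._id_factory = id_factory
        # Construction-time generation: the ID is computed once and frozen into the template.
        self._baked_id: Optional[str] = id_factory() if generate_at_build_time else None

    def launch(self) -> str:
        if self._baked_id is not None:
            # Every launch returns the same frozen ID (the reported bug).
            return self._baked_id
        # Run-time generation, analogous to computing the ID inside a DoFn:
        # each launched job gets a fresh value.
        return self._id_factory()


def launch_ids(template: Template, runs: int) -> List[str]:
    return [template.launch() for _ in range(runs)]


def make_id() -> str:
    return uuid.uuid4().hex


buggy = Template(make_id, generate_at_build_time=True)
fixed = Template(make_id, generate_at_build_time=False)

print(len(set(launch_ids(buggy, 3))))  # 1: every run reuses the same ID
print(len(set(launch_ids(fixed, 3))))  # 3: each run gets its own ID
```

Deferring the call to `id_factory` until `launch` is the whole fix in this model; where that call lives in the real pipeline graph is what the Java change addresses.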

[1] https://github.com/apache/beam/blob/v2.34.0/sdks/python/apache_beam/io/gcp/bigquery.py#L2399
[2] https://github.com/apache/beam/blob/v2.34.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1094-L1105

Issue Priority

Priority: 2

Issue Component

Component: io-py-gcp
