apache/beam

[Feature Request]: Use unique id for Python BigQueryIO

Open

#22733 opened on Aug 15, 2022

9 comments, 0 reactions, 1 assignee, Java, 7,313 stars, 4,097 forks, batch import
Labels: P2, gcp, good first issue, io, new feature, python

Description

What would you like to happen?

When two pipelines launched from templates load into BigQuery at the same time and use the same temp_location, they can collide.

@baeminbo found that the code already uses a unique_id [1], but it seems that pipelines launched from a template can reuse the same UUID. This was fixed for Java [2] by moving the UUID generation into the DoFn of a ParDo, so that the UUID is generated when the pipeline runs.

[1] https://github.com/apache/beam/blob/v2.34.0/sdks/python/apache_beam/io/gcp/bigquery.py#L2399
[2] https://github.com/apache/beam/blob/v2.34.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1094-L1105
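
To illustrate the difference described above, here is a minimal plain-Python simulation. It is not Beam code and not the actual BigQueryIO implementation: the class names, the `launch` helper, and the use of `copy.deepcopy` as a stand-in for launching a job from a staged template are all assumptions made for the sketch. It only shows why an ID generated at graph construction time is shared by every launch, while an ID generated inside the processing step is not.

```python
import copy
import uuid


class ConstructionTimeId:
    """Hypothetical step whose ID is fixed when the graph is built."""

    def __init__(self):
        # Evaluated once, at construction time. The value is stored on the
        # step and travels with it into every copy of the "template".
        self.unique_id = uuid.uuid4().hex

    def process(self):
        return self.unique_id


class RuntimeId:
    """Hypothetical step that defers ID generation to execution time."""

    def process(self):
        # Evaluated on every run, so each launch gets a fresh value.
        return uuid.uuid4().hex


def launch(template_step):
    """Stand-in for launching a job from a staged template (assumption)."""
    return copy.deepcopy(template_step).process()


# Build each "template" once, then launch it twice.
eager_template = ConstructionTimeId()
lazy_template = RuntimeId()

eager_ids = [launch(eager_template), launch(eager_template)]
lazy_ids = [launch(lazy_template), launch(lazy_template)]

print("construction-time IDs equal:", eager_ids[0] == eager_ids[1])
print("runtime IDs equal:", lazy_ids[0] == lazy_ids[1])
```

In this sketch both launches of the construction-time variant report the same ID, which mirrors the collision reported here. Moving the `uuid` call into `process` plays the role that moving it into a DoFn plays in the Java fix.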

Issue Priority

Priority: 2

Issue Component

Component: io-py-gcp
