apache/beam

[Feature Request]: Use unique id for Python BigQueryIO

Open

#22733 opened on August 15, 2022

9 comments · 0 reactions · 1 assignee · Java · 7,313 stars · 4,097 forks · batch import
Labels: P2, gcp, good first issue, io, new feature, python

Description

What would you like to happen?

A collision can occur when two pipelines launched from templates load into BigQuery at the same time using the same temp_location.

@baeminbo found that there is code to use a unique_id [1], but it seems that templates can reuse the same UUID. This was fixed for Java [2] by moving the UUID generation into a DoFn of a ParDo.

[1] https://github.com/apache/beam/blob/v2.34.0/sdks/python/apache_beam/io/gcp/bigquery.py#L2399
[2] https://github.com/apache/beam/blob/v2.34.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1094-L1105
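
To illustrate the suspected failure mode, here is a minimal sketch in plain Python (no Beam dependency). It assumes the collision happens because the ID is generated once when the pipeline graph is constructed and is then frozen into the template, so every launch of that template sees the same value. The function names, the bucket, and the `bq_load` path segment are hypothetical and are not taken from the Beam codebase; the second builder only mimics the idea of the Java fix, where the ID is generated inside the step that executes at run time.

```python
import uuid


def build_template_with_construction_time_id():
    """Simulates a template whose ID is created once, when the graph is built."""
    # Evaluated at construction time and captured by the closure below,
    # so every run of this "template" reuses the same value.
    load_id = uuid.uuid4().hex

    def run(temp_location):
        return f"{temp_location}/bq_load/{load_id}"

    return run


def build_template_with_run_time_id():
    """Simulates the fix: the ID is created inside the step executed per run."""

    def run(temp_location):
        # Evaluated on every execution, analogous to generating the UUID
        # inside a DoFn instead of while building the pipeline.
        load_id = uuid.uuid4().hex
        return f"{temp_location}/bq_load/{load_id}"

    return run


temp_location = "gs://example-bucket/tmp"  # hypothetical bucket

frozen = build_template_with_construction_time_id()
fresh = build_template_with_run_time_id()

# Two launches of the same template against the same temp_location.
frozen_paths = {frozen(temp_location), frozen(temp_location)}
fresh_paths = {fresh(temp_location), fresh(temp_location)}

print(len(frozen_paths))  # 1: both launches write to the same path
print(len(fresh_paths))   # 2: each launch gets its own path
```

In a real pipeline the run-time variant would live in a `DoFn`'s processing method rather than in a closure; the sketch only shows why the point at which the UUID is evaluated matters.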

Issue Priority

Priority: 2

Issue Component

Component: io-py-gcp
