apache/beam

[Feature Request]: Use unique id for Python BigQueryIO

Open

#22733 opened on August 15, 2022

9 comments, 0 reactions, 1 assignee | Java | 7,313 stars | 4,097 forks | batch import
P2, gcp, good first issue, io, new feature, python

Description

What would you like to happen?

A collision can occur when two template-based pipelines load into BigQuery at the same time using the same temp_location.

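@baeminbo found that there is code to use a unique_id [1], but it seems that templates can reuse the same UUID, presumably because the ID is generated when the pipeline is constructed and is therefore fixed in the template. This is fixed for Java [2] by moving the UUID generation into a DoFn of a ParDo, so that it happens at execution time.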

[1] https://github.com/apache/beam/blob/v2.34.0/sdks/python/apache_beam/io/gcp/bigquery.py#L2399
[2] https://github.com/apache/beam/blob/v2.34.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1094-L1105
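
To illustrate the suspected failure mode, here is a minimal plain-Python sketch. It does not use the Beam API: `ConstructionTimeIdStep`, `RunTimeIdStep`, `launch`, the `bq_load` path segment, and the bucket name are all hypothetical stand-ins. It assumes that a template freezes whatever was computed while the pipeline graph was built, and contrasts that with generating the ID when the step executes, which is roughly what moving the generation into a DoFn would achieve.

```python
import uuid


class ConstructionTimeIdStep:
    """Hypothetical step that picks its ID while the pipeline graph is built."""

    def __init__(self):
        # Evaluated once, when the "template" is created. Every job launched
        # from that template then sees the same value.
        self.load_id = uuid.uuid4().hex

    def run(self, temp_location):
        return f"{temp_location}/bq_load/{self.load_id}"


class RunTimeIdStep:
    """Hypothetical step that picks its ID when the job actually executes."""

    def run(self, temp_location):
        # Evaluated on every execution, so each launched job gets a fresh ID.
        load_id = uuid.uuid4().hex
        return f"{temp_location}/bq_load/{load_id}"


def launch(template_step, temp_location):
    """Stand-in for launching one job from an already-built template."""
    return template_step.run(temp_location)


temp_location = "gs://example-bucket/tmp"  # hypothetical bucket

frozen = ConstructionTimeIdStep()  # "template" built once
job_a = launch(frozen, temp_location)
job_b = launch(frozen, temp_location)
print(job_a == job_b)  # True: both jobs would write under the same path

fresh = RunTimeIdStep()
job_c = launch(fresh, temp_location)
job_d = launch(fresh, temp_location)
print(job_c == job_d)  # False: each launch gets its own path
```

In the first case two concurrent jobs sharing a temp_location would write to the same temporary path; in the second they would not. Whether the Python BigQueryIO code at [1] behaves exactly like the first case is the open question of this issue.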

Issue Priority

Priority: 2

Issue Component

Component: io-py-gcp
