apache/seatunnel

Optimize Sink Connection Creation When Writing Multiple Sharded MySQL Tables to a Single Paimon Table (Stuck on 200+ Sinks)

Open

Aperta il 29 mag 2026

Vedi su GitHub
 (1 commento) (0 reazioni) (0 assegnatari)Java (6897 star) (1432 fork)batch import
featurehelp wanted

Descrizione

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

The data source is a sharded MySQL database, and the target database is Paimon. When I write more than 200 sharded tables into a single Paimon table, more than 200 sink connections will be created, that is, the number of sinks is equal to the number of source tables. Currently, it gets stuck when creating sinks. How can I optimize this?

SeaTunnel Version

2.3.13

SeaTunnel Config

env {
        parallelism = 4
	job.mode = "BATCH"
        checkpoint.interval = 180000
        checkpoint.timeout = 600000  
}
source {
  Jdbc {
    url = "jdbc:mysql://${hostname}:${port}/"
    user = "${username}"
    password = "${password}"
    driver = "com.mysql.cj.jdbc.Driver"
    use_regex = true
    table_path = "database_\\{2}.t_report_table_{2}_\\{6}"
    fetch_size = 1000
    plugin_output = "source"
  }
}
transform {
  Sql {
    plugin_input = "source"
    plugin_output = "sql_out"
    query = """
      SELECT 
        *,
        substring(city_code,0,2) AS province_no,
        TO_DATE(statistical_date,yyyyMM) AS record_year_month
      FROM source
    """
  }
}

sink {
  Paimon {
    plugin_input = "sql_out"
    catalog_type = "hive"
    catalog_uri = "${catalog_uri}"
    warehouse = "${warehouse}"
    database = "ods_paper"
    table = "ods_report_table"
    paimon.hadoop.conf-path = "${hadoop_conf_path}"
    data_save_mode = "DROP_DATA"
    paimon.table.primary-keys = "id,province_no,record_year_month"
    paimon.table.partition-keys = "province_no"
    multi_table_sink_replica = 1
    paimon.table.write-props = {
        bucket = 4
        bucket-key = id
        sink.parallelism = 4
        changelog-producer = "none"
    }
  }
}

Running Command

env {
        parallelism = 4
	job.mode = "BATCH"
        checkpoint.interval = 180000
        checkpoint.timeout = 600000  
}
source {
  Jdbc {
    url = "jdbc:mysql://${hostname}:${port}/"
    user = "${username}"
    password = "${password}"
    driver = "com.mysql.cj.jdbc.Driver"
    use_regex = true
    table_path = "database_\\{2}.t_report_table_{2}_\\{6}"
    fetch_size = 1000
    plugin_output = "source"
  }
}
transform {
  Sql {
    plugin_input = "source"
    plugin_output = "sql_out"
    query = """
      SELECT 
        *,
        substring(city_code,0,2) AS province_no,
        TO_DATE(statistical_date,yyyyMM) AS record_year_month
      FROM source
    """
  }
}

sink {
  Paimon {
    plugin_input = "sql_out"
    catalog_type = "hive"
    catalog_uri = "${catalog_uri}"
    warehouse = "${warehouse}"
    database = "ods_paper"
    table = "ods_report_table"
    paimon.hadoop.conf-path = "${hadoop_conf_path}"
    data_save_mode = "DROP_DATA"
    paimon.table.primary-keys = "id,province_no,record_year_month"
    paimon.table.partition-keys = "province_no"
    multi_table_sink_replica = 1
    paimon.table.write-props = {
        bucket = 4
        bucket-key = id
        sink.parallelism = 4
        changelog-producer = "none"
    }
  }
}

Error Exception

[INFO] 2026-05-28 10:55:14.795 +0800 -  -> 
	2026-05-28 10:55:13,818 INFO  [o.a.s.a.t.f.FactoryUtil       ] [main] - Create sink 'Paimon' with upstream input catalog-table[database: database_43, schema: null, table: t_report_table_43_202605]
	2026-05-28 10:55:13,826 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:13,834 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:13,834 INFO  [o.a.p.h.HiveCatalog           ] [main] - Setting hive conf dir as null
	2026-05-28 10:55:13,915 INFO  [.p.f.h.HadoopSecuredFileSystem] [main] - Hadoop security configuration is legal, use the secured FileSystem.
	2026-05-28 10:55:13,995 INFO  [o.a.p.s.HadoopModule          ] [main] - Hadoop user set to root (auth:SIMPLE)
	2026-05-28 10:55:13,995 INFO  [o.a.p.s.HadoopModule          ] [main] - Kerberos security is disabled.
	2026-05-28 10:55:14,015 INFO  [o.a.s.a.t.f.FactoryUtil       ] [main] - Create sink 'Paimon' with upstream input catalog-table[database: database_43, schema: null, table: t_report_table_43_202606]
	2026-05-28 10:55:14,023 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,031 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,032 INFO  [o.a.p.h.HiveCatalog           ] [main] - Setting hive conf dir as null
	2026-05-28 10:55:14,196 INFO  [.p.f.h.HadoopSecuredFileSystem] [main] - Hadoop security configuration is legal, use the secured FileSystem.
	2026-05-28 10:55:14,196 INFO  [o.a.p.s.HadoopModule          ] [main] - Hadoop user set to root (auth:SIMPLE)
	2026-05-28 10:55:14,196 INFO  [o.a.p.s.HadoopModule          ] [main] - Kerberos security is disabled.
	2026-05-28 10:55:14,217 INFO  [o.a.s.a.t.f.FactoryUtil       ] [main] - Create sink 'Paimon' with upstream input catalog-table[database: database_43, schema: null, table: t_report_table_43_202607]
	2026-05-28 10:55:14,225 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,232 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,233 INFO  [o.a.p.h.HiveCatalog           ] [main] - Setting hive conf dir as null
	2026-05-28 10:55:14,313 INFO  [.p.f.h.HadoopSecuredFileSystem] [main] - Hadoop security configuration is legal, use the secured FileSystem.
	2026-05-28 10:55:14,313 INFO  [o.a.p.s.HadoopModule          ] [main] - Hadoop user set to root (auth:SIMPLE)
	2026-05-28 10:55:14,313 INFO  [o.a.p.s.HadoopModule          ] [main] - Kerberos security is disabled.
	2026-05-28 10:55:14,333 INFO  [o.a.s.a.t.f.FactoryUtil       ] [main] - Create sink 'Paimon' with upstream input catalog-table[database: database_43, schema: null, table: t_report_table_43_202608]
	2026-05-28 10:55:14,402 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,410 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,410 INFO  [o.a.p.h.HiveCatalog           ] [main] - Setting hive conf dir as null
	2026-05-28 10:55:14,500 INFO  [.p.f.h.HadoopSecuredFileSystem] [main] - Hadoop security configuration is legal, use the secured FileSystem.
	2026-05-28 10:55:14,500 INFO  [o.a.p.s.HadoopModule          ] [main] - Hadoop user set to root (auth:SIMPLE)
	2026-05-28 10:55:14,501 INFO  [o.a.p.s.HadoopModule          ] [main] - Kerberos security is disabled.
	2026-05-28 10:55:14,520 INFO  [o.a.s.a.t.f.FactoryUtil       ] [main] - Create sink 'Paimon' with upstream input catalog-table[database: database_43, schema: null, table: t_report_table_43_202609]
	2026-05-28 10:55:14,527 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,535 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,536 INFO  [o.a.p.h.HiveCatalog           ] [main] - Setting hive conf dir as null
	2026-05-28 10:55:14,606 INFO  [.p.f.h.HadoopSecuredFileSystem] [main] - Hadoop security configuration is legal, use the secured FileSystem.
	2026-05-28 10:55:14,607 INFO  [o.a.p.s.HadoopModule          ] [main] - Hadoop user set to root (auth:SIMPLE)
	2026-05-28 10:55:14,607 INFO  [o.a.p.s.HadoopModule          ] [main] - Kerberos security is disabled.
	2026-05-28 10:55:14,695 INFO  [o.a.s.a.t.f.FactoryUtil       ] [main] - Create sink 'Paimon' with upstream input catalog-table[database: database_43, schema: null, table: t_report_table_43_202610]
	2026-05-28 10:55:14,703 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,711 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,712 INFO  [o.a.p.h.HiveCatalog           ] [main] - Setting hive conf dir as null
[INFO] 2026-05-28 10:55:15.801 +0800 -  -> 
	2026-05-28 10:55:14,803 INFO  [.p.f.h.HadoopSecuredFileSystem] [main] - Hadoop security configuration is legal, use the secured FileSystem.
	2026-05-28 10:55:14,803 INFO  [o.a.p.s.HadoopModule          ] [main] - Hadoop user set to root (auth:SIMPLE)
	2026-05-28 10:55:14,803 INFO  [o.a.p.s.HadoopModule          ] [main] - Kerberos security is disabled.
	2026-05-28 10:55:14,822 INFO  [o.a.s.a.t.f.FactoryUtil       ] [main] - Create sink 'Paimon' with upstream input catalog-table[database: database_43, schema: null, table: t_report_table_43_202611]
	2026-05-28 10:55:14,831 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,899 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:14,900 INFO  [o.a.p.h.HiveCatalog           ] [main] - Setting hive conf dir as null
	2026-05-28 10:55:14,920 INFO  [.p.f.h.HadoopSecuredFileSystem] [main] - Hadoop security configuration is legal, use the secured FileSystem.
	2026-05-28 10:55:14,920 INFO  [o.a.p.s.HadoopModule          ] [main] - Hadoop user set to root (auth:SIMPLE)
	2026-05-28 10:55:14,920 INFO  [o.a.p.s.HadoopModule          ] [main] - Kerberos security is disabled.
	2026-05-28 10:55:15,010 INFO  [o.a.s.a.t.f.FactoryUtil       ] [main] - Create sink 'Paimon' with upstream input catalog-table[database: database_43, schema: null, table: t_report_table_43_202612]
	2026-05-28 10:55:15,099 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:15,107 INFO  [.c.s.p.s.PaimonSecurityContext] [main] - Hadoop config initialized: org.apache.seatunnel.connectors.seatunnel.paimon.config.PaimonHadoopConfiguration
	2026-05-28 10:55:15,107 INFO  [o.a.p.h.HiveCatalog           ] [main] - Setting hive conf dir as null
	2026-05-28 10:55:15,199 INFO  [.p.f.h.HadoopSecuredFileSystem] [main] - Hadoop security configuration is legal, use the secured FileSystem.
	2026-05-28 10:55:15,199 INFO  [o.a.p.s.HadoopModule          ] [main] - Hadoop user set to root (auth:SIMPLE)
	2026-05-28 10:55:15,199 INFO  [o.a.p.s.HadoopModule          ] [main] - Kerberos security is disabled.

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Guida contributor