[Bug] [Connector-V2] [TiDB-CDC] resolvedTs keeps advancing but row change events are silently missed · apache/seatunnel#11013


Labels: bug, help wanted

Description

This looks related to #8815, but this case happens on SeaTunnel 2.3.13 with Flink engine and the normal TiDB-CDC connector, not TiDB-CDC-MIGRATE.

Compared with #8815, this case has additional evidence that TiDBSourceReader / CDCClient kept advancing resolvedTs and checkpoints kept succeeding, while row events were not emitted downstream.

Search before asking

I have searched the existing issues and found no identical issue (the closest is #8815, mentioned above).

What happened

A TiDB-CDC job stayed RUNNING and checkpoints kept succeeding, but the target table stopped receiving new row changes.

The source table continued to receive inserts/updates after the cutoff time, while the target table remained stale.

In this case:

- Target table stopped at max(updated_at) = 2026-06-03 18:59:55.
- Source table continued changing until at least max(updated_at) = 2026-06-05 11:27:36.
- There were 1183 source rows changed after the cutoff time.
- The Flink job remained RUNNING.
- Checkpoints kept completing successfully.
- No sink errors or backpressure were observed.
- Flink metrics stopped increasing:
  - Source numRecordsOut = 91746
  - Sink numRecordsIn = 91746
  - SourceReceivedQPS = 0.0
  - SinkWriteQPS = 0.0
  - Sink error count = 0
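
The symptom above (resolvedTs moving forward while numRecordsOut stays flat) can be turned into a simple external check. The following is a minimal sketch, not part of SeaTunnel: `SourceSample` and `SilentStallDetector` are hypothetical names, and the samples are assumed to be collected by whatever monitoring already scrapes the TaskManager logs and Flink metrics.

```java
import java.util.List;

/** Hypothetical snapshot of the two signals, taken at one point in time. */
final class SourceSample {
    final long resolvedTs;    // latest resolvedTs seen in the TaskManager log
    final long numRecordsOut; // Flink source numRecordsOut metric

    SourceSample(long resolvedTs, long numRecordsOut) {
        this.resolvedTs = resolvedTs;
        this.numRecordsOut = numRecordsOut;
    }
}

final class SilentStallDetector {
    /**
     * Returns true if, across the last {@code window} samples, resolvedTs
     * increased while numRecordsOut did not change at all.
     */
    static boolean isSuspicious(List<SourceSample> samples, int window) {
        if (window < 2 || samples.size() < window) {
            return false; // not enough data to compare
        }
        SourceSample first = samples.get(samples.size() - window);
        SourceSample last = samples.get(samples.size() - 1);
        boolean tsAdvanced = last.resolvedTs > first.resolvedTs;
        boolean recordsFlat = last.numRecordsOut == first.numRecordsOut;
        return tsAdvanced && recordsFlat;
    }
}
```

A positive result is only a hint: an idle source table produces the same pattern. It becomes evidence of missed events only when combined with a source-side check, such as the 1183 rows with updated_at later than the target's maximum.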

TaskManager logs show that TiDBSourceReader and CDCClient continued advancing resolvedTs after the target stopped receiving rows:

TiDBSourceReader - Capture streaming event from resolvedTs:...
CDCClient - handle resolvedTs: ..., regionId: ...
TiDBSourceReader - Capture streaming event next resolvedTs:...
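
To compare the logged resolvedTs values with the 2026-06-03 18:59:55 cutoff, it helps to convert them to wall-clock time. A TiDB TSO packs a physical timestamp in milliseconds into the high bits and an 18-bit logical counter into the low bits. The helper below is a small sketch of that decoding; `TsoDecoder` is a hypothetical name, not a SeaTunnel or TiKV client class.

```java
import java.time.Instant;

/** Hypothetical helper for decoding TiDB TSO values such as resolvedTs. */
final class TsoDecoder {
    // The low 18 bits of a TSO hold the logical counter.
    private static final int LOGICAL_BITS = 18;
    private static final long LOGICAL_MASK = (1L << LOGICAL_BITS) - 1;

    /** Physical part of the TSO: milliseconds since the Unix epoch. */
    static long physicalMillis(long tso) {
        return tso >>> LOGICAL_BITS;
    }

    /** Logical counter within the same physical millisecond. */
    static long logical(long tso) {
        return tso & LOGICAL_MASK;
    }

    /** Wall-clock instant (UTC) corresponding to the TSO. */
    static Instant toInstant(long tso) {
        return Instant.ofEpochMilli(physicalMillis(tso));
    }

    /** Builds a TSO from its parts; useful for tests and range queries. */
    static long compose(long physicalMillis, long logical) {
        return (physicalMillis << LOGICAL_BITS) | (logical & LOGICAL_MASK);
    }
}
```

Calling `TsoDecoder.toInstant(resolvedTs)` on a value copied from the log gives a UTC instant. The updated_at values in this report are quoted without a time zone, so the session time zone has to be taken into account before comparing the two.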

SeaTunnel Version

2.3.13

SeaTunnel Config

env {
  parallelism = 1
  job.mode = "STREAMING"
  job.name = "dsp-17-v2-wzb_test_deposit_applications"
  checkpoint.interval = 300000
  checkpoint.timeout = 600000
  checkpoint.mode = "EXACTLY_ONCE"
  restart-strategy = "failure-rate"
  restart-strategy.failure-rate.max-failures-per-interval = 10
  restart-strategy.failure-rate.failure-rate-interval = "300 s"
  restart-strategy.failure-rate.delay = "10 s"
}

source {
  TiDB-CDC {
    plugin_output = "src"
    url = "jdbc:mysql://*****:4001/alpha_online"
    driver = "com.mysql.cj.jdbc.Driver"
    pd-addresses = "*****:2379,*****:2379,*****:2379"
    username = "******"
    password = "******"
    database-name = "alpha_online"
    table-name = "deposit_applications"
    startup.mode = "initial"
  }
}

sink {
  Jdbc {
    source_table_name = "src"
    driver = "com.mysql.cj.jdbc.Driver"
    url = "jdbc:mysql://*****:4000/?useSSL=false&useUnicode=true&characterEncoding=utf8"
    user = "root"
    password = "******"
    batch_size = 1000
    database = "sync_test"
    table = "deposit_applications"
    primary_keys = ["id"]
    generate_sink_sql = true
    support_upsert_by_query_primary_key_exist = true
  }
}

Running Command

The job was submitted by our internal DataT platform to Flink on YARN.

Runtime application:
application_1756448821197_5312

Flink job name:
dsp-17-v2-wzb_test_deposit_applications

The platform uses SeaTunnel Flink starter with the TiDB-CDC source plugin.
The exact generated SeaTunnel config is provided in the "SeaTunnel Config" section above.

This was not started by manually running start-seatunnel-flink-*.sh.

Error Exception

No exception was thrown.

The job stayed RUNNING and checkpoints kept succeeding.
No sink errors or backpressure were observed.

The issue is silent data loss:
TiDBSourceReader and CDCClient kept advancing resolvedTs, but row change events committed after 2026-06-03 18:59:55 were never emitted downstream.

Zeta or Flink or Spark Version

Flink 1.17.1

Java or Scala Version

Java 8, OpenJDK 1.8.0_422

Screenshots

jobmanager-log-excerpt.log

taskmanager-log-excerpt.log

Are you willing to submit PR?

Yes, I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

Contributor guide

Tech stack: java
Domain: data
Issue type: bug
Difficulty: 3
Estimated time: 1-2 days
Activity status: active
Clarity: clear
Prerequisites: Java, Flink, TiDB
Newbie friendliness: 35
Research direction: Investigate TiDBSourceReader and CDCClient to identify why resolvedTs advances but row events after a certain point are not emitted downstream. Check for event filtering or region checkpoint issues.
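
As a way to frame that investigation, here is a toy model of resolved-ts gated emission, the general pattern in which committed rows are buffered by commit timestamp and released once resolvedTs reaches them. It is an illustration under assumptions, not the actual TiDBSourceReader or CDCClient code; `ResolvedTsGate` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model: rows wait in a buffer until resolvedTs passes their commitTs. */
final class ResolvedTsGate {
    // Committed rows keyed by commitTs, not yet released downstream.
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    private long resolvedTs = 0L;

    /** Buffers a committed row until resolvedTs catches up with it. */
    void onCommittedRow(long commitTs, String row) {
        pending.computeIfAbsent(commitTs, k -> new ArrayList<>()).add(row);
    }

    /** Advances resolvedTs and returns all rows with commitTs <= resolvedTs. */
    List<String> onResolvedTs(long newResolvedTs) {
        resolvedTs = Math.max(resolvedTs, newResolvedTs);
        NavigableMap<Long, List<String>> ready = pending.headMap(resolvedTs, true);
        List<String> out = new ArrayList<>();
        for (List<String> rows : ready.values()) {
            out.addAll(rows);
        }
        ready.clear(); // the view is backed by 'pending', so this removes released rows
        return out;
    }

    long resolvedTs() {
        return resolvedTs;
    }
}
```

In this model, resolvedTs advances on every call to `onResolvedTs`, whether or not any row was ever buffered. If rows are dropped before they reach the buffer (for example by a key-range or table filter, or a region whose row events never arrive), the output is the same as for an idle table: resolvedTs keeps moving, checkpoints succeed, and nothing is emitted. That matches the symptoms reported here, which is why the row path ahead of the buffer seems worth inspecting first.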