apache/seatunnel

[Bug][Connector-V2][MySQL-CDC] BinlogOffset.compareTo ignores restartSkipRows when GTID sets are equal

Open

#10775 opened on Apr 16, 2026

Labels: bug, cdc, connectors-v2, help wanted

Description

Search before asking

I searched existing issues with keywords including BinlogOffset, GTID, restartSkipRows, and MySQL CDC compareTo, and did not find a duplicate issue for this offset comparison problem.

What happened

BinlogOffset stores both restartSkipEvents and restartSkipRows in the offset map, but BinlogOffset.compareTo only compares restartSkipEvents when both offsets have the same GTID set.

Code path:

  • seatunnel-connectors-v2/connector-cdc/connector-cdc-mysql/src/main/java/org/apache/seatunnel/connectors/seatunnel/cdc/mysql/source/offset/BinlogOffset.java
  • When gtidSet.equals(targetGtidSet), the method returns the comparison result of restartSkipEvents only.
  • restartSkipRows is stored in the offset map but is not considered in this branch.

This means two offsets can be considered equal when they have:

  1. The same GTID set
  2. The same restartSkipEvents
  3. Different restartSkipRows

This is risky for CDC recovery and offset ordering: row-level progress inside a single binlog event is invisible to the comparison, so an offset that is several rows further into the same event compares as equal to one that is not.
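
The scenario can be illustrated with a simplified stand-in. The class below is hypothetical and is not the real BinlogOffset: it models only the equal-GTID branch as described in this issue, and the field types (long) and the sample GTID string are assumptions.

```java
import java.util.Objects;

// Hypothetical stand-in for BinlogOffset. Only the equal-GTID branch
// described in this issue is modeled; everything else is omitted.
final class OffsetSketch implements Comparable<OffsetSketch> {
    final String gtidSet;
    final long restartSkipEvents; // assumed to be a long
    final long restartSkipRows;   // assumed to be a long

    OffsetSketch(String gtidSet, long restartSkipEvents, long restartSkipRows) {
        this.gtidSet = Objects.requireNonNull(gtidSet);
        this.restartSkipEvents = restartSkipEvents;
        this.restartSkipRows = restartSkipRows;
    }

    @Override
    public int compareTo(OffsetSketch that) {
        if (gtidSet.equals(that.gtidSet)) {
            // Mirrors the reported behavior: restartSkipRows is never consulted.
            return Long.compare(restartSkipEvents, that.restartSkipEvents);
        }
        // Differing GTID sets are out of scope for this sketch.
        throw new UnsupportedOperationException("only the equal-GTID branch is modeled");
    }

    public static void main(String[] args) {
        String gtid = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-10";
        OffsetSketch a = new OffsetSketch(gtid, 2, 0);
        OffsetSketch b = new OffsetSketch(gtid, 2, 5);
        // Prints 0 even though b is five rows further into the same event.
        System.out.println(a.compareTo(b));
    }
}
```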

Expected behavior

When GTID sets are equal, BinlogOffset.compareTo should preserve deterministic ordering for all relevant offset components.

At minimum, the comparison should consider restartSkipRows after restartSkipEvents. It may also be worth reviewing whether binlog file/position should be used as a fallback in this branch.
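
One possible shape for the tie-breaking order is sketched below. The names EqualGtidOrder and Progress are hypothetical helpers introduced only for illustration; the actual change would live inside BinlogOffset.compareTo, and whether a binlog file/position fallback belongs after these two keys is left open here.

```java
import java.util.Comparator;

// Hypothetical helper showing the proposed order for offsets whose GTID sets are equal.
final class EqualGtidOrder {

    // Hypothetical value holder for the two counters named in this issue.
    static final class Progress {
        final long restartSkipEvents;
        final long restartSkipRows;

        Progress(long restartSkipEvents, long restartSkipRows) {
            this.restartSkipEvents = restartSkipEvents;
            this.restartSkipRows = restartSkipRows;
        }
    }

    // Compare skipped events first, then break ties on skipped rows.
    static final Comparator<Progress> BY_EVENTS_THEN_ROWS =
            Comparator.<Progress>comparingLong(p -> p.restartSkipEvents)
                    .thenComparingLong(p -> p.restartSkipRows);

    static int compare(Progress left, Progress right) {
        return BY_EVENTS_THEN_ROWS.compare(left, right);
    }
}
```

With this ordering, two offsets with the same restartSkipEvents but different restartSkipRows no longer compare as equal, while a difference in restartSkipEvents still takes precedence over any difference in restartSkipRows.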

Why this matters

MySQL CDC correctness depends on precise offset comparison during recovery, checkpoint restore, and split/watermark coordination. If two distinct offsets compare as equal, SeaTunnel may make incorrect decisions when resuming from GTID-based offsets.

Suggested direction

Contributors are welcome to help with this issue.

A good PR could include:

  1. Add unit tests for BinlogOffset.compareTo covering equal GTID sets with different restartSkipRows.
  2. Decide the correct comparison order for equal GTID sets:
    • restartSkipEvents
    • restartSkipRows
    • optional fallback to binlog file/position if needed
  3. Update compareTo accordingly.
  4. Add regression coverage for GTID-based recovery semantics if possible.
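
For the unit tests in item 1, a small ordering check could help. The helper below is a hypothetical, framework-free sketch (the name OrderingChecks is an assumption); it is generic over any Comparable type and does not depend on how BinlogOffset instances are constructed.

```java
import java.util.List;

// Hypothetical test helper: checks that a list given in the expected
// ascending order is strictly ordered by compareTo in both directions.
final class OrderingChecks {

    static <T extends Comparable<T>> void assertStrictlyAscending(List<T> ascending) {
        for (int i = 0; i < ascending.size(); i++) {
            for (int j = i + 1; j < ascending.size(); j++) {
                T lower = ascending.get(i);
                T higher = ascending.get(j);
                // The earlier element must sort strictly before the later one,
                // and the reversed comparison must agree in sign.
                if (lower.compareTo(higher) >= 0 || higher.compareTo(lower) <= 0) {
                    throw new AssertionError(
                            "not strictly ascending at indexes " + i + " and " + j);
                }
            }
        }
    }
}
```

In a regression test, the list would hold offsets that share one GTID set and one restartSkipEvents value but have increasing restartSkipRows. Against the current behavior described above, such a list would fail the check because those offsets compare as equal.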
