apache/seatunnel

[Feature][Connector] Add Informix CDC Connector

Open

#10359 opened on Jan 17, 2026

batch import
help wanted

Description

Background

IBM Informix is a relational database widely used in financial institutions, retail chains, and legacy enterprise systems, especially in banking core systems and point-of-sale (POS) applications.

SeaTunnel currently supports Informix through the dialect-based JDBC connector. However, JDBC-based reading cannot capture changes (CDC), and it cannot use Informix-specific capabilities (for example, fragmentation-aware reads) to improve performance.

Motivation

  • Legacy system integration: Informix remains mission-critical in many industries.
  • Multi-table CDC: Sync multiple tables in one job to modern platforms.
  • Performance: JDBC cannot utilize Informix fragmentation/parallelism effectively.
  • Near real-time: CDC is required for timely analytics and replication.

Current Status vs. Proposed Enhancement

| Feature | Current (JDBC Connector) | Proposed (Informix-CDC Source) |
| --- | --- | --- |
| Read | Snapshot | Snapshot + CDC |
| CDC | No | Yes (API / log / trigger based) |
| Multi-Table | Limited (depends on connector) | Yes (CDC multi-table patterns) |
| Optimizations | Generic | Informix-aware (when available) |
| Data Types | Basic (limited) | SERIAL8, LVARCHAR, Smart LOBs |

Proposed Solution

Implement a dedicated Informix-CDC Source connector.

Why not table_list?

For CDC connectors in SeaTunnel, multi-table selection and per-table overrides are typically done via:

  • table-names / table-pattern (choose tables)
  • table-names-config (per-table overrides such as custom PK and snapshot split column)

This is already established in MySQL-CDC and the shared CDC base config model. Using table_list here would introduce a third multi-table style and create confusion around parameter naming (for example primary_keys vs primaryKeys).

If we need a batch-style multi-table snapshot/incremental reader for Informix, it should continue to align with JDBC Source and use table_list there (separate from this CDC connector).

Goals

  • Align with existing CDC connector configuration patterns (table-names, table-pattern, table-names-config).
  • Support snapshot + CDC with clear offset/checkpoint semantics.
  • Improve read performance with Informix-specific optimizations when possible.

Non-goals (for the initial version)

  • Cover every Informix edition-specific CDC feature in one release.
  • Replace JDBC Source for batch reads.

Core Features

  1. Multi-Table Support (CDC-style)

    • Use table-names for explicit table lists.
    • Use table-pattern for regex-based table selection.
    • Use table-names-config for per-table overrides (for example primaryKeys, snapshotSplitColumn).
  2. Change Data Capture (CDC)

    • Option A: CDC API (Enterprise Edition), preferred for low latency.
    • Option B: Log-based CDC (logical logs), for Standard Edition if feasible.
    • Option C: Trigger-based CDC as a fallback.
  3. Type Mapping

    • Support Informix-specific types (for example, SERIAL8, LVARCHAR, Smart LOBs) beyond basic JDBC mappings.
  4. Performance Tuning

    • Fragmentation-aware and parallel reads where applicable.
    • Standard connector tuning parameters such as fetch_size.
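As a rough illustration of how table-names and table-names-config could be combined, the sketch below resolves one set of per-table settings for every selected table. It is not SeaTunnel code: `TableOverride`, `normalize`, and `resolve_overrides` are hypothetical names, and both the case-insensitive matching and the rejection of entries for unselected tables are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TableOverride:
    """Hypothetical holder for one table-names-config entry."""
    primary_keys: Tuple[str, ...] = ()
    snapshot_split_column: Optional[str] = None


def normalize(table: str) -> str:
    # Assumption: identifiers are compared case-insensitively as "database.table".
    return table.strip().lower()


def resolve_overrides(table_names, table_names_config):
    """Return {table: TableOverride} for every table listed in table-names."""
    selected = [normalize(t) for t in table_names]
    overrides = {}
    for entry in table_names_config:
        key = normalize(entry["table"])
        if key not in selected:
            # Assumption: an override for a table that is not selected is a config error.
            raise ValueError(f"override for unselected table: {entry['table']}")
        overrides[key] = TableOverride(
            primary_keys=tuple(entry.get("primaryKeys", [])),
            snapshot_split_column=entry.get("snapshotSplitColumn"),
        )
    # Tables without an entry keep the defaults (no custom PK, no split column).
    return {t: overrides.get(t, TableOverride()) for t in selected}
```

With this shape, a table such as stores_demo.items that has no override entry still appears in the result, so later stages never need to distinguish "not configured" from "not selected".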

Configuration Examples

CDC (Multi-Table, table-names + table-names-config)

env {
  parallelism = 4
  checkpoint.interval = 5000
}

source {
  Informix-CDC {
    # Connection
    url = "jdbc:informix-sqli://informix-server.example.com:9088/stores_demo:INFORMIXSERVER=ol_informix"
    username = "informix"
    password = "******"

    # Multi-table selection
    table-names = ["stores_demo.customer", "stores_demo.orders", "stores_demo.items"]

    # Per-table overrides (custom PK / snapshot split column)
    table-names-config = [
      {
        table = "stores_demo.customer"
        primaryKeys = ["customer_num"]
        snapshotSplitColumn = "customer_num"
      },
      {
        table = "stores_demo.orders"
        primaryKeys = ["order_num"]
        snapshotSplitColumn = "order_num"
      }
    ]

    # Startup mode (example naming; align with existing CDC connectors)
    startup.mode = "initial"

    fetch_size = 1000
  }
}

sink {
  Console {}
}

CDC (Regex-based multi-table via table-pattern)

source {
  Informix-CDC {
    # Connection omitted (same as above)
    table-pattern = "stores_demo\\..*"
    startup.mode = "initial"
  }
}
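To show why the dot in the pattern above is escaped, here is a small sketch of regex-based table selection. It assumes the pattern must match the whole "database.table" identifier. The connector itself would use Java regular expressions, but this particular pattern behaves the same way in Python. `select_tables` and the list of discovered tables are made up for the example.

```python
import re


def select_tables(discovered, table_pattern):
    """Keep only identifiers that the pattern matches in full."""
    pattern = re.compile(table_pattern)
    return [t for t in discovered if pattern.fullmatch(t)]


# Hypothetical result of table discovery.
discovered = [
    "stores_demo.customer",
    "stores_demo.orders",
    "stores_demox.audit",
    "sysmaster.systables",
]

# The config string "stores_demo\\..*" unescapes to the regex stores_demo\..*
selected = select_tables(discovered, r"stores_demo\..*")

# Without the escape, "." matches any character and also lets stores_demox.audit through.
too_broad = select_tables(discovered, "stores_demo..*")
```

Here `selected` contains only the two stores_demo tables, while `too_broad` also picks up stores_demox.audit.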

Technical Considerations

  • Configuration alignment: Follow existing CDC connector patterns and the CDC base config model.
  • Table discovery: Use a stable table identifier (for example TableId) to match table-names-config entries reliably.
  • Snapshot split strategy: Use snapshotSplitColumn and align split semantics with CDC base.
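The snapshot split idea can be sketched for the simplest case, an integer split column with known minimum and maximum values. This is a simplified illustration, not the CDC base algorithm, which also has to handle unevenly distributed keys and non-numeric columns. `split_key_range`, `split_query`, and the sample key range are hypothetical.

```python
def split_key_range(min_key, max_key, chunk_size):
    """Split the inclusive range [min_key, max_key] into half-open (start, end) chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    splits = []
    start = min_key
    while start <= max_key:
        # The last chunk ends just past max_key so that max_key itself is read.
        end = min(start + chunk_size, max_key + 1)
        splits.append((start, end))
        start = end
    return splits


def split_query(table, column, start, end):
    """Build the bounded SELECT that a single snapshot reader would run."""
    return f"SELECT * FROM {table} WHERE {column} >= {start} AND {column} < {end}"


# Example: customer_num values between 101 and 128, read in chunks of 10 keys.
splits = split_key_range(101, 128, 10)
queries = [split_query("customer", "customer_num", s, e) for s, e in splits]
```

Because the chunks are half-open and contiguous, every key falls into exactly one split, which is what allows the splits to be read in parallel and checkpointed independently.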
