[Feature][Connector] Add Azure CosmosDB Source Connector
#10357 opened on Jan 17, 2026
Description
Background
Azure Cosmos DB is Microsoft's globally distributed, multi-model NoSQL database service designed for mission-critical applications. It offers an availability SLA of up to 99.999%, single-digit millisecond latency, and support for multiple API models (SQL, MongoDB, Cassandra, Gremlin, Table).
Currently, SeaTunnel lacks native support for Azure Cosmos DB as a data source, limiting its ability to integrate with Azure cloud-native applications and globally distributed systems.
Motivation
- Azure Cloud Leadership: Cosmos DB is Microsoft Azure's flagship NoSQL database service.
- Multi-Model Support: Single database supporting SQL, document, key-value, graph, and column-family data models.
- Multi-Container Integration: Users need to sync multiple containers from one or more databases in a single job.
- No JDBC Support: Requires native SDK for optimal performance and feature access.
Proposed Solution
Implement a dedicated Azure Cosmos DB Source connector supporting multiple API modes with multi-container support.
Crucially, this connector will follow SeaTunnel's standard multi-table configuration (aligned with JDBC Source) using table_list and table_path.
Core Features
- Multi-Container Support (Standardized)
  - Use the standard table_list parameter for multi-container definition.
  - Use table_path (format: database.container) to identify resources, consistent with other SeaTunnel connectors.
  - Support specialized configuration per container (partition keys, queries).
- API Support
  - SQL API (Core/SQL)
Configuration Examples
Multi-Container SQL API Configuration (Standardized)
env {
  parallelism = 2
  job.mode = "BATCH"
}

source {
  CosmosDB {
    # Connection
    endpoint = "https://myaccount.documents.azure.com:443/"
    auth_type = "master_key"
    master_key = "your-primary-key"
    api_type = "sql"

    # Multi-container standard configuration
    table_list = [
      {
        # Standard table_path format: database.container
        table_path = "ecommerce.customers"
        # Container specific settings
        partition_key = "/customerId"
        # Extraction settings
        extraction_mode = "incremental"
        incremental_field = "_ts"
        start_timestamp = 1640995200
        # Custom query (optional)
        query = "SELECT * FROM c WHERE c._ts > @lastTimestamp AND c.status = 'active'"
      },
      {
        table_path = "ecommerce.orders"
        partition_key = "/orderId"
        extraction_mode = "incremental"
        incremental_field = "_ts"
      },
      {
        table_path = "analytics.user_events"
        partition_key = "/userId"
        # Change feed (CDC) mode
        extraction_mode = "change_feed"
        change_feed_mode = "incremental"
        lease_container_name = "leases"
      }
    ]

    # Global settings
    max_ru_per_second = 1000
    request_timeout_ms = 30000
  }
}

sink {
  Console {}
}
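To make the incremental mode in the example above more concrete, here is a minimal sketch of how a reader might track the `_ts` checkpoint between polls. It is not the proposed implementation: `IncrementalCursor` and `poll` are hypothetical names, and an in-memory list of documents stands in for the result of running `c._ts > @lastTimestamp` against Cosmos DB. The only fact taken as given is that `_ts` is the system-managed last-modified time in epoch seconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a SeaTunnel or Azure SDK class): tracks the
// checkpoint used to bind @lastTimestamp in an incremental query.
final class IncrementalCursor {
    private long lastTimestamp;

    // startTimestamp corresponds to start_timestamp in the config (epoch seconds).
    IncrementalCursor(long startTimestamp) {
        this.lastTimestamp = startTimestamp;
    }

    long lastTimestamp() {
        return lastTimestamp;
    }

    // In-memory stand-in for "SELECT * FROM c WHERE c._ts > @lastTimestamp".
    // Returns the documents newer than the checkpoint and advances the
    // checkpoint to the largest _ts that was returned.
    List<Map<String, Object>> poll(List<Map<String, Object>> documents) {
        List<Map<String, Object>> fresh = new ArrayList<>();
        long maxSeen = lastTimestamp;
        for (Map<String, Object> doc : documents) {
            long ts = ((Number) doc.get("_ts")).longValue();
            if (ts > lastTimestamp) {
                fresh.add(doc);
                maxSeen = Math.max(maxSeen, ts);
            }
        }
        lastTimestamp = maxSeen;
        return fresh;
    }
}
```

One design point worth discussing: because `_ts` has one-second granularity, a strict `>` comparison can skip documents written later in the same second as the checkpoint. Using `>=` together with deduplication by document id would avoid that, at the cost of re-reading some rows.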
Technical Considerations
Multi-Container Configuration Standardization
- Parameter Alignment: Adopt table_list to replace the custom container-configs proposal. This ensures consistency with JDBC, StarRocks, and other multi-table sources.
- Table Path Parsing: Utilize SeaTunnel's TablePath class to parse database.container strings automatically.
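As an illustration of the database.container convention, the standalone sketch below splits a `table_path` value into its two parts. It deliberately avoids SeaTunnel's TablePath class so that it runs without dependencies; `ContainerPath` is a hypothetical name, and the rule that neither part contains a dot is an assumption made for this example, not something the proposal specifies.

```java
import java.util.Objects;

// Hypothetical stand-in for the parsing the connector would delegate to
// SeaTunnel's TablePath; shown only to illustrate "database.container".
final class ContainerPath {
    final String database;
    final String container;

    private ContainerPath(String database, String container) {
        this.database = database;
        this.container = container;
    }

    // Assumption: exactly one dot separates the database from the container.
    static ContainerPath parse(String tablePath) {
        Objects.requireNonNull(tablePath, "table_path must not be null");
        int dot = tablePath.indexOf('.');
        boolean missingPart = dot <= 0 || dot == tablePath.length() - 1;
        boolean extraDot = dot >= 0 && tablePath.indexOf('.', dot + 1) != -1;
        if (missingPart || extraDot) {
            throw new IllegalArgumentException(
                    "expected database.container, got: " + tablePath);
        }
        return new ContainerPath(
                tablePath.substring(0, dot), tablePath.substring(dot + 1));
    }
}
```

For example, `ContainerPath.parse("ecommerce.customers")` yields database `ecommerce` and container `customers`, while a value such as `customers` with no database part is rejected early rather than failing later at query time.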
Dependencies
- SQL API: azure-cosmos Java SDK