[Discuss] Discuss adding VARIANT SqlType for semi-structured data
#10774 opened on Apr 16, 2026
Description
Search before asking
I searched existing issues with keywords such as VARIANT, SqlType, and parse_json.
There is a related but different SQL Server variant connector issue, but I did not find an issue discussing a generic SeaTunnel API-level VARIANT/semi-structured type.
What would you like to be added?
I would like to start a discussion about whether SeaTunnel should add a VARIANT-like type to the API type system for semi-structured data.
Currently, `SqlType` does not include a generic semi-structured type:
```java
public enum SqlType {
    ARRAY,
    MAP,
    STRING,
    BOOLEAN,
    TINYINT,
    SMALLINT,
    INT,
    BIGINT,
    FLOAT,
    DOUBLE,
    DECIMAL,
    NULL,
    BYTES,
    DATE,
    TIME,
    TIMESTAMP,
    TIMESTAMP_TZ,
    BINARY_VECTOR,
    FLOAT_VECTOR,
    FLOAT16_VECTOR,
    BFLOAT16_VECTOR,
    SPARSE_FLOAT_VECTOR,
    ROW,
    MULTIPLE_ROW;
}
```
SeaTunnel already has `ARRAY`, `MAP`, and `ROW`, which work well when the schema is known. However, CDC and JSON-oriented pipelines often need to carry semi-structured values whose shape may vary across rows or evolve frequently.
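The following sketch makes the mismatch concrete by projecting CDC-style rows onto one fixed ROW-like schema. It is plain Java with no SeaTunnel classes. `FixedSchemaProjection` and its helpers are hypothetical names used only for illustration. The point is that a fixed schema silently drops fields it does not declare and null-fills the ones a row lacks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: a fixed ROW-like schema applied to rows whose shape varies. */
class FixedSchemaProjection {

    /**
     * Projects a dynamic row onto a fixed list of field names.
     * Fields missing from the schema are dropped; schema fields absent from the row become null.
     */
    static List<Object> projectOntoSchema(List<String> schemaFields, Map<String, Object> row) {
        List<Object> projected = new ArrayList<>(schemaFields.size());
        for (String field : schemaFields) {
            projected.add(row.get(field)); // null when the row lacks this field
        }
        return projected;
    }

    /** Returns the row's fields that the fixed schema cannot represent. */
    static List<String> droppedFields(List<String> schemaFields, Map<String, Object> row) {
        List<String> dropped = new ArrayList<>();
        for (String key : row.keySet()) {
            if (!schemaFields.contains(key)) {
                dropped.add(key);
            }
        }
        return dropped;
    }

    /** Hypothetical helper: builds an insertion-ordered row from alternating key/value pairs. */
    static Map<String, Object> row(Object... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            m.put((String) keyValues[i], keyValues[i + 1]);
        }
        return m;
    }
}
```

For example, with the schema `List.of("id", "name")`, the row `row("id", 2, "tags", List.of("x"))` projects to `[2, null]`, and `droppedFields` reports `[tags]`.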
Why is this needed?
Some common scenarios are difficult to model cleanly today:
- CDC pipelines that need to preserve semi-structured source columns without converting everything to `STRING`.
- JSON/Kafka/MongoDB-style data where fields may be dynamic or partially unknown.
- Transform use cases such as `PARSE_JSON`, where users may want to parse a JSON string into a typed semi-structured value instead of a plain string.
- Sink mappings for systems that have native JSON/VARIANT-like types.
Current workarounds usually fall into two categories:
- Use `STRING`, which preserves the raw value but loses type semantics.
- Use `MAP`/`ROW`, which require a more fixed schema and are less convenient for highly dynamic JSON payloads.
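Here is a small sketch of what the map-based workaround does to a payload such as `{"id": 1, "price": 9.5, "tags": ["a", "b"]}`. The structured form is built by hand because the JDK has no JSON parser. `WorkaroundComparison` and its methods are hypothetical. With `STRING`, the raw text is kept but every consumer has to re-parse it. With a `MAP<STRING, STRING>`, all values must share one declared value type, so the number and the array are flattened into strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the MAP workaround for a heterogeneous JSON object. */
class WorkaroundComparison {

    /**
     * MAP<STRING, STRING> workaround: a map has a single value type, so every value is
     * coerced to its string form and the original number/array types are lost.
     */
    static Map<String, String> asStringMap(Map<String, Object> parsed) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : parsed.entrySet()) {
            result.put(e.getKey(), String.valueOf(e.getValue()));
        }
        return result;
    }

    /** The structured form a semi-structured type could carry instead (built by hand here). */
    static Map<String, Object> examplePayload() {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("id", 1L);
        m.put("price", 9.5);
        m.put("tags", List.of("a", "b"));
        return m;
    }
}
```

In this sketch `asStringMap(examplePayload()).get("price")` is the string `"9.5"` rather than a number. In addition, `tags` becomes `"[a, b]"`, which is no longer valid JSON.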
Proposal for discussion
This issue is intended as a design discussion first, not an immediate implementation request.
Possible directions:
- Add a new `SqlType.VARIANT` or `SqlType.JSON`.
- Add a corresponding `SeaTunnelDataType`, for example `VariantType` or `JsonType`.
- Define conversion rules for JSON format, CDC deserialization, catalog mapping, and common sinks.
- Add a `PARSE_JSON` transform/function later, either returning the new semi-structured type or requiring an explicit target schema.
- Document connector support as a compatibility matrix, because not every sink can store semi-structured values natively.
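The first two directions could look roughly like the sketch below, which adds a `VARIANT` constant and a matching singleton data type. Everything here is hypothetical. `SqlTypeSketch` and `DataTypeSketch` are local stand-ins that let the snippet compile on its own. They are modeled loosely on the shape of `SqlType` and `SeaTunnelDataType` rather than copied from them. `VariantType` is one of the candidate names above. The value class is left as `Object` because the physical representation is still an open question.

```java
import java.io.Serializable;

/** Hypothetical subset of SqlType with the proposed constant added; not the real SeaTunnel enum. */
enum SqlTypeSketch {
    STRING,
    MAP,
    ROW,
    VARIANT // proposed: a semi-structured value whose shape may vary per row
}

/** Minimal stand-in for a data-type abstraction, loosely modeled on SeaTunnelDataType. */
interface DataTypeSketch<T> extends Serializable {
    Class<T> getTypeClass();

    SqlTypeSketch getSqlType();
}

/**
 * Hypothetical VARIANT type. The carrier class is Object because the physical
 * representation (raw JSON text, object model, or both) is not decided yet.
 */
final class VariantType implements DataTypeSketch<Object> {
    private static final long serialVersionUID = 1L;

    static final VariantType INSTANCE = new VariantType();

    private VariantType() {}

    @Override
    public Class<Object> getTypeClass() {
        return Object.class;
    }

    @Override
    public SqlTypeSketch getSqlType() {
        return SqlTypeSketch.VARIANT;
    }

    /** Keeps deserialized copies identical to INSTANCE so identity checks stay valid. */
    private Object readResolve() {
        return INSTANCE;
    }

    @Override
    public String toString() {
        return "VARIANT";
    }
}
```

A singleton fits because a VARIANT type, unlike `MAP` or `ROW`, has no type parameters. `readResolve` ensures that a type serialized and shipped to workers still compares equal by identity.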
Open questions
- Should the type be named `VARIANT`, `JSON`, or something else?
- Should the physical representation preserve the original JSON text, use a structured object model, or support both?
- How should this interact with schema evolution events?
- Which connectors should support it in the first phase?
- Should SeaTunnel first add `PARSE_JSON` with explicit `ROW`/`MAP` output before introducing a new API type?
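On the physical representation question, one possible "support both" design is a value that keeps whichever form it was created from and converts lazily to the other. The sketch below is hypothetical. `VariantValue` is not an existing class, and the object model is assumed to be plain `Map`/`List`/`String`/`Number`/`Boolean`/`null`. Parsing is delegated to a caller-supplied function because the JDK ships no JSON parser. A connector would plug in whatever JSON library it already uses.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical VARIANT value holding original JSON text, an object model, or both (not thread-safe). */
final class VariantValue {
    private String text;          // original text, or lazily serialized from the object model
    private Object structured;    // object model, or lazily parsed from the text
    private boolean hasStructured;

    private VariantValue(String text, Object structured, boolean hasStructured) {
        this.text = text;
        this.structured = structured;
        this.hasStructured = hasStructured;
    }

    static VariantValue ofText(String json) {
        return new VariantValue(Objects.requireNonNull(json, "json"), null, false);
    }

    static VariantValue ofStructured(Object value) {
        return new VariantValue(null, value, true);
    }

    /** Returns JSON text; text supplied via ofText is returned verbatim, not re-serialized. */
    String text() {
        if (text == null) {
            StringBuilder sb = new StringBuilder();
            writeJson(structured, sb);
            text = sb.toString();
        }
        return text;
    }

    /** Returns the object model, parsing the text with the given parser on first access. */
    Object structured(Function<String, Object> parser) {
        if (!hasStructured) {
            structured = parser.apply(text);
            hasStructured = true;
        }
        return structured;
    }

    private static void writeJson(Object v, StringBuilder sb) {
        if (v == null) {
            sb.append("null");
        } else if (v instanceof String) {
            writeString((String) v, sb);
        } else if (v instanceof Number || v instanceof Boolean) {
            sb.append(v); // non-finite doubles (NaN, Infinity) are not handled in this sketch
        } else if (v instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                writeString(String.valueOf(e.getKey()), sb);
                sb.append(':');
                writeJson(e.getValue(), sb);
            }
            sb.append('}');
        } else if (v instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) v) {
                if (!first) sb.append(',');
                first = false;
                writeJson(item, sb);
            }
            sb.append(']');
        } else {
            throw new IllegalArgumentException("Unsupported value type: " + v.getClass().getName());
        }
    }

    /** Writes a JSON string literal, escaping quotes, backslashes, and control characters. */
    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        sb.append('"');
    }
}
```

Keeping the text avoids a parse/serialize round trip for pass-through CDC columns. It also preserves details such as key order and number formatting. The cost is that a value may hold two copies once both forms have been materialized.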
Compatibility
This can be introduced as an additive API capability. Existing behavior does not need to change unless a connector explicitly opts into the new type.
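One possible shape for the opt-in is shown in the sketch below. Each sink declares whether it supports VARIANT, and sinks that do not receive VARIANT columns as STRING. The `supportsVariant` flag, the `SinkCapabilities` record, and the STRING fallback are assumptions made for illustration. They are not existing SeaTunnel APIs or an agreed policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical column-type resolution for an opt-in VARIANT type. */
class VariantSinkMapping {

    enum ColumnType { STRING, BIGINT, VARIANT }

    /** A sink declares whether it opted into native VARIANT support (assumed flag, not a real API). */
    record SinkCapabilities(String sinkName, boolean supportsVariant) {}

    /**
     * Resolves the column types a sink will receive. Sinks that did not opt in get VARIANT
     * columns as STRING (an assumed fallback); all other column types pass through unchanged.
     */
    static List<ColumnType> resolve(List<ColumnType> upstream, SinkCapabilities sink) {
        List<ColumnType> resolved = new ArrayList<>(upstream.size());
        for (ColumnType t : upstream) {
            boolean needsFallback = t == ColumnType.VARIANT && !sink.supportsVariant();
            resolved.add(needsFallback ? ColumnType.STRING : t);
        }
        return resolved;
    }
}
```

For sinks that did not opt in, failing the job at validation time would be an equally reasonable policy. The fallback shown here trades strictness for keeping pipelines running with the JSON text.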