[Feature] Arrow Flight SQL: support Arrow IPC compression (LZ4/ZSTD) for DoGet responses
#73,876 opened on May 26, 2026
Description
Feature request
Is your feature request related to a problem? Please describe.
StarRocks returns Arrow IPC data over Arrow Flight completely uncompressed. Confirmed at source level:
- be/src/service/service_be/arrow_flight_sql_service.cpp: DoGetStatement returns RecordBatchStream(reader) with no IpcWriteOptions argument; the codec defaults to UNCOMPRESSED.
- fe/.../ArrowFlightSqlService.java: FlightServer.builder has no .compressor() call; nothing compression-related exists on the builder.
- No BE or FE config parameters exist to enable compression. Confirmed with StarRocks support (Rocky, 2026-05-22).
Client-side workarounds have no effect: grpc-encoding: gzip and
grpc.default_compression_algorithm only compress client→server messages. The server
must compress its own DoGet responses, which it does not.
Describe the solution you'd like
Add IpcWriteOptions with a codec to RecordBatchStream in DoGetStatement:
```cpp
arrow::ipc::IpcWriteOptions options = arrow::ipc::IpcWriteOptions::Defaults();
ARROW_ASSIGN_OR_RAISE(options.codec, arrow::util::Codec::Create(arrow::Compression::LZ4_FRAME));
return std::make_unique<arrow::flight::RecordBatchStream>(reader, options);
```
The Arrow IPC format spec defines CompressionType with exactly two values: LZ4_FRAME
and ZSTD (other codecs are not valid for IPC). One implementation note: in the C++
arrow::Compression::type enum, LZ4_FRAME (frame format, value 6) and LZ4 (raw/block
format, value 5) are different on-wire formats; a user-facing lz4 value must map to
LZ4_FRAME.
Ideally exposed as a session variable (SET arrow_flight_compression = 'lz4') for
per-connection control, with a cluster-level default via BE config.
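The mapping from the user-facing value to a codec could look roughly like the sketch below. It uses only the standard library so it can be read in isolation: IpcCodec is a hypothetical stand-in for arrow::Compression::type (with the same numeric values quoted above), ParseFlightCompression is a hypothetical helper name, and accepting an empty string or 'none' as "uncompressed" is an assumption, not something this issue specifies.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for arrow::Compression::type. The numeric values
// mirror the Arrow C++ enum: UNCOMPRESSED = 0, ZSTD = 4, LZ4_FRAME = 6.
enum class IpcCodec : int {
  kUncompressed = 0,
  kZstd = 4,
  kLz4Frame = 6,  // deliberately not 5 (raw LZ4 block), which is invalid for IPC
};

// Hypothetical helper: parse the value of the proposed
// arrow_flight_compression variable (case-insensitive). Returns std::nullopt
// for unknown names so the caller can reject the SET statement instead of
// silently falling back to another codec.
inline std::optional<IpcCodec> ParseFlightCompression(std::string_view value) {
  std::string lowered(value);
  std::transform(lowered.begin(), lowered.end(), lowered.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  if (lowered.empty() || lowered == "none") return IpcCodec::kUncompressed;  // assumed spelling
  if (lowered == "lz4") return IpcCodec::kLz4Frame;
  if (lowered == "zstd") return IpcCodec::kZstd;
  return std::nullopt;
}
```

In a real patch the returned value would be translated to the matching arrow::Compression::type constant before calling arrow::util::Codec::Create. Rejecting unknown names keeps a typo such as 'snappy' from quietly disabling compression.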
Describe alternatives you've considered
- gRPC message-level compression: no effect on server→client DoGet responses without server-side configuration, and inferior to Arrow IPC compression regardless. gRPC compresses arbitrary byte frames rather than column-aligned record batches, breaking Arrow's zero-copy path and yielding worse compression ratios.
Additional context
- IpcWriteOptions has a min_space_savings field (Arrow C++ ≥ 12.0): skip compression when savings are below a threshold. Worth hardcoding a small default (e.g. 0.05) so that already-dense numeric columns, where the compressed output can be as large as or larger than the input, are sent uncompressed.
- write_legacy_ipc_format must be false (the default) for compression to work; the legacy format does not support compression. Should be verified in ArrowFlightBatchReader.
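To make the threshold concrete, here is a minimal sketch of the decision that min_space_savings encodes, assuming savings are computed as 1 - compressed_size / uncompressed_size (my reading of the Arrow documentation). KeepCompressed is a hypothetical helper written for illustration; it is not an Arrow API, and the real check happens inside the Arrow IPC writer.

```cpp
#include <cstdint>

// Hypothetical helper illustrating the min_space_savings rule:
//   savings = 1 - compressed_size / uncompressed_size
// The compressed form is kept only when savings reach the threshold.
inline bool KeepCompressed(std::int64_t uncompressed_size,
                           std::int64_t compressed_size,
                           double min_space_savings) {
  if (uncompressed_size <= 0) return false;  // empty buffer: nothing to gain
  const double savings =
      1.0 - static_cast<double>(compressed_size) /
                static_cast<double>(uncompressed_size);
  return savings >= min_space_savings;
}
```

With a threshold of 0.05, a 1000-byte buffer that compresses to 400 bytes is sent compressed, while one that only shrinks to 980 bytes (2% savings) or grows to 1040 bytes is sent as-is.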