CubeJS Server crashes with "terminating connection due to conflict with recovery" (PG read replica)
#3904 opened on January 10, 2022
Describe the bug
When a PostgreSQL read replica is used as the data source, the Cube server crashes whenever the replica cancels a running query because pending WAL entries that conflict with it have been waiting for longer than max_standby_archive_delay (or max_standby_streaming_delay). Both default to 30s.
As the PostgreSQL documentation notes: "Note that max_standby_archive_delay is not the same as the maximum length of time a query can run before cancellation (...) if one query has resulted in significant delay, subsequent conflicting queries will have much less grace time until the standby server has caught up again."
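The quoted rule can be illustrated with a simplified model. This is an assumption for illustration, not PostgreSQL's actual implementation: the cancellation deadline is fixed when the conflicting WAL data is received (receipt time plus the configured delay), so a query that starts after that data has already been waiting only gets whatever is left of the window. `remainingGraceMs` and `wouldBeCancelled` are hypothetical names.

```javascript
// Simplified model (assumption): a conflicting WAL record may wait at most
// maxStandbyDelayMs after it was received, regardless of when the query that
// blocks it started.
function remainingGraceMs({ maxStandbyDelayMs, walReceivedAtMs, queryStartMs }) {
  // The deadline is set by the WAL receipt time, not by the query start time.
  const deadlineMs = walReceivedAtMs + maxStandbyDelayMs;
  return Math.max(0, deadlineMs - queryStartMs);
}

// A conflicting query is cancelled if it is still running at the deadline.
function wouldBeCancelled({ maxStandbyDelayMs, walReceivedAtMs, queryStartMs, queryDurationMs }) {
  return queryDurationMs > remainingGraceMs({ maxStandbyDelayMs, walReceivedAtMs, queryStartMs });
}

// Example: default 30s delay; the conflicting WAL record arrived 22s before
// the query started (replay was still stuck behind earlier queries).
const scenario = { maxStandbyDelayMs: 30000, walReceivedAtMs: 0, queryStartMs: 22000 };
console.log(remainingGraceMs(scenario)); // 8000
console.log(wouldBeCancelled({ ...scenario, queryDurationMs: 10000 })); // true
```

Under this model a 10s query is cancelled even though it is well below the 25s executionTimeout in the configuration shown later in this issue.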
To Reproduce
- (Optional) To make reproduction easier, reduce max_standby_archive_delay and max_standby_streaming_delay to a few milliseconds instead of the default value of 30s.
- Trigger a long-running query against a PostgreSQL read replica.
- The CubeJS server crashes:
error: terminating connection due to conflict with recovery
at Parser.parseErrorMessage (/app/index.js:402934:98)
at Parser.handlePacket (/app/index.js:402773:29)
at Parser.parse (/app/index.js:402686:38)
(...)
Expected behavior
When a PostgreSQL read replica aborts a query due to a "conflict with recovery", the query should be retried once or twice with backoff instead of crashing the server.
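A minimal sketch of the requested retry, assuming a caller-supplied `runQuery` function that checks out a fresh connection on every call (for example from a connection pool). `withRecoveryConflictRetry` and `isRecoveryConflict` are hypothetical names, not part of Cube's or node-postgres' API; the error is matched on the message text seen in the stack trace, and the retry count and backoff base are illustrative values.

```javascript
// Hypothetical predicate: matches both "terminating connection ..." and
// "canceling statement ..." variants, which share this wording.
function isRecoveryConflict(err) {
  return err instanceof Error && /conflict with recovery/i.test(err.message);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs runQuery; on a recovery conflict, retries up to `retries` more times
// with exponential backoff. Any other error is rethrown immediately.
async function withRecoveryConflictRetry(runQuery, { retries = 2, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await runQuery();
    } catch (err) {
      if (!isRecoveryConflict(err) || attempt >= retries) throw err;
      // Waits baseDelayMs, then 2x, 4x, ... to let WAL replay catch up.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Usage with a fake query that fails once, then succeeds.
let calls = 0;
const fakeQuery = async () => {
  calls += 1;
  if (calls === 1) throw new Error("terminating connection due to conflict with recovery");
  return [{ ok: true }];
};

withRecoveryConflictRetry(fakeQuery, { baseDelayMs: 10 }).then((rows) => {
  console.log(calls, rows); // 2 [ { ok: true } ]
});
```

Because the "terminating connection" variant kills the whole connection, retrying on the same client would fail again, which is why the sketch assumes `runQuery` acquires a new one each time. Matching on message text is brittle if the server's lc_messages is not English; checking the error's SQLSTATE (`err.code` in node-postgres) would be more robust once the codes emitted by the server version in use are confirmed.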
Our current CubeJS server configuration (below) already sets a query execution timeout shorter than the allowed replication delay. However, if one or more earlier queries have already used up a significant part of that grace time, even a relatively fast query can still be aborted.
orchestratorOptions: {
  queryCacheOptions: {
    (...)
    queueOptions: {
      executionTimeout: 25,
      orphanedTimeout: 20,
      heartBeatInterval: 5,
    },
  },
Version:
- @cubejs-backend/server-core: "0.29.17"
- @cubejs-backend/postgres-driver: "0.29.17"
Additional context
We also tried increasing max_standby_archive_delay and max_standby_streaming_delay to 60s (double the default), but the problem still occurs frequently (twice a day or more).