citusdata/citus

rebalance never starts

Open

#7103 created on August 3, 2023

(12 comments) (0 reactions) (0 assignees) C (9,388 stars) (625 forks)
batch import, good first issue, warm-up

Description

With Citus 11.3, I added a node and triggered a rebalance. The rebalance was scheduled correctly but never starts running, even though it has 1 runnable task (and 10 blocked ones).

I'm using the Docker image citusdata/citus:11.3 on all nodes. The connection between the nodes works (the primary is at 10.132.0.2):

SELECT * FROM citus_get_active_worker_nodes();
 node_name  | node_port
------------+-----------
 10.132.0.4 |      5432
 10.132.0.5 |      5432
(2 rows)

Command history:

staging=# SELECT * from citus_add_node('10.132.0.5', 5432);
 citus_add_node
----------------
             10
(1 row)

Time: 623.522 ms
staging=# SELECT citus_rebalance_start();
NOTICE:  Scheduled 10 moves as job 1
DETAIL:  Rebalance scheduled as background job
HINT:  To monitor progress, run: SELECT * FROM citus_rebalance_status();
 citus_rebalance_start
-----------------------
                     1
(1 row)

Time: 26.101 ms
staging=# SELECT * FROM citus_rebalance_status();
 job_id |   state   | job_type  |           description           | started_at | finished_at |                              details
--------+-----------+-----------+---------------------------------+------------+-------------+--------------------------------------------------------------------
      1 | scheduled | rebalance | Rebalance all colocation groups |            |             | {"tasks": [], "task_state_counts": {"blocked": 10, "runnable": 1}}
(1 row)

Time: 3.200 ms
staging=# SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'staging'
  AND pid <> pg_backend_pid();
 pg_terminate_backend
----------------------
 t
 t
 t
 t
 t
 t
 t
 t
(8 rows)
staging=# SELECT get_rebalance_table_shards_plan();
               get_rebalance_table_shards_plan
-------------------------------------------------------------
 (sensor_datapoint,102183,0,10.132.0.4,5432,10.132.0.5,5432)
 (sensor_datapoint,102182,0,10.132.0.2,5432,10.132.0.5,5432)
 (sensor_datapoint,102185,0,10.132.0.4,5432,10.132.0.5,5432)
 (sensor_datapoint,102184,0,10.132.0.2,5432,10.132.0.5,5432)
 (sensor_datapoint,102187,0,10.132.0.4,5432,10.132.0.5,5432)
 (sensor_datapoint,102186,0,10.132.0.2,5432,10.132.0.5,5432)
 (sensor_datapoint,102189,0,10.132.0.4,5432,10.132.0.5,5432)
 (sensor_datapoint,102188,0,10.132.0.2,5432,10.132.0.5,5432)
 (sensor_datapoint,102191,0,10.132.0.4,5432,10.132.0.5,5432)
 (sensor_datapoint,102190,0,10.132.0.2,5432,10.132.0.5,5432)
(10 rows)

Time: 4.475 ms
staging=# SELECT * from pg_dist_node;
 nodeid | groupid |  nodename  | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards
--------+---------+------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------
      1 |       0 | 10.132.0.2 |     5432 | default  | t           | t        | primary  | default     | t              | t
      6 |       5 | 10.132.0.4 |     5432 | default  | t           | t        | primary  | default     | t              | t
     10 |       9 | 10.132.0.5 |     5432 | default  | t           | t        | primary  | default     | t              | t
staging=# ALTER SYSTEM SET citus.max_background_task_executors_per_node = 2;
ALTER SYSTEM
Time: 9.613 ms
staging=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

Time: 1.585 ms
staging=# SELECT * FROM citus_rebalance_status() \gx
-[ RECORD 1 ]-------------------------------------------------------------------
job_id      | 1
state       | scheduled
job_type    | rebalance
description | Rebalance all colocation groups
started_at  |
finished_at |
details     | {"tasks": [], "task_state_counts": {"blocked": 10, "runnable": 1}}

Time: 3.033 ms

I've been waiting for a long time and the job state has not changed.
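
In case it helps with diagnosis, here is a minimal sketch of queries that might show what the single runnable task is waiting on. It assumes the Citus background job catalogs `pg_dist_background_task` and `pg_dist_background_task_depend` are present with the column names used below; neither appears in the output above, so the names should be checked against the installed version. Job id 1 is the one reported by `citus_rebalance_start()`.

```sql
-- Per-task state for job 1: a task that stays 'runnable' with no pid
-- would suggest that no background executor has picked it up.
SELECT task_id, status, pid, retry_count, not_before, message
FROM pg_dist_background_task
WHERE job_id = 1
ORDER BY task_id;

-- Dependencies between tasks: shows which task each blocked task waits on.
SELECT task_id, depends_on
FROM pg_dist_background_task_depend
WHERE job_id = 1
ORDER BY task_id, depends_on;

-- Settings that bound how many background workers can run, and whether
-- a changed value is still waiting for a server restart.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('citus.max_background_task_executors_per_node',
               'max_worker_processes');
```

The third query is there because `ALTER SYSTEM` followed by `pg_reload_conf()` only takes effect for settings that can be changed without a restart; `pending_restart` would show whether that is the case here.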
