软件版本:
PostgreSQL 13.4
repmgr 5.3
数据库安装规划:
node_id | node_name | port | data_directory | |
1 | node1 | 5432 | /data/n001 | primary |
2 | node2 | 5433 | /data/n002 | standby |
3 | node3 | 5434 | /data/n003 | standby |
01,初始化数据库,并修改数据库参数,启动数据库
#初始化
initdb -D /data/n001
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory /data/n001 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Asia/Shanghai
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
pg_ctl -D /data/n001 -l logfile start
#参数修改
wal_keep_size=10GB
max_wal_size = 10GB
min_wal_size = 8000MB
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
shared_preload_libraries ='repmgr'
wal_log_hints=on
show wal_keep_size;
show max_wal_size;
show min_wal_size;
show max_wal_senders;
show max_replication_slots;
show hot_standby;
show archive_mode;
show archive_command;
show shared_preload_libraries;
show wal_log_hints;
#启动数据库
-bash-4.2$ pg_ctl -D /data/n001/ start
waiting for server to start....2021-10-28 10:58:44.526 CST [31597] LOG: starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2021-10-28 10:58:44.526 CST [31597] LOG: listening on IPv6 address "::1", port 5432
2021-10-28 10:58:44.526 CST [31597] LOG: listening on IPv4 address "127.0.0.1", port 5432
2021-10-28 10:58:44.547 CST [31597] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-10-28 10:58:44.571 CST [31604] LOG: database system was shut down at 2021-10-28 10:52:48 CST
2021-10-28 10:58:44.602 CST [31597] LOG: database system is ready to accept connections
done
server started
02,初始化repmgr数据库和用户
#Create a dedicated PostgreSQL superuser account and a database for the repmgr metadat.
createuser -s repmgr
createdb repmgr -O repmgr
03,编辑repmgr配置文件
::::::::::::::
node1.conf
::::::::::::::
node_id=1
node_name='node1'
conninfo='host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/n001'
pg_bindir='/usr/local/postgresql/bin'
::::::::::::::
node2.conf
::::::::::::::
node_id=2
node_name='node2'
conninfo='host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/n002'
pg_bindir='/usr/local/postgresql/bin'
::::::::::::::
node3.conf
::::::::::::::
node_id=3
node_name='node3'
conninfo='host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/data/n003'
pg_bindir='/usr/local/postgresql/bin'
04,注册node01主数据库
-bash-4.2$ repmgr -f /data/node1.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
-bash-4.2$ repmgr -f /data/node1.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
05,克隆node02,node03备数据库
克隆命令:
repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone --dry-run
repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone
repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node3.conf standby clone --dry-run
repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node3.conf standby clone
克隆日志:
-bash-4.2$ repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone --dry-run
NOTICE: destination directory "/data/n002" provided
INFO: connecting to source node
DETAIL: connection string is: host=127.0.0.1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested; no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
/usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup" -D /data/n002 -h 127.0.0.1 -p 5432 -U repmgr -X stream
INFO: all prerequisites for "standby clone" are met
-bash-4.2$ repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone
NOTICE: destination directory "/data/n002" provided
INFO: connecting to source node
DETAIL: connection string is: host=127.0.0.1 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/data/n002"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
/usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup" -D /data/n002 -h 127.0.0.1 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /data/n002 start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
修改node02,node03端口
sed -i "s/#port = 5432/port = 5433/g" /data/n002/postgresql.conf
sed -i "s/#port = 5432/port = 5434/g" /data/n003/postgresql.conf
-bash-4.2$ cat /data/n002/postgresql.conf |grep port
port = 5433 # (change requires restart)
#ssl_passphrase_command_supports_reload = off
# supported by the operating system:
# supported by the operating system:
# supported by the operating system:
# %r = remote host and port
-bash-4.2$ cat /data/n003/postgresql.conf |grep port
port = 5434 # (change requires restart)
#ssl_passphrase_command_supports_reload = off
# supported by the operating system:
# supported by the operating system:
# supported by the operating system:
# %r = remote host and port
启动数据库并注册:
pg_ctl -D /data/n002 start
pg_ctl -D /data/n003 start
repmgr -f /data/node2.conf standby register
repmgr -f /data/node3.conf standby register
-bash-4.2$ pg_ctl -D /data/n002 start
waiting for server to start....2021-10-28 11:13:27.211 CST [5683] LOG: starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2021-10-28 11:13:27.212 CST [5683] LOG: listening on IPv6 address "::1", port 5433
2021-10-28 11:13:27.212 CST [5683] LOG: listening on IPv4 address "127.0.0.1", port 5433
2021-10-28 11:13:27.238 CST [5683] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2021-10-28 11:13:27.271 CST [5684] LOG: database system was interrupted; last known up at 2021-10-28 11:09:14 CST
..2021-10-28 11:13:29.396 CST [5684] LOG: entering standby mode
2021-10-28 11:13:29.418 CST [5684] LOG: redo starts at 0/2000028
2021-10-28 11:13:29.440 CST [5684] LOG: consistent recovery state reached at 0/2000138
2021-10-28 11:13:29.440 CST [5683] LOG: database system is ready to accept read only connections
2021-10-28 11:13:29.482 CST [5718] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
done
server started
-bash-4.2$ pg_ctl -D /data/n003 start
waiting for server to start....2021-10-28 11:13:33.327 CST [5734] LOG: starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2021-10-28 11:13:33.328 CST [5734] LOG: listening on IPv6 address "::1", port 5434
2021-10-28 11:13:33.328 CST [5734] LOG: listening on IPv4 address "127.0.0.1", port 5434
2021-10-28 11:13:33.350 CST [5734] LOG: listening on Unix socket "/tmp/.s.PGSQL.5434"
2021-10-28 11:13:33.397 CST [5736] LOG: database system was interrupted; last known up at 2021-10-28 11:11:11 CST
.2021-10-28 11:13:35.226 CST [5736] LOG: entering standby mode
.2021-10-28 11:13:35.258 CST [5736] LOG: redo starts at 0/4000028
2021-10-28 11:13:35.271 CST [5736] LOG: consistent recovery state reached at 0/4000100
2021-10-28 11:13:35.271 CST [5734] LOG: database system is ready to accept read only connections
2021-10-28 11:13:35.285 CST [5764] LOG: started streaming WAL from primary at 0/5000000 on timeline 1
done
server started
-bash-4.2$ repmgr -f /data/node2.conf standby register
INFO: connecting to local node "node2" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "node2" (ID: 2) successfully registered
-bash-4.2$ repmgr -f /data/node3.conf standby register
INFO: connecting to local node "node3" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "node3" (ID: 3) successfully registered
-bash-4.2$ repmgr -f /data/node3.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 1 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node1 | default | 100 | 1 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
06,switchover with repmgr
#切换执行需要配置ssh等效性。(SSH)
#在standby上执行,切换node2为primary
#check:
repmgr standby switchover -f /data/node2.conf --siblings-follow --dry-run
#执行切换:
repmgr standby switchover -f /data/node2.conf
#备机指向新的primary:
repmgr -f /data/node3.conf standby follow
#切换主数据库为node2:
-bash-4.2$ repmgr standby switchover -f /data/node2.conf --siblings-follow --dry-run
NOTICE: checking switchover on node "node2" (ID: 2) in --dry-run mode
INFO: SSH connection to host "127.0.0.1" succeeded
INFO: able to execute "repmgr" on remote host "127.0.0.1"
INFO: all sibling nodes are reachable via SSH
INFO: 2 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: attempting to pause repmgrd on 3 nodes
NOTICE: local node "node2" (ID: 2) would be promoted to primary; current primary "node1" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "node1":
"/usr/local/postgresql/bin/pg_ctl -D '/data/n001' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
-bash-4.2$ repmgr standby switchover -f /data/node2.conf
NOTICE: executing switchover on node "node2" (ID: 2)
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
node3 (node ID: 3)
NOTICE: attempting to pause repmgrd on 3 nodes
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)
DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl -D '/data/n001' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/6000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node2" is now primary and node "node1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
-bash-4.2$ repmgr standby switchover -f /data/node2.conf
NOTICE: executing switchover on node "node2" (ID: 2)
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
node3 (node ID: 3)
NOTICE: attempting to pause repmgrd on 3 nodes
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)
DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl -D '/data/n001' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/6000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node2" is now primary and node "node1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
#可以发现node2已经变为primary
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 1 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 2 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node1 | default | 100 | 1 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
#切换node3从node2同步数据:执行新的primary
-bash-4.2$ cat n003/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'host=127.0.0.1 user=repmgr application_name=node3 connect_timeout=2'
-bash-4.2$ repmgr -f /data/node3.conf standby follow
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/6000A28, follow target lsn is 0/6000A28
NOTICE: setting node 3's upstream to node 2
WARNING: node "node3" not found in "pg_stat_replication"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node2" (ID: 2)
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 1 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 2 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node2 | default | 100 | 2 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
#改名了修改了primary_conninfo信息,指向新的primary数据库。
-bash-4.2$ cat n003/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=repmgr connect_timeout=2 host=127.0.0.1 port=5433 application_name=node3'
#切换主数据库为node3:
#check:
repmgr standby switchover -f /data/node3.conf --siblings-follow --dry-run
#执行切换:
repmgr standby switchover -f /data/node3.conf
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 2 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 2 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node2 | default | 100 | 2 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
-bash-4.2$ repmgr standby switchover -f /data/node3.conf --siblings-follow --dry-run
NOTICE: checking switchover on node "node3" (ID: 3) in --dry-run mode
INFO: SSH connection to host "127.0.0.1" succeeded
INFO: able to execute "repmgr" on remote host "127.0.0.1"
INFO: all sibling nodes are reachable via SSH
INFO: 2 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: attempting to pause repmgrd on 3 nodes
NOTICE: local node "node3" (ID: 3) would be promoted to primary; current primary "node2" (ID: 2) would be demoted to standby
INFO: following shutdown command would be run on node "node2":
"/usr/local/postgresql/bin/pg_ctl -D '/data/n002' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
-bash-4.2$ repmgr standby switchover -f /data/node3.conf
NOTICE: executing switchover on node "node3" (ID: 3)
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
node1 (node ID: 1)
NOTICE: attempting to pause repmgrd on 3 nodes
NOTICE: local node "node3" (ID: 3) will be promoted to primary; current primary "node2" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "node2" (ID: 2)
NOTICE: issuing CHECKPOINT on node "node2" (ID: 2)
DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl -D '/data/n002' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/7000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node3" (ID: 3) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node3" (ID: 3) was successfully promoted to primary
NOTICE: node "node3" (ID: 3) promoted to primary, node "node2" (ID: 2) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node3" is now primary and node "node2" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 2 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node3 | default | 100 | 2 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | primary | * running | | default | 100 | 3 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
#切换主数据库为node1:
#candidate的数据库不能直接切换为主数据库:
repmgr -f /data/node1.conf standby follow
#check:
repmgr standby switchover -f /data/node1.conf --siblings-follow --dry-run
#执行切换:
repmgr standby switchover -f /data/node1.conf
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 2 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node3 | default | 100 | 2 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | primary | * running | | default | 100 | 3 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
#node1为级联备库,不能直接提升为主数据库。
-bash-4.2$ repmgr standby switchover -f /data/node1.conf --siblings-follow --dry-run
NOTICE: checking switchover on node "node1" (ID: 1) in --dry-run mode
ERROR: local node "node1" (ID: 1) is not a downstream of demotion candidate primary "node3" (ID: 3)
DETAIL: registered upstream node ID is 2
HINT: execute "repmgr standby register --force" to update the local node's metadata
#修改node1为node3的从库才可以正常切换为主数据库。
-bash-4.2$ repmgr -f /data/node1.conf standby follow
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/7000E30, follow target lsn is 0/7000E30
NOTICE: setting node 1's upstream to node 3
WARNING: node "node1" not found in "pg_stat_replication"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node3" (ID: 3)
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | standby | running | node3 | default | 100 | 3 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node3 | default | 100 | 3 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | primary | * running | | default | 100 | 3 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
-bash-4.2$ repmgr standby switchover -f /data/node1.conf --siblings-follow --dry-run
NOTICE: checking switchover on node "node1" (ID: 1) in --dry-run mode
ERROR: local node "node1" (ID: 1) is not a downstream of demotion candidate primary "node3" (ID: 3)
DETAIL: registered upstream node ID is 2
HINT: execute "repmgr standby register --force" to update the local node's metadata
-bash-4.2$ repmgr -f /data/node1.conf standby follow
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/7000E30, follow target lsn is 0/7000E30
NOTICE: setting node 1's upstream to node 3
WARNING: node "node1" not found in "pg_stat_replication"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node3" (ID: 3)
#--siblings-follow更从库指向新的主数据库
-bash-4.2$ repmgr standby switchover -f /data/node1.conf --siblings-follow
NOTICE: executing switchover on node "node1" (ID: 1)
NOTICE: attempting to pause repmgrd on 3 nodes
NOTICE: local node "node1" (ID: 1) will be promoted to primary; current primary "node3" (ID: 3) will be demoted to standby
NOTICE: stopping current primary node "node3" (ID: 3)
NOTICE: issuing CHECKPOINT on node "node3" (ID: 3)
DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl -D '/data/n003' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/8000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node1" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node1" (ID: 1) was successfully promoted to primary
NOTICE: node "node1" (ID: 1) promoted to primary, node "node3" (ID: 3) demoted to standby
NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "node1" is now primary and node "node3" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 4 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 3 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node1 | default | 100 | 3 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
07.failover(Promoting a standby server with repmgr)
#检查数据库状态
repmgr -f /data/node1.conf cluster show
-bash-4.2$ repmgr -f /data/node1.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 4 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 3 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node1 | default | 100 | 3 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
#停止主数据库
-bash-4.2$ pg_ctl -D /data/n001/ stop
waiting for server to shut down..... done
server stopped
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | primary | ? unreachable | ? | default | 100 | | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | ? node1 | default | 100 | 4 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | ? node1 | default | 100 | 4 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "node1" (ID: 1)
- node "node1" (ID: 1) is registered as an active primary but is unreachable
- unable to connect to node "node2" (ID: 2)'s upstream node "node1" (ID: 1)
- unable to determine if node "node2" (ID: 2) is attached to its upstream node "node1" (ID: 1)
- unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)
- unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)
HINT: execute with --verbose option to see connection error messages+
# node2 提升为主数据库
-bash-4.2$ repmgr -f /data/node2.conf standby promote
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
node3 (node ID: 3)
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | primary | - failed | ? | default | 100 | | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 5 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | ? node1 | default | 100 | 4 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "node1" (ID: 1)
- unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)
- unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)
HINT: execute with --verbose option to see connection error messages
#node3 更改主库为node2
-bash-4.2$ repmgr -f /data/node3.conf standby follow
NOTICE: attempting to find and follow current primary
INFO: local node 3 can attach to follow target node 2
DETAIL: local node's recovery point: 0/90000A0; follow target node's fork point: 0/90000A0
NOTICE: setting node 3's upstream to node 2
WARNING: node "node3" attached in state "startup"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node2" (ID: 2)
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | primary | - failed | ? | default | 100 | | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 5 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node2 | default | 100 | 4 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "node1" (ID: 1)
HINT: execute with --verbose option to see connection error messages
#修复node1 ,重新加入集群
#pg_rewind requires that either wal_log_hints is enabled。
repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind --verbose --dry-run
-bash-4.2$ repmgr -f /data/node1.conf node rejoin -d 'host=127.0.0.1 port=5433 user=repmgr dbname=repmgr'
NOTICE: rejoin target is node "node2" (ID: 2)
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 0/9000028; rejoin target node's fork point: 0/90000A0
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl -w -D '/data/n001' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
-bash-4.2$ repmgr -f /data/node2.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 4 | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 5 | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2
3 | node3 | standby | running | node2 | default | 100 | 5 | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
点击加载更多