
PostgreSQL High Availability with repmgr

XuYuchong · 2021-10-28

Software versions:

PostgreSQL 13.4

repmgr 5.3

Database installation plan:

node_id | node_name | port | data_directory | role
--------+-----------+------+----------------+--------
1       | node1     | 5432 | /data/n001     | primary
2       | node2     | 5433 | /data/n002     | standby
3       | node3     | 5434 | /data/n003     | standby

01. Initialize the database, modify the database parameters, and start the server

# Initialize

initdb -D /data/n001

The files belonging to this database system will be owned by user "postgres".

This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".

The default database encoding has accordingly been set to "UTF8".

The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /data/n001 ... ok

creating subdirectories ... ok

selecting dynamic shared memory implementation ... posix

selecting default max_connections ... 100

selecting default shared_buffers ... 128MB

selecting default time zone ... Asia/Shanghai

creating configuration files ... ok

running bootstrap script ... ok

performing post-bootstrap initialization ... ok

syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections

You can change this by editing pg_hba.conf or using the option -A, or

--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /data/n001 -l logfile start

# Modify parameters (postgresql.conf)

wal_keep_size=10GB

max_wal_size = 10GB

min_wal_size = 8000MB

max_wal_senders = 10

max_replication_slots = 10

hot_standby = on

archive_mode = on

archive_command = '/bin/true'

shared_preload_libraries ='repmgr'

wal_log_hints=on
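
The article does not show how these settings were applied; one way (a sketch, using the /data/n001 data directory from the plan above) is to append them to postgresql.conf before the first start:

cat >> /data/n001/postgresql.conf <<'EOF'
wal_keep_size = 10GB
max_wal_size = 10GB
min_wal_size = 8000MB
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
shared_preload_libraries = 'repmgr'
wal_log_hints = on
EOF

The show statements below can then be run in psql after startup to confirm each value took effect.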

show wal_keep_size;

show max_wal_size;

show min_wal_size;

show max_wal_senders;

show max_replication_slots;

show hot_standby;

show archive_mode;

show archive_command;

show shared_preload_libraries;

show wal_log_hints;

# Start the database

-bash-4.2$ pg_ctl -D /data/n001/ start

waiting for server to start....2021-10-28 10:58:44.526 CST [31597] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

2021-10-28 10:58:44.526 CST [31597] LOG:  listening on IPv6 address "::1", port 5432

2021-10-28 10:58:44.526 CST [31597] LOG:  listening on IPv4 address "127.0.0.1", port 5432

2021-10-28 10:58:44.547 CST [31597] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"

2021-10-28 10:58:44.571 CST [31604] LOG:  database system was shut down at 2021-10-28 10:52:48 CST

2021-10-28 10:58:44.602 CST [31597] LOG:  database system is ready to accept connections

done

server started

02. Create the repmgr database and user

# Create a dedicated PostgreSQL superuser account and a database for the repmgr metadata.

createuser -s repmgr

createdb repmgr -O repmgr
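
Because initdb enabled "trust" authentication for local connections, no pg_hba.conf changes are needed for this single-host test. On a multi-host deployment you would also add entries for the repmgr user covering both normal and replication connections, roughly like the following (a sketch; substitute your own network range and authentication method):

host    repmgr          repmgr          192.168.1.0/24          md5
host    replication     repmgr          192.168.1.0/24          md5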

03. Edit the repmgr configuration files

::::::::::::::

node1.conf

::::::::::::::

node_id=1

node_name='node1'

conninfo='host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/n001'

pg_bindir='/usr/local/postgresql/bin'

::::::::::::::

node2.conf

::::::::::::::

node_id=2

node_name='node2'

conninfo='host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/n002'

pg_bindir='/usr/local/postgresql/bin'

::::::::::::::

node3.conf

::::::::::::::

node_id=3

node_name='node3'

conninfo='host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/n003'

pg_bindir='/usr/local/postgresql/bin'

04. Register the primary database (node1)

-bash-4.2$ repmgr -f /data/node1.conf  primary register

INFO: connecting to primary database...

NOTICE: attempting to install extension "repmgr"

NOTICE: "repmgr" extension successfully installed

NOTICE: primary node record (ID: 1) registered

-bash-4.2$ repmgr -f /data/node1.conf  cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2
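
cluster show reads the metadata that repmgr keeps in the repmgr database; if you prefer plain SQL, something like this should return the same record (a sketch based on the repmgr 5.x metadata schema):

psql -p 5432 repmgr -c 'SELECT node_id, node_name, type, upstream_node_id, priority, conninfo FROM repmgr.nodes;'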

05. Clone the standby databases (node2 and node3)

Clone commands:

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone --dry-run

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node3.conf standby clone --dry-run

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node3.conf standby clone

Clone log:

-bash-4.2$ repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone --dry-run

NOTICE: destination directory "/data/n002" provided

INFO: connecting to source node

DETAIL: connection string is: host=127.0.0.1 user=repmgr dbname=repmgr

DETAIL: current installation size is 31 MB

INFO: "repmgr" extension is installed in database "repmgr"

INFO: replication slot usage not requested;  no replication slot will be set up for this standby

INFO: parameter "max_wal_senders" set to 10

NOTICE: checking for available walsenders on the source node (2 required)

INFO: sufficient walsenders available on the source node

DETAIL: 2 required, 10 available

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: required number of replication connections could be made to the source server

DETAIL: 2 replication connections required

NOTICE: standby will attach to upstream node 1

HINT: consider using the -c/--fast-checkpoint option

INFO: would execute:

  /usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup"  -D /data/n002 -h 127.0.0.1 -p 5432 -U repmgr -X stream

INFO: all prerequisites for "standby clone" are met

-bash-4.2$ repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone

NOTICE: destination directory "/data/n002" provided

INFO: connecting to source node

DETAIL: connection string is: host=127.0.0.1 user=repmgr dbname=repmgr

DETAIL: current installation size is 31 MB

INFO: replication slot usage not requested;  no replication slot will be set up for this standby

NOTICE: checking for available walsenders on the source node (2 required)

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: creating directory "/data/n002"...

NOTICE: starting backup (using pg_basebackup)...

HINT: this may take some time; consider using the -c/--fast-checkpoint option

INFO: executing:

  /usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup"  -D /data/n002 -h 127.0.0.1 -p 5432 -U repmgr -X stream

NOTICE: standby clone (using pg_basebackup) complete

NOTICE: you can now start your PostgreSQL server

HINT: for example: pg_ctl -D /data/n002 start

HINT: after starting the server, you need to register this standby with "repmgr standby register"

Modify the ports for node2 and node3:

sed -i "s/#port = 5432/port = 5433/g" /data/n002/postgresql.conf

sed -i "s/#port = 5432/port = 5434/g" /data/n003/postgresql.conf

-bash-4.2$ cat /data/n002/postgresql.conf |grep port

port = 5433                             # (change requires restart)

#ssl_passphrase_command_supports_reload = off

                                        # supported by the operating system:

                                        # supported by the operating system:

                                        # supported by the operating system:

                                        #   %r = remote host and port

-bash-4.2$ cat /data/n003/postgresql.conf |grep port

port = 5434                             # (change requires restart)

#ssl_passphrase_command_supports_reload = off

                                        # supported by the operating system:

                                        # supported by the operating system:

                                        # supported by the operating system:

                                        #   %r = remote host and port

Start the standby databases and register them:

pg_ctl -D /data/n002 start

pg_ctl -D /data/n003 start

repmgr -f /data/node2.conf standby register

repmgr -f /data/node3.conf standby register

-bash-4.2$ pg_ctl -D /data/n002 start

waiting for server to start....2021-10-28 11:13:27.211 CST [5683] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

2021-10-28 11:13:27.212 CST [5683] LOG:  listening on IPv6 address "::1", port 5433

2021-10-28 11:13:27.212 CST [5683] LOG:  listening on IPv4 address "127.0.0.1", port 5433

2021-10-28 11:13:27.238 CST [5683] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"

2021-10-28 11:13:27.271 CST [5684] LOG:  database system was interrupted; last known up at 2021-10-28 11:09:14 CST

..2021-10-28 11:13:29.396 CST [5684] LOG:  entering standby mode

2021-10-28 11:13:29.418 CST [5684] LOG:  redo starts at 0/2000028

2021-10-28 11:13:29.440 CST [5684] LOG:  consistent recovery state reached at 0/2000138

2021-10-28 11:13:29.440 CST [5683] LOG:  database system is ready to accept read only connections

2021-10-28 11:13:29.482 CST [5718] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1

done

server started

-bash-4.2$ pg_ctl -D /data/n003 start

waiting for server to start....2021-10-28 11:13:33.327 CST [5734] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

2021-10-28 11:13:33.328 CST [5734] LOG:  listening on IPv6 address "::1", port 5434

2021-10-28 11:13:33.328 CST [5734] LOG:  listening on IPv4 address "127.0.0.1", port 5434

2021-10-28 11:13:33.350 CST [5734] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5434"

2021-10-28 11:13:33.397 CST [5736] LOG:  database system was interrupted; last known up at 2021-10-28 11:11:11 CST

.2021-10-28 11:13:35.226 CST [5736] LOG:  entering standby mode

.2021-10-28 11:13:35.258 CST [5736] LOG:  redo starts at 0/4000028

2021-10-28 11:13:35.271 CST [5736] LOG:  consistent recovery state reached at 0/4000100

2021-10-28 11:13:35.271 CST [5734] LOG:  database system is ready to accept read only connections

2021-10-28 11:13:35.285 CST [5764] LOG:  started streaming WAL from primary at 0/5000000 on timeline 1

done

server started

-bash-4.2$ repmgr -f /data/node2.conf standby register

INFO: connecting to local node "node2" (ID: 2)

INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)

INFO: standby registration complete

NOTICE: standby node "node2" (ID: 2) successfully registered

-bash-4.2$ repmgr -f /data/node3.conf standby register

INFO: connecting to local node "node3" (ID: 3)

INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)

INFO: standby registration complete

NOTICE: standby node "node3" (ID: 3) successfully registered

-bash-4.2$ repmgr -f /data/node3.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 1        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

06. Switchover with repmgr

# Switchover requires passwordless SSH (SSH equivalence) between the nodes for the postgres OS user.
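
A minimal sketch of that SSH setup for the postgres OS user (everything runs on 127.0.0.1 in this walkthrough; on separate hosts you would ssh-copy-id the key to every other node instead):

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh 127.0.0.1 hostname    # should succeed without a password prompt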

# Run on the standby to switch node2 over to primary

# Check:

repmgr standby switchover -f /data/node2.conf --siblings-follow --dry-run

# Perform the switchover:

repmgr standby switchover -f /data/node2.conf

# Point the remaining standby at the new primary:

repmgr  -f /data/node3.conf standby follow

# Switch the primary database to node2:

-bash-4.2$ repmgr standby switchover -f /data/node2.conf --siblings-follow --dry-run

NOTICE: checking switchover on node "node2" (ID: 2) in --dry-run mode

INFO: SSH connection to host "127.0.0.1" succeeded

INFO: able to execute "repmgr" on remote host "127.0.0.1"

INFO: all sibling nodes are reachable via SSH

INFO: 2 walsenders required, 10 available

INFO: demotion candidate is able to make replication connection to promotion candidate

INFO: 0 pending archive files

INFO: replication lag on this standby is 0 seconds

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node2" (ID: 2) would be promoted to primary; current primary "node1" (ID: 1) would be demoted to standby

INFO: following shutdown command would be run on node "node1":

  "/usr/local/postgresql/bin/pg_ctl  -D '/data/n001' -W -m fast stop"

INFO: parameter "shutdown_check_timeout" is set to 60 seconds

INFO: prerequisites for executing STANDBY SWITCHOVER are met

-bash-4.2$ repmgr standby switchover -f /data/node2.conf

NOTICE: executing switchover on node "node2" (ID: 2)

WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified

DETAIL: these nodes will remain attached to the current primary:

  node3 (node ID: 3)

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby

NOTICE: stopping current primary node "node1" (ID: 1)

NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)

DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl  -D '/data/n001' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/6000028

NOTICE: promoting standby to primary

DETAIL: promoting server "node2" (ID: 2) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node2" (ID: 2) was successfully promoted to primary

NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby

NOTICE: switchover was successful

DETAIL: node "node2" is now primary and node "node1" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

# node2 has now become the primary

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | primary | * running |          | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 1        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# Repoint node3 to replicate from the new primary (node2). Before the follow, its primary_conninfo still points at the old primary:

-bash-4.2$ cat n003/postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

primary_conninfo = 'host=127.0.0.1 user=repmgr application_name=node3 connect_timeout=2'

-bash-4.2$ repmgr  -f /data/node3.conf standby follow

NOTICE: attempting to find and follow current primary

INFO: timelines are same, this server is not ahead

DETAIL: local node lsn is 0/6000A28, follow target lsn is 0/6000A28

NOTICE: setting node 3's upstream to node 2

WARNING: node "node3" not found in "pg_stat_replication"

NOTICE: STANDBY FOLLOW successful

DETAIL: standby attached to upstream node "node2" (ID: 2)

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | primary | * running |          | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# The follow has rewritten primary_conninfo so it now points at the new primary database (port 5433).

-bash-4.2$ cat n003/postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

primary_conninfo = 'user=repmgr connect_timeout=2 host=127.0.0.1 port=5433 application_name=node3'

# Switch the primary database to node3:

# Check:

repmgr standby switchover -f /data/node3.conf --siblings-follow --dry-run

# Perform the switchover:

repmgr standby switchover -f /data/node3.conf

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | primary | * running |          | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

-bash-4.2$ repmgr standby switchover -f /data/node3.conf --siblings-follow --dry-run

NOTICE: checking switchover on node "node3" (ID: 3) in --dry-run mode

INFO: SSH connection to host "127.0.0.1" succeeded

INFO: able to execute "repmgr" on remote host "127.0.0.1"

INFO: all sibling nodes are reachable via SSH

INFO: 2 walsenders required, 10 available

INFO: demotion candidate is able to make replication connection to promotion candidate

INFO: 0 pending archive files

INFO: replication lag on this standby is 0 seconds

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node3" (ID: 3) would be promoted to primary; current primary "node2" (ID: 2) would be demoted to standby

INFO: following shutdown command would be run on node "node2":

  "/usr/local/postgresql/bin/pg_ctl  -D '/data/n002' -W -m fast stop"

INFO: parameter "shutdown_check_timeout" is set to 60 seconds

INFO: prerequisites for executing STANDBY SWITCHOVER are met

-bash-4.2$ repmgr standby switchover -f /data/node3.conf

NOTICE: executing switchover on node "node3" (ID: 3)

WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified

DETAIL: these nodes will remain attached to the current primary:

  node1 (node ID: 1)

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node3" (ID: 3) will be promoted to primary; current primary "node2" (ID: 2) will be demoted to standby

NOTICE: stopping current primary node "node2" (ID: 2)

NOTICE: issuing CHECKPOINT on node "node2" (ID: 2)

DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl  -D '/data/n002' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/7000028

NOTICE: promoting standby to primary

DETAIL: promoting server "node3" (ID: 3) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node3" (ID: 3) was successfully promoted to primary

NOTICE: node "node3" (ID: 3) promoted to primary, node "node2" (ID: 2) demoted to standby

NOTICE: switchover was successful

DETAIL: node "node3" is now primary and node "node2" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | standby |   running | node3    | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | primary | * running |          | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# Switch the primary database to node1:

# node1 is currently a cascading standby (its registered upstream is node2, not the primary), so it cannot be switched over directly; first make it follow node3:

repmgr  -f /data/node1.conf standby follow

# Check:

repmgr standby switchover -f /data/node1.conf --siblings-follow --dry-run

# Perform the switchover:

repmgr standby switchover -f /data/node1.conf

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | standby |   running | node3    | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | primary | * running |          | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# node1 is a cascading standby and cannot be promoted to primary directly.

-bash-4.2$ repmgr standby switchover -f /data/node1.conf --siblings-follow --dry-run

NOTICE: checking switchover on node "node1" (ID: 1) in --dry-run mode

ERROR: local node "node1" (ID: 1) is not a downstream of demotion candidate primary "node3" (ID: 3)

DETAIL: registered upstream node ID is 2

HINT: execute "repmgr standby register --force" to update the local node's metadata

# Only after node1 is made a direct standby of node3 can the switchover to primary proceed normally.

-bash-4.2$ repmgr  -f /data/node1.conf standby follow

NOTICE: attempting to find and follow current primary

INFO: timelines are same, this server is not ahead

DETAIL: local node lsn is 0/7000E30, follow target lsn is 0/7000E30

NOTICE: setting node 1's upstream to node 3

WARNING: node "node1" not found in "pg_stat_replication"

NOTICE: STANDBY FOLLOW successful

DETAIL: standby attached to upstream node "node3" (ID: 3)

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node3    | default  | 100      | 3        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | standby |   running | node3    | default  | 100      | 3        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | primary | * running |          | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# --siblings-follow repoints the other standbys at the new primary

-bash-4.2$ repmgr standby switchover -f /data/node1.conf --siblings-follow

NOTICE: executing switchover on node "node1" (ID: 1)

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node1" (ID: 1) will be promoted to primary; current primary "node3" (ID: 3) will be demoted to standby

NOTICE: stopping current primary node "node3" (ID: 3)

NOTICE: issuing CHECKPOINT on node "node3" (ID: 3)

DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl  -D '/data/n003' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/8000028

NOTICE: promoting standby to primary

DETAIL: promoting server "node1" (ID: 1) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node1" (ID: 1) was successfully promoted to primary

NOTICE: node "node1" (ID: 1) promoted to primary, node "node3" (ID: 3) demoted to standby

NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings

INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes

NOTICE: switchover was successful

DETAIL: node "node1" is now primary and node "node3" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 4        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

07. Failover (promoting a standby server with repmgr)

# Check the cluster status

repmgr  -f /data/node1.conf cluster show

-bash-4.2$ repmgr  -f /data/node1.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 4        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
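
Before simulating the failure, it is also worth confirming on the primary itself that both standbys are streaming; a quick check using the standard pg_stat_replication view (not repmgr) might look like this:

psql -p 5432 -c "SELECT application_name, state, sync_state, replay_lsn FROM pg_stat_replication;"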

# Stop the primary database

-bash-4.2$ pg_ctl -D /data/n001/ stop

waiting for server to shut down..... done

server stopped

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                           

----+-------+---------+---------------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | ? unreachable | ?        | default  | 100      |          | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2  

2  | node2 | standby |   running     | ? node1  | default  | 100      | 4        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running     | ? node1  | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

  - unable to connect to node "node1" (ID: 1)

  - node "node1" (ID: 1) is registered as an active primary but is unreachable

  - unable to connect to node "node2" (ID: 2)'s upstream node "node1" (ID: 1)

  - unable to determine if node "node2" (ID: 2) is attached to its upstream node "node1" (ID: 1)

  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)

  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

# Promote node2 to primary

-bash-4.2$ repmgr  -f /data/node2.conf standby promote

WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified

DETAIL: these nodes will remain attached to the current primary:

  node3 (node ID: 3)

NOTICE: promoting standby to primary

DETAIL: promoting server "node2" (ID: 2) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node2" (ID: 2) was successfully promoted to primary

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | ? node1  | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

  - unable to connect to node "node1" (ID: 1)

  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)

  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

# Repoint node3 at the new primary (node2)

-bash-4.2$ repmgr  -f /data/node3.conf standby follow

NOTICE: attempting to find and follow current primary

INFO: local node 3 can attach to follow target node 2

DETAIL: local node's recovery point: 0/90000A0; follow target node's fork point: 0/90000A0

NOTICE: setting node 3's upstream to node 2

WARNING: node "node3" attached in state "startup"

NOTICE: STANDBY FOLLOW successful

DETAIL: standby attached to upstream node "node2" (ID: 2)

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

  - unable to connect to node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

# Repair node1 and rejoin it to the cluster

# pg_rewind requires that either wal_log_hints is enabled or data checksums were enabled when the cluster was initialized (wal_log_hints = on was set above).
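
A quick way to confirm that precondition on the stopped node is to look at its control file (a sketch; the exact field wording can differ slightly between PostgreSQL versions):

pg_controldata /data/n001 | grep -iE 'wal_log_hints|checksum'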

repmgr -f /data/node1.conf node rejoin -d 'host=127.0.0.1 port=5433 user=repmgr dbname=repmgr' --force-rewind --verbose --dry-run

-bash-4.2$ repmgr -f /data/node1.conf node rejoin -d 'host=127.0.0.1 port=5433 user=repmgr dbname=repmgr'

NOTICE: rejoin target is node "node2" (ID: 2)

INFO: local node 1 can attach to rejoin target node 2

DETAIL: local node's recovery point: 0/9000028; rejoin target node's fork point: 0/90000A0

NOTICE: setting node 1's upstream to node 2

WARNING: unable to ping "host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl  -w -D '/data/n001' start"

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 2

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 4        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 5        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2
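
repmgr also records the registrations, switchovers and promotions performed above in its event log; reviewing it, and cross-checking connectivity between all nodes, is a convenient final sanity check, for example:

repmgr -f /data/node2.conf cluster event

repmgr -f /data/node2.conf cluster crosscheck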

