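This article is only meant to give a first look at managing services as cluster resources with corosync + pacemaker; I will dig deeper later if I actually end up using it. It is powerful, but chances are I will never need it.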

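The prerequisites for configuring a cluster with corosync + pacemaker are:

Time is synchronized across all nodes;
All nodes can reach each other using the hostnames they are currently using;
Decide whether a quorum device will be needed.

I plan to use the following two hosts:

Hostname	IP	OS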
node1	10.0.1.201	CentOS 7
node2	10.0.1.202	CentOS 7

Note: edit the hosts file on both hosts as follows:

$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.1.201      node1
10.0.1.202      node2
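
The hosts entries above can also be added by script so that repeated runs do not create duplicates. This is a minimal sketch rather than part of the original walkthrough; the helper name add_host_entry is hypothetical, and it only checks whether the name already appears as a hostname or alias on a non-comment line.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME already
# appears as a hostname or alias (any field after the IP) on a non-comment line.
add_host_entry() {
    file=$1; ip=$2; name=$3
    if ! awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# Usage (as root, on both nodes):
#   add_host_entry /etc/hosts 10.0.1.201 node1
#   add_host_entry /etc/hosts 10.0.1.202 node2
```

Trying the helper on a scratch copy of /etc/hosts first is a cheap way to confirm it behaves as expected.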

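Installing and starting the cluster with PCS

The steps below apply to CentOS 7 only.

Installing and starting pcsd

pcsd is a daemon that runs on every node; the pcs client tool talks to the nodes through it.
Run the following steps on both hosts.

1. Install PCS: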

$ yum -y install pcs

2. Start the pcsd service and enable it at boot:

$ systemctl start pcsd && systemctl enable pcsd

3. Set a password for the hacluster user:

$ echo 123 | passwd hacluster --stdin

Configuring corosync

corosync can be configured from any one of the hosts; I use node1 here.

1. Authenticate the nodes:

$ pcs cluster auth node1 node2 -u hacluster -p 123
node1: Authorized
node2: Authorized

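If you get the error "Error: Unable to communicate with node1", run yum update -y nss curl libcurl on both hosts to update the SSL-related components.

2. Set up the cluster:

# testcluster is a user-chosen cluster name; node1 and node2 are the nodes in this cluster.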
$ pcs cluster setup --name testcluster node1 node2
Destroying cluster on nodes: node1, node2...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node1', 'node2'
node1: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded

Synchronizing pcsd certificates on nodes node1, node2...
node1: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node2: Success

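3. Once the command above succeeds, the corosync configuration file is generated automatically at /etc/corosync/corosync.conf. Its contents, with each item explained, are as follows:

# totem section: common settings for communication between the nodes
totem {
    # Version of the configuration format
    version: 2
    # Cluster name
    cluster_name: testcluster
    # Whether to authenticate and encrypt cluster traffic
    secauth: off
    # Transport protocol (udpu is UDP unicast)
    transport: udpu
}
# Node list
nodelist {
    # Defines one node
    node {
        # Ring 0 address: where messages for this node can be delivered directly
        ring0_addr: node1
        # Node ID
        nodeid: 1
    }

    node {
        ring0_addr: node2
        nodeid: 2
    }
}
# Quorum mechanism
quorum {
    # Provider
    provider: corosync_votequorum
    # Whether this is a two-node cluster
    two_node: 1
}
# Logging configuration
logging {
    # Whether to log to a file
    to_logfile: yes
    # Location of the log file
    logfile: /var/log/cluster/corosync.log
    # Whether to log through rsyslog
    to_syslog: yes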
}
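
Because the file is generated, it can be worth confirming that both nodes received the same node list and quorum settings. The following is a small sketch under the assumption that each setting sits on its own "key: value" line as in the listing above; the helper name conf_values is hypothetical.

```shell
# Hypothetical helper: read a corosync.conf on stdin and print the value of
# every line whose first field is "KEY:". Commented-out lines do not match
# because their first field is "#".
conf_values() {
    awk -v k="$1:" '$1 == k { print $2 }'
}

# Usage on each node:
#   conf_values ring0_addr < /etc/corosync/corosync.conf   # expect node1 and node2
#   conf_values two_node   < /etc/corosync/corosync.conf   # expect 1
```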

Starting the cluster

1. Run the following command on either node to start the cluster:

$ pcs cluster start --all
node1: Starting Cluster (corosync)...
node2: Starting Cluster (corosync)...
node1: Starting Cluster (pacemaker)...
node2: Starting Cluster (pacemaker)...

2. Check the communication status of each node:

# Run on node1
$ corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 10.0.1.201
        status  = ring 0 active with no faults

# Run on node2
$ corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 10.0.1.202
        status  = ring 0 active with no faults
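
The ring check lends itself to scripting. Below is a minimal sketch that assumes the output format shown above (one "status = ..." line per ring, containing "no faults" when healthy); other corosync versions may print a different format. The helper name rings_healthy is hypothetical.

```shell
# Hypothetical helper: read `corosync-cfgtool -s` output on stdin and succeed
# only if at least one ring is listed and every status line says "no faults".
rings_healthy() {
    awk '/status[[:space:]]*=/ { total++; if ($0 ~ /no faults/) ok++ }
         END { exit !(total > 0 && ok == total) }'
}

# Usage on each node:
#   corosync-cfgtool -s | rings_healthy && echo "rings OK"
```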

3. Check cluster membership and the Quorum API:

$ corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.1.201) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.1.202) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
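
To turn the membership listing into a single number, the joined members can be counted. This is a sketch that assumes the key layout shown above (members.<id>.status with the value in the last field); count_joined is a hypothetical name.

```shell
# Hypothetical helper: read `corosync-cmapctl` output on stdin and print how
# many members report the status "joined".
count_joined() {
    awk '/members\.[0-9]+\.status/ && $NF == "joined" { n++ } END { print n + 0 }'
}

# Usage (expects 2 for the two-node cluster in this article):
#   corosync-cmapctl | count_joined
```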

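4. View the current cluster and resource status:

$ pcs status
# Cluster name
Cluster name: testcluster
# Warning issued because stonith is enabled but no stonith device is configured; stonith devices are used to fence failed nodes.
WARNINGS:
No stonith devices and stonith-enabled is not false
# Protocol stack (message layer) that carries cluster messages
Stack: corosync
# Designated Coordinator (DC), the node that makes cluster-wide decisions
Current DC: node1 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
# Time the cluster status was last updated
Last updated: Sun Apr 12 17:34:55 2020
# Time the cluster configuration was last changed
Last change: Sun Apr 12 17:28:32 2020 by hacluster via crmd on node1
# Number of configured nodes
2 nodes configured
# Number of configured resources
0 resources configured
# List of online nodes
Online: [ node1 node2 ]
# No resources are configured
No resources

# Daemon status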
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

5. Since we have no stonith device here, stonith can be disabled in the global configuration with the following command:

$ pcs property set stonith-enabled=false
# View the properties we have changed
$ pcs property list
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: testcluster
 dc-version: 1.1.20-5.el7_7.2-3c4c782f70
 have-watchdog: false
 stonith-enabled: false

6. Check that the configuration is valid:

$ crm_verify -LV

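Managing the cluster with crmsh

Installation and basic usage

crmsh is not provided in the default yum repositories and has to be downloaded and installed manually. The author shares the RPM packages for crmsh and its dependencies through the WeChat official account shown at the top of the article (send #pcs).

Install the following packages: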

$ ls
crmsh-3.0.0-6.2.noarch.rpm          pssh-2.3.1-7.7.noarch.rpm              python-pssh-2.3.1-7.7.noarch.rpm
crmsh-scripts-3.0.0-6.2.noarch.rpm  python-parallax-1.0.1-29.1.noarch.rpm
$ yum -y install *.rpm

crmsh can be used in two modes: directly from the bash prompt, or interactively by running crm to enter the crmsh prompt. See the examples below:

# Example: view the cluster status
$ crm status
Stack: corosync
Current DC: node1 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Sun Apr 12 18:03:22 2020
Last change: Sun Apr 12 17:46:05 2020 by root via cibadmin on node2

2 nodes configured
0 resources configured

Online: [ node1 node2 ]

No active resources

# View the current configuration
$ crm 
crm(live)# configure
crm(live)configure# show
node 1: node1
node 2: node2
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.20-5.el7_7.2-3c4c782f70 \
        cluster-infrastructure=corosync \
        cluster-name=testcluster \
        stonith-enabled=false

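Making a web service highly available

1. Nginx is used as the example here. Install and configure Nginx on both nodes so that each serves a page, with the following result: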

$ curl 10.0.1.201
from node1
$ curl 10.0.1.202
from node2

2. Because the Nginx service will be handed to the resource manager as a resource, stop and disable the web service on every node:

$ systemctl stop nginx && systemctl disable nginx

3. Check whether Nginx is available through the systemd resource agent class:

$ crm ra list systemd | grep nginx
nfslock                                           nginx                                             ntpdate

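To list all resource agent classes, use crm ra classes.

4. Configure the primitive resources. There are two here: a VIP (I use 10.0.1.200) and the web service, i.e. Nginx. Configure the IP resource first:

$ crm configure 
# Configure an IP resource
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=10.0.1.200
# Verify that the configuration is correct
crm(live)configure# verify
# Commit the configuration
crm(live)configure# commit
# Return to the top level of crmsh
crm(live)configure# cd
# View the current status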
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Sun Apr 12 19:14:48 2020
Last change: Sun Apr 12 19:14:43 2020 by root via cibadmin on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Full list of resources:
# webip has been started on node1
 webip  (ocf::heartbeat:IPaddr):        Started node1

crm(live)# quit
bye
# The VIP is now configured on eth0
$ ip a | grep 'inet 10.0'
    inet 10.0.1.201/24 brd 10.0.1.255 scope global noprefixroute eth0
    inet 10.0.1.200/24 brd 10.0.1.255 scope global secondary eth0
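
Checking which node currently holds the VIP comes up repeatedly in the steps that follow, so it can be wrapped in a helper. This is a sketch that assumes the `ip` output format shown above, where each IPv4 address appears as "inet ADDRESS/PREFIX"; the name has_addr is hypothetical.

```shell
# Hypothetical helper: read `ip addr` output on stdin and succeed if the
# given IPv4 address is configured on any interface. The prefix length is
# stripped so that 10.0.1.20 does not match 10.0.1.200.
has_addr() {
    awk -v want="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }'
}

# Usage on a node:
#   ip -4 addr show | has_addr 10.0.1.200 && echo "VIP is on this node"
```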

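5. To move the resource to the other node, put the current node into standby mode with the following commands:

$ crm node
# Run standby to put the current node into standby mode
crm(live)node# standby 
crm(live)node# cd
# View the status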
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Sun Apr 12 19:21:05 2020
Last change: Sun Apr 12 19:20:50 2020 by root via crm_attribute on node1

2 nodes configured
1 resource configured

Node node1: standby
Online: [ node2 ]

Full list of resources:
# The VIP has migrated to node2
 webip  (ocf::heartbeat:IPaddr):        Started node2

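6. A node can of course be brought back online manually. For example, to bring node1, which is currently in standby mode, back online, run the following command on node1: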

$ crm node online 

$ crm status
Stack: corosync
Current DC: node1 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Sun Apr 12 19:23:36 2020
Last change: Sun Apr 12 19:23:30 2020 by root via crm_attribute on node1

2 nodes configured
1 resource configured

Online: [ node1 node2 ]

Full list of resources:
# The resource has not migrated back from node2 to node1
 webip  (ocf::heartbeat:IPaddr):        Started node2

7. Next, configure the web service resource:

$ crm configure 
# Configure a web service resource
crm(live)configure# primitive webserver systemd:nginx
crm(live)configure# verify
crm(live)configure# cd 
There are changes pending. Do you want to commit them (y/n)? y
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Sun Apr 12 19:27:32 2020
Last change: Sun Apr 12 19:27:30 2020 by root via cibadmin on node1

2 nodes configured
2 resources configured

Online: [ node1 node2 ]

Full list of resources:
# By default, resources are spread across the nodes: the IP resource is on node2 and the web service resource is on node1
 webip  (ocf::heartbeat:IPaddr):        Started node2
 webserver      (systemd:nginx):        Starting node1

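8. We obviously need the VIP and the web service on the same node. For that, put the IP resource and the web resource into a resource group, as follows:

$ crm configure
# Put the resources webip and webserver into the group webservice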
crm(live)configure# group webservice webip webserver
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Stack: corosync
Current DC: node1 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
Last updated: Sun Apr 12 19:32:00 2020
Last change: Sun Apr 12 19:31:57 2020 by root via cibadmin on node1

2 nodes configured
2 resources configured

Online: [ node1 node2 ]

Full list of resources:
# The resources webip and webserver are now on the same node
 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):        Started node2
     webserver  (systemd:nginx):        Starting node2
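
A group keeps its members on one node and starts them in the listed order. For comparison, the same relationship can be expressed with explicit colocation and order constraints instead of a group. The sketch below is an alternative, not something to apply on top of the group; the constraint IDs and the helper names run and pin_together are hypothetical, and by default it only prints the crm commands rather than running them.

```shell
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: keep SECOND on the same node as FIRST, and start
# FIRST before SECOND. Constraint IDs are made up for illustration.
pin_together() {
    first=$1; second=$2
    run crm configure colocation "${second}_with_${first}" inf: "$second" "$first"
    run crm configure order "${first}_before_${second}" Mandatory: "$first" "$second"
}

# Usage (prints the two commands without touching the cluster):
#   pin_together webip webserver
```

For a simple pair like this, the group is the more compact choice; explicit constraints become useful when resources need to be colocated without a fixed start order, or the other way around.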

9. Test access through the VIP:

# The resources are on node2 at this point, so node2 answers the request
$ curl 10.0.1.200
from node2

10. To have node1 answer instead, put node2 into standby mode by running the following on node2:

# Switch node2 to standby mode
$ crm node standby
# Test the VIP again; the resources are now on node1
$ curl 10.0.1.200
from node1
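
Failover takes a moment, so a single curl right after standby may still fail or hit the old node. The following sketch polls the VIP until the expected body comes back; the helper name wait_for_body and the POLL_INTERVAL variable are hypothetical, and the expected strings are the test pages used in this article.

```shell
# Hypothetical helper: poll URL until its body equals EXPECTED, trying up to
# TRIES times (default 10) with POLL_INTERVAL seconds (default 1) in between.
wait_for_body() {
    url=$1; expected=$2; tries=${3:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # A failed request is treated as an empty body and retried.
        body=$(curl -s --max-time 2 "$url") || body=
        if [ "$body" = "$expected" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "${POLL_INTERVAL:-1}"
    done
    return 1
}

# Usage after putting node2 into standby:
#   wait_for_body http://10.0.1.200 "from node1" && echo "failover complete"
```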

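In the tests above, putting node1 into standby moved the resource from node1 to node2, but once node1 came back online the resource stayed on node2. To give a resource a default node, configure the resource's preference for that node (a location constraint with a score), which Pacemaker weighs against the resource's stickiness, i.e. its tendency to stay on the node where it is currently running.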
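
As a rough illustration of that last point, the sketch below builds the crmsh commands for a location preference and a default stickiness. The constraint ID, the scores 100 and 50, and the helper names are assumptions chosen for the example; by default the commands are only printed, not executed.

```shell
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: make resource RSC prefer NODE with the given SCORE.
prefer_node() {
    rsc=$1; node=$2; score=$3
    run crm configure location "${rsc}_prefers_${node}" "$rsc" "${score}:" "$node"
}

# Hypothetical helper: set the default stickiness for all resources.
set_default_stickiness() {
    run crm configure rsc_defaults "resource-stickiness=$1"
}

# Usage (prints the commands without touching the cluster):
#   prefer_node webservice node1 100
#   set_default_stickiness 50
```

With a preference that outweighs the stickiness, as in these example scores, the group would be expected to move back to node1 once node1 is online again; with the stickiness higher than the preference, it would stay where it is.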
