I set up a simple ZooKeeper cluster in a local environment.

There are two local servers: one (10.211.55.18) runs 4 ZooKeeper instances and the other (10.211.55.12) runs 3.

The zoo.cfg of each instance looks roughly like this:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/user/Tools/zookeeper/zookeeper
# the port at which the clients will connect
# clientPort must be different for every instance on the same machine
clientPort=2180
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=10.211.55.18:2888:3888
server.2=10.211.55.18:2889:3889
server.3=10.211.55.18:2890:3890
server.4=10.211.55.18:2891:3891
server.5=10.211.55.12:2892:3892
server.6=10.211.55.12:2893:3893
server.7=10.211.55.12:2894:3894
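
Every instance needs its own copy of this file, so the server.N lines are easy to get wrong by hand. The Python sketch below regenerates the seven entries above from a list of (IP, instance count) pairs. The helper name server_lines and the port scheme (2888 + N - 1 and 3888 + N - 1) are assumptions read off the listing, not something ZooKeeper requires. Each instance also needs its own dataDir containing a myid file whose content is its N.

```python
def server_lines(hosts, quorum_base=2888, election_base=3888):
    """Build server.N entries for several instances per host.

    hosts: list of (ip, instance_count) tuples, in id order.
    The port scheme is an assumption copied from the listing above.
    """
    lines = []
    sid = 1  # server ids start at 1 and increase across hosts
    for ip, count in hosts:
        for _ in range(count):
            quorum_port = quorum_base + sid - 1
            election_port = election_base + sid - 1
            lines.append(f"server.{sid}={ip}:{quorum_port}:{election_port}")
            sid += 1
    return lines


if __name__ == "__main__":
    for line in server_lines([("10.211.55.18", 4), ("10.211.55.12", 3)]):
        print(line)
```

Ports only need to be distinct among instances that share an IP; here they are kept unique across the whole ensemble for simplicity.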

Start the 7 nodes in order. server.7 is the leader and the others are followers.

Experiment 1: Disconnect the network of the machine hosting 4 nodes

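Change the network settings of the machine hosting 4 nodes so that the two machines can no longer reach each other. The 4 nodes keep serving normally.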
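
This is what ZooKeeper's default majority quorum predicts: an ensemble of 7 needs 4 members that can talk to each other, so the side with 4 instances still has a quorum and the side with 3 does not. A minimal Python sketch of that arithmetic follows; the function names are mine and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def can_serve(reachable, ensemble_size):
    """True if a partition with `reachable` members still has a majority."""
    return reachable >= quorum_size(ensemble_size)


if __name__ == "__main__":
    ENSEMBLE = 7
    for side in (4, 3):
        print(side, can_serve(side, ENSEMBLE))
    # prints:
    # 4 True
    # 3 False
```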

The 3 nodes on the other machine can no longer serve requests. The server-side log (from server.7) is as follows:

2019-04-08 17:03:39,935 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:Learner@332] - Getting a diff from the leader 0xb00000001
2019-04-08 17:22:18,959 [myid:7] - WARN  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:Follower@119] - Got zxid 0xc00000001 expected 0x1
2019-04-08 17:23:02,597 [myid:7] - WARN  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:Follower@90] - Exception when following the leader
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
	at java.io.DataInputStream.readInt(DataInputStream.java:387)
	at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
	at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
	at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
	at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
	at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:86)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
2019-04-08 17:23:02,622 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:Follower@169] - shutdown called
java.lang.Exception: shutdown Follower
	at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:169)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:985)
2019-04-08 17:23:02,650 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:FollowerZooKeeperServer@140] - Shutting down
2019-04-08 17:23:02,626 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 6 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 6 (n.sid), 0xc (n.peerEpoch) FOLLOWING (my state)
2019-04-08 17:23:02,665 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:ZooKeeperServer@501] - shutting down
2019-04-08 17:23:02,675 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:FollowerRequestProcessor@107] - Shutting down
2019-04-08 17:23:02,692 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:CommitProcessor@184] - Shutting down
2019-04-08 17:23:02,694 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 6 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 5 (n.sid), 0xc (n.peerEpoch) FOLLOWING (my state)
2019-04-08 17:23:02,693 [myid:7] - INFO  [FollowerRequestProcessor:7:FollowerRequestProcessor@97] - FollowerRequestProcessor exited loop!
2019-04-08 17:23:02,696 [myid:7] - INFO  [CommitProcessor:7:CommitProcessor@153] - CommitProcessor exited loop!
2019-04-08 17:23:02,696 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:FinalRequestProcessor@403] - shutdown of request processor complete
2019-04-08 17:23:02,730 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:SyncRequestProcessor@208] - Shutting down
2019-04-08 17:23:02,731 [myid:7] - INFO  [SyncThread:7:SyncRequestProcessor@186] - SyncRequestProcessor exited!
2019-04-08 17:23:02,734 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:QuorumPeer@909] - LOOKING
2019-04-08 17:23:02,779 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:FastLeaderElection@813] - New election. My id =  7, proposed zxid=0xc00000001
2019-04-08 17:23:02,799 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 7 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:02,802 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 5 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:02,805 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 6 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:03,007 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 6 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:03,009 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 5 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:03,212 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:FastLeaderElection@847] - Notification time out: 400
2019-04-08 17:23:03,212 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 7 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:03,615 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:FastLeaderElection@847] - Notification time out: 800
2019-04-08 17:23:03,616 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 7 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:03,616 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 5 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:03,618 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 6 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:04,420 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:FastLeaderElection@847] - Notification time out: 1600
2019-04-08 17:23:04,421 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 6 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:04,422 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 5 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:04,424 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 7 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:06,023 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 6 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:07,623 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:FastLeaderElection@847] - Notification time out: 3200
2019-04-08 17:23:07,624 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 5 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:07,626 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), LOOKING (n.state), 7 (n.sid), 0xc (n.peerEpoch) LOOKING (my state)
2019-04-08 17:23:10,828 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:FastLeaderElection@847] - Notification time out: 6400
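
The "Notification time out" values in the log above double each time (400, 800, 1600, 3200, 6400): while no quorum is reachable, the node waits longer and longer between election rounds. The Python sketch below reproduces that sequence. The starting value of 200 ms and the cap of 60000 ms are what I believe the 3.4.x FastLeaderElection code uses; treat both as assumptions and check them against the source of your version.

```python
def notification_timeouts(start_ms=200, cap_ms=60000, steps=5):
    """Doubling backoff with an upper bound.

    start_ms and cap_ms are assumed defaults, not values taken from the log.
    """
    timeouts = []
    current = start_ms
    for _ in range(steps):
        current = min(current * 2, cap_ms)  # double, but never exceed the cap
        timeouts.append(current)
    return timeouts


if __name__ == "__main__":
    print(notification_timeouts())
    # prints: [400, 800, 1600, 3200, 6400]
```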

The client reports errors:

[zk: 10.211.55.12:2181(CONNECTED) 3] 2019-04-08 17:23:02,589 [myid:] - INFO  [main-SendThread(10.211.55.12:2181):ClientCnxn$SendThread@1161] - Unable to read additional data from server sessionid 0x600005c14180000, likely server has closed socket, closing socket connection and attempting reconnect

WATCHER::

WatchedEvent state:Disconnected type:None path:null
2019-04-08 17:23:04,381 [myid:] - INFO  [main-SendThread(10.211.55.12:2181):ClientCnxn$SendThread@1028] - Opening socket connection to server 10.211.55.12/10.211.55.12:2181. Will not attempt to authenticate using SASL (unknown error)
2019-04-08 17:23:04,382 [myid:] - INFO  [main-SendThread(10.211.55.12:2181):ClientCnxn$SendThread@878] - Socket connection established to 10.211.55.12/10.211.55.12:2181, initiating session
2019-04-08 17:23:04,383 [myid:] - INFO  [main-SendThread(10.211.55.12:2181):ClientCnxn$SendThread@1161] - Unable to read additional data from server sessionid 0x600005c14180000, likely server has closed socket, closing socket connection and attempting reconnect

After the network is restored, the 3 nodes recover. The server-side log:

2019-04-08 17:26:36,448 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:QuorumPeer@991] - LEADING
2019-04-08 17:26:36,448 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:ZooKeeperServer@173] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/zhk/Tools/zookeeper2/version-2 snapdir /home/zhk/Tools/zookeeper2/version-2
2019-04-08 17:26:36,448 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:Leader@372] - LEADING - LEADER ELECTION TOOK - 213669
2019-04-08 17:26:37,407 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41572:LearnerHandler@346] - Follower sid: 2 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@e54df42
2019-04-08 17:26:37,408 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41574:LearnerHandler@346] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@7922966f
2019-04-08 17:26:37,410 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41576:LearnerHandler@346] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@9cfff21
2019-04-08 17:26:37,416 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58626:LearnerHandler@346] - Follower sid: 6 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@64fc06b2
2019-04-08 17:26:37,420 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41576:LearnerHandler@401] - Synchronizing with Follower sid: 1 maxCommittedLog=0xc00000001 minCommittedLog=0x500000001 peerLastZxid=0xc00000001
2019-04-08 17:26:37,420 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41572:LearnerHandler@401] - Synchronizing with Follower sid: 2 maxCommittedLog=0xc00000001 minCommittedLog=0x500000001 peerLastZxid=0xc00000001
2019-04-08 17:26:37,421 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41572:LearnerHandler@410] - leader and follower are in sync, zxid=0xc00000001
2019-04-08 17:26:37,421 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41572:LearnerHandler@475] - Sending DIFF
2019-04-08 17:26:37,420 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41574:LearnerHandler@401] - Synchronizing with Follower sid: 3 maxCommittedLog=0xc00000001 minCommittedLog=0x500000001 peerLastZxid=0xc00000001
2019-04-08 17:26:37,422 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41574:LearnerHandler@410] - leader and follower are in sync, zxid=0xc00000001
2019-04-08 17:26:37,422 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41574:LearnerHandler@475] - Sending DIFF
2019-04-08 17:26:37,421 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58628:LearnerHandler@346] - Follower sid: 5 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@6d8f3c9d
2019-04-08 17:26:37,421 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41576:LearnerHandler@410] - leader and follower are in sync, zxid=0xc00000001
2019-04-08 17:26:37,424 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41576:LearnerHandler@475] - Sending DIFF
2019-04-08 17:26:37,423 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58626:LearnerHandler@401] - Synchronizing with Follower sid: 6 maxCommittedLog=0xc00000001 minCommittedLog=0x500000001 peerLastZxid=0xc00000001
2019-04-08 17:26:37,422 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41578:LearnerHandler@346] - Follower sid: 4 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@2be19154
2019-04-08 17:26:37,424 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58626:LearnerHandler@410] - leader and follower are in sync, zxid=0xc00000001
2019-04-08 17:26:37,425 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58626:LearnerHandler@475] - Sending DIFF
2019-04-08 17:26:37,427 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41578:LearnerHandler@401] - Synchronizing with Follower sid: 4 maxCommittedLog=0xc00000001 minCommittedLog=0x500000001 peerLastZxid=0xc00000001
2019-04-08 17:26:37,428 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41578:LearnerHandler@410] - leader and follower are in sync, zxid=0xc00000001
2019-04-08 17:26:37,428 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41578:LearnerHandler@475] - Sending DIFF
2019-04-08 17:26:37,439 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58628:LearnerHandler@401] - Synchronizing with Follower sid: 5 maxCommittedLog=0xc00000001 minCommittedLog=0x500000001 peerLastZxid=0xc00000001
2019-04-08 17:26:37,450 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58628:LearnerHandler@410] - leader and follower are in sync, zxid=0xc00000001
2019-04-08 17:26:37,450 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58628:LearnerHandler@475] - Sending DIFF
2019-04-08 17:26:37,451 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58626:LearnerHandler@535] - Received NEWLEADER-ACK message from 6
2019-04-08 17:26:37,454 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41574:LearnerHandler@535] - Received NEWLEADER-ACK message from 3
2019-04-08 17:26:37,454 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41576:LearnerHandler@535] - Received NEWLEADER-ACK message from 1
2019-04-08 17:26:37,454 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41578:LearnerHandler@535] - Received NEWLEADER-ACK message from 4
2019-04-08 17:26:37,455 [myid:7] - INFO  [QuorumPeer[myid=7]/0:0:0:0:0:0:0:0:2182:Leader@962] - Have quorum of supporters, sids: [ 1,3,6,7 ]; starting up and setting last processed zxid: 0xd00000000
2019-04-08 17:26:37,454 [myid:7] - INFO  [LearnerHandler-/10.211.55.18:41572:LearnerHandler@535] - Received NEWLEADER-ACK message from 2
2019-04-08 17:26:37,461 [myid:7] - INFO  [LearnerHandler-/10.211.55.12:58628:LearnerHandler@535] - Received NEWLEADER-ACK message from 5
2019-04-08 17:26:44,369 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), FOLLOWING (n.state), 4 (n.sid), 0xd (n.peerEpoch) LEADING (my state)
2019-04-08 17:26:44,370 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), FOLLOWING (n.state), 3 (n.sid), 0xd (n.peerEpoch) LEADING (my state)
2019-04-08 17:26:44,373 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), FOLLOWING (n.state), 4 (n.sid), 0xd (n.peerEpoch) LEADING (my state)
2019-04-08 17:26:44,373 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), FOLLOWING (n.state), 3 (n.sid), 0xd (n.peerEpoch) LEADING (my state)
2019-04-08 17:26:44,374 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), FOLLOWING (n.state), 3 (n.sid), 0xd (n.peerEpoch) LEADING (my state)
2019-04-08 17:26:44,375 [myid:7] - INFO  [WorkerReceiver[myid=7]:FastLeaderElection@595] - Notification: 1 (message format version), 7 (n.leader), 0xc00000001 (n.zxid), 0xc (n.round), FOLLOWING (n.state), 3 (n.sid), 0xd (n.peerEpoch) LEADING (my state)
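
The zxid values in these logs are easier to read once split: a zxid is a 64-bit number whose high 32 bits are the epoch and whose low 32 bits are a counter within that epoch. The Python sketch below (the helper name split_zxid is mine) decodes the values seen above. It shows that the followers were last at epoch 0xc and that the newly elected leader starts epoch 0xd with the counter reset to 0.

```python
def split_zxid(zxid):
    """Return (epoch, counter) for a 64-bit ZooKeeper zxid."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter


if __name__ == "__main__":
    for raw in ("0xc00000001", "0xd00000000"):
        epoch, counter = split_zxid(int(raw, 16))
        print(raw, "epoch", hex(epoch), "counter", counter)
    # prints:
    # 0xc00000001 epoch 0xc counter 1
    # 0xd00000000 epoch 0xd counter 0
```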