Time synchronization strategy for BIGDATA nodes
HBase and other BIGDATA management systems manage data consistency using timestamps, so they are designed to be very sensitive to time synchronization across nodes. Out-of-sync nodes usually result in region server isolation and shutdown, for example:
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:284)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2104)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:857)
    at java.lang.Thread.run(Thread.java:701)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server myclusterhbasemaster,60020,1428702750856 has been rejected; Reported time is too far out of sync with master. Time difference of 132440ms > max allowed of 30000ms
    at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:345)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:238)
If enough region servers are lost, the remaining region servers, including the master, are overwhelmed with reassigned regions, which eventually causes catastrophic failure.
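To catch this early, you could scan the region server logs for the rejection message and pull out the numbers. The following is only a minimal sketch: the function name extract_clock_skew and the REGIONSERVER_LOG variable are placeholders of mine, not part of HBase, and the pattern assumes the message wording shown in the trace above.

```shell
# Hypothetical helper: read HBase log text on stdin and print
# "<observed_ms> <allowed_ms>" for every clock-skew rejection found.
# Assumes the message wording "Time difference of Nms > max allowed of Mms".
extract_clock_skew() {
  sed -n 's/.*Time difference of \([0-9][0-9]*\)ms > max allowed of \([0-9][0-9]*\)ms.*/\1 \2/p'
}

# Example usage (REGIONSERVER_LOG is a placeholder for your log file path):
#   extract_clock_skew < "$REGIONSERVER_LOG"
```

Lines that do not contain the message produce no output, so an empty result means no rejection was logged.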
This discussion looks at an effective way of keeping your nodes time-synced.
The strategy is to choose a local master node as the time server. This ensures that the time server is always reachable, even if the cluster becomes isolated from WAN time servers. The second part of the strategy is to make sure you have one or more fallback time servers. NTP is the proper tool for this requirement.
Install NTP on the master node (I named my master node hbasemaster):
>sudo apt-get install ntp
Edit /etc/ntp.conf so that the cluster master broadcasts time, like:
# Use servers from the NTP Pool Project. Approved by Ubuntu Technical Board
# on 2011-02-08 (LP: #104525). See http://www.pool.ntp.org/join.html for
# more information.
server 0.ubuntu.pool.ntp.org
server 1.ubuntu.pool.ntp.org
server 2.ubuntu.pool.ntp.org
server 3.ubuntu.pool.ntp.org
# Use Ubuntu's ntp server as a fallback.
server ntp.ubuntu.com
...
# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
broadcast hbasemaster
Note that the master looks outward to the Ubuntu time servers.
Restart service
>sudo service ntp restart
Verify that the service is running
>sudo ntpq -p
Results:
     remote           refid      st t when poll reach    delay   offset  jitter
==============================================================================
+time-b.timefreq .ACTS.           1 u    6   64   377   37.681   57.329  10.506
*time-b.nist.gov .ACTS.           1 u    2   64   377   22.324   48.675   9.838
-server1.nyc.she 209.51.161.238   2 u    4   64   377   22.409   33.617  19.516
-ns1.oninit.net  104.131.51.97    3 u    4   64   377   33.764   59.079  12.793
+golem.canonical 131.188.3.220    2 u   60   64   377   91.229   54.108   6.458
 hbasemaster.s   .BCST.          16 u    -   64     0    0.000    0.000   0.000
Notice that the cluster master is broadcasting (the hbasemaster line with the .BCST. refid).
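If you would rather script this check than read the table by eye, something like the following could work. It is only a sketch: selected_peer and is_broadcasting are hypothetical helper names, and both assume the `ntpq -p` layout shown above, where the peer currently used for synchronization is marked with `*` and a broadcast association shows `.BCST.` as its refid.

```shell
# Hypothetical helper: read `ntpq -p` output on stdin and print the name of
# the peer marked with "*" (the one ntpd is currently synchronized to).
selected_peer() {
  awk '/^\*/ { print substr($1, 2) }'
}

# Hypothetical helper: read `ntpq -p` output on stdin and succeed only if
# some line carries the .BCST. refid.
is_broadcasting() {
  grep -q '\.BCST\.'
}

# Example usage on the master:
#   ntpq -p | selected_peer
#   ntpq -p | is_broadcasting && echo "master is broadcasting"
```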
Next, install NTP on all your slave nodes as above and edit /etc/ntp.conf to:
# Use servers from the NTP Pool Project. Approved by Ubuntu Technical Board
# on 2011-02-08 (LP: #104525). See http://www.pool.ntp.org/join.html for
# more information.
#server 0.ubuntu.pool.ntp.org
#server 1.ubuntu.pool.ntp.org
#server 2.ubuntu.pool.ntp.org
#server 3.ubuntu.pool.ntp.org
server hbasemaster
# Use Ubuntu's ntp server as a fallback.
server ntp.ubuntu.com
...
Note that this node looks to the cluster master node first and uses the Ubuntu time server as a fallback. You can choose a second master as a fallback as well.
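With many slave nodes, you may prefer to generate these server lines rather than type them on every machine. A small sketch follows; ntp_server_lines is a hypothetical helper, and the host names are simply the ones used in this post.

```shell
# Hypothetical helper: print one ntp.conf "server" line per argument,
# in the order given (cluster master first, then any fallbacks).
ntp_server_lines() {
  local host
  for host in "$@"; do
    printf 'server %s\n' "$host"
  done
}

# Example usage: review the output, then add it to /etc/ntp.conf by hand.
#   ntp_server_lines hbasemaster ntp.ubuntu.com
```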
Restart service
>sudo service ntp restart
Verify that the service is running
>sudo ntpq -p
Results:
     remote           refid      st t when poll reach    delay   offset  jitter
==============================================================================
+hbasemaster.s   129.6.15.29      2 u   14   64   377    0.195   40.549  25.898
*juniperberry.ca 193.79.237.14    2 u   60   64   377   91.039   37.466   1.674
Note that the master time server was found. Also look at the delay and jitter columns. The cluster master's delay is very low compared to the Ubuntu server's; however, its jitter is high. For my purposes I can live with the high jitter because HBase tolerates clock differences in the millisecond-to-second range (the exception above shows a maximum allowed skew of 30000 ms).
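As a rough health check on a slave node, you could compare the offset reported for the cluster master against the limit HBase enforces. This is only a sketch under a few assumptions: peer_offset_ms and offset_within are hypothetical helper names, the offset is taken from the second-to-last column of `ntpq -p` output, and the 30000 ms limit is the value from the exception message earlier in this post. The offset is measured by ntpd against this node's clock, so treat it as an approximation of the skew HBase would see.

```shell
# Hypothetical helper: read `ntpq -p` output on stdin and print the offset
# (in ms) of the first peer whose line contains NAME.
# The offset is the second-to-last column of the ntpq table.
peer_offset_ms() {
  awk -v name="$1" 'index($0, name) { print $(NF-1); exit }'
}

# Hypothetical helper: succeed if the absolute value of OFFSET_MS is
# strictly below LIMIT_MS.
offset_within() {
  awk -v off="$1" -v lim="$2" 'BEGIN { if (off < 0) off = -off; exit !(off < lim) }'
}

# Example usage on a slave node:
#   off=$(ntpq -p | peer_offset_ms hbasemaster)
#   offset_within "$off" 30000 || echo "clock skew too high: ${off} ms"
```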