Friday, April 10, 2015

Synchronize time on BIGDATA nodes; an Ubuntu NTP example

Time synchronization strategy for BIGDATA nodes


HBase and other BIGDATA management systems manage data consistency using timestamps, so they are designed to be very sensitive to time synchronization across nodes. Out-of-sync nodes usually result in region server isolation and shutdown, e.g.
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:284)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2104)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:857)
        at java.lang.Thread.run(Thread.java:701)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ClockOutOfSyncException): org.apache.hadoop.hbase.ClockOutOfSyncException: Server myclusterhbasemaster,60020,1428702750856 has been rejected; Reported time is too far out of sync with master.  Time difference of 132440ms > max allowed of 30000ms
        at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:345)
        at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:238)
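
The comparison behind that message can be sketched in a few lines of shell. This is only an illustration of the rule the log describes (absolute time difference versus a maximum allowed skew), not HBase's actual code. The function name check_skew and the sample timestamps are my own; the 30000 ms limit is the value printed in the exception.

```shell
#!/bin/sh
# Limit taken from the log line "max allowed of 30000ms".
MAX_SKEW_MS=30000

# check_skew MASTER_MS SERVER_MS   (hypothetical helper, not part of HBase)
# Compares two clock readings in epoch milliseconds. Prints the absolute
# difference and returns 1 when it exceeds MAX_SKEW_MS.
check_skew() {
    master_ms=$1
    server_ms=$2
    diff=$((master_ms - server_ms))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    if [ "$diff" -gt "$MAX_SKEW_MS" ]; then
        echo "rejected: ${diff}ms > ${MAX_SKEW_MS}ms"
        return 1
    fi
    echo "accepted: ${diff}ms"
}
```

For example, check_skew 1428702750856 1428702883296 reports a 132440ms difference and rejects it, which is the same gap as in the trace above. On a real cluster you could feed it "$(date +%s%3N)" from two nodes, assuming GNU date is available; any transport delay between the two readings adds to the measured difference.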

If enough region servers are lost, the remaining region servers, including the master, are overwhelmed with reassigned regions, which eventually causes catastrophic failure.

This discussion looks at an effective way of keeping your nodes time-synced. 

The strategy is to choose a local master node as the time server, so that a time source remains reachable even if the cluster becomes isolated from WAN time servers. In addition, make sure the nodes have one or more fallback time servers. NTP is the proper tool for this requirement.

Install NTP on the master node (I named my master node hbasemaster):
>sudo apt-get install ntp

Edit /etc/ntp.conf so that the cluster master broadcasts time, like:
# Use servers from the NTP Pool Project. Approved by Ubuntu Technical Board
# on 2011-02-08 (LP: #104525). See http://www.pool.ntp.org/join.html for
# more information.
server 0.ubuntu.pool.ntp.org
server 1.ubuntu.pool.ntp.org
server 2.ubuntu.pool.ntp.org
server 3.ubuntu.pool.ntp.org

# Use Ubuntu's ntp server as a fallback.
server ntp.ubuntu.com
...
# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
broadcast hbasemaster

Note that the master looks outward to the Ubuntu time servers.
Restart service
>sudo service ntp restart

Verify that the service is running 
>sudo ntpq -p 
Results:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+time-b.timefreq .ACTS.           1 u    6   64  377   37.681   57.329  10.506
*time-b.nist.gov .ACTS.           1 u    2   64  377   22.324   48.675   9.838
-server1.nyc.she 209.51.161.238   2 u    4   64  377   22.409   33.617  19.516
-ns1.oninit.net  104.131.51.97    3 u    4   64  377   33.764   59.079  12.793
+golem.canonical 131.188.3.220    2 u   60   64  377   91.229   54.108   6.458
 hbasemaster.s   .BCST.          16 u    -   64    0    0.000    0.000   0.000
Notice the cluster master is broadcasting.
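
If you want to check this from a script rather than by eye, the peer that ntpd is currently synchronizing to is the line that starts with an asterisk. The following is a minimal sketch that pulls that name out of ntpq -p output; selected_peer is a name I made up for illustration, and the function only reads text from stdin, so it does not call ntpq itself.

```shell
#!/bin/sh
# selected_peer   (hypothetical helper)
# Reads `ntpq -p` output on stdin and prints the remote whose line begins
# with '*', i.e. the peer ntpd has selected as its sync source.
# Returns 1 when no peer is selected yet.
selected_peer() {
    awk '/^\*/ { print substr($1, 2); found = 1 } END { exit !found }'
}
```

Usage would be along the lines of: ntpq -p | selected_peer. With the results above it prints time-b.nist.gov. Right after a restart there may be no selected peer for a few minutes, in which case the function prints nothing and returns 1.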

Next, install ntp on all your slave nodes as above and edit /etc/ntp.conf to:
# Use servers from the NTP Pool Project. Approved by Ubuntu Technical Board
# on 2011-02-08 (LP: #104525). See http://www.pool.ntp.org/join.html for
# more information.
#server 0.ubuntu.pool.ntp.org
#server 1.ubuntu.pool.ntp.org
#server 2.ubuntu.pool.ntp.org
#server 3.ubuntu.pool.ntp.org

server hbasemaster
# Use Ubuntu's ntp server as a fallback.
server ntp.ubuntu.com
...

Note that this node looks to the cluster master node for time and uses the Ubuntu time server as a fallback. You can also choose a second master as a fallback.
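
With many slave nodes, making this edit by hand gets tedious. Here is a rough sketch of a filter that turns a stock Ubuntu ntp.conf into the slave form described here: it comments out the numbered ubuntu.pool.ntp.org servers and adds the master ahead of the ntp.ubuntu.com fallback. The name to_slave_conf is hypothetical, and the patterns assume the server lines look exactly like the ones in this post.

```shell
#!/bin/sh
# to_slave_conf MASTER   (hypothetical helper)
# Reads an ntp.conf on stdin and writes the slave variant to stdout:
#   - "server N.ubuntu.pool.ntp.org" lines are commented out
#   - "server MASTER" is added before the ntp.ubuntu.com fallback line,
#     or at the end of the file if that line is not present
to_slave_conf() {
    master=$1
    sed -e 's/^server \([0-9]\.ubuntu\.pool\.ntp\.org\)/#server \1/' |
    awk -v m="$master" '
        /^server ntp\.ubuntu\.com/ && !added { print "server " m; added = 1 }
        { print }
        END { if (!added) print "server " m }'
}
```

Something like to_slave_conf hbasemaster < /etc/ntp.conf > ntp.conf.slave produces a file you can review before copying it into place. It writes to stdout on purpose, so nothing is overwritten until you have looked at the result.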

Restart service
>sudo service ntp restart

Verify that the service is running 
>sudo ntpq -p 
Results:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+hbasemaster.s   129.6.15.29      2 u   14   64  377    0.195   40.549  25.898
*juniperberry.ca 193.79.237.14    2 u   60   64  377   91.039   37.466   1.674

Note that the master time server was found. Also look at the delay and jitter columns: the delay to the cluster master (0.195 ms) is very low compared with the Ubuntu server (91.039 ms), but its jitter is higher (25.898 ms versus 1.674 ms). For my purposes I can live with the high jitter, because HBase tolerates clock differences in the millisecond-to-second range; the exception above shows a maximum allowed difference of 30000 ms.
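
To keep an eye on this over time, the offset column can be compared against whatever limit you are comfortable with. The sketch below reads ntpq -p output and reports the largest absolute offset in milliseconds; max_abs_offset_ms and offset_ok are names I invented, and the column position (9th field) assumes the ntpq layout shown in the results above. Keep in mind that the offset is this node's clock relative to each peer, so it is only a proxy for the node-to-master difference that HBase checks.

```shell
#!/bin/sh
# max_abs_offset_ms   (hypothetical helper)
# Reads `ntpq -p` output on stdin, skips the two header lines, and prints
# the largest absolute value found in the offset column (field 9, in ms).
max_abs_offset_ms() {
    awk 'NR > 2 && NF >= 10 {
             off = $9 + 0
             if (off < 0) off = -off
             if (off > max) max = off
         }
         END { printf "%.3f\n", max + 0 }'
}

# offset_ok LIMIT_MS   (hypothetical helper)
# Succeeds when the largest offset on stdin is within LIMIT_MS.
offset_ok() {
    limit=$1
    worst=$(max_abs_offset_ms)
    awk -v w="$worst" -v l="$limit" 'BEGIN { exit !(w + 0 <= l + 0) }'
}
```

For instance, ntpq -p | offset_ok 30000 || echo "clock offset too large" could run from cron on each node. The 30000 here simply mirrors the limit printed in the exception; a much tighter limit is probably more useful as an early warning.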

