
Wednesday, April 18, 2018

Hadoop V2 - QJM (Manual Configuration)

In this blog I discuss setting up Namenode High availability using QJM (Quorum Journal Manager)

Functioning of QJM
1. 3 (or 5, or another odd number of) nodes run the JournalNode daemon
2. The active NN writes its edit log to these nodes; a majority (quorum) must acknowledge each write, so 3 JournalNodes tolerate the loss of one
3. The SNN (Standby NN) reads and applies changes from these edit logs
4. ZKFC (ZooKeeper Failover Controller) runs on both the NN and the SNN
5. Whenever a failure is detected, a new active NN is elected

Failover can be manual or automatic.
In this blog we only do manual failover; in the following blogs I will add the configuration for automatic failover.

Stop cluster and spark
[nn-hdfs] - stop-dfs.sh
[rm-yarn] - stop-yarn.sh
[rm-spark] - stop-all.sh
[rm-mapred] - mr-jobhistory-daemon.sh stop historyserver



Setting Up Instructions
hdfs-site.xml
1. Create logical nameservice
<property>
    <name>dfs.nameservices</name>
    <value>devcluster</value>
</property>


2. Set up the path prefix (core-site.xml)
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://devcluster</value>
</property>
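
Once everything is in place and the cluster is back up, clients address HDFS through this logical name rather than a single host. A minimal sanity check (assuming the HA cluster is already running, i.e. after step 18) is:

hdfs dfs -ls hdfs://devcluster/
# or, relying on fs.defaultFS:
hdfs dfs -ls /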


3. Create Namenode identifiers
<property>
    <name>dfs.ha.namenodes.devcluster</name>
    <value>nn1,nn2</value>
</property>
   
4. Set Journal Nodes
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://d1.novalocal:8485;d2.novalocal:8485;d3.novalocal:8485/devcluster</value>
</property>


Here, make sure the hostnames are fully qualified and resolvable.
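
A quick way to check this (a minimal sketch, using the JournalNode hostnames configured above) is to resolve each one from the nn and snn hosts:

for i in d1.novalocal d2.novalocal d3.novalocal ; do getent hosts ${i} ; done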


5. Provide the edits directory used by the JournalNodes
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/HDPV2/journal/node/local/data</value>
</property>


This is the directory on the nodes that will act as JournalNodes.

6. Configure RPC Address


<property>
    <name>dfs.namenode.rpc-address.devcluster.nn1</name>
    <value>nn:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.devcluster.nn2</name>
    <value>snn:8020</value>
</property>
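
Once the files are distributed (step 11), you can read these values back to catch typos; hdfs getconf -confKey is handy for that:

hdfs getconf -confKey dfs.namenode.rpc-address.devcluster.nn1
hdfs getconf -confKey dfs.namenode.rpc-address.devcluster.nn2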


7. Configure HTTP Listen addresses

<property>
    <name>dfs.namenode.http-address.devcluster.nn1</name>
    <value>nn:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.devcluster.nn2</name>
    <value>snn:50070</value>
</property>
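
These are the addresses where each NameNode serves its web UI and JMX. Once the NameNodes are running, you can query the NameNodeStatus JMX bean on each to see its HA state (a sketch, assuming curl is available on the host):

curl -s 'http://nn:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
curl -s 'http://snn:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'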


8. Configure the Java class for failover
This class determines which NameNode is currently active when a client contacts the nameservice.
<property>
    <name>dfs.client.failover.proxy.provider.devcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>


9. Configure Fencing
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>

Note - you must install fuser (package psmisc) on both NameNode hosts; sshfence uses it to kill the process on the old active NameNode.
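
On RHEL/CentOS style nodes that is simply (assuming yum is the package manager):

[As root, on nn and snn]
yum install -y psmisc
which fuser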

9.1 Specify Fencing keys
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hdfs/.ssh/id_rsa</value>
</property>
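
sshfence only works if the hdfs user can reach the other NameNode with this key non-interactively. A quick check from nn (a sketch, assuming the key above is already authorized on snn):

[As hdfs on nn]
ssh -i /home/hdfs/.ssh/id_rsa -o BatchMode=yes snn hostname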



10. Finally, remove the properties below from your hdfs-site.xml file. These are no longer required, as I have already provided them per NameNode above via the nameservice and NameNode IDs.

(Also remove the old non-HA RPC address configuration if it was set.)
dfs.http.address
dfs.secondary.http.address
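
To confirm nothing was left behind, a simple grep over the file should return no output ($CONF is the Hadoop conf directory used in the next step):

grep -E 'dfs.http.address|dfs.secondary.http.address' $CONF/hdfs-site.xml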

11. Now scp hdfs-site.xml and core-site.xml to snn, d1n, d2n, d3n and rm
[As root]
cd $CONF
for i in $(cat /tmp/all_hosts) ;do scp hdfs-site.xml core-site.xml ${i}:/etc/hadoop/conf/ ; done
for i in $(cat /tmp/all_hosts) ;do ssh ${i} chmod -R 755 /etc/hadoop/conf ; done;
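
To verify the copy landed identically everywhere, a quick checksum loop over the same host list can be run:

for i in $(cat /tmp/all_hosts) ; do ssh ${i} md5sum /etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf/core-site.xml ; done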


12. Set up the JournalNodes on d1n, d2n and d3n

[As root]
mkdir -p /opt/HDPV2/journal/node/local/data
chown -R hdfs:hadoop /opt/HDPV2/journal


[As hdfs]
hadoop-daemon.sh start journalnode
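
To confirm each JournalNode came up and is listening on its RPC port (8485, as configured in step 4), check on each of d1n, d2n and d3n:

jps | grep JournalNode
ss -tlnp | grep 8485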

13. Format ZKFC [As hdfs on nn]
hdfs zkfc -formatZK -force

Because automatic failover is not enabled in our configuration, this command fails as shown below; that is expected for the manual-failover setup in this blog.

[hdfs@nn conf]$ hdfs zkfc -formatZK -force
18/04/18 04:58:40 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at nn/192.168.2.101:8020
18/04/18 04:58:40 FATAL ha.ZKFailoverController: Automatic failover is not enabled for NameNode at nn/192.168.2.101:8020. Please ensure that automatic failover is enabled in the configuration before running the ZK failover controller.



14. Start NameNode [As hdfs on nn]
 hadoop-daemon.sh start namenode

15. Bootstrap Standby

hdfs namenode -bootstrapStandby
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: devcluster
        Other Namenode ID: nn1
  Other NN's HTTP address: http://nn:50070
  Other NN's IPC  address: nn/192.168.2.101:8020
             Namespace ID: 1878602801
            Block pool ID: BP-1256862429-192.168.2.101-1523506309307
               Cluster ID: CID-65c51f92-e1c0-4a17-a921-ae0e61f3251f
           Layout version: -63
       isUpgradeFinalized: true



16. Start the Standby NameNode [As hdfs on snn]
 hadoop-daemon.sh start namenode

17. Stop both namenodes [As hdfs on nn and snn]
hadoop-daemon.sh stop namenode

17.1 Stop all journal daemons. [As hdfs on d1n,d2n and d3n]
 hadoop-daemon.sh stop journalnode


18. Start the Cluster [As hdfs on nn]

start-dfs.sh
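
With the shared edits directory in place, start-dfs.sh now brings up both NameNodes, the DataNodes and the JournalNodes. A quick jps on each host confirms the expected daemons (assuming passwordless ssh for the hdfs user, which start-dfs.sh needs anyway):

for i in nn snn d1n d2n d3n ; do ssh ${i} jps ; done
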
19. Force Transition one node to Active

This is an important point: when you first start, you will see that both your NameNodes are in standby mode.
This is because we have not yet configured automatic failover and the ZooKeeper failover controller processes.
I will cover that next, but for now let's focus on the task at hand.

Check Service State

[hdfs@nn ~]$ hdfs haadmin -getServiceState nn1
standby
[hdfs@nn ~]$ hdfs haadmin -getServiceState nn2
standby


hdfs haadmin -failover -forcefence -forceactive nn2 nn1
Failover from nn2 to nn1 successful


Remember: when you do a forced failover, the other node must be fenced, so it will be shut down automatically. You can start it again manually.
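
For example, after the failover above, the fenced NameNode can be brought back as standby with:

[As hdfs on the fenced node]
hadoop-daemon.sh start namenode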

Now check the Service State

[hdfs@nn ~]$ hdfs haadmin -getServiceState nn1
active
[hdfs@nn ~]$ hdfs haadmin -getServiceState nn2
standby


You can also look at the web UI of each NameNode to confirm its status.

Get configuration of namenodes
[hdfs@nn ~]$ hdfs getconf -namenodes
nn snn


Start Cluster
[rm-yarn] - start-yarn.sh
[rm-spark] - start-all.sh
[rm-mapred] - mr-jobhistory-daemon.sh start historyserver



The directory structure of the QJM edits directory is similar to the NameNode's metadata directory. However, it contains only edits, as those are all the Standby NameNode needs to roll forward its in-memory fsimage.

/opt/HDPV2/journal/node/local/data/devcluster/current
[root@d1n current]# ls -rlht
total 1.1M
-rw-r--r--. 1 hdfs hadoop  155 Apr 18 04:39 VERSION
-rw-r--r--. 1 hdfs hadoop   42 Apr 18 04:39 edits_0000000000000000701-0000000000000000702
-rw-r--r--. 1 hdfs hadoop   42 Apr 18 05:23 edits_0000000000000000703-0000000000000000704
-rw-r--r--. 1 hdfs hadoop   42 Apr 18 05:25 edits_0000000000000000705-0000000000000000706
-rw-r--r--. 1 hdfs hadoop   42 Apr 18 05:26 edits_0000000000000000707-0000000000000000708
-rw-r--r--. 1 hdfs hadoop    2 Apr 18 05:26 last-promised-epoch
drwxr-xr-x. 2 hdfs hadoop    6 Apr 18 05:26 paxos
-rw-r--r--. 1 hdfs hadoop    2 Apr 18 05:26 last-writer-epoch
-rw-r--r--. 1 hdfs hadoop   42 Apr 18 05:28 edits_0000000000000000709-0000000000000000710
-rw-r--r--. 1 hdfs hadoop   42 Apr 18 05:30 edits_0000000000000000711-0000000000000000712
-rw-r--r--. 1 hdfs hadoop   42 Apr 18 05:32 edits_0000000000000000713-0000000000000000714
-rw-r--r--. 1 hdfs hadoop   42 Apr 18 05:34 edits_0000000000000000715-0000000000000000716
-rw-r--r--. 1 hdfs hadoop 1.0M Apr 18 05:34 edits_inprogress_0000000000000000717
-rw-r--r--. 1 hdfs hadoop    8 Apr 18 05:34 committed-txid
 


