Building HADOOP CLUSTER [Using 2 Linux Machines]
STEP 1) Install Java 6 or above on the Linux machine (jdk1.6.0_12)
I have 'jdk-6u12-linux-i586.bin' on my REDHAT machine.
To install, run the following commands:
# chmod 744 jdk-6u12-linux-i586.bin
# ./jdk-6u12-linux-i586.bin
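Once the installer finishes, it is worth confirming the JDK runs. For example (assuming the installer unpacked into the current directory):
# ./jdk1.6.0_12/bin/java -version
It should report version 1.6.0_12.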
STEP 2) Download 'jce-policy-6.zip' (the JCE Unlimited Strength policy files)
extract it.
# cp -f jce/*.jar $JAVA_HOME/jre/lib/security/
# chmod 444 $JAVA_HOME/jre/lib/security/*.jar
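To confirm the policy jars are in place (assuming the standard contents of the zip), check:
# ls -l $JAVA_HOME/jre/lib/security/local_policy.jar $JAVA_HOME/jre/lib/security/US_export_policy.jar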
STEP 3) Download hadoop-0.20.2.tar.gz or any later version
extract it and copy the 'hadoop-0.20.2' folder to the '/usr/local/' directory.
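For example, assuming the tarball is in the current directory:
# tar -xzf hadoop-0.20.2.tar.gz
# cp -r hadoop-0.20.2 /usr/local/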
STEP 4) Set JAVA PATH
# export JAVA_HOME=/java_installation_folder/jdk1.6.0_12
STEP 5) Set HADOOP PATH
# export HADOOP_HOME=/usr/local/hadoop-0.20.2
# export PATH=$PATH:$HADOOP_HOME/bin
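These exports last only for the current shell session; to make them permanent, append the same three export lines to '/root/.bashrc' (the prompts here assume you are working as root). You can then verify the setup with:
# hadoop version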
Install the same on the second Linux machine.
The two machines are described below:
Server IP           HostName     Role
1) 192.168.100.19   hostmaster   Master [NameNode and JobTracker]
2) 192.168.100.17   hostslave    Slave [DataNode and TaskTracker]
STEP 6) Now make the following changes on the Master:
# vim /etc/hosts
comment out all existing entries and add the following at the end:
192.168.100.19         hostmaster
save and exit
Changes to be made on the Slave machine:
# vim /etc/hosts
comment out all existing entries and add the following at the end:
192.168.100.17       hostslave
192.168.100.19       hostmaster
save and exit
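You can check that name resolution works, for example from the slave:
# ping -c 1 hostmaster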
STEP 7) Set up SSH for communication:
Do the following on the master as well as on the slave:
# ssh-keygen -t rsa
This generates an RSA public/private key pair. This is needed because the Hadoop Master Node communicates with the Slave Node over SSH.
The key pair is created under the '/root/.ssh' directory, including the public key file 'id_rsa.pub'. Now rename the Master's id_rsa.pub to '19_rsa.pub' and copy it to the Slave Node (at the same path).
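For example, on the Master (assuming root login is permitted on the slave):
# mv /root/.ssh/id_rsa.pub /root/.ssh/19_rsa.pub
# scp /root/.ssh/19_rsa.pub root@192.168.100.17:/root/.ssh/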
Then, on the Slave, execute the following command to add the Master's public key to the Slave's authorized keys.
# cat /root/.ssh/19_rsa.pub  >>  /root/.ssh/authorized_keys
Now try to SSH into the Slave Node from the Master. It should connect without asking for a password.
# ssh 192.168.100.17
STEP 8) Setting up the MASTER NODE:
Set up Hadoop to work in fully distributed mode by editing the configuration files under the $HADOOP_HOME/conf/ directory. Since these files will also be copied to the slave, the NameNode and JobTracker URIs must point at the master (hostmaster), not localhost.
Configuration properties:
Property                        Explanation
1) fs.default.name              NameNode URI
2) mapred.job.tracker           JobTracker URI
3) dfs.replication              Replication factor (number of copies of each block)
4) hadoop.tmp.dir (optional)    Temp directory
Let us start with the configuration files:
1) $HADOOP_HOME/conf/hadoop-env.sh
make the following change:
export  JAVA_HOME=/java_installation_folder/jdk1.6.0_12
2) $HADOOP_HOME/conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hostmaster:9000</value>
</property>
</configuration>
3) $HADOOP_HOME/conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
4) $HADOOP_HOME/conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hostmaster:9001</value>
</property>
</configuration>
5) $HADOOP_HOME/conf/masters
192.168.100.19
6) $HADOOP_HOME/conf/slaves
192.168.100.17
Now copy all these files to the $HADOOP_HOME/conf/ directory of the SLAVE machine.
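For example, from the master (assuming the same install path on the slave):
# scp $HADOOP_HOME/conf/* root@192.168.100.17:/usr/local/hadoop-0.20.2/conf/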
STEP 9) Format the NameNode and start the cluster: (run on the Master only)
# hadoop namenode -format
# start-all.sh
start-all.sh uses SSH to start the DataNode and TaskTracker daemons on the slave, so nothing needs to be started by hand on the slave machine.
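You can check that all daemons came up using the 'jps' tool that ships with the JDK. On the master you should see NameNode, SecondaryNameNode, and JobTracker; on the slave, DataNode and TaskTracker:
# jps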
Now your cluster is ready to run jobs.
 