Friday, November 12, 2010

Building a Simple MapReduce Java Program


MapReduce is a programming model built around two functions: map(), which transforms each input record into intermediate key/value pairs, and reduce(), which merges all the intermediate values that share a key.

Main class for a simple MapReduce Java application:

public class Main
{
    public static void main(String[] ap)
    {
        MyMapReduce my = new MyMapReduce();
        my.init();
    }
}

It just instantiates a class called 'MyMapReduce' and calls its init() method.
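The post never shows MyMapReduce itself. As a rough idea of what such a class could do, here is a minimal single-JVM word-count sketch; the whole class body is an assumption made up for illustration, not part of Hadoop:

import java.util.*;

// Hypothetical stand-in for the 'MyMapReduce' class used by Main above:
// a toy, in-memory map/reduce with no Hadoop involved.
public class MyMapReduce
{
    public void init()
    {
        List<String> input = Arrays.asList("hello world", "hello mapreduce");

        // map(): emit a (word, 1) pair for every word of every record,
        // grouping the values by key as we go
        Map<String, List<Integer>> grouped = new HashMap<String, List<Integer>>();
        for (String line : input)
        {
            for (String word : line.split("\\s+"))
            {
                if (!grouped.containsKey(word))
                    grouped.put(word, new ArrayList<Integer>());
                grouped.get(word).add(1);
            }
        }

        // reduce(): merge all the values that share a key
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet())
        {
            int sum = 0;
            for (int v : e.getValue())
                sum += v;
            System.out.println(e.getKey() + " -> " + sum);
        }
    }
}

A real MapReduce framework performs the same grouping, but spreads the map() and reduce() calls across many machines.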

MapReduce program for factorial: it reads integers from the input files and, for each one, emits the number together with its factorial.


import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;


public class Factorial
{
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text>
    {
        private final Text t1 = new Text();   // output key: the number itself
        private final Text t2 = new Text();   // output value: its factorial

        public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            String line = value.toString();
            StringTokenizer tokenizerLine = new StringTokenizer(line, "\n");
            while (tokenizerLine.hasMoreTokens())
            {
                String tokenAsLine = tokenizerLine.nextToken();
                StringTokenizer tokenizerWord = new StringTokenizer(tokenAsLine);
                List<String> s1 = new ArrayList<String>();
                while (tokenizerWord.hasMoreTokens())
                {
                    s1.add(tokenizerWord.nextToken());
                }
                for (int i = 0; i < s1.size(); i++)
                {
                    int num = Integer.parseInt(s1.get(i));
                    int fact = 1;
                    for (int j = 1; j <= num; j++)
                    {
                        fact = fact * j;
                    }
                    t1.set(s1.get(i));
                    t2.set(Integer.toString(fact));
                    output.collect(t1, t2);   // emit (number, factorial)
                }
            }
        }
    }


    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text>
    {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            boolean first = true;
            StringBuilder toReturn = new StringBuilder();
            while (values.hasNext())
            {
                if (!first)
                    toReturn.append(", ");
                first = false;
                toReturn.append(values.next().toString());
            }
            output.collect(key, new Text(toReturn.toString()));   // emit the joined values for this key
        }
    }


    public static void main(String[] ap)
    {
        JobConf conf = new JobConf(Factorial.class);
        conf.setJobName("factorial");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(ap[0]));
        FileOutputFormat.setOutputPath(conf, new Path(ap[1]));
        try
        {
            conf.set("io.sort.mb", "10");   // keep the sort buffer small for a toy job
            JobClient.runJob(conf);
        }
        catch (IOException e)
        {
            System.err.println(e.getMessage());
        }
    }
}
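To run the job (the jar name and the HDFS paths below are only examples; any input directory of text files containing integers will do):

# hadoop jar factorial.jar Factorial /user/root/numbers /user/root/factorials

ap[0] is the input directory and ap[1] the output directory; the output directory must not exist yet, or the job will fail.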

Tuesday, November 2, 2010

Building HADOOP CLUSTER [ Using 2 Linux Machines ]


STEP 1) Install Java 6 or above on the Linux machine (jdk1.6.0_12)
I have 'jdk-6u12-linux-i586.bin' on my RedHat machine.
To install it, run:
# chmod 744 jdk-6u12-linux-i586.bin
# ./jdk-6u12-linux-i586.bin
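To verify the installation, check the version:

# java -version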


STEP 2) Download 'jce-policy-6.zip' and extract it. Then:
# cp -f jce/*.jar $JAVA_HOME/jre/lib/security/
# chmod 444 $JAVA_HOME/jre/lib/security/*.jar


STEP 3) Download hadoop-0.20.2.tar.gz (or any later release),
extract it, and copy the 'hadoop-0.20.2' folder to the '/usr/local/' directory.

STEP 4) Set the JAVA path
# export JAVA_HOME=/java_installation_folder/jdk1.6.0_12

STEP 5) Set the HADOOP path
# export HADOOP_HOME=/usr/local/hadoop-0.20.2
# export PATH=$PATH:$HADOOP_HOME/bin
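These exports last only for the current shell session. To make them permanent, you can append the same lines to ~/.bashrc (or /etc/profile), for example:

# echo 'export JAVA_HOME=/java_installation_folder/jdk1.6.0_12' >> ~/.bashrc
# echo 'export HADOOP_HOME=/usr/local/hadoop-0.20.2' >> ~/.bashrc
# echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bashrc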

Install the same on the second Linux machine.
The two machines are then:

Server IP                  HostName         Role

1) 192.168.100.19          hostmaster       Master [NameNode and JobTracker]
2) 192.168.100.17          hostslave        Slave [DataNode and TaskTracker]


STEP 6) Now make the following changes on the Master:

# vim /etc/hosts
Comment out the existing entries and add at the end:
192.168.100.19 hostmaster
Save and exit.

Changes to be made on the Slave machine:

# vim /etc/hosts
Comment out the existing entries and add at the end:
192.168.100.17 hostslave
192.168.100.19 hostmaster
Save and exit.
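You can verify that the names resolve correctly:

# ping hostmaster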

STEP 7) Set up SSH for communication:

Do these steps on the master as well as on the slave:
# ssh-keygen -t rsa
This generates the RSA public and private keys. This is needed because the Hadoop Master Node communicates with the Slave Node over SSH.
It creates an 'id_rsa.pub' file under the '/root/.ssh' directory. Now rename the Master's id_rsa.pub to '19_rsa.pub' and copy it to the Slave Node (at the same path).
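One way to copy the key, using the slave's IP from the table above (scp will ask for the root password this one time):

# scp /root/.ssh/19_rsa.pub root@192.168.100.17:/root/.ssh/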
Then execute the following command to add the Master's public key to the Slave's authorized keys.

# cat /root/.ssh/19_rsa.pub >> /root/.ssh/authorized_keys

Now try to SSH to the Slave Node. It should connect without asking for a password.

# ssh 192.168.100.17

STEP 8) Setting up the MASTER NODE:
Set up Hadoop to work in fully distributed mode by editing the configuration files under the $HADOOP_HOME/conf/ directory.

Configuration properties:
Property                              Explanation
1) fs.default.name                    NameNode URI
2) mapred.job.tracker                 JobTracker URI
3) dfs.replication                    Number of replicas kept per block
4) hadoop.tmp.dir (optional)          Temp directory

Let us start with the configuration files:

1) $HADOOP_HOME/conf/hadoop-env.sh
make change as...
export JAVA_HOME=/java_installation_folder/jdk1.6.0_12

2) $HADOOP_HOME/conf/core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hostmaster:9000</value>
</property>
</configuration>

(Use the master's hostname here, not localhost; otherwise the DataNode on the slave cannot find the NameNode.)

3) $HADOOP_HOME/conf/hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

4) $HADOOP_HOME/conf/mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hostmaster:9001</value>
</property>
</configuration>

(Again, point this at the master's hostname, not localhost.)

5) $HADOOP_HOME/conf/masters
192.168.100.19
(Despite its name, this file tells the start scripts where to run the SecondaryNameNode.)

6) $HADOOP_HOME/conf/slaves
192.168.100.17

Now copy all these files to the $HADOOP_HOME/conf/ directory of the SLAVE machine.
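For example, from the Master (the target path is the install directory from STEP 3):

# scp $HADOOP_HOME/conf/* root@192.168.100.17:/usr/local/hadoop-0.20.2/conf/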

STEP 9) Set up the Master and Slave nodes:

Run both commands on the Master only; 'start-all.sh' starts the Slave's daemons over SSH, so nothing needs to be run on the Slave.

# hadoop namenode -format
# start-all.sh
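To check that all the daemons actually started, run the JDK's jps tool on each machine:

# jps

On hostmaster it should list NameNode, SecondaryNameNode and JobTracker; on hostslave, DataNode and TaskTracker.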


Now your cluster is ready to run jobs.