Thursday 16 July 2015

Hadoop Jar with libjars option and importance of Generic Options Parser

When you write a MapReduce program that depends on third-party jars, it is very important to make sure those jars are shipped with the job, because every slave node that runs Map/Reduce tasks needs them. Including these jars has always been a somewhat difficult job for users, so they tend to build a fat jar that bundles all the dependencies into the exported archive file.

However, there is another, more elegant option, "-libjars", which you can pass when running the MapReduce job with the hadoop jar command, followed by a comma-separated list of the dependent jar files.

eg: hadoop jar <some_path>/your_jar.jar <your.class.name> -libjars <lib_path>/commons-lang-1.2.jar,<some_path>/guava-1.13.jar,other_jars <inputs & outputs and other parameters...>

Making this command work is harder than it sounds.

To make sure libjars works, you need to take care of the following things.

Your driver program has to extend Configured and implement Tool, and the command-line arguments have to pass through GenericOptionsParser (ToolRunner does this parsing when it launches the driver from the main method). Example code is shown below.

Please note: the configuration used in the run method has to be the one received from the main method. Otherwise, the property values given on the command line will not reach the job.

The reason is as follows.

Immediately after the main method is called, a configuration is created, and the command-line properties are parsed and applied to that configuration.

If run does not use this configuration and instead creates and uses a new Configuration, the properties that were set are not passed on. Your job will then not recognize the libjars and will throw a class-not-found error. This is a common mistake that many programmers make.
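
The hand-off described above can be mimicked in plain Java without any Hadoop dependency. This is only a toy sketch under stated assumptions: `ConfHandOffSketch`, `ToyTool`, `runTool`, `GoodTool` and `BadTool` are hypothetical stand-ins rather than Hadoop classes, a `Map` plays the role of the Configuration, the jar path is made up, and the key `tmpjars` is assumed here to mirror the property Hadoop fills from -libjars.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model of the runner/tool hand-off; stand-ins only, NOT Hadoop classes. */
final class ConfHandOffSketch {

    /** Stand-in for a Tool: the runner injects a configuration, then calls run. */
    abstract static class ToyTool {
        private Map<String, String> conf = new HashMap<>();
        void setConf(Map<String, String> conf) { this.conf = conf; }
        Map<String, String> getConf() { return conf; }
        /** Returns the jar list the "job" would see, or null if it sees none. */
        abstract String run(String[] remainingArgs);
    }

    /** Stand-in for the runner: parse -libjars into conf, inject it, pass the rest on. */
    static String runTool(ToyTool tool, String[] args) {
        Map<String, String> conf = new HashMap<>();
        int firstJobArg = 0;
        if (args.length >= 2 && args[0].equals("-libjars")) {
            conf.put("tmpjars", args[1]); // assumed property name, for illustration
            firstJobArg = 2;
        }
        tool.setConf(conf);
        return tool.run(Arrays.copyOfRange(args, firstJobArg, args.length));
    }

    /** Correct pattern: read from the configuration the runner injected. */
    static final class GoodTool extends ToyTool {
        @Override
        String run(String[] remainingArgs) {
            return getConf().get("tmpjars");
        }
    }

    /** The mistake: build a fresh configuration and ignore the injected one. */
    static final class BadTool extends ToyTool {
        @Override
        String run(String[] remainingArgs) {
            Map<String, String> fresh = new HashMap<>();
            return fresh.get("tmpjars");
        }
    }

    public static void main(String[] args) {
        String[] cli = {"-libjars", "/libs/a.jar", "in", "out"};
        System.out.println(runTool(new GoodTool(), cli)); // prints /libs/a.jar
        System.out.println(runTool(new BadTool(), cli));  // prints null
    }
}
```

In a real driver, the equivalent of GoodTool is calling getConf() inside run, and the equivalent of BadTool is writing new Configuration() there.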
 

Also note that your execution command should have the class name first, then libjars, and then the regular inputs.

eg:

hadoop jar ./srini.jar com.srini.test.TestMapReduce -libjars /usr/lib/hadoop/lib/commons-logging-1.1.3.jar,/usr/lib/hadoop/lib/commons-lang-1.1.3.jar <argument1> <argument2> <argument3> ...
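
The ordering rule above can be illustrated with a small plain-Java sketch. It is a toy model, not Hadoop's GenericOptionsParser: the class `LibjarsArgsSketch`, its `split` method and the jar paths are hypothetical, it only knows about -libjars, and it assumes that option parsing stops at the first token that is not an option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of generic-option splitting; NOT Hadoop's GenericOptionsParser. */
final class LibjarsArgsSketch {

    /** Result of the split: jars named by -libjars, plus everything left for the job. */
    static final class Split {
        final List<String> libJars = new ArrayList<>();
        final List<String> remaining = new ArrayList<>();
    }

    /**
     * Consumes leading "-libjars a.jar,b.jar" pairs and stops at the first
     * token that is not an option (assumed stop-at-non-option behaviour).
     */
    static Split split(String[] args) {
        Split result = new Split();
        int i = 0;
        while (i + 1 < args.length && args[i].equals("-libjars")) {
            result.libJars.addAll(Arrays.asList(args[i + 1].split(",")));
            i += 2;
        }
        // Everything from the first non-option token onwards goes to the job untouched.
        result.remaining.addAll(Arrays.asList(args).subList(i, args.length));
        return result;
    }

    public static void main(String[] args) {
        // Arguments as they arrive in main(), i.e. after "hadoop jar <jar> <class>".
        Split ok = split(new String[] {"-libjars", "/libs/a.jar,/libs/b.jar", "in", "out"});
        System.out.println(ok.libJars + " " + ok.remaining);
        // prints [/libs/a.jar, /libs/b.jar] [in, out]

        // -libjars placed after the inputs is never treated as an option here.
        Split late = split(new String[] {"in", "out", "-libjars", "/libs/a.jar"});
        System.out.println(late.libJars + " " + late.remaining);
        // prints [] [in, out, -libjars, /libs/a.jar]
    }
}
```

In the second call, the jar list ends up among the job's own arguments instead of being picked up as a library list, which is the kind of failure the ordering rule is meant to avoid.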


Example Code:

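    package com.srini.test;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class TestMapReduce extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // Use the configuration handed over from main via ToolRunner,
            // never a fresh "new Configuration()" here
            Configuration mainConf = getConf();
            // Parse against the same configuration so no command-line property is lost
            GenericOptionsParser parser = new GenericOptionsParser(mainConf, args);

            // inputs below will have all the actual arguments passed
            String[] inputs = parser.getRemainingArgs();

            Job job = Job.getInstance(mainConf);

            job.setJarByClass(TestMapReduce.class);
            job.setMapperClass(TestMapper.class); // TestMapper is your own Mapper class (not shown)
            job.setInputFormatClass(TextInputFormat.class);
            FileInputFormat.addInputPath(job, new Path(inputs[0]));
            LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
            FileOutputFormat.setOutputPath(job, new Path(inputs[1]));

            job.setNumReduceTasks(0); // map-only job
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String... args) throws Exception {
            // This configuration receives the command-line properties and reaches run() via getConf()
            Configuration conf = new Configuration();
            int res = ToolRunner.run(conf, new TestMapReduce(), args);
            System.exit(res);
        }
    }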
