How to create a Map/Reduce Job using the .NET SDK for Hadoop

In the last post we saw how to load and query data using Hive; this time we are going to use the .NET SDK for Hadoop for the same purpose.

The SDK for Hadoop lets us write Map/Reduce jobs that query data stored in a distributed file system, which can be made up of hundreds or thousands of nodes. MapReduce is a programming model for processing large data sets, typically used for distributed computing on clusters of computers.

In a typical Map/Reduce program we find two classes: the mapper and the reducer.

The Mapper: this is the data collection phase. The mapper breaks up large pieces of work into smaller ones and then takes action on each piece.

The Reducer: this is the processing phase. The reducer combines the many results from the map step into a single output.
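To make the two phases concrete, here is a minimal in-memory sketch of the model in Python. It is not the .NET SDK and not Hadoop itself: the map function turns each input line into (key, value) pairs, the grouping step that the framework performs between the phases collects all values for the same key, and the reduce function combines them into one result per key. The word-count input is made up for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Mapper: break each line into smaller units and emit (key, value) pairs."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Group every value emitted for the same key (the framework's job between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: combine the many values for each key into a single output value."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines):
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(run_job(["red bike blue bike", "red car"]))
```

Because each line is mapped independently, the map step can run on many nodes at once; only the grouping step has to bring equal keys together before reducing.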

The first thing we will do is load the data. In this case I am going to use the same Products.csv file from my last post and load it into the Hadoop file system.

To do this, open the Hadoop command line and type:

>hadoop fs -copyFromLocal C:\BigData\Products.csv input/Products/Products.csv

We can check the file we loaded by opening the Hadoop NameNode status page and browsing the file system.

Now we need to create a Class Library project using Visual Studio 2012. Once the project has been created, we need to add references to the Microsoft Hadoop DLLs. For this we are going to use NuGet packages: if you right-click the project you will find the Manage NuGet Packages option (if not, you need to install the NuGet add-in for Visual Studio).

When Manage NuGet Packages opens, type hadoop in the search box and install all the packages.

The project has three classes:

MyMapReduceAPP: implements HadoopJob; it is the entry point of the job and specifies the mapper and reducer classes.

ProductMapper: collects and processes the data.

Reducer: aggregates the data.
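The sketch below mirrors that three-class layout in Python so the responsibilities can be seen side by side. It is not the SDK's API, and several details are hypothetical assumptions: the Products.csv layout (category in the second column), the count-per-category aggregation, the output folder name and the run_locally driver, which is only a single-process stand-in for the cluster.

```python
import csv
from collections import defaultdict


class ProductMapper:
    """Collects and processes the data: called once per input line."""

    # Hypothetical: assumes the product category is the second CSV column.
    CATEGORY_COLUMN = 1

    def map(self, line):
        fields = next(csv.reader([line]))
        # Skip blank or short rows instead of failing the whole job.
        if len(fields) > self.CATEGORY_COLUMN:
            yield fields[self.CATEGORY_COLUMN].strip(), 1


class Reducer:
    """Aggregates the data: receives one key together with all of its values."""

    def reduce(self, key, values):
        # Hypothetical aggregation: count the products in each category.
        yield key, sum(values)


class MyMapReduceApp:
    """Entry point: names the mapper, the reducer and the job's paths."""

    mapper_class = ProductMapper
    reducer_class = Reducer
    input_path = "input/Products/Products.csv"
    output_folder = "output/Products"  # hypothetical output location

    def run_locally(self, lines):
        """Single-process stand-in for the cluster (not a model of mrrunner)."""
        mapper, reducer = self.mapper_class(), self.reducer_class()
        groups = defaultdict(list)
        for line in lines:
            for key, value in mapper.map(line):
                groups[key].append(value)
        results = {}
        # Keys reach the reducer in sorted order, as they do in Hadoop.
        for key in sorted(groups):
            for out_key, out_value in reducer.reduce(key, groups[key]):
                results[out_key] = out_value
        return results


if __name__ == "__main__":
    # Made-up rows for illustration only; the real file's columns may differ.
    sample = ["1,Bikes,Mountain-100", "2,Bikes,Road-150", "3,Accessories,Helmet"]
    print(MyMapReduceApp().run_locally(sample))
```

Keeping the job class separate from the mapper and reducer means the entry point only wires things together, so either phase can be swapped or tested on its own.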

To test our DLL we need to execute it using mrrunner.exe. In the Hadoop command line, type:

>cd C:\Users\Administrator.SHAREPOINT2013L\Documents\visual studio 2012\Projects\MapReduceAPP\MapReduceAPP\mrlib

>mrrunner -dll "C:\Users\Administrator.SHAREPOINT2013L\Documents\visual studio 2012\Projects\MapReduceAPP\MapReduceAPP\bin\Debug\MapReduceAPP.dll"

We can see the result by opening the Hadoop NameNode status page and browsing the file system.

As you can see, it is the same result we got before using Hive.
