HADOOP Installation and Deployment of a Single Node on a Linux System - PowerPoint PPT Presentation



SLIDE 1

HADOOP

Installation and Deployment of a Single Node on a Linux System

Presented by: Liv Nguekap And Garrett Poppe

SLIDE 2

Topics

  • Create hadoopuser and group
  • Edit sudoers
  • Set up SSH
  • Install JDK
  • Install Hadoop
  • Editing Hadoop settings
  • Running Hadoop
  • Resources
SLIDE 3

Add Hadoopuser

SLIDE 4

Edit sudoers

SLIDE 5

Set up SSH

  • sudo chown hadoopuser ~/.ssh
  • sudo chmod 700 ~/.ssh
  • sudo chmod 600 ~/.ssh/id_rsa
  • sudo cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • sudo chmod 600 ~/.ssh/authorized_keys
  • Edit /etc/ssh/sshd_config
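The commands above assume an RSA key pair for hadoopuser already exists. A minimal sketch of generating one, run against a scratch directory here so it is safe to try (use ~/.ssh for the real setup):

```shell
# Create a scratch .ssh directory; swap SSHDIR for ~/.ssh on a real node.
SSHDIR="$(mktemp -d)/.ssh"
mkdir -p "$SSHDIR"
chmod 700 "$SSHDIR"
# Generate a passwordless RSA key so "ssh localhost" works for the
# Hadoop start scripts without prompting.
ssh-keygen -t rsa -N "" -f "$SSHDIR/id_rsa" -q
# Authorize the key for login, then lock down the file permissions.
cat "$SSHDIR/id_rsa.pub" >> "$SSHDIR/authorized_keys"
chmod 600 "$SSHDIR/id_rsa" "$SSHDIR/authorized_keys"
ls "$SSHDIR"
```

After this, `ssh localhost` should succeed without a password prompt, which the start-dfs.sh and start-yarn.sh scripts rely on.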
SLIDE 6

Install JDK

  • Log in as hadoopuser
  • Uninstall previous versions of the JDK
  • Download the current version of the JDK
  • Install the JDK
  • Edit the JAVA_HOME and PATH variables in the “~/.bashrc” file
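The last step above amounts to two lines in “~/.bashrc”. A sketch follows; the JDK directory shown is an assumption, so substitute the path your JDK actually installed to:

```shell
# Assumed JDK install path -- replace with your actual JDK directory.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_65
# Put the JDK's binaries (java, javac, jps, ...) on the PATH.
export PATH=$PATH:$JAVA_HOME/bin
echo "JAVA_HOME is $JAVA_HOME"
```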

SLIDE 7

Install Hadoop

  • Download the current stable release
  • Untar the download
  • tar xzvf hadoop-2.4.1.tar.gz
  • Move the untarred folder
  • sudo mv hadoop-2.4.1 /usr/local/hadoop
  • Change ownership and create nodes
  • sudo chown -R hadoopuser:hadoopgroup /usr/local/hadoop
  • mkdir -p ~/hadoopspace/hdfs/namenode
  • mkdir -p ~/hadoopspace/hdfs/datanode

SLIDE 8

Install Hadoop

  • Edit the Hadoop variables in the “~/.bashrc” file
  • After editing the file, apply the changes with “source ~/.bashrc”
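The slides do not list the variables themselves. The ones below are the conventional names for a Hadoop 2.x single-node setup, offered as a sketch rather than the presenters' exact file:

```shell
# Conventional Hadoop 2.x environment variables for ~/.bashrc
# (an assumed layout; /usr/local/hadoop matches the earlier "sudo mv" step).
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
# bin holds hdfs/hadoop, sbin holds the start-dfs.sh/start-yarn.sh scripts.
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
echo "HADOOP_HOME is $HADOOP_HOME"
```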
SLIDE 9

Editing Hadoop settings

  • Go to the directory located at /usr/local/hadoop/etc/hadoop
  • Create a copy of mapred-site.xml.template as mapred-site.xml

SLIDE 10

Editing Hadoop settings

  • Edit mapred-site.xml
  • Add this code between the <configuration> tags:

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

SLIDE 11

Editing Hadoop settings

  • Edit yarn-site.xml
  • Add this code between the <configuration> tags:

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>

SLIDE 12

Editing Hadoop settings

  • Edit core-site.xml
  • Add this code between the <configuration> tags:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>

SLIDE 13

Editing Hadoop settings

  • Edit hdfs-site.xml
  • Add this code between the <configuration> tags:

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>file:///home/hadoopuser/hadoopspace/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>file:///home/hadoopuser/hadoopspace/hdfs/datanode</value>
    </property>

SLIDE 14

Editing Hadoop settings

  • Edit “hadoop-env.sh”
  • Create the JAVA_HOME variable using the current JDK path.

SLIDE 15

Editing Hadoop settings

  • Format the namenode using the command “hdfs namenode -format”

SLIDE 16

Running Hadoop

  • Start services
  • “start-dfs.sh”
  • “start-yarn.sh”
SLIDE 17

Running Hadoop

  • Use the jps command to make sure all services are running.

SLIDE 18

Running Hadoop

  • Open a web browser.
  • Type “localhost:50070” into the address bar to access the web interface.

SLIDE 19
Part 2

WRITING MAPREDUCE PROGRAMS FOR HADOOP

SLIDE 20

Languages/scripts used

  • We will talk about two languages used to write mapreduce programs in Hadoop:
  • 1) Pig Script (also called Pig Latin)
  • 2) Java
SLIDE 21

Pig

  • What is Pig?
  • Pig is a high-level platform for creating MapReduce programs used with Hadoop.
  • It is somewhat similar to SQL.
SLIDE 22

How Pig Works

  • Pig has two modes of execution:
  • 1) Local Mode - To run Pig in local mode, you need access to a single machine.
  • 2) Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation.

SLIDE 23

Syntax to run Pig

  • To run Pig in Local Mode, use:
  • pig -x local id.pig
  • To run Pig in Mapreduce Mode, use:
  • pig id.pig
  • or pig -x mapreduce id.pig

SLIDE 24

Ways to run Pig

  • Whether in local or mapreduce mode, there are 3 ways of running Pig:

  • 1) Grunt shell
  • 2) Batch or script file
  • 3) Embedded Program
SLIDE 25

Sample Grunt Shell Code

SLIDE 26

Grunt Shell Commands

SLIDE 27

Grunt Shell Commands

SLIDE 28

Batch

  • To run Pig with batch files, the Pig script is written entirely into a Pig file and the file is run with Pig.
  • A sample syntax for the file totalmiles.pig is:
  • pig totalmiles.pig
SLIDE 29

Content of file totalmiles.pig
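The slide showing totalmiles.pig is an image and is not reproduced in this text. Purely as an illustrative sketch, such a Pig Latin file could look like the one written out below; the field layout of the 1987 flight data and the distance column name are assumptions, not the presenters' actual script:

```shell
# Write a hypothetical totalmiles.pig to a scratch directory.
# The schema (year, month, day, distance) is assumed for illustration only.
PIGDIR="$(mktemp -d)"
cat > "$PIGDIR/totalmiles.pig" <<'EOF'
-- Load the comma-separated 1987 flight data
records = LOAD '1987.csv' USING PigStorage(',')
          AS (year:int, month:int, day:int, distance:int);
-- Group every record together, then sum the miles flown
grouped = GROUP records ALL;
totalmiles = FOREACH grouped GENERATE SUM(records.distance);
DUMP totalmiles;
EOF
cat "$PIGDIR/totalmiles.pig"
```

Running `pig totalmiles.pig` (mapreduce mode) or `pig -x local totalmiles.pig` would then execute the script as described on the previous slide.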

SLIDE 30

Content of 1987 flight data file

SLIDE 31

JAVA

  • We tested the mapreduce function of Hadoop in a Java program called WordCount.java
  • The WordCount class is provided in the examples that come with the Hadoop installation

SLIDE 32

Where to find the Hadoop Examples

SLIDE 33

JAVA

SLIDE 34

Launching WordCount job

SLIDE 35

WordCount Processing

SLIDE 36

WordCount Processing
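The processing these slides depict follows map, shuffle, reduce. The same flow can be mimicked with standard Unix tools, with no Hadoop involved, which makes the three phases easy to see:

```shell
# "Map": split each line into one word per line.
# "Shuffle": sort so identical words are adjacent.
# "Reduce": count each run of identical words.
WCIN="$(mktemp)"
printf 'hello world\nhello hadoop\n' > "$WCIN"
tr -s ' ' '\n' < "$WCIN" | sort | uniq -c
```

Each output line is a count followed by a distinct word, which is exactly the (word, count) pairs WordCount emits.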

SLIDE 37

Results

SLIDE 38

Results

SLIDE 39

WordCount.Java - Map

SLIDE 40

WordCount.java - Reduce

SLIDE 41
  • Fin
  • Thank YOU!!
SLIDE 42

Resources

  • http://alanxelsys.com/hadoop-v2-single-node-installation-on-centos-6-5/
  • http://tecadmin.net/setup-hadoop-2-4-single-node-cluster-on-linux/
  • http://hadoop.apache.org/
  • http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount

  • https://pig.apache.org/docs/r0.10.0/basic.html