HADOOP Installation and Deployment of a Single Node on a Linux System - PowerPoint PPT Presentation



SLIDE 1

HADOOP

Installation and Deployment of a Single Node on a Linux System

Presented by: Liv Nguekap And Garrett Poppe

SLIDE 2

Topics

  • Create hadoopuser and group
  • Edit sudoers
  • Set up SSH
  • Install JDK
  • Install Hadoop
  • Editing Hadoop settings
  • Running Hadoop
  • Resources
SLIDE 3

Add Hadoopuser

SLIDE 4

Edit sudoers

SLIDE 5

Set up SSH

  • sudo chown hadoopuser ~/.ssh
  • sudo chmod 700 ~/.ssh
  • sudo chmod 600 ~/.ssh/id_rsa
  • sudo cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • sudo chmod 600 ~/.ssh/authorized_keys
  • Edit /etc/ssh/sshd_config
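The commands above assume an RSA key pair for hadoopuser already exists. A minimal sketch of generating one, run against a scratch directory here so it is safe to try (use ~/.ssh for the real setup):

```shell
# Create a scratch .ssh directory; swap SSHDIR for ~/.ssh on a real node.
SSHDIR="$(mktemp -d)/.ssh"
mkdir -p "$SSHDIR"
chmod 700 "$SSHDIR"
# Generate a passwordless RSA key so "ssh localhost" works for the
# Hadoop start scripts without prompting.
ssh-keygen -t rsa -N "" -f "$SSHDIR/id_rsa" -q
# Authorize the key for login, then lock down the file permissions.
cat "$SSHDIR/id_rsa.pub" >> "$SSHDIR/authorized_keys"
chmod 600 "$SSHDIR/id_rsa" "$SSHDIR/authorized_keys"
ls "$SSHDIR"
```

After this, `ssh localhost` should succeed without a password prompt, which the start-dfs.sh and start-yarn.sh scripts rely on.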
SLIDE 6

Install JDK

  • Log in as hadoopuser
  • Uninstall previous versions of the JDK
  • Download the current version of the JDK
  • Install the JDK
  • Edit the JAVA_HOME and PATH variables in the “~/.bashrc” file
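The last step above amounts to two lines in “~/.bashrc”. A sketch follows; the JDK directory shown is an assumption, so substitute the path your JDK actually installed to:

```shell
# Assumed JDK install path -- replace with your actual JDK directory.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_65
# Put the JDK's binaries (java, javac, jps, ...) on the PATH.
export PATH=$PATH:$JAVA_HOME/bin
echo "JAVA_HOME is $JAVA_HOME"
```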

SLIDE 7

Install Hadoop

  • Download the current stable release
  • Untar the download
  • tar xzvf hadoop-2.4.1.tar.gz
  • Move the untarred folder
  • sudo mv hadoop-2.4.1 /usr/local/hadoop
  • Change ownership and create nodes
  • sudo chown -R hadoopuser:hadoopgroup /usr/local/hadoop
  • mkdir -p ~/hadoopspace/hdfs/namenode
  • mkdir -p ~/hadoopspace/hdfs/datanode

SLIDE 8

Install Hadoop

  • Edit the Hadoop variables in the “~/.bashrc” file
  • After editing the file, apply the changes with “source ~/.bashrc”
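The slides do not list the variables themselves. The ones below are the conventional names for a Hadoop 2.x single-node setup, offered as a sketch rather than the presenters' exact file:

```shell
# Conventional Hadoop 2.x environment variables for ~/.bashrc
# (an assumed layout; /usr/local/hadoop matches the earlier "sudo mv" step).
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
# bin holds hdfs/hadoop, sbin holds the start-dfs.sh/start-yarn.sh scripts.
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
echo "HADOOP_HOME is $HADOOP_HOME"
```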
SLIDE 9

Editing Hadoop settings

  • Go to the directory located at /usr/local/hadoop/etc/hadoop
  • Create a copy of mapred-site.xml.template as mapred-site.xml

SLIDE 10

Editing Hadoop settings

  • Edit mapred-site.xml
  • Add this code between the <configuration> tags:

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

SLIDE 11

Editing Hadoop settings

  • Edit yarn-site.xml
  • Add this code between the <configuration> tags:

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>

SLIDE 12

Editing Hadoop settings

  • Edit core-site.xml
  • Add this code between the <configuration> tags:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>

SLIDE 13

Editing Hadoop settings

  • Edit hdfs-site.xml
  • Add this code between the <configuration> tags:

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.name.dir</name>
      <value>file:///home/hadoopuser/hadoopspace/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>file:///home/hadoopuser/hadoopspace/hdfs/datanode</value>
    </property>

SLIDE 14

Editing Hadoop settings

  • Edit “hadoop-env.sh”
  • Create the JAVA_HOME variable using the current JDK path.

SLIDE 15

Editing Hadoop settings

  • Format the namenode using the command “hdfs namenode -format”

SLIDE 16

Running Hadoop

  • Start services
  • “start-dfs.sh”
  • “start-yarn.sh”
SLIDE 17

Running Hadoop

  • Use the jps command to make sure all services are running.

SLIDE 18

Running Hadoop

  • Open a web browser.
  • Type “localhost:50070” into the address bar to access the web interface.

SLIDE 19
Part 2

WRITING MAPREDUCE PROGRAMS FOR HADOOP

SLIDE 20

Languages/scripts used

  • We will talk about two languages used to write mapreduce programs in Hadoop:
  • 1) Pig Script (also called Pig Latin)
  • 2) Java
SLIDE 21

Pig

  • What is Pig?
  • Pig is a high-level platform for creating MapReduce programs used with Hadoop.
  • It is somewhat similar to SQL.
SLIDE 22

How Pig Works

  • Pig has two modes of execution:
  • 1) Local Mode - To run Pig in local mode, you need access to a single machine.
  • 2) Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation.

SLIDE 23

Syntax to run Pig

  • To run Pig in Local Mode, use:
  • pig -x local id.pig
  • To run Pig in Mapreduce Mode, use:
  • pig id.pig
  • or pig -x mapreduce id.pig

SLIDE 24

Ways to run Pig

  • Whether in local or mapreduce mode, there are 3 ways of running Pig:

  • 1) Grunt shell
  • 2) Batch or script file
  • 3) Embedded Program
SLIDE 25

Sample Grunt Shell Code

SLIDE 26

Grunt Shell Commands

SLIDE 27

Grunt Shell Commands

SLIDE 28

Batch

  • To run Pig with batch files, the Pig script is written entirely into a Pig file and the file is run with Pig.
  • A sample syntax for the file totalmiles.pig is:
  • pig totalmiles.pig
SLIDE 29

Content of file totalmiles.pig
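The slide showing totalmiles.pig is an image and is not reproduced in this text. Purely as an illustrative sketch, such a Pig Latin file could look like the one written out below; the field layout of the 1987 flight data and the distance column name are assumptions, not the presenters' actual script:

```shell
# Write a hypothetical totalmiles.pig to a scratch directory.
# The schema (year, month, day, distance) is assumed for illustration only.
PIGDIR="$(mktemp -d)"
cat > "$PIGDIR/totalmiles.pig" <<'EOF'
-- Load the comma-separated 1987 flight data
records = LOAD '1987.csv' USING PigStorage(',')
          AS (year:int, month:int, day:int, distance:int);
-- Group every record together, then sum the miles flown
grouped = GROUP records ALL;
totalmiles = FOREACH grouped GENERATE SUM(records.distance);
DUMP totalmiles;
EOF
cat "$PIGDIR/totalmiles.pig"
```

Running `pig totalmiles.pig` (mapreduce mode) or `pig -x local totalmiles.pig` would then execute the script as described on the previous slide.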

SLIDE 30

Content of 1987 flight data file

SLIDE 31

JAVA

  • We tested the mapreduce function of Hadoop in a Java program called WordCount.java
  • The WordCount class is provided in the examples that come with the Hadoop installation

SLIDE 32

Where to find the Hadoop Examples

SLIDE 33

JAVA

SLIDE 34

Launching WordCount job

SLIDE 35

WordCount Processing

SLIDE 36

WordCount Processing
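The processing these slides depict follows map, shuffle, reduce. The same flow can be mimicked with standard Unix tools, with no Hadoop involved, which makes the three phases easy to see:

```shell
# "Map": split each line into one word per line.
# "Shuffle": sort so identical words are adjacent.
# "Reduce": count each run of identical words.
WCIN="$(mktemp)"
printf 'hello world\nhello hadoop\n' > "$WCIN"
tr -s ' ' '\n' < "$WCIN" | sort | uniq -c
```

Each output line is a count followed by a distinct word, which is exactly the (word, count) pairs WordCount emits.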

SLIDE 37

Results

SLIDE 38

Results

SLIDE 39

WordCount.Java - Map

SLIDE 40

WordCount.java - Reduce

SLIDE 41
  • Fin
  • Thank YOU!!
SLIDE 42

Resources

  • http://alanxelsys.com/hadoop-v2-single-node-installation-on-centos-6-5/
  • http://tecadmin.net/setup-hadoop-2-4-single-node-cluster-on-linux/
  • http://hadoop.apache.org/
  • http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount

  • https://pig.apache.org/docs/r0.10.0/basic.html