Introduction to Hadoop

Sep 03, 2022

1. Introduction to Hadoop
2. Distributed Data Processing
- The idea of distributed databases is older than you might think.
  Richard Peebles, Eric G. Manning: A Computer Architecture for Large (Distributed) Data Bases. VLDB 1975: 405-427
- Distributed data structures and algorithms have always been around.
- So, what is new?
3. Distributed Data Processing
[Figure: a big input is processed by a cluster of machines to produce the final data output]
- Data partitioning
- Load balancing
- Fault tolerance
- Synchronization
4. MapReduce
- A programming paradigm for expressing distributed algorithms
- Introduced by Google in 2004
  - Google File System for distributed storage
  - Google MapReduce for distributed processing
- Hadoop is the open source counterpart, released in 2007 and contributed mainly by Yahoo!
  - HDFS
  - Hadoop MapReduce
5. Hadoop Overview
- Master node: Resource manager, Name node
- Slave nodes (six shown): each runs a Node manager and a Data node
6. HDFS Loading
- Input file (600 MB) is loaded into HDFS in blocks of 128 MB
- Resulting blocks: 128 MB, 128 MB, 128 MB, 128 MB, 88 MB
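The arithmetic behind this slide can be sketched in a few lines. This is only an illustration of how a file size divides into fixed-size blocks, not HDFS code; the function name `split_into_blocks` is hypothetical, and the 128 MB block size is taken from the slide.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file of the given size splits into.

    Hypothetical helper for illustration only; it is not part of any Hadoop API.
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    # Number of completely filled blocks, plus whatever is left over.
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds the remaining data.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(600)
print(blocks)       # [128, 128, 128, 128, 88]
print(len(blocks))  # 5
```

The 600 MB file from the slide yields four full blocks and one final 88 MB block, since 600 = 4 × 128 + 88.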
7. HDFS Storage
[Figure: 15 stored blocks, each labeled B]
8. Hadoop MapReduce
- A kind of functional programming
- A program is expressed in two functions only, map and reduce
- Map: r → {(k,v)}
  Takes as input one record and returns zero or more <key, value> pairs
- Reduce: (k,{v}) → a
  Takes one key and all its associated values and returns zero or more output values
9. Example: Word Count

Map(line) {
  split line into words
  for each word w
    output (w,1)
}

Reduce(w, c[]) {
  s = Sum(c)
  output(w, s)
}

Input text file:
  if you cannot fly, then run, if you cannot run, then walk, if you cannot walk, then crawl, but whatever you do you have to keep moving forward

Output:
  you: 5
  cannot: 3
  walk: 2
  if: 3
  …
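The word count pseudocode can be tried out on a single machine with a small in-memory sketch. This is not the Hadoop API: `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names, the grouping step stands in for the shuffle that a real cluster performs between the map and reduce phases, and splitting the quotation into lines and tokenizing with a lowercase regular expression are assumptions made for the example.

```python
import re
from collections import defaultdict


def map_fn(line):
    """Map: take one record (a line) and emit zero or more (word, 1) pairs."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield (word, 1)


def reduce_fn(word, counts):
    """Reduce: take one key and all its values and emit the total count."""
    yield (word, sum(counts))


def run_mapreduce(records, mapper, reducer):
    """Run mapper and reducer in memory (hypothetical single-process driver)."""
    # Stand-in for the shuffle: group every emitted value under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Call the reducer once per key with all of that key's values.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


lines = [
    "if you cannot fly, then run,",
    "if you cannot run, then walk,",
    "if you cannot walk, then crawl,",
    "but whatever you do you have to",
    "keep moving forward",
]
counts = dict(run_mapreduce(lines, map_fn, reduce_fn))
print(counts["you"], counts["cannot"], counts["walk"], counts["if"])  # 5 3 2 3
```

Because the map function looks at one line at a time and the reduce function looks at one key at a time, both can run in parallel across records and keys, which is what makes the paradigm suitable for a cluster.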
10. Hadoop Operation Modes
- Standalone mode: one JRE instance
- Pseudo-distributed mode: RM, NN, NM, and DN
- Cluster mode: Resource manager, Name node, Node manager, and Data node
