
Lecture 15.3: Hadoop! Toolchain
EN 600.320/420
Instructor: Randal Burns
4 April 2018
Department of Computer Science, Johns Hopkins University

The Hadoop Tool Chain
• The command-line tool chain
  – Build (compile) class files into a directory
  – Construct a Java archive (jar)
  – Point Hadoop! at the jar
• Many prefer to use Eclipse instead
Lecture 15: Map/Reduce Part 2
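The three command-line steps can be made concrete with a small dry-run sketch. It only builds the command lines and executes nothing. It assumes a single driver class. `WordCount.java`, `WordCount`, `build`, `wordcount.jar` and the `input`/`output` paths are hypothetical placeholders. The compile classpath is whatever `hadoop classpath` prints on your installation.

```python
"""Dry-run sketch of the command-line Hadoop! tool chain:
compile -> package as a jar -> point Hadoop! at the jar.

All file names, the jar name, and the main class are hypothetical placeholders.
"""
import shlex
from typing import List


def toolchain_commands(
    sources: List[str],
    classes_dir: str,
    jar_name: str,
    main_class: str,
    hadoop_classpath: str,
    job_args: List[str],
) -> List[List[str]]:
    """Return the three steps as argv lists (nothing is executed)."""
    # Step 1: compile the .java sources into a build directory
    # (create classes_dir first; older javac releases will not create it).
    compile_cmd = ["javac", "-classpath", hadoop_classpath, "-d", classes_dir, *sources]
    # Step 2: package everything under classes_dir into a Java archive.
    jar_cmd = ["jar", "-cvf", jar_name, "-C", classes_dir, "."]
    # Step 3: hand the jar and its driver class to Hadoop!.
    run_cmd = ["hadoop", "jar", jar_name, main_class, *job_args]
    return [compile_cmd, jar_cmd, run_cmd]


if __name__ == "__main__":
    steps = toolchain_commands(
        sources=["WordCount.java"],   # hypothetical source file
        classes_dir="build",
        jar_name="wordcount.jar",
        main_class="WordCount",       # hypothetical driver class
        hadoop_classpath="<output of hadoop classpath>",
        job_args=["input", "output"],
    )
    for argv in steps:
        print(shlex.join(argv))
```

The steps are kept as argv lists rather than one shell string. That way each step can later be passed to `subprocess.run` without quoting problems.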

Hadoop! Configurations
• Hadoop! is a heterogeneous, distributed system
  – Many components: NameNode, HDFS, reporting
  – Parallelization (mappers, reducers, shuffle, loading)
  – Typically involves managing a cluster
• But it can run in several configurations
  – Standalone (a single Java process; the simplest)
  – Pseudo-distributed (full runtime on one machine)
  – Fully distributed (on a cluster)
• Running on pre-configured clusters
  – Specify the size and types of nodes
  – Launch a compiled Java jar file or streaming scripts
  – Providers: AWS, Azure, Joyent, IBM, Rackspace
  – Metaservices: Cloudera
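If no configuration files are edited, Hadoop runs in standalone mode as a single Java process. For pseudo-distributed mode, Hadoop's single-node setup guide edits two XML files in the configuration directory (etc/hadoop/ in the standard layout). The sketch below only renders their contents and does not write them. The `localhost:9000` NameNode address follows that guide's example and is an assumption about your machine, not a requirement.

```python
"""Sketch: render the two site files that switch Hadoop from standalone
to pseudo-distributed mode.

Assumptions: Hadoop 2.x/3.x property names; the NameNode address
hdfs://localhost:9000 is an example value, not a requirement.
"""
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_site_xml(properties: Dict[str, str]) -> str:
    """Build a <configuration> document with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# core-site.xml: point the default file system at a local HDFS NameNode.
CORE_SITE = hadoop_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})

# hdfs-site.xml: only one DataNode exists, so keep one replica per block.
HDFS_SITE = hadoop_site_xml({"dfs.replication": "1"})

if __name__ == "__main__":
    print(CORE_SITE)
    print(HDFS_SITE)
```

A fully distributed cluster uses the same files. Only the values differ, for example the NameNode's real host name and a replication factor above one.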

Hadoop! Streaming
• Give arbitrary string-processing functions to the Hadoop! environment
  – A map script and a reduce script
• Almost equivalent to:
  – cat inputdir/* | mapper.py | sort | reducer.py
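A minimal sketch of the two streaming scripts for word count follows. They are written as one Python file with two roles so the pipeline above can be tried locally. The file name `wordcount.py` and the `map`/`reduce` argument are hypothetical conventions for this example. Output lines use the conventional `key<TAB>value` format.

```python
"""Word count for Hadoop! Streaming, written as one file with two roles.

Hypothetical usage: `python3 wordcount.py map` plays mapper.py and
`python3 wordcount.py reduce` plays reducer.py. Both read lines on stdin
and write tab-separated key/value lines on stdout.
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; assumes equal keys arrive adjacent (i.e. sorted)."""
    # Each input line must contain a tab, as the mapper's output does.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def local_pipeline(lines: Iterable[str]) -> List[str]:
    """Mimic `cat inputdir/* | mapper.py | sort | reducer.py` in one process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally, `cat inputdir/* | python3 wordcount.py map | sort | python3 wordcount.py reduce` runs the pipeline from the slide. On a cluster, the two roles would be passed to the streaming jar through its `-mapper` and `-reducer` options; the jar's location varies by installation. The reducer relies on equal words arriving next to each other, which is exactly what the sort step provides.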

Streaming and Sorting
• Streaming mode in Hadoop! gives a different sorting guarantee
  – Recall: cat inputdir/* | mapper.py | sort | reducer.py
• Why? Same or different semantics? Any performance implications?
  – There is no schema
  – So, it sorts the whole output of mapper.py as a key
  – This is more restrictive than the default sort
  – And, thus, less efficient
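To see what "more restrictive" means here, compare two orderings of the same tab-separated mapper output. The first uses the whole line as the sort key, as `sort` does in the shell pipeline. The second sorts on the key field only. This sketches the two comparisons, not Hadoop!'s internal sort. Treating the text before the first tab as the key is an assumption of the example.

```python
"""Contrast two orderings of tab-separated mapper output: the whole line as
the sort key (what `sort` does in the shell pipeline) versus the key field only.
"""
from itertools import groupby
from typing import List


def key_of(line: str) -> str:
    """Assumption for this sketch: the key is the text before the first tab."""
    return line.split("\t", 1)[0]


def sort_whole_line(lines: List[str]) -> List[str]:
    # Every character takes part in the comparison, so values get ordered too.
    return sorted(lines)


def sort_key_only(lines: List[str]) -> List[str]:
    # Only keys are compared; Python's sort is stable, so values for the same
    # key keep their input order.
    return sorted(lines, key=key_of)


def group_keys(lines: List[str]) -> List[str]:
    """Keys in the order a reducer would see its groups."""
    return [k for k, _ in groupby(lines, key=key_of)]


if __name__ == "__main__":
    mapped = ["b\t2", "a\t3", "b\t1", "a\t1"]
    print(sort_whole_line(mapped))  # values within each key are also sorted
    print(sort_key_only(mapped))    # values keep their arrival order
```

Both orderings put equal keys next to each other, which is all a reducer needs. The whole-line sort also orders the values within each key. It therefore compares more of each line and guarantees more than the reducer asked for.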

Map/Reduce Recast (8-year-old numbers)
• Scanning engine
  – Use massive parallelism to look at large data sets
• Performance on 100 TB data sets
  – 1 node @ 50 MB/s (sustained transfer rate, STR, of a disk) = 23 days
  – 1000 nodes = 33 minutes
• Batch processing
  – Not real-time/user-facing
• Large production environments
  – Not useful at small scales
  – Too much overhead on small jobs
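The slide's performance figures follow from dividing data size by aggregate read rate. The worked version below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes). It also assumes perfectly linear scaling, with no job-startup or shuffle overhead.

```python
"""Back-of-the-envelope scan times: 100 TB read at a disk's sustained
transfer rate, on 1 node and on 1000 nodes.

Assumptions: decimal units and perfectly linear scaling across nodes,
with no scheduling or shuffle overhead.
"""

TB = 10**12
MB = 10**6


def scan_seconds(data_bytes: float, bytes_per_sec_per_node: float, nodes: int = 1) -> float:
    """Time to read data_bytes when every node streams its share in parallel."""
    return data_bytes / (bytes_per_sec_per_node * nodes)


one_node = scan_seconds(100 * TB, 50 * MB)                # 2,000,000 s
many_nodes = scan_seconds(100 * TB, 50 * MB, nodes=1000)  # 2,000 s

if __name__ == "__main__":
    print(f"1 node:     {one_node / 86400:.1f} days")     # 23.1 days
    print(f"1000 nodes: {many_nodes / 60:.1f} minutes")   # 33.3 minutes
```

2,000,000 seconds is about 23.1 days and 2,000 seconds is about 33.3 minutes, matching the slide. This model ignores the per-job overhead the slide mentions. On small jobs that overhead, not the scan itself, dominates.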
