Hadoop: Scalable Infrastructure for Big Data
QCon London 2012


SLIDE 1

Hadoop: Scalable Infrastructure for Big Data
QCon London 2012

Parand Tony Darugar
Founder and CEO, Xpenser
parand@xpenser.com

SLIDE 2

What is Hadoop?

SLIDE 3

Hadoop is the Linux of Big Data Processing
SLIDE 4

Infrastructure for Large Scale Computation & Data Processing on a network of Commodity Hardware.

SLIDE 5

Why Hadoop?

SLIDE 6

Scale

SLIDE 7

Cost

SLIDE 8

Freedom

SLIDE 9

Does Anyone Use Hadoop?

SLIDE 10

IBM, VISA, Microsoft, Facebook, Yahoo, AOL, ... eHarmony, Zions Bank, NY Times, Twitter, eBay, LinkedIn, ...

SLIDE 11

Alternatives

  • Build your own
  • Get creative with RDBMS architecture

SLIDE 12

What's the idea?

SLIDE 13

  • Commodity Hardware
  • Distributed Operation

SLIDE 14

Wisdom:
  • Embrace Failure (hardware)
  • Be Resilient (software)

SLIDE 15

What's in the box?

SLIDE 16

Hadoop Distributed File System

SLIDE 17

Distributed Computation Framework

SLIDE 18

Map-Reduce Programming Model

SLIDE 19

HDFS

  • Your data in triplicate
  • Built-in resiliency to large-scale failures
  • Intelligent Data Distribution
  • Very large data sizes
SLIDE 20

Distributed Computation

  • Built-in resiliency to large-scale failures
  • Distribute work to workers, collect results from fastest
  • Move computation to data (not data to computation)
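The "collect results from fastest" bullet can be sketched in plain Python: run duplicate attempts of the same task and keep whichever finishes first, as Hadoop's speculative execution does. This is an illustrative sketch, not Hadoop code; all names here are hypothetical.

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def task_attempt(attempt_id):
    # Simulate uneven commodity hardware: each attempt takes a random time.
    time.sleep(random.uniform(0.01, 0.05))
    return f"result-from-attempt-{attempt_id}"

# Launch three redundant attempts of the same task.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(task_attempt, i) for i in range(3)]
    # Return as soon as the first attempt completes; slower duplicates
    # would simply be discarded (or killed) by a real scheduler.
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    winner = next(iter(done)).result()
```

Because any attempt produces the same logical result, taking the fastest one masks slow or failing machines without changing the answer.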

SLIDE 21

Map Reduce

Very simple programming model:
  • Map(anything) -> key, value
  • Sort, partition on key
  • Reduce(key, value) -> key, value
  • No parallel processing or message passing semantics
  • Programmable in Java or any other language (streaming)
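The map -> sort/partition -> reduce pipeline above can be sketched as a local word count in plain Python. The function names (`mapper`, `reducer`, `run_job`) are illustrative, not part of any Hadoop API; on a real cluster the framework performs the sort/shuffle step between the two phases.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map(anything) -> (key, value): emit (word, 1) for each word."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(key, values):
    """Reduce(key, values) -> (key, value): sum the counts for one key."""
    return (key, sum(values))

def run_job(lines):
    # Map phase: apply the mapper to every input record.
    pairs = [kv for line in lines for kv in mapper(line)]
    # Sort/partition on key (what the framework's shuffle does).
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(k, (v for _, v in group))
                for k, group in groupby(pairs, key=itemgetter(0)))

counts = run_job(["hadoop scales", "hadoop stores big data"])
```

With Hadoop Streaming, the same mapper and reducer logic can run as standalone scripts that read records on stdin and emit tab-separated key/value pairs on stdout, which is what makes the model "programmable in any language".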

SLIDE 22

Ecosystem

  • HBase: NoSQL BigTable clone
  • Hive: Somewhat-SQL data store
  • Pig: SQL-like programming model
  • Chukwa, Scribe, Mahout, Cassandra, Oozie, Sqoop, ...

SLIDE 23

Commercial Support

Cloudera, HortonWorks, IBM, ...

SLIDE 24

How?

  • Try it in non-distributed mode
  • Try it on a few spare machines
  • Try it on EC2
  • Try it! http://hadoop.apache.org/

SLIDE 25

Case Studies

SLIDE 26

eHarmony

SLIDE 27

Biz360 (Attensity)

SLIDE 28

Yahoo!

SLIDE 29

You!

SLIDE 30

Start with ETL

SLIDE 31

Start with batch, non time-critical tasks

SLIDE 32

Start with storing your large data in HDFS
SLIDE 33

  • Move batch processing to Hadoop
  • Serve from RDBMS

SLIDE 34

Embrace. Be One With The Hadoop.

SLIDE 35

Parand Tony Darugar
parand@xpenser.com

Questions?