QCon London 2012
Hadoop: Scalable Infrastructure for Big Data
Parand Tony Darugar
Founder and CEO, Xpenser
parand@xpenser.com
What is Hadoop?
Hadoop is the Linux of Big Data Processing
Infrastructure for Large Scale Computation & Data Processing on a network of Commodity Hardware.
Why Hadoop?
Scale
Cost
Freedom
Does Anyone Use Hadoop?
IBM, VISA, Microsoft, Facebook, Yahoo, AOL, ...
eHarmony, Zions Bank, NY Times, Twitter, eBay, LinkedIn, ...
Alternatives
- Build your own
- Get creative with RDBMS architecture
What's the idea?
- Commodity Hardware
- Distributed Operation
Wisdom:
- Embrace Failure (hardware)
- Be Resilient (software)
What's in the box?
Hadoop Distributed File System
Distributed Computation Framework
Map-Reduce Programming Model
HDFS
- Your data in triplicate
- Built-in resiliency to large scale failures
- Intelligent Data Distribution
- Very large data sizes
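A minimal sketch of the "data in triplicate" idea: a file is split into blocks and each block is stored on three different nodes, so losing a couple of machines does not lose data. The function names (`place_blocks`, `readable_blocks`) and the round-robin placement are assumptions made for illustration; real HDFS placement is rack-aware and more involved.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (toy round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes, wrapping around, so the replicas are always distinct.
        placement[block] = [nodes[(block + i) % len(nodes)]
                            for i in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [block for block, replicas in placement.items()
            if any(node not in failed_nodes for node in replicas)]


if __name__ == "__main__":
    placement = place_blocks(4, ["n1", "n2", "n3", "n4", "n5"])
    # With three replicas per block, any two node failures leave every block readable.
    print(readable_blocks(placement, failed_nodes={"n1", "n2"}))
```

With three copies of every block, a block only becomes unreadable if all three of its nodes fail at once.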
Distributed Computation
- Built-in resiliency to large scale failures
- Distribute work to workers, collect results from fastest
- Move computation to data (not data to computation)
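"Move computation to data" can be pictured as a scheduler that prefers to run each task on a worker that already holds the input block. The sketch below is a greedy toy version under that assumption; `assign_tasks` and its data shapes are hypothetical and are not Hadoop's actual scheduler.

```python
def assign_tasks(block_locations, idle_workers):
    """Greedily assign one task per block, preferring a worker that holds the block.

    block_locations: dict mapping block id -> list of nodes storing a replica.
    idle_workers:    list of nodes currently free to run a task.
    Returns a dict mapping block id -> (worker, is_local).
    """
    free = list(idle_workers)
    assignments = {}
    for block, holders in block_locations.items():
        if not free:
            break  # no idle workers left; remaining blocks wait
        # Prefer a free worker that already stores this block.
        local = next((w for w in free if w in holders), None)
        worker = local if local is not None else free[0]
        free.remove(worker)
        assignments[block] = (worker, local is not None)
    return assignments


if __name__ == "__main__":
    locations = {"b0": ["n1", "n2"], "b1": ["n2", "n3"], "b2": ["n4"]}
    for block, (worker, is_local) in assign_tasks(locations, ["n3", "n2", "n1"]).items():
        print(block, worker, "local" if is_local else "remote read")
```

Only the task for `b2` needs to pull data over the network here, because no idle worker holds a replica of it.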
Map Reduce
Very simple programming model:
- Map(anything) -> key, value
- Sort, partition on key
- Reduce(key, value) -> key, value
No parallel processing or message passing semantics
Programmable in Java or any other language (streaming)
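The three steps can be sketched on a single machine with a word count. This is a local simulation, not Hadoop code: `map_words`, `reduce_counts` and `run_job` are illustrative names, and the in-memory sort stands in for the framework's distributed sort and partition phase.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map: turn one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce: combine all values that share a key into one (key, value) pair."""
    return key, sum(values)


def run_job(records):
    # Map every record independently; no record needs to know about any other.
    pairs = [pair for record in records for pair in map_words(record)]
    # Sort on key so that equal keys end up next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: one call per distinct key, over that key's values.
    return [reduce_counts(key, (value for _, value in group))
            for key, group in groupby(pairs, key=itemgetter(0))]


if __name__ == "__main__":
    for word, count in run_job(["big data", "big clusters"]):
        print(word, count)
```

Neither function contains any threading or messaging code; spreading the map and reduce calls across machines is left entirely to the framework.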
Ecosystem
- HBase: NoSQL BigTable clone
- Hive: Somewhat-SQL data store
- Pig: SQL-like programming model
- Chukwa, Scribe, Mahout, Cassandra, Oozie, Sqoop, ...
Commercial Support
- Cloudera
- Hortonworks
- IBM
- ...
How?
- Try it in non-distributed mode
- Try it on a few spare machines
- Try it on EC2
- Try it! http://hadoop.apache.org/
Case Studies
eHarmony
Biz360 (Attensity)
Yahoo!
You!
Start with ETL
Start with batch, non time-critical tasks
Start with storing your large data on HDFS
- Move batch processing to Hadoop
- Serve from RDBMS
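One way to picture this split: a batch job writes its results as tab-separated `key<TAB>value` lines, and a small loader copies them into a relational table that the application queries. The sketch below assumes that output format and uses SQLite as a stand-in RDBMS; the table name `word_totals` and the functions `load_results` and `lookup` are hypothetical.

```python
import sqlite3


def load_results(conn, lines):
    """Load tab-separated key/value lines from a batch job into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_totals (word TEXT PRIMARY KEY, total INTEGER)"
    )
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, total = line.split("\t", 1)
        rows.append((word, int(total)))
    # Re-running the batch job overwrites earlier totals for the same key.
    conn.executemany("INSERT OR REPLACE INTO word_totals VALUES (?, ?)", rows)
    conn.commit()


def lookup(conn, word):
    """Serve a single value from the database; 0 if the key is unknown."""
    row = conn.execute(
        "SELECT total FROM word_totals WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_results(conn, ["big\t2\n", "data\t1\n"])
    print(lookup(conn, "big"))
```

The heavy computation stays in the batch job; the database only ever sees the small, already-aggregated result.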
- Embrace. Be One With The Hadoop.