The Apache Hadoop Ecosystem, Doug Cutting, Cloudera & Apache



SLIDE 1

The Apache Hadoop Ecosystem

Doug Cutting Cloudera & Apache

SLIDE 2

Context: exponential for decades!

  • abundance of
      ○ computing & storage
      ○ generated data (8 ZB in '15)
  • peta-scale is now affordable (kMGTPEZY)
      ○ petabytes
      ○ petahertz
  • traditional data tech doesn't scale well
  • more data provides greater value
  • time for a new approach
SLIDE 3

New Hardware Approach

Traditional

  • exotic hardware
      ○ big central servers
      ○ SAN
      ○ RAID
  • hardware reliability
  • expensive
  • limited scalability

Big Data

  • commodity HW
      ○ racks of pizza boxes
      ○ Ethernet
      ○ JBOD
  • unreliable HW
  • cost effective
  • scales further
SLIDE 4

New Software Approach

Traditional

  • monolithic
      ○ centralized storage
      ○ RDBMS
  • schema first
  • proprietary

Big Data

  • distributed
      ○ storage & compute nodes
  • raw data
  • open source
SLIDE 5

The Ecosystem is the System

  • Hadoop has become the kernel
      ○ of the distributed operating system for Big Data
      ○ a de facto industry standard
  • No one uses the kernel alone
  • A collection of projects at Apache
SLIDE 6

Open Source at Apache

  • no strategic agenda
      ○ quality is emergent
  • community based
      ○ diverse organizations collaborating voluntarily
      ○ decisions by consensus
      ○ transparent
  • allows competing projects
      ○ survival of fittest
  • a loose federation of projects
      ○ permits evolution
  • insures against vendor lock-in
      ○ can't buy Apache

SLIDE 7

Typical adoption pattern

  • Idea that's impractical without Hadoop.
  • Build Hadoop-based proof of concept.
  • Move initial application to production.
  • Add more datasets and users.
      ○ removing silos in organizations
      ○ permitting easy experiments on real data

Snowballs into institution's central repository for
  • analysis
  • data processing
SLIDE 8

How can you use Hadoop?

  • What data are you ignoring?
      ○ How can you use it?
  • How can you combine your data with others?
SLIDE 9

Thanks!

Questions? Visit Cloudera at booth 700.