The Apache Hadoop Ecosystem

Doug Cutting, Cloudera & Apache



  1. The Apache Hadoop Ecosystem
     Doug Cutting, Cloudera & Apache

  2. Context: exponential for decades!
     ● abundance of
       ○ computing & storage
       ○ generated data (8 ZB in '15)
     ● peta-scale is now affordable (kMGTPEZY)
       ○ petabytes
       ○ petahertz
     ● traditional data tech doesn't scale well
     ● more data provides greater value
     ● time for a new approach

  3. New Hardware Approach
     Traditional:
     ● exotic hardware
       ○ big central servers
       ○ SAN
       ○ RAID
     ● hardware reliability
     ● expensive
     ● limited scalability
     Big Data:
     ● commodity HW
       ○ racks of pizza boxes
       ○ Ethernet
       ○ JBOD
     ● unreliable HW
     ● cost effective
     ● scales further

  4. New Software Approach
     Traditional:
     ● monolithic
       ○ centralized storage
       ○ RDBMS
     ● schema first
     ● proprietary
     Big Data:
     ● distributed
       ○ storage & compute nodes
     ● raw data
     ● open source

  5. The Ecosystem is the System
     ● Hadoop has become the kernel
       ○ of the distributed operating system for Big Data
       ○ a de facto industry standard
     ● No one uses the kernel alone
     ● A collection of projects at Apache

  6. Open Source at Apache
     ● no strategic agenda
       ○ quality is emergent
     ● community based
       ○ diverse organizations collaborating voluntarily
       ○ decisions by consensus
       ○ transparent
     ● allows competing projects
       ○ survival of fittest
     ● a loose federation of projects
       ○ permits evolution
     ● insures against vendor lock-in
       ○ can't buy Apache

  7. Typical adoption pattern
     ● Idea that's impractical without Hadoop.
     ● Build Hadoop-based proof of concept.
     ● Move initial application to production.
     ● Add more datasets and users.
       ○ removing silos in organizations
       ○ permitting easy experiments on real data
     Snowballs into institution's central repository for
     ● analysis
     ● data processing

  8. How can you use Hadoop?
     ● What data are you ignoring?
       ○ How can you use it?
     ● How can you combine your data with others?

  9. Thanks! Questions? Visit Cloudera at booth 700.
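
Slide 4 contrasts "schema first" with "raw data". One way to read that contrast is as schema-on-read: data is stored exactly as it arrives, and each consumer decides which fields it needs at query time. The sketch below illustrates the idea in plain Python, not with any Hadoop API. The sample records, `RAW_LINES`, and `read_with_schema` are hypothetical names invented for this example and do not come from the talk.

```python
import json

# Hypothetical raw event log, kept as-is with no schema declared up front.
# Records have differing fields, and one line is not valid JSON at all.
RAW_LINES = [
    '{"user": "ann", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',
    'not valid json',
    '{"user": "ann", "action": "click", "ms": 80, "page": "/home"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time.

    Yields one dict per parseable JSON object, keeping only the requested
    fields and filling missing ones with None. Unparseable lines are skipped.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw data may be messy; this reader simply skips it
        if not isinstance(record, dict):
            continue
        yield {name: record.get(name) for name in fields}


# Two consumers project different "schemas" from the same raw data.
timings = list(read_with_schema(RAW_LINES, ["user", "ms"]))
actions = list(read_with_schema(RAW_LINES, ["action"]))
print(timings)
print(actions)
```

A schema-first store would have had to reject or reshape the second and fourth records when they were written. Here the decision is deferred to each reader, which is what makes cheap experiments on real data possible.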
