Tutorial: HBase
Theory and Practice of a Distributed Data Store

Pietro Michiardi (Eurecom)

Introduction

Why yet another storage architecture?
Relational Database Management Systems (RDBMS):
◮ Around since the 1970s
◮ Countless examples in which they actually do make sense
The dawn of Big Data:
◮ Previously: data sources were ignored because there was no cost-effective way to store everything
⋆ One option was to prune, retaining only the data for the last N days
◮ Today: store everything!
⋆ Pruning fails to provide a basis for building useful mathematical models

Batch processing
Hadoop and MapReduce:
◮ Excel at storing (semi- and/or un-)structured data
◮ Data interpretation takes place at analysis time
◮ Flexibility in data classification
Batch processing, a complement to RDBMS:
◮ Scalable sink for data; processing is launched when the time is right
◮ Optimized for large-file storage
◮ Optimized for “streaming” access
Random access:
◮ Users need to “interact” with data, especially data “crunched” by a MapReduce job
◮ This is historically where RDBMS excel: random access to structured data

Column-Oriented Databases
Data layout:
◮ Data is saved grouped by column
◮ Subsequent values of a column are stored contiguously on disk
◮ This differs substantially from traditional RDBMS, which save and store data by row
Specialized databases for specific workloads:
◮ Reduced I/O
◮ Better suited for compression → efficient use of bandwidth
⋆ Indeed, column values are often very similar and differ little row by row
◮ Real-time access to data
Important NOTE:
◮ HBase is not a column-oriented DB in the typical sense of the term
◮ HBase uses an on-disk column storage format
◮ It provides key-based access to a specific cell of data, or to a sequential range of cells
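The last point, key-based access to a single cell or to a sequential range of cells, can be illustrated with a toy in-memory model. This is a minimal sketch under our own assumptions, not the HBase client API: the class `SortedCellStore`, its methods, and the `"family:qualifier"` column naming used in the example are hypothetical. Rows are kept sorted by row key so that a range can be read in order, and the stop row of `scan` is exclusive.

```python
from bisect import bisect_left


class SortedCellStore:
    """Toy model of key-based access (hypothetical; NOT the HBase API).

    Cells are addressed by (row key, column). Row keys are kept in sorted
    order so that a contiguous range of rows can be read sequentially.
    """

    def __init__(self):
        self._rows = {}         # row key -> {column: value}
        self._sorted_keys = []  # row keys in lexicographic order

    def put(self, row, column, value):
        if row not in self._rows:
            # Insert the new row key at its sorted position.
            self._sorted_keys.insert(bisect_left(self._sorted_keys, row), row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row, column):
        """Key-based access to one specific cell; None if it does not exist."""
        return self._rows.get(row, {}).get(column)

    def scan(self, start_row, stop_row):
        """Sequential range of rows: start_row inclusive, stop_row exclusive."""
        lo = bisect_left(self._sorted_keys, start_row)
        hi = bisect_left(self._sorted_keys, stop_row)
        for key in self._sorted_keys[lo:hi]:
            yield key, dict(self._rows[key])


store = SortedCellStore()
store.put("user-002", "info:name", "bob")
store.put("user-001", "info:name", "alice")
store.put("user-003", "info:name", "carol")
print(store.get("user-001", "info:name"))                   # alice
print([k for k, _ in store.scan("user-001", "user-003")])   # ['user-001', 'user-002']
```

Keeping row keys sorted is what makes the range read cheap: both bounds are located by binary search and the rows in between are consecutive. In the real HBase client, these two access patterns are exposed through the `Get` and `Scan` classes.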

Column-Oriented and Row-Oriented storage layouts
[Figure: Example of Storage Layouts]
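To make the difference between the two layouts concrete, the sketch below serializes the same small table both ways and counts runs of equal adjacent values, a rough proxy for how well run-length encoding would compress each layout. The table contents and the function names are hypothetical, and this is an illustration of the general idea only, not HBase's actual on-disk format.

```python
def row_oriented(table, columns):
    """Serialize record by record: all fields of row 0, then row 1, ..."""
    return [row[c] for row in table for c in columns]


def column_oriented(table, columns):
    """Serialize column by column: all values of column 0, then column 1, ..."""
    return [row[c] for c in columns for row in table]


def count_runs(values):
    """Number of runs of equal adjacent values (what run-length encoding would store)."""
    runs = 0
    previous = object()  # sentinel that never equals a real value
    for v in values:
        if v != previous:
            runs += 1
            previous = v
    return runs


# Hypothetical example table: similar values within each column.
table = [
    {"id": 1, "country": "FR", "status": "active"},
    {"id": 2, "country": "FR", "status": "active"},
    {"id": 3, "country": "FR", "status": "inactive"},
    {"id": 4, "country": "IT", "status": "inactive"},
]
columns = ["id", "country", "status"]

print(count_runs(row_oriented(table, columns)))     # 12
print(count_runs(column_oriented(table, columns)))  # 8
```

On this toy table the row-oriented sequence has no two equal neighbors (12 runs for 12 values), whereas the column-oriented one collapses into 8 runs, because similar values of the same column end up stored next to each other.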

The Problem with RDBMS
RDBMS are still relevant:
◮ Persistence layer for frontend applications
◮ Store relational data
◮ Work well for a limited number of records
Example: Hush
◮ Used throughout this course
◮ A URL shortener service
Let’s see the “scalability story” of such a service:
◮ Assumption: the service must run on a reasonable budget