SLIDE 1

HBase on top of HDFS

Seminar Software Systems Engineering "Mobile, Security, Cloud Computing" Kevin Böckler

Institut für Telematik

February 19, 2015

Kevin Böckler February 19, 2015 1

SLIDE 2

Outline

  • 1. Introduction
  • 2. Distributed File Systems
  • 3. HBase
  • 4. Application

SLIDE 3

Storage in Cloud Computing

Requirements

◮ Millions of users
◮ Realtime access
◮ Reduce loss of data

Use cases

◮ Cloud Storage
◮ Collaborative Platforms
◮ Social Platforms
◮ Messengers

SLIDE 4

Hadoop implementation

HBase = Hadoop implementation of a database
HDFS = Hadoop Distributed File System

SLIDE 6
  • 2. Distributed File Systems

SLIDE 7

HDFS

Figure: Participants in an HDFS cluster

SLIDE 8

Properties of HDFS

Scalability

◮ Distributed across multiple nodes
◮ NameNode for metadata, DataNode for the actual payload
◮ "Moving Computation is Cheaper than Moving Data"

Transparency

◮ UNIX paths (/files/seminar/hbase.pdf)
◮ Location transparency and location independence
◮ Hidden replication
◮ Hidden fail-over

SLIDE 9

Fault-Tolerance in HDFS

Replication during the write process: pipelining → robustness

Availability

◮ Heartbeat (NameNode ↔ DataNode)
◮ NameNode issues re-replications
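The heartbeat and re-replication idea can be illustrated with a small toy model. This is only a sketch and not the HDFS implementation: the class `ToyReplicationMonitor`, its method names, the timeout, and the replica target are all hypothetical, and real HDFS adds rack awareness, block reports, and much more. It assumes a DataNode counts as dead once its last heartbeat is older than a timeout, and that under-replicated blocks are then copied to live nodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of heartbeat tracking and re-replication (hypothetical, not HDFS code). */
class ToyReplicationMonitor {
    private final int targetReplicas;
    private final long timeoutMillis;
    // DataNode id -> timestamp of its last heartbeat
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // Block id -> DataNodes that currently hold a replica
    private final Map<String, Set<String>> blockLocations = new TreeMap<>();

    ToyReplicationMonitor(int targetReplicas, long timeoutMillis) {
        this.targetReplicas = targetReplicas;
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a DataNode reports in. */
    void heartbeat(String dataNode, long now) {
        lastHeartbeat.put(dataNode, now);
    }

    void addReplica(String block, String dataNode) {
        blockLocations.computeIfAbsent(block, b -> new TreeSet<>()).add(dataNode);
    }

    Set<String> replicas(String block) {
        return blockLocations.getOrDefault(block, Collections.emptySet());
    }

    /** Forgets replicas on silent DataNodes and schedules copies to live ones. */
    List<String> check(long now) {
        Set<String> live = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() <= timeoutMillis) {
                live.add(e.getKey());
            }
        }
        List<String> actions = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : blockLocations.entrySet()) {
            Set<String> holders = e.getValue();
            holders.retainAll(live);      // replicas on dead nodes no longer count
            if (holders.isEmpty()) {
                continue;                 // no surviving replica left to copy from
            }
            for (String candidate : live) {
                if (holders.size() >= targetReplicas) {
                    break;
                }
                if (holders.add(candidate)) {
                    actions.add(e.getKey() + " -> " + candidate);
                }
            }
        }
        return actions;
    }
}
```

In this model a block survives the loss of a DataNode as long as one replica remains on a node that still sends heartbeats.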

SLIDE 10

File Access in HDFS

File Access

Write-once-read-many (WORM): immutable

Remote Access

  • 1. Ask the NameNode for the file name → DataNode connection
  • 2. Open a connection to the DataNode
  • 3. Transfer the payload
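The three remote-access steps can be sketched with an in-memory toy model. Everything here is hypothetical (`ToyHdfs`, `NameNode.locate`, `DataNode.read`, the block ids); it is not the Hadoop client API, and the "connection" is just a map lookup. It only illustrates that metadata comes from the NameNode while the payload comes from the DataNodes.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Toy model of the HDFS read path (hypothetical names, not the Hadoop API). */
class ToyHdfs {
    /** Metadata entry: which DataNode holds which block. */
    static final class BlockLocation {
        final String blockId;
        final String dataNodeId;

        BlockLocation(String blockId, String dataNodeId) {
            this.blockId = blockId;
            this.dataNodeId = dataNodeId;
        }
    }

    /** The NameNode keeps only metadata, never file contents. */
    static final class NameNode {
        private final Map<String, List<BlockLocation>> files = new HashMap<>();

        void register(String path, List<BlockLocation> blocks) {
            files.put(path, blocks);
        }

        List<BlockLocation> locate(String path) {
            List<BlockLocation> blocks = files.get(path);
            if (blocks == null) {
                throw new NoSuchElementException(path);
            }
            return blocks;
        }
    }

    /** A DataNode keeps the block payloads. */
    static final class DataNode {
        private final Map<String, byte[]> blocks = new HashMap<>();

        void store(String blockId, byte[] payload) {
            blocks.put(blockId, payload);
        }

        byte[] read(String blockId) {
            return blocks.get(blockId);
        }
    }

    static byte[] readFile(NameNode nameNode, Map<String, DataNode> cluster, String path) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (BlockLocation loc : nameNode.locate(path)) {      // 1. ask the NameNode
            DataNode dataNode = cluster.get(loc.dataNodeId);   // 2. "connect" to the DataNode
            byte[] payload = dataNode.read(loc.blockId);       // 3. transfer the payload
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }
}
```

Because the NameNode hands out only locations, the bulk data transfer never passes through it in this model.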

SLIDE 11
  • 3. HBase

SLIDE 12

Features of HBase

Efficiency

◮ Bulk loading
◮ Sequential and random reads
◮ Distributed MapReduce tasks

Scalability

◮ Column families
◮ Concurrency model: file locks

Fault Tolerance

◮ Inherited from HDFS
◮ Additional heartbeat (HBaseMaster ↔ HRegionServer)

SLIDE 13

HBase: HRegionServer

Three Abstract Components

HRegionServer ↔ HRegion ↔ Store

◮ accepts connections from HClient
◮ receives table requests (GET, PUT, DELETE, ...) from HClient
◮ manages HRegions

SLIDE 14

HBase: HRegion

Three Abstract Components

HRegionServer ↔ HRegion ↔ Store

ID  a   b  c      d  e
1   42  1  world  9  hello
2   43  3  npe    9  hadoop
3   19  3  java   9  ping
4   22  7  easy   9  bye

◮ HRegion ⊆ Table
◮ receives requests from HRegionServer
◮ manages Stores
◮ Write-ahead log (WAL) of column writes (→ eventually flushed to Store)
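The purpose of a write-ahead log can be shown with a minimal sketch: every write is appended to the log before it is applied in memory, so the in-memory state can be rebuilt after a crash. The class `ToyWalRegion` and its methods are hypothetical, and the "log" is an in-memory list standing in for a durable file; this is not how HBase lays out its WAL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy write-ahead log in front of an in-memory store (hypothetical, not HBase code). */
class ToyWalRegion {
    // Stand-in for a durable log; each entry is {row, value}
    private final List<String[]> wal = new ArrayList<>();
    // Volatile working state that is lost on a crash
    private final TreeMap<String, String> memStore = new TreeMap<>();

    void put(String row, String value) {
        wal.add(new String[] {row, value});  // 1. record the write first
        memStore.put(row, value);            // 2. then apply it in memory
    }

    String get(String row) {
        return memStore.get(row);
    }

    /** Simulates a crash: memory is lost, the log survives. */
    void crash() {
        memStore.clear();
    }

    /** Replays the log in order to rebuild the in-memory state. */
    void recover() {
        for (String[] entry : wal) {
            memStore.put(entry[0], entry[1]);
        }
    }
}
```

Replaying in append order matters: a later write to the same row must win over an earlier one.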

SLIDE 15

HBase: Store

Three Abstract Components

HRegionServer ↔ HRegion ↔ Store

ID  a   b  c      d  e
1   42  1  world  9  hello
2   43  3  npe    9  hadoop
3   19  3  java   9  ping
4   22  7  easy   9  bye

◮ Store = ColumnFamily
◮ encapsulates one group of Columns and Rows
◮ holds its data in
    ◮ MemStore (working cache)
    ◮ StoreFiles (→ HFile → HDFS)
◮ compacts StoreFiles
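The interplay of MemStore, StoreFiles, and compaction can be sketched as follows. This is a simplified model with hypothetical names (`ToyStore`, `flushThreshold`): it flushes after a fixed number of entries rather than a size in bytes, keeps the "files" in memory instead of writing HFiles to HDFS, and ignores versions, timestamps, and deletes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy Store with a MemStore, immutable store files, and compaction (hypothetical). */
class ToyStore {
    private final int flushThreshold;
    private TreeMap<String, String> memStore = new TreeMap<>();
    // Oldest file first, newest last; each file is a sorted, read-only map
    private final List<SortedMap<String, String>> storeFiles = new ArrayList<>();

    ToyStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String row, String value) {
        memStore.put(row, value);
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    /** Writes the MemStore out as a new immutable file and starts a fresh MemStore. */
    void flush() {
        if (memStore.isEmpty()) {
            return;
        }
        storeFiles.add(Collections.unmodifiableSortedMap(memStore));
        memStore = new TreeMap<>();
    }

    /** Checks the MemStore first, then the files from newest to oldest. */
    String get(String row) {
        String value = memStore.get(row);
        for (int i = storeFiles.size() - 1; value == null && i >= 0; i--) {
            value = storeFiles.get(i).get(row);
        }
        return value;
    }

    /** Merges all files into one; newer files overwrite older entries. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> file : storeFiles) {
            merged.putAll(file);
        }
        storeFiles.clear();
        if (!merged.isEmpty()) {
            storeFiles.add(Collections.unmodifiableSortedMap(merged));
        }
    }

    int storeFileCount() {
        return storeFiles.size();
    }
}
```

Since files are never modified after a flush, reads have to consult several of them until a compaction merges them into one.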

SLIDE 16

Architecture of HBase

SLIDE 17
  • 4. Application

SLIDE 18

API

Java Usage

◮ Configuration

    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "127.0.0.1");
    config.set("hbase.zookeeper.property.clientPort", "2180");

◮ HTable

    HTable table = new HTable(config, "someTableName");

◮ GET, SCAN, PUT, DELETE

    Get get = new Get(Bytes.toBytes("someRowId"));
    Result result = table.get(get);

◮ Filter

    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        someColumnFamily, someColumn, CompareOp.EQUAL,
        Bytes.toBytes("someColumnNameValue"));

SLIDE 19

Demo

  • 1. Using the HBase Shell
  • 2. HDFS - Filesystem and Influence of Compactions
  • 3. Using the Java implementation

Figure: Process stack of the single-machine-cluster

SLIDE 20

Questions?
