Apache HBase, the Scaling Machine, by Jean-Daniel Cryans, Software Engineer at Cloudera (PowerPoint presentation)



SLIDE 1

Apache HBase, the Scaling Machine

Jean-Daniel Cryans, Software Engineer at Cloudera (@jdcryans)

Tuesday, June 18, 13

SLIDE 2

Agenda

  • Introduction to Apache HBase
  • HBase at StumbleUpon
  • Overview of other use cases

SLIDE 3

About le Moi

  • At Cloudera since October 2012.
  • At StumbleUpon for 3 years before that.
  • Committer and PMC member for Apache HBase since 2008.
  • Living in San Francisco.
  • From Québec, Canada.

SLIDE 4

SLIDE 5

What is Apache HBase

Apache HBase is an open source, distributed, scalable, consistent, low-latency, random-access, non-relational database built on Apache Hadoop.

SLIDE 6

Inspiration: Google BigTable (2006)

  • Goal: low latency, consistent, random read/write access to massive amounts of structured data.
  • It was the data store for Google’s crawler web table, Gmail, Analytics, Earth, Blogger, …

SLIDE 7

HBase is in Production

  • Inbox
  • Storage
  • Web
  • Search
  • Analytics
  • Monitoring


SLIDE 11

HBase is Open Source

  • Apache 2.0 License.
  • A community project with committers and contributors from diverse organizations: Facebook, Cloudera, Salesforce.com, Huawei, eBay, Hortonworks, Intel, Twitter, …
  • The code license means anyone can modify and use the code.

SLIDE 12

So why use HBase?

SLIDE 13

SLIDE 14

Old School Scaling

  • Find a scaling problem.
  • Beef up the machine.
  • Repeat until you cannot find a big enough machine or run out of funding.

SLIDE 15

“Get Rid of Everything” Scaling

  • Remove text search queries (LIKE).
  • Remove joins; joins due to normalization require expensive seeks.
  • Remove foreign keys and encode your relations.
  • Avoid constraint checks.
  • Put all parts of a query in a single table.
  • Use read slaves to scale reads.
  • Shard to scale writes.
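The last point, sharding to scale writes, can be sketched as a toy key-to-shard router (a hypothetical hash-mod scheme for illustration; real deployments often prefer consistent hashing so that resharding moves fewer keys):

```java
public class Sharding {
    // Route a row key to one of nShards databases by hashing it.
    static int shardFor(String key, int nShards) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), nShards);
    }

    public static void main(String[] args) {
        System.out.println("jdcryans -> shard " + shardFor("jdcryans", 4));
    }
}
```

Every writer that agrees on the hash function agrees on where a key lives, so writes spread across shards with no coordination.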

SLIDE 16

We “optimized the DB” by discarding some fundamental SQL/relational database features.

SLIDE 17

HBase is Horizontally Scalable

  • Adding more servers linearly increases performance and capacity:
  • Storage capacity
  • Input/output operations
  • Store and access data on 1 to 1000s of commodity servers.
  • Largest cluster: >1000 nodes, >1PB
  • Most clusters: 10-40 nodes, 100GB-4TB

SLIDE 18

HBase is Consistent

  • Brewer’s CAP theorem
  • Consistency: DB-style ACID guarantees on rows.
  • Availability: favor recovering from faults over returning stale data.
  • Partition tolerance: if a node goes down, the system continues.

SLIDE 19

HBase Dependencies

  • Apache Hadoop HDFS for data durability and reliability (Write-Ahead Log).
  • Apache ZooKeeper for distributed coordination.
  • Apache Hadoop MapReduce: built-in support for running MapReduce jobs.

[Diagram: App, ZK, HDFS, MR]
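These dependencies surface directly in HBase’s configuration; a minimal hbase-site.xml for a distributed deployment points HBase at HDFS and at the ZooKeeper ensemble (the host names below are placeholders):

```xml
<configuration>
  <!-- Store HBase data in HDFS for durability -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>
  <!-- Run fully distributed rather than standalone -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper ensemble used for coordination -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>
```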

SLIDE 20

HBase on a Cluster

SLIDE 21

Tables and Regions

SLIDE 22

Load Distribution

[Diagram: a RegionServer holding two Regions, another RegionServer holding one Region]

SLIDE 23

Load Distribution

This region is getting too big and affects the balancing (more about writing in a moment).

SLIDE 24

Load Distribution

Let’s split the region in order to split the load.

SLIDE 25

Load Distribution

Now that we have smaller pieces (Region A and Region B), it’s easier to move the load around.

SLIDE 26

Load Distribution

No data was actually moved during this process, only its responsibility!

SLIDE 27

The region is the unit of load distribution in HBase.
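The split shown on the previous slides can be modeled with plain sorted maps (a toy illustration, not HBase internals): a region is a sorted key range, and splitting at a midpoint key yields two daughter ranges that can be served by different RegionServers without copying the data.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RegionSplit {
    // Split a "region" (a sorted key range) into two daughters at splitKey.
    // headMap/tailMap are views over the same underlying data: nothing is
    // copied, only the responsibility for each half changes.
    static SortedMap<String, String> lower(TreeMap<String, String> region, String splitKey) {
        return region.headMap(splitKey);   // keys < splitKey
    }

    static SortedMap<String, String> upper(TreeMap<String, String> region, String splitKey) {
        return region.tailMap(splitKey);   // keys >= splitKey
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        region.put("apple", "1");
        region.put("mango", "2");
        region.put("zebra", "3");
        System.out.println(lower(region, "mango")); // prints {apple=1}
        System.out.println(upper(region, "mango")); // prints {mango=2, zebra=3}
    }
}
```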

SLIDE 28

So HBase can scale, but what about Hadoop?

SLIDE 29

HDFS Data Allocation

[Diagram: the Client asks the Name node, “Locations?”]

SLIDE 30

HDFS Data Allocation

[Diagram: the Name node replies “Here you go” with a set of three Data nodes]

SLIDE 31

HDFS Data Allocation

Data is sent along the pipeline of Data nodes.

SLIDE 32

HDFS Data Allocation

ACKs are sent back as soon as the data is in memory in the last node.
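The write path on these slides can be sketched as a toy pipeline (an illustration of the flow only, not the real DataNode protocol): the client hands a packet to the first node, each node buffers it and forwards downstream, and the ACK goes back once the last node holds the packet in memory.

```java
import java.util.ArrayList;
import java.util.List;

public class WritePipeline {
    // Each inner list stands in for one data node's in-memory buffer.
    static boolean write(List<List<byte[]>> pipeline, byte[] packet) {
        for (List<byte[]> nodeBuffer : pipeline) {
            nodeBuffer.add(packet);  // node buffers the packet, forwards it downstream
        }
        // The last node now has the packet in memory: the ACK can go back.
        return pipeline.get(pipeline.size() - 1).contains(packet);
    }

    public static void main(String[] args) {
        List<List<byte[]>> pipeline = List.of(
            new ArrayList<>(), new ArrayList<>(), new ArrayList<>());
        boolean acked = write(pipeline, "block data".getBytes());
        System.out.println("acked=" + acked + ", replicas=" + pipeline.size());
    }
}
```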

SLIDE 33

Putting it Together

[Diagram: a “Machine” box contains a Region server and a Data node, with more Data nodes and the Name node alongside]

SLIDE 34

Data locality is extremely important for Hadoop and HBase.

SLIDE 35

Scaling is just a matter of adding new nodes to the cluster.

SLIDE 36

Sorted Map Datastore

SLIDE 37

Sorted Map Datastore

Row key | info:height | info:state | roles:hadoop                          | roles:hbase
cutting | ‘9ft’       | ‘CA’       | ‘Founder’                             |
tlipcon | ‘5ft7’      | ‘CA’       | ‘PMC’ @ts=2011, ‘Committer’ @ts=2010  | ‘Committer’

The row key is an implicit PRIMARY KEY.

SLIDE 38

Sorted Map Datastore

The column format is family:qualifier.

SLIDE 39

Sorted Map Datastore

Data is all byte[] in HBase.

SLIDE 40

Sorted Map Datastore

A single cell might have different values at different timestamps.

SLIDE 41

Sorted Map Datastore

Different rows may have different sets of columns (the table is sparse).
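Logically, this data model is a sorted map of sorted maps. A minimal sketch (illustrative only, not HBase’s storage format) with the layout row → (family:qualifier → (timestamp → value)):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class SortedMapModel {
    // row key -> (family:qualifier -> (timestamp -> value))
    static final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> table = new TreeMap<>();

    static void put(String row, String column, long ts, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<>())
             .put(ts, value);
    }

    // Reads return the newest version: the entry with the highest timestamp.
    static String get(String row, String column) {
        NavigableMap<Long, String> versions = table.get(row).get(column);
        return versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        put("tlipcon", "roles:hadoop", 2010, "Committer");
        put("tlipcon", "roles:hadoop", 2011, "PMC");
        System.out.println(get("tlipcon", "roles:hadoop")); // prints PMC
    }
}
```

Sparseness falls out of the structure: a row simply has no entry for a column it never wrote.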

SLIDE 42

Anatomy of a Row

  • Each row has a primary key: a lexicographically sorted byte[].
  • A timestamp is associated with each value, keeping multiple versions of data (MVCC for consistency).
  • A row is made up of columns.
  • Each (row, column) is referred to as a Cell.
  • Contents of a cell are all byte[]s; apps must “know” the types and handle them.
  • Rows are strongly consistent.

Tuesday, June 18, 13
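Because row keys sort as byte arrays rather than as numbers, key design matters. A small sketch of the ordering (modeling the sort with a TreeMap, not using HBase itself) shows why numeric suffixes usually need zero-padding:

```java
import java.util.Arrays;
import java.util.TreeMap;

public class RowKeyOrder {
    // Model HBase's row ordering: keys compare as unsigned byte arrays.
    static TreeMap<byte[], String> sortedRows() {
        return new TreeMap<>(Arrays::compareUnsigned);
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> rows = sortedRows();
        rows.put("row-2".getBytes(), "second");
        rows.put("row-10".getBytes(), "tenth");
        // Lexicographically, "row-10" < "row-2" because '1' < '2';
        // zero-padding ("row-02", "row-10") restores numeric order.
        System.out.println(new String(rows.firstKey())); // prints row-10
    }
}
```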

SLIDE 43

Access HBase Data via an API

  • Data operations
    • Get
    • Put
    • Delete
    • Scan
    • Compare-and-swap
  • DDL operations
    • Create
    • Alter
    • Enable/Disable
  • Access via the HBase shell, the Java API, or the REST proxy

Tuesday, June 18, 13
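The compare-and-swap operation above has the same shape as java.util.concurrent’s conditional replace: the write succeeds only if the current value still matches what the caller last read (a toy analogy for HBase’s check-and-put, not its API):

```java
import java.util.concurrent.ConcurrentHashMap;

public class CasSketch {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> row = new ConcurrentHashMap<>();
        row.put("roles:hbase", "Committer");

        // Succeeds: the stored value still matches the expected one.
        boolean ok = row.replace("roles:hbase", "Committer", "PMC");

        // Fails: the value already changed, so the swap is refused.
        boolean stale = row.replace("roles:hbase", "Committer", "Founder");

        System.out.println(ok + " " + stale); // prints true false
    }
}
```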

SLIDE 44

Java API

```java
byte[] row = Bytes.toBytes("jdcryans");
byte[] fam = Bytes.toBytes("roles");
byte[] qual = Bytes.toBytes("hbase");
byte[] putVal = Bytes.toBytes("PMC");

Configuration config = HBaseConfiguration.create();
HTable table = new HTable(config, "employees");

Put p = new Put(row);
p.add(fam, qual, putVal);
table.put(p);

Get g = new Get(row);
Result r = table.get(g);
byte[] jd = r.getValue(fam, qual);
```

HBase provides utilities (the Bytes class) for easy conversions.

SLIDE 45

Java API

HBaseConfiguration.create() reads the configuration files.

SLIDE 46

Java API

Creating the HTable creates a connection to the cluster; no master is needed.

SLIDE 47

Java API

By default, all operations are persisted.

SLIDE 48

Java API

From the moment the call to put() comes back, the data is visible to all readers.

SLIDE 49

SLIDE 50

StumbleUpon is...

SLIDE 51

The Product

SLIDE 52

The Product

Pushing this button takes you to your next recommendation.

SLIDE 53

The Product

You can tell the recommendation engine if you liked the page or not.

SLIDE 54

The Product

You can also browse specific interests.

SLIDE 55

Business Model

Users are shown sponsored pages that are relevant to their interests.

SLIDE 56

If HBase goes down, the site goes down.

SLIDE 57

Architecture

[Diagram: Load Balancer, Apache HTTP, Apache Thrift, Apache HBase]

SLIDE 58

Architecture

[Diagram: as above, with PHP noted on the Apache HTTP tier]

SLIDE 59

Architecture

[Diagram: as above, with the Apache HBase tier at ~40 nodes]

SLIDE 60

A Few Use Cases

  • A/B testing framework
  • Realtime counters for dashboards
  • Queueing
  • Page sharing with comments
  • User lists of stumbles
  • Thumbnails serving
  • Badges serving

SLIDE 67

Analytics

[Diagram: an MR cluster (>100 nodes, running Pig, Hive, Cascading, and pure MapReduce) fed from HBase Prod, Apache logs, and MySQL]

SLIDE 68

Analytics

[Diagram: HBase Prod feeds the MR cluster through HBase Replication, with sub-second lag]

SLIDE 69

Analytics

[Diagram: Apache logs reach the MR cluster via a cron’d copy]

SLIDE 70

Analytics

[Diagram: MySQL data reaches the MR cluster via a cron’d dump and load]

SLIDE 71

Monitoring With HBase

  • OpenTSDB

SLIDE 72

SLIDE 73

Facebook Messages

SLIDE 74

Facebook Messages

  • Facebook needed a real email solution.
  • Originally the data was stored in MySQL.
  • They have the biggest HBase deployment.
  • All the emails, SMS, and chats are stored in HBase.
  • Users are sharded by pods of machines; each pod is configured the same way.

SLIDE 75

Opower

SLIDE 76

Opower

SLIDE 77

Opower

  • Perfect use case for HBase.
  • Follows a time series pattern.
  • Live traffic served with short scans.
  • New data constantly being fed.

Tuesday, June 18, 13
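The time-series pattern above works because related row keys sort next to each other. A sketch (a hypothetical key layout, modeled with a TreeMap rather than HBase) where each key is a metric name plus a zero-padded timestamp, so serving live traffic is a short scan over a narrow key range:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class TimeSeriesScan {
    // "<metric>:<zero-padded timestamp>" keeps each metric's points adjacent
    // and in time order (padding avoids the "10" < "2" string-sort problem).
    static String key(String metric, long ts) {
        return String.format("%s:%013d", metric, ts);
    }

    public static void main(String[] args) {
        TreeMap<String, Double> table = new TreeMap<>();
        table.put(key("power.kwh", 1000), 1.2);
        table.put(key("power.kwh", 2000), 1.5);
        table.put(key("temp.c", 1500), 21.0);

        // A "short scan": only power.kwh points with 0 <= ts < 2500.
        SortedMap<String, Double> recent =
            table.subMap(key("power.kwh", 0), key("power.kwh", 2500));
        System.out.println(recent.size()); // prints 2
    }
}
```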

SLIDE 78

What now?

  • Download HBase: http://www.hbase.org
  • Read HBase: The Definitive Guide by Lars George.
  • Watch the videos from last week’s conference (available within a few weeks): http://www.hbasecon.com/
  • Have a chat on #hbase, hosted by irc.freenode.net.

SLIDE 79