Apache HBase, the Scaling Machine
Jean-Daniel Cryans Software Engineer at Cloudera @jdcryans
Tuesday, June 18, 13
Agenda
Introduction to Apache HBase
HBase at StumbleUpon
Overview of other use cases
Apache HBase is a distributed, scalable, consistent, low-latency, random-access, non-relational database built on Apache Hadoop.
table, gmail, analytics, earth, blogger, …
Hortonworks, Intel, Twitter …
guarantees on rows
the system continues.
Diagram labels: ZooKeeper (ZK), HDFS, App, MapReduce (MR)
This region is getting too big and affects the balancing (more about writing in a moment).
Let’s split the region in order to split the load
Now that we have smaller pieces, it’s easier to move the load around
Diagram: Region A and Region B
No data was actually moved during this process, only its responsibility!
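To make the split concrete, here is a minimal sketch that models a region as a half-open row-key range plus the server responsible for it. It is a hypothetical illustration, not HBase code: the class, method, and server names are made up, and an empty end key stands for "no upper bound". Splitting and reassigning only create new range descriptors; no stored rows are touched, which mirrors the point that only responsibility moves.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of a region split (not HBase's actual code).
 * A region is a half-open row-key range [startKey, endKey) plus the server
 * responsible for it; an empty endKey means "no upper bound" here.
 */
public class RegionSplitSketch {

    static final class Region {
        final String startKey;
        final String endKey;
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + (endKey.isEmpty() ? "+inf" : endKey) + ") on " + server;
        }
    }

    /** Splits at splitKey into two daughters that together cover exactly the parent's keys. */
    static List<Region> split(Region parent, String splitKey) {
        if (splitKey.equals(parent.startKey) || !parent.contains(splitKey)) {
            throw new IllegalArgumentException("split key must fall strictly inside the region");
        }
        // Both daughters start on the parent's server; no rows are copied.
        return List.of(new Region(parent.startKey, splitKey, parent.server),
                       new Region(splitKey, parent.endKey, parent.server));
    }

    /** Moving a region only changes who serves it; the stored data stays where it is. */
    static Region reassign(Region region, String newServer) {
        return new Region(region.startKey, region.endKey, newServer);
    }

    public static void main(String[] args) {
        Region parent = new Region("", "", "server1");     // the whole key space
        List<Region> daughters = split(parent, "m");
        Region a = daughters.get(0);
        Region b = reassign(daughters.get(1), "server2");  // balance the load
        System.out.println("Region A: " + a);              // Region A: [, m) on server1
        System.out.println("Region B: " + b);              // Region B: [m, +inf) on server2
    }
}
```

Reassignment returns a new descriptor rather than mutating the old one, so the "before" and "after" ownership can be compared side by side.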
Locations?
Here you go
Data is sent along the pipeline
ACKs are sent back as soon as the data is in memory in the last node
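The write pipeline in these slides can be simulated in a few lines. This is a hypothetical, synchronous model (the class and the node names such as dn1 are invented): each node buffers the packet in memory and forwards it downstream, and the ACK starts at the last node and travels back toward the client. A real pipeline streams many packets asynchronously; the recursion here only shows the ordering.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation of a replication pipeline: each node keeps the packet
 * in memory and forwards it; the ACK is created by the last node and passes back
 * through every upstream node. Names and API are illustrative only.
 */
public class WritePipelineSketch {

    static final class Node {
        final String name;
        final List<String> memoryBuffer = new ArrayList<>();
        Node next;  // downstream node, null for the last one

        Node(String name) { this.name = name; }

        /** Buffers the packet, forwards it, and returns the ACK path collected so far. */
        List<String> receive(String packet) {
            memoryBuffer.add(packet);                   // data is now in memory on this node
            List<String> ackPath = (next == null)
                    ? new ArrayList<>()                 // last node: start the ACK
                    : next.receive(packet);             // otherwise wait for the downstream ACK
            ackPath.add(name);                          // the ACK passes back through this node
            return ackPath;
        }
    }

    /** Chains the nodes in the given order and returns the head of the pipeline. */
    static Node buildPipeline(String... names) {
        Node head = null;
        for (int i = names.length - 1; i >= 0; i--) {
            Node n = new Node(names[i]);
            n.next = head;
            head = n;
        }
        return head;
    }

    public static void main(String[] args) {
        Node head = buildPipeline("dn1", "dn2", "dn3");
        List<String> ackPath = head.receive("packet-0");
        System.out.println("ACK path back to client: " + ackPath);  // [dn3, dn2, dn1]
    }
}
```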
Row key | info:height | info:state | roles:hadoop                         | roles:hbase
cutting | '9ft'       | 'CA'       | 'Founder'                            |
tlipcon | '5ft7'      | 'CA'       | 'PMC' @ts=2011, 'Committer' @ts=2010 | 'Committer'

The row key is an implicit PRIMARY KEY.
Column names have the format family:qualifier.
All data is stored as byte[] in HBase.
A single cell may hold different values at different timestamps (tlipcon's roles:hadoop).
Different rows may have different sets of columns; the table is sparse (cutting has no roles:hbase).
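The table above can be modeled as nested sorted maps: row, then family:qualifier, then timestamp, then value. The sketch below is a hypothetical in-memory illustration, not HBase's storage format; it uses String row keys and column names for readability (in HBase those are byte[] too), keeps values as byte[], and sorts versions newest first. Timestamps for cells that show none on the slide are arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical model of a sparse, multi-versioned table (illustration only). */
public class SparseTableSketch {
    // row -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell; a row only gets the columns written to it. */
    void put(String row, String column, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, byte[]>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, byte[]>(Comparator.reverseOrder()))
            .put(ts, value.getBytes(StandardCharsets.UTF_8));
    }

    /** Newest version written at or before ts, or null if there is none. */
    String getAsOf(String row, String column, long ts) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(row);
        NavigableMap<Long, byte[]> versions = (columns == null) ? null : columns.get(column);
        if (versions == null) return null;  // sparse: this row has no such column
        // Keys are in descending order, so ceilingEntry(ts) is the largest timestamp <= ts.
        Map.Entry<Long, byte[]> e = versions.ceilingEntry(ts);
        return (e == null) ? null : new String(e.getValue(), StandardCharsets.UTF_8);
    }

    /** Newest version of a cell, or null if the row does not have that column. */
    String getLatest(String row, String column) {
        return getAsOf(row, column, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        SparseTableSketch t = new SparseTableSketch();
        t.put("cutting", "info:height", 2011L, "9ft");
        t.put("cutting", "info:state", 2011L, "CA");
        t.put("cutting", "roles:hadoop", 2011L, "Founder");
        t.put("tlipcon", "info:height", 2011L, "5ft7");
        t.put("tlipcon", "info:state", 2011L, "CA");
        t.put("tlipcon", "roles:hadoop", 2010L, "Committer");
        t.put("tlipcon", "roles:hadoop", 2011L, "PMC");
        t.put("tlipcon", "roles:hbase", 2011L, "Committer");

        System.out.println(t.getLatest("tlipcon", "roles:hadoop"));      // PMC
        System.out.println(t.getAsOf("tlipcon", "roles:hadoop", 2010L)); // Committer
        System.out.println(t.getLatest("cutting", "roles:hbase"));       // null (column absent)
    }
}
```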
versions of data (MVCC for consistency)
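MVCC can be illustrated with a read point: each write gets a write number, and readers ignore any write whose number is above the read point, which only advances past a write once it and all earlier writes have completed. The following single-threaded sketch assumes that scheme; the class and method names are hypothetical, and it is not HBase's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical, single-threaded MVCC sketch: writes are stored immediately but
 * tagged with a write number; readers only see writes numbered <= readPoint.
 */
public class MvccSketch {

    static final class Cell {
        final String value;
        final long writeNumber;
        Cell(String value, long writeNumber) { this.value = value; this.writeNumber = writeNumber; }
    }

    private final List<Cell> cells = new ArrayList<>();
    private final TreeSet<Long> pending = new TreeSet<>();  // started but not completed
    private long nextWriteNumber = 1;
    private long readPoint = 0;

    /** Starts a write: the cell is stored right away but stays invisible. */
    long beginWrite(String value) {
        long wn = nextWriteNumber++;
        pending.add(wn);
        cells.add(new Cell(value, wn));
        return wn;
    }

    /** Completes a write; the read point moves up to just below the oldest pending write. */
    void completeWrite(long wn) {
        pending.remove(wn);
        readPoint = pending.isEmpty() ? nextWriteNumber - 1 : pending.first() - 1;
    }

    /** Values visible to a reader that starts now. */
    List<String> read() {
        long rp = readPoint;
        List<String> visible = new ArrayList<>();
        for (Cell c : cells) {
            if (c.writeNumber <= rp) visible.add(c.value);
        }
        return visible;
    }

    public static void main(String[] args) {
        MvccSketch store = new MvccSketch();
        long w1 = store.beginWrite("Committer");
        long w2 = store.beginWrite("PMC");
        store.completeWrite(w2);           // w1 is still pending, so w2 stays hidden
        System.out.println(store.read());  // []
        store.completeWrite(w1);
        System.out.println(store.read());  // [Committer, PMC]
    }
}
```

Holding back a completed write until every earlier one finishes is what keeps readers from observing writes out of order.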
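import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

byte[] row = Bytes.toBytes("jdcryans");
byte[] fam = Bytes.toBytes("roles");
byte[] qual = Bytes.toBytes("hbase");
byte[] putVal = Bytes.toBytes("PMC");

Configuration config = HBaseConfiguration.create();  // reads the configuration files
HTable table = new HTable(config, "employees");      // connects to the cluster

Put p = new Put(row);
p.add(fam, qual, putVal);                            // cell roles:hbase = "PMC"
table.put(p);

Get g = new Get(row);
Result r = table.get(g);
byte[] jd = r.getValue(fam, qual);                   // reads back roles:hbase
table.close();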
Bytes: HBase provides utilities for easy conversions.
HBaseConfiguration.create() reads the configuration files.
new HTable(...) creates a connection to the cluster; no master is needed.
By default, all operations are persisted.
From the moment the call to put() returns, the data is visible to all readers.
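Because regions are contiguous row-key ranges, a client can work out which server holds a row by itself, without asking the master. The sketch below assumes a hypothetical in-memory map from each region's start key to its server and uses floorEntry to pick the region with the greatest start key not after the row. The real client fills and caches this information from HBase's catalog tables; that part is omitted here.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical client-side region lookup: regions are keyed by start row, and
 * the region holding a row is the one with the greatest start key <= row.
 */
public class RegionLocatorSketch {
    // start key -> server; "" is the start key of the first region
    private final NavigableMap<String, String> regionStartToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
    }

    String locate(String row) {
        Map.Entry<String, String> e = regionStartToServer.floorEntry(row);
        if (e == null) throw new IllegalStateException("no region covers row " + row);
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLocatorSketch cache = new RegionLocatorSketch();
        cache.addRegion("", "server1");   // Region A: [, m)
        cache.addRegion("m", "server2");  // Region B: [m, +inf)
        System.out.println(cache.locate("jdcryans"));  // server1
        System.out.println(cache.locate("tlipcon"));   // server2
    }
}
```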
Pushing this button takes you to your next recommendation
You can tell the recommendation engine if you liked the page or not
You can also browse specific interests
Users are shown sponsored pages that are relevant to their interests.