Scuba: Diving into Data at Facebook
Presenter: Lavanya Subramanian
1
Need for Data Analysis
Performance monitoring
– Detect unexpected performance drops/rises
Pattern mining
– Understand user response to new features
Ad revenue
– Identify regional drops/rises in ad clicks and revenue
2
– Low latency
– Flexibility
– Scalability
3
– In-memory database
– Across hundreds of servers
– Holds and processes sampled real-time data
– Query interface to access data
– Visualization interface to analyze data
4
[Figure: server with leaf nodes]
5
– Integers, strings, sets of strings, vectors of strings
Table Characteristics
6
[Figure: Scribe and leaf nodes]
7
– Collect, aggregate and deliver data to Scuba
– Pick two leaf nodes at random
– Send the batch to the node with more free memory
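The placement rule above (pick two leaves at random, send the batch to the one with more free memory) can be sketched as follows. This is a minimal illustration, not Scuba's actual code: the `LeafNode` class, its fields, and `place_batch` are hypothetical names chosen for the example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    """Hypothetical stand-in for a leaf node; fields are assumptions."""
    name: str
    capacity_bytes: int
    used_bytes: int = 0
    rows: list = field(default_factory=list)

    @property
    def free_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes


def place_batch(leaves, batch, batch_bytes, rng=random):
    """Pick two distinct leaves at random and store the batch on the
    one that currently has more free memory."""
    first, second = rng.sample(leaves, 2)
    target = first if first.free_bytes >= second.free_bytes else second
    target.rows.extend(batch)
    target.used_bytes += batch_bytes
    return target
```

Comparing only two random candidates avoids polling every leaf for its memory usage while still steering new data toward emptier nodes.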
8
– Age: Sample and preserve a fraction of old data
– Space: When exceeding space limits, delete old data
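The two expiry rules above can be sketched together as one pruning pass. This is an illustrative sketch under assumptions: rows are dicts with a `time` field, and the function name `prune` and the parameters `max_age_s`, `keep_fraction`, and `max_rows` are hypothetical, not taken from Scuba.

```python
import random


def prune(rows, now, max_age_s, keep_fraction, max_rows, rng=random):
    """Apply age-based sampling, then a space limit.

    Age: rows older than max_age_s survive with probability keep_fraction.
    Space: if more than max_rows remain, the oldest rows are dropped.
    Returns the surviving rows sorted oldest first.
    """
    kept = []
    for row in rows:
        is_old = now - row["time"] > max_age_s
        if not is_old or rng.random() < keep_fraction:
            kept.append(row)
    kept.sort(key=lambda r: r["time"])
    if len(kept) > max_rows:
        kept = kept[len(kept) - max_rows:]  # keep only the newest max_rows
    return kept
```

A real system would also need to record the sampling rate applied to old rows so that counts can be scaled back up at query time; that bookkeeping is omitted here.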
9
– Web-based
– SQL
– API to support querying from application code
– Different forms of aggregation
– Percentiles, histograms
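As a rough illustration of the percentile and histogram aggregations mentioned above, the sketch below computes both over a list of latency samples. It uses the nearest-rank percentile definition and fixed-width buckets; both choices, and the function names, are assumptions made for the example rather than a description of how Scuba computes them.

```python
import math
from collections import Counter


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def histogram(values, bucket_width):
    """Count values per fixed-width bucket, keyed by bucket lower bound."""
    counts = Counter((v // bucket_width) * bucket_width for v in values)
    return dict(sorted(counts.items()))


latencies_ms = [12, 15, 11, 250, 14, 13, 16, 18, 17, 19]
print(percentile(latencies_ms, 50))   # median latency
print(percentile(latencies_ms, 99))   # tail latency
print(histogram(latencies_ms, 10))    # counts per 10 ms bucket
```

The single 250 ms outlier barely moves the median but dominates the 99th percentile, which is why percentiles and histograms are useful alongside plain averages for performance monitoring.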
10
[Figure: aggregation tree with a root aggregator, intermediate aggregators, leaf aggregators, and leaf nodes]
11
– Depends on the table size and age
– Time is Scuba’s only notion of index
– Small missing pieces of data do not affect accuracy
– Lower response time is a bigger requirement
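The points above (time as the only index, and tolerance for small missing pieces of data in exchange for lower response time) can be sketched as a leaf scan filtered by time range plus an aggregator that merges whatever partial results arrive. All names here (`Partial`, `leaf_query`, `aggregate`) are hypothetical, and using `None` to stand for a leaf that did not answer in time is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate that can be merged up the tree."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total)


def leaf_query(rows, start, end, column):
    """Scan one leaf's rows; the time range is the only filter used to skip data."""
    result = Partial()
    for row in rows:
        if start <= row["time"] < end:
            result = result.merge(Partial(1, row[column]))
    return result


def aggregate(child_results):
    """Merge child partials, skipping children that did not respond (None).

    Returns the merged partial and how many children answered, so the
    caller can judge how complete the result is.
    """
    merged = Partial()
    answered = 0
    for result in child_results:
        if result is None:
            continue
        merged = merged.merge(result)
        answered += 1
    return merged, answered
```

Because each level only merges small partial aggregates, the same `aggregate` step can be applied at leaf, intermediate, and root aggregators; an average is then `total / count` over whatever data was reached.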
12
components
aggregator, depth of tree
13
– Intel Xeon E5-2660
– 2.2 GHz
– 144 GB DRAM
14
15
16
– Might be fine for an internal system
17