Big Data Architectures @ Facebook
QCon London 2012, Ashish Thusoo
Thursday, March 8, 2012
Outline
Big Data @ Facebook - Scope & Scale
Evolution of Big Data Architectures @ FB: Past, Present and Future
Questions
By Paul Butler - https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
DW Size in TB (chart)
Year          2007   2008   2009   2010   2011
Size (TB)       15    250    800   8000  25000
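To make the growth in the chart concrete, here is a minimal sketch that computes year-over-year growth factors and the compound annual growth rate from the five data points. It assumes the five values map to the five years in the order shown; the function names are illustrative only.

```python
from typing import Dict

# Warehouse size in TB per year, read off the chart.
# Assumption: the values pair with the years in the order they appear.
DW_SIZE_TB: Dict[int, int] = {2007: 15, 2008: 250, 2009: 800, 2010: 8000, 2011: 25000}


def yearly_growth(sizes: Dict[int, int]) -> Dict[int, float]:
    """Return the growth factor for each year relative to the previous year."""
    years = sorted(sizes)
    return {year: sizes[year] / sizes[prev] for prev, year in zip(years, years[1:])}


def compound_annual_growth(sizes: Dict[int, int]) -> float:
    """Return the constant yearly factor that links the first and last data points."""
    years = sorted(sizes)
    first, last = years[0], years[-1]
    return (sizes[last] / sizes[first]) ** (1 / (last - first))


if __name__ == "__main__":
    for year, factor in yearly_growth(DW_SIZE_TB).items():
        print(f"{year}: x{factor:.2f} over the previous year")
    print(f"compound annual growth: x{compound_annual_growth(DW_SIZE_TB):.2f}")
```

Under that pairing the warehouse grew tenfold between 2009 and 2010 alone, and by a factor of more than six per year on average across the four intervals.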
Web Clusters, MySQL Clusters
Web Clusters, MySQL Clusters, RDBMS Data Warehouse
Web Clusters, Scribe Mid-Tier, MySQL Clusters, RDBMS Data Warehouse
Web Clusters, Scribe Mid-Tier, MySQL Clusters, NAS Filers, RDBMS Data Warehouse
Web Clusters, Scribe Mid-Tier, MySQL Clusters, NAS Filers, Summarization Cluster (early map/reduce), RDBMS Data Warehouse
Web Clusters, Scribe Mid-Tier, MySQL Clusters, NAS Filers, RDBMS Data Mart, Hadoop/Hive Data Warehouse, Batch copier/loaders
Web Clusters, Scribe Mid-Tier, MySQL Clusters, NAS Filers, RDBMS Data Mart, Hadoop/Hive Data Warehouse
Hadoop/Hive Data Warehouse
Databee & Chronos: Data Pipeline Framework
HiPal: Adhoc Queries + Data Discovery
Nectar: instrumentation & schema aware data collection
Scrapes: Configuration Driven
Web Clusters, Scribe Mid-Tier, MySQL Clusters, NAS Filers, Hadoop/Hive Data Warehouse
Web Clusters, Scribe Mid-Tier, MySQL Clusters, NAS Filers, Platinum Warehouse, Silver Warehouse, Hive Replication
Web Clusters, Scribe HDFS, MySQL Clusters, Platinum Warehouse, Silver Warehouse, Hive Replication
Web Clusters, Scribe HDFS, MySQL Clusters, Platinum Warehouse, Silver Warehouse, Hive Replication, near real time data consumers
Web Clusters, Scribe HDFS, MySQL Clusters, Platinum Warehouse, Silver Warehouse, Hive Replication, ptail: parallel tail, near real time data consumers
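The slides name ptail only as a "parallel tail" feeding near real time consumers. As a rough illustration of that general idea, and not of Facebook's actual tool, the sketch below reads several log files concurrently from saved byte offsets and merges complete lines into one stream. The names `parallel_tail` and `_tail_one` and the offset-based checkpoint scheme are assumptions made for this example.

```python
import queue
import threading
from typing import Dict, Iterator, Tuple

_DONE = object()  # sentinel: one reader thread has finished


def _tail_one(path: str, offset: int, out: "queue.Queue") -> None:
    """Read `path` from byte `offset` to its current end, emitting complete lines."""
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    break  # end of file, or a last line that is still being written
                # f.tell() is the offset to resume from after this line.
                out.put((path, f.tell(), line[:-1].decode("utf-8")))
    finally:
        out.put(_DONE)


def parallel_tail(offsets: Dict[str, int]) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, next_offset, text) for new lines in all files, one thread per file.

    Lines from the same file keep their order; lines from different files interleave.
    """
    out: "queue.Queue" = queue.Queue()
    threads = [
        threading.Thread(target=_tail_one, args=(path, offset, out))
        for path, offset in offsets.items()
    ]
    for t in threads:
        t.start()
    remaining = len(threads)
    while remaining:
        item = out.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item
    for t in threads:
        t.join()
```

A consumer could store the latest `next_offset` per file as a checkpoint and pass those offsets to the next call, so that a restart resumes where it left off instead of rereading the files.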
replicas to 2.2 replicas
format for compressing Hive tables
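The replica fragment above gives only the target of 2.2 replicas. As a back-of-the-envelope sketch of what such a figure means for raw disk usage, the code below compares it with a replication factor of 3. The starting value of 3 is an assumption (it is the HDFS default and is not stated in the surviving text), and the helper names are hypothetical.

```python
# Assumption: the starting point is HDFS's default replication factor of 3.
ASSUMED_OLD_REPLICATION = 3.0
# Effective replication figure taken from the slide fragment.
NEW_REPLICATION = 2.2


def raw_storage_tb(logical_tb: float, replication: float) -> float:
    """Raw disk needed to hold `logical_tb` of data at a given effective replication."""
    return logical_tb * replication


def savings_fraction(old: float, new: float) -> float:
    """Fraction of raw storage saved by moving from `old` to `new` replication."""
    return (old - new) / old


if __name__ == "__main__":
    saved = savings_fraction(ASSUMED_OLD_REPLICATION, NEW_REPLICATION)
    print(f"raw storage saved: {saved:.1%}")
    # Illustrative only: 1,000 TB of logical data at each replication level.
    print(raw_storage_tb(1000, ASSUMED_OLD_REPLICATION), "TB raw before")
    print(raw_storage_tb(1000, NEW_REPLICATION), "TB raw after")
```

Under that assumption, the change frees a little over a quarter of the raw disk for the same logical data.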
loaders
save CPU
up to owner/group/team
vs Actual time of arrival
metrics
Scribe HDFS, ptail: parallel tail
Scribe HDFS, ptail: parallel tail, Puma Clusters
Scribe HDFS, ptail: parallel tail, Puma Clusters, HBase Cluster
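The diagram shows a stream of log lines flowing through an aggregation tier into a key-value store, but the slides give no detail on how that tier works. The sketch below shows one common pattern for such a pipeline, time-bucketed counting with periodic flushes. It is not a description of Puma's actual design: the event shape, the one-minute window, the row-key format, and the plain dict standing in for an HBase table are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # (unix timestamp in seconds, metric key), hypothetical shape


def aggregate(events: Iterable[Event], window_s: int = 60) -> Dict[Tuple[str, int], int]:
    """Count events per (key, window start) in memory."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for ts, key in events:
        bucket = ts - ts % window_s  # start of the window containing ts
        counts[(key, bucket)] += 1
    return dict(counts)


def flush(store: Dict[str, int], counts: Dict[Tuple[str, int], int]) -> None:
    """Add in-memory counts to the store; a dict stands in for the key-value table."""
    for (key, bucket), n in counts.items():
        row = f"{key}:{bucket}"  # hypothetical row-key layout
        store[row] = store.get(row, 0) + n


if __name__ == "__main__":
    store: Dict[str, int] = {}
    batch = [(0, "like"), (59, "like"), (60, "like"), (61, "share")]
    flush(store, aggregate(batch))
    print(store)
```

Aggregating in memory and writing increments in batches keeps the number of writes to the store proportional to the number of distinct keys per window rather than to the number of raw events.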
Blog Post on FB by Paul Yang: http://www.facebook.com/notes/paul-yang/moving-an-elephant-large-scale-hadoop-data-migration-at-facebook/10150246275318920
http://www.linkedin.com/pub/ashish-thusoo/0/5a8/50
https://www.facebook.com/athusoo
https://twitter.com/ashishthusoo