  1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3

  2. Agenda
     August 31, 2012
     • Why Hadoop and HBase?
     • Social Media Monitoring
     • Prospective Search and Coprocessors
     • Challenges & Lessons Learned
     • Resources to get started

  3. About Sentric
     • Spin-off of MeMo News AG, the leading provider of Social Media Monitoring & Analytics in Switzerland
     • Big Data expert, focused on Hadoop, HBase and Solr
     • Objective: transforming data into insights

  4. CC 2.0 by Editor B | http://flic.kr/p/bcU5aD1

  5. Why Hadoop and HBase? Social Media Monitoring Process
     [Diagram: from information to analysis & insight in four steps: Gathering → Processing → Interpretation → Presentation]

  6. Why Hadoop and HBase? Requirements for SMM (Social Media Monitoring)
     • Cost effective
     • High freshness
     • Scalable
     • Reliable
     • Real-time (RT) alerting
     • Analytical capabilities

  7. Why Hadoop and HBase? Hadoop
     • HDFS + MapReduce
     • Based on Google papers
     • Distributed storage and computation framework
     • Affordable hardware, free software
     • Significant adoption
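
As a rough illustration of the MapReduce programming model listed above, the following single-process Python sketch runs the map, shuffle and reduce phases over a few in-memory strings. It is not Hadoop code and distributes nothing; the tracked terms, sample articles and function names are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, shuffle and reduce in one process (illustration only, not Hadoop)."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect every value emitted for the same key.
            groups[key].append(value)
    # Reduce phase: one call per key; keys are visited in sorted order,
    # mirroring how Hadoop sorts map output by key before reducing.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Hypothetical job: count mentions of tracked terms across downloaded articles.
TRACKED_TERMS = {"hadoop", "hbase", "solr"}


def mention_mapper(article):
    for word in article.lower().split():
        token = word.strip(".,;:!?")
        if token in TRACKED_TERMS:
            yield token, 1


def sum_reducer(key, counts):
    return sum(counts)


articles = [
    "HBase runs on Hadoop.",
    "Solr indexes articles; HBase stores them.",
    "hadoop, hadoop!",
]
print(map_reduce(articles, mention_mapper, sum_reducer))
# {'hadoop': 3, 'hbase': 2, 'solr': 1}
```

In Hadoop the same mapper and reducer logic would run as separate tasks on many nodes, with the shuffle moving data over the network; the grouping step is what this sketch compresses into a single dictionary.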

  8. Why Hadoop and HBase? HBase
     • Non-relational, distributed database
     • Column-oriented
     • Multi-dimensional
     • High availability
     • High performance
     • Built on top of HDFS as its storage layer
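
"Multi-dimensional" here means that a cell is addressed by four coordinates: row key, column family, column qualifier and timestamp. The sketch below models that as nested Python dictionaries to show versioned reads and sorted row keys. It is an illustration of the data model only, not the HBase client API; the class name, the `as_of` read rule and the sample data are assumptions.

```python
import time


class ToyTable:
    """In-memory picture of HBase's data model; not the HBase client API."""

    def __init__(self, families, max_versions=3):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.families = set(families)     # column families are fixed per table
        self.max_versions = max_versions  # analogous to a family's VERSIONS setting
        self.rows = {}                    # row -> family -> qualifier -> {timestamp: value}

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if ts is None:
            ts = int(time.time() * 1000)  # HBase timestamps are milliseconds by default
        cells = self.rows.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})
        cells[ts] = value
        # Drop the oldest versions beyond max_versions.
        for old_ts in sorted(cells)[:-self.max_versions]:
            del cells[old_ts]

    def get(self, row, family, qualifier, as_of=None):
        """Newest value, or the newest value with timestamp <= as_of."""
        cells = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        eligible = [ts for ts in cells if as_of is None or ts <= as_of]
        return cells[max(eligible)] if eligible else None

    def row_keys(self):
        # Rows are kept sorted by key, which is what makes range scans cheap in HBase.
        return sorted(self.rows)


table = ToyTable(["d"], max_versions=2)
for ts, title in [(100, "draft"), (200, "edited"), (300, "final")]:
    table.put("article-1", "d", "title", title, ts=ts)
print(table.get("article-1", "d", "title"))             # final
print(table.get("article-1", "d", "title", as_of=250))  # edited
print(table.get("article-1", "d", "title", as_of=150))  # None (version 100 was dropped)
```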

  9. Why Hadoop and HBase? Technology Stack
     • Storage: HBase/HDFS
     • Search: Solr
     • Analytics: Hadoop, Mahout
     • Event mechanism (MQ): HBase RowLog
     • Real-time alerting: prospective search

  10. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf

  11. Social Media Monitoring Overview
      [Diagram: downloaded articles are checked against search agents ("match?"); matches are output to the Web-UI, reports and RT alerts. Icons by http://dryicons.com]

  12. Social Media Monitoring Solution Architecture
      [Diagram components: n news agents, REST, HBase with coprocessor, Web-UI, MySQL, Solr, RT alerts. Icons by http://dryicons.com]

  13. Social Media Monitoring: Prospective Search with Coprocessors
      [Diagram: processing issues put operations to an HRegion on an HRegionServer; prospective search runs there and triggers RT alerts. Icons by http://dryicons.com]
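
In prospective search the queries are stored and each incoming document is matched against them, the reverse of ordinary retrieval. The slide places this matching next to the data, triggered by put operations on the region. In HBase itself such a hook would be a Java RegionObserver coprocessor, typically overriding `postPut`; the Python sketch below only mimics the flow. The `on_put` hook, the "all required terms" matching rule and the alert tuples are assumptions, since the slides do not describe Sentric's matching logic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SearchAgent:
    """A stored query: fires when an article contains every required term (hypothetical rule)."""
    name: str
    required_terms: frozenset

    def matches(self, terms):
        return self.required_terms <= terms


@dataclass
class ProspectiveSearch:
    agents: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def register(self, agent):
        self.agents.append(agent)

    def on_put(self, row_key, text):
        """Hypothetical hook run after each article is stored, similar in spirit to postPut."""
        # This sits on the write path, so keep it cheap: tokenize once, then do set checks.
        terms = {word.strip(".,;:!?").lower() for word in text.split()}
        for agent in self.agents:
            if agent.matches(terms):
                self.alerts.append((agent.name, row_key))


search = ProspectiveSearch()
search.register(SearchAgent("hbase-news", frozenset({"hbase"})))
search.register(SearchAgent("basel-events", frozenset({"nosql", "basel"})))
search.on_put("article-1", "NoSQL Roadshow in Basel covers HBase.")
search.on_put("article-2", "Solr release notes.")
print(search.alerts)  # [('hbase-news', 'article-1'), ('basel-events', 'article-1')]
```

Running the match inside the write path gives alerts with very low latency, but every put pays for it, which is why keeping coprocessor work cheap matters.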

  14. Social Media Monitoring Key Figures
      • Monthly growth:
        • Index: 200 GB
        • 50 million docs/month
        • HBase: 600 GB (raw data, metadata and extracted data)
      • A few thousand MapReduce jobs/month
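
A quick back-of-the-envelope check on these monthly figures: dividing the growth by the document count gives the approximate storage per document. The sketch assumes decimal units (1 GB = 10^9 bytes), that all 50 million documents contribute to both figures, and that the 600 GB HBase figure is measured before HDFS replication with the default replication factor of 3; the slide states none of these.

```python
DOCS_PER_MONTH = 50_000_000
INDEX_GROWTH_GB = 200   # search index growth per month (from the slide)
HBASE_GROWTH_GB = 600   # HBase growth per month (from the slide)
GB = 10**9              # assumption: decimal gigabytes


def bytes_per_doc(growth_gb, docs=DOCS_PER_MONTH):
    """Average bytes added per document in one month."""
    return growth_gb * GB / docs


def yearly_raw_tb(growth_gb, replication=3):
    """Twelve months of growth in TB, times an assumed HDFS replication factor."""
    return growth_gb * 12 * replication / 1000


print(bytes_per_doc(INDEX_GROWTH_GB))  # 4000.0  (about 4 KB of index per document)
print(bytes_per_doc(HBASE_GROWTH_GB))  # 12000.0 (about 12 KB in HBase per document)
print(yearly_raw_tb(HBASE_GROWTH_GB))  # 21.6
```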

  15. CC 2.0 by saebaryo | http://flic.kr/p/5T4t5L

  16. Challenges & Lessons Learned
      1. Benchmarks: workloads
      2. Supervision
      3. Keys and shards: schema design / LG
      4. Timestamps, the 4th dimension
      5. Short column family names
      6. OS file handles
      7. JVM tuning, GC!!!
      8. Scaling region servers, data locality!
      9. Automatic vs. manual splits, compaction
      10. Do not rely on HBase being rock solid in production
      11. Forget emergency actions, it takes some time
      12. Use HBase for an appropriate use case
      13. Tune and tweak: it's not a project, it's a process
      14. You need DevOps in production
      15. Steep learning curve: you need to know the whole ecosystem
      16. Use a distribution: it's packaged, tested, supports migration and is enterprise grade
      17. Virtualization, hardware
      18. Don't struggle too much, there is a good community
      19. Share your knowledge
      20. It's early stage: many tools around, a few still missing
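
On automatic vs. manual splits: a common manual approach is to pre-split a table so writes spread across region servers from the start instead of waiting for automatic splits. If row keys begin with a fixed-width hex prefix, for example the first characters of an MD5 hash of the natural key (an assumption here, often called salting), evenly spaced split points can be computed as below. The function names and the two-character prefix width are illustrative.

```python
import hashlib


def hex_split_points(num_regions, width=2):
    """Evenly spaced split keys for row keys that start with `width` hex characters."""
    space = 16 ** width
    if not 2 <= num_regions <= space:
        raise ValueError("num_regions must be between 2 and 16**width")
    step = space / num_regions
    # num_regions regions need num_regions - 1 boundaries.
    return [format(int(i * step), f"0{width}x") for i in range(1, num_regions)]


def salted_key(natural_key, width=2):
    """Prefix a key with the first `width` hex characters of its MD5 digest."""
    prefix = hashlib.md5(natural_key.encode("utf-8")).hexdigest()[:width]
    return f"{prefix}-{natural_key}"


print(hex_split_points(4))  # ['40', '80', 'c0']
print(salted_key("abc"))    # 90-abc
```

The resulting boundaries could be passed to the HBase shell at table creation, e.g. `create 'articles', 'd', SPLITS => ['40', '80', 'c0']` (table and family names hypothetical). The trade-off: hashed prefixes spread writes evenly, but a scan in natural-key order then needs one scan per prefix range.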

  17. Challenges & Lessons Learned: Challenges
      • Everyone is still learning
      • Some issues only appear at scale
      • At scale, nothing works as advertised
      • Production cluster configuration
      • Hardware issues
      • Tuning cluster configuration to our workloads
      • HBase stability
      • Monitoring health of HBase

  18. Challenges & Lessons Learned: Lessons - General
      • Do not rely on HBase as a frontend storage layer; it is not going to be rock solid
      • Don't struggle too much, there is a good community
      • Share your knowledge
      • It's early stage: many tools around, a few still missing

  19. Challenges & Lessons Learned: Lessons - Planning
      • Use HBase for an appropriate use case
      • Use a distribution: it's packaged, tested, supports migration and is enterprise grade
      • Benchmarks: know your workloads & query patterns
        • YCSB
      • Schema & key design
        • What's queried together should be stored together
      • Scaling region servers, data locality!
      • Virtualization vs. real hardware
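
To illustrate "what's queried together should be stored together": a monitoring system might key articles by search agent and publication time, so one prefix scan returns an agent's newest hits first. The layout below (4-byte agent id, reversed 8-byte timestamp, article id) is a hypothetical example, not Sentric's schema. Reversed timestamps (Long.MAX_VALUE minus the timestamp) are a row key pattern described in the HBase reference guide.

```python
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def article_row_key(agent_id, published_ms, article_id):
    """Hypothetical key: 4-byte agent id | reversed 8-byte timestamp | article id."""
    # Fixed-width big-endian integers compare bytewise in numeric order (for non-negative
    # values), matching the lexicographic byte order HBase uses for row keys.
    return (
        struct.pack(">I", agent_id)
        + struct.pack(">q", LONG_MAX - published_ms)  # newest article gets the smallest key
        + article_id.encode("utf-8")
    )


def agent_prefix(agent_id):
    """Scan prefix that selects every article stored for one agent."""
    return struct.pack(">I", agent_id)


keys = [article_row_key(7, ms, f"a{ms}") for ms in (1000, 3000, 2000)]
print([key[12:] for key in sorted(keys)])  # [b'a3000', b'a2000', b'a1000']
```

One consequence of this design is denormalization: an article matching several agents is written once per agent, trading storage for a single contiguous read per agent.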

  20. Challenges & Lessons Learned: Lessons - Performance Tuning
      • Number of CFs < 10
        • Compaction + flushing are I/O intensive
      • Short column family names
        • HFile index size occupies a lot of RAM (storefileindexSize)
      • OS file handles
        • ulimit -n 32768
      • JVM tuning, GC!!!
        • HMaster heap: 1024 MB
        • RegionServer heap: 8192 MB
        • -XX:+UseConcMarkSweepGC
        • -XX:+CMSIncrementalMode
      • Automatic vs. manual splits
      • Be careful with expensive operations in coprocessors
      • Play with all the configurations and benchmark for tuning
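
Why short column family names matter: HBase stores each cell as a KeyValue whose key repeats the row, the column family and the qualifier, so a long family name is paid for once per cell. The sketch below computes the uncompressed size of one cell in the classic KeyValue layout (4-byte key length, 4-byte value length, then a key made of a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type, followed by the value). It ignores tags, compression and data block encoding, which change the on-disk numbers; the row, qualifier and value are made up.

```python
def keyvalue_size(row, family, qualifier, value):
    """Bytes for one cell in HBase's classic KeyValue layout (no tags, no block encoding)."""
    key_length = (
        2 + len(row)         # row length (short) + row
        + 1 + len(family)    # family length (byte) + family
        + len(qualifier)     # qualifier (its length is implied by the key length)
        + 8 + 1              # timestamp (long) + key type (byte)
    )
    # key length (int) + value length (int) + key + value
    return 4 + 4 + key_length + len(value)


row, qualifier, value = b"article-0000001", b"title", b"x" * 40
long_name = keyvalue_size(row, b"metadata", qualifier, value)
short_name = keyvalue_size(row, b"m", qualifier, value)
print(long_name, short_name)  # 88 81
```

With these hypothetical values, shortening `metadata` to `m` saves 7 bytes in every cell, so the saving scales with the number of cells rather than the number of rows; the longer keys also inflate the entries held in the HFile block index.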

  21. Challenges & Lessons Learned: Lessons - Operation
      • Monitoring/operational tooling is most important
      • Forget “emergency actions”, it takes some time
      • Tune and tweak: it's not a project, it's a process
      • You need DevOps in production
      • Steep learning curve: you need to know the whole ecosystem
        • Hadoop, HDFS, MapReduce

  22. Resources to get started
      • http://hbase.apache.org/book.html
      • http://www.sentric.ch/blog/best-practice-why-monitoring-hbase-is-important
      • http://www.sentric.ch/blog/hadoop-overview-of-top-3-distributions
      • http://www.sentric.ch/blog/hadoop-best-practice-cluster-checklist
      • http://outerthought.org/blog/465-ot.html

  23. Thank you! Questions?
      Christian Gügi, christian.guegi@sentric.ch
      Jean-Pierre König, jean-pierre.koenig@sentric.ch
      NoSQL Roadshow Basel

  24. [Diagram: Masters Cluster]

  25. [Diagram: Worker Cluster]
