Pepper: An Elastic Web Server Farm for Cloud based on Hadoop

Pepper: An Elastic Web Server Farm for Cloud based on Hadoop. Subramaniam Krishnan, Jean Christophe Counio, Yahoo! Inc. MAPRED, 1st December 2010. Agenda: Motivation, Design, Features, Applications, Evaluation, Conclusion.


  1. Pepper: An Elastic Web Server Farm for Cloud based on Hadoop. Subramaniam Krishnan, Jean Christophe Counio, Yahoo! Inc. MAPRED, 1st December 2010

  2. Agenda: Motivation, Design, Features, Applications, Evaluation, Conclusion, Future Work

  3. Motivation

  4. What’s in a Name
     • Wave 1: Grid-ification. Large data like Wikipedia; crunch 10-100s of GBs of data in hours; hosted, multi-tenant platform; Grid workflow management system (PacMan)
     • Wave 2: Content Freshness. Web feeds like breaking news, tweets, finance quotes; process 100s of feeds/sec, size in KBs, in seconds; scalable, high throughput & low latency platform; Pepper: elastic web server farm on grid

  5. Requirements
     • Elastic: handle intra/inter application load variance
     • Multi-tenant: provide process/memory isolation
     • Sub-second platform overhead
     • Simple API
     • Execute user code in platform context
     • Reliability: transparent fault tolerance

  6. Design

  7. Deployment Flow
     • Web application deployed as WAR onto HDFS (Job Manager)
     • Embedded Jetty server runs in Map task, registers with ZooKeeper
     • 1 Hadoop job = 1 Map task = 1 Web Server = 1 Web application
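
The 1 job = 1 map task = 1 web server mapping can be illustrated with a minimal sketch. It does not use Hadoop, Jetty, or ZooKeeper: `InMemoryRegistry`, `deploy`, the `/pepper/...` path, and the `host-N:8080` addresses are all hypothetical names introduced here, and the real ZooKeeper hierarchy is not given in the slides.

```python
class InMemoryRegistry:
    """Stand-in for ZooKeeper: maps an application path to its live servers."""

    def __init__(self):
        self.children = {}

    def register(self, app, address):
        # Hypothetical path layout; the real hierarchy is not shown in the slides.
        path = "/pepper/" + app
        self.children.setdefault(path, []).append(address)
        return path


def deploy(registry, app, instances):
    """Simulate `instances` jobs, each with one map task hosting one web server."""
    addresses = []
    for job_id in range(instances):
        # In the described system the map task starts an embedded Jetty server
        # from the WAR on HDFS; here we only invent an address for it.
        address = "host-%d:8080" % job_id
        registry.register(app, address)
        addresses.append(address)
    return addresses


registry = InMemoryRegistry()
servers = deploy(registry, "news-feeds", instances=3)
print(servers)
```

Scaling an application out in this model is just a matter of submitting more jobs, which is what the slides call elasticity.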

  8. Processing Flow
     • Proxy Router receives incoming requests, looks up ZooKeeper & redirects to appropriate Web Server
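
A rough sketch of the lookup-and-redirect step follows. The slides do not say how the Proxy Router chooses among several servers for one application, so the round-robin policy here is an assumption, as are the `ProxyRouter` class, the plain dictionary standing in for ZooKeeper, and the example addresses.

```python
class ProxyRouter:
    """Picks a web server for each request from a registry lookup."""

    def __init__(self, registry):
        # registry: dict mapping application name -> list of server addresses.
        # It stands in for the ZooKeeper lookup described on the slide.
        self.registry = registry
        self._next_index = {}

    def route(self, app):
        servers = self.registry.get(app)
        if not servers:
            raise LookupError("no web server registered for %r" % app)
        # Assumed policy: rotate through the registered servers in order.
        index = self._next_index.get(app, 0)
        self._next_index[app] = index + 1
        return servers[index % len(servers)]


router = ProxyRouter({"news-feeds": ["host-0:8080", "host-1:8080"]})
targets = [router.route("news-feeds") for _ in range(3)]
print(targets)
```

Because the router reads the server list on every call, servers added to the registry later are picked up without restarting the router.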

  9. ZooKeeper Hierarchy [figure]

  10. Features

  11. Features
      • Scalability: Web application can scale by configuring more instances (Elasticity), system can scale with addition of Hadoop nodes
      • Performance: High throughput by ensuring that all the heavy lifting is done during deployment
      • High Availability/Self-healing: Redundant server instances. Health check piggybacked on TaskTracker heartbeat
      • Isolation: Hadoop map provides process isolation
      • Ease of Development: Standard Servlet API & WAR packaging
      • Reuse of Grid Infrastructure: The system runs on a Grid that can be shared across several applications

  12. Applications

  13. Applications
      • Web Feeds Processing: Configure workflow orchestration engine to run in-memory, 1 workflow = 1 web application. Benefits: scalability; isolation; avoids Hadoop job bootstrap latency and the HDFS small files bottleneck.
      • Online Clustering: Extracts features and assigns incoming feeds to clusters predetermined by offline clustering. Performed online for Yahoo! News to identify hot news clusters during ingestion of articles.

  14. Evaluation

  15. Setup
      • Hardware: Intel Xeon L5420 2.50GHz with 8GB DDR2 RAM
      • Software: 64-bit Sun JDK 1.6 update 18 on RHEL AS 4 U8, Linux 2.6.9-89.ELsmp x86_64
      • Configuration: 8 map slots/node with 512MB heap, 25 threads/Jetty server
      • Number of computing Hadoop nodes: 3
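
The configuration above implies an upper bound on capacity for the test cluster, worked out below. The calculation assumes every map slot hosts a web server, which the slides do not state; the variable names are ours.

```python
NODES = 3
MAP_SLOTS_PER_NODE = 8
HEAP_MB_PER_SLOT = 512
THREADS_PER_JETTY_SERVER = 25

# Assumption: every map slot is occupied by one web server (1 map task = 1 server).
max_web_servers = NODES * MAP_SLOTS_PER_NODE
max_request_threads = max_web_servers * THREADS_PER_JETTY_SERVER
heap_mb_per_node = MAP_SLOTS_PER_NODE * HEAP_MB_PER_SLOT

print(max_web_servers, max_request_threads, heap_mb_per_node)
```

Under that assumption the cluster tops out at 24 web servers and 600 request threads, with 4096 MB of each node's 8 GB RAM committed to map task heaps.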

  16. Linear Scaling for Predefined Capacity
      • Throughput: number of requests handled successfully per second for a specified number of tasks

  17. Elastic Scaling for Dynamic Capacity
      • Rejection: failure to execute within a predefined timeout
      • Load is increased and an additional map task is allocated at points A and B, based on a predefined schedule
      • Failure rate of < 0.001% observed in production

  18. Pepper Performance Numbers

      | System | Burst Rate (requests/min) | Throughput (requests/day) | Platform Latency (Avg.) | Response Time (Avg.) |
      |--------|---------------------------|---------------------------|-------------------------|----------------------|
      | Pepper | 2,000                     | 3 million                 | 75 ms                   | 4 s                  |
      | PacMan | 50                        | 10,000                    | 90 s                    | 120 s                |

      • Dataset is Yahoo! News feeds with sizes < 1MB
      • Processing is typically computation-intensive, e.g. enriching web feeds through validation, normalization, geo-tagging, persistence in service stores, etc.
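
To put the table in relative terms, the short calculation below derives Pepper's advantage over PacMan on each metric. The dictionary layout and names are ours. The last step applies the < 0.001% production failure rate from the previous slide to the 3 million requests/day figure, which assumes both numbers describe the same workload.

```python
from fractions import Fraction

# Figures copied from the table, with times converted to milliseconds.
PEPPER = {"burst_per_min": 2_000, "per_day": 3_000_000,
          "latency_ms": 75, "response_ms": 4_000}
PACMAN = {"burst_per_min": 50, "per_day": 10_000,
          "latency_ms": 90_000, "response_ms": 120_000}

advantage = {
    # Higher is better for rates, so divide Pepper by PacMan.
    "burst": PEPPER["burst_per_min"] / PACMAN["burst_per_min"],
    "throughput": PEPPER["per_day"] / PACMAN["per_day"],
    # Lower is better for times, so divide PacMan by Pepper.
    "latency": PACMAN["latency_ms"] / PEPPER["latency_ms"],
    "response": PACMAN["response_ms"] / PEPPER["response_ms"],
}

# 0.001% expressed exactly, to avoid floating-point rounding.
FAILURE_RATE_BOUND = Fraction(1, 100_000)
max_failures_per_day = PEPPER["per_day"] * FAILURE_RATE_BOUND

print(advantage)
print(max_failures_per_day)
```

On these numbers Pepper handles 40 times the burst rate and 300 times the daily throughput, with 1200 times lower platform latency and 30 times lower response time, and the failure bound corresponds to fewer than 30 failed requests per day.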

  19. Conclusion

  20. Conclusion
      • Pepper marries the benefits of traditional server farms (low latency and high throughput) with those of the cloud (elasticity and isolation)
      • In production within Yahoo! since December 2009
      • Current Y! properties: Newspaper Consortium, Finance & News. Sports & Entertainment are in the pipeline
      • System scales linearly with the addition of more Hadoop computing nodes

  21. Future Work

  22. Future Work
      • On-demand allocation of servers
      • Experimenting with async NIO between Proxy Router & Map Web Engine to increase scalability
      • Improving distribution of requests across web servers
      • Integrate into Hadoop (?)

  23. References
      • Hadoop, Web Page, http://hadoop.apache.org/
      • J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, 6th Symposium on Operating Systems Design and Implementation (OSDI’04), San Francisco, CA, December 2004, pp. 137–150
      • P. Hunt, M. Konar, F.P. Junqueira, and B. Reed, “ZooKeeper: Wait-free coordination for Internet-scale systems”, Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, Boston, MA, June 2010, pp. 11-11
      • Oozie (successor to PacMan), Web Page, http://yahoo.github.com/oozie/, http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-oozie/

  24. Questions?
