Cluster-Based Scalable Network Services

  1. Cluster-Based Scalable Network Services
     Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer and Paul Gauthier
     Presented by Hari Sivaramakrishnan

     Advantages of Clusters
     - Scalability
       - Linear increase in hardware to handle load
       - Adding resources is easy for clusters
     - Availability
       - 24 x 7 service, despite transient hardware or software errors
       - Nodes in a cluster are independent; failures are masked by software
     - Cost effectiveness
       - Economical to maintain and expand
       - Commodity hardware

     Challenges to Using Clusters
     - Administration
       - Software available
     - Component vs. system replication
       - Can support part of a service, not all of it
       - Functions are well described and interchangeable
     - Partial failures
     - Shared state
       - None in a cluster
       - Can be emulated, but performance improves if the need for shared state is minimized

     Architectural Features
     - Exploits strengths of cluster computing
     - Separation of content from services
       - Handled in the architecture design
     - Programming model based on composition of worker modules (a small sketch follows this slide)
     - BASE semantics: Basically Available, Soft State, Eventual Consistency
     - Measurements and monitoring

     Architecture of an SNS / Layered Architecture
     [Figure slides: layered SNS architecture diagram]
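
A minimal Python sketch of the "composition of worker modules" idea above. The names and API here are illustrative assumptions, not the paper's actual interfaces; the point is that workers are simple and stateless, so a service is just a chain of independent modules that any node can run:

  import zlib

  def compress_worker(data: bytes) -> bytes:
      # Transformation: operates on a single data object.
      return zlib.compress(data)

  def decompress_worker(data: bytes) -> bytes:
      return zlib.decompress(data)

  def compose(*workers):
      # Build a service by chaining independent, stateless worker modules.
      def service(data: bytes) -> bytes:
          for worker in workers:
              data = worker(data)
          return data
      return service

  if __name__ == "__main__":
      roundtrip = compose(compress_worker, decompress_worker)
      payload = b"cluster-based scalable network services" * 10
      assert roundtrip(payload) == payload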

  2. SNS Layer
     - Scalability
       - Use incrementally added nodes to spawn new components
       - Workers are simple and stateless
     - Centralized load balancing
       - Policy implemented in the manager, can be changed easily
       - Trace information collected from workers; decisions sent to FEs
     - Fault tolerant
     - Prolonged bursts, incremental growth
       - Overflow pool
       - Workers spawned by the manager
     - Caching
       - ISPs have observed 40-50% savings, so caching is critical
       - Can cache original and transformed data

     TACC: Programming Model
     - Transformation
       - Operation on a single data object
       - Examples: encryption, encoding, compression
     - Aggregation
       - Collating data from various objects
     - Customization
       - User-specific data automatically fed to workers
       - Same worker can be used with different parameter sets
     - API
       - Provided by the manager and FE to allow for new services
       - Worker stub handles load balancing, fault tolerance, etc.
       - Worker code focuses on service implementation

     TransSend
     - Front ends
       - SPARCstation machine cluster
       - HTTP interface
       - Request served from the cache if available, or computed
       - 400 threads
     - Load balancer
       - Manager stub (MS) contacts the manager to locate a distiller
       - Worker stub (WS) accepts requests and reports load information
       - Manager spawns a distiller if load increases

     TransSend (contd.): Fault Tolerance (a peer-watcher sketch follows this slide)
     - Registration system used to locate distillers
     - Timeouts detect dead nodes
     - All state is soft
     - A watcher process learns whether a peer is alive by periodic monitoring
     - Peers start one another: the manager starts the FE, the FE starts a manager
     - Manager reports distiller failures to the MS, which updates its cache
     - Programmed in the manager stubs

     TransSend (contd.)
     - User profile database
       - Normal ACID database
     - Caching
       - Harvest object cache workers
     - Distillers
       - Image processing
       - Off-the-shelf code
       - Did not have to remove all the bugs: if a node crashes, it is restarted by a peer
     - Graphical monitor
       - Detects system state and resource usage

     TransSend's Use of BASE
     - Load-balancing data
       - The MS does not always have the most recent information
       - Errors are corrected by using timeouts
       - Performance improvements outweigh the problems
     - Soft state
       - Transformed content is cached
     - Approximate answers
       - If the system is overloaded, can return a slightly different version of the data from cache
       - User can get the accurate answer by resubmitting the request
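
A minimal Python sketch of the peer-watching fault tolerance described above (hypothetical names and timeout, not TransSend's actual code). Liveness is soft state, just a timestamp of the last beacon heard from each peer; a peer that misses its timeout is presumed dead and simply restarted rather than recovered:

  import time

  class Watcher:
      """Peer watcher: liveness is soft state (just the last-beacon timestamps)."""

      def __init__(self, restart_fn, timeout_s=5.0):
          self.restart_fn = restart_fn
          self.timeout_s = timeout_s
          self.last_beacon = {}          # peer name -> time of last beacon heard

      def heard_from(self, peer):
          # Called whenever a periodic beacon arrives from a peer.
          self.last_beacon[peer] = time.monotonic()

      def check_peers(self):
          # Timeouts detect dead nodes; anything that has gone quiet is restarted.
          now = time.monotonic()
          for peer, seen in list(self.last_beacon.items()):
              if now - seen > self.timeout_s:
                  self.restart_fn(peer)          # peers start one another
                  self.last_beacon[peer] = now   # soft state: simply reset

  if __name__ == "__main__":
      w = Watcher(restart_fn=lambda peer: print("restarting", peer), timeout_s=0.05)
      w.heard_from("front-end-1")
      time.sleep(0.1)      # simulate the peer going silent past its timeout
      w.check_peers()      # prints: restarting front-end-1

Because all of this state can be rebuilt from beacons, losing the watcher itself is harmless: a peer just starts a new one.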

  3. Input Characteristics
     [Figure slide]

     Cache Performance
     - Average cache hit takes 27 ms to serve
     - 95% of hits take less than 100 ms
     - Miss penalty ranges anywhere from 100 ms to 100 s
     - Cache performance is related to the number of users and the cache size
       - Hit rate increases monotonically with cache size
       - When the users' combined working set exceeds the cache size, the hit rate falls

     Load Balancing (a sketch of the spawning policy follows this slide)
     - Metric: queue length at the distillers
     - New distillers are spawned when load is very high
     - A delay D allows the new distillers to stabilize the system before more distillers are added

     Scalability
     - Limited by shared or centralized components: SAN, manager, user profile DB
     - DB
       - Was never near saturation in their tests
     - Manager
       - Can handle three orders of magnitude more traffic than the peak load
     - Even commodity hardware can get the job done

     Scalability of the SAN
     - Close to saturation, unreliable multicast traffic is dropped
       - This information is needed by the manager to load balance
     - Workarounds
       - Separate networks for data and control traffic
       - High-performance interconnect

     Economic Feasibility
     - Caching saves an ISP a lot of money
     - A server can pay for itself in 2 months
     - Administration costs not considered
       - Not expected to be very significant
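
A minimal Python sketch of the load-balancing policy above (simplified; the threshold and delay values are placeholders, not figures from the paper). The manager watches distiller queue lengths, spawns a new distiller when the queues grow too long, and then waits a stabilization delay D before it may spawn another:

  import time

  QUEUE_THRESHOLD = 8       # assumed threshold, for illustration only
  STABILIZE_DELAY_S = 10.0  # the "delay D" from the slide (value assumed)

  class Manager:
      def __init__(self, spawn_fn):
          self.spawn_fn = spawn_fn
          self.last_spawn = float("-inf")

      def report_load(self, queue_lengths):
          # queue_lengths: distiller name -> current queue length (the metric).
          if not queue_lengths:
              return
          avg = sum(queue_lengths.values()) / len(queue_lengths)
          now = time.monotonic()
          if avg > QUEUE_THRESHOLD and now - self.last_spawn > STABILIZE_DELAY_S:
              self.spawn_fn()            # overflow: add one more distiller
              self.last_spawn = now      # wait for D before spawning again

  if __name__ == "__main__":
      m = Manager(spawn_fn=lambda: print("spawning new distiller"))
      m.report_load({"d1": 12, "d2": 9})              # above threshold -> spawn
      m.report_load({"d1": 12, "d2": 9, "d3": 10})    # within delay D -> no spawn

In the real system the decision is driven by the trace information the worker stubs report to the manager; the policy lives in the manager, so it can be changed without touching the workers.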

  4. Conclusion
     - The architecture works around the deficiencies of using clusters
     - Defined a new programming model which makes adding new services extremely easy
     - BASE (weaker than ACID) semantics enhances performance
