LARGE SCALE INTERNET SERVICES
2110414 Large Scale Computing Systems Natawut Nupairoj, Ph.D.
Outline
- Overview
- Background Knowledge
- Architectural Case Studies
- Real-World Case Study
Internet services have become essential and very popular
Google serves hundreds of millions of search requests
Main requirements
- Availability
- Scalability
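[Figure: Web Browser → Web Server → AppServer. The browser's Request for search.jsp (with params) passes through the web server to the AppServer, and an HTML page comes back as the Response.]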
How to ensure a certain absolute degree of …
Availability includes the ability of the user community to …
Models of availability:
- Active-Standby: HA cluster or failover cluster
- Active-Active: server load balancing
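Why redundancy raises availability can be estimated with a short calculation. The sketch below assumes that each server fails independently with the same availability a and that failover is instantaneous and perfect; under those assumptions the service is down only when all n servers are down, so A = 1 - (1 - a)^n. Real clusters share power, networks, and software bugs, so treat the result as an optimistic upper bound. The function names are hypothetical.

```python
# Minimal sketch: availability of n redundant servers, ASSUMING independent
# failures and instantaneous, perfect failover between them.

HOURS_PER_YEAR = 365 * 24


def combined_availability(a: float, n: int) -> float:
    """The service is up unless all n servers are down at the same time."""
    if not 0.0 <= a <= 1.0 or n < 1:
        raise ValueError("need 0 <= a <= 1 and n >= 1")
    return 1.0 - (1.0 - a) ** n


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = combined_availability(0.99, n)
        print(f"n={n}: availability={a:.6f}, "
              f"downtime={downtime_hours_per_year(a):.3f} h/year")
```

With a = 0.99, a single server is expected to be down about 87.6 hours a year; two independent servers bring that to roughly 0.88 hours.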
2110684 - Basic Infrastructure
Active-Standby
- Redundant servers and …
- Only one server is active (master)
- One server is standing by
- Shared storage
Pro: simple; half the software license cost
Con: double the hardware cost for the performance of a single server
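A standby server usually decides to take over by watching for periodic heartbeats from the master. The sketch below is a hypothetical illustration, not any particular cluster product: `StandbyMonitor` and its methods are invented names, and time is passed in explicitly so the logic can be checked without real clocks or networks. Real failover clusters also fence the old master so that a node that is merely slow does not keep writing to the shared storage.

```python
# Minimal sketch of Active-Standby failover driven by heartbeats.
# ASSUMPTIONS: class and method names are hypothetical; time is passed in
# explicitly (in seconds) instead of reading a real clock.

class StandbyMonitor:
    def __init__(self, timeout: float):
        self.timeout = timeout        # max silence tolerated from the master
        self.last_heartbeat = 0.0     # time of the last heartbeat received
        self.role = "standby"

    def on_heartbeat(self, now: float) -> None:
        """Called whenever the master's 'I am alive' message arrives."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Promote this node to master if the master has been silent too long."""
        if self.role == "standby" and now - self.last_heartbeat > self.timeout:
            # A real cluster would fence the old master first, so both nodes
            # never write to the shared storage at the same time.
            self.role = "master"
        return self.role


if __name__ == "__main__":
    node = StandbyMonitor(timeout=3.0)
    node.on_heartbeat(now=1.0)
    print(node.check(now=2.0))   # standby: master was heard 1 s ago
    print(node.check(now=5.0))   # master: 4 s of silence exceeds the 3 s timeout
```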
Active-Active
- Spread work between two or more computers, …
Approaches:
- DNS round-robin
- Reverse proxy
- Load balancer
DNS round-robin
Pro:
- Inexpensive
Con:
- Load distribution, but …
- Problem with DNS …
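DNS round-robin publishes several address (A) records for one host name and rotates their order on each query. Because most clients simply connect to the first address in the answer, successive clients land on different servers. The toy name server below is a hypothetical sketch; the host name and addresses are made up (192.0.2.0/24 is a documentation range).

```python
from collections import deque

# Minimal sketch of DNS round-robin, ASSUMING a toy in-memory name server.
# Host name and addresses are hypothetical.

class RoundRobinDNS:
    def __init__(self, records: dict[str, list[str]]):
        self.records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, name: str) -> list[str]:
        """Return all A records, rotated by one position per query."""
        addrs = self.records[name]
        answer = list(addrs)
        addrs.rotate(-1)   # the next query starts with the next server
        return answer


if __name__ == "__main__":
    dns = RoundRobinDNS({"www.example.com": ["192.0.2.1", "192.0.2.2", "192.0.2.3"]})
    for _ in range(4):
        # Most clients simply connect to the first address in the answer.
        print(dns.resolve("www.example.com")[0])
```

Note what the sketch does not do: it never looks at server load or health, and a resolver that caches one answer keeps sending all of its users to the same first address, which is the weakness behind the cons listed above.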
Special equipment
Clients will see only
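Inside a load balancer or reverse proxy, each incoming request is handed to one of the servers behind it; typically clients only ever talk to the balancer itself. The sketch below shows one common selection policy, round-robin that skips servers marked unhealthy. Backend names are hypothetical, and the health checker that calls `mark_down`/`mark_up` is assumed rather than shown. Least-connections is another widely used policy.

```python
# Minimal sketch of backend selection inside a load balancer / reverse proxy.
# ASSUMPTIONS: backend names are hypothetical; health status is set by an
# external health checker (e.g. periodic HTTP probes) not shown here.

class LoadBalancer:
    def __init__(self, backends: list[str]):
        self.backends = list(backends)
        self.healthy = set(backends)
        self.next_index = 0

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        self.healthy.add(backend)

    def pick(self) -> str:
        """Round-robin over healthy backends; clients never see this choice."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    lb = LoadBalancer(["app1", "app2", "app3"])
    lb.mark_down("app2")
    print([lb.pick() for _ in range(4)])   # app2 is skipped while it is down
```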
Stateful server
- Server maintains some persistent data
- Allows the current request to relate to one of the earlier requests
Stateless server
- Server does not keep data
- A request is independent of earlier requests
- Example: web server, NFS
Stateful server:
- Server has to maintain …
- Current request may depend on earlier requests
- Consumes server resources
- Leads to a limited number of …
- If the connection is broken, the …
[Figure: Stateful interaction with a database server: connect → use db1 → select * from … → disconnect]
Example: Database, FTP
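The database example can be sketched as a toy stateful server: each connection remembers which database was selected with `use`, so a later `select` only makes sense because of the earlier command, and that memory costs server resources until the client disconnects. The command set, class name, and data are hypothetical and do not follow any real database protocol.

```python
# Toy sketch of a STATEFUL server: each connection remembers which database
# was selected with "use", so later requests depend on earlier ones.
# ASSUMPTION: commands and data are hypothetical, not a real DB protocol.

DATABASES = {"db1": ["alice", "bob"], "db2": ["carol"]}


class StatefulDBServer:
    def __init__(self):
        self.sessions = {}   # connection id -> selected database name (or None)
        self.next_id = 1

    def connect(self) -> int:
        conn_id = self.next_id
        self.next_id += 1
        self.sessions[conn_id] = None   # server holds resources per connection
        return conn_id

    def execute(self, conn_id: int, command: str):
        if command.startswith("use "):
            self.sessions[conn_id] = command.split()[1]
            return "ok"
        if command == "select":
            db = self.sessions[conn_id]
            if db is None:
                raise RuntimeError("no database selected")
            return DATABASES[db]
        raise ValueError(f"unknown command: {command}")

    def disconnect(self, conn_id: int) -> None:
        del self.sessions[conn_id]      # session state is lost with the connection


if __name__ == "__main__":
    server = StatefulDBServer()
    c = server.connect()
    server.execute(c, "use db1")
    print(server.execute(c, "select"))   # depends on the earlier "use db1"
    server.disconnect(c)
```

If the connection breaks, a new connection starts with no selected database, so the client must repeat `use db1` before querying again.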
Stateless server:
- Server does not maintain …
- Connect-request-reply-disconnect
- Consumes fewer server resources
- Leads to a large number of …
[Figure: Stateless interaction with a web server: connect → GET /index.html → disconnect, then connect → GET /i/logo.jpg → disconnect]
Example: Web server, NFS
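The stateless web-server interaction can be reproduced with Python's standard `http.server` and `http.client` modules. In this sketch the handler answers each GET using only the request path and keeps nothing between requests, and with `protocol_version = "HTTP/1.0"` the server closes the connection after each reply, so every fetch is a full connect-request-reply-disconnect cycle. The page contents in `PAGES` and the helper names are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical static content standing in for files on disk.
PAGES = {
    "/index.html": b"<html><img src='/i/logo.jpg'></html>",
    "/i/logo.jpg": b"fake-jpeg-bytes",
}


class StatelessHandler(BaseHTTPRequestHandler):
    # HTTP/1.0: the server closes the connection after every reply.
    protocol_version = "HTTP/1.0"

    def do_GET(self):
        # Everything needed to answer is in this one request (the path);
        # nothing about earlier requests is stored on the server.
        body = PAGES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_server() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), StatelessHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int, path: str) -> bytes:
    """One full cycle: connect, send request, read reply, disconnect."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    for path in ("/index.html", "/i/logo.jpg"):
        print(path, fetch(server.server_port, path))
    server.shutdown()
    server.server_close()
```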
Utilize the fact that a LAN has more bandwidth and …
t = access latency + (data size / bandwidth)
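Plugging numbers into the formula shows why a nearby cache helps. All figures below are hypothetical, chosen only for illustration: a 100 KB page, a distant origin server reached over a WAN path with 80 ms latency and 10 Mbit/s, and a cache on the LAN with 1 ms latency and 100 Mbit/s.

```python
# Worked example of t = access latency + data size / bandwidth.
# ASSUMPTION: all numbers are hypothetical, chosen only for illustration.

def transfer_time(latency_s: float, size_bits: float, bandwidth_bps: float) -> float:
    return latency_s + size_bits / bandwidth_bps


PAGE_BITS = 100_000 * 8   # a 100 KB page

wan = transfer_time(0.080, PAGE_BITS, 10_000_000)    # distant origin server
lan = transfer_time(0.001, PAGE_BITS, 100_000_000)   # cache on the local LAN

print(f"WAN: {wan * 1000:.0f} ms, LAN cache: {lan * 1000:.0f} ms")
# prints "WAN: 160 ms, LAN cache: 9 ms"
```

Both terms shrink on the LAN: the fixed latency drops from 80 ms to 1 ms, and the size/bandwidth term drops from 80 ms to 8 ms.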
Web pages usually have some "popularity":
- Users usually go back and forth between pages
- Users tend to share the same interests (fashion)
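Because requests concentrate on popular pages and users revisit the same pages, even a small cache can absorb many requests. The slides do not name an eviction policy; least-recently-used (LRU) is one common choice and is assumed in this sketch. The `load` callback stands in for a hypothetical fetch from the origin server.

```python
from collections import OrderedDict
from typing import Callable

# Minimal LRU cache sketch. ASSUMPTIONS: LRU is one common eviction policy,
# not necessarily what a given proxy uses; `load` stands in for a
# hypothetical fetch from the origin server.

class LRUCache:
    def __init__(self, capacity: int, load: Callable[[str], bytes]):
        self.capacity = capacity
        self.load = load
        self.items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.items:
            self.hits += 1
            self.items.move_to_end(url)      # mark as most recently used
            return self.items[url]
        self.misses += 1
        value = self.load(url)               # go to the origin server
        self.items[url] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used
        return value


if __name__ == "__main__":
    cache = LRUCache(capacity=2, load=lambda url: f"<page {url}>".encode())
    # A user going back and forth between two popular pages:
    for url in ["/home", "/news", "/home", "/news", "/home", "/sports", "/home"]:
        cache.get(url)
    print(cache.hits, cache.misses)   # prints "4 3"
```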
Source: http://www.useit.com/alertbox/zipf.html
Source: http://en.wikibooks.org/wiki/Computer_Networks/HTTP
Source: http://knowledgehub.zeus.com/articles/2009/08/05/cache_your_website_for_just_a_second
- Online: an online service/Internet portal (e.g., Hotmail)
- Content: a global content-hosting service (e.g., file sharing)
- ReadMostly: a high-traffic Internet service with a very high read-to-write ratio
- Load-balancing servers
- Front-end servers: run stateless code to service requests and gather data (web server / AppServer)
- Back-end servers: provide persistent data (databases, files, email, user …); should use RAID-based storage
Front-end: functionally partitioned. Back-end: a single file store and a single database.
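With functionally partitioned front-ends, each group of servers handles one kind of request, and the load-balancing tier routes each request to the right group; all groups then share the same single back-end. The sketch routes by URL prefix, which is an assumption; the prefixes and server-group names are hypothetical.

```python
# Minimal sketch: routing requests to functionally partitioned front-end groups.
# ASSUMPTION: URL prefixes and server-group names are hypothetical examples;
# every group would talk to the same single back-end.

FRONT_END_GROUPS = {
    "/mail": ["mail-fe1", "mail-fe2"],
    "/news": ["news-fe1"],
    "/search": ["search-fe1", "search-fe2", "search-fe3"],
}


def route(path: str) -> list[str]:
    """Return the front-end group responsible for this function."""
    for prefix, group in FRONT_END_GROUPS.items():
        if path == prefix or path.startswith(prefix + "/"):
            return group
    raise LookupError(f"no front-end group for {path}")


if __name__ == "__main__":
    print(route("/mail/inbox"))   # handled only by the mail front-ends
```

The benefit is isolation (a surge in search traffic does not slow mail), at the cost that each group must be sized for its own peak.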
Front-end: all identical. Back-end: data partitioned.
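When the front-ends are identical and the back-end is data partitioned, any front-end must be able to work out which partition owns a piece of data. A simple way, assumed here, is to hash the key and take it modulo the number of partitions. Partition names are hypothetical, and a stable hash (SHA-256) is used because Python's built-in `hash()` for strings is randomized per process.

```python
import hashlib

# Minimal sketch of data partitioning: identical front-ends decide which
# back-end partition owns a key by hashing it.
# ASSUMPTIONS: partition names are hypothetical; plain modulo hashing is used.

PARTITIONS = ["store-0", "store-1", "store-2", "store-3"]


def partition_for(key: str) -> str:
    # Stable hash, so every front-end (and every restart) agrees on the owner.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return PARTITIONS[int.from_bytes(digest[:8], "big") % len(PARTITIONS)]


if __name__ == "__main__":
    for user in ("user:1", "user:2", "user:42"):
        print(user, "->", partition_for(user))
```

Plain modulo hashing moves most keys when a partition is added or removed; consistent hashing is a common alternative when the number of partitions changes often.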
Front-end: all identical. Back-end: fully replicated.
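With full replication every back-end node holds a complete copy of the data, so reads can go to any node while writes must reach all of them. The sketch assumes synchronous writes, no failures, and in-memory dictionaries standing in for real storage nodes; the class name is hypothetical.

```python
import random

# Minimal sketch of full replication: every back-end node holds a complete copy.
# ASSUMPTIONS: synchronous writes, no failures, and in-memory dicts standing
# in for real storage nodes.

class ReplicatedStore:
    def __init__(self, n_replicas: int):
        self.replicas = [dict() for _ in range(n_replicas)]

    def write(self, key, value) -> None:
        # Writes are expensive: every replica must apply them.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads are cheap: any single replica can answer.
        return random.choice(self.replicas)[key]


if __name__ == "__main__":
    store = ReplicatedStore(3)
    store.write("page:/index.html", "<html>...</html>")
    print(store.read("page:/index.html"))
```

This trade-off (costly writes, cheap and easily scaled reads) suits workloads with far more reads than writes.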
Heavy workload:
- 212 million registered users
- 1 billion page views a day
- 2 petabytes of data
Large number of servers:
- 15,000 AppServers (IBM WebSphere)
- 100 database servers (Oracle)
- Uses Akamai (CDN) for static content
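A back-of-the-envelope calculation puts the figures above in perspective. It assumes load is spread evenly over the day and across all AppServers, which real traffic never is (peaks can be much higher than the average, and one page view may involve several requests), so the result is only a lower-bound feel for the load.

```python
# Back-of-the-envelope load from the figures above.
# ASSUMPTION: load spread evenly over 24 hours and across all AppServers.

PAGE_VIEWS_PER_DAY = 1_000_000_000
APP_SERVERS = 15_000
SECONDS_PER_DAY = 24 * 60 * 60

avg_per_second = PAGE_VIEWS_PER_DAY / SECONDS_PER_DAY
per_server = avg_per_second / APP_SERVERS

print(f"{avg_per_second:,.0f} page views/s overall, "
      f"{per_server:.2f} per AppServer on average")
# prints "11,574 page views/s overall, 0.77 per AppServer on average"
```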
Reduce bottlenecks by …
Clients get content from …
Source: http://en.wikipedia.org/wiki/Akamai_Technologies
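The core idea of a CDN such as Akamai is that clients fetch static content from a nearby edge server instead of the origin. The sketch below picks the edge with the lowest round-trip time that already caches the object and otherwise falls back to the origin. Edge names, round-trip times, and cached objects are hypothetical; real CDNs typically steer clients through DNS, weigh many more signals than latency, and usually have the edge fetch and cache the object on a miss.

```python
# Minimal sketch of CDN edge selection.
# ASSUMPTIONS: edge names, round-trip times and cached objects are hypothetical.

ORIGIN = "origin.example.com"

EDGES = {
    # edge server -> (round-trip time from this client in ms, cached objects)
    "edge-bkk": (8, {"/img/logo.jpg", "/css/site.css"}),
    "edge-sin": (35, {"/img/logo.jpg", "/js/app.js"}),
    "edge-tyo": (70, {"/img/logo.jpg", "/js/app.js", "/css/site.css"}),
}


def choose_server(path: str) -> str:
    """Nearest edge that caches the object, otherwise the origin."""
    candidates = [(rtt, name) for name, (rtt, objects) in EDGES.items()
                  if path in objects]
    return min(candidates)[1] if candidates else ORIGIN


if __name__ == "__main__":
    for path in ("/img/logo.jpg", "/js/app.js", "/video/intro.mp4"):
        print(path, "->", choose_server(path))
```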
Application tier
- Segmented by function
- Horizontal load balancing
- Minimize dependencies
Data tier
- Data partitioned by functional areas
- Minimize database work:
  - No stored procedures or business logic in the database
  - Move CPU-intensive work such as joins and sorts to the applications
  - AppServers are cheap; databases are the bottleneck
Source: R. Shoup and D. Pritchett, “The eBay Architecture”, SD Forum 2006
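The data-tier guideline of keeping joins and sorts out of the database can be pictured as follows: the AppServer issues two simple single-table queries, then performs the join and the sort itself. In this sketch the two in-memory lists stand in for those query results; table and column names are hypothetical, and a hash join is used as one straightforward way to join in application code.

```python
# Minimal sketch of doing a join and sort in the application tier instead of
# the database. ASSUMPTION: the two lists stand in for results of two simple
# single-table queries; table/column names are hypothetical.

items = [  # e.g. SELECT item_id, seller_id, price FROM items WHERE ...
    {"item_id": 1, "seller_id": 10, "price": 25.0},
    {"item_id": 2, "seller_id": 11, "price": 5.0},
    {"item_id": 3, "seller_id": 10, "price": 12.5},
]
sellers = [  # e.g. SELECT seller_id, name FROM sellers WHERE seller_id IN (...)
    {"seller_id": 10, "name": "alice"},
    {"seller_id": 11, "name": "bob"},
]


def join_and_sort(items, sellers):
    """Hash join on seller_id, then sort by price, all in the AppServer."""
    names = {s["seller_id"]: s["name"] for s in sellers}           # build side
    joined = [{**item, "seller": names[item["seller_id"]]}         # probe side
              for item in items]
    return sorted(joined, key=lambda row: row["price"])


if __name__ == "__main__":
    for row in join_and_sort(items, sellers):
        print(row["item_id"], row["seller"], row["price"])
```

The database only answers cheap lookups, while the CPU work of joining and sorting runs on AppServers, which can be added horizontally.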