GlobeTP: Template-Based Database Replication for Scalable Web Applications
Tobias Groothuyse, Swaminathan Sivasubramanian, and Guillaume Pierre. In Proceedings of WWW 2007, May 8-12, 2007, Banff.
Problem Definition
- How to provide a scalable infrastructure for hosting dynamically generated web content?
- Past solutions:
- 1. Cache generated pages.
- 2. Distribute the computation across multiple application servers.
- 3. Cache the results of DB queries.
- Problem: the bottleneck remains the throughput of the origin DB.
Problem Definition (cont.)
- Solution: Use DB Replication.
- Problem: it does not scale linearly, because every update, delete, and insert (UDI) query must be executed on every DB replica.
- Past solutions:
- 1. Increase the throughput of each individual server.
- 2. Partial replication.
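To see why full replication stops scaling, consider a deliberately simplified model. It is an assumption for illustration and is not taken from the paper. Every query costs one unit of work, and each server handles `capacity` units per second. Reads are spread evenly over the N replicas, while every UDI query runs on all of them. The sustainable rate is then capacity / (r/N + u), where r and u are the read and UDI fractions. As N grows, this rate approaches capacity / u.

```python
def max_throughput(n_replicas: int, udi_fraction: float, capacity: float = 1.0) -> float:
    """Highest total query rate a fully replicated database can sustain.

    Simplified cost model assumed for illustration: unit-cost queries, reads
    balanced evenly over all replicas, every UDI query applied on every replica.
    """
    if n_replicas < 1:
        raise ValueError("need at least one replica")
    if not 0.0 <= udi_fraction <= 1.0:
        raise ValueError("udi_fraction must be in [0, 1]")
    read_fraction = 1.0 - udi_fraction
    # Work each server performs per unit of total query rate:
    # reads are divided among the replicas, UDI queries are not.
    load_per_query = read_fraction / n_replicas + udi_fraction
    return capacity / load_per_query


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, round(max_throughput(n, udi_fraction=0.1), 2))
```

Suppose 10% of queries are UDI. In this model, 16 replicas deliver only 6.4 times the throughput of a single server. No number of replicas can exceed 10 times.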
Partial Replication
- Past solutions:
  – Relying on the application programmer: Gao et al. [2003].
  – GlobeDB: Sivasubramanian et al. [2005].
    ∗ Record-level replication granularity.
    ∗ Provides excellent query latency.
    ∗ A central server processes all updates and then sends them in batches to the other servers.
    ∗ Does not improve throughput, because the central server becomes a bottleneck.
GlobeTP: Template-Based Solution
- Web applications typically issue queries that belong to a small number of query templates.
- Query template: a parameterized SQL query whose parameter values are supplied at run time.
- Knowing these templates, the system selects table placements that ensure maximum throughput and reasonable latency.
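The sketch below illustrates what a query template is. It maps concrete SQL queries to templates by replacing literal values with `?` placeholders. This is a naive, regex-based normalizer assumed for illustration only, since GlobeTP itself expects the programmer to declare the templates. The table and column names are hypothetical.

```python
import re
from collections import Counter

# Single-quoted string literals (allowing '' escapes) and unsigned numeric literals.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMERIC_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def to_template(sql: str) -> str:
    """Replace literal values in a SQL query with '?' placeholders."""
    template = _STRING_LITERAL.sub("?", sql)        # strings first, so digits inside them vanish
    template = _NUMERIC_LITERAL.sub("?", template)  # then bare numbers
    return " ".join(template.split())               # normalize whitespace


queries = [
    "SELECT i_title FROM item WHERE i_id = 42",
    "SELECT i_title FROM item WHERE i_id = 7",
    "SELECT * FROM customer WHERE c_uname = 'alice'",
    "SELECT * FROM customer WHERE c_uname = 'o''brien'",
]
templates = Counter(to_template(q) for q in queries)
for template, count in templates.items():
    print(count, template)
```

The four queries collapse into two templates. The template-based approach relies on exactly this property.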
Models
- Application Model:
– The application programmer is required to specify the application's query templates explicitly.
- System Model:
Main problems to consider
- 1. Cluster identification: ensure that the table placement leaves at least one server able to execute each query template.
- 2. Considering all the defined templates, read or UDI, determine the placement that provides the maximum throughput.
- 3. Define a load-balancing algorithm that distributes read queries efficiently.
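Problem 1 can be stated as a simple check over a candidate placement: for every template, some single server must host all the tables that template accesses. Below is a minimal sketch. It assumes that placements and templates are given as plain dictionaries of table-name sets, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def uncovered_templates(
    placement: Dict[str, FrozenSet[str]],   # server -> tables it hosts
    templates: Dict[str, FrozenSet[str]],   # QID -> tables the template accesses
) -> List[str]:
    """Return the QIDs that no single server can execute on its own."""
    return sorted(
        qid
        for qid, tables in templates.items()
        if not any(tables <= hosted for hosted in placement.values())
    )


placement = {
    "db1": frozenset({"item", "author"}),
    "db2": frozenset({"customer", "orders"}),
}
templates = {
    "Q1": frozenset({"item", "author"}),
    "Q2": frozenset({"customer", "orders"}),
    "Q3": frozenset({"item", "orders"}),  # joins tables that live on different servers
}
print(uncovered_templates(placement, templates))  # ['Q3']
```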
Data Placement: Cluster Identification
- Goal: determine the sets of tables that must be replicated together so that every query template can be executed correctly, while minimizing the number of servers that must execute each UDI query.
- Characterize each query template:
- 1. Whether it is read or UDI
- 2. The set of tables that it accesses.
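One possible way to turn these two characteristics into table clusters is sketched below. The grouping rule is an assumption made for illustration and is not necessarily the paper's exact algorithm:
- The table set of each read template becomes a cluster.
- A cluster strictly contained in a larger one is dropped, since any server hosting the larger cluster can also run the smaller template.
- A table touched only by UDI templates gets a cluster of its own, so that it is placed somewhere.

The `Template` type and all table names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Template:
    qid: str
    is_udi: bool            # True for UPDATE/DELETE/INSERT, False for reads
    tables: FrozenSet[str]  # tables the template accesses


def table_clusters(templates: List[Template]) -> List[FrozenSet[str]]:
    """Group tables into clusters that must be replicated together."""
    read_sets = {t.tables for t in templates if not t.is_udi}
    # Keep only maximal read table sets.
    clusters = [c for c in read_sets if not any(c < other for other in read_sets)]
    covered = set().union(*clusters)
    # Tables that are written but never read still need a home.
    for t in templates:
        if t.is_udi:
            for table in sorted(t.tables - covered):
                clusters.append(frozenset({table}))
                covered.add(table)
    return sorted(clusters, key=sorted)


templates = [
    Template("Q1", False, frozenset({"item", "author"})),
    Template("Q2", False, frozenset({"item"})),
    Template("Q3", False, frozenset({"customer", "orders"})),
    Template("Q4", True, frozenset({"orders"})),
    Template("Q5", True, frozenset({"shopping_cart"})),
]
for cluster in table_clusters(templates):
    print(sorted(cluster))
# prints ['author', 'item'], then ['customer', 'orders'], then ['shopping_cart']
```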
Data Placement: Load Analysis
- Determines the load received by each cluster.
- Determines the load on table clusters, based on:
  – Whether the template is a read or a UDI query.
  – How frequently the template occurs.
  – The computational cost of executing the query, obtained by either:
    ∗ using the DB system's tools to estimate the actual execution time, or
    ∗ running the query in a live system.
- Determines the load on each DB server (from both read and UDI queries).
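The load analysis can be sketched as follows, under these assumptions made for illustration:
- A template's load is its frequency (queries/s) times its estimated execution time (ms).
- Read load is split evenly among the servers that host all of the template's tables.
- A UDI template loads every server that hosts any table it modifies.

In these units, a load of 1000 ms/s corresponds to one fully busy server. Names and numbers are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# (QID, is_udi, tables accessed, frequency in queries/s, estimated cost in ms)
TemplateLoad = Tuple[str, bool, FrozenSet[str], float, float]


def server_loads(
    templates: List[TemplateLoad],
    placement: Dict[str, FrozenSet[str]],  # server -> tables it hosts
) -> Dict[str, float]:
    """Estimated work per server, in ms of execution time per second."""
    loads = {server: 0.0 for server in placement}
    for qid, is_udi, tables, freq, cost_ms in templates:
        demand = freq * cost_ms
        if is_udi:
            # Every replica of a modified table must apply the update.
            targets = [s for s, hosted in placement.items() if tables & hosted]
            share = demand
        else:
            # A read can go to any server hosting all of its tables.
            targets = [s for s, hosted in placement.items() if tables <= hosted]
            share = demand / len(targets) if targets else 0.0
        if not targets:
            raise ValueError(f"no server can execute template {qid}")
        for s in targets:
            loads[s] += share
    return loads


placement = {
    "db1": frozenset({"item", "author"}),
    "db2": frozenset({"item", "author", "customer", "orders"}),
}
templates = [
    ("Q1", False, frozenset({"item", "author"}), 100.0, 2.0),
    ("Q3", False, frozenset({"customer", "orders"}), 50.0, 4.0),
    ("Q4", True, frozenset({"orders"}), 10.0, 5.0),
    ("Q6", True, frozenset({"item"}), 20.0, 1.0),
]
print(server_loads(templates, placement))  # {'db1': 120.0, 'db2': 370.0}
```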
Data Placement: Cluster Placement
- Determines the placement of the clusters across the set of DB servers such that the load on the most heavily loaded replica is minimized.
- Uses exhaustive search, with complexity $O(2^{N \cdot T}/N!)$, where $T$ is the number of tables and $N$ the number of nodes.
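A brute-force version of this search is sketched below for tiny inputs. It enumerates all $2^{N \cdot T}$ ways of hosting or not hosting each table on each server. It then keeps the placement whose busiest server has the lowest load. Unlike the bound above, it does not discard the $N!$ placements that differ only by a permutation of servers.

The load model is an assumption for illustration: read load is split evenly among eligible servers, and UDI load is charged to every server hosting a modified table. All names and numbers are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Optional, Tuple

# (QID, is_udi, tables accessed, demand in ms of work per second)
Template = Tuple[str, bool, FrozenSet[str], float]
Placement = List[FrozenSet[str]]  # index = server, value = tables it hosts


def max_server_load(templates: List[Template], placement: Placement) -> Optional[float]:
    """Load of the busiest server, or None if some template cannot be executed."""
    loads = [0.0] * len(placement)
    for _qid, is_udi, tables, demand in templates:
        if is_udi:
            targets = [i for i, hosted in enumerate(placement) if tables & hosted]
        else:
            targets = [i for i, hosted in enumerate(placement) if tables <= hosted]
        if not targets:
            return None
        share = demand if is_udi else demand / len(targets)
        for i in targets:
            loads[i] += share
    return max(loads)


def best_placement(
    tables: List[str], n_servers: int, templates: List[Template]
) -> Tuple[Optional[Placement], float]:
    """Exhaustively search all 2**(N*T) table-to-server assignments."""
    best: Optional[Placement] = None
    best_load = float("inf")
    t = len(tables)
    for bits in product((False, True), repeat=n_servers * t):
        placement = [
            frozenset(name for j, name in enumerate(tables) if bits[i * t + j])
            for i in range(n_servers)
        ]
        load = max_server_load(templates, placement)
        if load is not None and load < best_load:
            best, best_load = placement, load
    return best, best_load


templates = [
    ("R1", False, frozenset({"item"}), 300.0),
    ("R2", False, frozenset({"customer"}), 300.0),
    ("U1", True, frozenset({"item"}), 100.0),
    ("U2", True, frozenset({"customer"}), 100.0),
]
placement, load = best_placement(["item", "customer"], 2, templates)
print(load)  # 400.0
```

In this toy workload, putting one table on each server (peak load 400) beats full replication. Under full replication, both servers carry 500, because each must apply every update.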
Query Routing
- Round Robin (RR): efficient if all incoming queries have the same cost.
- RR-QID: round robin by query ID.
  – Each query template is identified by its QID.
  – Each queue is associated with the set of DB servers that can serve a given QID.
  – Round robin is applied within each queue.
- Cost-based routing:
  – Upon arrival of an incoming query, the query router estimates the current load on each DB server.
  – The query is scheduled to the least loaded DB server that can serve it.
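The two template-aware policies can be sketched as follows. The mapping from QIDs to eligible servers is assumed to come from the placement step. The cost-based router estimates a server's load as the sum of the estimated execution times of queries it has dispatched there and not yet seen complete. That estimate is an assumption for illustration, because the slides do not specify how the router estimates load. Server and template names are hypothetical.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, List


class RoundRobinByQid:
    """RR-QID: one round-robin queue per query template."""

    def __init__(self, servers_for_qid: Dict[str, List[str]]):
        self._queues = {qid: cycle(servers) for qid, servers in servers_for_qid.items()}

    def route(self, qid: str) -> str:
        return next(self._queues[qid])


class CostBasedRouter:
    """Send each query to the least loaded server that can serve its template."""

    def __init__(self, servers_for_qid: Dict[str, List[str]], cost_ms: Dict[str, float]):
        self._servers_for_qid = servers_for_qid
        self._cost_ms = cost_ms                               # estimated cost per template
        self._pending: Dict[str, float] = defaultdict(float)  # outstanding work per server

    def route(self, qid: str) -> str:
        target = min(self._servers_for_qid[qid], key=lambda s: self._pending[s])
        self._pending[target] += self._cost_ms[qid]
        return target

    def completed(self, server: str, qid: str) -> None:
        """Called when `server` finishes a query of template `qid`."""
        self._pending[server] -= self._cost_ms[qid]


servers_for_qid = {"Q1": ["db1", "db2"], "Q3": ["db2"]}
rr = RoundRobinByQid(servers_for_qid)
print([rr.route("Q1") for _ in range(3)])  # ['db1', 'db2', 'db1']

router = CostBasedRouter(servers_for_qid, {"Q1": 2.0, "Q3": 10.0})
print(router.route("Q3"), router.route("Q1"), router.route("Q1"))  # db2 db1 db1
```

Plain RR would send the cheap Q1 queries to db2 half the time, even while db2 is busy with an expensive Q3. The cost-based router avoids this by steering them to db1.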
Experiments
- Compare GlobeTP with full DB replication using:
  – TPC-W: a standard e-commerce benchmark.
  – RUBBoS: a bulletin-board benchmark modeled after slashdot.org.
Experiments (cont.)
- Query latency distributions using 4 servers.
Experiments (cont.)
- Maximum achievable throughputs with 90% of queries processed within 100 ms.
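This throughput metric can be made precise with a small helper. A load level counts only if at least 90% of its queries complete within 100 ms. The reported throughput is the highest load level that passes. The latency samples below are made up for illustration and are not measurements from the paper.

```python
from typing import Dict, List, Optional


def meets_sla(latencies_ms: List[float], pct: float = 90.0, bound_ms: float = 100.0) -> bool:
    """True if at least `pct` percent of the queries finished within `bound_ms`."""
    if not latencies_ms:
        return False
    within = sum(1 for x in latencies_ms if x <= bound_ms)
    # Equivalent to within / n >= pct / 100, without a division.
    return within * 100.0 >= pct * len(latencies_ms)


def max_throughput_under_sla(
    runs: Dict[float, List[float]], pct: float = 90.0, bound_ms: float = 100.0
) -> Optional[float]:
    """Highest offered load (queries/s) whose latency samples satisfy the SLA."""
    passing = [rate for rate, lat in runs.items() if meets_sla(lat, pct, bound_ms)]
    return max(passing) if passing else None


# Hypothetical latency samples (ms) for two offered loads.
runs = {
    50.0: [20, 35, 40, 60, 80, 90, 95, 99, 30, 150],     # 9 of 10 within 100 ms
    100.0: [40, 70, 90, 120, 130, 95, 85, 60, 110, 50],  # 7 of 10 within 100 ms
}
print(max_throughput_under_sla(runs))  # 50.0
```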
Advantages
- Can easily be coupled with a distributed DB query cache.
- Does not require any modification to the application itself.
Disadvantages
- Does not support transactions. However, transaction support can be implemented in the query router.
- Limitations due to the table-level granularity of partial replication.
- Fault-tolerance issues.
- Does not take into account the long-term load variations that must be expected when operating a popular dynamic web site.
References
Lei Gao, Mike Dahlin, Amol Nayate, Jiandan Zheng, and Arun Iyengar. Application specific data replication for edge services. In WWW ’03: Proceedings of the 12th International Conference on World Wide Web, pages 449–460, Budapest, Hungary, 2003. ISBN 1-58113-680-3.

Swaminathan Sivasubramanian, Gustavo Alonso, Guillaume Pierre, and Maarten van Steen. GlobeDB: autonomic data replication for web applications. In WWW ’05: Proceedings of the 14th International Conference on World Wide Web, pages 33–42, Chiba, Japan, 2005. ISBN 1-59593-046-9.