SLIDE 1

GlobeTP: Template-Based Database Replication for Scalable Web Applications

Tobias Groothuyse, Swaminathan Sivasubramanian, and Guillaume Pierre. In Proceedings of WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.

Dina Adel Said

dsaid@vt.edu

SLIDE 2

Problem Definition

  • How can we provide a scalable infrastructure for hosting dynamically generated web content?
  • Past solutions:
  • 1. Cache generated pages.
  • 2. Distribute the computation across multiple application servers.
  • 3. Cache the results of DB queries.
  • Problem: the bottleneck resides in the throughput of the origin DB.

SLIDE 3

Problem Definition (cont.)

  • Solution: use DB replication.
  • Problem: throughput does not scale linearly, because every update, delete, and insert (UDI) query must be executed on each DB replica.
  • Past solutions:
  • 1. Increase the throughput of each individual server.
  • 2. Partial replication.
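A back-of-the-envelope model makes the scaling problem concrete. The sketch below uses a simplifying cost model of our own (an assumption, not the paper's): every query costs the same, each server processes `capacity` queries per second, reads are balanced evenly across the N replicas, and every UDI query runs on all of them.

```python
def full_replication_throughput(n_servers: int, capacity: float, udi_fraction: float) -> float:
    """Maximum total query rate (queries/s) under full replication.

    Simplified model (assumption): all queries cost the same, reads are
    balanced evenly across replicas, and every UDI query is executed on
    every replica.
    """
    if n_servers < 1 or not 0.0 <= udi_fraction <= 1.0:
        raise ValueError("need n_servers >= 1 and 0 <= udi_fraction <= 1")
    # Work each server performs per incoming query: its share of the reads
    # plus the full UDI fraction, which no replica can avoid.
    per_server_work = (1.0 - udi_fraction) / n_servers + udi_fraction
    return capacity / per_server_work


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, round(full_replication_throughput(n, capacity=100.0, udi_fraction=0.2), 1))
```

Under this model, with 20% UDI queries, going from 4 to 16 servers raises throughput only from about 250 to about 400 queries/s, and no number of replicas can exceed capacity / 0.2, i.e. five times a single server.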
SLIDE 4

Partial Replication

  • Past solutions:

– Relying on the application programmer: Gao et al. [2003].
– GlobeDB: Sivasubramanian et al. [2005].

∗ Record-level replication granularity.
∗ Provides excellent query latency.
∗ A central server receives all updates, then sends batched updates to the other servers.
∗ Does not improve throughput, because the central server becomes a bottleneck.

SLIDE 5

GlobeTP: Template-Based Solution

  • Web applications typically issue queries that belong to a small number of query templates.
  • Query template: a parameterized SQL query whose parameters are bound at run time.
  • Knowing these templates, table placements are selected to ensure maximum throughput and reasonable latency.
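To illustrate what a query template is, here is a minimal sketch using Python's built-in `sqlite3` module. The table `item`, its columns, and the helper names are hypothetical; the point is only that one fixed SQL string is instantiated with different run-time parameters.

```python
import sqlite3

# A query template: one parameterized SQL string. Only the values bound to
# the "?" placeholder change from one request to the next.
# The table and column names are hypothetical, for illustration only.
GET_ITEM = "SELECT title, price FROM item WHERE id = ?"


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database to run the template against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO item (id, title, price) VALUES (?, ?, ?)",
        [(1, "Book A", 10.0), (2, "Book B", 12.5)],
    )
    return conn


def run_template(conn: sqlite3.Connection, template: str, params: tuple) -> list:
    # The template text says what kind of query this is (and which tables
    # it touches); the parameters are supplied at run time.
    return conn.execute(template, params).fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print(run_template(db, GET_ITEM, (1,)))
    print(run_template(db, GET_ITEM, (2,)))
```

Both calls are instances of the same template, so the set of tables they access (here only `item`) is known before any parameter is supplied. That static knowledge is what makes template-based table placement possible.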

SLIDE 6

Models

  • Application Model:

– The application programmer is required to specify the application's query templates explicitly.

  • System Model:
SLIDE 7

Main problems to consider

  • 1. Cluster identification: ensure that the table placement leaves at least one server able to execute each query template.
  • 2. Considering all defined templates, read or UDI, determine the placement that provides the maximum throughput.
  • 3. Define a load-balancing algorithm that distributes read queries efficiently.

SLIDE 8

Data Placement: Cluster Identification

  • Goal: determine the sets of tables that must be replicated together so that every template can execute correctly, while minimizing the number of servers that must execute each UDI query.
  • Characterize each query template by:
  • 1. Whether it is a read or a UDI query.
  • 2. The set of tables it accesses.
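The characterization above can be sketched as a small data structure plus two derived views. This is a simplified sketch under our own assumptions, not the paper's exact algorithm: each template needs all of its tables on a single server, a table set contained in a larger one is redundant, and every table a UDI template accesses is treated as modified. Template IDs and table names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTemplate:
    qid: str            # template identifier (hypothetical naming)
    is_udi: bool        # True for update/delete/insert, False for read
    tables: frozenset   # names of the tables the template accesses


def identify_clusters(templates):
    """Return the table sets that must be co-located, as sorted tuples.

    Assumption: each template needs all of its tables on one server.
    A table set strictly contained in a larger one is dropped, because a
    server holding the larger set can also serve the smaller template.
    """
    table_sets = {t.tables for t in templates}
    maximal = [s for s in table_sets if not any(s < other for other in table_sets)]
    return sorted(tuple(sorted(s)) for s in maximal)


def udi_writers(templates):
    """Map each table to the QIDs of the UDI templates that touch it.

    Every server hosting that table must execute those UDI queries, so
    replicating a heavily updated table widely is expensive.
    Simplification: all tables a UDI template accesses count as modified.
    """
    writers = {}
    for t in templates:
        if t.is_udi:
            for table in t.tables:
                writers.setdefault(table, set()).add(t.qid)
    return writers
```

For example, read templates on `{item}` and `{item, author}` collapse into one cluster `(author, item)`, since any server holding both tables can serve either template.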
SLIDE 9

Data Placement: Load Analysis

  • Determines the load received by each cluster.
  • Determines the load on table clusters, based on:

– Whether the template is a read or a UDI query.
– How frequently the template occurs.
– The computational cost of executing the query, estimated either by:

∗ Using DB system tools to estimate the actual execution time, or
∗ Running the query on a live system.

  • Determines the load on DB servers (from both read and UDI queries).
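The load analysis can be sketched as "frequency × estimated cost", accumulated per cluster and split into read and UDI load. The attribution rules below are our assumptions for illustration: a read template is charged to the first cluster holding all of its tables, and a UDI template is charged in full to every cluster that contains one of its tables, because each copy of that table must apply the update. Frequencies and costs would come from logs and cost estimates; the numbers in the test are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateStats:
    qid: str
    is_udi: bool
    tables: frozenset
    freq: float      # occurrences per second (e.g. measured from an access log)
    cost_ms: float   # estimated execution time per query, in milliseconds


def cluster_loads(stats, clusters):
    """Return {cluster: (read_load, udi_load)}, in ms of work per second.

    Assumptions: load = frequency * cost; a read template is charged to the
    first cluster holding all of its tables; a UDI template is charged in
    full to every cluster containing a table it accesses.
    """
    loads = {c: [0.0, 0.0] for c in clusters}
    for s in stats:
        load = s.freq * s.cost_ms
        if s.is_udi:
            for c in clusters:
                if c & s.tables:          # cluster shares a table with the UDI
                    loads[c][1] += load
        else:
            home = next((c for c in clusters if s.tables <= c), None)
            if home is None:
                raise ValueError(f"no cluster can serve read template {s.qid}")
            loads[home][0] += load
    return {c: tuple(v) for c, v in loads.items()}
```

A load of 400 ms of work per second means the cluster would keep a single server about 40% busy, which is the quantity the placement step then tries to spread across servers.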

SLIDE 10

Data Placement: Cluster Placement

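  • Determines the placement of the clusters across the set of DB servers such that the load imposed on each replica is minimized.
  • An exhaustive search costs $O(2^{N \cdot T}/N!)$, where $T$ is the number of tables and $N$ the number of nodes: each table may be replicated on any subset of the $N$ servers ($2^N$ choices per table), and placements that differ only by a permutation of the interchangeable servers are equivalent.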
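A brute-force search over placements can be sketched as follows. This works at cluster rather than table granularity (so it tries $(2^N - 1)^C$ placements for $C$ clusters) and uses an assumed cost model: a cluster's read load is split evenly over its replicas, its UDI load is paid in full by every replica, and the objective is to minimize the most loaded server. It does not prune symmetric placements and is only feasible for tiny inputs.

```python
from itertools import combinations, product


def nonempty_server_subsets(n_servers):
    """All non-empty subsets of servers 0..n_servers-1, smallest first."""
    return [frozenset(c)
            for k in range(1, n_servers + 1)
            for c in combinations(range(n_servers), k)]


def best_placement(loads, n_servers):
    """Exhaustively place clusters on servers, minimizing the peak server load.

    loads: list of (read_load, udi_load) pairs, one per cluster.
    Returns (peak_load, placement), where placement[i] is the frozenset of
    servers hosting a replica of cluster i.
    Assumed model: reads are split evenly over a cluster's replicas, while
    its UDI load is paid in full by every replica.
    """
    subsets = nonempty_server_subsets(n_servers)
    best = None
    for placement in product(subsets, repeat=len(loads)):
        server_load = [0.0] * n_servers
        for (read, udi), servers in zip(loads, placement):
            for s in servers:
                server_load[s] += read / len(servers) + udi
        peak = max(server_load)
        if best is None or peak < best[0]:   # keep the first optimum found
            best = (peak, placement)
    return best
```

With per-cluster loads `[(400, 0), (100, 300)]` on two servers, fully replicating both clusters gives a peak of 550, whereas placing each cluster on its own server gives 400: the write-heavy second cluster is better kept on a single replica.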

SLIDE 11

Query Routing

  • Round Robin (RR): efficient if all incoming queries have the same cost.
  • RR-QID: round robin by query ID.

– Each query template is identified by its QID.
– Each queue is associated with the set of DB servers that can serve a given QID.
– Queries in each queue are dispatched in RR fashion.

  • Cost-based routing:

– When a query arrives, the query router estimates the current load on each DB server.
– The query is scheduled on the least-loaded DB server that can serve it.
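Both read-routing policies can be sketched in a few lines. Server names, QIDs, and per-template costs are hypothetical, and the cost-based router's notion of "current load" (sum of estimated costs of queries dispatched but not yet completed) is our assumption; the slides do not specify how the load is estimated. Only read queries are modeled here, since UDI queries must go to every server hosting the affected tables.

```python
from itertools import cycle


class RRQidRouter:
    """RR-QID: one round-robin queue per query template (QID), cycling over
    the servers that can serve that template."""

    def __init__(self, servers_by_qid):
        # sorted() makes the rotation order deterministic
        self._cycles = {qid: cycle(sorted(servers))
                        for qid, servers in servers_by_qid.items()}

    def route(self, qid):
        return next(self._cycles[qid])


class CostBasedRouter:
    """Cost-based routing: send each query to the least-loaded server that
    can serve it. Assumption: a server's load is the sum of the estimated
    costs of the queries sent to it but not yet completed."""

    def __init__(self, servers_by_qid, cost_by_qid):
        self._servers = servers_by_qid
        self._cost = cost_by_qid
        self.load = {s: 0.0 for servers in servers_by_qid.values() for s in servers}

    def route(self, qid):
        # Ties are broken by server name so the choice is deterministic.
        target = min(sorted(self._servers[qid]), key=lambda s: self.load[s])
        self.load[target] += self._cost[qid]
        return target

    def complete(self, server, qid):
        # Called when the server reports the query as finished.
        self.load[server] -= self._cost[qid]
```

RR-QID only needs the placement, whereas cost-based routing also needs per-template cost estimates and completion notifications; in exchange, it stops sending cheap queries to a server that is busy with an expensive one.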

SLIDE 12

Experiments

  • Compare GlobeTP with full DB replication using:

– TPC-W: a standard e-commerce benchmark.
– RUBBoS: a bulletin-board benchmark modeled after slashdot.org.

SLIDE 13

Experiments (cont.)

  • Query latency distributions using 4 servers.

SLIDE 14

Experiments (cont.)

  • Maximum achievable throughputs with 90% of queries processed within 100 ms.
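The throughput criterion above can be expressed as a small check over measured latencies. This is a hypothetical sketch: the input format (one list of latencies per offered load level) is our assumption, and picking the highest passing level presumes that latency grows with load.

```python
def meets_sla(latencies_ms, bound_ms=100.0, fraction=0.9):
    """True if at least `fraction` of the queries finished within `bound_ms`."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    within = sum(1 for x in latencies_ms if x <= bound_ms)
    return within / len(latencies_ms) >= fraction


def max_achievable_throughput(runs):
    """runs: {offered_load_qps: [latency_ms, ...]} from (hypothetical) load tests.

    Returns the highest load level whose run met the SLA, or None.
    """
    passing = [load for load, lats in runs.items() if meets_sla(lats)]
    return max(passing) if passing else None
```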

SLIDE 15

Advantages

  • Can easily be coupled with a distributed DB query cache.
  • Does not require any modification of the application itself.

SLIDE 16

Disadvantages

  • Does not support transactions, although transaction support could be implemented in the query router.
  • Limitations due to partial replication at table granularity.
  • Fault-tolerance issues.
  • Does not take into account the long-term load variations that must be expected when operating a popular dynamic web site.

SLIDE 17

References

Lei Gao, Mike Dahlin, Amol Nayate, Jiandan Zheng, and Arun Iyengar. Application specific data replication for edge services. In WWW '03: Proceedings of the 12th International Conference on World Wide Web, pages 449–460, Budapest, Hungary, 2003. ISBN 1-58113-680-3.

Swaminathan Sivasubramanian, Gustavo Alonso, Guillaume Pierre, and Maarten van Steen. GlobeDB: autonomic data replication for web applications. In WWW '05: Proceedings of the 14th International Conference on World Wide Web, pages 33–42, Chiba, Japan, 2005. ISBN 1-59593-046-9.

SLIDE 18

Thank you

dsaid@vt.edu