How Fast Indexing Makes Databases Greener Martin Farach-Colton - - PowerPoint PPT Presentation



slide-1
SLIDE 1

How Fast Indexing Makes Databases Greener

Michael A. Bender, Stony Brook and Tokutek
Martin Farach-Colton, Rutgers and Tokutek
Bradley C. Kuszmaul, MIT and Tokutek

slide-3
SLIDE 3

  • Data centers used 1.5% of US electricity in 2006.
  • Servers: 50% data-center power
  • Storage systems: 27% data-center power

[Battles, Belleville, Grabau, Maurier ’07]

Obligatory reference to EPA study.

Databases are both storage and CPU intensive.

We believe big energy savings & performance gains are still on the table.

slide-4
SLIDE 4

Modern indexing structures overcome disk-seek bottlenecks of traditional structures

                 B-tree                           Fractal Tree® structure
Insert/delete    O(log_B N) = O(log N / log B)    O(log N / B^(1-ε))

  • If B=1024, then B/log B ≈ 100. → 100x speedup.

(Asymptotically same point-query cost.)

  • Other structures supporting fast inserts:

[O'Neil, Cheng, Gawlick, O'Neil 96] [Arge 03] [Graefe 03] [Buchsbaum, Goldwasser, Venkatasubramanian, Westbrook 00] [Brodal, Fagerberg 03] [Brodal, Demaine, Fineman, Iacono, Langerman, Munro 00]
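
The 100x figure can be checked by dividing the two insert bounds. The sketch below is only a back-of-the-envelope calculation: it drops all constant factors, uses base-2 logarithms, and assumes ε = 0 (the limit in which the ratio is exactly B/log B). The function names and the value of N are illustrative, not from the talk.

```python
import math


def btree_insert_ios(n, b):
    """Approximate I/Os per B-tree insert: log_B N = log N / log B."""
    return math.log2(n) / math.log2(b)


def fractal_insert_ios(n, b, eps=0.0):
    """Approximate amortized I/Os per insert: log N / B^(1 - eps).

    eps = 0 is an assumption made here to match the B/log B estimate.
    """
    return math.log2(n) / b ** (1 - eps)


def speedup(n, b, eps=0.0):
    """Ratio of the two bounds; log N cancels when eps = 0."""
    return btree_insert_ios(n, b) / fractal_insert_ios(n, b, eps)


B = 1024      # block size from the slide
N = 10 ** 9   # hypothetical number of keys; it cancels out for eps = 0
print(speedup(N, B))  # about 102.4, i.e. B / log2(B) = 1024 / 10
```

With a larger ε the insert speedup shrinks, which is the trade-off against query cost that the cited structures explore.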

slide-7
SLIDE 7

  • Ex. TokuDB® supports >20,000 index inserts/sec even on high-entropy workloads.
  • Effectively transform random I/O into sequential I/O.

[Figure: iiBench, 1B row insert test, InnoDB vs. TokuDB. x-axis: rows inserted (0 to 1,000,000,000); y-axis: rows/second (0 to 50,000).]
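
One way to see how random inserts can become sequential writes is to count block writes in a toy log-structured merge scheme. This is a simplified illustration and not a description of TokuDB's internals: level k holds a sorted run of 2^k keys, a new key is merged with every full level below the first empty one, and each merged run is written out sequentially. The baseline of one block write per insert is an assumption for uncached, high-entropy keys; reads during merges and query costs are ignored. All names are hypothetical.

```python
def merge_block_writes(n_inserts, keys_per_block):
    """Block writes for a simplified log-structured merge scheme.

    Level k holds a sorted run of 2**k keys and is either full or empty.
    Each insert merges the new key with all full levels below the first
    empty level and writes the merged run sequentially into that level.
    """
    full = []          # full[k] is True when level k currently holds a run
    keys_written = 0
    for _ in range(n_inserts):
        run = 1        # the newly inserted key
        k = 0
        while k < len(full) and full[k]:
            full[k] = False
            run += 2 ** k      # absorb the run stored at level k
            k += 1
        if k == len(full):
            full.append(False)
        full[k] = True
        keys_written += run    # sequential write of the merged run
    return keys_written / keys_per_block


def in_place_block_writes(n_inserts):
    """Assumed baseline: every random insert dirties a different block."""
    return n_inserts


n, b = 2 ** 16, 1024
print(in_place_block_writes(n) / merge_block_writes(n, b))
```

Each key is rewritten roughly (log N)/2 times, but every rewrite is part of a sequential run, so the cost per key is a fraction of a block rather than a full seek.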

slide-10
SLIDE 10

One reason why:

Fast insertions mean
➡ we can efficiently maintain sophisticated indexes,
➡ both insert- & query-dominated workloads can also be more energy-efficient.

Many users who think they have query bottlenecks actually have insertion bottlenecks.

Customer issues can be solved by fast inserts into sophisticated indexes.

customer hat

slide-11
SLIDE 11

Another reason why:

Promise of green algorithms: enable more power-efficient hardware.

Data centers are already designed around algorithmic specs because existing algorithms should run well on existing hardware.

Algorithms + Enabled Hardware = Big Win

slide-12
SLIDE 12

Another reason why:

Example: Data centers use many small-capacity disks rather than a few large-capacity disks

  • Why? One reason is to get more I/Os.
  • Fractal Tree indexes don’t need more spindles.

Power consumption of disks

  • Enterprise 80 to 160 GB disk runs at 4W (idle power).
  • Enterprise 1-2 TB disk runs at 8W (idle power).

Savings on the table: ~10x in storage

  • Other considerations modify this factor
  • e.g., CPUs necessary to drive disks, scale-out infrastructure, cooling, etc.

Algorithms + Enabled Hardware = Big Win
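
The storage figure can be sanity-checked from the idle-power numbers above. The sketch below compares the idle power needed to hold the same data on small versus large disks; the 2,000 GB data-set size, the decimal TB-to-GB conversion, and the function name are assumptions for illustration, and the other considerations listed on the slide are ignored.

```python
import math


def idle_watts(total_gb, disk_gb, watts_per_disk):
    """Idle power of the fewest whole disks that hold total_gb."""
    disks = math.ceil(total_gb / disk_gb)
    return disks * watts_per_disk


TOTAL_GB = 2000  # hypothetical data-set size (2 TB, decimal units)

# Most favorable pairing: 80 GB disks vs. one 2 TB disk.
best = idle_watts(TOTAL_GB, 80, 4) / idle_watts(TOTAL_GB, 2000, 8)

# Least favorable pairing: 160 GB disks vs. 1 TB disks.
worst = idle_watts(TOTAL_GB, 160, 4) / idle_watts(TOTAL_GB, 1000, 8)

print(best, worst)  # 12.5 3.25
```

Under these assumptions the ratio falls between roughly 3x and 12x, so the ~10x estimate sits toward the favorable end of the range.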

slide-14
SLIDE 14

Open Prob 1: Highly Concurrent & Multithreaded Indexing

Develop concurrent, multithreaded indexing data structures for slow, high-core-count machines

  • server CPU: ~100 W
  • laptop CPU: 5-10 W
      • 4x less capable, 10-20x less power hungry
      • 5x more energy efficient
  • mobile-phone CPU
      • another factor of 5 is on the table

Fractal Trees drive more CPUs than B-trees

  • CPU intensive. E.g., TokuDB is CPU bound
      • which means big savings are on the table

[Figure: iiBench, 1B row insert test, InnoDB vs. TokuDB. x-axis: rows inserted (0 to 1,000,000,000); y-axis: rows/second (0 to 50,000).]
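
The laptop-versus-server comparison in the bullets above reduces to a work-per-watt ratio. A minimal sketch, assuming "4x less capable" means one quarter of the throughput and "10-20x less power hungry" means one tenth to one twentieth of the power; the function name is hypothetical.

```python
def relative_efficiency(capability_ratio, power_ratio):
    """Work per watt of the slower CPU relative to the server CPU.

    capability_ratio: how many times less capable the slower CPU is.
    power_ratio: how many times less power it draws.
    """
    return power_ratio / capability_ratio


low = relative_efficiency(4, 10)    # 10x less power, 4x less capable
high = relative_efficiency(4, 20)   # 20x less power, 4x less capable
print(low, high)  # 2.5 5.0
```

The slide's 5x figure corresponds to the 20x end of the power range; at 10x the gain is 2.5x. Realizing either figure depends on the index keeping many slow cores busy, which is the open problem stated here.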

slide-16
SLIDE 16

Open Prob 2: Energy-Efficient SSD/Rotational Disk Hybrid

Design an SSD/rotational disk hybrid for a streaming-B-tree-based storage system.

  • Rotational devices are more efficient for sequential I/O.
  • SSDs are more efficient for random I/O.

Can a hybrid offer energy savings by using each device for the workload it is best suited for?

[Figure: insertion rate vs. cumulative insertions for InnoDB and TokuDB on RAID10, X25-E, and FusionIO storage.]

Fractal Trees deliver >10x speedups on SSDs vs B-trees

slide-18
SLIDE 18

Open Prob 3: The proof is in the pudding

Proof is in the

"Ten thousand? We were talking about a lot more money than this."
"Yes, sir, we were, but this is genuine coin of the realm. With a dollar of this, you can buy ten dollars of talk."

We require research in the classics: algorithms, parallelism, concurrency, data structures, storage systems, etc.