
SLIDE 1

YCSB

Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears

SLIDE 2

Brian’s guide to writing a widely cited paper

1. Work in a new-ish, hot area
2. Discover that there is no good way to compare different systems
3. Come up with something halfway reasonable
4. Make it super easy for people to use

SLIDE 3

Funny story #1

So Raghu was invited to give a keynote at VLDB...

SLIDE 4

Funny story #1

So Raghu was invited to give a keynote at VLDB...

“Hey Brian, can I see the data that shows if our system is fastest?”
“Umm...”

SLIDE 5

The (database) world at that time

NoSQL systems

  • Google BigTable
  • HBase
  • Cassandra
  • ...

Cloud systems

  • PNUTS/Sherpa
  • Amazon Dynamo
  • ...

Lots of options, not a lot of experience yet

SLIDE 6

The (Yahoo!) world at that time

Yahoo had existing storage systems that were scalable but inconsistent. We were building PNUTS/Sherpa, but other parts of Yahoo were considering HBase and Cassandra. We were mere researchers, with no ability to force anybody to use our system. So we turned to science!

SLIDE 7

Funny story #2

How do you scale up a Yahoo user database?

SLIDE 8

How do we figure out if our system is faster?

Traditional answer: TPC-something

  • But these were “NoSQL” systems!
  • Also, the workloads were different

New answer: Write a blog post

  • But hard to compare one blog post to another
SLIDE 9

What even is the question?

Fast at what?

  • Reads? Writes?
  • Large scans? Point operations?
  • Throughput/latency? Scalability? Elasticity?
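“Fast at what?” is exactly the question YCSB’s stats collection answers per workload. As a minimal sketch of that kind of measurement loop (a Python illustration of the idea, not YCSB’s actual code; `run_workload` and its result keys are hypothetical names):

```python
import time

def run_workload(op, num_ops):
    """Run `op` num_ops times, recording per-operation latency and overall throughput."""
    latencies = []
    start = time.perf_counter()
    for _ in range(num_ops):
        t0 = time.perf_counter()
        op()  # one read/write against the system under test
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "throughput_ops_sec": num_ops / elapsed,
        "avg_latency_ms": 1000 * sum(latencies) / num_ops,
        "p95_latency_ms": 1000 * latencies[int(0.95 * (num_ops - 1))],
    }
```

Plotting throughput against latency as the offered load rises is what produces the benchmark’s characteristic curves.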
SLIDE 10

Our answer

We wanted to:

  • Define some workloads approximating what a web serving system would need
  • Put the same workloads on multiple systems
  • Draw some pretty graphs

The result:

  • Yahoo! Cloud Serving Benchmark (YCSB)
SLIDE 11

Benchmark tool

Workload parameter file

  • R/W mix
  • Record size
  • Data set

Command-line parameters

  • DB to use
  • Target throughput
  • Number of threads

YCSB client

  • Client threads, driven by the workload executor
  • DB client layer that talks to the system under test
  • Stats module that records measurements

Cloud DB
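For concreteness, the workload parameter file is a plain Java properties file. A sketch of what the shipped `workloada` looks like (the property names are YCSB’s; treat the values as illustrative, and note the workload class’s package name differs across YCSB versions):

```properties
# workloads/workloada: update heavy, 50/50 read/update
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readproportion=0.5
updateproportion=0.5
requestdistribution=zipfian
```

And the command-line side, loading data then running the workload:

```shell
bin/ycsb load basic -P workloads/workloada
bin/ycsb run basic -P workloads/workloada -threads 8 -target 1000
```

Here `basic` is YCSB’s built-in debug binding that just echoes operations; substitute the binding for the database under test.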

SLIDE 12

Benchmark tool


  • Extensible: plug in new clients
  • Extensible: define new workloads
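The extensibility model can be sketched as an abstract per-system client interface plus a workload that draws operations from a configured mix. A Python sketch of the idea (YCSB’s real interface is a Java class; the names `DB`, `InMemoryDB`, and `run_ops` here are hypothetical):

```python
import random
from abc import ABC, abstractmethod

class DB(ABC):
    """Per-system client plugin: implement these methods to benchmark a new backend."""
    @abstractmethod
    def read(self, table, key): ...
    @abstractmethod
    def update(self, table, key, values): ...

class InMemoryDB(DB):
    """Trivial backend, standing in for a real cloud DB binding."""
    def __init__(self):
        self.data = {}
    def read(self, table, key):
        return self.data.get((table, key))
    def update(self, table, key, values):
        self.data[(table, key)] = values

def run_ops(db, mix, keys, num_ops, rng):
    """Issue num_ops operations, choosing read vs. update per the workload mix."""
    counts = {"read": 0, "update": 0}
    for _ in range(num_ops):
        key = rng.choice(keys)
        if rng.random() < mix["read"]:
            db.read("usertable", key)
            counts["read"] += 1
        else:
            db.update("usertable", key, {"field0": "x"})
            counts["update"] += 1
    return counts
```

A new backend only has to implement the `DB` methods; a new workload only has to supply a different mix and key-selection policy.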

SLIDE 13

Workloads

  • A - Update heavy (e.g., session store)
  • B - Read heavy (e.g., photo tagging)
  • C - Read only (e.g., serving user profiles)
  • D - Read latest (e.g., user status updates)
  • E - Short ranges (e.g., threaded conversations)
  • F - Read-modify-write (e.g., user metadata store)
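The core workloads above differ mainly in their operation mixes. From memory of the paper’s definitions (treat these proportions as approximate and check the shipped workload files for the authoritative values):

```python
# Operation mixes for the YCSB core workloads (approximate; see the
# workload property files in the YCSB repo for exact values).
CORE_WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},             # update heavy
    "B": {"read": 0.95, "update": 0.05},             # read heavy
    "C": {"read": 1.00},                             # read only
    "D": {"read": 0.95, "insert": 0.05},             # read latest
    "E": {"scan": 0.95, "insert": 0.05},             # short ranges
    "F": {"read": 0.50, "read_modify_write": 0.50},  # read-modify-write
}
```

Workload D also differs in key selection (it skews reads toward recently inserted records), not just in its mix.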

SLIDE 14

Sample results: Workload A (write heavy)

SLIDE 15

Sample results: Workload B (read heavy)

SLIDE 16

Lessons learned

Tools can be as valuable as new techniques or systems

SLIDE 17

Funny story #3

We wrote a paper, but it’s the tool that had the impact:
https://github.com/brianfrankcooper/YCSB

SLIDE 18

The key to our success

Make it open source, easily extensible, and super easy to get results

SLIDE 19

Lessons learned

Not everybody is motivated by scientific inquiry

SLIDE 20

Lessons learned

Not everybody is motivated by scientific inquiry

  • Or -

Researchers don’t necessarily understand industry

SLIDE 21

Funny story #4

So I wrote this email to the HBase developer list...

Hi everybody! My name is Brian and I’m new here and I thought you’d like to know we benchmarked your system and it’s pretty slow.

SLIDE 22

Funny story #4

The reaction was not positive

  • They had their own measurements that showed HBase was very fast
  • They thought we were a big corporation trying to ruin their open source project

For us, it was “just” a research project. For them, it was a fight for their project’s survival.

SLIDE 23

Luckily...

We all got in a room and made nice and became friends

  • They helped us tune their system to get better results
  • They shipped some improvements to make their system faster
  • We helped pick apart the distinction between scan and point workloads

SLIDE 24

Since that day...

Support for ~50 different backends. Widely used as a research experiment framework and as a commercial system benchmark. Managed by a great team of maintainers:

  • Sean Busbey, Andy Kruth, Eugene Blikh, Connor McCoy, Allan Bank, Chris Larsen, Chrisjan Matser, Govind Kamat, Kevin Risden, Jason Tedor, Stanley Feng

SLIDE 25

Funny story #5

All of the authors have worked at Google... except Raghu. (Someday we’ll get him.)

SLIDE 26

Conclusion

A little science was needed at that time. We made it easy to measure cloud (serving storage) systems. We are thankful for:

  • Yahoo engineers who helped us run benchmarks
  • Open source maintainers who have kept the tool going strong
  • All the users!
SLIDE 27

Thanks!