YCSB Brian F. Cooper , Adam Silberstein, Erwin Tam, Raghu - PowerPoint PPT Presentation

YCSB
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears

Brian’s guide to writing a widely cited paper
1. Work in a new-ish, hot area
2. Discover that there is no good way to compare different systems
3. Come up with something halfway reasonable
4. Make it super easy for people to use

Funny story #1
So Raghu was invited to give a keynote at VLDB...
“Hey Brian, can I see the data that shows if our system is fastest?”
“Umm...”

The (database) world at that time
NoSQL systems
● Google BigTable
● HBase
● Cassandra
● ...
Cloud systems
● PNUTS/Sherpa
● Amazon Dynamo
● ...
Lots of options
Not a lot of experience yet

The (Yahoo!) world at that time
Existing scalable but inconsistent storage systems
We were building PNUTS/Sherpa, but other parts of Yahoo were considering HBase and Cassandra
We were mere researchers, with no ability to force anybody to use our system
So we turned to science!

Funny story #2
How do you scale up a Yahoo user database?

How do we figure out if our system is faster?
Traditional answer: TPC-something
● But these were “NoSQL” systems!
● Also, the workloads were different
New answer: Write a blog post
● But hard to compare one blog post to another

What even is the question?
Fast at what?
● Reads? Writes?
● Large scans? Point operations?
● Throughput/latency? Scalability? Elasticity?
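To make "throughput/latency" concrete, here is a minimal sketch of how one run could be summarized. It is an illustration only, not YCSB's own reporting code: the function name `summarize`, its inputs, and the choice of nearest-rank percentiles are all assumptions.

```python
def summarize(latencies_ms, elapsed_s):
    """Summarize one benchmark run (hypothetical helper, not part of YCSB).

    latencies_ms: per-operation latencies in milliseconds.
    elapsed_s: wall-clock duration of the whole run in seconds.
    """
    if not latencies_ms or elapsed_s <= 0:
        raise ValueError("need at least one operation and a positive duration")
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile using integer arithmetic: rank = ceil(p * n / 100).
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {
        "throughput_ops_per_s": n / elapsed_s,
        "mean_ms": sum(ordered) / n,
        "p95_ms": percentile(95),
        "p99_ms": percentile(99),
    }
```

Reporting tail percentiles next to the mean matters because two systems with the same average latency can behave very differently under load.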

Our answer
We wanted to:
● Define some workloads approximating what a web serving system would need
● Put the same workloads on multiple systems
● Draw some pretty graphs
The result:
● Yahoo! Cloud Serving Benchmark (YCSB)

Benchmark tool
Command-line parameters
• DB to use
• Target throughput
• Number of threads
• …
Workload parameter file
• R/W mix
• Record size
• Data set
• …
[Diagram: the YCSB client contains a workload executor, client threads, a DB client, and stats; the DB client talks to the cloud DB]
Extensible: define new workloads
Extensible: plug in new clients
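The pieces on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions, not YCSB's implementation: `InMemoryDB`, `run_benchmark`, and the workload dictionary keys are hypothetical names, the backend is a plain dictionary, and there is no load phase, so reads may return nothing.

```python
import random
import threading
import time


class InMemoryDB:
    """Stand-in backend (hypothetical); a real binding would wrap a database client."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def read(self, key):
        with self._lock:
            return self._data.get(key)

    def update(self, key, value):
        with self._lock:
            self._data[key] = value


def run_benchmark(db, workload, threads=2, target_ops_per_s=0, seed=0):
    """Run the workload and return (per-operation latencies in seconds, total seconds)."""
    latencies = []
    stats_lock = threading.Lock()
    ops_per_thread = workload["operation_count"] // threads
    # Each thread gets an equal share of the target rate; 0 means unthrottled.
    interval = threads / target_ops_per_s if target_ops_per_s else 0.0

    def worker(thread_id):
        rng = random.Random(seed + thread_id)
        value = "x" * workload["record_size"]
        for _ in range(ops_per_thread):
            key = f"user{rng.randrange(workload['record_count'])}"
            start = time.perf_counter()
            if rng.random() < workload["read_proportion"]:
                db.read(key)
            else:
                db.update(key, value)
            elapsed = time.perf_counter() - start
            with stats_lock:
                latencies.append(elapsed)
            if interval > elapsed:
                time.sleep(interval - elapsed)  # crude throttle toward the target rate

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    started = time.perf_counter()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return latencies, time.perf_counter() - started
```

Any object with `read` and `update` methods can be passed as `db`, which mirrors the "plug in new clients" idea; changing the workload dictionary mirrors "define new workloads".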

Workloads
● A - Update heavy
  ○ Session store
● B - Read heavy
  ○ Photo tagging
● C - Read only
  ○ Serving user profiles
● D - Read latest
  ○ User status updates
● E - Short ranges
  ○ Threaded conversations
● F - Read-modify-write
  ○ User metadata store
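One way to picture these workloads is as operation mixes from which a client draws its next operation. In the sketch below the letters come from the slide, but the proportions are not on the slide: they are assumptions based on the commonly cited core workload definitions and should be checked against the workload files in the YCSB repository. The names `WORKLOADS` and `next_operation` are hypothetical.

```python
import random

# Operation mix per workload. ASSUMPTION: proportions are not stated on the
# slide; verify them against the workload files shipped with YCSB.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},             # update heavy
    "B": {"read": 0.95, "update": 0.05},             # read heavy
    "C": {"read": 1.00},                             # read only
    "D": {"read": 0.95, "insert": 0.05},             # read latest
    "E": {"scan": 0.95, "insert": 0.05},             # short ranges
    "F": {"read": 0.50, "read_modify_write": 0.50},  # read-modify-write
}


def next_operation(workload, rng):
    """Pick the next operation name according to the workload's mix."""
    mix = WORKLOADS[workload]
    return rng.choices(list(mix), weights=list(mix.values()), k=1)[0]
```

A fixed seed, for example `random.Random(42)`, makes the operation sequence repeatable, so the same workload can be replayed against several systems.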

Sample results: Workload A (write heavy)

Sample results: Workload B (read heavy)

Lessons learned
Tools can be as valuable as new techniques or systems

Funny story #3
We wrote a paper, but it’s the tool that had the impact
https://github.com/brianfrankcooper/YCSB

The key to our success
Make it open source, easily extensible, and super easy to get results

Lessons learned
Not everybody is motivated by scientific inquiry
- Or -
Researchers don’t necessarily understand industry

Funny story #4
So I wrote this email to the HBase developer list...
“Hi everybody! My name is Brian and I’m new here and I thought you’d like to know we benchmarked your system and it’s pretty slow.”

Funny story #4
The reaction was not positive
● They had their own measurements that showed HBase was very fast
● They thought we were a big corporation trying to ruin their open source project
For us, it was “just” a research project. For them, it was a fight for their project’s survival

Luckily...
We all got in a room and made nice and became friends
● They helped us tune their system to get better results
● They shipped some improvements to make their system faster
● We helped pick apart the distinction between scan and point workloads

Since that day...
Support for ~50 different backends
Widely used as a research experiment framework and as a commercial system benchmark
Managed by a great team of maintainers
● Sean Busbey, Andy Kruth, Eugene Blikh, Connor McCoy, Allan Bank, Chris Larsen, Chrisjan Matser, Govind Kamat, Kevin Risden, Jason Tedor, Stanley Feng

Funny story #5
All of the authors have worked at Google…
… except Raghu
(Someday we’ll get him.)

Conclusion
A little science was needed at that time
We made it easy to measure Cloud (serving storage) systems
We are thankful for:
● Yahoo engineers who helped us run benchmarks
● Open source maintainers who have kept the tool going strong
● All the users!

Thanks!
