A Trillion Rows Per Second as a Foundation for Interactive Analytics
Eric Hanson, Principal Product Manager
April 18, 2018

Overview
§ MemSQL
§ Interactivity and user satisfaction
§ State-of-the-art query execution technology
§ Demo
§ Where can we go with this technology?

MemSQL Overview

What is MemSQL?
§ SQL DBMS
§ Fast: scale-out, compilation, in-memory, vectorized
§ In-memory rowstore
§ Disk-based columnstore
§ Transactions and analytics
§ Fantastic operational data store

Why MemSQL?
§ Low-latency queries
§ Fast data ingest
§ High concurrency

MemSQL scale-out architecture
[Diagram: client app → aggregator → four leaf nodes]

Challenges to lightning-fast response
§ Large data volume
§ Many concurrent users
§ Query complexity
§ Rapidly changing data

Response Time, Productivity, and User Satisfaction

"Stimulation is the indispensable requisite for pleasure in an experience, and the feeling of bare time is the least stimulating experience we can have."
William James (1842-1910), Principles of Psychology, Volume I (1890)

The need for speed
§ Users become used to fast response and expect it
§ Satisfaction increases as response time decreases
§ Delays over 50-150 msec are noticeable in real-time apps
§ ~250 msec is the median human reaction time
[Chart: response time (sec) for user ratings Snooze, Meh, Good, Wow!]

Subtleties about response time
§ High variance can bother users
  • Response times < ¼ of the mean or > 2X the mean
  • It can help to show a message when variance is high
§ Unexpectedly fast results can make users apprehensive
§ Fast response
  • Can lead to more "input errors"
  • Makes users interact and explore more → creates business value
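The variance rule of thumb above can be checked mechanically. This sketch flags individual response times that fall below a quarter of the mean or above twice the mean; the function name and the sample timings are hypothetical, not taken from the talk.

```python
from statistics import mean


def unusual_response_times(times_sec):
    """Return response times below 1/4 of the mean or above 2x the mean,
    the band the slide says users tend to notice."""
    m = mean(times_sec)
    return [t for t in times_sec if t < m / 4 or t > 2 * m]


# Hypothetical timings (seconds) for repeated runs of one query.
samples = [0.05, 0.06, 0.05, 0.30, 0.01, 0.05]
print(unusual_response_times(samples))
```

Here the mean is about 0.087 sec, so both the slow 0.30 sec run and the suspiciously fast 0.01 sec run are flagged.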

MemSQL Query Execution Technology

MemSQL technology to give lightning-fast response for analytics
§ Scale-out
§ Compiled queries
§ In-memory rowstore
§ Columnstore
§ Vectorization
§ Intel AVX2 SIMD

MemSQL scale-out
§ True horizontal scaling
  • Shared nothing
  • Not shared disk
§ Hash partitioning across leaf nodes
§ Can resize cluster and redistribute data
§ Can add aggregators or leaves
§ Scales both transactions and analytics
[Diagram: client app → aggregator → four leaf nodes]
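Hash partitioning can be sketched as hashing a shard key and taking the result modulo the number of leaves. This is a simplified illustration: CRC32, the `leaf_for_key`/`partition` names, and mapping keys directly to leaves (rather than to a fixed set of partitions spread over the leaves) are assumptions of the sketch, not MemSQL's actual scheme.

```python
import zlib

NUM_LEAVES = 4  # illustrative: the four leaves drawn in the diagram


def leaf_for_key(shard_key, num_leaves=NUM_LEAVES):
    """Map a shard key to a leaf with a deterministic hash (CRC32 is an assumption)."""
    return zlib.crc32(shard_key.encode("utf-8")) % num_leaves


def partition(rows, key_of, num_leaves=NUM_LEAVES):
    """Distribute rows across leaves by hashing each row's shard key."""
    leaves = [[] for _ in range(num_leaves)]
    for row in rows:
        leaves[leaf_for_key(key_of(row), num_leaves)].append(row)
    return leaves
```

Rows with the same key always land on the same leaf. Because the leaf index depends on `num_leaves`, changing the cluster size moves rows between leaves, which is why resizing involves redistributing data.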

MemSQL compiles queries
§ Queries compile to machine code
§ Example is rowstore
§ First run takes compile time
§ 8,388,608 rows / 0.17 sec = 49.3 million rows/sec on 2 cores
§ 24.7 million rows/sec/core
§ Compare to 1 to 2 million rows/sec/core on an interpreted DBMS

    memsql> select count(*) from t;
    +----------+
    | count(*) |
    +----------+
    |  8388608 |
    +----------+
    1 row in set (0.10 sec)

    memsql> select count(*) from t where color = "Red";
    +----------+
    | count(*) |
    +----------+
    |  4194304 |
    +----------+
    1 row in set (0.42 sec)    <- includes compile time

    memsql> select count(*) from t where color = "Red";
    +----------+
    | count(*) |
    +----------+
    |  4194304 |
    +----------+
    1 row in set (0.17 sec)    <- executes from cache

MemSQL columnstore
§ On disk
§ 1M-row segments
§ Each column stored in a separate file
§ Only read the columns you touch
§ Highly compressed
  • Dictionary
  • Run-length
  • LZ
  • Integer value
§ Min/max per column per segment
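Run-length encoding pays off on sorted columns, where equal values sit next to each other. A minimal sketch, assuming a column is just a Python list; a filter can then be evaluated once per run rather than once per row. The talk does not show MemSQL's on-disk encodings, so this illustrates only the general technique, with hypothetical function names.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


def rle_count_equal(runs, target):
    """Count rows equal to target by visiting each run once, without decoding."""
    return sum(length for value, length in runs if value == target)
```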

MemSQL columnstore (continued)
§ Sorted by key
§ Segment elimination
§ Compiled code built into the system for handling segments
§ Linux file buffer cache keeps data in RAM
§ In-memory rowstore segment for new data
§ Background merger
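Segment elimination combines the per-segment min/max with key ordering: a range filter can skip any segment whose [min, max] cannot overlap the requested range. The sketch below assumes integer values and uses tiny segments in place of 1M-row ones; the `Segment` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    min_val: int
    max_val: int
    values: list


def make_segments(sorted_values, segment_rows):
    """Cut a sorted column into fixed-size segments, recording min/max for each."""
    segments = []
    for start in range(0, len(sorted_values), segment_rows):
        chunk = sorted_values[start:start + segment_rows]
        segments.append(Segment(min(chunk), max(chunk), chunk))
    return segments


def count_in_range(segments, lo, hi):
    """Count values in [lo, hi]; return (count, number of segments actually scanned)."""
    count = 0
    scanned = 0
    for seg in segments:
        if seg.max_val < lo or seg.min_val > hi:
            continue  # eliminated: no row in this segment can qualify
        scanned += 1
        count += sum(1 for v in seg.values if lo <= v <= hi)
    return count, scanned
```

With the values 0..99 in ten 10-row segments, a filter on 25..34 scans only two segments.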

Vectorization
§ Process data in 4,000-row chunks
§ a.k.a. "vector projections"
§ Process each column vector in a tight C++ loop
  • Filters
  • Local group-by
  • Joins
§ A few hundred million rows/sec/core
[Diagram: a 4K-row chunk divided into column vectors]
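The chunk-at-a-time processing described above can be sketched as follows. Python will not approach the speed of a tight C++ loop; the point is the shape: filter a column vector into a selection vector, run a local group-by on the chunk, then merge the partial result. The 4096-row chunk size and the function name are assumptions.

```python
from collections import Counter

CHUNK_ROWS = 4096  # assumed size of a "4K-row" chunk


def filtered_group_count(group_col, filter_col, threshold, chunk_rows=CHUNK_ROWS):
    """Compute SELECT g, COUNT(*) WHERE f > threshold GROUP BY g,
    one chunk of column vectors at a time."""
    totals = Counter()
    n = len(group_col)
    for start in range(0, n, chunk_rows):
        end = min(start + chunk_rows, n)
        f = filter_col[start:end]
        g = group_col[start:end]
        # Step 1: filter the column vector into a selection vector of positions.
        selected = [i for i, v in enumerate(f) if v > threshold]
        # Step 2: local group-by over the selected positions in this chunk.
        local = Counter(g[i] for i in selected)
        # Step 3: merge the chunk's partial result into the running totals.
        totals.update(local)
    return dict(totals)
```

Changing `chunk_rows` changes only how the work is batched, not the result.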

SIMD overview
▪ Intel AVX2
▪ 256-bit registers
▪ Pack multiple values per register
▪ Special instructions for SIMD register operations
▪ Arithmetic, logic, load, store, etc.
▪ Allows multiple operations in 1 instruction
[Diagram: (1, 2, 3, 4) + (1, 1, 1, 1) = (2, 3, 4, 5) in one SIMD add]
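The (1, 2, 3, 4) + (1, 1, 1, 1) example can be reproduced without AVX2 using SWAR (SIMD within a register), a swapped-in technique: four 8-bit lanes are packed into one 32-bit word and added with one integer addition plus masking. AVX2 applies the same idea in hardware across 256-bit registers; the lane width, lane count, and names here are assumptions chosen to mirror the slide.

```python
LANE_BITS = 8
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1                                  # 0xFF
HIGH_BITS = sum(0x80 << (LANE_BITS * i) for i in range(LANES))    # 0x80808080
LOW_BITS = HIGH_BITS ^ ((1 << (LANE_BITS * LANES)) - 1)           # 0x7F7F7F7F


def pack(values):
    """Pack LANES small unsigned ints into one word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= LANE_MASK:
            raise ValueError("value does not fit in a lane")
        word |= v << (LANE_BITS * i)
    return word


def unpack(word):
    """Split a packed word back into its lanes."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def swar_add(a, b):
    """Add all lanes at once, each wrapping modulo 256.
    The low 7 bits of every lane are added normally (their carry stays inside
    the lane); the top bit of each lane is fixed up with XOR, so no carry
    spills into the neighboring lane."""
    return ((a & LOW_BITS) + (b & LOW_BITS)) ^ ((a ^ b) & HIGH_BITS)
```

For example, `unpack(swar_add(pack([1, 2, 3, 4]), pack([1, 1, 1, 1])))` gives `[2, 3, 4, 5]`, matching the slide's diagram.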

Operations on encoded data in MemSQL
§ Intel AVX2 SIMD
§ Filters
§ Group-by
§ Process a 256-bit chunk of encoded (compressed) data at once
§ Can process > 3 billion rows/sec/core
§ Applied before vectorization for local group-by

Encoded data example
§ Dictionary encoding
  • Values: Red Red Blue Green Red Blue
  • Dictionary: Green: 00, Red: 01, Blue: 10
  • Encoded: 01 01 10 00 01 10 (6 values in only 12 bits!)
§ select color, count(*) from t group by color
  • SIMD can process multiple 2-bit values at once
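To make "SIMD can process multiple 2-bit values at once" concrete, this sketch packs the slide's dictionary codes into one integer and counts each color with a few whole-word bitwise operations instead of decoding values one at a time. It is a SWAR analogy with hypothetical names (`encode`, `count_code`, `group_count`); a real engine would work on fixed 256-bit chunks, and its layout may differ. Fields are stored least-significant first, so the packed integer reads right to left relative to the slide.

```python
DICTIONARY = {"Green": 0b00, "Red": 0b01, "Blue": 0b10}
CODE_BITS = 2


def encode(values):
    """Dictionary-encode values into one packed int; value i occupies bits 2i..2i+1."""
    packed = 0
    for i, value in enumerate(values):
        packed |= DICTIONARY[value] << (CODE_BITS * i)
    return packed, len(values)


def count_code(packed, n, code):
    """Count the fields equal to code, operating on all n fields at once."""
    low = sum(1 << (CODE_BITS * i) for i in range(n))  # low bit of every field
    diff = packed ^ (code * low)            # a field becomes 00 only where it equals code
    mismatch = (diff | (diff >> 1)) & low   # low bit set in every non-zero field
    return n - bin(mismatch).count("1")


def group_count(packed, n):
    """select color, count(*) from t group by color, computed on encoded data."""
    counts = {name: count_code(packed, n, code) for name, code in DICTIONARY.items()}
    return {name: c for name, c in counts.items() if c > 0}
```

The work per color is a handful of operations on the whole packed word, regardless of how many 2-bit fields it holds.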

DEMO

The hardware
§ Each node: 2 x Intel Xeon Platinum 8180 CPU @ 2.50GHz, 28 cores per CPU, "Skylake"
§ 1 aggregator, 8 leaves
§ Total leaf cores = 8 leaves x 2 CPUs x 28 cores = 448

The data
§ Synthetically generated stock trades
§ 57.8 billion rows

How big is a trillion?
§ Dollar amount of a football field covered with stacks of $100 bills 6 feet high
§ Number of tweets in 5 years
§ Number of text messages in the world in 45 days
§ More than the number of checkout transactions at Walmart since it was founded

Drum Roll Please!

The results
• Avg query time: 0.0525 sec
• 57.8 billion rows / 0.0525 sec = 1.10 trillion rows/sec
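The headline figure is simple division, and the hardware slide allows a per-core breakdown. The per-core number is derived here, not reported in the talk, and assumes each query scans all 57.8 billion rows with the work spread evenly across the 448 leaf cores.

```python
rows_scanned = 57.8e9      # rows in the demo table
avg_query_sec = 0.0525     # average query time from the results slide
leaf_cores = 8 * 2 * 28    # 8 leaves x 2 CPUs x 28 cores

rows_per_sec = rows_scanned / avg_query_sec
# Assumes the scan is spread evenly over every leaf core.
rows_per_sec_per_core = rows_per_sec / leaf_cores

print(f"{rows_per_sec / 1e12:.2f} trillion rows/sec")                   # 1.10 trillion rows/sec
print(f"{rows_per_sec_per_core / 1e9:.2f} billion rows/sec per leaf core")  # 2.46 billion rows/sec per leaf core
```

Under these assumptions each leaf core handles about 2.5 billion rows/sec, the same order of magnitude as the per-core rate quoted for SIMD processing of encoded data.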

What does it mean?
§ You can encourage analytic exploration
§ The technology exists to meet these challenges:
  • Expectation of interactive response
  • Data explosion
  • Higher concurrency demands
  • Preference for SQL
  • Real-time update
  • Need to run on economical hardware

Thank You!
