Pocket Data The Case for TPC-MOBILE Oliver Kennedy, Jerry Ajay, - - PowerPoint PPT Presentation

pocket data
SMART_READER_LITE
LIVE PREVIEW

Pocket Data The Case for TPC-MOBILE Oliver Kennedy, Jerry Ajay, - - PowerPoint PPT Presentation

Pocket Data The Case for TPC-MOBILE Oliver Kennedy, Jerry Ajay, Geoff Challen, Lukasz Ziarek http://odin.cse.buffalo.edu/research/astral 1 The ODIn Lab @ Big Data! 2 The ODIn Lab @ Big Data! 2 The ODIn Lab @ Big Data! GB, TB or PB of


slide-1
SLIDE 1

The ODIn Lab @

Pocket Data

The Case for TPC-MOBILE

Oliver Kennedy, Jerry Ajay, Geoff Challen, Lukasz Ziarek http://odin.cse.buffalo.edu/research/astral

1

slide-2
SLIDE 2

The ODIn Lab @

Big Data!

2

slide-3
SLIDE 3

The ODIn Lab @

Big Data!

2

slide-4
SLIDE 4

The ODIn Lab @

Big Data!

  • GB, TB or PB of data!
  • Hundreds of thousands of

updates per second

  • Thousands of nodes

computing together!

  • “Virtually” infinite resources!

3

slide-5
SLIDE 5

The ODIn Lab @

Big Data!

4

slide-6
SLIDE 6

The ODIn Lab @

Big Data!

TPC-C TPC-H YCSB SSB TPC-DS TPC-DI TPC-E

4

slide-7
SLIDE 7

The ODIn Lab @

Big Data!

TPC-C TPC-H YCSB SSB TPC-DS TPC-DI TPC-E What about other types of databases?

4

slide-8
SLIDE 8

The ODIn Lab @

The average smartphone processes almost 180 thousand queries per day That’s about 2 queries per second

5

slide-9
SLIDE 9

The ODIn Lab @

2 Queries per Second

6

slide-10
SLIDE 10

The ODIn Lab @

2 Queries per Second

  • Is this Big Data? No!

6

slide-11
SLIDE 11

The ODIn Lab @

2 Queries per Second

  • Is this Big Data? No!
  • Is this Important?

6

slide-12
SLIDE 12

The ODIn Lab @

2 Queries per Second

  • Is this Big Data? No!
  • Is this Important?
  • Multi-Tenancy: The phone is

more than just a DB.

6

slide-13
SLIDE 13

The ODIn Lab @

2 Queries per Second

  • Is this Big Data? No!
  • Is this Important?
  • Multi-Tenancy: The phone is

more than just a DB.

  • Power: 1-2 days of battery life

under ideal circumstances.

6

slide-14
SLIDE 14

The ODIn Lab @

2 Queries per Second

  • Is this Big Data? No!
  • Is this Important?
  • Multi-Tenancy: The phone is

more than just a DB.

  • Power: 1-2 days of battery life

under ideal circumstances.

  • It’s Everywhere: Odds are

that your phone is running some queries right now!

6

slide-15
SLIDE 15

The ODIn Lab @

2 Queries per Second

  • Is this Big Data? No!
  • Is this Important?
  • Multi-Tenancy: The phone is

more than just a DB.

  • Power: 1-2 days of battery life

under ideal circumstances.

  • It’s Everywhere: Odds are

that your phone is running some queries right now!

YES!

6

slide-16
SLIDE 16

The ODIn Lab @

We need to better understand pocket-scale data

7

slide-17
SLIDE 17

The ODIn Lab @

SQLite

  • Embedded: SQLite is a library
  • Un-shared: SQLite DBs are

specific to one client “app”.

  • Lightweight: Entire SQLite

DB is backed to one file.

  • Universal: SQLite client

library is available by default in nearly all major OSes.

  • “Easy”: Duck Typing,

Relaxed SQL Syntax, One Big Lock (file)

8

Client Application Android OS SQLite Library

slide-18
SLIDE 18

The ODIn Lab @

9

How do developers and users use Pocket Scale Data?

slide-19
SLIDE 19

The ODIn Lab @

PhoneLab

A Smartphone Platform Testbed

~200 UB students, faculty, and staff using instrumented LG Nexus 5 smartphones in exchange for discounted service.

10

slide-20
SLIDE 20

The ODIn Lab @

  • Preliminary Trial: 11 phones for ~1 month (254 phone/days)
  • Instrumented SQLite logs all statements (~45 mil statements)
  • ~33.5 million SELECT statements
  • ~9.4 million INSERT statements
  • ~1 million UPDATE statements
  • ~1.2 million DELETE statements
  • 179 distinct ‘apps’ issuing statements

PhoneLab

A Smartphone Platform Testbed

11

https://phone-lab.org/experiment/request

slide-21
SLIDE 21

The ODIn Lab @

  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

12

slide-22
SLIDE 22

The ODIn Lab @

  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

12

slide-23
SLIDE 23

The ODIn Lab @

SELECT Complexity

13

1 10 100 1000 10000 100000 1x106 1x107 1x108 1 2 3 4 Maximum Nesting Depth 0.1 1 10 100 1000 10000 100000 1x106 1x107 1x108 1 2 3 4 5 6 7 8 Number of SELECT Queries Number of Tables Accessed

slide-24
SLIDE 24

The ODIn Lab @

SELECT Complexity

13

1 10 100 1000 10000 100000 1x106 1x107 1x108 1 2 3 4 Maximum Nesting Depth 0.1 1 10 100 1000 10000 100000 1x106 1x107 1x108 1 2 3 4 5 6 7 8 Number of SELECT Queries Number of Tables Accessed

30 million simple “SPA” queries

slide-25
SLIDE 25

The ODIn Lab @

SELECT Complexity

13

1 10 100 1000 10000 100000 1x106 1x107 1x108 1 2 3 4 Maximum Nesting Depth 0.1 1 10 100 1000 10000 100000 1x106 1x107 1x108 1 2 3 4 5 6 7 8 Number of SELECT Queries Number of Tables Accessed

30 million simple “SPA” queries Infrequent, but extremely complex queries

slide-26
SLIDE 26

The ODIn Lab @

SELECT Complexity

14

(by app)

CDF % of SELECT Queries That Are Key-Value Queries 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

slide-27
SLIDE 27

The ODIn Lab @

SELECT Complexity

14

(by app)

CDF % of SELECT Queries That Are Key-Value Queries 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

24 / 179 apps using SQLite

  • nly as a K/V Store
slide-28
SLIDE 28

The ODIn Lab @

SELECT Complexity

15

INSERT OR REPLACE INTO properties(property_key,property_value) VALUES (?,?); SELECT property_value FROM properties WHERE property_key=?;

(These are actual real queries from the trace)

slide-29
SLIDE 29

The ODIn Lab @

100 101 102 103 104 Returned Row Count 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

SELECT Complexity

16

slide-30
SLIDE 30

The ODIn Lab @

100 101 102 103 104 Returned Row Count 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

SELECT Complexity

16

80% of SELECTs return one row

slide-31
SLIDE 31

The ODIn Lab @

100 101 102 103 104 Returned Row Count 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

SELECT Complexity

16

80% of SELECTs return one row Small % of SELECTs return 100s of rows

slide-32
SLIDE 32

The ODIn Lab @

100 101 102 103 104 Returned Row Count 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

SELECT Complexity

17

(by app)

slide-33
SLIDE 33

The ODIn Lab @

100 101 102 103 104 Returned Row Count 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

SELECT Complexity

17

(by app) Google+, GMail, Facebook, Contacts

slide-34
SLIDE 34

The ODIn Lab @

  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

18

slide-35
SLIDE 35

The ODIn Lab @

19

Image courtesy of http://openclipart.org

Object-Relational Mapper

slide-36
SLIDE 36

The ODIn Lab @

19

Image courtesy of http://openclipart.org

Object-Relational Mapper pers = Persons.get(10) name = pers.firstName()

slide-37
SLIDE 37

The ODIn Lab @

19

Image courtesy of http://openclipart.org

Object-Relational Mapper pers = Persons.get(10) name = pers.firstName() SELECT first_name FROM Persons WHERE id = 10;

slide-38
SLIDE 38

The ODIn Lab @

19

Image courtesy of http://openclipart.org

Object-Relational Mapper pers = Persons.get(10) name = pers.firstName() SELECT first_name FROM Persons WHERE id = 10; SQL DB used for persisting objects

slide-39
SLIDE 39

The ODIn Lab @

20

Object-Relational Mapper

Image courtesy of http://openclipart.org

pers = Persons.get(10)

  • rg = pers.employer()

name = org.name()

slide-40
SLIDE 40

The ODIn Lab @

20

Object-Relational Mapper

Image courtesy of http://openclipart.org

pers = Persons.get(10)

  • rg = pers.employer()

name = org.name() SELECT employer_id FROM Persons WHERE id = 10; SELECT name FROM Organizations WHERE id = ?;

slide-41
SLIDE 41

The ODIn Lab @

20

Object-Relational Mapper

Image courtesy of http://openclipart.org

pers = Persons.get(10)

  • rg = pers.employer()

name = org.name() SELECT employer_id FROM Persons WHERE id = 10; SELECT name FROM Organizations WHERE id = ?; ORMs are not always efficient

slide-42
SLIDE 42

The ODIn Lab @

21

Object-Relational Mapper

Image courtesy of http://openclipart.org

pers = Persons.get(10) pers.setSalary( pers.salary() * 1.1 )

slide-43
SLIDE 43

The ODIn Lab @

21

Object-Relational Mapper

Image courtesy of http://openclipart.org

pers = Persons.get(10) pers.setSalary( pers.salary() * 1.1 ) SELECT salary FROM Persons WHERE id = 10; UPDATE Persons SET salary = ? WHERE id = 10;

slide-44
SLIDE 44

The ODIn Lab @

21

Object-Relational Mapper

Image courtesy of http://openclipart.org

pers = Persons.get(10) pers.setSalary( pers.salary() * 1.1 ) SELECT salary FROM Persons WHERE id = 10; UPDATE Persons SET salary = ? WHERE id = 10; We saw NO update value computations in SQL

slide-45
SLIDE 45

The ODIn Lab @

22

Object-Relational Mapper

Image courtesy of http://openclipart.org

pers = Persons.get(10) pers.setSalary( pers.salary() * 1.1 ) SELECT salary FROM Persons WHERE id = 10; INSERT OR REPLACE INTO Persons(id, salary) VALUES (?, 10);

slide-46
SLIDE 46

The ODIn Lab @

22

Object-Relational Mapper

Image courtesy of http://openclipart.org

pers = Persons.get(10) pers.setSalary( pers.salary() * 1.1 ) SELECT salary FROM Persons WHERE id = 10; INSERT OR REPLACE INTO Persons(id, salary) VALUES (?, 10); Insert or Replace used very frequently

slide-47
SLIDE 47

The ODIn Lab @

  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

23

slide-48
SLIDE 48

The ODIn Lab @

Aggregates

24

Function Call Sites GROUP_CONCAT 583,474 SUM 321,387 MAX 314,970 COUNT 173,031 MIN 19,566 AVG 15

slide-49
SLIDE 49

The ODIn Lab @

Aggregates

24

Function Call Sites GROUP_CONCAT 583,474 SUM 321,387 MAX 314,970 COUNT 173,031 MIN 19,566 AVG 15

Aggregates most common function type

slide-50
SLIDE 50

The ODIn Lab @

Aggregates

25

Function Call Sites GROUP_CONCAT 583,474 SUM 321,387 MAX 314,970 COUNT 173,031 MIN 19,566 AVG 15

Concatenate all strings in a column: Non-algebraic

slide-51
SLIDE 51

The ODIn Lab @

Other Functions

26

slide-52
SLIDE 52

The ODIn Lab @

Other Functions

  • Mostly string manipulation (length, substr)

26

slide-53
SLIDE 53

The ODIn Lab @

Other Functions

  • Mostly string manipulation (length, substr)
  • Some Android-Specific (phone_numbers_equal)

26

slide-54
SLIDE 54

The ODIn Lab @

Other Functions

  • Mostly string manipulation (length, substr)
  • Some Android-Specific (phone_numbers_equal)
  • NO UDFs at all

26

slide-55
SLIDE 55

The ODIn Lab @

  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

27

slide-56
SLIDE 56

The ODIn Lab @

Reads vs Writes

28

CDF Read/Write Ratio (100% = All Reads) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

slide-57
SLIDE 57

The ODIn Lab @

Reads vs Writes

28

CDF Read/Write Ratio (100% = All Reads) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

~15% of apps write more frequently than they read

slide-58
SLIDE 58

The ODIn Lab @

Reads vs Writes

28

CDF Read/Write Ratio (100% = All Reads) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

~15% of apps write more frequently than they read ~15% of apps do not perform a single write!

slide-59
SLIDE 59

The ODIn Lab @

Read-Only Workloads

29

slide-60
SLIDE 60

The ODIn Lab @

Read-Only Workloads

  • JuiceSSH, Key Chain
  • Credential store, infrequent writes

29

slide-61
SLIDE 61

The ODIn Lab @

Read-Only Workloads

  • JuiceSSH, Key Chain
  • Credential store, infrequent writes
  • Google Play Newsstand, Eventbrite, …
  • Frequent queries over changing data
  • Data bulk updated by copying entire SQLite DB

29

slide-62
SLIDE 62

The ODIn Lab @

  • SELECT Complexity
  • ORM Effects
  • Function Usage
  • Read/Write Ratios
  • Query Periodicity

30

slide-63
SLIDE 63

The ODIn Lab @

Query Arrival Frequency

31

100 101 102 103 104 105 106 107 108 Previous Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

slide-64
SLIDE 64

The ODIn Lab @

Query Arrival Frequency

31

100 101 102 103 104 105 106 107 108 Previous Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

15-20% of queries arrive ~10ms after last query

slide-65
SLIDE 65

The ODIn Lab @

By Query Type

32

100 101 102 103 104 105 106 107 108 Next Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

UPDATE SELECT INSERT DELETE

100 101 102 103 104 105 106 107 108 Previous Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

UPDATE SELECT INSERT DELETE

slide-66
SLIDE 66

The ODIn Lab @

By Query Type

32

100 101 102 103 104 105 106 107 108 Next Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

UPDATE SELECT INSERT DELETE

100 101 102 103 104 105 106 107 108 Previous Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

UPDATE SELECT INSERT DELETE

70% of inserts come less than 0.1 ms before another query

slide-67
SLIDE 67

The ODIn Lab @

By Query Type

32

100 101 102 103 104 105 106 107 108 Next Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

UPDATE SELECT INSERT DELETE

100 101 102 103 104 105 106 107 108 Previous Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

UPDATE SELECT INSERT DELETE

70% of inserts come less than 0.1 ms before another query Most sequences consist of INSERTs and SELECTs

slide-68
SLIDE 68

The ODIn Lab @

33

100 101 102 103 104 105 106 107 108 Previous Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

Google Play services Media Storage Gmail Google+ Facebook Hangouts Android System Messenger Calendar Storage Contacts Storage

By App

slide-69
SLIDE 69

The ODIn Lab @

33

100 101 102 103 104 105 106 107 108 Previous Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

Google Play services Media Storage Gmail Google+ Facebook Hangouts Android System Messenger Calendar Storage Contacts Storage

By App

Google Play Services and Media Storage are VERY bursty

slide-70
SLIDE 70

The ODIn Lab @

33

100 101 102 103 104 105 106 107 108 Previous Query Arrival Time (µs) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 CDF

Google Play services Media Storage Gmail Google+ Facebook Hangouts Android System Messenger Calendar Storage Contacts Storage

By App

Google Play Services and Media Storage are VERY bursty Same 10 ms periodicity evident across all apps

slide-71
SLIDE 71

The ODIn Lab @

A Call to Action!

  • Mobile phones process ~2 queries/second
  • DB performance important for power, latency, …
  • Embedded DBs used differently than Server DBs.
  • We need to understand these access patterns

before we can optimize for them.

34

We need a TPC-MOBILE for pocket-scale data!