Just-in-Time Data Structures Languages and Runtimes for Big Data - - PowerPoint PPT Presentation

SLIDE 1

Just-in-Time Data Structures

Languages and Runtimes for Big Data

SLIDE 2

Updates

  • Slack Channel: #cse662-fall2017 @ http://ubodin.slack.com
  • Reading for Monday: MCDB
  • Exactly one piece of feedback (see next slide)

SLIDE 3

Don’t parrot the paper back

  • Find something that the paper says is good and figure out a set of circumstances where it's bad.
  • What else does something similar? Why is the paper better, and under what circumstances?
  • Think of circumstances and real-world settings where the proposed system is good.
  • Evaluation: How would you evaluate their solution in a way that they didn't?

SLIDE 4

What is best in life?

(for organizing your data)

SLIDE 5

Storing & Organizing Data

Which should you use?

  • Binary Tree: 1 2 3 4 5
  • Sorted Array: 1 2 3 4 5
  • Heap (unordered): 5 1 2 4 3

API: Insert, Range Scan, … and many more.

SLIDE 6

You guessed wrong.

(Unless you didn’t)

SLIDE 7

Workloads

[Figure: Read Cost vs. Write Cost for Sorted Array, BTree, and Heap]

Each data structure makes a fixed set of tradeoffs. Which structure is best can even change at runtime.

SLIDE 8

Workloads

[Figure: the same Read Cost vs. Write Cost plot for Sorted Array, BTree, and Heap, annotated with workloads ("Many Reads", "Some Writes", "No Reads") and the Current Workload]

We want to gracefully transition between different data structures (DSes).

SLIDE 9

Traditional Data Structures

[Figure: a traditional data structure drawn as a stack of Access Logic, Manipulation Logic, and Physical Layout & Logic]

SLIDE 10

Just-in-Time Data Structures

[Figure: the same stack of Access Logic, Manipulation Logic, and Physical Layout & Logic, with an added Abstraction Layer]

SLIDE 11

  • ➡ Picking The Right Abstraction
  • Accessing and Manipulating a JITD
  • Case Study: Adaptive Indexes
  • Experimental Results
  • Demo

SLIDE 12

Abstractions

Black Box: My Data (a set of integer records)

SLIDE 13

Insertions

Let’s say I want to add a 3.

[Figure: a Union (U) node whose children are the Black Box (My Data) and the new record 3]

This is correct, but probably not efficient.

SLIDE 14

Insertions

Insertion creates a temporary representation…

[Figure: a Union (U) of the existing records 1 2 4 5 and the new record 3]

SLIDE 15

Insertions

[Figure: the Union (U) of the records 1 2 4 5 and the record 3]

… that we can eventually rewrite into a form that is correct and efficient (once we know what ‘efficient’ means).
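The insert-then-rewrite idea can be sketched in Python. This is a minimal sketch under assumptions. The node classes `Array` and `Concat` (standing in for the U node) and the helpers `insert`, `contents`, and `collapse` are hypothetical names. Collapsing into one sorted array is only one possible meaning of "efficient".

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Array:
    """Leaf block: an unsorted array of integer records."""
    records: List[int]


@dataclass
class Concat:
    """U block: its logical contents are those of `left` plus those of `right`."""
    left: "Node"
    right: "Node"


Node = Union[Array, Concat]


def insert(root: Node, value: int) -> Node:
    """Correct but lazy insertion: wrap the existing data in a U node."""
    return Concat(root, Array([value]))


def contents(node: Node) -> List[int]:
    """The logical set of records, independent of the physical layout."""
    if isinstance(node, Array):
        return list(node.records)
    return contents(node.left) + contents(node.right)


def collapse(node: Node) -> Array:
    """One possible later rewrite (hypothetical choice): a single sorted array."""
    return Array(sorted(contents(node)))


data = insert(Array([1, 2, 4, 5]), 3)
print(contents(data))  # [1, 2, 4, 5, 3]
print(collapse(data))  # Array(records=[1, 2, 3, 4, 5])
```

Insertion never touches the existing records. All reorganization cost is deferred to whichever rewrite is chosen later.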

SLIDE 16

Traditional Data Structure Design

Binary Tree
  • Inner Nodes (<)
  • Leaf Nodes: 1 2 3 4 5 (maybe in a linked list)

SLIDE 17

Traditional Data Structure Design

[Figure: Binary Tree, Sorted Array (1 2 3 4 5), and Heap (5 1 2 4 3)]

Contiguous Array of Records

SLIDE 18

Building Blocks

  • U: Concatenate
  • <: BinTree Node
  • Array (Sorted)
  • Array (Unsorted)

[Figure: the building blocks, annotated with their Structural Properties and Semantic Properties]
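The four building blocks can be written down as plain Python dataclasses. This sketch rests on assumptions:
  • The class names are ours.
  • A BinTree node sends records `< sep` left and `>= sep` right. This convention was chosen to agree with the range-scan rule later in the deck.
  • `well_formed` is a hypothetical helper that checks the semantic property each block promises.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Array:
    """Array (Unsorted): no ordering guarantee."""
    records: List[int]


@dataclass
class SortedArray:
    """Array (Sorted): records are in ascending order."""
    records: List[int]


@dataclass
class Concat:
    """U (Concatenate): the union of two children, in no particular order."""
    left: object
    right: object


@dataclass
class BinTree:
    """< (BinTree Node): records in `left` are < sep, records in `right` are >= sep."""
    sep: int
    left: object
    right: object


def records(node) -> List[int]:
    """Logical contents of any block."""
    if isinstance(node, (Array, SortedArray)):
        return list(node.records)
    return records(node.left) + records(node.right)


def well_formed(node) -> bool:
    """Recursively check the semantic property that each block promises."""
    if isinstance(node, Array):
        return True
    if isinstance(node, SortedArray):
        r = node.records
        return all(a <= b for a, b in zip(r, r[1:]))
    if isinstance(node, Concat):
        return well_formed(node.left) and well_formed(node.right)
    if isinstance(node, BinTree):
        return (all(x < node.sep for x in records(node.left))
                and all(x >= node.sep for x in records(node.right))
                and well_formed(node.left) and well_formed(node.right))
    raise TypeError(f"unknown block: {node!r}")


tree = BinTree(3, SortedArray([1, 2]), Concat(Array([5, 3]), SortedArray([4])))
print(records(tree))      # [1, 2, 5, 3, 4]
print(well_formed(tree))  # True
```

Mixing block types freely under one tree is what makes hybrid structures possible. The only obligation is that each block keeps its own promise.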

SLIDE 19

  • Picking The Right Abstraction
  • ➡ Accessing and Manipulating a JITD
  • Case Study: Adaptive Indexes
  • Experimental Results
  • Demo

SLIDE 20

Binary Tree Insertions

Let’s try something more complex: a Binary Tree.

[Figure: a Union (U) of a binary tree of BinTree nodes (<) and the new record 3]

SLIDE 21

Binary Tree Insertions

[Figure: the Union (U) with the record 3 moves from above the root BinTree node (<) into one of its subtrees]

A rewrite pushes the inserted object down into the tree.

SLIDE 22

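Binary Tree Insertions

[Figure: a Union (U) over a BinTree node (<) with children Black Box 1 and Black Box 2 is rewritten into a BinTree node (<) whose Black Box 1 child is wrapped in the Union (U)]

The rewrites are local. The rest of the data structure doesn’t matter!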

SLIDE 23

Binary Tree Insertions

Terminate recursion at the leaves.

[Figure: below a BinTree node (<), a Union (U) of the record 3 with the leaf 5 becomes the single leaf 5 3]
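A minimal Python sketch of the push-down rewrite and the leaf termination. It assumes several things:
  • A pending insertion is a hypothetical `Concat(target, new)` node whose `new` field is an `Array` of inserted records.
  • BinTree nodes keep records `< sep` on the left.
  • The rewrites are applied eagerly until the leaves. A JITD could instead apply one step at a time, whenever a policy allows.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Array:
    """Leaf block holding unsorted records."""
    records: List[int]


@dataclass
class BinTree:
    """< block: records below `sep` live in `left`, the rest in `right` (assumed)."""
    sep: int
    left: "Node"
    right: "Node"


@dataclass
class Concat:
    """U block: pending insertion of the records in `new` into `target`."""
    target: "Node"
    new: Array


Node = Union[Array, BinTree, Concat]


def push_down(node: Node) -> Node:
    """Apply the local insertion rewrites until every pending record reaches a leaf."""
    if not isinstance(node, Concat):
        return node
    target = push_down(node.target)  # resolve nested pending insertions first
    if isinstance(target, Array):
        # Terminate recursion at the leaves: append the new records to the leaf.
        return Array(target.records + node.new.records)
    # Local rewrite: split the new records by the separator and push each group
    # into one child; the other child is returned untouched.
    lo = [x for x in node.new.records if x < target.sep]
    hi = [x for x in node.new.records if x >= target.sep]
    left = push_down(Concat(target.left, Array(lo))) if lo else target.left
    right = push_down(Concat(target.right, Array(hi))) if hi else target.right
    return BinTree(target.sep, left, right)


tree = BinTree(4, Array([1, 2]), Array([5]))
print(push_down(Concat(tree, Array([3]))))
# BinTree(sep=4, left=Array(records=[1, 2, 3]), right=Array(records=[5]))
```

The untouched subtree is reused by reference. This is the sense in which the rest of the data structure doesn't matter.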

SLIDE 24

Range Scan(low, high)

  • U (Concatenate) over A and B: [Recur into A] UNION [Recur into B]
  • < (BinTree Node with separator sep) over A and B:
    IF(sep > high) { [Recur into A] }
    ELSIF(sep ≤ low) { [Recur into B] }
    ELSE { [Recur into A] UNION [Recur into B] }
  • Array (Unsorted): Full Scan
  • Array (Sorted): 2x Binary Search
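The per-block range-scan rules translate directly into one recursive Python function. It is a sketch with these assumptions:
  • The class names are hypothetical.
  • BinTree nodes keep records `< sep` on the left and `>= sep` on the right.
  • Ranges are half-open `[low, high)`, as in the later "Read [2,4)" examples.

The BinTree conditions are exactly the slide's, which remain correct (if slightly conservative) for half-open ranges.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class Array:          # unsorted: answered with a full scan
    records: List[int]


@dataclass
class SortedArray:    # sorted: answered with two binary searches
    records: List[int]


@dataclass
class Concat:         # U
    left: object
    right: object


@dataclass
class BinTree:        # <: left holds records < sep, right holds records >= sep
    sep: int
    left: object
    right: object


def range_scan(node, low: int, high: int) -> List[int]:
    """Return every record r with low <= r < high."""
    if isinstance(node, Array):
        return [r for r in node.records if low <= r < high]
    if isinstance(node, SortedArray):
        i = bisect.bisect_left(node.records, low)
        j = bisect.bisect_left(node.records, high)
        return node.records[i:j]
    if isinstance(node, Concat):
        return range_scan(node.left, low, high) + range_scan(node.right, low, high)
    if isinstance(node, BinTree):
        if node.sep > high:
            return range_scan(node.left, low, high)
        if node.sep <= low:
            return range_scan(node.right, low, high)
        return range_scan(node.left, low, high) + range_scan(node.right, low, high)
    raise TypeError(f"unknown block: {node!r}")


tree = BinTree(3, SortedArray([1, 2]), Concat(Array([5, 3]), SortedArray([4])))
print(range_scan(tree, 2, 4))  # [2, 3]
```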

SLIDE 25

Synergy

SLIDE 26

Hybrid Insertions

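[Figure: a Union (U) inserting the record 3 into a structure built from a BinTree node (<) and the sorted records 1 2 4 5]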

SLIDE 27

Hybrid Insertions

[Figure: the BinTree Rewrite pushes the Union (U) with the record 3 below the BinTree node (<), toward the sorted records 1 2 4 5]

SLIDE 28

Hybrid Insertions

[Figure: a Binary Tree Rewrite followed by a Sorted Array Rewrite: the Union (U) of the record 3 with the sorted array 1 2 4 5 becomes a BinTree node (<) over 1 2 and 3 4 5]

SLIDE 29

Synergy

[Figure: the same insertion handled by a Binary Tree Rewrite followed by a Binary Tree Leaf Rewrite, yielding BinTree nodes (<) over 1 2 and 3 4 5]

Which rewrite gets used depends on workload-specific policies.
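As a rough illustration of choosing between rewrites, here is a Python sketch.
  • `sorted_array_rewrite` follows the Sorted Array Rewrite as drawn: split the sorted leaf at the inserted value under a new BinTree node.
  • `merge_rewrite` is a hypothetical alternative (an ordered insert into the leaf). It only stands in for "some other rewrite" and is not the slide's Binary Tree Leaf Rewrite.
  • The boolean `prefer_split` is a placeholder for a real workload-specific policy.
  • BinTree nodes are assumed to keep records `< sep` on the left.

```python
import bisect
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SortedArray:
    records: List[int]


@dataclass
class BinTree:        # left: records < sep, right: records >= sep (assumed)
    sep: int
    left: object
    right: object


def sorted_array_rewrite(leaf: SortedArray, value: int) -> BinTree:
    """U([1 2 4 5], 3) -> <3([1 2], [3 4 5]): split the leaf at the new value."""
    i = bisect.bisect_left(leaf.records, value)
    return BinTree(value,
                   SortedArray(leaf.records[:i]),
                   SortedArray([value] + leaf.records[i:]))


def merge_rewrite(leaf: SortedArray, value: int) -> SortedArray:
    """Hypothetical alternative: keep a single leaf and insert in order."""
    records = list(leaf.records)
    bisect.insort(records, value)
    return SortedArray(records)


def insert_at_leaf(leaf: SortedArray, value: int,
                   prefer_split: bool) -> Union[BinTree, SortedArray]:
    """The policy (here just a flag) decides which rewrite to apply."""
    if prefer_split:
        return sorted_array_rewrite(leaf, value)
    return merge_rewrite(leaf, value)


leaf = SortedArray([1, 2, 4, 5])
print(insert_at_leaf(leaf, 3, prefer_split=True))
# BinTree(sep=3, left=SortedArray(records=[1, 2]), right=SortedArray(records=[3, 4, 5]))
print(insert_at_leaf(leaf, 3, prefer_split=False))
# SortedArray(records=[1, 2, 3, 4, 5])
```

Both results hold the same records. They differ only in physical organization, which is what lets a policy swap one for the other.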

SLIDE 30

  • Picking The Right Abstraction
  • Accessing and Manipulating a JITD
  • ➡ Case Study: Adaptive Indexes
  • Experimental Results
  • Demo

SLIDE 31

Adaptive Indexes

[Figure: Your Index and Your Workload]

SLIDE 32

Adaptive Indexes

[Figure: Your Index and Your Workload, over Time]

SLIDE 33

Adaptive Indexes

[Figure: Your Index and Your Workload, over Time]

SLIDE 34

Range-Scan Adaptive Indexes

Start with an unsorted list of records; converge to a binary tree or sorted array.

  • Cracker Index: converge by emulating quick-sort
  • Adaptive Merge Trees: converge by emulating merge-sort

SLIDE 35

Cracker Indexes

[Figure: unsorted records 5 1 2 4 3]

Read [2,4)

SLIDE 36

Cracker Indexes

Read [2,4)

[Figure: the records are partitioned in place into the pieces [-∞,2), [2,4), and [4,∞); the [2,4) piece is the answer]

Read [1,3)

Radix partition on query boundaries (don’t sort).

SLIDE 37

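Cracker Indexes

Read [2,4), then Read [1,3)

[Figure: the records 1 2 4 5 3 now split into the pieces [1,2), [2,3), [3,4), and [4,∞); the answer to Read [1,3) comes from the pieces inside that range]

Each query does less and less work.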
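A minimal Python sketch of a cracker index. It keeps the column plus a sorted map of crack points. Each query boundary triggers a partition of only the one piece that can contain it. This is why later queries do less work.

The slide describes a radix partition on query boundaries. For simplicity this sketch uses the quick-sort-style two-way partition that the deck associates with cracker indexes. The class name `CrackerIndex` and its fields are hypothetical.

```python
import bisect
from typing import List


class CrackerIndex:
    """Column plus crack map. Invariant for each crack (v, p):
    every record in data[:p] is < v and every record in data[p:] is >= v."""

    def __init__(self, data: List[int]):
        self.data = list(data)
        self.values: List[int] = []     # sorted crack values
        self.positions: List[int] = []  # split position for each crack value

    def _crack(self, v: int) -> int:
        k = bisect.bisect_left(self.values, v)
        if k < len(self.values) and self.values[k] == v:
            return self.positions[k]  # already cracked here: no work at all
        lo = self.positions[k - 1] if k > 0 else 0
        hi = self.positions[k] if k < len(self.values) else len(self.data)
        # Two-way partition of only the piece [lo, hi) that can contain v.
        i = lo
        for m in range(lo, hi):
            if self.data[m] < v:
                self.data[i], self.data[m] = self.data[m], self.data[i]
                i += 1
        self.values.insert(k, v)
        self.positions.insert(k, i)
        return i

    def range_scan(self, low: int, high: int) -> List[int]:
        """Answer Read [low, high) and leave the column more organized."""
        return self.data[self._crack(low):self._crack(high)]


idx = CrackerIndex([5, 1, 2, 4, 3])
print(sorted(idx.range_scan(2, 4)))  # [2, 3]
print(sorted(idx.range_scan(1, 3)))  # [1, 2]
```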

SLIDE 38

Rewrite-Based Cracking

5 1 2 4 3

Read [2,4)

SLIDE 39

Rewrite-Based Cracking

1 2 4 5 3

In-Place Sort as Before

SLIDE 40

Rewrite-Based Cracking

[Figure: the array 1 2 4 5 3 split under BinTree nodes <2 and <4]

Fragment and organize.

SLIDE 41

Rewrite-Based Cracking

[Figure: the array 1 2 4 5 3 split under BinTree nodes <2, <4, and <3]

Continue fragmenting as queries arrive (a splay tree can be used for balance).
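Rewrite-based cracking can be sketched as a rewrite of an unsorted `Array` leaf into a BinTree node at each query boundary. This is a Python sketch under stated assumptions:
  • `crack_at`, `collect`, and `read` are hypothetical names.
  • BinTree nodes keep records `< sep` on the left.
  • For brevity, the partition builds new lists rather than working in place as the slides describe.
  • No splay-tree rebalancing is attempted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Array:
    records: List[int]


@dataclass
class BinTree:        # left: records < sep, right: records >= sep (assumed)
    sep: int
    left: object
    right: object


def crack_at(node, v: int):
    """Rewrite so that a BinTree node with separator v exists on v's path.
    Only the single Array leaf that can contain v is partitioned."""
    if isinstance(node, Array):
        lo = [x for x in node.records if x < v]
        hi = [x for x in node.records if x >= v]
        return BinTree(v, Array(lo), Array(hi))
    if v == node.sep:
        return node
    if v < node.sep:
        return BinTree(node.sep, crack_at(node.left, v), node.right)
    return BinTree(node.sep, node.left, crack_at(node.right, v))


def collect(node, low: int, high: int) -> List[int]:
    """Gather records in [low, high), visiting only overlapping subtrees."""
    if isinstance(node, Array):
        return [x for x in node.records if low <= x < high]
    out: List[int] = []
    if low < node.sep:
        out += collect(node.left, low, high)
    if high > node.sep:
        out += collect(node.right, low, high)
    return out


def read(node, low: int, high: int) -> Tuple[object, List[int]]:
    """Crack at both query boundaries (the rewrite), then answer Read [low, high)."""
    node = crack_at(crack_at(node, low), high)
    return node, collect(node, low, high)


root = Array([5, 1, 2, 4, 3])
root, answer = read(root, 2, 4)   # adds nodes <2 and <4
print(sorted(answer))             # [2, 3]
root, answer = read(root, 1, 3)   # adds nodes <1 and <3
print(sorted(answer))             # [1, 2]
```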

SLIDE 42

Adaptive Merge Trees

5 1 2 4 3

Before the first query, partition data…

SLIDE 43

Adaptive Merge Trees

5 1 2 4 3

…and build fixed-size sorted runs

SLIDE 44

Adaptive Merge Trees

5 1 2 4 3

Read [2,4): merge only the relevant records into the target array.

SLIDE 45

Adaptive Merge Trees

5 1 2 4 3

Read [2,4): merge only the relevant records into the target array.

SLIDE 46

Adaptive Merge Trees

5 1 2 4 3

Read [1,3): continue merging as new queries arrive.
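A minimal Python sketch of adaptive merging as described on these slides. Before the first query, the data is cut into fixed-size sorted runs. Each Read [low, high) then moves only the relevant records out of every run and merges them into a sorted target array. The class name `AdaptiveMerge` and the default `run_size` are hypothetical, and `run_size` must be positive.

```python
import bisect
import heapq
from typing import List


class AdaptiveMerge:
    def __init__(self, data: List[int], run_size: int = 2):
        # Before the first query: partition the data and sort each fixed-size run.
        self.runs = [sorted(data[i:i + run_size])
                     for i in range(0, len(data), run_size)]
        self.merged: List[int] = []  # target array, kept sorted

    def range_scan(self, low: int, high: int) -> List[int]:
        pieces = []
        for run in self.runs:
            i = bisect.bisect_left(run, low)
            j = bisect.bisect_left(run, high)
            pieces.append(run[i:j])
            del run[i:j]  # move (not copy) the relevant records out of the run
        self.runs = [r for r in self.runs if r]  # drop exhausted runs
        # Merge only the relevant records into the target array.
        self.merged = list(heapq.merge(self.merged, *pieces))
        i = bisect.bisect_left(self.merged, low)
        j = bisect.bisect_left(self.merged, high)
        return self.merged[i:j]


am = AdaptiveMerge([5, 1, 2, 4, 3], run_size=2)  # runs: [1, 5], [2, 4], [3]
print(am.range_scan(2, 4))  # [2, 3]
print(am.range_scan(1, 3))  # [1, 2]
print(am.runs)              # [[5], [4]]
```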

SLIDE 47

Rewrite-Based Merging

5 1 2 4 3

SLIDE 48

Adaptive Merge Trees

5 1 2 4 3

Rewrite any unsorted array into a union (U) of sorted runs.
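Expressed with the building blocks, the first merge step is a rewrite from one `Array` into a chain of U nodes over `SortedArray` runs. This is a Python sketch. The function name `to_sorted_runs`, the right-deep shape of the chain, and a positive `run_size` are assumptions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Array:
    records: List[int]


@dataclass
class SortedArray:
    records: List[int]


@dataclass
class Concat:         # U
    left: object
    right: object


def to_sorted_runs(node: Array, run_size: int) -> Union[SortedArray, Concat]:
    """Rewrite an unsorted Array into a right-deep U of SortedArray runs."""
    runs = [SortedArray(sorted(node.records[i:i + run_size]))
            for i in range(0, len(node.records), run_size)]
    if not runs:
        return SortedArray([])
    result: Union[SortedArray, Concat] = runs[-1]
    for run in reversed(runs[:-1]):
        result = Concat(run, result)
    return result


print(to_sorted_runs(Array([5, 1, 2, 4, 3]), 2))
# Concat(left=SortedArray(records=[1, 5]), right=Concat(left=SortedArray(records=[2, 4]), right=SortedArray(records=[3])))
```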

SLIDE 49

Adaptive Merge Trees

5 1 2 4 3

Read [2,4)

Method 1: Merge relevant records into the LHS run (sub-partition LHS runs to keep merges fast).

[Figure: the runs under a Union (U), with a BinTree node <3]

SLIDE 50

Adaptive Merge Trees

5 1 2 4 3

  • Or…

SLIDE 51

Adaptive Merge Trees

5 1 2 4 3

Read [2,4)

Method 2: Partition records into High/Mid/Low (union back the High & Low records).

[Figure: BinTree nodes <2 and <4 separating the Low, Mid, and High records, with a Union (U)]
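Method 2 can be sketched in Python on a union of sorted runs:
  • Split every run at the query boundaries.
  • Merge the Mid pieces into one sorted array.
  • Union the Low and High pieces back together under BinTree nodes `<low` and `<high`.

Assumptions: `split_low_mid_high` is a hypothetical name. The U node is n-ary here (`Concat.parts`) for brevity. BinTree nodes keep records `< sep` on the left.

```python
import bisect
import heapq
from dataclasses import dataclass
from typing import List


@dataclass
class SortedArray:
    records: List[int]


@dataclass
class Concat:         # U over several sorted runs (n-ary for brevity)
    parts: List[SortedArray]


@dataclass
class BinTree:        # left: records < sep, right: records >= sep (assumed)
    sep: int
    left: object
    right: object


def split_low_mid_high(runs: Concat, low: int, high: int) -> BinTree:
    """Answer Read [low, high) by reorganizing: Low | merged Mid | High."""
    lows, mids, highs = [], [], []
    for run in runs.parts:
        r = run.records
        i, j = bisect.bisect_left(r, low), bisect.bisect_left(r, high)
        if i > 0:
            lows.append(SortedArray(r[:i]))
        mids.append(r[i:j])
        if j < len(r):
            highs.append(SortedArray(r[j:]))
    mid = SortedArray(list(heapq.merge(*mids)))  # the answer, fully sorted
    return BinTree(low, Concat(lows), BinTree(high, mid, Concat(highs)))


runs = Concat([SortedArray([1, 5]), SortedArray([2, 4]), SortedArray([3])])
tree = split_low_mid_high(runs, 2, 4)
print(tree.right.left)  # SortedArray(records=[2, 3])
```

The Low and High records stay in sorted runs, so a later query can apply the same rewrite to them.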

SLIDE 52

Synergy

  • Cracking creates smaller unsorted arrays, so fewer runs are needed for adaptive merge.
  • Sorted arrays don’t need to be cracked!
  • Insertions are naturally transformed into sorted runs.
  • (Not shown) A partial crack transform pushes newly inserted arrays down through the merge tree.

SLIDE 53

  • Picking The Right Abstraction
  • Accessing and Manipulating a JITD
  • Case Study: Adaptive Indexes
  • ➡ Experimental Results
  • Demo

SLIDE 54

Experiments

Cracker Index vs. Adaptive Merge Tree vs. JITDs

API
  • RangeScan(low, high)
  • Insert(Array)

Gimmick
  • Insert is free.
  • RangeScan uses the work done to answer the query to also organize the data.

SLIDE 55

Experiments

Cracker Index (less organization per read) vs. Adaptive Merge Tree (more organization per read) vs. JITDs

SLIDE 56

[Figure: per-read Time (s, log scale from 1e-05 to 10) vs. Iteration (reads, up to 10,000), one panel each for the Cracker Index and the Adaptive Merge Tree]

Setup:
  • 100 M records (1.6 GB)
  • 10,000 reads of 2-3 k records each
  • 10 M additional records written after 5,000 reads

SLIDE 57

[Figure: the same per-read Time vs. Iteration plots for the Cracker Index and the Adaptive Merge Tree, annotated "Bimodal Distribution", "Super-High Initial Costs" (33 s, not shown), and "Slow Convergence"]

SLIDE 58

[Figure: per-read Time (s, log scale) vs. Iteration (reads)]

Policy 1: Swap

(Crack for 2k reads after write, then merge)

SLIDE 59

[Figure: per-read Time (s, log scale) vs. Iteration (reads)]

Policy 1: Swap

(Crack for 2k reads after write, then merge)

Switchover from Crack to Merge

SLIDE 60

[Figure: per-read Time (s, log scale) vs. Iteration (reads)]

Synergy from Cracking (lower upfront cost)

Policy 1: Swap

(Crack for 2k reads after write, then merge)

SLIDE 61

Policy 2: Transition (Gradient from Crack to Merge at 1k)

[Figure: per-read Time (s, log scale) vs. Iteration (reads)]

SLIDE 62

[Figure: per-read Time (s, log scale) vs. Iteration (reads)]

Gradient Period (% chance of Crack or Merge)

Policy 2: Transition (Gradient from Crack to Merge at 1k)

SLIDE 63

[Figure: per-read Time (s, log scale) vs. Iteration (reads)]

Tri-modal distribution: cracking and merging on a per-operation basis

Policy 2: Transition (Gradient from Crack to Merge at 1k)
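The two policies can be sketched as functions that pick "crack" or "merge" for each read. This is a Python sketch under several assumptions:
  • The function names are hypothetical.
  • `reads_since_write` counts reads since the last write.
  • The slides do not spell out the shape of the gradient. The sketch assumes a linear ramp, reading "at 1k" as the point where the ramp starts (`start`); its length (`width`) is a free parameter.

```python
import random
from typing import Optional


def swap_policy(reads_since_write: int, crack_window: int = 2000) -> str:
    """Policy 1 (Swap): crack for the first `crack_window` reads after a write, then merge."""
    return "crack" if reads_since_write < crack_window else "merge"


def transition_policy(reads_since_write: int, start: int = 1000, width: int = 1000,
                      rng: Optional[random.Random] = None) -> str:
    """Policy 2 (Transition): merge with a probability that rises linearly from 0
    to 1 over the gradient period [start, start + width); crack otherwise."""
    p_merge = min(max((reads_since_write - start) / width, 0.0), 1.0)
    draw = (rng or random).random()  # uniform in [0, 1)
    return "merge" if draw < p_merge else "crack"


rng = random.Random(0)
print(swap_policy(500), swap_policy(2500))  # crack merge
print(transition_policy(0, rng=rng), transition_policy(5000, rng=rng))  # crack merge
```

Both policies decide on a per-operation basis. During the gradient period, individual reads are served by either strategy, which is consistent with the mixed distribution described on the slide.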

SLIDE 64

Overall Throughput

[Figure: overall Throughput (ops/s, log scale) vs. Iteration for Cracking, Merge, Swap, and Transition]

JITDs allow fine-grained control over DS behavior

SLIDE 65

Just-in-Time Data Structures

  • Separate logic and structure/semantics
  • Composable Building Blocks
  • Local Rewrite Rules
  • Result: Flexible, hybrid data structures.
  • Result: Graceful transitions between different behaviors.
  • https://github.com/UBOdin/jitd

Questions?