
SLIDE 1

Introducing Switch: a framework for custom data applications

Josh Ferguson Chief Architect @ Mode josh@modeanalytics.com

SLIDE 2

we’re going to talk about building tools to make better decisions with data

SLIDE 3

i’ve been obsessed with building data tools for about 20 years

SLIDE 4

@besquared almost everywhere

SLIDE 5

mode(.com)

SLIDE 6

a collaborative data science platform

SLIDE 7

our users are data scientists, analysts, and engineers

SLIDE 8

help everybody make better decisions with data

SLIDE 9

we’re here to talk about data applications

SLIDE 10

custom data applications

SLIDE 11

what’s a custom data application?

SLIDE 12

BUSINESS OPERATIONS (SORTED BY SPECIFICITY): INDUSTRY STANDARD to DOMAIN SPECIFIC

LEVEL OF OFF-THE-SHELF TOOL SUPPORT

well supported by companies and tools

this long tail is our competitive advantage

SLIDE 13

there's no collection of off-the-shelf tools that will provide everything our organization needs to make better decisions with data

SLIDE 14

this is where we should focus

SLIDE 15

logistics tracking and monitoring

SLIDE 16

customer health monitoring tools for success

SLIDE 17

a/b testing tools for our product team

SLIDE 18

...

SLIDE 19

SLIDE 20

SLIDE 21

every one of these apps is a one-off today

SLIDE 22

SLIDE 23

switch is a collection of typescript libraries and tools that let us build richer and more interactive data applications

SLIDE 24

the data layer between our database and our user interface

SLIDE 25

it lets us address some of the major challenges we face when we’re building our data apps

SLIDE 26

challenge number one

SLIDE 27

CHALLENGE #1

our users always want to slice and dice their data in ways that we don’t anticipate

SLIDE 28

CHALLENGE #1

we don’t know what we’ll need ahead of time

SLIDE 29

CHALLENGE #1

we can’t build a new etl pipeline or deploy our app every time we need to answer a slightly different question

SLIDE 30

CHALLENGE #1

we should give our users the tools they need to quickly and easily express data in new and different ways on their own

SLIDE 31

Introducing Formulas

SLIDE 32

an excel-like language for data expression

SLIDE 33

they let our users build custom calculations, even if they’re not database or programming language experts

SLIDE 34

what can they do with them?

SLIDE 35

unlike excel, whose formulas operate on cells, our formulas operate on entire datasets at a time

SLIDE 36

FORMULAS

sample dataset

ID   Date         Product   Quantity   Price   Filled
1    2019-01-01   A         10         10.00   true
2    2019-01-02   B         5          20.00   false
...  ...          ...       ...        ...     ...

SLIDE 37

[Price] / [Quantity]

EXAMPLES

calculate ratios!
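
To make "operates on entire datasets" concrete, here is a minimal sketch in plain TypeScript of what a formula like [Price] / [Quantity] amounts to. The Order type, the orders array, and priceOverQuantity are hypothetical names made up for illustration. They are not part of switch, and this is not how switch evaluates formulas.

```typescript
// Hypothetical row shape mirroring the columns of the sample dataset.
interface Order {
  id: number;
  date: string;
  product: string;
  quantity: number;
  price: number;
  filled: boolean;
}

// The two rows shown on the sample dataset slide.
const orders: Order[] = [
  { id: 1, date: "2019-01-01", product: "A", quantity: 10, price: 10.0, filled: true },
  { id: 2, date: "2019-01-02", product: "B", quantity: 5, price: 20.0, filled: false },
];

// Rough equivalent of [Price] / [Quantity]:
// one expression, applied to every row of the dataset at once.
function priceOverQuantity(rows: Order[]): number[] {
  return rows.map((row) => row.price / row.quantity);
}

const ratios = priceOverQuantity(orders);
```

The point of the formula is that the user writes only the expression; the loop over rows is implied.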

SLIDE 38

Dollar to cents [Price] * 100

EXAMPLES

convert units!

SLIDE 39

CASE [Product] WHEN “A,” THEN “A” ELSE [Product] END

EXAMPLES

clean data!

SLIDE 40

AVG([Price] / [Quantity])

EXAMPLES

aggregate data!

SLIDE 41

LOOKUP(AVG([Price]), FIRST())

EXAMPLES

lookup values!

SLIDE 42

what else!?

SLIDE 43

NULL

LITERALS

nulls

SLIDE 44

TRUE FALSE

LITERALS

booleans

SLIDE 45

42 1000 3.1415926 0xBEEF

LITERALS

numbers

SLIDE 46

‘Category’ “Product Name”

LITERALS

strings

SLIDE 47

#2019-04-18# #2019-04-18T10:50:15#

LITERALS

dates

SLIDE 48

/[\w\d]+/ig

LITERALS

regular expressions
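
The regular expression literal on this slide has the same shape as a JavaScript/TypeScript regex literal. As a sketch, and assuming the i and g flags mean the same thing in formulas as they do in JavaScript (the slides don't say), this is what the pattern matches. The sample string is made up.

```typescript
// The literal from the slide, written as a TypeScript regular expression.
// \w already covers digits, so \d is redundant but harmless.
const wordPattern = /[\w\d]+/ig;

// Collect every run of word characters in a hypothetical input string.
const tokens = "Product-A 42".match(wordPattern) || [];
```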

SLIDE 49

[Product] [Quantity]

ACCESS

data access

SLIDE 50

[Quantity] * 500 [Quantity] / 500 [Quantity] + 500 [Quantity] - 500 [Quantity] % 500

OPERATORS

mathematical

SLIDE 51

[Quantity] = 500 [Quantity] <> 500 [Quantity] < 500 [Quantity] <= 500 [Quantity] > 500 [Quantity] >= 500

OPERATORS

relational

SLIDE 52

NOT [Filled] [Filled] AND [Quantity] > 500 [Filled] OR [Quantity] <= 500

OPERATORS

logical

SLIDE 53

CASE [Filled] WHEN TRUE THEN “Filled” WHEN FALSE THEN “Unfilled” ELSE “Unknown” END

CONDITIONAL

case
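
For readers who think in code, the CASE formula above behaves roughly like a switch statement. This is a sketch only: filledLabel is a hypothetical helper, and treating NULL as the value that falls through to ELSE is an assumption, not something the slides state.

```typescript
// Hypothetical helper: maps a nullable boolean the way the CASE formula reads.
function filledLabel(filled: boolean | null): string {
  switch (filled) {
    case true:
      return "Filled";
    case false:
      return "Unfilled";
    default:
      // Assumption: anything that is neither TRUE nor FALSE lands in ELSE.
      return "Unknown";
  }
}
```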

SLIDE 54

NOW()

FUNCTIONS

constant

SLIDE 55

FLOOR([Price]) TRIM([Product]) DATETRUNC(‘day’, [Date])

FUNCTIONS

scalar

SLIDE 56

SUM([Price]) AVG([Quantity]) COUNTD([Product])

FUNCTIONS

aggregate
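
Aggregate functions collapse a whole column into a single value. A minimal sketch of the three on this slide, in plain TypeScript. The function names are hypothetical, and reading COUNTD as "count distinct" is an assumption based on the name.

```typescript
// SUM: add up every value in the column.
function sum(values: number[]): number {
  return values.reduce((acc, v) => acc + v, 0);
}

// AVG: arithmetic mean. Returns NaN for an empty column in this sketch.
function avg(values: number[]): number {
  return sum(values) / values.length;
}

// COUNTD (assumed to mean "count distinct"): number of unique values.
function countDistinct(values: string[]): number {
  const seen: { [key: string]: boolean } = {};
  values.forEach((v) => {
    seen[v] = true;
  });
  return Object.keys(seen).length;
}
```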

SLIDE 57

RANK(SUM([Quantity])) RUNNING_SUM(COUNT([Price])) LOOKUP(AVG([Price]), FIRST())

FUNCTIONS

analytic
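
Analytic functions differ from aggregates in that they return one value per input position instead of one value overall. A sketch of two of them in plain TypeScript. The helper names are hypothetical, and the ranking direction (largest first) and tie handling are assumptions; the slides don't specify how RANK orders or breaks ties.

```typescript
// Running total: each output is the sum of all inputs up to and including
// that position.
function runningSum(values: number[]): number[] {
  let total = 0;
  return values.map((v) => (total += v));
}

// Rank with the largest value first (1 = largest). Equal values share a rank.
// Assumption: this direction and tie rule are chosen for illustration only.
function rankDescending(values: number[]): number[] {
  return values.map((v) => 1 + values.filter((other) => other > v).length);
}
```

For example, runningSum([10, 5, 20]) gives [10, 15, 35], and rankDescending([10, 5, 20]) gives [2, 3, 1].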

SLIDE 58

that’s it, simple and powerful

SLIDE 59

we can build interfaces that let users extend our apps with their own business logic and calculations

SLIDE 60

for example at Mode we’re working on a formula editor that lets our users add custom calculations to their visualizations

SLIDE 61

a single formula that takes someone a few minutes to write might take hours or days to implement and deploy otherwise

SLIDE 62

not having to build etl pipelines or write app code every time we want to answer a different question amplifies our effort 100x

SLIDE 63

that’s pretty rad

SLIDE 64

let’s keep going and see how we use formulas to query our data

SLIDE 65

challenge number two

SLIDE 66

CHALLENGE #2

getting from data to visualization

SLIDE 67

CHALLENGE #2

a common characteristic of custom data apps is custom data visualizations

SLIDE 68

CHALLENGE #2

we don’t want to write ad-hoc data transformation code every time we want to build a visualization

SLIDE 69

CHALLENGE #2

we should use a language that lets us describe the data we need in a way that matches the visualizations we’re trying to build

SLIDE 70

Introducing Queries

SLIDE 71

our queries speak the language of data visualization

SLIDE 72

QUERIES

grammar of graphics

SLIDE 73

most of the visualizations that we can encode with tools like vega-lite can be translated directly into switch queries

SLIDE 74

how do they work?

SLIDE 75

we define the data we want in our query

SLIDE 76

Field { formula: string; }

QUERIES

we use fields which are defined with a formula

SLIDE 77

Field { formula: string; }

QUERIES

they let us describe the data and calculations we want to get back in our query result

SLIDE 78

“SUM([Quantity])” “[Price] / [Quantity]” “DATETRUNC(‘day’, [Date])”

QUERIES

they’re the atomic unit of data in a query

SLIDE 79

Names { formula: “$[Names]”; } Values { formula: “$[Values]”; }

QUERIES

there are two pre-defined fields called names and values

SLIDE 80

NAMES/VALUES

they let us combine multiple aggregate fields together into a single field

Names { formula: “$[Names]”; } Values { formula: “$[Values]”; }
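
The slides don't show what a combined result looks like. One way to picture it, purely as an assumption, is an unpivot: several aggregate columns become pairs of a name (which aggregate) and a value (its result). WideRow, LongRow, and toNamesAndValues below are hypothetical and not part of switch.

```typescript
// One row per day, with several aggregate results side by side.
interface WideRow {
  day: string;
  measures: { [name: string]: number };
}

// One row per (day, aggregate) pair: the "names" and "values" view.
interface LongRow {
  day: string;
  name: string;
  value: number;
}

// Turn each aggregate column into its own row, tagged with the aggregate's name.
function toNamesAndValues(rows: WideRow[]): LongRow[] {
  const out: LongRow[] = [];
  rows.forEach((row) => {
    Object.keys(row.measures).forEach((name) => {
      out.push({ day: row.day, name: name, value: row.measures[name] });
    });
  });
  return out;
}

const combined = toNamesAndValues([
  { day: "2019-01-01", measures: { "SUM([Price])": 10, "SUM([Quantity])": 10 } },
]);
```

In this picture, a chart can put the value on an axis and use the name to tell the series apart.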

SLIDE 81

QUERIES

we’ve got filters

Filter { field: Field; conds: Conditions; }

SLIDE 82

QUERIES

they let us get rid of data we don’t want by adding conditions on our fields

Filter { field: Field; conds: Conditions; }
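
The slides name a Conditions type but don't show its shape. The sketch below invents one (greaterThan / lessThan) only to show the idea of a filter as a field plus conditions. The Conditions members, the passes helper, and the sample values are all assumptions.

```typescript
interface Field {
  formula: string;
}

// Hypothetical shape: the real Conditions type is not shown in the slides.
interface Conditions {
  greaterThan?: number;
  lessThan?: number;
}

interface Filter {
  field: Field;
  conds: Conditions;
}

// Returns true when a value satisfies every condition that is present.
function passes(value: number, conds: Conditions): boolean {
  if (conds.greaterThan !== undefined && !(value > conds.greaterThan)) return false;
  if (conds.lessThan !== undefined && !(value < conds.lessThan)) return false;
  return true;
}

// Keep only quantities above 5, using the two sample rows (10 and 5).
const bigOrders: Filter = { field: { formula: "[Quantity]" }, conds: { greaterThan: 5 } };
const keptQuantities = [10, 5].filter((q) => passes(q, bigOrders.conds));
```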

SLIDE 83

QUERIES

we’ve got sorts

Sort { field: Field; type: SortType; order: SortOrder; }

SLIDE 84

QUERIES

they let us re-arrange our result by adding orders to our fields

Sort { field: Field; type: SortType; order: SortOrder; }
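
SortType and SortOrder are named in the slides but not defined. As a sketch, assuming SortOrder is ascending or descending and giving SortType a single placeholder value, a sort could be applied like this. The string values and applySort are hypothetical.

```typescript
interface Field {
  formula: string;
}

// Hypothetical values: the slides do not list the members of these types.
type SortType = "value";
type SortOrder = "asc" | "desc";

interface Sort {
  field: Field;
  type: SortType;
  order: SortOrder;
}

// Returns a sorted copy of the values; the input array is left untouched.
function applySort(values: number[], sort: Sort): number[] {
  const ascending = values.slice().sort((a, b) => a - b);
  return sort.order === "asc" ? ascending : ascending.reverse();
}

const byQuantityDesc: Sort = { field: { formula: "[Quantity]" }, type: "value", order: "desc" };
const sortedQuantities = applySort([10, 5, 20], byQuantityDesc);
```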

SLIDE 85

we map our data to our visualization

SLIDE 86

QUERIES

the first way to do that is with marks

Mark { field: Field; color: Field[]; size: Field[]; label: Field[]; ... }

SLIDE 87

QUERIES

marks are how we describe the layers of our visualization

Mark { field: Field; color: Field[]; size: Field[]; label: Field[]; ... }

SLIDE 88

MARKS

every layer is defined by a single field

Mark { field: Field; color: Field[]; size: Field[]; label: Field[]; ... }

SLIDE 89

MARKS

it’s got channels like color, size, and label, that let us map fields to visual properties

Mark { field: Field; color: Field[]; size: Field[]; label: Field[]; ... }

SLIDE 90

MARKS

we can map as many channels as we want based on the needs of our visualization

Mark { field: Field; color: Field[]; size: Field[]; label: Field[]; ... }

SLIDE 91

QUERIES

using marks and the other pieces we talked about we can build a complete visual mapping which we call a pivot query

PivotQuery { column: Field[]; x: Field[]; row: Field[]; y: Field[]; values: Field[]; marks: Mark[]; filters: Filter[]; sorts: Sort[]; }

SLIDE 92

QUERIES

marks, filters, and sorts

PivotQuery { column: Field[]; x: Field[]; row: Field[]; y: Field[]; values: Field[]; marks: Mark[]; filters: Filter[]; sorts: Sort[]; }

SLIDE 93

PIVOT QUERY

more channels

PivotQuery { column: Field[]; x: Field[]; row: Field[]; y: Field[]; values: Field[]; marks: Mark[]; filters: Filter[]; sorts: Sort[]; }

SLIDE 94

PIVOT QUERY

column and row which let us facet data across or down our visualization

PivotQuery { column: Field[]; x: Field[]; row: Field[]; y: Field[]; values: Field[]; marks: Mark[]; filters: Filter[]; sorts: Sort[]; }

SLIDE 95

PIVOT QUERY

x and y which let us position data across or down our visualization within those facets

PivotQuery { column: Field[]; x: Field[]; row: Field[]; y: Field[]; values: Field[]; marks: Mark[]; filters: Filter[]; sorts: Sort[]; }

SLIDE 96

PIVOT QUERY

values which lets us combine all of the fields in it into a single field that we can use in the other channels

PivotQuery { column: Field[]; x: Field[]; row: Field[]; y: Field[]; values: Field[]; marks: Mark[]; filters: Filter[]; sorts: Sort[]; }

SLIDE 97

SLIDE 98

QUERIES

a beautiful chart

SLIDE 99

PivotQuery { x: [ “DATETRUNC(‘day’, [Date])” ], y: [ “$[Values]” ], values: [ “SUM([Price])”, “RUNNING_SUM(SUM([Quantity]))” ], marks: [{ field: “$[Values]”, color: [ “$[Names]” ] }] }

QUERIES

a beautiful query

SLIDE 100

EXAMPLE

day on the x axis

PivotQuery { x: [ “DATETRUNC(‘day’, [Date])” ], y: [ “$[Values]” ], values: [ “SUM([Price])”, “RUNNING_SUM(SUM([Quantity]))” ], marks: [{ field: “$[Values]”, color: [ “$[Names]” ] }] }

SLIDE 101

EXAMPLE

values field on the y axis

PivotQuery { x: [ “DATETRUNC(‘day’, [Date])” ], y: [ “$[Values]” ], values: [ “SUM([Price])”, “RUNNING_SUM(SUM([Quantity]))” ], marks: [{ field: “$[Values]”, color: [ “$[Names]” ] }] }

SLIDE 102

EXAMPLE

sum of price and a running sum of quantity in values

PivotQuery { x: [ “DATETRUNC(‘day’, [Date])” ], y: [ “$[Values]” ], values: [ “SUM([Price])”, “RUNNING_SUM(SUM([Quantity]))” ], marks: [{ field: “$[Values]”, color: [ “$[Names]” ] }] }

SLIDE 103

EXAMPLE

a single layer so we’ve got one mark

PivotQuery { x: [ “DATETRUNC(‘day’, [Date])” ], y: [ “$[Values]” ], values: [ “SUM([Price])”, “RUNNING_SUM(SUM([Quantity]))” ], marks: [{ field: “$[Values]”, color: [ “$[Names]” ] }] }

SLIDE 104

EXAMPLE

defined by our values field

PivotQuery { x: [ “DATETRUNC(‘day’, [Date])” ], y: [ “$[Values]” ], values: [ “SUM([Price])”, “RUNNING_SUM(SUM([Quantity]))” ], marks: [{ field: “$[Values]”, color: [ “$[Names]” ] }] }

SLIDE 105

EXAMPLE

within that layer we want to see two distinct series each with its own color so we add names to our color channel

PivotQuery { x: [ “DATETRUNC(‘day’, [Date])” ], y: [ “$[Values]” ], values: [ “SUM([Price])”, “RUNNING_SUM(SUM([Quantity]))” ], marks: [{ field: “$[Values]”, color: [ “$[Names]” ] }] }

SLIDE 106

PivotQuery { x: [ “DATETRUNC(‘day’, [Date])” ], y: [ “$[Values]” ], values: [ “SUM([Price])”, “RUNNING_SUM(SUM([Quantity]))” ], marks: [{ field: “$[Values]”, color: [ “$[Names]” ] }] }
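
The example query can be written out as a typed TypeScript object. This is a sketch, not switch's actual API: the channels are marked optional here because the example leaves several out, filters and sorts are omitted for brevity, and the field helper is hypothetical (the slides write bare formula strings where a Field is expected).

```typescript
interface Field {
  formula: string;
}

interface Mark {
  field: Field;
  color?: Field[];
  size?: Field[];
  label?: Field[];
}

// Assumption: every channel is optional, since the example query omits some.
interface PivotQuery {
  column?: Field[];
  x?: Field[];
  row?: Field[];
  y?: Field[];
  values?: Field[];
  marks?: Mark[];
}

// Hypothetical helper: wraps a formula string in a Field.
const field = (formula: string): Field => ({ formula });

const chart: PivotQuery = {
  // day on the x axis
  x: [field("DATETRUNC('day', [Date])")],
  // the combined values field on the y axis
  y: [field("$[Values]")],
  // the two aggregates that get combined into $[Values]
  values: [field("SUM([Price])"), field("RUNNING_SUM(SUM([Quantity]))")],
  // one layer, with one colored series per name
  marks: [{ field: field("$[Values]"), color: [field("$[Names]")] }],
};
```

Typing the query this way means a misspelled channel name is caught by the compiler before the query is ever run.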

SLIDE 107

over time it becomes second nature

SLIDE 108

once we learn to speak the language our ability to quickly transform and visualize data is increased by 10x

SLIDE 109

challenge number three

SLIDE 110

CHALLENGE #3

our datasets are millions and billions of rows and growing

SLIDE 111

CHALLENGE #3

we can’t constantly move it around or try to materialize everything we might need to analyze ahead of time

SLIDE 112

CHALLENGE #3

we should work with our data as it exists in the places where it already lives

SLIDE 113

Introducing Processors

SLIDE 114

they’re the secret sauce

SLIDE 115

they make it possible for our data apps to take advantage of the high performance and massive scale of the databases we already have

SLIDE 116

they’re our database’s analytical co-pilots

SLIDE 117

what do we mean by that?

SLIDE 118