Molham Aref Reinventing the Database for AI July 19, 2019 1 We are - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

Molham Aref

Reinventing the Database for AI

July 19, 2019

slide-2
SLIDE 2

2

We are a mission-based team

Scientific Impact

Deep computer science and mathematical expertise from several technical communities:

  • Database systems and theory
  • Machine learning
  • Programming languages
  • Operations research

2K+ publications
90K+ citations (35K+ in last 5 years)
37+ award-winning papers (3 this year!)

AI and ML Industrial Impact

  • 42 core team members
  • 22 PhDs
  • 6 former professors
  • 16 faculty network
  • 4 AI/ML companies founded
  • $250M direct value created
  • $2B total value created

slide-3
SLIDE 3

3

The Case for Relational Artificial Intelligence

A New Technology Category

slide-4
SLIDE 4

4

Databases should be Relational

What if I tell you

slide-5
SLIDE 5

5

Navigational vs Relational

Not controversial, but it used to be

In the Navigational vs Relational DB wars of the 1980s, Navigational DBs were the incumbent and Relational DBs were the underdog!

slide-6
SLIDE 6

6

database

The Great Debate

slide-7
SLIDE 7

7

Navigational Relational

1974

slide-8
SLIDE 8

8

Navigational vs Relational, 1974

Charles Bachman (Navigational)

Weighing in with:
§ Turing Award for Databases
§ Integrated Data Store (IDS)
§ Illustrious career at GE and Honeywell
Argument:
§ Performance (it’s impossible to implement the relational model efficiently)
§ Programmers won’t get it (COBOL programmers can’t possibly understand relational languages)

Ted Codd (Relational)

Weighing in with:
§ Researcher at IBM
Argument:
§ Separation of the What from the How (argument for declarativity)
§ Domain experts will get it (and they are cheaper and more plentiful than programmers)

slide-9
SLIDE 9

9

SO WHO WON?

1974

slide-10
SLIDE 10

10

Oracle (formerly Relational Software, Inc.)

§ Launched RDBMS in 1979
§ IPO in 1986
§ Current Market Cap: $190.6B

slide-11
SLIDE 11

11

Ingres (formerly Relational Technology, Inc.)

§ Launched RDBMS in 1981
§ IPO’d in 1988 (sold prematurely to ASK in 1989)

INGRES

2,000,000 Shares

Relational Technology, Inc.

Common Stock

Of the 2,000,000 shares of Common Stock offered hereby, 1,500,000 shares are being sold by the Company and 500,000 shares are being sold by the Selling Stockholders. See "Principal and Selling Stockholders." The Company will not receive any of the proceeds from the sale of the shares by the Selling Stockholders. Prior to this offering, there has been no public market for the Common Stock of the Company. For the factors to be considered in determining the initial public offering price, see "Underwriting." See "Risk Factors" for a discussion of certain factors which should be considered by prospective purchasers of the Common Stock offered hereby.

THESE SECURITIES HAVE NOT BEEN APPROVED OR DISAPPROVED BY THE SECURITIES AND EXCHANGE COMMISSION NOR HAS THE COMMISSION PASSED UPON THE ACCURACY OR ADEQUACY OF THIS PROSPECTUS. ANY REPRESENTATION TO THE CONTRARY IS A CRIMINAL OFFENSE.

             Initial Public    Underwriting    Proceeds to     Proceeds to Selling
             Offering Price    Discount (1)    Company (2)     Stockholders (2)
Per Share    $14.00            $0.98           $13.02          $13.02
Total (3)    $28,000,000       $1,960,000      $19,530,000     $6,510,000

(1) The Company and the Selling Stockholders have agreed to indemnify the Underwriters against certain liabilities, including liabilities under the Securities Act of 1933.
(2) Before deducting estimated expenses of $714,920 payable by the Company and $240,806 payable by the Selling Stockholders.
(3) The Company has granted the Underwriters an option for 30 days to purchase up to an additional 300,000 shares at the initial public offering price per share, less the underwriting discount, solely to cover over-allotments. If such option is exercised in full, the total initial public offering price, underwriting discount and proceeds to Company will be $32,200,000, $2,254,000, and $23,436,000, respectively. See "Underwriting."

The shares are offered severally by the Underwriters, as specified herein, subject to receipt and acceptance by them and subject to their right to reject any order in whole or in part. It is expected that certificates for the shares will be ready for delivery at the offices of Goldman, Sachs & Co., New York, New York on or about May 24, 1988.

Goldman, Sachs & Co.    Robertson, Colman & Stephens

The date of this Prospectus is May 17, 1988.

MANAGEMENT

Executive Officers and Directors

The executive officers and directors of the Company and their ages as of March 31, 1988 are as follows:

Name                             Age   Position
Gary J. Morgenthaler             39    Chairman of the Board, Chief Executive Officer and Director
Paul E. Newton                   44    President, Chief Operating Officer and Director
Nicholas Birtles                 43    Vice President, International Operations
Robert Healy                     45    Vice President, Marketing
Lawrence A. Rowe                 39    Vice President, Advanced Development
P. Michael Seashols              42    Vice President, Sales and Marketing
William M. Smartt                45    Vice President, Finance and Administration and Chief Financial Officer
Martin J. Sprinzen               40    Vice President, Engineering
Eugene Wong                      53    Secretary
Robert C. Miller (1)             44    Director
Charles G. Moore (1) (2)         44    Director
Michael R. Stonebraker           44    Director
William H. Younger, Jr. (1) (2)  38    Director

(1) Member of the Compensation Committee
(2) Member of the Audit Committee

All directors hold office until the next annual meeting of stockholders of the Company and until their successors have been duly elected and qualified. Executive officers serve at the discretion of the Board. There are no family relationships among any of the directors and officers.

Mr. Morgenthaler, a founder of the Company, has served as Chief Executive Officer and Chairman of the Board of Directors of the Company since early 1987. He served as President and Chief Executive Officer from January 1984 to early 1987, and as Executive Vice President and Chief Operating Officer from October 1980 to January 1984. Mr. Morgenthaler has served as a director of the Company since its inception. Prior to founding the Company, he was a consultant with McKinsey & Company, Inc., a management consulting firm. Mr. Morgenthaler holds a B.A. from Harvard University.

Mr. Newton has served as President and Chief Operating Officer and a director of the Company since early 1987. Between 1968 and early 1987, Mr. Newton was employed in various positions by UCCEL Corporation, a computer services and software company. Between 1984 and 1986, Mr. Newton served as Senior Vice President and General Manager of Software at UCCEL. Mr. Newton holds a B.S. in physics and an M.S. in management from M.I.T.

Mr. Birtles joined the Company in 1984 as Managing Director of European Operations and became Vice President, International Operations in early 1986. Prior to his employment by the Company, Mr. Birtles was employed for 13 years by Comshare, a computer services company, most recently as its European Sales Director, where he was responsible for 12 sales offices in the United Kingdom.

Mr. Healy has served as Vice President, Marketing of the Company since April 1987. From 1983 to April 1987, Mr. Healy served as Senior Vice President, Marketing and International Sales of General Electric Software International, a software manufacturing and distribution company. Between 1969 and 1983, he held various positions, most recently, Division Vice President, Marketing, for Automatic Data Processing, Inc., an electronic data processing service company. Mr. Healy holds a B.S. in business administration from Upsala College.

slide-12
SLIDE 12

12 12

DB-Engines Ranking May 2019

The DB-Engines Ranking ranks database management systems according to their popularity. The ranking is updated monthly.

  • 1. Oracle (Relational DBMS)
  • 2. MySQL (Relational DBMS)
  • 3. Microsoft SQL Server (Relational DBMS)
  • 4. PostgreSQL (Relational DBMS)

RDBMS Popularity

slide-13
SLIDE 13

13 13

Analysts agree

13

slide-14
SLIDE 14

14 14

Why?

14

slide-15
SLIDE 15

15 15

Business Intelligence should be Relational

What if I tell you

15

slide-16
SLIDE 16

16 16

MOLAP vs ROLAP

Not controversial, but it used to be

In the Multidimensional (i.e. Tensor) vs Relational OLAP wars of the 1990s, MOLAP was the incumbent and ROLAP was the underdog!

slide-17
SLIDE 17

17

Tableau Software

§ Launched in 2002
§ IPO in 2013
§ Current Market Cap: $11.6B

slide-18
SLIDE 18

18 18

Analysts agree

18

slide-19
SLIDE 19

19 19

Why?

19

slide-20
SLIDE 20

20 20

Artificial Intelligence should be Relational

What if I tell you

20

slide-21
SLIDE 21

21 21

What if I tell you

No way!!
Relational systems are too slow!
Tensors and linear algebra are the way we’ve always done it

slide-22
SLIDE 22

22 22

Relational Artificial Intelligence is Inevitable

I am here to tell you

22

slide-23
SLIDE 23

23

Why?

Rest of the talk

slide-24
SLIDE 24

24 24

The Need for Speed

“We track about 47 different hardware startups that all have a unique approach” to accelerating AI.
  • Greg Brockman, CTO OpenAI, interviewed by Reid Hoffman, May 30, 2019

“13 private chip companies focused on the AI market have raised more than $1.2 billion in venture-capital funding”
  • Barron’s article “AI Chip Market Will Soar to $34 Billion in Five Years”, Feb 20, 2019

“Today the job of training machine learning models is limited by compute, if we had faster processors we’d run bigger models...in practice we train on a reasonable subset of data that can finish in a matter of months. We could use improvements of several orders of magnitude – 100x or greater.”
  • Greg Diamos, Senior Researcher, SVAIL, Baidu, from EE Times, September 27, 2016

slide-25
SLIDE 25

25 25

AI’s biggest challenges are computational!

ACCURACY
Search for better
▪ Parameters
▪ Hyper parameters
▪ Features
▪ Models
Don’t make assumptions that you don’t need to make (e.g. i.i.d. assumption)

ROBUSTNESS
Many “big data” problems are really a big collection of small data problems
Overcome challenges with small, incomplete, and dirty data problems by incorporating prior knowledge and expertise

INTERPRETABILITY
Searching for models that are accurate and interpretable is harder than searching for accurate models
Interpretation in terms of prior knowledge and in language & ontology that humans understand

VERSATILITY
Reasoning and (generalized) inference: From observations to unknowns in any time period
Inference of any property in the model (e.g., it’s just as easy to infer price from sales as it is to infer sales from price)

FAIRNESS
It’s not enough to exclude gender, ethnicity, race, age, etc. as features to the models. Other features might be correlated.
Prejudice is a computational limitation: Reasoning about each person vs reasoning about the group

EXPLAINABILITY
Explainability is typically implemented via separate shadow models that have to be learned
Explanation in terms of prior knowledge and in language & ontology that humans understand

CAUSALITY
Understanding causality beyond A/B testing
Computationally very expensive

SELF-SUPERVISION
“The future will be self-supervised” (Yann LeCun)
Build models of the world by observing it and searching model space for the models that have the most explanatory power

slide-26
SLIDE 26

26 26

The Path to Performance: Brawn

Constant factors – Do same amount of work faster (i.e., brawn)

▪ Latency hiding: Memory hierarchy and network latencies (e.g., in memory and near-data computing)
▪ Parallelization: SIMD, multi-core, accelerators (e.g., GPU, TPU, FPGA)
▪ Specialization: Specialize for workload (e.g., JIT compilation), specialize for data

slide-27
SLIDE 27

27 27

The Path to Performance: Brains and Brawn

Asymptotics – Do less work (i.e., brains)

  • Specialize algorithm by exploiting problem structure
      • Algebraic (e.g., groups, semirings, rings)
      • Combinatorial (e.g., fractional hypertree width)
      • Statistical (e.g., samples and sketches)
      • Geometric (e.g., fast multipole method)
  • Solve similar but more tractable problem
      • Approximation (with error bars)

slide-28
SLIDE 28

28

Brains

Do Less Work

slide-29
SLIDE 29

29 29 29

The relational model dominates data management

▪ The last 40 years have witnessed massive adoption of the relational model
▪ It’s hard to find any examples today of enterprises whose data isn’t in a relational database
▪ Millions of human hours invested in building relational models and populating them with data
▪ Relational databases are rich with knowledge of the underlying domains that they model
▪ The availability and accuracy of large amounts of curated data has made it possible for humans (BI) and machines (AI) to learn from the past and to predict the future

slide-30
SLIDE 30

30 30

What’s the first thing we do when we build predictive models?

[Figure: a feature extraction query turns the database into a feature matrix with columns ID, x1, x2, x3, ..., y (features) and one row per example]

We work hard to throw away all relational structure (and semi-structure) we worked so hard to build
We end up throwing away important domain knowledge that can help us build better AI models

slide-31
SLIDE 31

31 31

The wastefulness does not end there

[Figure: matrix of training samples by features, with zero filling and one-hot encoded features]

slide-32
SLIDE 32

32 32

The wastefulness does not end there

[Figure: matrix of training samples by features, with one-hot encoded features]

Revisit from first principles

  • Avoid materializing the join
  • Avoid filling in the zeros
  • Avoid one-hot encoding
  • Exploit relational structures to speed up learning
  • Ideally, train models faster than the time it takes to produce the query output in the first place!
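To make "avoid one-hot encoding" and "avoid filling in the zeros" concrete, here is a minimal sketch in Python. It assumes two hypothetical categorical columns (`colors`, `sizes`, with made-up values) and shows that the products a learner would sum over one-hot encoded columns are just group-by counts, which can be stored sparsely. It illustrates the principle only, not the system described in the talk.

```python
from collections import Counter

# Hypothetical categorical columns of a tiny training set (illustrative values).
colors = ["red", "blue", "red", "green", "red"]
sizes = ["S", "M", "S", "S", "M"]

def cooccurrence_dense(a, b):
    """Baseline: one-hot encode both columns, then sum products row by row."""
    cats_a, cats_b = sorted(set(a)), sorted(set(b))
    hot_a = [[int(v == c) for c in cats_a] for v in a]  # mostly zeros
    hot_b = [[int(v == c) for c in cats_b] for v in b]
    return {
        (ca, cb): sum(ra[i] * rb[j] for ra, rb in zip(hot_a, hot_b))
        for i, ca in enumerate(cats_a)
        for j, cb in enumerate(cats_b)
    }

def cooccurrence_group_by(a, b):
    """Same numbers as a sparse group-by count; zeros are never stored."""
    return Counter(zip(a, b))

dense = cooccurrence_dense(colors, sizes)
sparse = cooccurrence_group_by(colors, sizes)
print(len(dense), "dense entries vs", len(sparse), "sparse entries")
```

The dense version stores every (category, category) pair, including pairs that never occur; the group-by version stores only the pairs that appear in the data.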

slide-33
SLIDE 33

33 33

What would a database do?

  • 1. Database
  • 2. Feature extraction query
  • 3. Model specification (e.g., “degree 2 ridge regression”)

s: Sufficient statistics generated from model spec and feature extraction query. Computed via aggregations

[Figure: feature matrix with columns ID, x1, x2, x3, ..., y (features) and one row per example]
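The idea of computing sufficient statistics via aggregations can be sketched in a few lines of Python. The relations `sales(store, units)` and `stores(store, area)`, their values, and all function names below are hypothetical; the sketch assumes `store` is a key of `stores`. For a simple regression of `units` on `area`, the statistics (count, Σx, Σx², Σxy, Σy) over the join can be assembled from per-key partial sums, so the joined design matrix is never built.

```python
from collections import defaultdict

# Hypothetical toy relations (names and values are illustrative only).
sales = [("s1", 10.0), ("s1", 12.0), ("s2", 7.0), ("s2", 9.0), ("s2", 8.0)]
stores = [("s1", 2.0), ("s2", 1.0)]  # (store, area); store is a key

def stats_materialized(sales, stores):
    """Baseline: build the joined rows (area, units), then aggregate them."""
    area_of = dict(stores)
    rows = [(area_of[s], u) for s, u in sales if s in area_of]
    return (
        len(rows),
        sum(x for x, _ in rows),
        sum(x * x for x, _ in rows),
        sum(x * y for x, y in rows),
        sum(y for _, y in rows),
    )

def stats_factorized(sales, stores):
    """Same aggregates from per-key partial sums; the join is never built."""
    count = defaultdict(int)
    sum_units = defaultdict(float)
    for s, u in sales:
        count[s] += 1
        sum_units[s] += u
    n = sx = sxx = sxy = sy = 0.0
    for s, area in stores:
        n += count[s]
        sx += count[s] * area
        sxx += count[s] * area * area
        sxy += area * sum_units[s]
        sy += sum_units[s]
    return n, sx, sxx, sxy, sy

def fit_line(n, sx, sxx, sxy, sy):
    """Least-squares slope and intercept from the sufficient statistics alone."""
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

print(fit_line(*stats_factorized(sales, stores)))
```

Once the statistics are computed, fitting (or refitting with different hyperparameters) touches only these few numbers, not the data.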

slide-34
SLIDE 34

34 34

Number of Aggregates Varies By Model Class

■ Supervised

  • Regression

    Model                     # features    # params    # aggregates
    Linear regression         n             n + 1       Θ(n^2)
    Polynomial regression     Θ(n^d)        Θ(n^d)      Θ(n^(2d))
    Factorization machines    Θ(n^d)        Θ(nr)       Θ(n^(2d))

    n: # input features, d: degree, r: rank

  • Classification

    Model             # features    # aggregates
    Decision trees    Θ(n)          Θ(nbh)

    b: branching factor, h: depth (data-dependent)

■ Unsupervised

    Model      # aggregates
    K-means    Θ(kn)
    PCA        Θ(kn^2)

    k: # clusters
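One way to see where quadratic growth in the number of aggregates comes from is a counting argument, sketched below in Python. It assumes each aggregate is the sum of one distinct monomial and that a degree-d polynomial model needs all monomials of degree at most 2d (products of two features of degree at most d). The function names are hypothetical, and whether the response and the intercept are counted as variables is an assumption that only changes constants.

```python
from math import comb

def num_monomials(n_vars, max_degree):
    """Number of distinct monomials of degree <= max_degree in n_vars variables."""
    return comb(n_vars + max_degree, max_degree)

def num_aggregates(n_vars, degree=1):
    """Distinct sums needed for a degree-`degree` polynomial regression (assumed model)."""
    return num_monomials(n_vars, 2 * degree)

for n in (10, 100, 1000):
    # Linear regression (degree 1) grows quadratically in n.
    print(n, num_aggregates(n, degree=1), num_aggregates(n, degree=2))
```

For degree 1 the count is (n+1)(n+2)/2, which is the number of distinct entries of a symmetric (n+1)×(n+1) matrix.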

slide-35
SLIDE 35

35 35

We Efficiently Compute Those Aggregates

[Figure: product hierarchy: All Products > Department > Class > Sub-class > Style > Item]

slide-36
SLIDE 36

36 36

Case Study: Retail dataset

36

slide-37
SLIDE 37

37 37

Case Study: Retail dataset

Relation        Cardinality (# Tuples)    Degree (# k/v columns)    File size (csv)
Inventory       84,055,817                3 & 1                     2 GB
Items           5,618                     1 & 4                     129 KB
Stores          1,317                     1 & 14                    139 KB
Demographics    1,302                     1 & 15                    161 KB
Weather         1,159,457                 2 & 6                     33 MB

Total: 2.1 GB

slide-38
SLIDE 38

38 38

Case Study: Retail dataset – PostgreSQL & TensorFlow

▪ The design matrix is constructed by joining together all the relations
▪ Train a linear regression model to predict sales by item, store, date from all the other features

Cardinality (# of tuples)           84,055,817
Degree (# of columns)               44 (3 & 41)
Size                                23 GB
Time to compute in PostgreSQL       217 secs
Time to export from PostgreSQL      373 secs
Time to learn parameters with GD    > 12,000 secs

slide-39
SLIDE 39

39 39

Case Study: Retail dataset - comparison

Design matrix (DM) with PostgreSQL/TensorFlow vs relationalAI

                              PostgreSQL/TensorFlow       relationalAI
                              Time             Size       Time         Size
Original                      -                2.1 GB     -            2.1 GB
Join Tables                   217 secs         23 GB      -            -
Export DM                     373 secs         23 GB      -            -
Aggregate                     -                -          18 secs      37 KB
Parameter learning with GD    > 12 K secs      -          0.5 secs     -
Total                         > 12.5 K secs               18.5 secs

Improvement (1st Model): > 676x faster, 11x smaller
Every model after: > 24,000x faster
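The headline ratios can be re-derived from the timings quoted on the slide. The short Python check below uses only those numbers; the variable names are illustrative, and since the training time is given as a lower bound (> 12,000 secs), the resulting ratios are lower bounds too.

```python
# Timings quoted on the slide, in seconds.
join_s, export_s, train_s = 217, 373, 12_000   # PostgreSQL + TensorFlow pipeline
aggregate_s, learn_s = 18, 0.5                 # aggregate-based pipeline

baseline_total = join_s + export_s + train_s   # a bit above the slide's rounded 12.5 K
relational_total = aggregate_s + learn_s       # 18.5

# First model: both pipelines pay their full cost (slide uses the rounded 12.5 K).
first_model_speedup = 12_500 / relational_total

# Later models reuse the aggregates, so only parameter learning is repeated.
later_model_speedup = train_s / learn_s

# Size: joined design matrix (23 GB) vs original relations (2.1 GB).
size_ratio = 23 / 2.1

print(round(first_model_speedup), later_model_speedup, round(size_ratio))
```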

slide-40
SLIDE 40

40 40

Does it work for all model classes or methods?

Supported methods include (with more on the way)

▪ Linear regression
▪ Polynomial regression
▪ Factorization machines
▪ Decision trees
▪ Linear SVM
▪ Deep sum-product networks
▪ Naive Bayes Classifier (discrete case)
▪ Hidden Markov Model (discrete case)
▪ K-Means & K-Median clustering
▪ Gaussian Discriminant Analysis
▪ Linear Discriminant Analysis
▪ Principal component analysis
▪ Frequent item set mining (with Apriori algorithm)
▪ Computing empirical mutual information and entropy

slide-41
SLIDE 41

41 41

So what?

Some context:

Moore’s Law gives us 2x speedup every 1.5 years
According to Nvidia, GPUs give us a 2-10X speed-up over CPUs

In other words, GPUs give us ~5 year advantage

slide-42
SLIDE 42

42 42

So what?

What are the implications of 2-3 orders of magnitude speed-up?

256x is 8 doublings (i.e., 2^8)
1024x is 10 doublings

slide-43
SLIDE 43

43 43

So what?

What are the implications of 2-3 orders of magnitude speed-up?

256x is 8 doublings (i.e., 2^8)
1024x is 10 doublings (i.e., 2^10)

Algorithms that exploit the domain structure give us a 12-15 YEAR ADVANTAGE
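The "year advantage" figures follow from the slide's own assumption of one 2x speedup every 1.5 years. A small Python sketch of that arithmetic (the function name is illustrative):

```python
import math

YEARS_PER_DOUBLING = 1.5  # the Moore's Law rate assumed on the slide

def years_of_advantage(speedup):
    """Years of 2x-per-1.5-years progress needed to match a given speedup."""
    return math.log2(speedup) * YEARS_PER_DOUBLING

print(years_of_advantage(10))    # upper end of the quoted GPU range, roughly 5 years
print(years_of_advantage(256))   # 8 doublings
print(years_of_advantage(1024))  # 10 doublings
```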

slide-44
SLIDE 44

44 44

AI’s biggest challenges are computational!


slide-45
SLIDE 45

45

Statistical Relational Learning

Relational generative models

slide-46
SLIDE 46

46 46

What else do we throw away when we build the feature matrix?

[Figure: a feature extraction query produces a feature matrix with columns ID, x1, x2, x3, ..., y (features) and one row per example]

Translation to feature matrix assumes each entity is independent of the others (i.i.d. assumption)
This is often not true, e.g. related SKUs or related people

slide-47
SLIDE 47

47 47

What if we don’t make the i.i.d. assumption?

[Figure: feature matrices (ID, x1, x2, x3, ..., y) with features over pairs of entities]

slide-48
SLIDE 48

48 48

What if we don’t make the i.i.d. assumption?

[Figure: many feature matrices (ID, x1, x2, x3, ..., y), with features over all entities]

slide-49
SLIDE 49

49 49

Statistical Relational Learning

■ Statistical Relational models generalize PGMs in the same way that first order logic generalizes propositional logic: they allow us to quantify over individuals/entities

  • Allows for generalization (e.g. item, sub-class, class, dept, etc.)
  • Ability to predict link-based patterns (e.g. inter-item dependencies at sub-class, class, dept, etc.)
  • Models a varied number of observations for each object/relation (e.g. friends, colleagues, etc.)

■ Variants

  • MLN in various flavors, PSL, RDN, BoostSRL, ProbLog, etc.

slide-50
SLIDE 50

50 50

Statistical Relational Learning

■ Inference

  • Unlike “traditional” methods where prediction is the input applied to the parameters of the model class, inference in SRL requires expensive optimization or (approximate) integration over possible worlds

■ Learning

  • Unlike traditional learning algorithms, just one instance to learn from (the relational DB)
  • Structure learning uses inference during each step

slide-51
SLIDE 51

51 51

Slide and example thanks to Pedro Domingos

51

slide-52
SLIDE 52

52 52

CERTAIN KNOWLEDGE WITH INTEGRITY CONSTRAINTS

A logical Knowledge Base is a set of Integrity Constraints that define a set of possible worlds:

    person(x)
    smokes(x) -> person(x)
    cancer(x) -> person(x)
    friends(x, y) -> person(x), person(y)

slide-53
SLIDE 53

53 53

CERTAIN KNOWLEDGE WITH INTEGRITY CONSTRAINTS

A logical Knowledge Base is a set of Integrity Constraints that define a set of possible worlds:

    person(x)
    smokes(x) -> person(x)
    cancer(x) -> person(x)
    friends(x, y) -> person(x), person(y)

Smoking causes cancer
Friends have similar smoking habits

slide-54
SLIDE 54

54 54

CERTAIN KNOWLEDGE WITH INTEGRITY CONSTRAINTS

A logical Knowledge Base is a set of Integrity Constraints that define a set of possible worlds:

    person(x)
    smokes(x) -> person(x)
    cancer(x) -> person(x)
    friends(x, y) -> person(x), person(y)

    w1  smokes(x) -> cancer(x)                   (Smoking causes cancer)
    w2  smokes(x), friends(x, y) -> smokes(y)    (Friends have similar smoking habits)
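A brute-force sketch of what the weighted rules mean, in the style of a Markov logic network: each possible world gets a weight exp(w1·n1 + w2·n2), where n1 and n2 count the satisfied groundings of the two rules, and probabilities are ratios of summed weights. The two people, the friendship facts, and the weight values below are made up for illustration; the hard constraints hold trivially because the domain contains only people. Enumerating all worlds is exactly the expensive step that real systems avoid.

```python
import math
from itertools import product

PEOPLE = ["anna", "bob"]                       # hypothetical domain
FRIENDS = {("anna", "bob"), ("bob", "anna")}   # evidence, held fixed
W1, W2 = 1.5, 1.1                              # illustrative rule weights

def implies(a, b):
    return (not a) or b

def world_weight(smokes, cancer):
    """exp(w1 * n1 + w2 * n2), with n_i the number of satisfied groundings of rule i."""
    n1 = sum(implies(smokes[x], cancer[x]) for x in PEOPLE)
    n2 = sum(
        implies(smokes[x] and (x, y) in FRIENDS, smokes[y])
        for x in PEOPLE
        for y in PEOPLE
    )
    return math.exp(W1 * n1 + W2 * n2)

def worlds():
    """Every truth assignment to smokes(.) and cancer(.) over the domain."""
    k = len(PEOPLE)
    for bits in product([False, True], repeat=2 * k):
        yield dict(zip(PEOPLE, bits[:k])), dict(zip(PEOPLE, bits[k:]))

def probability(query, given=lambda smokes, cancer: True):
    """P(query | given), computed by summing weights over possible worlds."""
    num = den = 0.0
    for smokes, cancer in worlds():
        if not given(smokes, cancer):
            continue
        w = world_weight(smokes, cancer)
        den += w
        if query(smokes, cancer):
            num += w
    return num / den

p_marginal = probability(lambda s, c: c["bob"])
p_given_friend_smokes = probability(lambda s, c: c["bob"], lambda s, c: s["anna"])
print(p_marginal, p_given_friend_smokes)
```

With positive weights, learning that Anna smokes raises the probability that her friend Bob has cancer, even though no rule links the two directly.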

slide-55
SLIDE 55

55 55

How do you make this tractable?

  • Approximate answer by converting into convex continuous optimization problem
  • Exploit group symmetry -> lifted inference and approximate lifted inference
  • Avoid grounding altogether -> in-database learning
  • Leveraging database semantics to avoid having to cluster -> in-database SPNs

Stay tuned

slide-56
SLIDE 56

56

Brawn

Do same amount of work faster

slide-57
SLIDE 57

57 57

The Path to Performance: Brawn

57

slide-58
SLIDE 58

58 58

Motivation for implementation strategy

and 💱

3 to 5 years building something similar in prior lives using C++ without ability to specialize for queries or data sets

58

slide-59
SLIDE 59

59 59

Julia in a nutshell

“Looks like Python, feels like LISP, runs like C”

Julia is fast, dynamic, optionally typed, and multi-dispatched

■ Feels like Lisp: Hygienic macros, code quoting, generated functions
■ Runs like C: Specialization based on type inference, inlining, unboxing, LLVM to gen assembly

Source code -> (parse) -> Julia AST -> (lower) -> Julia IR -> (compile) -> LLVM IR -> (compile) -> Machine code -> (execute)

slide-60
SLIDE 60

60 60

Brains and Brawn: Systems Programming in Julia

§ **Specialization**
    § Query evaluation: Just-in-time compiled query plans
§ Specialization
    § Data types: e.g., fixed-precision decimals

slide-61
SLIDE 61

61 61

Just-in-Time Query Compilation

▪ Query compilation has only recently replaced interpretation in modern database systems
▪ But, state of the practice is surprisingly primitive
  • Typically: variations on template expansion in C/C++
  • Ad-hoc methods to generate code: e.g., write a text file and invoke gcc
  • Cumbersome engineering effort
▪ Better: use a language with proper staged metaprogramming support
  • e.g., LegoDB using Scala/LMS/Squid
▪ Julia is very appealing from this point of view!

Query:

    select A, B, C from R, S, T where … group by …

Compiled to machine code:

    pushq %rbp
    movq %rsp, %rbp
    testq %rdi, %rdi
    negq %rdi
    movq %rdi, %rax
    …

slide-62
SLIDE 62

62 62

Simplified TPC-H Q1: from SQL to Julia to Native Code

SQL:

    select sum(l_extprice * (100 - l_discount) * (100 + l_tax))
    from lineitem

From SQL to Julia with runtime code generation:

    sum = 0
    for i in 1:size
        sum += l_extprice[i] * (100 - l_discount[i]) * (100 + l_tax[i])
    end
    return sum

From Julia to LLVM to optimized x86-64 (*):

        testq %rcx, %rcx
        jle L71
        movq (%rdi), %r8
        movq (%rsi), %r9
        movq (%rdx), %r10
        xorl %edi, %edi
        xorl %eax, %eax
    L32:
        movl $100, %esi
        subq (%r9,%rdi,8), %rsi
        movq (%r10,%rdi,8), %rdx
        addq $100, %rdx
        imulq (%r8,%rdi,8), %rsi
        imulq %rdx, %rsi
        addq %rsi, %rax
        addq $1, %rdi
        cmpq %rdi, %rcx
        jne L32
        retq
    L71:
        xorl %eax, %eax
        retq

(*) The loop actually even gets vectorized, but we produced simpler code here for presentation purposes
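The first step, turning a query into specialized source code at run time, can be imitated in Python as a rough analogy. The sketch below generates the source of a loop for `select sum(<expr>) from <table>` and compiles it with the built-in `compile` and `exec`. The helper name `compile_sum_query`, the column-dictionary layout, and the sample values are assumptions made for illustration; unlike the Julia pipeline on the slide, this yields Python bytecode rather than native code, and `exec` must never be fed untrusted input.

```python
def compile_sum_query(expr, column_names):
    """Generate and compile a loop specialized to one aggregate expression."""
    targets = ", ".join(column_names)
    sources = ", ".join(f"columns[{name!r}]" for name in column_names)
    source = (
        "def query(columns):\n"
        "    total = 0\n"
        f"    for {targets}, in zip({sources}):\n"
        f"        total += {expr}\n"
        "    return total\n"
    )
    namespace = {}
    # Runtime code generation: the query text becomes a real function object.
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["query"], source

# Hypothetical column store with integer (scaled) values, as on the slide.
lineitem = {
    "l_extprice": [1000, 2000],
    "l_discount": [5, 10],
    "l_tax": [8, 0],
}

q1, generated_source = compile_sum_query(
    "l_extprice * (100 - l_discount) * (100 + l_tax)",
    ["l_extprice", "l_discount", "l_tax"],
)
print(generated_source)
result = q1(lineitem)
print(result)
```

The generated function contains no interpretation of a query plan at run time: the expression is baked into the loop body, which is the property the slide's Julia version exploits before handing the loop to LLVM.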

slide-63
SLIDE 63

63 63

BI benchmark: vs Tableau/Hyper and Databricks Spark

63

Spark numbers based on Databricks hardware and TPCH setup. Snowflake benchmarks closer to Spark than Hyper.

slide-64
SLIDE 64

64 64

Brains and Brawn Together: 3-Clique Graph benchmark vs Databricks Spark

64

All benchmarks run on 1 core laptop.

slide-65
SLIDE 65

65 65

Brains and Brawn: Systems Programming in Julia

§ Specialization
    § Query evaluation: Just-in-time compiled query plans
§ **Specialization**
    § Data types: e.g., fixed-precision decimals

slide-66
SLIDE 66

66 66

Abstraction without regret by example: Fixed-precision decimals

Fixed-precision decimals are an important data type in database systems (e.g., for currencies), and avoid the inexact representation problems of floats:

    julia> 0.3333 + 0.33333
    0.6666300000000001  # oops

The Julia ecosystem has a FixedPointDecimals package for this purpose:

    julia> T = FixedDecimal{Int64,5}
    FixedDecimal{Int64,5}

    julia> T(0.3333) + T(0.33333)
    FixedDecimal{Int64,5}(0.66663)  # much better!

But… is this really going to be efficient enough? (Most database systems need special code to “compile away” fixed precision decimal operations into simple operations on integers…)
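The representation behind such a type is simple: store the value multiplied by 10^scale as an integer, so that addition is plain integer addition. A minimal Python sketch of that idea follows; the class and method names are hypothetical and this is not the API of the Julia package. Python will not compile the abstraction away, so the sketch shows only the representation, not the performance claim.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class FixedDecimal:
    raw: int     # the value multiplied by 10**scale, stored as an integer
    scale: int   # number of digits after the decimal point

    @classmethod
    def parse(cls, text, scale):
        """Build from a decimal string, rejecting digits beyond the scale."""
        scaled = Decimal(text).scaleb(scale)
        if scaled != scaled.to_integral_value():
            raise ValueError(f"{text} needs more than {scale} decimal places")
        return cls(int(scaled), scale)

    def __add__(self, other):
        if self.scale != other.scale:
            raise TypeError("scales must match")
        # The whole operation is one integer addition.
        return FixedDecimal(self.raw + other.raw, self.scale)

    def __str__(self):
        return str(Decimal(self.raw).scaleb(-self.scale))

a = FixedDecimal.parse("0.3333", 5)
b = FixedDecimal.parse("0.33333", 5)
total = a + b
print(total)
```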

slide-67
SLIDE 67

Here’s the FixedDecimal datatype and its addition operation…

    struct FixedDecimal{T <: Integer, f} <: Real
        i::T
        function Base.reinterpret(::Type{FixedDecimal{T, f}}, i::Integer) where {T, f}
            n = max_exp10(T)
            if f >= 0 && (n < 0 || f <= n)
                new{T, f}(i % T)
            else
                _throw_storage_error(f, T, n)
            end
        end
    end

    +(x::FixedDecimal{T, f}, y::FixedDecimal{T, f}) where {T, f} = reinterpret(FD{T, f}, x.i+y.i)

… and lo, the Julia compiler produces a tiny # of ops on integers, just as required!

    julia> @code_native +(T(0.3333),T(0.33333))
    decl %eax
    movl (%esi), %eax
    decl %eax
    addl (%edi), %eax
    retl

Moreover, this will be inlined at the call site in any practical example!

slide-68
SLIDE 68

68 68

■ What about Parallelization and Accelerators?

68

slide-69
SLIDE 69

69

Closing

One more time

slide-70
SLIDE 70

70 70

AI’s biggest opportunities are relational!

ACCURACY
Search for better
▪ Parameters
▪ Hyper parameters
▪ Features
▪ Models
Don’t make assumptions that you don’t need to make (e.g. i.i.d. assumption)

ROBUSTNESS
Many “big data” problems are really a big collection of small data problems
Overcome challenges with small, incomplete, and dirty data problems by incorporating prior knowledge and expertise

INTERPRETABILITY
Searching for models that are accurate and interpretable is harder than searching for accurate models
Interpretation in terms of prior knowledge and in language & ontology that humans understand

VERSATILITY
Reasoning and (generalized) inference: From observations to unknowns in any time period
Inference of any property in the model (e.g., it’s just as easy to infer price from sales as it is to infer sales from price)

FAIRNESS
It’s not enough to exclude gender, ethnicity, race, age, etc. as features to the models. Other features might be correlated.
Prejudice is a computational limitation: Reasoning about each person vs reasoning about the group

EXPLAINABILITY
Explainability is typically implemented via separate shadow models that have to be learned
Explanation in terms of prior knowledge and in language & ontology that humans understand

CAUSALITY
Understanding causality beyond A/B testing
Computationally very expensive

SELF-SUPERVISION
“The future will be self-supervised” (Yann LeCun)
Build models of the world by observing it and searching model space for the models that have the most explanatory power

slide-71
SLIDE 71

71 71

AI investment is focused on consumer AI

  • Deep learning for images, speech, text → not relational data (yet)

Weaknesses of implementations of relational data management systems

  • Abstraction leads to regret
  • Can guarantee correct answer but can’t guarantee optimal path to get there
  • Limitations on expressiveness, i.e. I can’t always ask the question I want to ask

Inertia: we have something that (sort of) works and we’re getting by. “You can’t expect us to rewrite all this code and retrain all those data scientists and programmers.”

  • The number of models that haven’t been built is >>> the number of models that have
  • The number of future modelers is >>> the number of current modelers
  • The number of domain experts is >>> the number of modelers and data scientists

Why hasn’t this happened yet?

slide-72
SLIDE 72

72 72

■ We invented a new generation of (meta) algorithms that provide optimal solutions to large problem classes

  • Orders of magnitude (OOM) more power for orders of magnitude better intelligence

■ New generation of compilers that eliminate the cost of abstraction

  • Allow us to specialize for workload
  • Allow us to specialize for datasets

■ Backlash against Hadoop (Map-Reduce), NoSQL, ML Frameworks – “the emperor has no clothes” is in the air

  • Require you to sell your soul for scalability and/or performance
  • Harder to program and operate

Why Now?

slide-73
SLIDE 73

73 73

We built a system that gives you abstraction without regret.

How are we going to do that?

  • Constant factors
  • Asymptotic factors

We’re going to meet people where they are:

  • Tables and SQL if you are an analyst
  • Tensors & Linear Algebra if you are a data scientist
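One way to read these two bullets is that a single computation can be phrased in either vocabulary. The sketch below is only an illustration under that assumption, not the system's interface: `matmul_relational` and `matmul_dense` are hypothetical names, and the sparse matrices are stored as plain Python dictionaries standing in for tables. It shows a matrix product written once as a join followed by a grouped sum (the analyst's view) and once as a nested-list product (the data scientist's view).

```python
from collections import defaultdict


def matmul_relational(A, B):
    """Multiply sparse matrices stored as relations {(row, col): value}.

    Mirrors the SQL shape:
        SELECT A.i, B.k, SUM(A.v * B.v)
        FROM A JOIN B ON A.j = B.j
        GROUP BY A.i, B.k
    """
    b_by_row = defaultdict(list)      # index B on the join column j
    for (j, k), v in B.items():
        b_by_row[j].append((k, v))
    C = defaultdict(float)
    for (i, j), a in A.items():       # join on j, then aggregate by (i, k)
        for k, b in b_by_row.get(j, ()):
            C[(i, k)] += a * b
    return dict(C)


def matmul_dense(A, B):
    """The same product in the linear-algebra view, on nested lists."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]
```

The relational version only touches stored (non-zero) entries, which is why the join-and-aggregate formulation is a natural fit for sparse data.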

We’re going to simplify and consolidate analytics:

  • The building blocks for next gen AI (e.g. fast aggregation, factoring, multi-way evaluation, JIT, accelerators) are also the building blocks for all enterprise analytics: BI, graphs, rules, planning, mathematical optimization.

We’re going to stage it. We’re going to consolidate and checkpoint our gains as we go.

  • AutoML (with automatic feature engineering and relational statistics) -> Data scientist
  • Data Management Systems for Analytics (aka data lakes) -> Data scientist
  • Business Intelligence & Data Warehouses -> Analyst & End User

What are we doing about it?

slide-74
SLIDE 74

74 74

Product: Never have to start from scratch again

  • Analytics
  • Data integration and federation
  • Operational

Data

  • General: e.g. Weather, Events, Consumer, Sentiment
  • Domain and industry specific: e.g. securities, crypto currencies
  • Competitor: e.g. price

Engine

  • Database
  • AI and Analytics

Tools

  • Data scientists: Notebooks (e.g. Jupyter)
  • Domain modelers: ontology editors (e.g. Jupyter, NORMA, Protégé)
  • Analysts: e.g. BI and spreadsheets

Templates

  • Industry: retail, financial services, technology & software.
  • Problem class: (product) knowledge graphs, recommender systems, anomaly detection, portfolio optimization

slide-75
SLIDE 75

75

References

Incomplete list

slide-76
SLIDE 76

76 76

▪ Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems. Ngo. (Gems of PODS 2018)
▪ Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems. Ngo, Porat, Ré, Rudra. (Journal of the ACM 2018)
▪ What do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog have to do with one another? Abo Khamis, Ngo, Suciu. (PODS 2017 - Invited to Journal of ACM)
▪ Computing Join Queries with Functional Dependencies. Abo Khamis, Ngo, Suciu. (PODS 2017)
▪ Joins via Geometric Resolutions: Worst-case and Beyond. Abo Khamis, Ngo, Ré, Rudra. (PODS 2015 - Invited to TODS 2015)
▪ Beyond Worst-Case Analysis for Joins with Minesweeper. Abo Khamis, Ngo, Ré, Rudra. (PODS 2014)
▪ Leapfrog Triejoin: A Simple Worst-Case Optimal Join Algorithm. Veldhuizen. (ICDT 2014 - Best Newcomer)
▪ Skew Strikes Back: New Developments in the Theory of Join Algorithms. Ngo, Ré, Rudra. (Invited to SIGMOD Record 2013)
▪ Worst Case Optimal Join Algorithms. Ngo, Porat, Ré, Rudra. (PODS 2012 - Best Paper)

Underlying magic: Worst-case optimal join algorithms
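To make the idea behind these papers concrete, here is a minimal sketch of an attribute-at-a-time ("generic join" style) evaluation of the triangle query Q(a, b, c) = R(a, b) ⋈ S(b, c) ⋈ T(a, c). It is an illustration only, not the Leapfrog Triejoin implementation or this system's algorithm: it uses hash sets where Leapfrog Triejoin uses sorted iterators, and the helper names `index`, `intersect`, and `triangles` are hypothetical. The key point it shows is that each variable is bound by intersecting all relations that mention it, scanning the smaller side, instead of joining two relations at a time and materializing a possibly huge intermediate result.

```python
from collections import defaultdict


def index(pairs):
    """Build a hash index: first column -> set of second-column values."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx


def intersect(left, right):
    """Intersect two sets or key views by scanning the smaller one."""
    small, large = (left, right) if len(left) <= len(right) else (right, left)
    return [v for v in small if v in large]


def triangles(R, S, T):
    """Return sorted (a, b, c) with (a, b) in R, (b, c) in S, (a, c) in T."""
    r_by_a, s_by_b, t_by_a = index(R), index(S), index(T)
    out = []
    for a in intersect(r_by_a.keys(), t_by_a.keys()):      # bind a
        for b in intersect(r_by_a[a], s_by_b.keys()):      # bind b given a
            for c in intersect(s_by_b[b], t_by_a[a]):      # bind c given a, b
                out.append((a, b, c))
    return sorted(out)


R = [(1, 2), (1, 3), (2, 3)]
S = [(2, 3), (3, 1)]
T = [(1, 3), (2, 1)]
print(triangles(R, S, T))  # prints [(1, 2, 3), (2, 3, 1)]
```

A pairwise plan would first compute R ⋈ S in full and only then filter by T; the variable-at-a-time plan never builds that intermediate relation.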

slide-77
SLIDE 77

77 77

Underlying magic: Optimal query plans for worst-case optimal joins

▪ Juggling Functions Inside a Database. Abo Khamis, Ngo, Suciu. (Invited to SIGMOD Record)
▪ On Functional Aggregate Queries with Additive Inequalities. Abo Khamis, Curtin, Moseley, Ngo, Nguyen, Olteanu, Schleich. (PODS 2019)
▪ What do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog have to do with one another? Abo Khamis, Ngo, Suciu. (PODS 2017 - Invited to Journal of ACM)
▪ FAQ: Questions Asked Frequently. Abo Khamis, Ngo, Rudra. (PODS 2016 - Best Paper, Invited to Journal of ACM)

slide-78
SLIDE 78

78 78

Underlying magic: In-database relational learning

▪ Rk-means: Fast Clustering for Relational Data. Curtin, Moseley, Ngo, Nguyen, Olteanu, Schleich. (Submitted to NeurIPS 2019)
▪ On Coresets for Logistic Regression. Curtin, Moseley, Pruhs, Samadian. (Submitted to NeurIPS 2019)
▪ SolverBlox: Algebraic Modeling in Datalog. Borraz-Sanchez, Klabjan, Pasalic, Aref. (Declarative Logic Programming, Morgan & Claypool 2018)
▪ In-Database Learning with Sparse Tensors. Abo Khamis, Ngo, Nguyen, Olteanu, Schleich. (PODS 2018 - Invited to Journal of TODS)
▪ AC/DC: In-Database Learning Thunderstruck. Abo Khamis, Ngo, Nguyen, Olteanu, Schleich. (DEEM 2018)
▪ Modelling Machine Learning Algorithms on Relational Data with Datalog. Makrynioti, Vasiloglou, Pasalic, Vassalos. (DEEM 2018)
▪ In-Database Factorized Learning. Ngo, Nguyen, Olteanu, Schleich. (AMW 2017)
▪ Data Science with Linear Programming. Makrynioti, Vasiloglou, Pasalic, Vassalos. (DeLBP 2017)

slide-79
SLIDE 79

79 79

Underlying magic: Julia

▪ Julia: Dynamism and Performance Reconciled by Design. Jeff Bezanson, Jiahao Chen, Ben Chung, Stefan Karpinski, Viral B. Shah, Lionel Zoubritzky, Jan Vitek. (OOPSLA 2018)
▪ Julia Subtyping: A Rational Reconstruction. Francesco Zappa Nardelli, Julia Belyakova, Artem Pelenitsyn, Benjamin Chung, Jeff Bezanson, Jan Vitek. (OOPSLA 2018)
▪ Julia: A Fresh Approach to Numerical Computing. Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral B. Shah. (SIAM Review 2017)

slide-80
SLIDE 80

80

THANK YOU