The Data Cube as a Typed Linear Algebra Operator

DBPL 2017: 16th Symp. on DB Prog. Lang., Technische Universität München (TUM), 1st Sep 2017

J.N. Oliveira, H.D. Macedo
INESC TEC & U.Minho; SW Eng Group @ U.Aarhus
(H2020-732051: CloudDBAppliance)

Outline: Motivation, Linear algebra, Cube, Properties, References

Motivation

“Only by taking infinitesimally small units for observation (the differential of history, that is, the individual tendencies of men) and attaining to the art of integrating them (that is, finding the sum of these infinitesimals) can we hope to arrive at the laws of history.”
Leo Tolstoy, “War and Peace”, Book XI, Chap. II (1869)

150 years later, this is what we are trying to attain through data mining. But how fit are our maths for the task? Have we attained the “art of integration”?

Motivation

Since the early days of psychometrics in the social sciences (1970s), linear algebra (LA) has been central to data analysis (e.g. tensor decompositions).

We follow this trend but in a typed way, merging LA with polymorphic type systems, over a categorial basis.

We address a concrete example: studying the maths behind a well-known device in data analysis, the data cube construction.

We will define this construction as a polymorphic LA operator.

Typed linear algebra is proposed as a rich setting in which such an “art of integration” can be achieved.

Running example

Raw data t:

#  Model  Year  Color  Sale
1  Chevy  1990  Red       5
2  Chevy  1990  Blue     87
3  Ford   1990  Green    64
4  Ford   1990  Blue     99
5  Ford   1991  Red       8
6  Ford   1991  Blue      7

Columns are attributes: the observables.
Rows are records ($n$-many): the infinitesimals.

Column-orientation: each column (attribute) $A$ is represented by a function $t_A : n \to A$ such that $a = t_A(i)$ means “$a$ is the value of attribute $A$ in record nr $i$”.
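The column-oriented reading of the raw data can be sketched in a few lines of Python. This is only an illustration: the names `RECORDS`, `t_model`, `t_year`, `t_color` and `t_sale` are assumptions made here, not part of the slides.

```python
# Raw data t of the running example, keyed by record number (1..6).
# Each record is (Model, Year, Color, Sale).
RECORDS = {
    1: ("Chevy", 1990, "Red", 5),
    2: ("Chevy", 1990, "Blue", 87),
    3: ("Ford", 1990, "Green", 64),
    4: ("Ford", 1990, "Blue", 99),
    5: ("Ford", 1991, "Red", 8),
    6: ("Ford", 1991, "Blue", 7),
}

# One projection function t_A : n -> A per attribute A (hypothetical names).
def t_model(i):
    return RECORDS[i][0]

def t_year(i):
    return RECORDS[i][1]

def t_color(i):
    return RECORDS[i][2]

def t_sale(i):
    return RECORDS[i][3]

# "Blue is the value of attribute Color in record nr 2"
print(t_color(2))
```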

Records are tuples

Can records be rebuilt from such attribute projection functions? Yes, by tupling them.

Tupling: given functions $f : A \to B$ and $g : A \to C$, their tupling is the function $f \triangledown g$ such that

$$ (f \triangledown g)\, a = (f\,a,\ g\,a) $$

For instance,

$$ (t_{Color} \triangledown t_{Model})\, 2 = (Blue, Chevy) $$

$$ (t_{Year} \triangledown (t_{Color} \triangledown t_{Model}))\, 3 = (1990, (Green, Ford)) $$

and so on.
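A minimal Python sketch of tupling, reproducing the two instances above. The helper name `tupling` and the dictionaries holding the columns are assumptions made for this illustration.

```python
def tupling(f, g):
    """Return the function a -> (f a, g a), i.e. the tupling of f and g."""
    return lambda a: (f(a), g(a))

# Columns of the running example as finite maps from record number to value.
MODEL = {1: "Chevy", 2: "Chevy", 3: "Ford", 4: "Ford", 5: "Ford", 6: "Ford"}
YEAR = {1: 1990, 2: 1990, 3: 1990, 4: 1990, 5: 1991, 6: 1991}
COLOR = {1: "Red", 2: "Blue", 3: "Green", 4: "Blue", 5: "Red", 6: "Blue"}

t_model = lambda i: MODEL[i]
t_year = lambda i: YEAR[i]
t_color = lambda i: COLOR[i]

color_model = tupling(t_color, t_model)
year_color_model = tupling(t_year, color_model)

print(color_model(2))        # ('Blue', 'Chevy')
print(year_color_model(3))   # (1990, ('Green', 'Ford'))
```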

Inverting tuples

For the column-oriented model to work one will need to express joins, and these call for “inverse” functions, e.g.

$$ (t_{Model} \triangledown t_{Year})^{\circ}\, (Ford, 1990) = \{3, 4\} $$

meaning that records nr 3 and nr 4 have the same model (Ford) and year (1990).

However, the type $f^{\circ} : A \to \mathcal{P}\, n$ is rather annoying, as it involves sets of record indices, which add an extra layer of complexity.

Fortunately, there is a simpler way: typed linear algebra, also known as linear algebra of programming (LAoP).
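The set-valued “inverse” can be sketched as a preimage computation. This is an illustrative sketch only; the helper names `tupling` and `converse` are assumptions, and the explicit `domain` argument stands in for the finite type $n$.

```python
def tupling(f, g):
    """Return the function a -> (f a, g a)."""
    return lambda a: (f(a), g(a))

def converse(f, domain):
    """Return f° as a set-valued function: b -> {i in domain | f(i) == b}."""
    return lambda b: {i for i in domain if f(i) == b}

RECORD_IDS = [1, 2, 3, 4, 5, 6]
MODEL = {1: "Chevy", 2: "Chevy", 3: "Ford", 4: "Ford", 5: "Ford", 6: "Ford"}
YEAR = {1: 1990, 2: 1990, 3: 1990, 4: 1990, 5: 1991, 6: 1991}

t_model_year = tupling(lambda i: MODEL[i], lambda i: YEAR[i])
lookup = converse(t_model_year, RECORD_IDS)

print(lookup(("Ford", 1990)))    # {3, 4}
print(lookup(("Chevy", 1991)))   # set(): no such record
```

Note how every result is a set of indices, possibly empty; this is the extra layer that the matrix representation avoids.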

The LAoP approach

Represent functions by Boolean matrices.

Given (finite) types $A$ and $B$, any function $f : A \to B$ can be represented by a matrix $[\![f]\!]$ with $A$-many columns and $B$-many rows such that, for any $b \in B$ and $a \in A$, the matrix cell is

$$ b\, [\![f]\!]\, a = \begin{cases} 1 & \text{if } b = f\,a \\ 0 & \text{otherwise} \end{cases} $$

NB: Following the infix notation usually adopted for relations (which are Boolean matrices), for instance $y \leqslant x$, we write $y\, M\, x$ to denote the contents of the cell in matrix $M$ addressed by row $y$ and column $x$.
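As a sketch of this representation, the following Python builds the Boolean matrix of a function between two finite types given as lists. The helper name `matrix_of` and the chosen ordering of the elements are assumptions made here.

```python
def matrix_of(f, source, target):
    """Boolean matrix of f : source -> target.

    One column per element of `source`, one row per element of `target`;
    the cell at row b, column a is 1 exactly when b == f(a).
    """
    return [[1 if f(a) == b else 0 for a in source] for b in target]

RECORD_IDS = [1, 2, 3, 4, 5, 6]
MODEL = {1: "Chevy", 2: "Chevy", 3: "Ford", 4: "Ford", 5: "Ford", 6: "Ford"}

t_model = matrix_of(lambda i: MODEL[i], RECORD_IDS, ["Chevy", "Ford"])
for row in t_model:
    print(row)
# [1, 1, 0, 0, 0, 0]
# [0, 0, 1, 1, 1, 1]
```

Because $f$ is a function, every column of the resulting matrix holds exactly one 1.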

The LAoP approach

One projection function (matrix) per dimension attribute of the raw data:

t_Model  1  2  3  4  5  6
Chevy    1  1  0  0  0  0
Ford     0  0  1  1  1  1

t_Year   1  2  3  4  5  6
1990     1  1  1  1  0  0
1991     0  0  0  0  1  1

t_Color  1  2  3  4  5  6
Blue     0  1  0  1  0  1
Green    0  0  1  0  0  0
Red      1  0  0  0  1  0

NB: we tend to abbreviate $[\![f]\!]$ by $f$ when the context is clear.

The LAoP approach

Note how the inverse of a function is also represented by a Boolean matrix, e.g.

t°_Model  Chevy  Ford
1         1      0
2         1      0
3         0      1
4         0      1
5         0      1
6         0      1

versus

t_Model  1  2  3  4  5  6
Chevy    1  1  0  0  0  0
Ford     0  0  1  1  1  1

There is no need for powersets. Clearly,

$$ j\; t_{Model}^{\circ}\; a = a\; t_{Model}\; j $$

Given a matrix $M$, $M^{\circ}$ is known as the transpose of $M$.
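The converse of a function matrix is plain transposition, which a short Python sketch can confirm cell by cell. The helper name `transpose` is an assumption; matrices are lists of rows.

```python
def transpose(M):
    """Return M°, swapping the roles of rows and columns."""
    return [list(column) for column in zip(*M)]

# t_Model : # -> Model, rows ordered Chevy, Ford; columns are records 1..6.
T_MODEL = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]

T_MODEL_CONV = transpose(T_MODEL)
for row in T_MODEL_CONV:
    print(row)

# Check j t°_Model a == a t_Model j for every record j and model a.
ok = all(
    T_MODEL_CONV[j][a] == T_MODEL[a][j]
    for j in range(6)
    for a in range(2)
)
print(ok)  # True
```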

The LAoP approach

We type matrices in the same way as functions: $M : A \to B$ means a matrix $M$ with $A$-many columns and $B$-many rows.

Matrices are arrows: $A \xrightarrow{\ M\ } B$ denotes a matrix from $A$ (source) to $B$ (target), where $A$, $B$ are (finite) types.

Writing $B \xleftarrow{\ M\ } A$ means the same as $A \xrightarrow{\ M\ } B$.

Composition, aka matrix multiplication: given $B \xleftarrow{\ M\ } A \xleftarrow{\ N\ } C$, the composite $B \xleftarrow{\ M \cdot N\ } C$ is defined by

$$ b\, (M \cdot N)\, c = \langle \textstyle\sum a :: (b\, M\, a) \times (a\, N\, c) \rangle $$
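One way to picture typed composition is a small matrix class that carries its source and target types and refuses ill-typed products. This is a sketch under assumptions: the class `Matrix` and its methods are hypothetical, and types are modelled as Python lists of labels.

```python
class Matrix:
    """A matrix typed as source -> target (hypothetical helper).

    One column per element of `source`, one row per element of `target`.
    """

    def __init__(self, source, target, cells):
        self.source = list(source)
        self.target = list(target)
        self.cells = [list(row) for row in cells]
        assert len(self.cells) == len(self.target)
        assert all(len(row) == len(self.source) for row in self.cells)

    def compose(self, other):
        """Return self . other; defined only if other.target == self.source."""
        if other.target != self.source:
            raise TypeError("cannot compose: types do not match")
        cells = [
            [
                sum(self.cells[b][a] * other.cells[a][c]
                    for a in range(len(self.source)))
                for c in range(len(other.source))
            ]
            for b in range(len(self.target))
        ]
        return Matrix(other.source, self.target, cells)

    def converse(self):
        """Return the transpose, typed target -> source."""
        return Matrix(self.target, self.source,
                      [list(col) for col in zip(*self.cells)])


RECORD_IDS = [1, 2, 3, 4, 5, 6]
t_model = Matrix(RECORD_IDS, ["Chevy", "Ford"],
                 [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1]])
t_year = Matrix(RECORD_IDS, [1990, 1991],
                [[1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]])

# t_Model . t_Year° : Year -> Model counts the records per (model, year).
counts = t_model.compose(t_year.converse())
print(counts.cells)  # [[2, 0], [2, 2]]

# t_Model . t_Year is ill-typed: Year is not the source type of t_Model.
try:
    t_model.compose(t_year)
    ill_typed_rejected = False
except TypeError:
    ill_typed_rejected = True
print(ill_typed_rejected)  # True
```

The design choice here is to check types at run time; a statically typed host language could rule out the ill-typed product at compile time instead.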

The LAoP approach

Function composition is implemented by matrix multiplication:

$$ [\![f \cdot g]\!] = [\![f]\!] \cdot [\![g]\!] $$

Identity: the identity matrix $id$ corresponds to the identity function and is such that

$$ M \cdot id = M = id \cdot M \qquad (1) $$

Function tupling corresponds to the so-called Khatri-Rao product $M \triangledown N$, defined index-wise by

$$ (b, c)\, (M \triangledown N)\, a = (b\, M\, a) \times (c\, N\, a) \qquad (2) $$

Khatri-Rao is a “column-wise” version of the well-known Kronecker product $M \otimes N$:

$$ (y, x)\, (M \otimes N)\, (b, a) = (y\, M\, b) \times (x\, N\, a) \qquad (3) $$
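The correspondence between tupling and the Khatri-Rao product in (2) can be checked on the running example. In this sketch the helpers `matrix_of` and `khatri_rao` are assumptions, and pairs $(b, c)$ are ordered with $b$ varying slowest.

```python
from itertools import product

def matrix_of(f, source, target):
    """Boolean matrix of f: cell (b, a) is 1 exactly when b == f(a)."""
    return [[1 if f(a) == b else 0 for a in source] for b in target]

def khatri_rao(M, N):
    """Khatri-Rao product: row (b, c), column a holds M[b][a] * N[c][a]."""
    columns = len(M[0])
    return [[m_row[a] * n_row[a] for a in range(columns)]
            for m_row in M for n_row in N]

RECORD_IDS = [1, 2, 3, 4, 5, 6]
COLORS = ["Blue", "Green", "Red"]
MODELS = ["Chevy", "Ford"]
COLOR = {1: "Red", 2: "Blue", 3: "Green", 4: "Blue", 5: "Red", 6: "Blue"}
MODEL = {1: "Chevy", 2: "Chevy", 3: "Ford", 4: "Ford", 5: "Ford", 6: "Ford"}

# Matrix of the tupled function i -> (color i, model i) ...
pairs = list(product(COLORS, MODELS))
matrix_of_tupling = matrix_of(lambda i: (COLOR[i], MODEL[i]), RECORD_IDS, pairs)

# ... versus the Khatri-Rao product of the two individual matrices.
t_color = matrix_of(lambda i: COLOR[i], RECORD_IDS, COLORS)
t_model = matrix_of(lambda i: MODEL[i], RECORD_IDS, MODELS)
product_of_matrices = khatri_rao(t_color, t_model)

print(matrix_of_tupling == product_of_matrices)  # True
```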

Typing data

The raw data given above is represented in the LAoP by the expression

$$ v = (t_{Year} \triangledown (t_{Color} \triangledown t_{Model})) \cdot (t_{Sale})^{\circ} \qquad (4) $$

of type

$$ v : 1 \to (Year \times (Color \times Model)) $$

(depicted on the original slide).

$v$ is a multi-dimensional column vector: a tensor. Datatype $1 = \{all\}$ is the so-called singleton type.
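Expression (4) can be evaluated directly on the running example. The sketch below is illustrative: the helper names are assumptions, $(t_{Sale})^{\circ}$ is held as a plain list of the six sale values, and the row order (Year slowest, then Color, then Model) is a choice made here.

```python
def khatri_rao(M, N):
    """Khatri-Rao product: row (b, c), column a holds M[b][a] * N[c][a]."""
    columns = len(M[0])
    return [[m_row[a] * n_row[a] for a in range(columns)]
            for m_row in M for n_row in N]

def apply(M, x):
    """Multiply matrix M by the column vector x (given as a list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

T_YEAR = [[1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]            # 1990, 1991
T_COLOR = [[0, 1, 0, 1, 0, 1], [0, 0, 1, 0, 0, 0],
           [1, 0, 0, 0, 1, 0]]                                # Blue, Green, Red
T_MODEL = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1]]            # Chevy, Ford
SALE = [5, 87, 64, 99, 8, 7]                                  # (t_Sale)°

v = apply(khatri_rao(T_YEAR, khatri_rao(T_COLOR, T_MODEL)), SALE)

labels = [(y, (c, m))
          for y in (1990, 1991)
          for c in ("Blue", "Green", "Red")
          for m in ("Chevy", "Ford")]
for label, value in zip(labels, v):
    print(label, value)
```

Each of the twelve entries of `v` is the total sale recorded for one (Year, (Color, Model)) combination, with 0 where no record exists.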

Dimensions and measures

Sale is a special kind of data: a measure. Measures are encoded as row vectors, e.g.

t_Sale  1   2   3   4   5  6
1       5  87  64  99   8  7

[Diagram: the record-index type # with arrows $t_{Model} : \# \to Model$, $t_{Year} : \# \to Year$, $t_{Color} : \# \to Color$ and $t_{Sale} : \# \to 1$.]

Summary: dimensions are matrices, measures are vectors.

Measures provide for integration in Tolstoy's sense, aka consolidation.

Totalizers

There is a unique function of type $A \to 1$, usually named $A \xrightarrow{\ !\ } 1$. This corresponds to a row vector wholly filled with 1s. Example:

$$ 2 \xrightarrow{\ !\ } 1 = \begin{bmatrix} 1 & 1 \end{bmatrix} $$

Given $M : B \to A$, the expression $! \cdot M$ (where $A \xrightarrow{\ !\ } 1$) is the row vector (of type $B \to 1$) that contains all column totals of $M$:

$$ \begin{bmatrix} 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} 50 & 40 & 85 & 115 \\ 50 & 10 & 85 & 75 \end{bmatrix} = \begin{bmatrix} 100 & 50 & 170 & 190 \end{bmatrix} $$

Given type $A$, define its totalizer matrix $\tau_A : A \to A + 1$ by

$$ \tau_A = \begin{bmatrix} id \\ ! \end{bmatrix} \qquad (5) $$

Thus $\tau_A \cdot M$ yields a copy of $M$ on top of the corresponding totals.
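The totalizer of definition (5) is easy to build and try out on the 2 × 4 example above. The helper names `totalizer` and `matmul` are assumptions made for this sketch; a type is represented only by its size.

```python
def totalizer(n):
    """tau_A for a type A with n elements: the identity stacked on a row of 1s."""
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return identity + [[1] * n]

def matmul(M, N):
    """Plain matrix product of M (p x q) and N (q x r)."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]

M = [[50, 40, 85, 115],
     [50, 10, 85, 75]]

bang = [[1, 1]]                      # ! : 2 -> 1
column_totals = matmul(bang, M)
print(column_totals)                 # [[100, 50, 170, 190]]

with_totals = matmul(totalizer(2), M)
for row in with_totals:
    print(row)
# [50, 40, 85, 115]
# [50, 10, 85, 75]
# [100, 50, 170, 190]
```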

Cubes

Data cubes can be obtained from products of totalizers.

Recall the Kronecker (tensor) product $M \otimes N$ of two matrices $A \xrightarrow{\ M\ } B$ and $C \xrightarrow{\ N\ } D$, which is of type $A \times C \xrightarrow{\ M \otimes N\ } B \times D$.

The matrix

$$ A \times B \xrightarrow{\ \tau_A \otimes \tau_B\ } (A + 1) \times (B + 1) $$

provides for totalization on the two dimensions $A$ and $B$.

Indeed, type $(A + 1) \times (B + 1)$ is isomorphic to $A \times B + A + B + 1$, whose four parcels represent the four elements of the “dimension powerset” of $\{A, B\}$.
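A small worked instance may help to see the four parcels. The sketch below takes two 2-element dimensions and an arbitrary vector of four values (made up for illustration); the helper names `totalizer`, `kron` and `apply` are assumptions.

```python
def totalizer(n):
    """Identity matrix of size n stacked on top of a row of 1s."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)] + [[1] * n]

def kron(M, N):
    """Kronecker product: row (y, x), column (b, a) holds M[y][b] * N[x][a]."""
    return [[m * n for m in m_row for n in n_row]
            for m_row in M for n_row in N]

def apply(M, x):
    """Multiply matrix M by the column vector x (given as a list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# Illustrative data over A x B with A = {a1, a2}, B = {b1, b2},
# ordered (a1,b1), (a1,b2), (a2,b1), (a2,b2).
x = [1, 2, 3, 4]

totals = apply(kron(totalizer(2), totalizer(2)), x)

labels = [(a, b) for a in ("a1", "a2", "all") for b in ("b1", "b2", "all")]
for label, value in zip(labels, totals):
    print(label, value)
```

The nine results split as 4 + 2 + 2 + 1: the original cells, the totals per element of A, the totals per element of B, and the grand total.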

Cube = multi-dimensional totalization

Recalling

$$ v = (t_{Year} \triangledown (t_{Color} \triangledown t_{Model})) \cdot (t_{Sale})^{\circ} $$

build

$$ c = (\tau_{Year} \otimes (\tau_{Color} \otimes \tau_{Model})) \cdot v $$

This is the multidimensional vector (tensor) representing the data cube for

• dimensions Year, Color, Model
• measure Sale

(depicted on the original slide).
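The cube of the running example can be computed with a few lines of Python. This is a sketch under assumptions: helper names are invented here, `v` is built directly from the records (which gives the same vector as the Khatri-Rao expression), and the label `"all"` marks a totalized dimension.

```python
def totalizer(n):
    """Identity matrix of size n stacked on top of a row of 1s."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)] + [[1] * n]

def kron(M, N):
    """Kronecker product of two matrices given as lists of rows."""
    return [[m * n for m in m_row for n in n_row]
            for m_row in M for n_row in N]

def apply(M, x):
    """Multiply matrix M by the column vector x (given as a list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

YEARS = [1990, 1991]
COLORS = ["Blue", "Green", "Red"]
MODELS = ["Chevy", "Ford"]
RECORDS = [("Chevy", 1990, "Red", 5), ("Chevy", 1990, "Blue", 87),
           ("Ford", 1990, "Green", 64), ("Ford", 1990, "Blue", 99),
           ("Ford", 1991, "Red", 8), ("Ford", 1991, "Blue", 7)]

# v : 1 -> Year x (Color x Model), one entry per combination of values.
keys = [(y, (c, m)) for y in YEARS for c in COLORS for m in MODELS]
v = [sum(sale for (model, year, color, sale) in RECORDS
         if (year, (color, model)) == key)
     for key in keys]

# c = (tau_Year (x) (tau_Color (x) tau_Model)) . v
tau = kron(totalizer(len(YEARS)),
           kron(totalizer(len(COLORS)), totalizer(len(MODELS))))
c = apply(tau, v)

cube_keys = [(y, (col, m))
             for y in YEARS + ["all"]
             for col in COLORS + ["all"]
             for m in MODELS + ["all"]]
cube = dict(zip(cube_keys, c))

print(cube[(1990, ("all", "all"))])     # 255: all sales in 1990
print(cube[("all", ("Blue", "all"))])   # 193: all sales of blue cars
print(cube[("all", ("all", "all"))])    # 270: grand total
```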

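Totalizers yield cubes

We reason:

$$
\begin{array}{cl}
 & c = (\tau_{Year} \otimes (\tau_{Color} \otimes \tau_{Model})) \cdot v \\
= & \{\ v = (t_{Year} \triangledown (t_{Color} \triangledown t_{Model})) \cdot (t_{Sale})^{\circ}\ \} \\
 & c = (\tau_{Year} \otimes (\tau_{Color} \otimes \tau_{Model})) \cdot (t_{Year} \triangledown (t_{Color} \triangledown t_{Model})) \cdot (t_{Sale})^{\circ} \\
= & \{\ \text{property } (M \otimes N) \cdot (P \triangledown Q) = (M \cdot P) \triangledown (N \cdot Q)\ \} \\
 & c = ((\tau_{Year} \cdot t_{Year}) \triangledown ((\tau_{Color} \cdot t_{Color}) \triangledown (\tau_{Model} \cdot t_{Model}))) \cdot (t_{Sale})^{\circ} \\
= & \{\ \text{define } t'_A = \tau_A \cdot t_A\ \} \\
 & c = (t'_{Year} \triangledown (t'_{Color} \triangledown t'_{Model})) \cdot (t_{Sale})^{\circ}
\end{array}
$$

Note that

$$ t'_A = \begin{bmatrix} t_A \\ ! \end{bmatrix} $$

since $t_A$ is a function: every column of $t_A$ holds exactly one 1, hence $! \cdot t_A = \ !$.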
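The property used in the second step, and the closing remark on $t'_A$, can be checked numerically on two of the projection matrices. This is a sanity check on one instance rather than a proof; the helper names are assumptions.

```python
def totalizer(n):
    """Identity matrix of size n stacked on top of a row of 1s."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)] + [[1] * n]

def matmul(M, N):
    """Plain matrix product."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]

def kron(M, N):
    """Kronecker product of two matrices given as lists of rows."""
    return [[m * n for m in m_row for n in n_row]
            for m_row in M for n_row in N]

def khatri_rao(M, N):
    """Khatri-Rao (column-wise Kronecker) product."""
    columns = len(M[0])
    return [[m_row[a] * n_row[a] for a in range(columns)]
            for m_row in M for n_row in N]

T_YEAR = [[1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
T_MODEL = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
tau = totalizer(2)

# (M (x) N) . (P v Q) versus (M . P) v (N . Q)
lhs = matmul(kron(tau, tau), khatri_rao(T_YEAR, T_MODEL))
rhs = khatri_rao(matmul(tau, T_YEAR), matmul(tau, T_MODEL))
print(lhs == rhs)  # True

# t'_Year = tau . t_Year is t_Year on top of a row of 1s.
t_year_prime = matmul(tau, T_YEAR)
print(t_year_prime == T_YEAR + [[1] * 6])  # True
```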

Generalizing data cubes

In our approach a cube is not necessarily one such column vector.

The key to generic data cubes is (generalized) vectorization, a kind of “matrix currying”: given $A \times B \xrightarrow{\ M\ } C$ with $A \times B$-many columns and $C$-many rows, reshape $M$ into its vectorized version $B \xrightarrow{\ vec_A\, M\ } A \times C$ with $B$-many columns and $A \times C$-many rows.

Such matrices, $M$ and $vec_A\, M$, are isomorphic in the sense that they contain the same information in different formats, as

$$ c\, M\, (a, b) = (a, c)\, (vec_A\, M)\, b \qquad (6) $$

holds for every $a$, $b$, $c$.
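Equation (6) translates into a simple index reshuffle. In this sketch, types are represented by their sizes, pairs are flattened with the first component varying slowest, and the names `vec` and `unvec` are assumptions made here.

```python
def vec(M, size_a, size_b):
    """Reshape M : A x B -> C into vec_A M : B -> A x C, following (6).

    Column (a, b) of M sits at index a * size_b + b;
    row (a, c) of the result sits at index a * size_c + c.
    """
    size_c = len(M)
    return [[M[c][a * size_b + b] for b in range(size_b)]
            for a in range(size_a) for c in range(size_c)]

def unvec(V, size_a, size_c):
    """Inverse reshape, from B -> A x C back to A x B -> C."""
    size_b = len(V[0])
    return [[V[a * size_c + c][b] for a in range(size_a) for b in range(size_b)]
            for c in range(size_c)]

# Illustrative 2 x 4 matrix with A, B and C all of size 2.
M = [[1, 2, 3, 4],
     [5, 6, 7, 8]]

V = vec(M, size_a=2, size_b=2)
for row in V:
    print(row)
# [1, 2]
# [5, 6]
# [3, 4]
# [7, 8]

print(unvec(V, size_a=2, size_c=2) == M)  # True: no information is lost
```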
