
Data Mining and Matrices 01 – Introduction Rainer Gemulla, Pauli Miettinen April 18, 2013

Outline
1. What is data mining?
2. What is a matrix?
3. Why data mining and matrices?
4. Summary

What is data mining?
“Data mining is the process of discovering knowledge or patterns from massive amounts of data.” (Encyclopedia of Database Systems)
Mining analogy: rock, gold, tools, and miners correspond to data, knowledge, software, and analysts.
Estimated $100 billion industry around managing and analyzing data. (Data, Data everywhere. The Economist, 2010.)

What is data mining?
Science
◮ The Sloan Digital Sky Survey gathered 140TB of information
◮ NASA Center for Climate Simulation stores 32PB of data
◮ 3B base pairs exist in the human genome
◮ LHC registers 600M particle collisions per second, 25PB/year
Social data
◮ 1M customer transactions are performed at Walmart per hour
◮ 25M Netflix customers view and rate hundreds of thousands of movies
◮ 40B photos have been uploaded to Facebook
◮ 200M active Twitter users write 400M tweets per day
◮ 4.6B mobile-phone subscriptions worldwide
Government, health care, news, stocks, books, web search, ...
(Data, Data everywhere. The Economist, 2010.)

What is data mining?
Typical tasks: prediction, outlier detection, clustering, pattern mining.
“If it rains on Seven Sleepers' Day, the rain will not let up for seven weeks.” (German folklore)

What is data mining?
[Figure: knowledge discovery pipeline, with the focus of this lecture marked]

Womb
mater (Latin) = mother
matrix (Latin) = pregnant animal
matrix (Late Latin) = womb; also source, origin
Since 1550s: place or medium where something is developed
Since 1640s: embedding or enclosing mass
(Online Etymology Dictionary)

Rectangular arrays of numbers
“Rectangular arrays” known in ancient China (rod calculus, estimated as early as 300 BC)
\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
Term “matrix” coined by J.J. Sylvester in 1850

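System of linear equations
Systems of linear equations can be written as matrices
\[
\begin{aligned}
3x + 2y + z &= 39 \\
2x + 3y + z &= 34 \\
x + 2y + 3z &= 26
\end{aligned}
\quad\rightarrow\quad
\begin{pmatrix}
3 & 2 & 1 & 39 \\
2 & 3 & 1 & 34 \\
1 & 2 & 3 & 26
\end{pmatrix}
\]
and then be solved using linear algebra methods
\[
\begin{pmatrix}
3 & 2 & 1 & 39 \\
0 & 5 & 1 & 24 \\
0 & 0 & 12 & 33
\end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 9.25 \\ 4.25 \\ 2.75 \end{pmatrix}
\]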
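The system on this slide can also be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and not part of the lecture.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system on the slide.
A = np.array([[3.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [1.0, 2.0, 3.0]])
b = np.array([39.0, 34.0, 26.0])

# Solve A @ (x, y, z) = b.
solution = np.linalg.solve(A, b)
x, y, z = solution
print(x, y, z)
```

The result should agree with the values obtained by elimination on the slide, up to floating-point rounding.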

Set of data points
Each row is one data point; the columns are the x and y coordinates.
\[
\begin{pmatrix}
-3.84 & -2.21 \\
-3.33 & -2.19 \\
-2.55 & -1.47 \\
-2.46 & -1.25 \\
-1.49 & -0.76 \\
-1.67 & -0.39 \\
-1.3 & -0.59 \\
\vdots & \vdots \\
1.59 & 0.78 \\
1.53 & 1.02 \\
1.45 & 1.26 \\
1.86 & 1.18 \\
2.04 & 0.96 \\
2.42 & 1.24 \\
2.32 & 2.03 \\
2.9 & 1.35
\end{pmatrix}
\]
[Figure: scatter plot of the data points, x and y axes from −4 to 4]

Linear maps
Linear maps from \(\mathbb{R}^3\) to \(\mathbb{R}\)
\[
\begin{aligned}
f_1(x, y, z) &= 3x + 2y + z \\
f_2(x, y, z) &= 2x + 3y + z \\
f_3(x, y, z) &= x + 2y + 3z \\
f_4(x, y, z) &= x
\end{aligned}
\]
Linear map \(f_1\) written as a matrix
\[
\begin{pmatrix} 3 & 2 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= f_1(x, y, z)
\]
Linear map from \(\mathbb{R}^3\) to \(\mathbb{R}^4\)
\[
\begin{pmatrix}
3 & 2 & 1 \\
2 & 3 & 1 \\
1 & 2 & 3 \\
1 & 0 & 0
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix}
f_1(x, y, z) \\ f_2(x, y, z) \\ f_3(x, y, z) \\ f_4(x, y, z)
\end{pmatrix}
\]
[Figure: two scatter plots, “Original data” and “Rotated and stretched”, x and y axes from −4 to 4]
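As a small illustration of the slide above, the four maps can be stacked into one matrix and applied with a single matrix-vector product. This is a sketch assuming NumPy; the test vectors `v` and `w` are arbitrary choices, not data from the lecture.

```python
import numpy as np

# Rows are the coefficient vectors of f_1, f_2, f_3, f_4 from the slide.
A = np.array([[3.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [1.0, 2.0, 3.0],
              [1.0, 0.0, 0.0]])

def f(v):
    """Evaluate f_1..f_4 directly from their formulas."""
    x, y, z = v
    return np.array([3*x + 2*y + z, 2*x + 3*y + z, x + 2*y + 3*z, x])

# Arbitrary test vectors (assumption, chosen only for illustration).
v = np.array([1.0, -2.0, 0.5])
w = np.array([0.0, 4.0, -1.0])

image = A @ v                      # the map from R^3 to R^4 as a matrix product
combined = A @ (2.0 * v + w)       # linearity: equals 2*A@v + A@w
```

Evaluating the formulas one by one and multiplying by the matrix give the same vector, which is the sense in which the matrix "is" the linear map.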

Graphs
[Figure: a graph and its adjacency matrix]
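Since the figure for this slide is not available, here is a minimal sketch of how an adjacency matrix is built. The graph below is hypothetical (it is not the graph from the slide), and NumPy is assumed.

```python
import numpy as np

# Hypothetical undirected graph on 4 vertices (not the one from the slide).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n_vertices = 4

# Entry (u, v) is 1 if there is an edge between u and v, else 0.
adjacency = np.zeros((n_vertices, n_vertices), dtype=int)
for u, v in edges:
    adjacency[u, v] = 1
    adjacency[v, u] = 1  # undirected: the matrix is symmetric

# Row sums give the degree of each vertex.
degrees = adjacency.sum(axis=1)
```

Both rows and columns index vertices here, and the entries encode edges.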

Objects and attributes
Anna, Bob, and Charlie went shopping
Anna bought butter and bread
Bob bought butter, bread, and beer
Charlie bought bread and beer

Customer transactions
            Bread  Butter  Beer
Anna          1      1      0
Bob           1      1      1
Charlie       1      0      1

Document-term matrix
            Data  Matrix  Mining
Book 1        5     0       3
Book 2        0     0       7
Book 3        4     6       5

Incomplete rating matrix (columns: Avatar, The Matrix, Up; one rating per row is missing)
Alice       4, 2
Bob         3, 2
Charlie     5, 3

Cities and monthly temperatures
              Jan    Jun    Sep
Saarbrücken   1      11     10
Helsinki      6.5    10.9   8.7
Cape Town     15.7   7.8    8.7

Many different kinds of data fit this object-attribute viewpoint.
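The shopping example can be turned into a binary object-attribute matrix mechanically. The sketch below assumes NumPy and derives the matrix from the purchases stated in the text, with columns in the order bread, butter, beer; the variable names are illustrative.

```python
import numpy as np

# Purchases as stated on the slide.
purchases = {
    "Anna": {"butter", "bread"},
    "Bob": {"butter", "bread", "beer"},
    "Charlie": {"bread", "beer"},
}
items = ["bread", "butter", "beer"]   # column order (attributes)
customers = list(purchases)           # row order (objects)

# Entry (i, j) is 1 if customer i bought item j, else 0.
transactions = np.array(
    [[int(item in purchases[c]) for item in items] for c in customers]
)

# Column sums count how many customers bought each item.
item_counts = transactions.sum(axis=0)
```

The same construction applies to the other tables on this slide: pick the objects as rows, the attributes as columns, and fill in the observed values.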

What is a matrix?
A means to describe computation (linear operators)
◮ Rotation
◮ Rescaling
◮ Permutation
◮ Projection
◮ ...
A means to describe data
  Rows          Columns      Entries
  Objects       Attributes   Values
  Equations     Variables    Coefficients
  Data points   Axes         Coordinates
  Vertices      Vertices     Edges
\[
\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1j} & \cdots \\
A_{21} & A_{22} & \cdots & A_{2j} & \cdots \\
\vdots & \vdots & \ddots & \vdots & \\
A_{i1} & A_{i2} & \cdots & A_{ij} & \cdots \\
\vdots & \vdots & & \vdots & \ddots
\end{pmatrix}
\]
Row \(i\) corresponds to object \(i\), column \(j\) to attribute \(j\).
In data mining, we make use of both viewpoints simultaneously.
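To make the "computation" viewpoint concrete, the sketch below writes one example of each listed operator as a 2 × 2 matrix and applies it to a point. NumPy is assumed; the angle, scale factors, and the point `p` are arbitrary illustrative choices.

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees counterclockwise (arbitrary choice)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
rescaling = np.diag([2.0, 0.5])            # stretch x, shrink y
permutation = np.array([[0.0, 1.0],
                        [1.0, 0.0]])       # swap the two coordinates
projection = np.array([[1.0, 0.0],
                       [0.0, 0.0]])        # project onto the x axis

p = np.array([1.0, 2.0])  # arbitrary point

rotated = rotation @ p
rescaled = rescaling @ p
permuted = permutation @ p
projected = projection @ p
```

Applying an operator to every row of a data matrix `X` is then a single product, for example `X @ rotation.T`, which is where the two viewpoints meet.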


Key tool: Matrix decompositions
A matrix decomposition of a data matrix \(D\) is given by three matrices \(L\), \(M\), \(R\) such that
\[
D = LMR, \qquad D_{ij} = \sum_{k, k'} L_{ik} M_{kk'} R_{k'j},
\]
where
\(D\) is an \(m \times n\) data matrix,
\(L\) is an \(m \times r\) matrix,
\(M\) is an \(r \times r\) matrix,
\(R\) is an \(r \times n\) matrix, and
\(r\) is an integer \(\ge 1\).
[Figure: entry \(D_{ij}\) is obtained from row \(L_{i*}\), matrix \(M\), and column \(R_{*j}\)]
There are many different kinds of matrix decompositions, each putting certain constraints on matrices \(L\), \(M\), \(R\) (which may not be easy to find).
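The entrywise formula can be checked against the matrix product directly. This is a minimal sketch assuming NumPy; the factors are random and the sizes `m`, `n`, `r` are arbitrary, chosen only to illustrate the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 4, 5, 2  # arbitrary sizes for illustration

L = rng.standard_normal((m, r))
M = rng.standard_normal((r, r))
R = rng.standard_normal((r, n))
D = L @ M @ R  # m x n matrix built from the three factors

def entry(i, j):
    """D_ij as the double sum over k and k' from the slide."""
    return sum(L[i, k] * M[k, kp] * R[kp, j]
               for k in range(r) for kp in range(r))

entries = np.array([[entry(i, j) for j in range(n)] for i in range(m)])
```

Note that a matrix built this way has rank at most `r`, which is why a small `r` gives a compact description of `D`.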

Example: Singular value decomposition
\[
D_{50 \times 2} = L_{50 \times 2}\, M_{2 \times 2}\, R_{2 \times 2},
\qquad
M = \begin{pmatrix} 11.73 & 0 \\ 0 & 1.71 \end{pmatrix}
\]
[Figure: scatter plots of \(D\) (x and y axes from −4 to 4) and of \(L\), and the rows \(R_{1*}\) and \(R_{2*}\) of \(R\) drawn in the plane]
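A decomposition with these shapes can be computed with NumPy's SVD routine. The sketch below uses synthetic 50 × 2 data, since the full data set from the slide is not available here, so the singular values will differ from those on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, correlated 50 x 2 data (assumption; not the lecture's data).
D = rng.standard_normal((50, 2)) @ np.array([[2.0, 1.0],
                                             [0.0, 0.5]])

# Thin SVD: L is 50 x 2, singular_values has length 2, R is 2 x 2.
L, singular_values, R = np.linalg.svd(D, full_matrices=False)
M = np.diag(singular_values)  # diagonal middle factor

reconstruction = L @ M @ R
```

In this decomposition the columns of `L` and the rows of `R` are orthonormal, and `M` is diagonal with non-negative entries in decreasing order.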

Example: Non-negative matrix factorization
[Figure: a data column \(D_{*j}\), the factor \(L\), the column \(R_{*j}\), and their product \(L R_{*j}\)]
Lee and Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 1999.
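For readers who want to experiment, here is a hedged sketch of NMF in the notation above (\(D \approx LR\) with all entries non-negative). It uses the well-known multiplicative update rules for the squared-error objective; this is one common heuristic and not necessarily the exact procedure of the cited paper. The function name `nmf`, the data, and all parameters are illustrative assumptions.

```python
import numpy as np

def nmf(D, r, iterations=200, eps=1e-9, seed=0):
    """Approximate non-negative D (m x n) as L @ R with L, R >= 0."""
    rng = np.random.default_rng(seed)
    m, n = D.shape
    L = rng.random((m, r)) + eps
    R = rng.random((r, n)) + eps
    for _ in range(iterations):
        # Multiplicative updates keep all entries non-negative.
        R *= (L.T @ D) / (L.T @ L @ R + eps)
        L *= (D @ R.T) / (L @ R @ R.T + eps)
    return L, R

# Synthetic non-negative data of rank at most 3 (assumption).
rng = np.random.default_rng(1)
D = rng.random((6, 3)) @ rng.random((3, 8))

L, R = nmf(D, r=3)
error = np.linalg.norm(D - L @ R)

# Same initialization, stopped after one iteration, for comparison.
L1, R1 = nmf(D, r=3, iterations=1)
error_after_one = np.linalg.norm(D - L1 @ R1)
```

The non-negativity constraint is what distinguishes this from the SVD: factors can only add up, which tends to produce parts-based representations.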

Example: Latent Dirichlet allocation
[Figure]
Blei et al. Latent Dirichlet allocation. JMLR, 2003.

Other matrix decompositions
Singular value decomposition (SVD)
k-means
Non-negative matrix factorization (NMF)
Semi-discrete decomposition (SDD)
Boolean matrix decomposition (BMF)
Independent component analysis (ICA)
Matrix completion
Probabilistic matrix factorization
...

What can we do with matrix decompositions?
Separate data from multiple processes
Remove noise from the data
Remove redundancy from the data
Reveal latent structure and similarities in the data
Fill in missing entries
Find local patterns
Reduce space consumption
Reduce computational cost
Aid visualization
Matrix decompositions can make data mining algorithms more effective. They may also provide insight into the data by themselves.
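Two of these uses, reducing space and removing redundancy, can be illustrated with a truncated SVD. The sketch below assumes NumPy and synthetic data (a rank-1 signal plus small noise); it is an illustration under those assumptions, not a result from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: rank-1 signal plus small noise (assumption).
signal = np.outer(rng.standard_normal(50), np.array([2.0, 1.0]))
D = signal + 0.1 * rng.standard_normal((50, 2))

U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Keep only the largest singular value and its vectors.
k = 1
D_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Numbers stored: full matrix versus truncated factors.
stored_full = D.size
stored_truncated = U[:, :k].size + k + Vt[:k, :].size

# Frobenius-norm error of the rank-k approximation.
error = np.linalg.norm(D - D_k)
```

Here the approximation error equals the discarded singular value, so when that value is small relative to the first one, little information is lost while roughly half the storage is saved.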
