Part 2 MDL in L in Ac Actio ion Jilles V Vreeke ken 1

Explicit Coding Ad hoc sounds bad, but is it really? Bayesian learning for instance, is in inherent ntly ly subjectiv ive, plus  biasing search is a time-honoured tradition in data analysis  Using an expli licit encoding ing allows us to steer towards the type of structure we want nt to dis iscover We so also mitigate one of the practical weak spots of AIT all data is a string, but wouldn’t it be nice if the structure  you found would not depend on the order of the data? 2

Matrix Factorization The rank of a matrix 𝑩 is number of rank-1 matrices that when summed form 𝑩 (Sche hein n rank)  𝑑 1 𝑑 2 𝑑 3 = + + … 𝑐 3 𝑐 1 𝑐 2 𝑩 𝒄 𝟑 ∘ 𝒅 𝟑 𝒄 𝟒 ∘ 𝒅 𝟒 𝒄 𝟐 ∘ 𝒅 𝟐 3

Boolean Matrix Factorization The rank of a Boolean matrix 𝑩 is number of rank-1 matrices that when summed form 𝑩 (Sche hein n rank)  𝑑 3 𝑑 1 𝑑 1 𝑑 2 𝑑 1 𝑑 1 𝑐 1 + + … = 𝑐 2 𝑐 2 𝑐 1 𝑐 2 𝑐 3 𝑩 𝒄 𝟑 ∘ 𝒅 𝟑 𝒄 𝟒 ∘ 𝒅 𝟒 𝒄 𝟐 ∘ 𝒅 𝟐 (Miettinen et al 2006, 2008) 4

Boolean Matrix Factorization The rank of a Boolean matrix 𝑩 is number of rank-1 matrices that when summed form 𝑩 (Sche hein n rank)  noise quickly inflate the ‘true’ latent rank to min 𝑜 , 𝑛  = + + … 𝑐 1 𝑐 1 𝑐 1 𝑩 𝒄 𝟑 ∘ 𝒅 𝟑 𝒄 𝟒 ∘ 𝒅 𝟒 𝒄 𝟐 ∘ 𝒅 𝟐 (Miettinen et al 2006, 2008) 5

Boolean Matrix Factorization Noise quickly inflates the rank to min ( 𝑜 , 𝑛 ) how can we determine the ‘true’ latent rank?  ≈ 𝑩 𝑪 ∘ 𝑫 (Miettinen & Vreeken 2012, 2014) 6

Boolean Matrix Factorization Separating structure and noise matrices 𝐶 and 𝐷 contain structure, matrix 𝐹 contains noise  = ⊕ 𝑩 𝑭 𝑪 ∘ 𝑫 (Miettinen & Vreeken 2012, 2014) 7

Boolean Matrix Factorization Encoding the structure 𝑜 𝑀 𝑪 = log 𝑜 + � log 𝑜 + log 𝑐 𝑐∈𝑪 = ⊕ 𝑩 𝑭 𝑪 ∘ 𝑫 (Miettinen & Vreeken 2012, 2014) 8

Boolean Matrix Factorization Encoding the structure 𝑀 𝑫 = log 𝑛 + � log 𝑛 + log 𝑛 𝑑 𝑑∈𝑫 = ⊕ 𝑩 𝑭 𝑪 ∘ 𝑫 (Miettinen & Vreeken 2012, 2014) 9

Boolean Matrix Factorization Encoding the noise 𝑀 𝑭 = log 𝑜𝑛 + log 𝑜𝑛 𝑭 = ⊕ 𝑩 𝑭 𝑪 ∘ 𝑫 (Miettinen & Vreeken 2012, 2014) 10

Boolean Matrix Factorization MDL for BMF 𝑀 𝐸 , 𝐼 = 𝑀 𝑪 + 𝑀 𝑫 + 𝑀 ( 𝑭 ) = ⊕ 𝑩 𝑭 𝑪 ∘ 𝑫 (Miettinen & Vreeken 2012, 2014) 11

Pattern Mining The idea deal outcome of pattern mining  patterns that show the structure of the data  preferably a small set, without redundancy or noise Frequent pattern mining does not ot achieve this  pattern explosion → overly many, overly redundant results MDL allows us to effectively pursue the ideal  we want a group of patterns that summarise the data well  we take a patt attern s set t mining approach (Tatti & Vreeken 2012, Bertens et al. 2016, Bhattacharyya & Vreeken 2017) (for transaction data, Vreeken et al (2011), for graphs Koutra et al (2014) 12

Event sequences Alphabet Ω { a, b, c, d, … } Data 𝐸 a b d c a d b a a b c a d a b a b c one, or a b d c a d b a a b c , { multiple a b d c a d b , sequences a b d c a d b a a , … } (Tatti & Vreeken 2012, Bertens et al. 2016, Bhattacharyya & Vreeken 2017) 13

Event sequences Alphabet Ω { a, b, c, d, … } a b a a b b Data 𝐸 a b d c a d b a a b c a d a b a b c one, or a b d c a d b a a b c , { multiple a b d c a d b , sequences a b d c a d b a a , … } Patterns serial ‘subsequences a b episodes allowing gaps’ (Tatti & Vreeken 2012, Bertens et al. 2016, Bhattacharyya & Vreeken 2017) 14

Event sequences Alphabet Ω { a, b, c, d, … } a b a a b b Data 𝐸 a b d c a d b a a b c a d a a a b c d b c one, or a b d c a d b a a b c , { multiple a b d c a d b , sequences a b d c a d b a a , … } Patterns serial ‘subsequences a b episodes allowing gaps’ d b c (Tatti & Vreeken 2012, Bertens et al. 2016, Bhattacharyya & Vreeken 2017) 15

Models As models we use code de tab tables dictionary of patterns & codes  always contains all singletons  p abc ! ? q ! da ? We use optimal prefix codes a a - - easy to compute,  b b - - behave predictably, c  c - - good results, d d  - - more details follow  16

Encoding Event Sequences Data 𝐸 : a b d c a d b a a b c Encoding 1: using only singletons 1 : 𝐷𝑈 a a b b 𝐷 𝑞 a b d c a d b a a b c c c d d The length of the code for pattern 𝑌 X 𝑣𝑣𝑣 ( 𝑌 ) 𝑀 = − log 𝑞 = − log ( ∑𝑣𝑣𝑣 ( 𝑍 ) ) X X The length of the code stream 𝑀 𝐷 𝑞 = ∑ 𝑣𝑣𝑣 𝑌 𝑀 ( ) X 𝑌∈𝐷𝐷 17

Encoding Event Sequences Data 𝐸 : a b d c a d b a a b c Encoding 2: using patterns 𝐷𝑈 2 : a a b b 𝐷 𝑞 p q p d a b c c 𝐷 𝑣 ! ? ! ? ! ! ! d d p abc ! ? gap q ! da ? gap Alignment: a b d c a d b a a b c p ! ? ! q ? ! p ! ! 18

Encoding Event Sequences Data 𝐸 : a b d c a d b a a b c Encoding 2: using patterns 𝐷𝑈 2 : a a b b 𝐷 𝑞 p q p d a b c c 𝐷 𝑣 ! ? ! ? ! ! ! d d p abc ! ? q ! da ? The length of a gap code for pattern 𝑌 ? 𝑀 = − log( 𝑞 )) p ? ? and analogue for non-gap codes ! 19

Encoding Event Sequences By which, the encoded size of 𝐸 given 𝐷𝑈 and 𝐷 is 𝑀 𝐸 𝐷𝑈 = 𝑀 𝐷 𝑞 𝐷𝑈 + 𝑀 ( 𝐷 𝑣 ∣ 𝐷𝑈 ) which leaves us to define 𝑀 ( 𝐷𝑈 ∣ 𝐷 ) 20

Encoding a Code T able X X ? ! … 𝑀 ( 𝐷𝑈 ∣ 𝐷 , 𝐸 ) consists of Y Y ? ! a a … z z 21

Encoding a Code T able X X ? ! … 𝑀 ( 𝐷𝑈 ∣ 𝐷 , 𝐸 ) consists of Y Y ? ! a a … 1) base singleton counts in 𝐸 z z 𝐸 − 1 𝑀 ℕ Ω + 𝑀 ℕ 𝐸 + log Ω − 1 (Rissanen 1983) 22

Encoding a Code T able X X ? ! … 𝑀 ( 𝐷𝑈 ∣ 𝐷 , 𝐸 ) consists of Y Y ? ! a a … 1) base singleton counts in 𝐸 z z 𝐸 − 1 𝑀 ℕ Ω + 𝑀 ℕ 𝐸 + log Ω − 1 (Rissanen 1983) 23

Encoding a Code T able X X ? ! 𝒬 … 𝑀 ( 𝐷𝑈 ∣ 𝐷 , 𝐸 ) consists of Y Y ? ! a a … 1) base singleton counts in 𝐸 z z 𝐸 − 1 𝑀 ℕ Ω + 𝑀 ℕ 𝐸 + log Ω − 1 2) number of patterns, total, and per pattern usage 𝑀 ℕ 𝒬 + 1 + 𝑀 ℕ 𝑣𝑣𝑣 𝒬 + 1 + log 𝑣𝑣𝑣 𝒬 − 1 𝒬 − 1 24

Encoding a Code T able X X ? ! 𝒬 … 𝑀 ( 𝐷𝑈 ∣ 𝐷 , 𝐸 ) consists of Y Y ? ! a a … 1) base singleton counts in 𝐸 z z 𝐸 − 1 𝑀 ℕ Ω + 𝑀 ℕ 𝐸 + log Ω − 1 2) number of patterns, total, and per pattern usage 𝑀 ℕ 𝒬 + 1 + 𝑀 ℕ 𝑣𝑣𝑣 𝒬 + 1 + log 𝑣𝑣𝑣 𝒬 − 1 𝒬 − 1 25

Encoding a Code T able X X ? ! 𝒬 … 𝑀 ( 𝐷𝑈 ∣ 𝐷 , 𝐸 ) consists of Y Y ? ! a a … 1) base singleton counts in 𝐸 z z 𝐸 − 1 𝑀 ℕ Ω + 𝑀 ℕ 𝐸 + log Ω − 1 2) number of patterns, total, and per pattern usage 𝑀 ℕ 𝒬 + 1 + 𝑀 ℕ 𝑣𝑣𝑣 𝒬 + 1 + log 𝑣𝑣𝑣 𝒬 − 1 𝒬 − 1 3) per pattern 𝑌 : its length, elements, and number of gaps 𝑀 ℕ 𝑌 − � log 𝑞 𝑦 𝐸 + 𝑀 ℕ 𝑣𝑕𝑞𝑣 𝑌 + 1 𝑦∈𝑌 26

Encoding a Code T able X X ? ! 𝒬 … 𝑀 ( 𝐷𝑈 ∣ 𝐷 , 𝐸 ) consists of Y Y ? ! a a … 1) base singleton counts in 𝐸 z z 𝐸 − 1 𝑀 ℕ Ω + 𝑀 ℕ 𝐸 + log Ω − 1 2) number of patterns, total, and per pattern usage 𝑀 ℕ 𝒬 + 1 + 𝑀 ℕ 𝑣𝑣𝑣 𝒬 + 1 + log 𝑣𝑣𝑣 𝒬 − 1 𝒬 − 1 3) per pattern 𝑌 : its length, elements, and number of gaps 𝑀 ℕ 𝑌 − � log 𝑞 𝑦 𝐸 + 𝑀 ℕ 𝑣𝑕𝑞𝑣 𝑌 + 1 𝑦∈𝑌 27

Encoding Event Sequences By which we have a lossless encoding. In other words, an objective function. By MDL, our goal is now to minimise 𝑀 𝐷𝑈 , 𝐸 = 𝑀 𝐷𝑈 𝐷 + 𝑀 ( 𝐸 ∣ 𝐷𝑈 ) for how to do so, please see the papers Tatti & Vreeken (2012) Bertens et al. (2016), Bhattacharyya & Vreeken (2017) for transaction data, Vreeken et al (2011) Budhathoki & Vreeken (2015) for graphs Koutra et al (2014) 28

Experiments synt nthe hetic data random  no structure found  HMM  structure recovered real l data text data for interpretation  S QS -C ANDS S QS -S EARCH Ω Δ𝑀 | 𝐸 | # 𝐷𝑜𝐷𝑣 |  | |  | Addresses 5 295 56 15 506 138 155 5k JMLR 3 846 788 40 879 563 580 30k Moby Dick 10 277 1 22 559 215 231 10k (implementation available at http://eda.mmci.uni-saarland/sqs) 29

Part 2 MDL in L in Ac Actio ion Jilles V Vreeke ken 1 - PowerPoint PPT Presentation

Part 2 MDL in L in Ac Actio ion Jilles V Vreeke ken 1 Explicit Coding Ad hoc sounds bad, but is it really? Bayesian learning for instance, is in inherent ntly ly subjectiv ive, plus biasing search is a time-honoured tradition in

Overview Two-Part MDL Two-Part MDL Two-Part MDL for Two-Part MDL for Grammar Learning

REAL-TIME RAY TRACING WITH MDL Ignacio Llamas & Maksim Eisenstein, 03.21.2019 - GPU Technology

Implementing Powerful Implementing Powerful Ratio ionale le fo for A Actio ion P Pla

Recent Developments in Class Action Law and Impact on MDL Cases and Impact on MDL Cases

Between Renderers with MDL Jan Jordan Software Product Manager MDL March 18, GTC San Jose 2019

Navajo jo N Natio ion T Trib ribal Ac l Actio ion P Plan Empowering and Strengthening

Integrating the NVIDIA Material Definition Language MDL in Your Application Lutz Kettner

MDL-Based Unsupervised Attribute Ranking Zdravko Markov Computer Science Department Central

Implement Physically Based Ray Tracing with OptiX and MDL Detlef Rttger, 4/4/2016 OptiX

Sharing Physically Based Materials Between Renderers with MDL Jan Jordan Software Product Manager

Strong Asymptotic Assertions for Discrete MDL in Regression and Classification or A Strange Way

SIGGRAPH16 : NVIDIA BEST OF GTC MDL MATERIALS TO GLSL SHADERS Andreas Senbach, NVIDIA

RESTORATION ADVISORY BOARD MEETING MARCH 14, 2019 87th Air Base Wing JB MDL PBR CONTRACT UPDATE

The Catch-up Phenomenon in Bayesian and MDL Model Selection Tim van Erven www.timvanerven.nl 23

Emer ergen ency M Medicin ine e Physic ician ian S Satis isfac actio ion and Wel elln

Kennington Park Community Liaison Group 14 November 2019 1 # Actio ion Updat date 1

Beyond: The Legal and Policy Landscape of Worker Classification February 26, 2019 Nayantara

Explicit Modular Approaches to Generalized Fermat Equations David Brown University of

Types Classification of Values cs3723 1 Values and Types Basic types: types of atomic

7. Applications of Singularity Analysis http://ac.cs.princeton.edu Analytic combinatorics

Weather Typing as a Potential Tool to Analyze Tropical-Extratropical Interactions ngel G.

CSE 105 THEORY OF COMPUTATION Fall 2016 http://cseweb.ucsd.edu/classes/fa16/cse105-abc/ This

Groups, Formal Language Theory and Decidability Sam Jones Supervised by: Rick Thomas Department

FUCHSIA: Data-Driven Debugging for Functional Side Channels Saeid Tizpaz-Niari* , Pavol Cerny,