Decomposition of Boolean Multi-Relational Data with Graded Relations

Martin Trnecka, Marketa Trneckova
Department of Computer Science, Palacký University Olomouc, Czech Republic
IEEE International Conference on Intelligent Systems IS'16, Sofia, Bulgaria, September 4-6, 2016

Boolean Matrix Decomposition
A method for the analysis of Boolean data.
General aim: for a given matrix $I \in \{0,1\}^{n \times m}$, find matrices $A \in \{0,1\}^{n \times k}$ and $B \in \{0,1\}^{k \times m}$ for which $I$ (approximately) equals $A \circ B$, where $\circ$ is the Boolean matrix product:
$$(A \circ B)_{ij} = \max_{l=1}^{k} \min(A_{il}, B_{lj}).$$
Example:
$$
\begin{pmatrix} 1&0&1&1&1 \\ 0&1&1&0&1 \\ 0&1&0&0&1 \\ 1&0&1&1&0 \end{pmatrix}
=
\begin{pmatrix} 1&1&0 \\ 0&1&1 \\ 0&0&1 \\ 1&0&0 \end{pmatrix}
\circ
\begin{pmatrix} 1&0&1&1&0 \\ 0&0&1&0&1 \\ 0&1&0&0&1 \end{pmatrix}
$$
Discovery of $k$ factors that exactly or approximately explain the data.
Factors = interesting patterns (rectangles) in data.
M. Trnecka, M. Trneckova (Palacký University Olomouc), Sofia, Bulgaria, September 2016
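The Boolean matrix product defined on this slide can be checked directly on the example matrices. The following is a minimal sketch in plain Python; the function name `boolean_product` and the list-of-lists representation are illustrative choices, not something the slides prescribe.

```python
def boolean_product(A, B):
    """Boolean matrix product: (A o B)[i][j] = max over l of min(A[i][l], B[l][j])."""
    k = len(B)
    assert all(len(row) == k for row in A), "inner dimensions must agree"
    m = len(B[0])
    return [
        [max(min(row[l], B[l][j]) for l in range(k)) for j in range(m)]
        for row in A
    ]


# The example from the slide: a 4 x 5 matrix I explained by k = 3 factors.
I = [
    [1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
]
A = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
B = [
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1],
]

product = boolean_product(A, B)
print(product == I)
```

Each column of `A` together with the matching row of `B` describes one rectangle of ones, which is why a factor can be read as a pattern in the data.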

Limits of Boolean Matrix Decomposition
Various methods and approaches exist.
Classic setting: can handle only one input data matrix.
Many real-world data sets are more complex than one simple data table.
Multi-relational data = data composed of many tables (matrices) interconnected via relations between the objects or attributes of these tables.

Multi-Relational Boolean Matrix Factorization
Krmelova M., Trnecka M.: Boolean Factor Analysis of Multi-Relational Data. In: M. Ojeda-Aciego, J. Outrata (Eds.): CLA 2013: Proceedings of the 10th International Conference on Concept Lattices and Their Applications, 2013, pp. 187–198.
Trnecka M., Trneckova M.: An Algorithm for the Multi-Relational Boolean Factor Analysis based on Essential Elements. In: K. Bertet, S. Rudolph (Eds.): CLA 2014: Proceedings of the 11th International Conference on Concept Lattices and Their Applications, 2014, pp. 107–118.
Problem setting: two Boolean data tables $C_1$ and $C_2$ interconnected by a binary relation $R_{12}$.
Multi-relational factor = a pair of classic factors satisfying the relation (this can be defined in several ways).
Algorithmic issue: how to select these factors.

Simple Example

Table: C1
     a  b  c  d
1       ×  ×  ×
2    ×     ×
3       ×     ×
4    ×  ×  ×  ×

Table: C2
     e  f  g  h
5    ×        ×
6       ×  ×
7    ×  ×  ×
8          ×  ×

Table: R (objects 1 to 4 of C1 against attributes e, f, g, h of C2): object 1 has two crosses, object 2 has two, object 3 has three, and object 4 has four (all of e, f, g, h).

Factors of data table $C_1$ are:
$F^{C_1}_1 = \langle \{1, 4\}, \{b, c, d\} \rangle$, $F^{C_1}_2 = \langle \{2, 4\}, \{a, c\} \rangle$, $F^{C_1}_3 = \langle \{1, 3, 4\}, \{b, d\} \rangle$,
and factors of table $C_2$ are:
$F^{C_2}_1 = \langle \{6, 7\}, \{f, g\} \rangle$, $F^{C_2}_2 = \langle \{5\}, \{e, h\} \rangle$, $F^{C_2}_3 = \langle \{5, 7\}, \{e\} \rangle$, $F^{C_2}_4 = \langle \{8\}, \{g, h\} \rangle$.

Table relating the factors of $C_1$ (rows $F^{C_1}_1$ to $F^{C_1}_3$) to the factors of $C_2$ (columns $F^{C_2}_1$ to $F^{C_2}_4$): $F^{C_1}_1$ has one cross, $F^{C_1}_2$ has two, and $F^{C_1}_3$ has three.
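To connect the listed factors with the matrix view of decomposition, the three factors of C1 can be turned into an object-factor matrix and a factor-attribute matrix whose Boolean product gives the table they describe. This is only a sketch: the helper names `factors_to_matrices` and `boolean_product` are hypothetical, and the code assumes the three factors decompose C1 exactly.

```python
def boolean_product(A, B):
    """Boolean matrix product: max over l of min(A[i][l], B[l][j])."""
    k = len(B)
    return [
        [max(min(row[l], B[l][j]) for l in range(k)) for j in range(len(B[0]))]
        for row in A
    ]


def factors_to_matrices(factors, objects, attributes):
    """Column l of A marks the objects of factor l; row l of B marks its attributes."""
    A = [[1 if x in extent else 0 for extent, _ in factors] for x in objects]
    B = [[1 if y in intent else 0 for y in attributes] for _, intent in factors]
    return A, B


# Factors of C1 as listed on the slide: (set of objects, set of attributes).
factors_c1 = [
    ({1, 4}, {"b", "c", "d"}),
    ({2, 4}, {"a", "c"}),
    ({1, 3, 4}, {"b", "d"}),
]
objects = [1, 2, 3, 4]
attributes = ["a", "b", "c", "d"]

A, B = factors_to_matrices(factors_c1, objects, attributes)
table = boolean_product(A, B)
for obj, row in zip(objects, table):
    print(obj, "".join("×" if v else "." for v in row))
```

The number of crosses printed per row (3, 2, 2, 4) agrees with the row counts of table C1 on the slide.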

Our Work
The main advantage of Boolean data is interpretability.
Considering Boolean data only can be limiting: the relation between the input matrices is not necessarily of a Boolean nature.
Our goal: for two input Boolean matrices $C_1$ and $C_2$ and a relation $R_{12}$ between them (with grades from some scale $L$), compute multi-relational factors.
A multi-relational factor on $C_1$ and $C_2$ is a triple $\langle F^{C_1}_i, F^{C_2}_j, d \rangle$, where $F^{C_1}_i \in \mathcal{F}^{C_1}$ and $F^{C_2}_j \in \mathcal{F}^{C_2}$ ($\mathcal{F}^{C_1}$ and $\mathcal{F}^{C_2}$ denote the sets of classical factors of $C_1$ and $C_2$, respectively), and both factors are compatible with relation $R_{12}$ in degree $d \in L$.
We want factors explaining (covering) the largest part of the input data.
We assume that $L$ conforms to the structure of a complete residuated lattice, as used in fuzzy logic.

Solution
Factors = formal concepts (clear interpretation, geometrical viewpoint).
Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76(1) (2010).
We designed a new BMF algorithm (part of our final algorithm). It is based on so-called "essential elements" and is a derivative of the GreEss algorithm.
Belohlavek R., Trnecka M.: From-Below Approximations in Boolean Matrix Factorization: Geometry and New Algorithm. Journal of Computer and System Sciences 81(8) (2015), 1678–1697.
We used the calculus of fuzzy logic and residuated lattices.

Idea of Algorithm (in the case of an object-attribute relation)
The main issue: how to understand that "factors $F^{C_1}_i \in \mathcal{F}^{C_1}$ and $F^{C_2}_j \in \mathcal{F}^{C_2}$ are compatible in relation $R_{12}$ in degree $d$".
Intuitively: we want all objects from $F^{C_1}_i$ to be compatible with relation $R_{12}$, and also all attributes from $F^{C_2}_j$ to be compatible with this relation.
"Object $x$ is compatible with the relation" means: if object $x$ is in $F^{C_1}_i$, then $x$ has all attributes from $F^{C_2}_j$ in relation $R_{12}$. Similarly for attributes.
For two factors $\langle A, B \rangle$ and $\langle C, D \rangle$:
$$
d = \Big( \bigwedge_{x \in A} \big( x \to \bigwedge_{y \in D} R_{12}(x, y) \big) \Big) \otimes \Big( \bigwedge_{y \in D} \big( y \to \bigwedge_{x \in A} R_{12}(x, y) \big) \Big).
$$
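The degree formula above can be evaluated once a concrete residuated lattice is fixed. The slides do not say which operations were used, so the sketch below is only an illustration under stated assumptions: grades are numbers in [0, 1], the antecedents "x" and "y" are read as membership degrees equal to 1 (the factors are Boolean), the sets A and D are non-empty, and the Łukasiewicz and Gödel operations are offered as two common choices. All function names and the small relation `R` are hypothetical.

```python
def lukasiewicz_tnorm(a, b):
    """Lukasiewicz conjunction a (x) b."""
    return max(0.0, a + b - 1.0)


def lukasiewicz_residuum(a, b):
    """Lukasiewicz implication a -> b."""
    return min(1.0, 1.0 - a + b)


def goedel_residuum(a, b):
    """Goedel implication a -> b (its conjunction is min)."""
    return 1.0 if a <= b else b


def compatibility_degree(A, D, R, tnorm, residuum):
    """Degree d for object set A of one factor and attribute set D of the other.

    Members of the Boolean factors have membership degree 1, so every
    antecedent of the implication is 1.0.
    """
    objects_side = min(residuum(1.0, min(R[x][y] for y in D)) for x in A)
    attributes_side = min(residuum(1.0, min(R[x][y] for x in A)) for y in D)
    return tnorm(objects_side, attributes_side)


# Hypothetical graded relation between objects 1, 4 and attributes e..h.
R = {
    1: {"e": 0.0, "f": 1.0, "g": 0.75, "h": 0.0},
    4: {"e": 0.5, "f": 1.0, "g": 1.0, "h": 0.25},
}

d_luk = compatibility_degree({1, 4}, {"f", "g"}, R, lukasiewicz_tnorm, lukasiewicz_residuum)
d_goedel = compatibility_degree({1, 4}, {"f", "g"}, R, min, goedel_residuum)
print(d_luk, d_goedel)
```

Because 1 → a = a holds in every residuated lattice, both brackets reduce under this reading to the smallest grade of R over A × D, and the choice of ⊗ alone decides how strongly a single weak grade lowers d.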

Algorithm
Input: Boolean matrices C1, C2 and relation R12.
Output: Set F of multi-relational factors.
  F_C1 ← Boolean factors of C1
  F_C2 ← Boolean factors of C2
  U_C1 ← C1
  U_C2 ← C2
  foreach ⟨A, B⟩ ∈ F_C1 do
    compute the set of all candidates F_⟨A,B⟩ ⊆ F_C2 which are compatible in R12 with ⟨A, B⟩ in degree d > 0
  end for
  while there exist ⟨A, B⟩ and ⟨C, D⟩ ∈ F_⟨A,B⟩ which can be connected and improve coverage do
    select ⟨A, B⟩ and the corresponding ⟨C, D⟩ ∈ F_⟨A,B⟩ that cover the biggest parts of U_C1 and U_C2
    add ⟨⟨A, B⟩, ⟨C, D⟩, d⟩ to F
    remove all entries in ⟨A, B⟩ from U_C1
    remove all entries in ⟨C, D⟩ from U_C2
    remove ⟨C, D⟩ from F_⟨A,B⟩
  end while
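The greedy loop of the pseudocode can be sketched in a few lines of Python. This is an illustration, not the authors' implementation: the classical factors are taken as given (the BMF step is not reproduced), "cover the biggest parts" is interpreted as the number of still-uncovered ones in both tables, and the degree is computed for Boolean factors with the Łukasiewicz conjunction, which is an assumption. The names `cells`, `degree`, `multi_relational_factors` and the toy relation `R` are hypothetical.

```python
from itertools import product


def cells(factor):
    """All (object, attribute) entries inside the rectangle of a factor."""
    extent, intent = factor
    return set(product(extent, intent))


def degree(f1, f2, R):
    """Compatibility degree, assuming Boolean factors and Lukasiewicz conjunction."""
    m = min(R[x][y] for x in f1[0] for y in f2[1])
    return max(0.0, 2 * m - 1.0)


def multi_relational_factors(ones1, ones2, factors1, factors2, R):
    uncovered1, uncovered2 = set(ones1), set(ones2)
    # Candidates: for each factor of C1, the factors of C2 compatible in degree > 0.
    candidates = {}
    for i, f1 in enumerate(factors1):
        degrees = {j: degree(f1, f2, R) for j, f2 in enumerate(factors2)}
        candidates[i] = {j: d for j, d in degrees.items() if d > 0}
    result = []
    while True:
        best, best_gain = None, 0
        for i, compatible in candidates.items():
            for j, d in compatible.items():
                gain = len(cells(factors1[i]) & uncovered1) + len(cells(factors2[j]) & uncovered2)
                if gain > best_gain:
                    best, best_gain = (i, j, d), gain
        if best is None:  # no remaining pair improves coverage
            break
        i, j, d = best
        result.append((factors1[i], factors2[j], d))
        uncovered1 -= cells(factors1[i])
        uncovered2 -= cells(factors2[j])
        del candidates[i][j]
    return result


# Toy input: two factors per table and a hypothetical graded relation R.
factors1 = [({1, 4}, {"b", "c", "d"}), ({2, 4}, {"a", "c"})]
factors2 = [({6, 7}, {"f", "g"}), ({5}, {"e", "h"})]
R = {x: {y: 0.0 for y in "efgh"} for x in (1, 2, 4)}
for x in (1, 4):
    R[x]["f"] = R[x]["g"] = 1.0
for x in (2, 4):
    R[x]["e"] = R[x]["h"] = 0.75

ones1 = set().union(*(cells(f) for f in factors1))
ones2 = set().union(*(cells(f) for f in factors2))
result = multi_relational_factors(ones1, ones2, factors1, factors2, R)
for f1, f2, d in result:
    print(sorted(f1[0]), sorted(f1[1]), sorted(f2[0]), sorted(f2[1]), d)
```

The loop terminates because every iteration deletes one candidate pair, mirroring the last step of the pseudocode.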

Experimental Evaluation on Synthetic Data
Quality of factorization. The main factor: density of the relational matrix.
To eliminate the influence of the input matrices $C_1$ and $C_2$, we fixed them. $C_1$ has size 1000 × 500 with an approximate density of ones of 25%, and $C_2$ has size 500 × 1000 with the same density.
The relational matrix has size 500 × 500. Its grades are from the scale $L = \{0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1\}$.
We wanted to demonstrate that the number of zeros in this relation plays a crucial role.
We used 10 different sets of relational matrices with different distributions of grades. Each set contains 1000 such relations.

Results
Table: Results for synthetic data

          average percent   average coverage   average coverage   average total
          of zeros          of C1              of C2              coverage
Set 1     89%               65%                58%                62%
Set 2     81%               75%                69%                72%
Set 3     72%               85%                79%                82%
Set 4     61%               93%                90%                91%
Set 5     52%               95%                93%                94%
Set 6     39%               99%                98%                98%
Set 7     28%               99.8%              99.6%              99.7%
Set 8     20%               99.9%              99.9%              99.9%
Set 9     15%               99.9%              100%               99.9%
Set 10    10%               100%               100%               100%

Experimental Evaluation on Real Data
MovieLens dataset: http://grouplens.org/datasets/movielens/
Two data tables represent a set of users and their attributes (e.g. gender, age, occupation) and a set of movies and their attributes (e.g. genre).
Ratings are made on a 5-star scale (values 1-5; 1 means that the user does not like the movie and 5 means that the user likes it).
We used the 10M version of the MovieLens dataset.
We chose the users who rate the most and the films that are rated the most.
Ratings were normalized to the [0, 1] interval.
Our algorithm produced 46 multi-relational factors. These factors cover 98 percent of the input data tables.

Cumulative Coverage
Figure: Cumulative coverage of User and Movie data tables (x-axis: number of factors, 0 to 45; y-axis: coverage, 0 to 1).
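A cumulative coverage curve of this kind can be computed by adding factors one at a time and recording which fraction of the ones in a table is covered so far. The sketch below assumes that this is the meaning of "coverage" and handles a single table, whereas the figure combines the User and Movie tables; the function name `cumulative_coverage` is hypothetical, and the toy data reuses the three factors of C1 from the simple example.

```python
def cumulative_coverage(ones, factors):
    """Fraction of the table's ones covered after each successive factor."""
    covered = set()
    curve = []
    for extent, intent in factors:
        rectangle = {(x, y) for x in extent for y in intent}
        covered |= rectangle & ones
        curve.append(len(covered) / len(ones))
    return curve


factors_c1 = [
    ({1, 4}, {"b", "c", "d"}),
    ({2, 4}, {"a", "c"}),
    ({1, 3, 4}, {"b", "d"}),
]
# Assumption: the table consists exactly of the entries covered by the three factors.
ones_c1 = {(x, y) for extent, intent in factors_c1 for x in extent for y in intent}

curve = cumulative_coverage(ones_c1, factors_c1)
print(curve)
```

Plotting `curve` against the factor index would give a curve of the same kind as the one in the figure.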

Interpretation of Obtained Factors
College female students rated action, sci-fi and thriller movies from the 1980s with at least three stars.
Female students of elementary school rated new comedy films with at least three stars.
College male students rated action, adventure and fantasy movies with at least four stars.
Middle-aged males rated new drama films with at least three stars.
Females in their late forties working as academics or educators rated films from the 1970s with five stars.
Females aged 25-34 rated children's, animated and comedy movies with four stars.
