SLIDE 1

Introduction Lattice Theory Horn Axiomatizations Poset Construction Conclusions

Formal Methods for Mining Structured Objects

Gemma Casas Garriga

Ph.D. Software Program, Universitat Politècnica de Catalunya

Ph.D. dissertation, 8 June 2006

Advised by José L. Balcázar

Gemma Casas Garriga Universitat Politècnica de Catalunya Formal Methods for Mining Structured Objects

SLIDE 2

Outline

1. Introduction: Mining Structured Data
2. Lattice Theory for Sequences
3. Horn Axiomatizations for Sequences
4. Partial Order Construction
5. Conclusions and Future Research

SLIDE 4

Preliminaries

Data description

Consider the universe of patterns. Patterns are subclasses of directed graphs.

[Figure: example patterns drawn as directed graphs over the items A, B, D and F]

Consider a set of data D = {d1, ..., dn}, where each object di can be described by a pattern.

SLIDE 5

Preliminaries

Some definitions

Given two patterns G and G′, we say that G is a subpattern of G′, denoted G ⪯ G′, if there is a morphism from G to G′.

[Figure: example of the subpattern relation on patterns over the items A, B and C]

The subpattern relation ⪯ defines a partial order on the set of patterns, namely an exponential lattice of patterns.

[Figure: the lattice of patterns over the items A, B, C and D, including the empty pattern { }]

SLIDE 6

Preliminaries

Some definitions

If G ⪯ G′, we say that G is more general than G′, or that G′ is more specific than G.

SLIDE 7

The basic problem

Mining descriptions on the data

The complete lattice of patterns organized by ⪯ defines the pattern space in which to identify valid hypotheses for our data D = {d1, ..., dn}.

The support of a pattern G in the data D is the number of instances d ∈ D such that G ⪯ d.

SLIDE 9

Example. The itemset case

Data D does not exhibit any structure. A well-known problem is to find all patterns whose support in D is over a minimum specified threshold, namely frequent itemsets.

The complete pattern space is formed only by trivial orders.

Id   Sets
d1   {A, B, C, D}
d2   {A, C, D}
d3   {A, B, D}
d4   {A, C}
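As a minimal sketch of the frequent itemset problem on the four sets listed above, the following Python snippet counts the support of every itemset and keeps those reaching a threshold. The names `DATA`, `support` and `frequent_itemsets`, as well as the threshold value 3, are assumptions made for illustration; the brute-force enumeration is only meant to make the definition concrete and is not an efficient mining algorithm.

```python
from itertools import combinations

# The four input sets of the itemset example.
DATA = {
    "d1": {"A", "B", "C", "D"},
    "d2": {"A", "C", "D"},
    "d3": {"A", "B", "D"},
    "d4": {"A", "C"},
}

def support(itemset):
    """Number of input sets that contain every item of `itemset`."""
    return sum(1 for d in DATA.values() if set(itemset) <= d)

def frequent_itemsets(min_support):
    """Brute force: test every non-empty subset of the item universe."""
    items = sorted(set().union(*DATA.values()))
    return {
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
        if support(c) >= min_support
    }

# Hypothetical threshold, chosen only for this illustration.
frequent = frequent_itemsets(min_support=3)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), support(itemset))
```

With this threshold, the itemsets {A}, {C}, {D}, {A, C} and {A, D} are frequent, while {B} (support 2) is not.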

SLIDE 10

Example. The sequential case

Data D corresponds to sequences of itemsets. In this case the pattern space is formed by general partial orders. Sequences are often used within the data mining community as a prototypical example of a structured domain.

Seq id   Input sequences
d1       (AE)(C)(D)(A)
d2       (D)(ABE)(F)(BCD)
d3       (D)(A)(B)(F)
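For sequences of itemsets, the subpattern relation becomes subsequence containment. The sketch below shows one possible encoding: a sequence is a tuple of frozensets, and `contains` checks that the itemsets of a pattern embed, in order, into distinct itemsets of an input sequence. The helper names `seq`, `contains` and `support` are assumptions introduced here, and items are assumed to be single letters.

```python
def seq(text):
    """Parse a string such as '(AE)(C)' into a tuple of frozensets."""
    return tuple(frozenset(part) for part in text.strip("()").split(")("))

def contains(d, s):
    """True if s is a subsequence of d (greedy left-to-right embedding)."""
    pos = 0
    for itemset in s:
        # Advance to the first remaining itemset of d that covers `itemset`.
        while pos < len(d) and not itemset <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True

# The three input sequences of the running example.
DATA = {
    "d1": seq("(AE)(C)(D)(A)"),
    "d2": seq("(D)(ABE)(F)(BCD)"),
    "d3": seq("(D)(A)(B)(F)"),
}

def support(s):
    """Number of input sequences that contain the pattern s."""
    return sum(1 for d in DATA.values() if contains(d, s))

print(support(seq("(D)(A)")))   # contained in d1, d2 and d3
print(support(seq("(D)(B)")))   # contained in d2 and d3 only
```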

SLIDE 11

Observations

The pattern space defined by ⪯ grows combinatorially with the structure exhibited by the data D. Moreover, many patterns may be redundant for describing D.

SLIDE 12

Lattice theory

Compacting the exponential lattice of patterns

Formal Concept Analysis (FCA) is employed for compacting all the relationships of binary data into a Galois lattice without information loss.

The final Galois lattice is a closure system.

The good structural properties of the Galois lattice define a connection with classical propositional Horn theory.

An important limitation of this approach is that the classical propositional description is unable to reflect any structure.

SLIDE 14

This thesis

Studies the properties of the closure system obtained when mining structured objects in the form of sequences.

Provides a theoretical basis on:

⋆ Horn theory for ordered models
⋆ Partial order construction from sequences
⋆ Lattice theory for partial order structures

SLIDE 16

Ordered data

Seq id   Input sequences
d1       (AE)(C)(D)(A)
d2       (D)(ABE)(F)(BCD)
d3       (D)(A)(B)(F)

Observation
The intersection of a collection of sequences returns a set of sequences.

Example
The intersection of (AD)(C)(B) and (A)(B)(D)(C) is the set of sequences {(A)(C), (A)(B), (D)(C)}.
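The example above can be reproduced with a small brute-force sketch: enumerate every subsequence of the first sequence, keep those also contained in the second, and retain only the maximal ones. The function names (`seq`, `contains`, `subsequences`, `maximal`, `intersection`) are assumptions introduced for illustration, and the exhaustive enumeration is only reasonable for short sequences.

```python
from itertools import combinations, product

def seq(text):
    """Parse a string such as '(AD)(C)' into a tuple of frozensets."""
    return tuple(frozenset(part) for part in text.strip("()").split(")("))

def contains(d, s):
    """True if s is a subsequence of d (greedy left-to-right embedding)."""
    pos = 0
    for itemset in s:
        while pos < len(d) and not itemset <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True

def subsequences(d):
    """All non-empty subsequences of d: pick a (possibly empty) subset per itemset."""
    choices = [
        [frozenset(c) for r in range(len(i) + 1) for c in combinations(sorted(i), r)]
        for i in d
    ]
    return {tuple(x for x in pick if x) for pick in product(*choices)} - {()}

def maximal(seqs):
    """Keep the sequences not contained in another sequence of the set."""
    return {s for s in seqs if not any(s != t and contains(t, s) for t in seqs)}

def intersection(a, b):
    """Maximal sequences contained in both a and b."""
    return maximal({s for s in subsequences(a) if contains(b, s)})

result = intersection(seq("(AD)(C)(B)"), seq("(A)(B)(D)(C)"))
print(len(result))  # three maximal common subsequences
```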

SLIDE 17

Derivation operators

We define the following two operators φ and ψ, where 𝒪 denotes the set of all object identifiers and 𝒮 the set of all sequences:

For a set O ⊆ 𝒪 of objects: φ(O) = {s ∈ 𝒮 | s maximally contained in di, for all i ∈ O}.

Correspondingly, for a set S ⊆ 𝒮 of sequences: ψ(S) = {i ∈ 𝒪 | s ⊆ di, for all s ∈ S}.

Example (on the input sequences d1, d2, d3)
⊲ φ({1, 3}) = {(D)(A)}
⊲ ψ({(AE)(D), (AE)(C)}) = {1, 2}

SLIDE 18

A closure operator

We obtain two basic properties:

The maps ψ and φ form a Galois connection.
The compositions ψ · φ (on sets of objects) and ∆ = φ · ψ (on sets of sequences) are closure operators.

Example (on the input sequences d1, d2, d3)
⊲ ∆({(D)(B)}) = {(D)(A)(B), (D)(A)(F), (D)(B)(F)}
⊲ ∆({(D)(A)}) = {(D)(A)}
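The two derivation operators and the closure on sets of sequences can be sketched by brute force on the three input sequences of the running example. This is an illustration under stated assumptions, not the method of the thesis: the names `phi`, `psi` and `closure` and the tuple-of-frozensets encoding are introduced here, `closure(S)` is taken to be φ(ψ(S)), and sets of objects are assumed non-empty.

```python
from itertools import combinations, product

def seq(text):
    """Parse a string such as '(AE)(C)' into a tuple of frozensets."""
    return tuple(frozenset(part) for part in text.strip("()").split(")("))

def contains(d, s):
    """True if s is a subsequence of d (greedy left-to-right embedding)."""
    pos = 0
    for itemset in s:
        while pos < len(d) and not itemset <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True

def subsequences(d):
    """All non-empty subsequences of d."""
    choices = [
        [frozenset(c) for r in range(len(i) + 1) for c in combinations(sorted(i), r)]
        for i in d
    ]
    return {tuple(x for x in pick if x) for pick in product(*choices)} - {()}

def maximal(seqs):
    """Keep the sequences not contained in another sequence of the set."""
    return {s for s in seqs if not any(s != t and contains(t, s) for t in seqs)}

DATA = {1: seq("(AE)(C)(D)(A)"), 2: seq("(D)(ABE)(F)(BCD)"), 3: seq("(D)(A)(B)(F)")}

def phi(objects):
    """Sequences maximally contained in every d_i with i in `objects`."""
    if not objects:
        raise ValueError("this sketch only handles non-empty sets of objects")
    ids = sorted(objects)
    first, rest = DATA[ids[0]], [DATA[i] for i in ids[1:]]
    return maximal({s for s in subsequences(first) if all(contains(d, s) for d in rest)})

def psi(seqs):
    """Identifiers of the input sequences containing every sequence in `seqs`."""
    return {i for i, d in DATA.items() if all(contains(d, s) for s in seqs)}

def closure(seqs):
    """Closure on sets of sequences: apply psi, then phi."""
    return phi(psi(seqs))

print(psi({seq("(D)(B)")}))           # objects 2 and 3
print(len(closure({seq("(D)(B)")})))  # a closed set with three sequences
```

A set S is closed exactly when `closure(S) == S`, which holds for {(D)(A)} but not for {(D)(B)}.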

SLIDE 19

Closed sets of sequences

As customary in FCA:

Definition
Closed sets of sequences are those coinciding with their closure, that is, ∆(S) = S.

Example (on the input sequences d1, d2, d3)
⊲ ∆({(D)(B), (D)(A)}) = {(D)(A)(B), (D)(A)(F), (D)(B)(F)}
⊲ ∆({(D)(A)}) = {(D)(A)}

SLIDE 20

The Galois lattice of closed concepts

A formal concept of this ordered context is a pair (O, S) with φ(O) = S and ψ(S) = O.

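[Figure: the Galois lattice of the example; its nodes pair sets of objects with closed sets of sequences as follows]

Objects     Closed set of sequences
{1, 2, 3}   {(D)(A)}
{1, 2}      {(AE)(C), (AE)(D), (D)(A)}
{2, 3}      {(D)(A)(B), (D)(A)(F), (D)(B)(F)}
{1}         {(AE)(C)(D)(A)}
{2}         {(D)(ABE)(F)(BCD)}
{3}         {(D)(A)(B)(F)}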

Naturally, this Galois lattice corresponds to a closure system.

SLIDE 21

Properties of the closure system

Each node in the lattice corresponds to a closed set of sequences.

All sequences in a closed set are maximal in the corresponding set of objects.

Individually, sequences in the nodes of the lattice are stable sequences (i.e., s ∈ ∆({s})).

⋆ A sequence is stable if there exists no supersequence with the same support.
⋆ Stable sequences are mined by current state-of-the-art algorithms (e.g. CloSpan).
⋆ The Galois lattice of closed sets of sequences can be constructed by conveniently organizing stable sequences.

SLIDE 23

Notation

Propositional Horn theory

Assume a finite number of variables.

⋆ V = {x, y, z}

A clause is Horn iff it contains at most one positive literal.

⋆ ¬x ∨ ¬z ∨ y

A model is a complete truth assignment from variables to {0, 1}.

⋆ m(x) = 0, m(z) = 1, m(y) = 1, ...

Given a set of models M, the Horn theory of M corresponds to the conjunction of all Horn clauses satisfied by all models from M.

Main property

Given a set of models M, there is exactly one minimal Horn theory containing it. Semantically, it contains all the models that are intersections of models of M. This is sometimes called the empirical Horn approximation.
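The semantic side of the main property can be illustrated with a short sketch: a model is represented by the set of variables it makes true, and the model set is closed under pairwise intersection. The names `horn_closure` and `satisfies` and the two example models are assumptions chosen for illustration only.

```python
from itertools import combinations

def horn_closure(models):
    """Close a set of models (frozensets of true variables) under intersection."""
    closed = set(models)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(closed), 2):
            c = a & b
            if c not in closed:
                closed.add(c)
                changed = True
    return closed

def satisfies(model, body, head):
    """Does the model satisfy the Horn clause body -> head?"""
    return not set(body) <= model or head in model

# Hypothetical models over V = {x, y, z}: {x, y} true, and {y, z} true.
M = {frozenset("xy"), frozenset("yz")}
closed = horn_closure(M)

print(sorted(sorted(m) for m in closed))
# The clause x -> y holds in every model of the closure; x -> z does not.
print(all(satisfies(m, "x", "y") for m in closed))
print(all(satisfies(m, "x", "z") for m in closed))
```

Here the closure adds the single model {y}, the intersection of the two given models.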

SLIDE 25

Model transformation

Intuition

Transformation

i. One propositional variable is assigned to each possible subsequence.
ii. Let m be a model: we impose on m the constraint that if m(x) = 1 for a variable x, then m(y) = 1 for all those variables y such that y represents a subsequence of the sequence represented by x.

Example
⊲ Let d = (C)(A)(A) be an input sequence in D.

Vars   Subsequence   Value
v1     (A)           m(v1) = 1
v2     (C)           m(v2) = 1
v3     (A)(A)        m(v3) = 1
v4     (A)(C)        m(v4) = 0
v5     (C)(A)        m(v5) = 1
v6     (C)(C)        m(v6) = 0
...    ...           ...
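The example can be checked mechanically. The sketch below builds the model of d = (C)(A)(A) restricted to the six variables listed, and verifies the constraint of step ii on that restriction. The names `VARS`, `model` and `respects_background` are assumptions introduced for this illustration; the full variable set (one variable per possible subsequence) is not enumerated.

```python
def seq(text):
    """Parse a string such as '(C)(A)' into a tuple of frozensets."""
    return tuple(frozenset(part) for part in text.strip("()").split(")("))

def contains(d, s):
    """True if s is a subsequence of d (greedy left-to-right embedding)."""
    pos = 0
    for itemset in s:
        while pos < len(d) and not itemset <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True

# The six variables shown in the example, each naming one subsequence.
VARS = {
    "v1": "(A)", "v2": "(C)", "v3": "(A)(A)",
    "v4": "(A)(C)", "v5": "(C)(A)", "v6": "(C)(C)",
}

d = seq("(C)(A)(A)")

# A variable is true exactly when its subsequence is contained in d.
model = {v: int(contains(d, seq(s))) for v, s in VARS.items()}

def respects_background(m):
    """If m(x) = 1, then m(y) = 1 for every y naming a subsequence of x."""
    return all(
        m[y] == 1
        for x, sx in VARS.items() if m[x] == 1
        for y, sy in VARS.items() if contains(seq(sx), seq(sy))
    )

print(model)
print(respects_background(model))
```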

SLIDE 26

Model transformation

Formalization (i)

Formally, we define the interpretation function ξ : 𝒮 → V, from sequences to propositional variables.

Now each input sequence d ∈ D corresponds to a model md: the one that sets to true exactly the variables ξ(s′) where s′ ⊆ d.

Example
⊲ ξ((A)) = x, ξ((A)(B)) = y, ξ((A)(C)) = z, ξ((B)) = w, ξ((B)(A)) = u, ...

Seq id   Input sequences     Model   x  y  z  w  u  ...
d1       (AE)(C)(D)(A)       m1      1  0  1  0  0  ...
d2       (D)(ABE)(F)(BCD)    m2      1  1  1  1  0  ...
d3       (D)(A)(B)(F)        m3      1  1  0  1  0  ...

SLIDE 27

Model transformation

Formalization (ii)

Background Horn Conditions

The constraints imposed on the models,
⊲ if s′ ⊆ s then ξ(s) → ξ(s′),
are indeed Horn clauses, which we call background Horn conditions.

SLIDE 28

Clause identification

Generators

We define clauses in this sequential context through generators.

Definition
We say that a set of sequences G is a generator of S when ∆(G) = S.

[Figure: the Galois lattice of closed sets of sequences, with each node annotated with its generators, such as {(E)}, {(C)} or {(B), (C)}]

SLIDE 29

Clause identification

Deterministic implications

Definition
A deterministic association rule with order is an implication G → S such that ∆(G) = S.

Example
∆({(E)}) = {(AE)(C), (AE)(D), (D)(A)}
(E) → (AE)(C), (AE)(D), (D)(A)

Due to the construction of the operator ∆, we can argue that all the rules derived from the lattice have confidence 1. Moreover, these implications can be interpreted as Horn clauses.
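The confidence claim can be checked directly on the three input sequences of the running example. In the sketch below, confidence is taken to be the usual ratio between the support of antecedent and consequent together and the support of the antecedent alone; this definition, and the names `support` and `confidence`, are assumptions of the illustration rather than something stated on the slide.

```python
def seq(text):
    """Parse a string such as '(AE)(C)' into a tuple of frozensets."""
    return tuple(frozenset(part) for part in text.strip("()").split(")("))

def contains(d, s):
    """True if s is a subsequence of d (greedy left-to-right embedding)."""
    pos = 0
    for itemset in s:
        while pos < len(d) and not itemset <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True

DATA = [seq("(AE)(C)(D)(A)"), seq("(D)(ABE)(F)(BCD)"), seq("(D)(A)(B)(F)")]

def support(seqs):
    """Number of input sequences containing every sequence in `seqs`."""
    return sum(1 for d in DATA if all(contains(d, s) for s in seqs))

def confidence(antecedent, consequent):
    """Fraction of the sequences matching the antecedent that also match the consequent."""
    return support(antecedent | consequent) / support(antecedent)

# The rule of the example: (E) -> (AE)(C), (AE)(D), (D)(A).
rule = confidence({seq("(E)")}, {seq("(AE)(C)"), seq("(AE)(D)"), seq("(D)(A)")})
# A rule that is not deterministic: (D)(A) -> (D)(B) fails on d1.
other = confidence({seq("(D)(A)")}, {seq("(D)(B)")})

print(rule, other)
```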

SLIDE 30

Main characterization

Theorem
Given a set of input sequences D, the conjunction of all the deterministic association rules with order constructed by the closure system, seen as propositional formulas, and together with the background Horn conditions, axiomatizes exactly the empirical Horn approximation of the theory containing the set of models M = models(D).

Computation of these rules
Algorithmically, a way to compute these rules is by computing transversals of the hypergraph of differences between the nodes of the lattice.
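To make the notion of hypergraph transversal concrete, here is a generic brute-force sketch that lists the minimal transversals (minimal hitting sets) of a small hypergraph. It is not the algorithm of the thesis: the function name `minimal_transversals` and the toy hyperedges are hypothetical, and how the hypergraph of differences is built from the lattice is not shown.

```python
from itertools import combinations

def minimal_transversals(hyperedges):
    """All minimal vertex sets that intersect every hyperedge (brute force)."""
    vertices = sorted(set().union(*hyperedges))
    found = []
    # Candidates are tried by increasing size, so any transversal that
    # contains an already found one is not minimal and is skipped.
    for r in range(1, len(vertices) + 1):
        for cand in combinations(vertices, r):
            c = set(cand)
            hits_all = all(c & e for e in hyperedges)
            if hits_all and not any(t <= c for t in found):
                found.append(c)
    return found

# Hypothetical toy hypergraph with two hyperedges.
edges = [{"a", "b"}, {"b", "c"}]
transversals = minimal_transversals(edges)
print(transversals)
```

The enumeration is exponential in the number of vertices, so it only serves to illustrate the definition on tiny inputs.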

SLIDE 33

Partial orders to summarize sequences

Typically, partial orders are used to compact the information on the order relationships between the items of a set of sequences.

A common goal is to identify the most specific partial orders compatible with a nonempty set of sequences.

Seq id   Input sequences
d1       (AE)(C)(D)(A)
d2       (D)(ABE)(F)(BCD)
d3       (D)(A)(B)(F)

[Figure: example partial orders over the items A, B, D and F]

SLIDE 34

The closure system of partial orders

Starting from the same ordered context we can also characterize a closure operator Ω for partial orders. From operator Ω we derive the lattice of closed partial orders.

SLIDE 35

Closed partial orders

As customary in FCA:

Definition
Closed partial orders are those coinciding with their closure, that is, Ω(G) = G.

Closed partial orders are the most specific structures to generalize a set of input sequences.

How to construct these closed structures? The transitivity property is assumed.

SLIDE 38

Main characterization

Theorem
The lattice of closed sets of sequences in D and the lattice of closed partial orders in D are isomorphic under transitivity.

SLIDE 39

Main characterization

The isomorphism between the lattices defines a way to transform each closed set of sequences of ∆ into a closed partial order of Ω.

Observation
Roughly, this transformation consists in considering each closed set of sequences as the maximal paths of a closed partial order.

[Figure: the closed set {(D)(A)(B), (D)(A)(F), (D)(B)(F)} of objects 2,3 and the corresponding closed partial order over the items D, A, B and F]

SLIDE 40

Path preserving transformations

Definition (Path preserving transformations)
Closed partial orders are constructed by matching positions of the sequences in the same group S. Two positions are path preserving if the paths crossing those positions are already included in S.

Example
⊲ Let us consider S = {(D)(A)(B), (D)(A)(F), (D)(B)(F)}

[Figure: partial order over the items D, A, B and F built from S]

Transitivity is a property assumed between the path preserving positions of sequences in S.

SLIDE 44

Path preserving transformations

Formalization

These path-preserving transformations are formally justified with operations of category theory.

When items are not repeated in the input sequences, transitivity always holds.

⋆ Then, coproducts characterize the path-preserving transformation.

If items are repeated, then transitivity may not hold.

⋆ Under transitivity, colimits are a natural generalization of coproducts.
⋆ Without transitivity, we cannot ensure the maximal specificity property of the constructed structures.

SLIDE 47

Implications of this transformation

The algorithmic advantage is that we do not need to mine partial orders directly from the sequential data.

The problem is reduced to finding stable sequences and performing a transformation on them.

Now, we can extend these contributions to mining input objects represented as partial orders.

SLIDE 48

Towards mining partial order structures

Consider that objects in D are partial orders.

[Figure: the input partial orders, over the items A, B, C and D]

As suggested by the transformations, each partial order can be represented by a set of sequences corresponding to its maximal paths.

Poset id   Set of maximal paths
d1         {(A)(B)(C)(D), (A)(A)(D)}
d2         {(A)(B)(C), (A)(B)(D), (A)(C)(D)}
d3         {(A)(C)(B)(D), (A)(C)(B)(C)}
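Reading a partial order as a set of maximal paths can be sketched as a walk over its cover (Hasse) diagram from every minimal node to every maximal node. The drawings of the slide are not available in this transcript, so the graph below is a hypothetical diagram chosen so that its maximal paths match the row d2 of the table; the node identifiers, the edge list and the function name `maximal_paths` are assumptions.

```python
def maximal_paths(labels, edges):
    """Label sequences of all source-to-sink paths of an acyclic diagram."""
    succ = {n: [] for n in labels}
    has_pred = set()
    for u, v in edges:
        succ[u].append(v)
        has_pred.add(v)

    paths = set()

    def walk(node, path):
        path = path + (labels[node],)
        if not succ[node]:          # a sink closes a maximal path
            paths.add(path)
        for nxt in succ[node]:
            walk(nxt, path)

    for n in labels:
        if n not in has_pred:       # start from every source
            walk(n, ())
    return paths

# Hypothetical cover diagram: node ids carry item labels, which may repeat.
labels = {1: "A", 2: "B", 3: "C", 4: "D", 5: "C", 6: "D"}
edges = [(1, 2), (2, 3), (2, 4), (1, 5), (5, 6)]

paths = maximal_paths(labels, edges)
print(sorted(paths))
```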

SLIDE 49

Towards mining partial order structures

Let us consider two input partial orders.

[Figure: two partial orders over the items A, B, C and D]

They correspond to sets of sequences.

⋆ d = {(A)(C)(B), (A)(C)(D), (A)(C)(A)}
⋆ d′ = {(A)(B), (A)(C)(D)(A)}
⋆ d ∩ d′ = {(A)(B), (A)(C)(D), (A)(C)(A)}
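The combination of d and d′ shown above can be reproduced by extending the intersection of sequences to sets of sequences: keep the maximal sequences that are contained in some element of each set. This reading of the operation, and the helper names (`seq`, `contains`, `subsequences`, `maximal`, `intersect_sets`), are assumptions made for this brute-force sketch.

```python
from itertools import combinations, product

def seq(text):
    """Parse a string such as '(A)(C)' into a tuple of frozensets."""
    return tuple(frozenset(part) for part in text.strip("()").split(")("))

def contains(d, s):
    """True if s is a subsequence of d (greedy left-to-right embedding)."""
    pos = 0
    for itemset in s:
        while pos < len(d) and not itemset <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True

def subsequences(d):
    """All non-empty subsequences of d."""
    choices = [
        [frozenset(c) for r in range(len(i) + 1) for c in combinations(sorted(i), r)]
        for i in d
    ]
    return {tuple(x for x in pick if x) for pick in product(*choices)} - {()}

def maximal(seqs):
    """Keep the sequences not contained in another sequence of the set."""
    return {s for s in seqs if not any(s != t and contains(t, s) for t in seqs)}

def intersect_sets(first, second):
    """Maximal sequences contained in some path of each of the two sets."""
    common = {
        s
        for a in first
        for s in subsequences(a)
        if any(contains(b, s) for b in second)
    }
    return maximal(common)

d = {seq("(A)(C)(B)"), seq("(A)(C)(D)"), seq("(A)(C)(A)")}
d_prime = {seq("(A)(B)"), seq("(A)(C)(D)(A)")}

common = intersect_sets(d, d_prime)
print(len(common))  # three maximal common paths
```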

SLIDE 51

Main contributions (i): Galois lattice of closed sets of sequences

The basis of a framework for mining general patterns from a set of sequences.

A new concept lattice for sequences in terms of a Galois connection.

A connection of each closed set of sequences with the stable sequences mined by CloSpan.

SLIDE 52

Main contributions (ii): The full implicational system for sequences

A proper notion of deterministic association rules in ordered data.

A way of mining facts where a set of subsequences implies another subsequence in the data.

Rules that can be formally justified by a purely logical characterization, namely the empirical Horn approximation for ordered data.

Reduction of the rule mining problem in sequences to hypergraph transversal.

SLIDE 53

Main contributions (iii): Identification of partial orders in sequences

Notion of closed partial orders compatible with a set of sequences of itemsets.

These closed partial orders are directly derived from the closed sets of sequences represented in the Galois lattice for sequences.

Our analysis develops on the properties of the maximal paths that will compose the final closed partial orders.

A way to understand the mining of acyclic graphs by transforming each complex object into a set of sequences.

SLIDE 54

Future lines

Horn clauses with exceptions.

A relevant property of the deterministic rules characterized in the thesis is the need for absolute confidence; this can be inappropriate in different ways.

Horn axiomatizations for structured objects.

A related issue corresponds to the extension of the full implicational system to complex structured data, such as partial orders.

Closed sets in multirelational data.

The mining task on this data becomes the problem of finding closed queries (conjunction of atoms) in D. A query on the data can be assigned different semantics.

Other applications of closed structures.

Classification, subgroup discovery, learning rules from cellular automata . . .

SLIDE 55

Relevant publications (i)

G.C. Garriga. Towards a Formal Framework for Mining General Patterns from Structured Data. 2nd Int. KDD Workshop on Multirelational Datamining. 2003.

J.L. Balcázar, G.C. Garriga and P. Díaz-López. Reconstructing the rules of 1D cellular automata using closure systems. 2nd European Conf. on Complex Systems. 2005.

J.L. Balcázar and G.C. Garriga. On Horn Axiomatizations for Sequential Data. 10th Int. Conf. on Database Theory. 2005.

SLIDE 56

Relevant publications (ii)

J.L. Balcázar and G.C. Garriga. On Horn Axiomatizations for Sequential Data. To appear in Theoretical Computer Science (special issue ICDT). 2006.

G.C. Garriga. Summarizing Sequential Data with Closed Partial Orders. SIAM Int. Conf. on Data Mining. 2005.

G.C. Garriga and J.L. Balcázar. Coproduct Transformations on Lattices of Closed Partial Orders. 2nd Int. Conf. on Graph Transformation. 2004.
