

SLIDE 1

An introduction to RCA

Relational Concept Analysis:

An approach for mining multi-relational datasets

École des Mines d’Alès, LGI2P Seminar, February 2013

Marianne Huchard February 14, 2013

Marianne Huchard EMA 2013

SLIDE 2

Brief presentation of FCA – Formal Concept Analysis

A methodology for:

◮ data analysis, data mining
◮ knowledge representation
◮ unsupervised learning

Roots:

◮ lattice theory, Galois correspondences (Birkhoff, 1940; Barbut & Monjardet, 1970)
◮ concept lattices (Wille, 1982)

SLIDE 3

Brief presentation of FCA – Formal Concept Analysis

Contexts and concepts

◮ Handled data
  ◮ entities with characteristics
  ◮ provided with a Formal Context (a binary table)

Attributes (columns): flying, nocturnal, feathered, migratory, with_crest, with_membrane

flying squirrel: × ×
bat: × × ×
ostrich: ×
flamingo: × × ×
chicken: × × ×

◮ Concept: maximal group of entities sharing characteristics
◮ Concept lattice: concepts with a partial order relation

SLIDE 4

Binary context and its characteristic mappings

Context (O, A, R)

◮ two finite sets O and A
◮ a binary relation R ⊆ O × A

Definition (characteristic mappings of a binary relation)

Attributes common to a set of objects:
f : P(O) → P(A), X ↦ f(X) = {y ∈ A | ∀x ∈ X, (x, y) ∈ R}

Objects sharing a set of attributes:
g : P(A) → P(O), Y ↦ g(Y) = {x ∈ O | ∀y ∈ Y, (x, y) ∈ R}
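
The two mappings f and g can be sketched in a few lines of Python. This is only an illustration on a hypothetical toy context: the object names o1..o3, the attribute names a..c and the relation R are assumptions, not data from the slides.

```python
# Hypothetical toy context (O, A, R); all names are illustrative only.
O = {"o1", "o2", "o3"}
A = {"a", "b", "c"}
R = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o2", "c"), ("o3", "b")}

def f(X):
    """Attributes common to every object of X (so f(empty set) = A)."""
    return {y for y in A if all((x, y) in R for x in X)}

def g(Y):
    """Objects having every attribute of Y (so g(empty set) = O)."""
    return {x for x in O if all((x, y) in R for y in Y)}

shared = f({"o1", "o2"})   # attributes shared by o1 and o2
owners = g({"b"})          # objects having attribute b
```

Composing the two, g(f(X)), gives the closure of a set of objects X.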

SLIDE 5

Galois correspondences

Results of Birkhoff 1940, Ore 1944, Barbut and Monjardet 1970: (f, g) forms a Galois correspondence

◮ a pair of mappings, from (A, ≤A) to (B, ≤B) and back respectively, that are both decreasing and whose two compositions f ◦ g and g ◦ f are extensive

Consequences

◮ f ◦ g and g ◦ f are closure operators (monotone increasing, extensive, idempotent)
◮ if (A, ≤A) and (B, ≤B) are finite lattices, their sets of closed elements for f ◦ g and g ◦ f form two isomorphic lattices (using the infimum and the closure of the supremum)
◮ the Galois lattice is the sublattice of the product lattice of the closed elements, restricted to the pairs (x, y) such that y = f(x)

Remember: the concept lattice is a special case of a Galois lattice

SLIDE 6

Concept

A formal concept C is a pair (E, I) such that f(E) = I and E = g(I).

E = {e ∈ O | ∀i ∈ I, (e, i) ∈ R} is the extent (covered objects),
I = {i ∈ A | ∀e ∈ E, (e, i) ∈ R} is the intent (shared characteristics).
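
As a rough illustration of this definition, the sketch below enumerates all formal concepts of a small context by closing every subset of objects. It is a brute-force approach (exponential in |O|), not one of the dedicated algorithms, and the toy context is a hypothetical example rather than data from the slides.

```python
from itertools import combinations

# Hypothetical toy context; names are illustrative only.
O = ["o1", "o2", "o3"]
A = ["a", "b", "c"]
R = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o2", "c"), ("o3", "b")}

def f(X):
    """Attributes shared by all objects of X."""
    return frozenset(y for y in A if all((x, y) in R for x in X))

def g(Y):
    """Objects owning all attributes of Y."""
    return frozenset(x for x in O if all((x, y) in R for y in Y))

def concepts():
    """All pairs (E, I) with f(E) = I and g(I) = E, found by brute force."""
    found = set()
    for k in range(len(O) + 1):
        for X in combinations(O, k):
            intent = f(X)          # attributes common to X
            extent = g(intent)     # closure of X
            found.add((extent, intent))
    return found

all_concepts = concepts()
```

On this toy context the enumeration yields four concepts, including the top concept (all objects, {b}) and the bottom concept (no object, all attributes).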

SLIDE 7

Specialization between concepts

The set C of all concepts forms a lattice L when equipped with the following order: (E1, I1) ≤L (E2, I2) ⇔ E1 ⊆ E2 (or, equivalently, I2 ⊆ I1). From the transitive reduction of the lattice, one can derive the minimal non-redundant set of implications of the context that have non-zero support (there exist objects satisfying the implication).
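
The order and its transitive reduction (the edges of the Hasse diagram) can be sketched as follows. The four concepts listed are a hypothetical example written as (extent, intent) pairs; the function names are illustrative only.

```python
# Hypothetical concepts of a toy context, as (extent, intent) pairs.
concepts = [
    (frozenset(), frozenset({"a", "b", "c"})),
    (frozenset({"o1"}), frozenset({"a", "b"})),
    (frozenset({"o2"}), frozenset({"b", "c"})),
    (frozenset({"o1", "o2", "o3"}), frozenset({"b"})),
]

def leq(c1, c2):
    """(E1, I1) <= (E2, I2) iff E1 is included in E2."""
    return c1[0] <= c2[0]

def cover_relation(cs):
    """Pairs (lower, upper) with lower < upper and no concept strictly between."""
    edges = []
    for lo in cs:
        for up in cs:
            if lo == up or not leq(lo, up):
                continue
            between = any(m not in (lo, up) and leq(lo, m) and leq(m, up)
                          for m in cs)
            if not between:
                edges.append((lo, up))
    return edges

edges = cover_relation(concepts)
```

The pair (bottom, top) is comparable but is not an edge, because the concepts with extents {o1} and {o2} lie in between.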

SLIDE 8

Brief presentation of FCA – Formal Concept Analysis

SLIDE 9

Brief presentation of FCA – Formal Concept Analysis

SLIDE 10

Brief presentation of FCA – Formal Concept Analysis

SLIDE 11

Brief presentation of FCA – Formal Concept Analysis

SLIDE 12

Brief presentation of FCA – Formal Concept Analysis

AOC-poset: Attribute-Object-Concept poset (lattice without Concepts 0, 3 and 4)

◮ max #concepts in lattice: 2^min(|A|,|O|)
◮ max #concepts in AOC-poset: |A| + |O|
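
A minimal sketch of why the AOC-poset has at most |A| + |O| elements: it only keeps the concept introducing each object and the concept introducing each attribute. The toy context below is hypothetical and unrelated to the figure on the slide.

```python
# Hypothetical toy context; names are illustrative only.
O = ["o1", "o2", "o3"]
A = ["a", "b", "c"]
R = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o2", "c"), ("o3", "b")}

def f(X):
    return frozenset(y for y in A if all((x, y) in R for x in X))

def g(Y):
    return frozenset(x for x in O if all((x, y) in R for y in Y))

def aoc_poset():
    """Object concepts (g(f({o})), f({o})) and attribute concepts (g({a}), f(g({a})))."""
    kept = set()
    for o in O:
        intent = f({o})
        kept.add((g(intent), intent))
    for a in A:
        extent = g({a})
        kept.add((extent, f(extent)))
    return kept

aoc = aoc_poset()
```

Here the full lattice has four concepts, while the AOC-poset keeps three: the bottom concept (empty extent) introduces no object and no attribute, so it is dropped.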

SLIDE 13

FCA and complex data

◮ many-valued contexts (integers, floats, terms, structures, symbolic objects, etc.) (Ganter and Wille, Polaillon, ...)
◮ fuzzy descriptions (Yahia et al., Belohlavek, ...)
◮ hierarchies on values (Godin et al., Carpineto and Romano, ...)
◮ logical description (Chaudron et al., Ferré et al., ...)
◮ graphs (Liquière, Prediger and Wille, ...)
◮ linked objects (Priss, Hacène-Rouane et al., ...)
◮ etc.

SLIDE 14

Relational Concept Analysis (RCA)

◮ Extends the purpose of FCA to take into account object categories and links between objects
◮ Main principles:
  ◮ a relational model based on the entity-relationship model
  ◮ integrate relations between objects as relational attributes
  ◮ iterative process
◮ RCA provides a set of interconnected lattices
◮ Produced structures can be represented as ontology concepts within a knowledge representation formalism such as description logics (DLs).

Joint work with: A. Napoli, C. Roume, M. Rouane-Hacène, P. Valtchev

SLIDE 15

Relational Context Family (RCF)

A simple entity-relationship model to introduce RCA

Relational Context Family

◮ object-attribute contexts
  ◮ Pizza
  ◮ Ingredient
◮ object-object context
  ◮ has-topping ⊆ Pizza × Ingredient

SLIDE 16

Relational Context Family (RCF)

An RCF F is a pair (K, R) where:

◮ K is a set of object-attribute contexts Ki = (Oi, Ai, Ii)
◮ R is a set of object-object contexts Rj = (Ok, Ol, Ij), where
  ◮ (Ok, Ol) are the object sets of formal contexts (Kk, Kl) ∈ K²
  ◮ Ij ⊆ Ok × Ol
  ◮ Kk is the source/domain context, Kl is the target/range context
  ◮ we may have Kk = Kl

SLIDE 17

Relational Context Family (RCF) / object-attributes contexts

Pizza context
Attributes (columns): thin, thick, calzone
Objects (rows, one × each): okonomi, alberginia, margherita, languedoc, four-cheeses, three-cheeses, frutti-di-mare, quebec, regina, hawai, lorraine, kebab

Ingredient context
Attributes (columns): fruit-vegetable, meat, fish, dairy, cereal-leguminous, veg-oil
Objects (rows, one × each): tomato-sauce, cream, tomato, basilic, olive, olive oil, soy, mushroom, eggplant, onion, pepper, ananas, mozza, goat-cheese, emmental, fourme-ambert, squid, shrimp, mussels, ham, bacon

SLIDE 18

Relational Context Family (RCF) / object-object context / part 1

has-topping (part 1)
Columns: tomato-sauce, cream, tomato, basilic, olive, olive oil, soy, mushroom, eggplant, onion, pepper, ananas

okonomi: × × × ×
alberginia: × × × × ×
margherita: × × × × ×
languedoc: × × × × × × ×
four-cheeses: ×
three-cheeses: ×
frutti-di-mare: × × ×
quebec: ×
regina: × ×
hawai: × ×
lorraine: × ×
kebab: × × × ×

SLIDE 19

Relational Context Family (RCF) / object-object context / part 2

has-topping (part 2)
Columns: mozza, goat-cheese, emmental, fourme-ambert, squid, shrimp, mussels, ham, bacon, chicken, maple-sirup, corn

okonomi:
alberginia:
margherita: ×
languedoc: ×
four-cheeses: × × × ×
three-cheeses: × × ×
frutti-di-mare: × × × ×
quebec: × × × ×
regina: × ×
hawai: × ×
lorraine: × ×
kebab: × ×

SLIDE 20

Data patterns we would like to extract

Using a classification of ingredients by their topping category (fruit-vegetable, dairy, etc.)

◮ All pizzas, however different, except four-cheese and three-cheese, contain at least one topping which is a vegetable
◮ Two pizzas (four-cheese and three-cheese) have all their toppings among dairy ingredients
◮ For pizzas: have meat ⇒ have dairy
◮ For pizzas: being thin ⇒ have at least dairy
◮ For pizzas: have only dairy ⇒ being thin

SLIDE 21

RCA - Initial Lattice building

At the beginning, only the object-attribute contexts are used to build the foundation of the concept lattice family

SLIDE 22

RCA - Introducing relations as relational attributes

Given an object-object context Rj = (Ok, Ol, Ij), there are different notable schemas between an object of the domain Ok and the concepts formed on Ol. E.g.:

◮ Existential: an object is linked (by Rj) to at least one object of the extent of a concept
◮ Universal: an object is linked (by Rj) only to objects of the extent of a concept

SLIDE 23

RCA - Existential relational attributes

margherita has one topping in Concept_10 extent: mozza. It has other links to other concept extents. ∃has-topping.Concept_10 is assigned to margherita

SLIDE 24

RCA - Existential relational attributes

Scaled relations with domain Oi are concatenated to Ki, the object-attribute context on Oi

Pizza context
Attributes (columns): thin, thick, calzone
Objects (rows, one × each): okonomi, alberginia, margherita, languedoc, four-cheeses, three-cheeses, frutti-di-mare, quebec, regina, hawai, lorraine, kebab

Scaled has-topping
Columns: ∃has-topping.Concept_7, ∃has-topping.Concept_5, ∃has-topping.Concept_6, ∃has-topping.Concept_8, ∃has-topping.Concept_9, ∃has-topping.Concept_10, ∃has-topping.Concept_11, ∃has-topping.Concept_12

okonomi: × × ×
alberginia: × × ×
margherita: × × × ×
languedoc: × × × ×
four-cheeses: × ×
three-cheeses: × ×
frutti-di-mare: × × × × ×
quebec: × × × × ×
regina: × × × ×
hawai: × × × ×
lorraine: × × × ×
kebab: × × × ×

SLIDE 25

Relational Concept Family / exists

SLIDE 26

Relational Concept Family / exists

Concept_21: pizzas with at least one topping in dairy
Concept_18: pizzas with at least one topping in meat

have at least one meat topping ⇒ have at least one dairy topping

SLIDE 27

RCA - Universal relational attributes

three-cheese has topping in and only in Concept_10 extent. ∀∃has-topping.Concept_10 is assigned to three-cheese

SLIDE 28

RCA - Universal relational attributes

Scaled relations with domain Oi are concatenated to Ki, the object-attribute context on Oi

Pizza context
Attributes (columns): thin, thick, calzone
Objects (rows, one × each): okonomi, alberginia, margherita, languedoc, four-cheeses, three-cheeses, frutti-di-mare, quebec, regina, hawai, lorraine, kebab

Scaled has-topping
Columns: ∀∃has-topping.Concept_7, ∀∃has-topping.Concept_5, ∀∃has-topping.Concept_6, ∀∃has-topping.Concept_8, ∀∃has-topping.Concept_9, ∀∃has-topping.Concept_10, ∀∃has-topping.Concept_11, ∀∃has-topping.Concept_12

okonomi: ×
alberginia: ×
margherita: ×
languedoc: ×
four-cheeses: × ×
three-cheeses: × ×
frutti-di-mare: ×
quebec: ×
regina: ×
hawai: ×
lorraine: ×
kebab: ×

SLIDE 29

Relational Concept Family / forall

SLIDE 30

Relational Concept Family / forall

Concept_13: pizzas with only dairy topping
Concept_1: thin pizzas

have only dairy topping ⇒ thin

SLIDE 31

RCA - Introducing relations as relational attributes

Definition (The existential scaling operator)

Given K = (O, A, I) and an object-object context r, where O is the domain of r, let i_r be such that O_{i_r} is the range of r, with K_{i_r} = (O_{i_r}, A_{i_r}, I_{i_r}). Let also L_{i_r} be the lattice of K_{i_r}. The existential scaling operator S_{(r,∃),L_{i_r}} maps K into the derived context K⁺ = (O⁺, A⁺, I⁺), where:

◮ O⁺ = O
◮ A⁺ = {∃r : c | c ∈ L_{i_r}}, where each ∃r : c is a relational attribute
◮ I⁺ = {(o, ∃r : c) | o ∈ O, c ∈ L_{i_r}, r(o) ∩ Ext(c) ≠ ∅}
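
A minimal sketch of existential scaling, assuming the target lattice is given as a dictionary from concept names to extents. The function name, the pizzas p1 and p2, and the concept names C_dairy, C_meat and C_bottom are hypothetical and do not correspond to the numbered concepts of the slides.

```python
def existential_scaling(objects, r, target_concepts, rel_name):
    """Return I+ : pairs (o, '∃rel:c') such that r(o) meets Ext(c)."""
    incidence = set()
    for o in objects:
        linked = r.get(o, set())              # r(o), empty if o has no link
        for name, extent in target_concepts.items():
            if linked & extent:               # r(o) ∩ Ext(c) is not empty
                incidence.add((o, f"∃{rel_name}:{name}"))
    return incidence

# Hypothetical data, for illustration only.
pizzas = ["p1", "p2"]
has_topping = {"p1": {"mozza", "ham"}, "p2": {"emmental"}}
ingredient_concepts = {
    "C_dairy": {"mozza", "emmental"},
    "C_meat": {"ham"},
    "C_bottom": set(),
}

scaled = existential_scaling(pizzas, has_topping, ingredient_concepts, "has-topping")
```

The object set is unchanged; only new relational attributes are produced, and a concept with an empty extent never yields a cross.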

SLIDE 32

Scaling operators

Operator                           Attribute form   Condition
Universal (narrow)                 ∀ r.c            r(o) ⊆ Ext(c)
Covers                             ⊇ r.c            r(o) ⊇ Ext(c)
Existential (wide)                 ∃ r.c            r(o) ∩ Ext(c) ≠ ∅
Universal strict                   ∀∃ r.c           r(o) ⊆ Ext(c) and r(o) ≠ ∅
Qualified cardinality restriction  ≥ n r.c          r(o) ⊆ Ext(c) and |r(o)| ≥ n
Cardinality restriction            ≥ n r.⊤L         |r(o)| ≥ n
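
The conditions of this table can be written as predicates over r(o) and Ext(c), both represented as Python sets. This is a sketch that follows the conditions as listed on the slide; the dictionary keys and the sample data are hypothetical.

```python
def scaling_conditions(n=1):
    """Map each operator of the table to a predicate on (r(o), Ext(c))."""
    return {
        "universal": lambda ro, ext: ro <= ext,
        "covers": lambda ro, ext: ro >= ext,
        "existential": lambda ro, ext: bool(ro & ext),
        "universal_strict": lambda ro, ext: bool(ro) and ro <= ext,
        "qualified_cardinality": lambda ro, ext: ro <= ext and len(ro) >= n,
        # For r.⊤L the concept is the top of the lattice, so only |r(o)| matters.
        "cardinality": lambda ro, ext: len(ro) >= n,
    }

# Hypothetical data: the toppings linked to one pizza and one concept extent.
ops = scaling_conditions(n=2)
ro = {"mozza", "emmental"}
ext = {"mozza", "emmental", "goat-cheese"}
results = {name: cond(ro, ext) for name, cond in ops.items()}
```

The difference between the universal and universal strict operators shows up for an object without any link: with r(o) = ∅ the first condition holds for every concept, the second for none.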

SLIDE 33

General Entity-Relationship diagram

A general ER diagram may contain cycles (circuits) between classes/objects

SLIDE 34

The RCA schema

Input

RCF = (K, R) : n object-attribute contexts, m object-object contexts

Initialization step

build, for i in 1..n, L0[i], the concept lattice of the context Ki

Step p

⊲ apply relational scaling to all object-object contexts Rj, using the lattices of step p − 1 and the chosen scaling operator
⊲ concatenate Ki with the scaled R∗j whose domain is Oi
⊲ update the lattices of step p − 1 to build, for i in 1..n, the lattice Lp[i] for the context Ki concatenated as explained above

Output (fix point)

The concept lattice family obtained when no new concepts are added
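
The schema above can be sketched as a fixed-point loop. This is a simplified illustration, not the implementation of Galicia or eRCA: it uses brute-force lattice construction, applies the existential operator to every relation, and identifies a concept by its extent so that relational attributes stay comparable across steps. All function names and the data are hypothetical.

```python
from itertools import combinations

def concept_extents(objects, attrs, incidence):
    """All concept extents of (objects, attrs, incidence), by brute force."""
    extents = set()
    for k in range(len(objects) + 1):
        for X in combinations(sorted(objects), k):
            intent = [a for a in attrs if all((x, a) in incidence for x in X)]
            extents.add(frozenset(o for o in objects
                                  if all((o, a) in incidence for a in intent)))
    return extents

def rca(contexts, relations):
    """contexts: name -> (objects, attrs, incidence).
    relations: list of (rel_name, source, target, links), links: object -> set."""
    # Initialization step: lattices of the object-attribute contexts alone.
    lattices = {n: concept_extents(*ctx) for n, ctx in contexts.items()}
    while True:
        new = {}
        for n, (objs, attrs, inc) in contexts.items():
            ext_attrs, ext_inc = list(attrs), set(inc)
            for rel, src, tgt, links in relations:
                if src != n:
                    continue
                for ext in lattices[tgt]:        # concepts of step p - 1
                    ext_attrs.append((rel, ext))  # relational attribute ∃rel:c
                    for o in objs:
                        if links.get(o, set()) & ext:
                            ext_inc.add((o, (rel, ext)))
            new[n] = concept_extents(objs, ext_attrs, ext_inc)
        if new == lattices:                       # fix point: no new concept
            return lattices
        lattices = new

# Hypothetical relational context family.
contexts = {
    "Pizza": (["p1", "p2", "p3"], ["thin", "thick"],
              {("p1", "thin"), ("p2", "thin"), ("p3", "thick")}),
    "Ingredient": (["ham", "mozza"], ["dairy", "meat"],
                   {("mozza", "dairy"), ("ham", "meat")}),
}
relations = [("has-topping", "Pizza", "Ingredient",
              {"p1": {"mozza"}, "p2": {"mozza", "ham"}, "p3": {"ham"}})]

family = rca(contexts, relations)
```

On this data the Pizza lattice grows from four to six concepts after one scaling step, for instance gaining the extent {p2} (thin pizzas with a meat topping), and the next step adds nothing, so the loop stops.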

SLIDE 35

The RCA schema

Definition (Complete relational extension of a context)

Let rel(K) = {r_l | O is the domain of r_l}, l = 1, ..., m_K, for a context K = (O, A, I). Given an RCF (K, R) with a set of lattices L, a scaling constructor mapping ρ which associates a scaling operator to each object-object context, and a context K ∈ K with rel(K) = {r_1, ..., r_{m_K}}, the complete relational extension of K w.r.t. ρ and L is

E_{ρ,L}(K) = K | S_{(r_1,ρ(r_1)),L_{i_1}}(K) | ... | S_{(r_{m_K},ρ(r_{m_K})),L_{i_{m_K}}}(K)

SLIDE 36

The RCA schema

Definition (Complete relational extension of an RCF)

Given an RCF (K, R) whose context set is K = {K1, ..., Kn} and whose set of lattices is L, and a scaling constructor mapping ρ, the complete relational extension of K is the set of contexts defined as

E∗_{ρ,L}(K) = {E_{ρ,L}(K1), ..., E_{ρ,L}(Kn)}.

SLIDE 37

The RCA schema

The whole construction process consists in building a sequence of contexts and lattices associated with (K, R) and ρ.

Step 0. The first set of contexts is K0 = K.

Step p + 1. The contexts of step p are used to build the associated lattices. The set of contexts at step p + 1 is defined using the relational extension: Kp+1 = E∗_{ρ,Lp}(Kp)

SLIDE 38

Analysis of pizza data

∃ prefers ∀∃ has-topping ∀∃ has-category ∀∃ is-produced-by

SLIDE 39

Analysis of pizza data - object-attribute contexts

Pizza context
Attributes (columns): thin, thick, calzone
Objects (rows, one × each): forest, occitane, three-cheese, four-cheese, lorraine, arctic

People context
Attributes (columns): organic-farmer, conventional-farmer
Objects (rows, one × each): Amedeo, Amine, Cyril, Marianne, Petko

Ingredient context
Objects: tomato-sauce, cream, onion, bacon, salmon, soy-cream, mozza, goat-cheese, emmental, fourme-ambert, eggplant, mushroom

Category context
Attributes (columns): mediterranean, vegan, vegetarian

fruit-vegetable: × × ×
meat:
fish: ×
dairy: × ×

SLIDE 40

Analysis of pizza data - object-object contexts

prefers
Columns: forest, occitane, three-cheese, four-cheese, lorraine, arctic

Amedeo: ×
Amine: ×
Cyril: × ×
Marianne: × ×
Petko: ×

has-topping
Columns: tomato-sauce, cream, onion, bacon, salmon, soy-cream, mozza, goat-cheese, emmental, fourme-ambert, eggplant, mushroom

forest: × ×
occitane: × × ×
three-cheese: × × × ×
four-cheese: × × × × × ×
lorraine: × × × ×
arctic: × × × ×

SLIDE 41

Analysis of pizza data - object-object contexts

is-produced-by
Columns: Amedeo, Amine, Cyril, Marianne, Petko

tomato-sauce: × ×
cream: ×
onion: ×
bacon: ×
salmon: × ×
soy-cream: ×
mozza: × ×
goat-cheese: ×
emmental: × ×
fourme-ambert: × ×
eggplant: ×
mushroom: ×

has-category
Columns: fruit-vegetable, meat, fish, dairy
Rows (one × each): tomato-sauce, cream, onion, bacon, salmon, soy-cream, mozza, goat-cheese, emmental, fourme-ambert, eggplant, mushroom

SLIDE 42

Concept lattice family

∃ prefers ∀∃ has-topping ∀∃ has-category ∀∃ is-produced-by

SLIDE 43

Concept lattice family

Step 0

SLIDE 44

Concept lattice family

Step 1

SLIDE 45

Concept lattice family

Step 2

SLIDE 46

Concept lattice family

Step 3

SLIDE 47

Concept lattice family

Step 4

SLIDE 48

Concept lattice family

◮ People: ∃prefers.Concept_28 ⇒ organic farmer
◮ Ingredient: ∀∃has-category.Concept_12 ⇔ ∀∃is-produced-by.Concept_6 (organic farmers)
◮ Amedeo/Amine prefer at least one pizza with only vegan topping ingredients and produced only by organic farmers

SLIDE 49

RCA - Arguments for convergence (finite object / attribute sets)

◮ the number of objects (rows) in the extended contexts does not change, which bounds the number of concepts of every lattice Li by 2^|Oi|
◮ the number of columns cannot increase indefinitely, since new attributes have the form S r : c, where S is a scaling operator, r is a relation, for example with target(r) = Oi, and c is a concept of a lattice Li built on Oi

SLIDE 50

RCA - Complexity issues

Simple approach for lattice construction cost:

◮ Lowest known worst-case time complexity for concept lattice construction: O(nc × |O| × |A|), where nc is the number of concepts in the lattice
◮ O(n_iterations × Σ_{i=1..n} (nc_i × |Oi| × |Ai|))

With more precision, using an iterative algorithm (adding attributes one by one):

◮ construction: O(nc_max × |Omax| × |Amax|), where nc_max is the maximal number of concepts in a lattice, |Omax| the greatest number of objects and |Amax| the greatest number of attributes (including relational ones)
◮ scaling: O(nc_max × |Omax|²)
◮ termination: just compare the sizes of the lattices (neglected)

SLIDE 51

A synthesis on RCA

◮ an iterative method to produce interconnected classifications
◮ converges after a number of iterations that depends on the structure
◮ variations on scaling can be done

Tools

◮ Galicia: http://galicia.sourceforge.net/
◮ eRCA: http://code.google.com/p/erca/

SLIDE 52

RCA - Current tracks

◮ Querying the concept lattice family
◮ Exploratory RCA: select, divide, step-by-step
◮ Metrics for guiding the process and filtering concepts
◮ Build Galois sub-hierarchy (AOC-poset)
