Advances in algorithms based on CbO



SLIDE 1

Advances in algorithms based on CbO

Petr Krajca, Jan Outrata, Vilem Vychodil

Palacky University, Olomouc, Czech Republic

Concept Lattices and Their Applications, Sevilla, 2010

Krajca, Outrata, Vychodil (UP Olomouc; CZ) Advances in algorithms based on CbO CLA 2010 1 / 21

SLIDE 2

Contribution

Three topics of interest:

1. improved canonicity test
  • additional canonicity test for Close-by-One
  • result: reduction of the number of concepts computed multiple times

2. parallelization
  • simultaneous computation of disjoint sets of concepts
  • focus on various workload distributions

3. data preprocessing
  • role of attribute permutations
  • experimental observations: efficiency of algorithms w.r.t. the number of inversions

SLIDE 3

Related Work

Next Closure, Close-by-One:
  • Ganter B.: Two basic algorithms in concept analysis. (Technical Report FB4-Preprint No. 831). TH Darmstadt, 1984.
  • Kuznetsov S.: A fast algorithm for computing all intersections of objects in a finite semi-lattice. Autom. Docum. Math. Ling., 27(5)(1993), 11–21.

Parallel and Distributed CbO:
  • Krajca, Outrata, Vychodil: Parallel algorithm for computing fixpoints of Galois connections. AMAI (to appear), DOI 10.1007/s10472-010-9199-5.
  • Krajca, Vychodil: Distributed algorithm for computing formal concepts using map-reduce framework. In: Proc. IDA 2009, LNCS 5772(2009), 333–344.
  • Krajca, Vychodil: Comparison of data structures for computing formal concepts. In: Proc. MDAI 2009, LNAI 5861(2009), 114–125.

SLIDE 4

Inside CbO: The Canonicity Test

Assumptions:
  • X = {0, 1, …, m} (objects); Y = {0, 1, …, n} (attributes);
  • ⟨A, B⟩ … current formal concept;
  • attribute y ∈ Y such that y ∉ B;
  • put C = (B ∪ {y})↓ and D = (B ∪ {y})↓↑.

The Test

For ⟨A, B⟩ and ⟨C, D⟩ check whether D ∩ {0, 1, …, y − 1} = B ∩ {0, 1, …, y − 1} is true:
  • yes (success) ⟹ proceed with ⟨C, D⟩
  • no (failure) ⟹ skip ⟨C, D⟩
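
The test can be sketched in Python on a toy context. The three-object context, the helper names `down`, `up` and `canonicity_test`, and the set-based data representation are illustrative assumptions and do not come from the slides.

```python
def down(attrs, context):
    """Extent: indices of objects (rows) that have every attribute in attrs."""
    return {x for x, row in enumerate(context) if attrs <= row}


def up(objs, context, n_attrs):
    """Intent: attributes shared by every object in objs."""
    return {y for y in range(n_attrs) if all(y in context[x] for x in objs)}


def canonicity_test(B, y, context, n_attrs):
    """Return (passed, C, D) for an intent B and an attribute y not in B."""
    C = down(B | {y}, context)
    D = up(C, context, n_attrs)
    smaller = set(range(y))                 # {0, 1, ..., y - 1}
    return (D & smaller) == (B & smaller), C, D


# Hypothetical context: each row is the attribute set of one object.
context = [{0, 1}, {0}, {2}]
B = up(down(set(), context), context, 3)    # intent of the top concept

for y in range(3):
    passed, C, D = canonicity_test(B, y, context, 3)
    print(y, passed, sorted(C), sorted(D))
```

In this toy context the test fails for y = 1, because the closure of {1} also contains the smaller attribute 0, which is not in B.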

SLIDE 5

CbO Represented by Recursive Procedure GenerateFrom

Procedure GenerateFrom

input: formal concept ⟨A, B⟩, first attribute y to be added to ⟨A, B⟩
output: all formal concepts with intents containing B

procedure GenerateFrom(⟨A, B⟩, y)
  output ⟨A, B⟩
  if B ≠ Y then
    for j from y upto n do
      if j ∉ B then
        C ← A ∩ {j}↓
        D ← C↑
        if D ∩ {0, 1, …, j − 1} = B ∩ {0, 1, …, j − 1} then
          call GenerateFrom(⟨C, D⟩, j + 1)

Initially called with ⟨∅↓, ∅↓↑⟩ and y = 0.
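
A minimal Python sketch of the recursive procedure follows. It assumes the context is given as a list of attribute sets (one per object); the function name `cbo`, the frozenset representation and the sample context are illustrative choices, not the authors' ANSI C implementation.

```python
def cbo(context, n_attrs):
    """List all formal concepts (extent, intent) using Close-by-One.

    context is a list of attribute sets, one per object;
    attributes are numbered 0 .. n_attrs - 1.
    """
    n_objs = len(context)
    all_attrs = frozenset(range(n_attrs))
    # cols[y] is the extent of attribute y, i.e. the objects having y.
    cols = [frozenset(x for x in range(n_objs) if y in context[x])
            for y in range(n_attrs)]

    def up(objs):
        return frozenset(y for y in range(n_attrs) if objs <= cols[y])

    concepts = []

    def generate_from(A, B, y):
        concepts.append((A, B))              # "output <A, B>"
        if B == all_attrs:
            return
        for j in range(y, n_attrs):
            if j in B:
                continue
            C = A & cols[j]
            D = up(C)
            smaller = frozenset(range(j))
            if D & smaller == B & smaller:   # canonicity test
                generate_from(C, D, j + 1)

    top_extent = frozenset(range(n_objs))    # extent of the empty attribute set
    generate_from(top_extent, up(top_extent), 0)
    return concepts


# Hypothetical three-object context.
concepts = cbo([{0, 1}, {0}, {2}], 3)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

Each concept is appended exactly once, because a closure that fails the canonicity test is discarded rather than recursed into.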

SLIDE 6

Performance Issues

Canonicity Test:
  • the same closure is computed multiple times (returned once)
  • the canonicity test is performed after the closure is computed

Proposed Solution:
  • reuse information about canonicity test failures
  • perform an additional test before computing the closure (if possible)

CbO Tree – a call tree for procedure GenerateFrom
  • nodes – represent computed closures, two types:
      1. Bi, y: represents an invocation of GenerateFrom with arguments Bi and y
      2. + Bi: Bi is computed but fails the canonicity test
  • edges – between nodes, labeled by attributes
      an edge between Bi and Bj is labeled by y whenever (Bi ∪ {y})↓↑ = Bj

SLIDE 7

Example (CbO Tree)

[Figure: full CbO tree for an example formal context (cross table). Invocation nodes: B1, 0; B2, 1; B3, 2; B4, 3; B5, 4; B6, 5; B7, 3; B8, 4; B9, 5; B10, 2; B11, 3; B12, 3. The remaining leaves are repeated closures B5, B6, B8 and B9 that fail the canonicity test. Edges are labeled by attributes.]

SLIDE 8

Additional Test

Under the notation

  B_j = ((B ∪ {j})↓↑ \ B) ∩ {0, 1, …, j − 1}

we have:

Lemma (On Test Failure Propagation)

Let B ⊆ Y, j ∉ B, and B_j ≠ ∅. Then, for each B′ ⊇ B such that j ∉ B′ and B_j ⊈ B′, we have B′_j ≠ ∅.

Consequences:
  • if the original test fails for intent B, attribute j ∉ B and D = (B ∪ {j})↓↑, then it fails for any intent B′ ⊇ B and D′ = (B′ ∪ {j})↓↑ provided that j ∉ B′ and D ∩ {0, 1, …, j − 1} ⊈ B′ ∩ {0, 1, …, j − 1}
  • the closure D′ = (B′ ∪ {j})↓↑ need not be computed (!!)
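
The failure-propagation lemma follows from the monotony of the closure operator ↓↑ alone. The short derivation below is a reconstruction added for the reader; it assumes the hypotheses j ∉ B′ and B_j ⊈ B′, and writes B_j for the set of attributes smaller than j that the closure of B ∪ {j} adds to B.

```latex
% Notation (assumed): B_j = ((B \cup \{j\})^{\downarrow\uparrow} \setminus B) \cap \{0,\dots,j-1\}.
\begin{align*}
&\text{Assume } B \subseteq B',\ j \notin B',\ \text{and pick } y \in B_j \text{ with } y \notin B'. \\
&B \cup \{j\} \subseteq B' \cup \{j\}
  \;\Longrightarrow\;
  (B \cup \{j\})^{\downarrow\uparrow} \subseteq (B' \cup \{j\})^{\downarrow\uparrow}
  \quad\text{(monotony of } {}^{\downarrow\uparrow}\text{)} \\
&y \in (B \cup \{j\})^{\downarrow\uparrow},\ y < j
  \;\Longrightarrow\;
  y \in (B' \cup \{j\})^{\downarrow\uparrow} \cap \{0,\dots,j-1\} \\
&y \notin B'
  \;\Longrightarrow\;
  y \in B'_j, \text{ hence } B'_j \neq \emptyset .
\end{align*}
```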

SLIDE 9

Example (Pruned CbO Tree)

[Figure: pruned CbO tree for the example context, first build step. Nodes shown: B1, 0; B2, 1; B10, 2; B12, 3 and the failed-test leaves B8, B9, B8.]

M3 = M5 = {0, 2, 3, 4, 5} = B8; M4 = {0, 4} = B9

SLIDE 10

Example (Pruned CbO Tree)

[Figure: pruned CbO tree for the example context, second build step. Nodes shown: B1, 0; B2, 1; B3, 2; B7, 3; B9, 5; B10, 2; B12, 3 and failed-test leaves B8 and B9.]

M3 = M5 = {0, 2, 3, 4, 5} = B8; M4 = {0, 4} = B9
2 ∈ M3 and 2 ∉ B2 = {0} ⟹ test fails for j = 3

SLIDE 11

Example (Pruned CbO Tree)

[Figure: pruned CbO tree for the example context, final build step, drawn over the full CbO tree with invocation nodes B1, 0 through B12, 3 and the failed-test leaves B5, B6, B8 and B9.]

M3 = M5 = {0, 2, 3, 4, 5} = B8; M4 = {0, 4} = B9
2 ∈ M3 and 2 ∉ B2 = {0} ⟹ test fails for j = 3

SLIDE 12

Procedure FastGenerateFrom

additional input: (pointers to) sets of attributes {N_y | y ∈ Y} (initially empty)

procedure FastGenerateFrom(⟨A, B⟩, y, {N_y | y ∈ Y})
  output ⟨A, B⟩
  if B ≠ Y then
    for j from y upto n do
      M_j ← N_j
      if j ∉ B and N_j ∩ {0, 1, …, j − 1} ⊆ B ∩ {0, 1, …, j − 1} then
        C ← A ∩ {j}↓
        D ← C↑
        if D ∩ {0, 1, …, j − 1} = B ∩ {0, 1, …, j − 1} then
          put ⟨⟨C, D⟩, j⟩ to queue
        else
          M_j ← D
    while get ⟨⟨C, D⟩, j⟩ from queue do
      call FastGenerateFrom(⟨C, D⟩, j + 1, {M_y | y ∈ Y})
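
The procedure can be sketched in Python as below. This is an illustrative reading of the pseudocode, assuming a context given as a list of attribute sets; the function name `fcbo`, the closure counter and the list-of-frozensets encoding of the sets N_y are assumptions made for the example.

```python
from collections import deque


def fcbo(context, n_attrs):
    """Return (concepts, closures): all concepts and the closures computed in the loop."""
    n_objs = len(context)
    all_attrs = frozenset(range(n_attrs))
    cols = [frozenset(x for x in range(n_objs) if y in context[x])
            for y in range(n_attrs)]
    concepts = []
    closures = 0

    def up(objs):
        return frozenset(y for y in range(n_attrs) if objs <= cols[y])

    def fast_generate_from(A, B, y, N):
        nonlocal closures
        concepts.append((A, B))
        if B == all_attrs or y >= n_attrs:
            return
        M = list(N)                          # M_j starts as N_j
        queue = deque()
        for j in range(y, n_attrs):
            smaller = frozenset(range(j))
            # Skip j if it is in B or a recorded failure still applies to B.
            if j in B or not (N[j] & smaller <= B & smaller):
                continue
            C = A & cols[j]
            D = up(C)
            closures += 1
            if D & smaller == B & smaller:
                queue.append((C, D, j))
            else:
                M[j] = D                     # remember the failed closure
        while queue:
            C, D, j = queue.popleft()
            fast_generate_from(C, D, j + 1, M)

    top_extent = frozenset(range(n_objs))
    fast_generate_from(top_extent, up(top_extent), 0,
                       [frozenset()] * n_attrs)
    return concepts, closures


# Hypothetical five-object context over four attributes.
concepts, closures = fcbo([{0, 1, 3}, {0, 2}, {1, 2, 3}, {0, 1, 2}, {3}], 4)
print(len(concepts), closures)
```

Recursive calls are postponed until the loop over j has finished, so every child receives the complete set of failures recorded at its parent's level.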

SLIDE 13

Soundness, Complexity and Relationship to Other Algorithms

soundness:
  • procedure FastGenerateFrom is sound
  • can be proved using unique derivations (paths in the FCbO tree)

complexity:
  • polynomial time-delay O(|X| · |Y|³)

relationship to other algorithms:
  • lists concepts in the same order as CbO (FCbO = CbO in the worst case)
  • lists concepts in the same order as NextClosure (but faster) if the main loop is “from n downto y”

Outrata, Vychodil: Fast algorithm for computing fixpoints of Galois connections induced by object-attribute relational data (in preparation).

SLIDE 14

Performance evaluation I

Test Conditions:
  • algorithms: CbO, FCbO
  • performance indicator: computed closures

Results:

              concepts   closures (CbO)   closures (FCbO)   ratio (CbO)   ratio (FCbO)
mushroom      238,710    4,006,498        426,563           5.9 %         55.9 %
anon. web     129,009    27,949,552       1,475,341         0.4 %         8.7 %
debian tags   38,977     12,045,680       679,911           0.3 %         5.7 %
tic-tac-toe   59,505     221,608          128,434           26.8 %        46.3 %

SLIDE 15

Performance evaluation II

Test Conditions:
  • algorithms: CbO, FCbO, NextClosure
  • performance indicator: computation time in seconds

Results:

              mushroom    tic-tac-toe   debian tags   anon. web
size          8,124×119   958×29        14,315×475    32,710×295
density       19 %        34 %          < 1 %         1 %
FCbO          0.23        0.02          0.10          0.15
CbO           4.34        0.06          5.31          27.14
NextClosure   685.00      1.86          1,432.25      8,236.85

SLIDE 16

Parallelization of FCbO (and CbO)

Computation in Three Stages:

1. initialization (serial computation):
  • goes through the CbO/FCbO tree up to depth level L,
  • concepts at level L are enqueued.

2. parallel execution
  • create k processes P1, …, Pk (threads of execution),
  • concepts from the queue are distributed among P1, …, Pk,
  • P1, …, Pk compute disjoint sets of formal concepts (in parallel).

3. termination
  • wait for all processes (trivial synchronization) and terminate.

Issues:
  • How to choose L and k? (hints in AMAI paper)
  • How to distribute concepts into k queues? (not discussed before)
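
The three stages can be outlined with Python threads as follows. This sketch parallelizes plain CbO rather than FCbO to stay short, distributes the frontier round-robin, and uses the hypothetical parameter names `level` (for L) and `workers` (for k); none of this is the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_cbo(context, n_attrs, level=1, workers=2):
    n_objs = len(context)
    all_attrs = frozenset(range(n_attrs))
    cols = [frozenset(x for x in range(n_objs) if y in context[x])
            for y in range(n_attrs)]

    def up(objs):
        return frozenset(y for y in range(n_attrs) if objs <= cols[y])

    def generate_from(A, B, y, depth, out, frontier):
        # In stage 1, concepts reached at depth `level` are enqueued, not expanded.
        if frontier is not None and depth == level:
            frontier.append((A, B, y))
            return
        out.append((A, B))
        if B == all_attrs:
            return
        for j in range(y, n_attrs):
            if j in B:
                continue
            C = A & cols[j]
            D = up(C)
            smaller = frozenset(range(j))
            if D & smaller == B & smaller:
                generate_from(C, D, j + 1, depth + 1, out, frontier)

    # Stage 1: serial walk down to depth `level`.
    head, frontier = [], []
    top_extent = frozenset(range(n_objs))
    generate_from(top_extent, up(top_extent), 0, 0, head, frontier)

    # Stage 2: round-robin distribution, one queue per worker.
    queues = [frontier[r::workers] for r in range(workers)]

    def work(queue):
        out = []
        for A, B, y in queue:
            generate_from(A, B, y, 0, out, None)   # expand the whole subtree
        return out

    # Stage 3: leaving the with-block waits for all workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, queues))
    return head + [concept for part in parts for concept in part]


# Hypothetical context.
result = parallel_cbo([{0, 1, 3}, {0, 2}, {1, 2, 3}, {0, 1, 2}, {3}], 4)
print(len(result))
```

The workers share only read-only data and write to private lists, which mirrors the "disjoint sets of concepts" property. In CPython builds with the global interpreter lock, threads do not speed up this pure-Python work, so the sketch shows the structure only.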

SLIDE 17

Workload Distribution

1. round-robin (original approach, see AMAI paper)
  • concepts are distributed to queues attached to each processor
  • the n-th concept is placed into queue r where r = (n mod k) + 1

2. zig-zag – modification of round-robin
  • r = min(n mod z, z − (n mod z)) + 1 and z = 2 × k + 1

3. blocks – assigning blocks of consecutive concepts
  • the queue of all concepts is divided into k chunks of (approximately) equal size
  • each block is assigned to a single processor

4. fair – idle processors get assigned concepts
  • single shared queue of concepts
  • processors get concepts from the queue on demand
  • pros: achieves best utilization of processors (in theory)
  • cons: additional synchronization needed (overhead)

5. random – random assignment of processors
  • considered for testing purposes
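
The static assignment rules can be written as small index functions. Round-robin follows the formula r = (n mod k) + 1 directly. For zig-zag, the transcribed formula r = min(n mod z, z − (n mod z)) + 1 with z = 2 × k + 1 yields k + 1 whenever n mod z = k, so the sketch substitutes a reflected pattern with period 2k (1, …, k, k, …, 1); that substitution, the function names and the 0-based concept counter n are assumptions.

```python
def round_robin(n, k):
    """Queue (1..k) for the n-th concept, n counted from 0."""
    return (n % k) + 1


def zig_zag(n, k):
    """Assumed reflected variant: 1, 2, ..., k, k, ..., 2, 1, 1, 2, ..."""
    z = 2 * k
    m = n % z
    return min(m, z - 1 - m) + 1


def blocks(n, total, k):
    """Queue (1..k) when `total` concepts are cut into k nearly equal chunks."""
    return (n * k) // total + 1


k = 3
print([round_robin(n, k) for n in range(8)])
print([zig_zag(n, k) for n in range(8)])
print([blocks(n, 10, k) for n in range(10)])
```

The "fair" strategy has no such closed form, since the queue a concept ends up in depends on which processor asks for work first.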

Krajca, Outrata, Vychodil (UP Olomouc; CZ) Advances in algorithms based on CbO CLA 2010 15 / 21

slide-18
SLIDE 18

Performance under various workload distributions

Test Conditions:
  • algorithm: parallel FCbO; processors: 8
  • performance indicator: computation time in seconds

Results:

                         round-r.   blocks   fair     zig-zag   random
debian tags              0.0974     0.0988   0.0938   0.0984    0.0986
anon. web                0.1518     0.1590   0.1500   0.1528    0.1522
mushroom                 0.1772     0.2158   0.1550   0.1788    0.1820
tic-tac-toe              0.0172     0.0198   0.0168   0.0174    0.0180
random 5000×100; 10%     0.0806     0.1194   0.0796   0.0820    0.0876
random 10000×100; 15%    1.1380     2.1326   0.8698   1.0974    1.1670

SLIDE 19

Data Preprocessing Issues

Does the order of attributes matter?

Notion (permutation resistance)

An algorithm is permutation resistant whenever all isomorphic copies of ⟨X, Y, I⟩ with Y = {0, 1, …, n} require the same number of elementary computation steps (computation of a single fixpoint of ↑I and ↓I) in order to compute all concepts.
  • Lindig’s algorithm (UpperNeighbor) is permutation resistant
  • CbO, FCbO are not permutation resistant

ordered formal context – a formal context ⟨X, Y, I⟩ where

  |{0}↓I| ≤ |{1}↓I| ≤ · · · ≤ |{n}↓I|

Fu H., Mephu Nguifo E.: A Parallel Algorithm to Generate Formal Concepts for Large Data. In: Proc. ICFCA 2004, LNCS 2961, pp. 394–401.

SLIDE 20

Computing Formal Concepts in Ordered Contexts

Theorem (canonicity test in ordered contexts)

Let ⟨X, Y, I⟩ be an ordered formal context where Y = {0, …, n} and {a}↓I ≠ {b}↓I for any distinct a, b ∈ Y. Then for each j ∈ Y such that j ∉ ∅↓I↑I,

  ∅↓I↑I ∩ {0, …, j − 1} = {j}↓I↑I ∩ {0, …, j − 1}.

Notes: under the conditions of Theorem 1 (pairwise distinct columns):
  • the canonicity test succeeds in the first level (of the CbO / FCbO tree)
  • beneficial for parallel variants (sufficient to put L = 1)

general tendency (observed experimentally):
  • ordered contexts → reduced number of computed closures
  • increasing number of inversions → increasing number of computed closures
    (inversion: a pair ⟨y1, y2⟩ ∈ Y × Y such that y1 < y2 and |{y1}↓I| > |{y2}↓I|)
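
Ordering a context and counting inversions can be sketched in a few lines of Python. The sketch assumes that an inversion is a pair of attributes y1 < y2 whose column sizes are out of non-decreasing order, so that an ordered context has none; the function names and the sample context are hypothetical.

```python
def column_sizes(context, n_attrs):
    """Number of objects having each attribute, i.e. the size of each column."""
    return [sum(1 for row in context if y in row) for y in range(n_attrs)]


def count_inversions(sizes):
    """Pairs y1 < y2 with a strictly larger column at y1 (assumed definition)."""
    return sum(1
               for i in range(len(sizes))
               for j in range(i + 1, len(sizes))
               if sizes[i] > sizes[j])


def order_context(context, n_attrs):
    """Renumber attributes so that column sizes are non-decreasing."""
    sizes = column_sizes(context, n_attrs)
    perm = sorted(range(n_attrs), key=lambda y: sizes[y])   # stable sort
    new_index = {old: new for new, old in enumerate(perm)}
    return [{new_index[y] for y in row} for row in context], perm


# Hypothetical context: attribute 0 is the most frequent one.
context = [{0, 1}, {0}, {0, 2}, {0, 2}]
print(count_inversions(column_sizes(context, 3)))
ordered, perm = order_context(context, 3)
print(perm, count_inversions(column_sizes(ordered, 3)))
```

Because attributes are only renumbered, the ordered context is an isomorphic copy of the original and has the same concepts up to that renaming.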

SLIDE 21

Reduction of computed closures by input data ordering

Test Conditions:
  • algorithms: FCbO with ordered / unordered input data
  • performance indicator: computed closures

Results:

              concepts   closures (unordered)   closures (ordered)   ratio (unordered)   ratio (ordered)
mushroom      238,710    426,563                299,201              55.9 %              79.7 %
anon. web     129,009    1,475,341              398,147              8.7 %               32.4 %
debian tags   38,977     679,911                298,641              5.7 %               13.0 %
tic-tac-toe   59,505     128,434                89,930               46.3 %              66.1 %

SLIDE 22

Impact of the number of inversions

[Figure: computed closures (vertical axis, 0 to 1.6 · 10^7) plotted against the number of inversions (horizontal axis, 1000 to 7000) for Close-by-One and Fast Close-by-One.]

SLIDE 23

Availability

Implementation:
  • ANSI C
  • binaries for various platforms (i686, x86_64, sparc64) and OS (linux, windows)
  • hybrid data representation: pointer lists (extents), bitarrays (intents)
  • test machine: Apple MacPro (Intel Xeon, 2.8 GHz, 16 GB RAM)
  • collection of stand-alone programs, available at:

http://fcalgs.sourceforge.net
