Merging Classifiers of Different Classification Approaches


1. Merging Classifiers of Different Classification Approaches
Incremental Classification, Concept Drift and Novelty Detection Workshop
Antonina Danylenko¹ and Welf Löwe¹
antonina.danylenko@lnu.se
14 December, 2014
¹ Linnaeus University, Sweden

2. Agenda
- Introduction
- Problem, Motivation, Approach
- Decision Algebra
- Merge as an Operation of Decision Algebra
- Merging Classifiers
- Experiments
- Conclusions

3. Introduction
- Classification is a common problem that arises in different fields of Computer Science (data mining, information storage and retrieval, knowledge management).
- Classification approaches are often tightly coupled to:
  - learning strategies: different algorithms are used;
  - data structures: information is represented in different ways;
  - how common problems are addressed: workarounds.
- It is not easy to select an appropriate classification model for a given classification problem (one has to consider accuracy, robustness, and scalability).

4. Problem and Motivation
- Simply combining classifiers learned over different data sets of the same problem is not straightforward.
- Current work is done in aggregation and meta-learning:
  - combine different classifiers learned over the same data set;
  - construct a single classifier learned on different variations of the same classification problem;
  - as a result, they do not take into account that the context can differ.
- Approaches that combine classifiers with partly or completely disjoint contexts use one single classification approach for the base-level classifiers.
- Generality gets lost: classifiers become incomparable, benchmarking is difficult, and advances are hard to propagate between domains.

5. Proposed Approach
- Use Decision Algebra, which defines classifiers as re-usable black boxes in terms of so-called decision functions.
- Define a general merge operation over these decision functions, which allows for symbolic computations with the captured classification information.
- Show an example of merging classifiers of different classification approaches.
- Show that the merger of classifiers tends to become more accurate.

6. Classification Information
- Classification information is a set of decision tuples: $CI = \{(\vec{a}_1, c_1), \ldots, (\vec{a}_n, c_n)\}$.
- It is complete if $\forall \vec{a} \in \vec{A} : (\vec{a}, c) \in CI$.
- It is non-contradictive if $\forall (\vec{a}_i, c_i), (\vec{a}_j, c_j) \in CI : \vec{a}_i = \vec{a}_j \Rightarrow c_i = c_j$.
- The problem domain $(A, C)$ of $CI$ is a superset of $\vec{A} \times C$ that defines the actual classification problem, where $\vec{A} \in A$.
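
These definitions are easy to state concretely. A minimal Python sketch, assuming a small hypothetical formal context (the attribute values echo the slides' later examples; the helper names and data are mine, not the authors'):

```python
from itertools import product

# Hypothetical formal context: two attributes and their domains Λ_1, Λ_2.
domains = {"buying": ["low", "med", "high", "vhigh"],
           "doors":  ["2", "3", "4", "5more"]}

# A small, deliberately incomplete CI: decision tuples (context vector, class).
CI = {(("low", "2"), "na"), (("low", "3"), "na"), (("high", "2"), "a")}

def is_complete(ci, domains):
    """Complete: every context vector of the formal context is classified in CI."""
    covered = {ctx for ctx, _ in ci}
    return all(ctx in covered for ctx in product(*domains.values()))

def is_non_contradictive(ci):
    """Non-contradictive: equal context vectors always carry the same class."""
    seen = {}
    for ctx, c in ci:
        if seen.setdefault(ctx, c) != c:
            return False
    return True

print(is_complete(CI, domains), is_non_contradictive(CI))  # False True
```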

7. Decision Function
- A decision function is a representation of complete and possibly contradictive decision information: $df : \vec{A} \to D(C)$ maps an actual context $\vec{a} \in \vec{A}$ to a (probability) distribution $D(C)$.
- It is a higher-order (curried) function: $df_n : A_n \to (A_{n-1} \to (\ldots (A_1 \to D(C)) \ldots))$.
- It can easily be represented as a decision tree or decision graph: $df_n = x_1(df_{n-1}^{1}, \ldots, df_{n-1}^{|\Lambda_1|})$, where $\Lambda_i$ is the domain of attribute $A_i$.
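
The curried reading means that fixing one attribute value peels off one layer of the decision function. A minimal sketch, under an assumed representation (mine, not the slides' notation): a leaf is a class distribution $D(C)$ as a dict, an inner node is a pair (attribute, {attribute value: child}).

```python
def is_leaf(df):
    return isinstance(df, dict)

def apply_df(df, context):
    """Curried evaluation: fixing one attribute value peels off one layer of df."""
    while not is_leaf(df):
        attribute, children = df
        df = children[context[attribute]]
    return df  # the class distribution D(C)

# A decision function in the spirit of df_2 on the next slide (edge labels assumed).
na, a = {"na": 1.0}, {"a": 1.0}
x2 = ("A2", {"low": na, "med": na, "high": a, "vhigh": a})
df2 = ("A1", {"low": na, "med": x2, "high": x2, "vhigh": a})

print(apply_df(df2, {"A1": "med", "A2": "high"}))  # {'a': 1.0}
```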

8. Graph Representation of a Decision Function
- Decision function $df_2 = x_1(na, x_2(na, na, a, a), x_2(na, na, a, a), a)$

Figure: A tree (left) and graph (right) representation of $df_2$. Each node labeled with $n$ represents a decision term with a selection operator $x_n$; each square leaf node labeled with $c$ corresponds to a probability distribution over classes $C$ with $c$ the most probable class.
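
The graph on the right of the figure shares the two identical $x_2$ subterms instead of duplicating them. A small sketch of that sharing effect, under the same assumed representation as above (the counting helper is mine, for illustration only):

```python
na, a = {"na": 1.0}, {"a": 1.0}

# Tree: the identical x_2(na, na, a, a) subterm is written out twice.
df2_tree = ("A1", {"low": na,
                   "med":  ("A2", {"low": na, "med": na, "high": a, "vhigh": a}),
                   "high": ("A2", {"low": na, "med": na, "high": a, "vhigh": a}),
                   "vhigh": a})

# Graph: the two middle children refer to one shared x_2 node.
x2 = ("A2", {"low": na, "med": na, "high": a, "vhigh": a})
df2_graph = ("A1", {"low": na, "med": x2, "high": x2, "vhigh": a})

def count_inner_nodes(df, seen=None):
    """Count distinct inner nodes; a shared node is counted only once."""
    seen = set() if seen is None else seen
    if isinstance(df, dict) or id(df) in seen:   # leaf distribution or already visited
        return 0
    seen.add(id(df))
    return 1 + sum(count_inner_nodes(child, seen) for child in df[1].values())

print(count_inner_nodes(df2_tree), count_inner_nodes(df2_graph))  # 3 2
```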

9. Decision Algebra
- Decision Algebra (DA) is a theoretical framework defined as a parameterized specification, with $\vec{A}$ and $D(C)$ as parameters. It provides a general representation of classification information as an abstract classifier.

10. Operations over Decision Functions
- Constructor $x_n$: $x_n : \underbrace{\Lambda_1 \times DF[\vec{A}', D] \times \cdots \times \Lambda_1 \times DF[\vec{A}', D]}_{|\Lambda_1| \text{ times}} \to DF[\vec{A}, D]$
- Bind binds attribute $A_i$ to an attribute value $a \in \Lambda_i$: $bind_{A_i} : DF[\vec{A}, D] \times \Lambda_i \to DF[\vec{A}', D]$, with $bind_{A_i}(x_n(a_1, df_1, \ldots, a_{|\Lambda_1|}, df_{|\Lambda_1|}), a) \equiv df_i$ if $a = a_i$. Example: $bind_{A_1}(df_2, high) = x_2(na, na, a, a)$.
- Evert changes the order of attributes in the decision function: $evert_{A_i} : DF[\vec{A}, D] \to DF[\vec{A}', D]$, with $evert_{A_i}(df) := x(a_1, bind_{A_i}(df, a_1), \ldots, a_{|\Lambda_i|}, bind_{A_i}(df, a_{|\Lambda_i|}))$. Example: $evert_{A_2}(df_2) = x_2(x_1(na, na, na, a), x_1(na, na, na, a), x_1(na, a, a, a), x_1(na, a, a, a))$.
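
A minimal sketch of bind and evert under the same assumed nested-dict representation. One assumption beyond the slide: the slide defines bind on the root attribute, while this sketch also descends past other attributes, which is what evert needs when $A_i$ is not at the root.

```python
def is_leaf(df):
    return isinstance(df, dict)

def bind(df, attribute, value):
    """bind_Ai: fix attribute := value, removing that attribute from the decision function."""
    if is_leaf(df):
        return df
    attr, children = df
    if attr == attribute:
        return children[value]
    return (attr, {v: bind(child, attribute, value) for v, child in children.items()})

def evert(df, attribute, domain):
    """evert_Ai(df) := x(a_1, bind_Ai(df, a_1), ..., a_k, bind_Ai(df, a_k))."""
    return (attribute, {value: bind(df, attribute, value) for value in domain})

na, a = {"na": 1.0}, {"a": 1.0}
x2 = ("A2", {"low": na, "med": na, "high": a, "vhigh": a})
df2 = ("A1", {"low": na, "med": x2, "high": x2, "vhigh": a})

print(bind(df2, "A1", "high"))   # the x_2(na, na, a, a) subterm, as on the slide
print(evert(df2, "A2", ["low", "med", "high", "vhigh"]))  # A_2 is now the root attribute
```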

11. Merge Operation over Decision Functions
- Merge operator $\sqcup_D$ over class distributions $D(C)$: $\sqcup_D : D(C) \times D(C) \to D(C)$, with $d(C) \sqcup_D d'(C) = \{(c, p + p') \mid (c, p) \in d(C), (c, p') \in d'(C)\}$.
- General merge operation over decision functions: $\sqcup : DF_1[\vec{A}, D] \times DF_2[\vec{A}, D] \to DF'[\vec{A}, D]$.
- Merge over constant decision functions $df_1^0, df_2^0 \in DF^{\emptyset}[\{\vec{0}\}, D]$: $\sqcup(df_1^0, df_2^0) := x_0(\sqcup_D(df_1^0, df_2^0))$.
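
A sketch of $\sqcup_D$: per-class addition of the two distributions' weights. Whether and when the result is renormalized is not stated on this slide, so it is kept as a separate, optional step here.

```python
def merge_dist(d1, d2):
    """⊔_D : D(C) × D(C) -> D(C), d ⊔_D d' = {(c, p + p')}."""
    return {c: d1.get(c, 0.0) + d2.get(c, 0.0) for c in set(d1) | set(d2)}

def normalize(d):
    total = sum(d.values())
    return {c: p / total for c, p in d.items()} if total else d

d1 = {"a": 0.7, "na": 0.3}
d2 = {"a": 0.2, "na": 0.8}
print(merge_dist(d1, d2))             # class weights added: a -> 0.9, na -> 1.1
print(normalize(merge_dist(d1, d2)))  # optionally renormalized: a -> 0.45, na -> 0.55
```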

12. Scenario One: Same Formal Context
- Prerequisite: the decision functions $df_1 \in DF_1[\vec{A}, D]$ and $df_2 \in DF_2[\vec{A}', D]$ are constructed over different samples of the same problem domain, and $\vec{A} = \vec{A}' = \Lambda_1 \times \ldots \times \Lambda_n$.

$\sqcup(df_1, df_2) := x_n(a_1, \sqcup(bind_{A_1}(df_1, a_1), bind_{A_1}(df_2, a_1)), \ldots, a_k, \sqcup(bind_{A_1}(df_1, a_k), bind_{A_1}(df_2, a_k)))$

13. Scenario One: Cont'd

1: if $df_1 \in DF^{\emptyset}[\{\vec{0}\}, D] \wedge df_2 \in DF^{\emptyset}[\{\vec{0}\}, D]$ then
2:   return $x(\sqcup_D(df_1, df_2))$
3: end if
4: for all $a \in \Lambda_1$ do
5:   $df_a = \sqcup(bind_{A_1}(df_1, a), bind_{A_1}(df_2, a))$
6: end for
7: return $x(a_1, df_{a_1}, \ldots, a_{|\Lambda_1|}, df_{a_{|\Lambda_1|}})$

Figure: two example decision functions (a) and (b) over the same formal context, the step-wise construction of their merger (c.1)-(c.4), and the resulting merged decision function (d).
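
A sketch of the structural recursion in this pseudocode, assuming the nested-dict representation used earlier and that both decision functions test the attributes in the same order (the same formal context); data types, names, and the example are mine.

```python
def is_leaf(df):
    return isinstance(df, dict)

def merge_dist(d1, d2):
    return {c: d1.get(c, 0.0) + d2.get(c, 0.0) for c in set(d1) | set(d2)}

def merge_same_context(df1, df2):
    # Lines 1-3: both arguments are constant decision functions -> merge the distributions.
    if is_leaf(df1) and is_leaf(df2):
        return merge_dist(df1, df2)
    # Lines 4-7: bind both functions to every value a of A_1, merge the results, rebuild.
    attr, children1 = df1
    _, children2 = df2
    return (attr, {a: merge_same_context(children1[a], children2[a]) for a in children1})

na, a = {"na": 1.0}, {"a": 1.0}
df_x = ("A1", {"low": na, "med": na, "high": a, "vhigh": a})
df_y = ("A1", {"low": na, "med": a,  "high": a, "vhigh": a})
print(merge_same_context(df_x, df_y))  # per value of A_1, the class weights are added
```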

14. Scenario Two: Disjoint Formal Contexts
- Prerequisite: the decision functions $df_1 \in DF_1[\vec{A}, D]$ and $df_2 \in DF_2[\vec{A}', D]$ are constructed over samples with disjoint formal contexts of the same problem domain: $\vec{A} = \Lambda_1 \times \ldots \times \Lambda_n$, $\vec{A}' = \Lambda'_1 \times \ldots \times \Lambda'_m$, and the attributes $\{A_1, \ldots, A_n\} \cap \{A'_1, \ldots, A'_m\} = \emptyset$.

$\sqcup(df_1, df_2) := x_n(a_1, \sqcup(bind_{A_1}(df_1, a_1), bind_{A_1}(df_2, a_1)), \ldots, a_k, \sqcup(bind_{A_1}(df_1, a_k), bind_{A_1}(df_2, a_k)))$
$\sqcup(df_1^0, df_2) := \sqcup(df_2, df_1^0)$

15. Scenario Two: Cont'd

1: if $df_1 \in DF^{\emptyset}[\{\vec{0}\}, D] \wedge df_2 \in DF^{\emptyset}[\{\vec{0}\}, D]$ then
2:   return $x(\sqcup_D(df_1, df_2))$
3: end if
4: if $df_1 \in DF^{\emptyset}[\{\vec{0}\}, D]$ then
5:   return $\sqcup(df_2, df_1)$
6: end if
7: for all $a \in \Lambda_1$ do
8:   $df_a = \sqcup(bind_{A_1}(df_1, a), bind_{A_1}(df_2, a))$
9: end for
10: return $x(a_1, df_{a_1}, \ldots, a_{|\Lambda_1|}, df_{a_{|\Lambda_1|}})$

Figure: two example decision functions (a) and (b) over disjoint attribute sets, and the step-wise construction (c) of their merger (d).
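
A sketch following this slide's pseudocode: if $df_1$ is constant, swap the arguments (lines 4-6); otherwise recurse below the first attribute of $df_1$ (lines 7-10). The nested-dict representation is an assumption carried over from the earlier sketches, and the example attribute and class labels merely echo the slide's figure.

```python
def is_leaf(df):
    return isinstance(df, dict)

def merge_dist(d1, d2):
    return {c: d1.get(c, 0.0) + d2.get(c, 0.0) for c in set(d1) | set(d2)}

def merge_disjoint(df1, df2):
    if is_leaf(df1) and is_leaf(df2):        # lines 1-3: merge the two distributions
        return merge_dist(df1, df2)
    if is_leaf(df1):                         # lines 4-6: ⊔(df_1^0, df_2) := ⊔(df_2, df_1^0)
        return merge_disjoint(df2, df1)
    attr, children = df1                     # lines 7-10: descend on A_1 of df_1
    return (attr, {a: merge_disjoint(child, df2) for a, child in children.items()})

# Two classifiers over disjoint attribute sets of the same problem domain.
by_buying = ("buying", {"low": {"acc": 1.0}, "med": {"acc": 1.0},
                        "high": {"na": 1.0}, "vhigh": {"na": 1.0}})
by_doors  = ("doors",  {"2": {"na": 1.0}, "3": {"acc": 1.0},
                        "4": {"vg": 1.0}, "5more": {"vg": 1.0}})
print(merge_disjoint(by_buying, by_doors))  # a decision function over both attributes
```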
