Active Learning by the Naive Credal Classifier

  1. Active Learning by the Naive Credal Classifier. Alessandro Antonucci∗, Giorgio Corani∗, Sandra Gabaglio†. ∗Istituto “Dalle Molle” di Studi sull’Intelligenza Artificiale, Lugano (Switzerland); †ISIN/SUPSI, Lugano (Switzerland). PGM’12, Granada, September 20, 2012.

  2.–8. Active Learning (stepwise build of one diagram). Class variable C (values in 𝒞), attributes A := (A_1, ..., A_k).
  The training dataset is supervised: instances (c^(1), a_1^(1), ..., a_k^(1)), ..., (c^(n), a_1^(n), ..., a_k^(n)).
  The test set/instance is unsupervised: (∗, ã_1, ..., ã_k); its class is unknown and is predicted by a classifier learned from the training dataset.
  The active set is also unsupervised: instances such as (∗, a_1^(a), ..., a_k^(a)), (∗, a_1^(b), ..., a_k^(b)), (∗, a_1^(c), ..., a_k^(c)), whose classes are unknown but can be queried.
  An active-learning score is computed for every active-set instance (in the example: .3, .8, .5); the instance with the highest score (here instance b, score .8) is sent for annotation, its class c^(b) is obtained, and the newly labelled instance joins the training dataset.
  Retraining on the enlarged dataset yields an actively learned classifier (more accurate).
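
A minimal sketch of this selection, annotation and retraining loop, assuming discrete attributes and using scikit-learn's CategoricalNB as a stand-in for the naive (credal) classifier of the talk; the score used here is the plain -P(c*|a) introduced later on slides 17-19, and `oracle_label` is a hypothetical annotator, not anything from the paper.

```python
# Minimal sketch of the active-learning loop of slides 2-8 (assumptions:
# discrete attributes, scikit-learn's CategoricalNB standing in for the
# classifier, and a hypothetical `oracle_label` annotator).
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(0)

# Synthetic data: n labelled instances, m unlabelled active-set instances,
# k discrete attributes with values in {0, 1, 2}, a binary class.
n, m, k = 50, 30, 4
X_train = rng.integers(0, 3, size=(n, k))
y_train = rng.integers(0, 2, size=n)
X_active = rng.integers(0, 3, size=(m, k))
y_active_hidden = rng.integers(0, 2, size=m)   # known only to the annotator

def oracle_label(i):
    """Hypothetical annotator: returns the true class of active-set instance i."""
    return y_active_hidden[i]

def al_score(clf, X):
    """Uncertainty-sampling score -P(c* | a): higher = harder to classify."""
    return -clf.predict_proba(X).max(axis=1)

active_idx = list(range(m))
for step in range(10):                                 # annotate 10 instances
    clf = CategoricalNB(min_categories=3).fit(X_train, y_train)
    scores = al_score(clf, X_active[active_idx])
    best = active_idx[int(np.argmax(scores))]          # hardest instance
    # Annotation: move the chosen instance from the active set to the training set.
    X_train = np.vstack([X_train, X_active[best]])
    y_train = np.append(y_train, oracle_label(best))
    active_idx.remove(best)

# The classifier retrained on the enlarged training set is the "actively
# learned" classifier of slide 8.
final_clf = CategoricalNB(min_categories=3).fit(X_train, y_train)
print("training-set size after active learning:", len(y_train))
```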

  9.–12. Accuracy Trajectories. A constant AL score amounts to a random pick among the active-set instances (variance error decreases, accuracy increases as the training set grows).
  [Plot: accuracy on the test set vs. training-set size N, N+d, N+2d, ..., N+kd, i.e. from active set FULL to active set EMPTY; one curve for the random pick, one for the AL algorithm.]
  AL algorithms should do better than the random-pick baseline.
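
Such trajectories can be generated with a small experiment of this shape: grow the training set d instances at a time, once with random picks and once with score-guided picks, recording test accuracy after each step. Everything below (the toy data generator, the plain naive Bayes stand-in, d = 10) is an illustrative assumption, not the paper's experimental protocol.

```python
# Sketch of the accuracy-trajectory experiment of slides 9-12: test accuracy
# recorded while the training set grows from N to N + |active set|, adding
# d instances at a time, either at random or by the AL score.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(1)

def make_data(n, k=4):
    """Toy discrete data in which the class depends on the attributes."""
    X = rng.integers(0, 3, size=(n, k))
    y = (X.sum(axis=1) + rng.integers(0, 2, size=n)) % 2
    return X, y

X_tr, y_tr = make_data(30)     # initial training set (size N)
X_ac, y_ac = make_data(100)    # active set (labels revealed only when "annotated")
X_te, y_te = make_data(500)    # test set

def trajectory(select, d=10):
    """Grow the training set d instances at a time; return the test accuracies."""
    X, y, pool = X_tr.copy(), y_tr.copy(), list(range(len(y_ac)))
    accs = []
    while pool:
        clf = CategoricalNB(min_categories=3).fit(X, y)
        accs.append(accuracy_score(y_te, clf.predict(X_te)))
        chosen = select(clf, pool)[:d]              # indices sent for annotation
        X = np.vstack([X, X_ac[chosen]])
        y = np.concatenate([y, y_ac[chosen]])       # annotation step
        pool = [i for i in pool if i not in chosen]
    clf = CategoricalNB(min_categories=3).fit(X, y)
    accs.append(accuracy_score(y_te, clf.predict(X_te)))   # active set empty
    return accs

def random_pick(clf, pool):
    """Baseline: constant score, i.e. a random pick among active-set instances."""
    return list(rng.permutation(pool))

def uncertainty_pick(clf, pool):
    """AL algorithm: rank by -P(c* | a), hardest instances first."""
    scores = -clf.predict_proba(X_ac[pool]).max(axis=1)
    return [pool[i] for i in np.argsort(-scores)]

print("random pick :", np.round(trajectory(random_pick), 2))
print("AL score    :", np.round(trajectory(uncertainty_pick), 2))
```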

  13.–16. Naive Classifiers. Graph: C is the parent of A_1, A_2, ..., A_k; given the class C, the attributes are independent.
  Naive Bayes (NBC): a Bayesian network quantified from data with a flat Dirichlet prior Dir(st); the joint model is P(c, a) = P(c) · ∏_{i=1}^{k} P(a_i | c). Given a test instance a, it assigns the single class c∗ := arg max_{c ∈ 𝒞} P(c | a); for any c′, c′′ ∈ 𝒞 the dominance test is P(c′ | a) / P(c′′ | a) = P(c′, a) / P(c′′, a) > 1.
  Naive Credal (NCC): Bayesian networks quantified by a set of priors, the Imprecise Dirichlet Model T ≡ {Dir(st) : t > 0, ∑_i t_i = 1}. It returns a set 𝒞∗ ⊆ 𝒞 of optimal (undominated) classes, obtained with the conservative dominance test min_{t ∈ T} P_t(c′, a) / P_t(c′′, a) > 1.
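
A small numeric sketch of the two tests. The naive Bayes ratio follows the slide's formula directly; for the credal side, the exact NCC test minimises the ratio jointly over the IDM priors t, which is a non-trivial optimisation, so the sketch only computes a conservative bound (lower IDM estimate of P_t(c′, a) against upper IDM estimate of P_t(c′′, a)). All counts, the binary attributes, and the choice s = 1 are made up for illustration.

```python
# Sketch of the NBC dominance test and a *conservative* version of the NCC
# dominance test (slides 13-16).  The exact NCC test minimises
# P_t(c', a) / P_t(c'', a) jointly over the IDM priors t; here the ratio is
# only bounded from below by taking the lower IDM estimate of the numerator
# and the upper IDM estimate of the denominator, so True implies dominance
# but False does not exclude it.  All counts, the binary attributes and the
# choice s = 1 are illustrative assumptions.
from math import prod

s = 1.0                        # IDM hyperparameter (equivalent sample size)
N = 20                         # training-set size
n_c = {"c1": 12, "c2": 8}      # class counts n(c)
# Counts n(a_i, c) of the observed value of each (binary) attribute A_i:
n_ac = {"c1": [7, 5], "c2": [5, 4]}

def nbc_joint(c):
    """NBC with a flat Dirichlet prior: P(c, a) = P(c) * prod_i P(a_i | c).
    The flat prior spreads s uniformly: s/2 per class and s/2 per attribute value."""
    p_c = (n_c[c] + s / 2) / (N + s)
    return p_c * prod((n + s / 2) / (n_c[c] + s) for n in n_ac[c])

def idm_lower(c):
    """Lower IDM estimate of P_t(c, a): every prior weight t pushed to 0."""
    return (n_c[c] / (N + s)) * prod(n / (n_c[c] + s) for n in n_ac[c])

def idm_upper(c):
    """Upper IDM estimate of P_t(c, a): every prior weight t pushed to 1."""
    return ((n_c[c] + s) / (N + s)) * prod((n + s) / (n_c[c] + s) for n in n_ac[c])

def nbc_dominates(c1, c2):
    """NBC test: P(c1 | a) / P(c2 | a) = P(c1, a) / P(c2, a) > 1."""
    return nbc_joint(c1) / nbc_joint(c2) > 1

def ncc_dominates(c1, c2):
    """Conservative credal test: a lower bound of min_t P_t(c1,a)/P_t(c2,a) > 1."""
    return idm_lower(c1) / idm_upper(c2) > 1

print("NBC: c1 dominates c2?", nbc_dominates("c1", "c2"))   # True
print("NCC: c1 dominates c2?", ncc_dominates("c1", "c2"))   # False
print("NCC: c2 dominates c1?", ncc_dominates("c2", "c1"))   # False
```

With these made-up counts the Bayesian test declares c1 to dominate c2, while neither direction of the conservative credal test succeeds, so both classes would remain in 𝒞∗: the kind of instance on which the credal classifier suspends judgement.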

  17.–19. Uncertainty Samplings. The AL score(a) measures how hard-to-classify an instance is; difficult/ambiguous instances give a better contribution to learning.
  Uncertainty Sampling (based on the NBC posterior P(C | a)): the smaller the probability of the most probable class, the more hard-to-classify the instance; score(a) ≡ −P(c∗ | a).
  Credal Uncertainty Sampling (based on the set of NCC posteriors P(C | a)): the weaker the dominances, the more hard-to-classify the instance. If 𝒞 = {c′, c′′} (binary class): score(a) ≡ −max{ min_t P(c′ | a) / P(c′′ | a), min_t P(c′′ | a) / P(c′ | a) }. With more than two classes, take the max over all pairs (c′, c′′) ∈ 𝒞².
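
The two scores written out for a binary class, again with made-up numbers: the NBC score needs only the posterior of the most probable class, while the credal score needs, for each ordered pair of classes, the lower bound of the posterior ratio over the priors t (supplied directly here rather than computed from an NCC).

```python
# The two AL scores of slides 17-19 for a binary class (made-up numbers).
from itertools import permutations

# NBC posteriors P(c | a) for two candidate active-set instances:
nbc_posterior = {"easy": {"c1": 0.95, "c2": 0.05},
                 "hard": {"c1": 0.55, "c2": 0.45}}

# Lower posterior ratios min_t P(c' | a) / P(c'' | a) from an NCC
# (supplied directly here; the "hard" instance has weak dominances):
ncc_lower_ratio = {"easy": {("c1", "c2"): 8.0, ("c2", "c1"): 0.02},
                   "hard": {("c1", "c2"): 0.9, ("c2", "c1"): 0.6}}

def us_score(post):
    """Uncertainty sampling: score(a) = -P(c* | a)."""
    return -max(post.values())

def credal_us_score(lower_ratio):
    """Credal uncertainty sampling: score(a) = -max over ordered class pairs of
    the lower posterior ratios; weak dominances (ratios near 1) give high scores."""
    return -max(lower_ratio[pair] for pair in permutations(("c1", "c2"), 2))

for name in ("easy", "hard"):
    print(f"{name}:  US = {us_score(nbc_posterior[name]):.2f}"
          f"   credal US = {credal_us_score(ncc_lower_ratio[name]):.2f}")
```

Under both scores the "hard" instance ranks above the "easy" one; the credal score additionally reflects the prior-induced imprecision, which is the ingredient that distinguishes the credal sampling strategy.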
