SLIDE 1

AUTOMATIC CLASSIFICATION:

NAÏVE BAYES

WM&R 2019/20 – 2 UNITS

  • R. Basili

(many slides borrowed from H. Schütze)

Università di Roma “Tor Vergata”, Email: basili@info.uniroma2.it

SLIDE 2

Summary

  • The nature of probabilistic modeling
  • Probabilistic Algorithms for Automatic Classification (AC)
  • Naive Bayes classification
  • Two models:
  • Univariate Binomial (FIRST UNIT)
  • Multinomial (Class Conditional Unigram Language Model) (SECOND UNIT)
  • Parameter estimation & Feature Selection

SLIDE 3

Motivation: is this spam?

From: "" <takworlld@hotmail.com> Subject: real estate is the only way... gem oalvgkay Anyone can buy real estate with no money down Stop paying rent TODAY ! There is no need to spend hundreds or even thousands for similar courses I am 22 years old and I have already purchased 6 properties using the methods outlined in this truly INCREDIBLE ebook. Change your life NOW ! ================================================= Click Below to order: http://www.wholesaledaily.com/sales/nmd.htm ================================================= 3

SLIDE 4

Categorization/Classification

  • Given:
  • A description of an instance, x ∈ X, where X is the instance language or instance space.
  • Issue: how to represent text documents.
  • A fixed set of categories: C = {c1, c2, …, cn}
  • Determine:
  • The category of x: c(x) ∈ C (or 2^C), where c(x) is a categorization function whose domain is X and which maps each instance to the class(es) of C suitable for it.
  • Learning problem:
  • We want to know how to build the categorization function c (the “classifier”).

SLIDE 5

Document Classification

[Figure: document classification example. Classes: (AI) (Programming) (HCI), with subclasses MULTIMEDIA, GUI, GARB.COLL., SEMANTICS, ML, PLANNING. Training data (bag-of-words) per class, e.g. PLANNING: “planning, temporal, reasoning, plan, language, …”; SEMANTICS: “programming, semantics, language, proof, …”; ML: “learning, intelligence, algorithm, reinforcement, network, …”; GARB.COLL.: “garbage, collection, memory, optimization, region, …”. Test data: “Artificial Intelligence in the Path Planning Optimization of Mobile Agent Navigation”.]

(Note: in real life there is often a hierarchy; and you may get papers on ML approaches to Garb. Coll., i.e. c is a multiclassification function.)

SLIDE 6

Text Categorization tasks: examples

  • Labels are most often topics, such as Yahoo!-categories
  • e.g., "finance", "sports", "news > world > asia > business"
  • Labels may be genres
  • e.g., "editorials", "movie-reviews", "news"
  • Labels may be opinions (as in Sentiment Analysis)
  • e.g., "like", "hate", "neutral"
  • Labels may be domain-specific and binary
  • e.g., "interesting-to-me" : "not-interesting-to-me", "spam" : "not-spam", "contains adult language" : "doesn't", "is a fake" : "it isn't"

SLIDE 7

Text Classification approaches

  • Manual classification
  • Used by Yahoo!, Looksmart, about.com, ODP, Medline
  • Very accurate when the job is done by experts
  • Consistent when the problem size and the team are small
  • Difficult and expensive to scale
  • Usually, basic rules are adopted by the editors with respect to:
  • Lexical items (i.e. words or proper nouns)
  • Metadata (e.g. original writing time of the document, author, …)
  • Sources (e.g. the originating organization, such as a sector-specific newspaper, or a social network)
  • Integration of different criteria

SLIDE 8

Automatic Classification Methods

  • Automatic document classification scales better with text volume (e.g. user-generated content in social media)
  • Hand-coded rule-based systems
  • One technique used by CS departments’ spam filters, Reuters, CIA, Verity, …
  • e.g., assign a category if the document contains a given boolean combination of words (see the sketch below)
  • Standing queries: commercial systems have complex query languages (everything in IR query languages + accumulators)
  • Accuracy is often very high if a rule has been carefully refined over time by a subject expert
  • Building and maintaining these rule bases is expensive
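A minimal sketch of such a hand-coded boolean rule (the keywords and the function name are invented for illustration, not any real system’s rule language):

```python
# A hand-coded rule: label a message as spam if it matches a boolean
# combination of words. The keyword lists are illustrative only.
def rule_is_spam(text: str) -> bool:
    t = text.lower()
    money_talk = "no money down" in t or "stop paying rent" in t
    call_to_action = "click below" in t or "order now" in t
    return money_talk and call_to_action

print(rule_is_spam("Stop paying rent TODAY! Click Below to order."))  # True
print(rule_is_spam("Minutes of the staff meeting"))                   # False
```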

SLIDE 9

Classification Methods (2)

  • Supervised learning of a document-label assignment function
  • Many systems partly rely on machine learning (Autonomy, MSN, Yahoo!, Cortana)
  • Algorithmic variants can be:
  • k-Nearest Neighbors (simple, powerful)
  • Rocchio (geometry-based, simple, effective)
  • Naive Bayes (simple, common method)
  • Support vector machines and neural networks (very accurate)
  • No free lunch: requires hand-classified training data
  • Data can also be built up (and refined) by amateurs (crowdsourcing)
  • Note: many commercial systems use a mixture of methods!

SLIDE 10

Bayesian Methods

  • Learning and classification methods based on probability theory.
  • Bayes’ theorem plays a critical role in probabilistic learning and classification.
  • STEPS:
  • Build a generative model that approximates how data are produced
  • Use the prior probability of each category when no information about an item is available
  • Produce, during categorization, the posterior probability distribution over the possible categories, given a description of an item

SLIDE 11

Bayes’ Rule

  • Given an instance X and a category C, P(C, X) is the probability of the joint event:

$$P(C, X) = P(C \mid X)\,P(X) = P(X \mid C)\,P(C)$$

  • The following rule thus holds for every X and C:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

  • What does P(X | C) mean?
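As a worked instance of the rule (with invented illustrative numbers): if 40% of mail is spam, and the word “free” occurs in 20% of spam and 5% of non-spam messages, then

$$P(\text{spam} \mid \text{free}) = \frac{P(\text{free} \mid \text{spam})\,P(\text{spam})}{P(\text{free})} = \frac{0.20 \cdot 0.40}{0.20 \cdot 0.40 + 0.05 \cdot 0.60} \approx 0.73$$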

SLIDE 12

Maximum a posteriori Hypothesis

$$h_{MAP} = \operatorname*{argmax}_{h \in H} P(h \mid X) = \operatorname*{argmax}_{h \in H} \frac{P(X \mid h)\,P(h)}{P(X)} = \operatorname*{argmax}_{h \in H} P(X \mid h)\,P(h)$$

as P(X) is constant with respect to h.

SLIDE 13

Maximum likelihood Hypothesis

If all hypotheses are a priori equally likely, we only need to consider the P(X | h) term:

$$h_{ML} = \operatorname*{argmax}_{h \in H} P(X \mid h)$$

SLIDE 14

Naive Bayes Classifiers

Task: Classify a new instance document D, described by a tuple of attribute values D = (x1, x2, …, xn), into one of the classes cj ∈ C:

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j \mid x_1, \ldots, x_n) = \operatorname*{argmax}_{c_j \in C} \frac{P(x_1, \ldots, x_n \mid c_j)\,P(c_j)}{P(x_1, \ldots, x_n)} = \operatorname*{argmax}_{c_j \in C} P(x_1, \ldots, x_n \mid c_j)\,P(c_j)$$
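A minimal sketch of this decision rule in Python, assuming the likelihoods and priors have already been estimated (all numbers are illustrative):

```python
# Naive Bayes decision: pick the class maximizing P(x1,...,xn | c) * P(c).
likelihood = {"spam": 0.020, "ham": 0.001}   # P(x1,...,xn | c) for one document
prior = {"spam": 0.4, "ham": 0.6}            # P(c)

c_nb = max(prior, key=lambda c: likelihood[c] * prior[c])
print(c_nb)  # spam  (0.020 * 0.4 = 0.008  >  0.001 * 0.6 = 0.0006)
```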

SLIDE 15

Problems to be solved to apply Bayes

  • Determine the notion of document as the joint event D = (x1, x2, …, xn) = (x^D_1, x^D_2, …, x^D_n)
  • Determine how xi is related to the document content
  • Determine how to estimate:
  • P(Cj) for the different classes, j = 1, …, k
  • P(x^D_i) for the different properties/features, i = 1, …, n
  • P(x^D_1, x^D_2, …, x^D_n | Cj) for the different tuples and classes
  • Define the law that selects among the different P(Cj | x^D_1, x^D_2, …, x^D_n), j = 1, …, k
  • Argmax? Best m scores? Thresholds?

SLIDE 16

Problems to be solved to apply Bayes

  • Determine the notion of document as the joint event D = (x1, x2, …, xn) = (x^D_1, x^D_2, …, x^D_n)
  • Determine how xi is related to the document content
  • Determine how to estimate:
  • P(Cj) for the different classes, j = 1, …, k
  • P(x^D_i) for the different properties/features, i = 1, …, n
  • P(x^D_1, x^D_2, …, x^D_n | Cj) for the different tuples and classes
  • Define the law that selects among the different P(Cj | x^D_1, x^D_2, …, x^D_n), j = 1, …, k
  • Argmax? Best m scores? Thresholds?

SLIDE 17

Problems to be solved to apply Bayes

  • Determine the notion of document as the joint event D = (x1, x2, …, xn) = (x^D_1, x^D_2, …, x^D_n)
  • Determine how xi is related to the document content
  • IDEA: use words and their direct occurrences as «signals» for the content
  • Words are individual outcomes of the test of picking one token at random from the text
  • Random variables X can be used such that xi represents X = wordi
  • Multiple occurrences of a word in a text trigger several successful tests for the same wordi; they increase the probability P(xi) = P(X = wordi)

SLIDE 18

Modeling the document content

  • Variables X provide a description of a document D, as they correspond to the outcome of a test
  • D corresponds to the joint event of one single pick per word wordi of the vocabulary V, whose outcomes are:
  • Present, if wordi occurs in D
  • Not present, if wordi does not occur in D
  • It is a binary event, like picking a white or black ball from an urn
  • The joint event is the «parallel» picking of the ball for every urn, i.e. for every wordi in the dictionary; that is, one urn per word is accessed
  • Notice how n (i.e. the number of features) here becomes the size |V| of the vocabulary V
  • Each feature xi models the presence or absence of wordi in D, and can be written as Xi = 0 or Xi = 1

This is the basis for the so-called Multivariate binomial model!
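A minimal sketch of this binary representation (the toy vocabulary is invented for illustration):

```python
# Each document becomes a tuple of |V| binary variables:
# X_i = 1 iff word_i occurs in D, X_i = 0 otherwise.
vocab = ["fever", "cough", "goal", "memory", "planning"]  # toy vocabulary V

def binary_features(doc_tokens, vocab):
    present = set(doc_tokens)
    return [1 if w in present else 0 for w in vocab]

print(binary_features("high fever and a dry cough".split(), vocab))
# [1, 1, 0, 0, 0]
```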

SLIDE 19

Problems to be solved to apply Bayes

  • Determine the notion of document as the joint event D = (x1, x2, …, xn) = (x^D_1, x^D_2, …, x^D_n)
  • Determine how xi is related to the document content
  • Determine how to estimate:
  • P(Cj) for the different classes, j = 1, …, k
  • P(x^D_i) for the different properties/features, i = 1, …, n
  • P(x^D_1, x^D_2, …, x^D_n | Cj) for the different tuples and classes
  • Define the law that selects among the different P(Cj | x^D_1, x^D_2, …, x^D_n), j = 1, …, k
  • Argmax? Best m scores? Thresholds?

SLIDE 20

Naïve Bayes Classifier: Naïve Bayes Assumption

  • P(cj)
  • Can be estimated from the frequency of classes in the training examples.
  • P(x1, x2, …, xn | cj)
  • O(|X|^n · |C|) parameters
  • Could only be estimated if a very, very large number of training examples were available.
  • Naive Bayes Conditional Independence Assumption: assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(xi | cj). This reduces the model to O(|X| · |C|) parameters.

SLIDE 21

The Naïve Bayes Classifier

  • Conditional Independence Assumption: features detect term presence and are independent of each other given the class:

$$P(X_1, \ldots, X_5 \mid C) = P(X_1 \mid C) \cdot P(X_2 \mid C) \cdot \ldots \cdot P(X_5 \mid C)$$

  • This model is appropriate for binary variables
  • Multivariate binomial model

[Figure: a Bayesian network with class node Flu and children X1, …, X5, labelled fever, sinus, cough, runny-nose, muscle-ache.]

SLIDE 22

Learning the Model

  • First attempt: maximum likelihood estimates
  • simply use the frequencies in the data:

$$\hat{P}(x_i \mid c_j) = \frac{N(X_i = x_i, C = c_j)}{N(C = c_j)} \qquad\qquad \hat{P}(c_j) = \frac{N(C = c_j)}{N}$$

[Figure: a network with class node C and children X1, …, X6.]

SLIDE 23

NB Bernoulli: the Learning stage

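The original slide presents the Bernoulli learning stage as an algorithm; here is a minimal Python sketch of that training procedure, using the maximum-likelihood estimates of the previous slide (toy interfaces, no smoothing yet):

```python
from collections import Counter

def train_bernoulli_nb(docs, labels, vocab):
    """docs: list of token lists; labels: parallel list of class names.
    Returns ML estimates of P(c) and of P(X_w = 1 | c) for every w in vocab."""
    n = len(docs)
    n_c = Counter(labels)                          # N(C = c)
    prior = {c: n_c[c] / n for c in n_c}           # P(c) = N(C = c) / N
    cond = {}
    for c in n_c:
        class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
        # P(X_w = 1 | c) = fraction of class-c documents containing w
        cond[c] = {w: sum(w in d for d in class_docs) / n_c[c] for w in vocab}
    return prior, cond
```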
SLIDE 24

Problems to be solved to apply Bayes

  • Determine the notion of document as the joint event D = (x1, x2, …, xn) = (x^D_1, x^D_2, …, x^D_n)
  • Determine how xi is related to the document content
  • Determine how to estimate:
  • P(Cj) for the different classes, j = 1, …, k
  • P(x^D_i) for the different properties/features, i = 1, …, n
  • P(x^D_1, x^D_2, …, x^D_n | Cj) for the different tuples and classes
  • Define the law that selects among the different P(Cj | x^D_1, x^D_2, …, x^D_n), j = 1, …, k
  • Argmax? Best m scores? Thresholds?

SLIDE 25

Problems to be solved to apply Bayes

  • Define the law that selects among the different P(Cj | x^D_1, x^D_2, …, x^D_n), j = 1, …, k
  • (A) Argmax? (B) Best m scores? (C) Thresholds?
  • A. ARGMAX is applicable to every task in which multiclassification is not applicable:
  • spam / not spam
  • fake news detection
  • B. When a fixed number m > 1 of categories is requested, the model outputs the m most likely classes
  • C. Thresholds are usually estimated from the training data (a sketch of the three laws follows)
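A minimal sketch of the three selection laws, assuming the posteriors P(Cj | d) have already been computed (names and numbers are illustrative):

```python
def decide(posteriors, law="argmax", m=2, threshold=0.5):
    """posteriors: {class: P(c | d)} for one document."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    if law == "argmax":      # (A) single-label tasks, e.g. spam / not-spam
        return [ranked[0][0]]
    if law == "best_m":      # (B) a fixed number m of categories is requested
        return [c for c, _ in ranked[:m]]
    # (C) thresholds, typically tuned on the training data
    return [c for c, p in ranked if p >= threshold]

post = {"AI": 0.6, "Programming": 0.3, "HCI": 0.1}
print(decide(post), decide(post, "best_m"), decide(post, "threshold", threshold=0.25))
# ['AI'] ['AI', 'Programming'] ['AI', 'Programming']
```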

SLIDE 26

Problem with Max Likelihood

  • What if we have seen no training cases where the patient had no flu (C = nf) but did have muscle aches (X5 = t)? The maximum likelihood estimate is then zero:

$$\hat{P}(X_5 = t \mid C = nf) = \frac{N(X_5 = t, C = nf)}{N(C = nf)} = 0$$

  • Zero probabilities cannot be conditioned away, no matter the other evidence:

$$c_{NB} = \operatorname*{argmax}_{c} \hat{P}(c) \prod_i \hat{P}(x_i \mid c)$$

[Figure: the Flu network again, with X1, …, X5 labelled fever, sinus, cough, runny-nose, muscle-ache.]
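The standard remedy, mentioned again in the Unit summary, is smoothing; a minimal sketch of the add-one (Laplace) variant for the Bernoulli estimates:

```python
def smoothed_presence_prob(df_wc, n_c):
    """Add-one smoothed estimate of P(X_w = 1 | c).
    df_wc: documents of class c containing w; n_c: documents of class c.
    The +1 / +2 keeps every estimate strictly inside (0, 1), so an unseen
    (word, class) pair no longer zeroes out the whole product."""
    return (df_wc + 1) / (n_c + 2)

print(smoothed_presence_prob(0, 10))   # 0.0833... instead of 0.0
```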

SLIDE 27

Underflow Prevention

  • Multiplying lots of probabilities, which are between 0 and 1 by definition, can result in floating-point underflow.
  • Since log(xy) = log(x) + log(y), it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities.
  • The class with the highest final un-normalized log probability score is still the most probable:

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} \Big[ \log P(c_j) + \sum_{i \in positions} \log P(x_i \mid c_j) \Big]$$
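A tiny Python illustration of the underflow and of the log-space fix:

```python
import math

probs = [0.1] * 400
product = math.prod(probs)                    # underflows to exactly 0.0
log_score = sum(math.log(p) for p in probs)   # stays finite: 400 * log(0.1)
print(product, log_score)                     # 0.0 -921.034...
```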

SLIDE 28

NB Bernoulli Model: Classification

  • When multiclassification is not necessary, the classifier returns the single most likely class (argmax), as in the sketch below.
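A minimal sketch of Bernoulli NB classification in log space, assuming the smoothed estimates from the previous slides (so that no probability is exactly 0 or 1); present words contribute log P(X_w = 1 | c), absent vocabulary words contribute log(1 − P(X_w = 1 | c)):

```python
import math

def classify_bernoulli(doc_tokens, prior, cond, vocab):
    """prior: {c: P(c)}; cond: {c: {w: P(X_w = 1 | c)}} with 0 < p < 1."""
    present = set(doc_tokens)
    scores = {}
    for c in prior:
        s = math.log(prior[c])
        for w in vocab:
            p = cond[c][w]
            s += math.log(p) if w in present else math.log(1.0 - p)
        scores[c] = s
    return max(scores, key=scores.get)   # argmax over un-normalized log posteriors
```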

SLIDE 29

End of Unit 1

  • Summary:
  • Document categorization via probabilistic models requires different engineering/modeling decisions:
  • A probabilistic Bayesian representation of the problem, such as the Multivariate binomial (or Bernoulli) model described here
  • An estimation process that extracts the numerical parameters of the reference Bayesian model
  • An inference function that makes use of the a posteriori probabilities for reproducing the classification function c : X → C (or, in the multiclassification case, c : X → 2^C)
  • We introduced the Multivariate Binomial Naive Bayes model, where variables correspond to individual words of the vocabulary and values range in the {0, 1} set
  • Problems such as underflow and zero probabilities can be solved by adopting sums of logarithms instead of products of probabilities, and by applying smoothing