Advanced Multimedia: Text Classification, Tamara Berg - PowerPoint PPT Presentation

Advanced Multimedia: Text Classification
Tamara Berg

Slide from Dan Klein

Today!
Slide from Dan Klein

What does categorization/classification mean?

Slide from Min-Yen Kan

http://yann.lecun.com/exdb/mnist/index.html
Slide from Dan Klein

Slide from Min-Yen Kan

• Machine Learning - how to select a model on the basis of data / experience
  – Learning parameters (e.g. probabilities)
  – Learning structure (e.g. dependencies)
  – Learning hidden concepts (e.g. clustering)
Slide from Min-Yen Kan

Classifiers
• Today we’ll talk about 2 simple kinds of classifiers
  – Nearest Neighbor Classifier
  – Naïve Bayes Classifier

Document Vectors
• Represent document as a “bag of words”

Example
• Doc1 = “the quick brown fox jumped”
• Doc2 = “brown quick jumped fox the”
Would a bag of words model represent these two documents differently?
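One way to check the question above is to build both bags and compare them. The sketch below assumes lowercase whitespace tokenization, and the helper name `bag_of_words` is made up for illustration; it is not from the slides.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count word occurrences, discarding word order (whitespace tokenization)."""
    return Counter(text.lower().split())


doc1 = "the quick brown fox jumped"
doc2 = "brown quick jumped fox the"

bag1 = bag_of_words(doc1)
bag2 = bag_of_words(doc2)

# Both documents contain the same words the same number of times,
# so their bags compare equal even though the word order differs.
print(bag1 == bag2)  # True
```

Under these assumptions the two documents are indistinguishable: a bag of words keeps counts and throws away order.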

Document Vectors
• Documents are represented as “bags of words”
• Represented as vectors when used computationally
• Each vector holds a place for every term in the collection
• Therefore, most vectors are sparse
Lexicon: the vocabulary set that you consider to be valid words in your documents.
Usually stemmed (e.g. running -> run)
Slide from Mitch Marcus
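A minimal sketch of turning documents into count vectors over a shared lexicon, with a sparse form that stores only the nonzero entries. The three toy documents and the helper names (`build_lexicon`, `to_vector`, `to_sparse`) are assumptions for illustration, and stemming is left out to keep the example short.

```python
from collections import Counter


def build_lexicon(docs):
    """Sorted list of every distinct term in the collection."""
    return sorted({term for doc in docs for term in doc.lower().split()})


def to_vector(doc, lexicon):
    """Dense count vector: one position per lexicon term."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in lexicon]


def to_sparse(vector):
    """Keep only nonzero positions as {index: count}."""
    return {i: c for i, c in enumerate(vector) if c != 0}


# Toy collection (hypothetical documents, not from the slides).
docs = ["nova galaxy nova heat", "film role film", "diet fur diet"]
lexicon = build_lexicon(docs)
vectors = [to_vector(d, lexicon) for d in docs]

print(lexicon)                # ['diet', 'film', 'fur', 'galaxy', 'heat', 'nova', 'role']
print(vectors[0])             # [0, 0, 0, 1, 1, 2, 0]
print(to_sparse(vectors[0]))  # {3: 1, 4: 1, 5: 2}
```

Each document uses only a few of the lexicon's terms, so most positions are zero; this is why the sparse form is usually the practical one for large collections.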

Document Vectors: One location for each word.

Terms (columns): nova, galaxy, heat, h’wood, film, role, diet, fur

Doc   Nonzero counts, left to right (blank columns not shown)
A     10  5  3
B     5  10
C     10  8  7
D     9  10  5
E     10  10
F     9  10
G     5  7  9
H     6  10  2  8
I     7  5  1  3

“Nova” occurs 10 times in text A
“Galaxy” occurs 5 times in text A
“Heat” occurs 3 times in text A
(Blank means 0 occurrences.)
Slide from Mitch Marcus

Vector Space Model
• Documents are represented as vectors in term space
• Terms are usually stems
• Documents represented by vectors of terms
• A vector distance measures similarity between documents
• Document similarity is based on length and direction of their vectors
• Terms in a vector can be “weighted” in many ways
Slide from Mitch Marcus

Similarity between documents
A = [10 5 3 0 0 0 0 0];
G = [5 0 7 0 0 9 0 0];
E = [0 0 0 0 0 10 10 0];
Treat the vectors as binary: similarity = number of words in common.
Sb(A,G) = ?
Sb(A,E) = ?
Sb(G,E) = ?
Which pair of documents is the most similar?
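The binary similarity can be sketched as a count of positions where both vectors are nonzero. The function name `binary_similarity` is an assumption for illustration; the vectors are the ones given on the slide.

```python
A = [10, 5, 3, 0, 0, 0, 0, 0]
G = [5, 0, 7, 0, 0, 9, 0, 0]
E = [0, 0, 0, 0, 0, 10, 10, 0]


def binary_similarity(x, y):
    """Number of terms that occur at least once in both documents."""
    return sum(1 for xi, yi in zip(x, y) if xi > 0 and yi > 0)


print(binary_similarity(A, G))  # 2
print(binary_similarity(A, E))  # 0
print(binary_similarity(G, E))  # 1
```

Under this measure A and G are the most similar pair, since they share two terms. The counts themselves play no role here; only presence or absence does.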

Similarity between documents
A = [10 5 3 0 0 0 0 0];
G = [5 0 7 0 0 9 0 0];
E = [0 0 0 0 0 10 10 0];
Sum of Squared Distances (SSD) = $\sum_{i=1}^{n} (X_i - Y_i)^2$
SSD(A,G) = ?
SSD(A,E) = ?
SSD(G,E) = ?
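A short sketch of the SSD formula applied to the three vectors. The function name `ssd` is an assumption for illustration. Note that SSD is a distance, so a smaller value means the documents are more alike.

```python
A = [10, 5, 3, 0, 0, 0, 0, 0]
G = [5, 0, 7, 0, 0, 9, 0, 0]
E = [0, 0, 0, 0, 0, 10, 10, 0]


def ssd(x, y):
    """Sum of squared differences between corresponding entries."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


print(ssd(A, G))  # 147
print(ssd(A, E))  # 334
print(ssd(G, E))  # 175
```

For example, SSD(A,G) = 25 + 25 + 16 + 81 = 147, which is the smallest of the three, so A and G are the closest pair under this measure.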

Similarity between documents
A = [10 5 3 0 0 0 0 0];
G = [5 0 7 0 0 9 0 0];
E = [0 0 0 0 0 10 10 0];
Angle between vectors: $\cos(\theta) = \frac{a \cdot b}{\|a\| \, \|b\|}$
Dot product: $a \cdot b = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$
Length (Euclidean norm): $\|a\| = \sqrt{a_1^2 + a_2^2 + \dots + a_n^2}$
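The three definitions above combine into a cosine similarity, sketched here for the slide's vectors. The function names (`dot`, `norm`, `cosine`) are assumptions for illustration, and the sketch assumes neither vector is all zeros.

```python
import math

A = [10, 5, 3, 0, 0, 0, 0, 0]
G = [5, 0, 7, 0, 0, 9, 0, 0]
E = [0, 0, 0, 0, 0, 10, 10, 0]


def dot(x, y):
    """Dot product: sum of pairwise products."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(dot(x, x))


def cosine(x, y):
    """Cosine of the angle between x and y (assumes both are nonzero)."""
    return dot(x, y) / (norm(x) * norm(y))


print(round(cosine(A, G), 3))  # 0.493
print(round(cosine(A, E), 3))  # 0.0
print(round(cosine(G, E), 3))  # 0.511
```

Cosine ignores vector length and compares direction only. Here it ranks G and E as the most similar pair, whereas the sum of squared distances between the same vectors is smallest for A and G, so the choice of measure changes the answer.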

Some words give more information than others
• Does the fact that two documents both contain the word “the” tell us anything? How about “and”?
• Stop words (noise words): words that are probably not useful for processing. Filtered out before natural language processing is applied. No definitive list, but it might include things like:
  http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words
• Other words can be more or less informative.
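Stop-word filtering can be sketched as a set lookup applied before counting. The tiny `STOP_WORDS` set below is a made-up illustration, not the list at the URL above; a real list would be much longer.

```python
# Hypothetical, deliberately tiny stop list for illustration only.
STOP_WORDS = {"the", "and", "a", "of", "to", "in"}


def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


tokens = remove_stop_words("The quick brown fox and the lazy dog")
print(tokens)  # ['quick', 'brown', 'fox', 'lazy', 'dog']
```

After filtering, two documents no longer look similar merely because both contain “the” or “and”.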
