Semi-Supervised Learning and Text Analysis - Machine Learning 10-701 - PowerPoint PPT Presentation



SLIDE 1

Semi-Supervised Learning and Text Analysis

Machine Learning 10-701 November 29, 2005 Tom M. Mitchell Carnegie Mellon University

SLIDE 2

Document Classification: Bag of Words Approach

aardvark 0
about 2
all 2
Africa 1
apple 0
anxious 0
...
gas 1
...
oil 1
...
Zaire
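The bag-of-words vector above can be produced with a few lines of code; a minimal sketch (the tokenizer and the small vocabulary are illustrative, not from the lecture):

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map a document to a vector of word counts over a fixed vocabulary."""
    tokens = re.findall(r"[a-z]+", text.lower())   # crude lowercase tokenizer
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

vocab = ["aardvark", "about", "all", "africa", "apple", "oil", "zaire"]
doc = "All about Africa: all the oil, all about Zaire."
print(bag_of_words(doc, vocab))  # -> [0, 2, 3, 1, 0, 1, 1]
```

Word order is discarded; only the count of each vocabulary word survives, which is exactly the representation the classifiers below consume.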

SLIDE 3

For code, see

www.cs.cmu.edu/~tom/mlbook.html

click on “Software and Data”

SLIDE 4

Supervised Training for Document Classification

  • Common algorithms:

– Logistic regression, Support Vector Machines, Bayesian classifiers

  • Quite successful in practice

– Email classification (spam, foldering, ...)
– Web page classification (product description, publication, ...)
– Intranet document organization

  • Research directions:

– More elaborate, domain-specific classification models (e.g., for email)
– Using unlabeled data too: semi-supervised methods
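As a concrete illustration of the supervised setting, here is a minimal multinomial Naïve Bayes text classifier with Laplace smoothing (a hand-rolled sketch, not the course's reference code; the spam/ham examples are invented):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Train multinomial Naive Bayes on tokenized documents."""
    vocab = {w for d in docs for w in d}
    by_class = defaultdict(list)
    for d, y in zip(docs, labels):
        by_class[y].append(d)
    priors, word_probs = {}, {}
    for y, ds in by_class.items():
        priors[y] = len(ds) / len(docs)
        counts = Counter(w for d in ds for w in d)
        total = sum(counts.values())
        # Laplace smoothing: (count + 1) / (total + |V|)
        word_probs[y] = {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}
    return priors, word_probs

def classify(doc, priors, word_probs):
    def log_score(y):
        s = math.log(priors[y])
        for w in doc:
            if w in word_probs[y]:           # ignore out-of-vocabulary words
                s += math.log(word_probs[y][w])
        return s
    return max(priors, key=log_score)

docs = [["cheap", "meds", "buy"], ["meeting", "notes", "attached"],
        ["buy", "cheap", "now"], ["project", "meeting", "tomorrow"]]
labels = ["spam", "ham", "spam", "ham"]
priors, probs = train_nb(docs, labels)
print(classify(["cheap", "buy"], priors, probs))  # prints "spam"
```

The EM extension on the next slides reuses exactly these two quantities: the class priors and the per-class word probabilities.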

SLIDE 5

EM for Semi-supervised document classification

SLIDE 6

Using Unlabeled Data to Help Train Naïve Bayes Classifier

[Figure: Naïve Bayes network with class Y and features X1...X4, shown with a training table over X1...X4 and Y in which some Y values are unlabeled ("?")]

Learn P(Y|X)

SLIDE 7

From [Nigam et al., 2000]

SLIDE 8

E Step: estimate class membership of each unlabeled document,

$$P(c_k \mid d_j) \propto P(c_k) \prod_t P(w_t \mid c_k)^{N(t,j)}$$

M Step: re-estimate parameters from labeled plus probabilistically labeled unlabeled documents,

$$P(w_t \mid c_k) = \frac{1 + \sum_j N(t,j)\, P(c_k \mid d_j)}{|V| + \sum_s \sum_j N(s,j)\, P(c_k \mid d_j)}$$

where w_t is the t-th word in the vocabulary and N(t,j) is the count of w_t in document d_j

SLIDE 9

Elaboration 1: Downweight the influence of unlabeled examples by a factor λ (0 ≤ λ ≤ 1), chosen by cross validation. New M step, with Λ(j) = 1 for labeled and λ for unlabeled documents:

$$P(w_t \mid c_k) = \frac{1 + \sum_j \Lambda(j)\, N(t,j)\, P(c_k \mid d_j)}{|V| + \sum_s \sum_j \Lambda(j)\, N(s,j)\, P(c_k \mid d_j)}$$
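The E and M steps above, including the λ downweighting, can be sketched on a toy two-class problem (the documents, vocabulary, and the choice λ = 0.5 below are all invented for illustration):

```python
import math
from collections import Counter

# Toy EM for semi-supervised multinomial Naive Bayes (Nigam et al. style).
# Unlabeled examples are downweighted by lam in the M step.
VOCAB = ["ball", "game", "team", "stock", "market", "price"]
labeled = [(["ball", "game"], 0), (["stock", "market"], 1)]
unlabeled = [["ball", "team", "game"], ["market", "price"], ["team", "ball"]]

def m_step(post, lam):
    """post[i][c] = P(c | unlabeled doc i). Returns priors and word probs."""
    priors, wp = [], []
    for c in range(2):
        priors.append(sum(1.0 for _, y in labeled if y == c)
                      + lam * sum(p[c] for p in post))
        counts = Counter()
        for d, y in labeled:
            if y == c:
                counts.update(d)
        cc = {t: float(counts[t]) for t in VOCAB}
        for p, d in zip(post, unlabeled):      # fractional, lam-weighted counts
            for t in d:
                cc[t] += lam * p[c]
        total = sum(cc.values())
        wp.append({t: (cc[t] + 1) / (total + len(VOCAB)) for t in VOCAB})
    z = sum(priors)
    return [p / z for p in priors], wp

def e_step(priors, wp):
    post = []
    for d in unlabeled:
        scores = [math.log(priors[c]) + sum(math.log(wp[c][t]) for t in d)
                  for c in range(2)]
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]   # stable softmax
        post.append([e / sum(exps) for e in exps])
    return post

post = [[0.5, 0.5] for _ in unlabeled]   # uniform initialization
for _ in range(10):
    priors, wp = m_step(post, lam=0.5)
    post = e_step(priors, wp)
print([round(p[0], 2) for p in post])    # class-0 posterior per unlabeled doc
```

After a few iterations the sports-like documents get high class-0 posteriors and the finance-like one a high class-1 posterior, which is the behavior the Nigam et al. curves on the following slides quantify at scale.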

SLIDE 10

Using one labeled example per class

SLIDE 11

20 Newsgroups

SLIDE 12

20 Newsgroups

SLIDE 13

EM for Semi-Supervised Doc Classification

  • If all data is labeled, corresponds to the Naïve Bayes classifier
  • If all data is unlabeled, corresponds to mixture-of-multinomials clustering
  • If both labeled and unlabeled data, it helps if and only if the mixture-of-multinomials modeling assumption is correct
  • Of course we could extend this to Bayes net models other than Naïve Bayes (e.g., TAN tree)
SLIDE 14

Bags of Words, or Bags of Topics?

SLIDE 15

LDA: Generative model for documents

[Blei, Ng, Jordan 2003]

Also extended to case where number of topics is not known in advance (hierarchical Dirichlet processes – [Blei et al, 2004])

SLIDE 16

Clustering words into topics with Hierarchical Topic Models (unknown number of clusters)

[Blei, Ng, Jordan 2003]

Probabilistic model for generating document D:

  1. Pick a distribution P(z|θ) over topics, with θ drawn according to P(θ|α)
  2. For each word w:
     – Pick topic z from P(z|θ)
     – Pick word w from P(w|z, φ)

Training this model defines topics (i.e., φ which defines P(W|Z))
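The two-step generative process above can be simulated directly. A toy sketch with two invented topics (the `phi` table and vocabulary are made up, and a symmetric Dirichlet stands in for P(θ|α)):

```python
import random

random.seed(0)

# P(w | z): per-topic word distributions (illustrative, not learned)
phi = {
    "sports":  {"ball": 0.5, "team": 0.4, "market": 0.1},
    "finance": {"market": 0.6, "price": 0.3, "ball": 0.1},
}

def sample_dirichlet(alpha):
    """Draw theta ~ Dirichlet(alpha) via normalized Gamma samples."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(draws)
    return [d / s for d in draws]

def generate_document(n_words, alpha=(1.0, 1.0)):
    topics = list(phi)
    theta = sample_dirichlet(alpha)                      # step 1: P(z | theta)
    doc = []
    for _ in range(n_words):
        z = random.choices(topics, weights=theta)[0]     # step 2a: pick topic z
        words = list(phi[z])
        w = random.choices(words, weights=[phi[z][v] for v in words])[0]  # 2b
        doc.append(w)
    return doc

print(generate_document(8))
```

Training inverts this process: given only the documents, infer φ (the topics) and each document's θ.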

SLIDE 17

Topic 1: STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
Topic 2: MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
Topic 3: WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
Topic 4: DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN

Example topics induced from a large collection of text

Topic 5: FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
Topic 6: SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
Topic 7: BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
Topic 8: JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE

[Tennenbaum et al]

SLIDE 18


Example topics induced from a large collection of text

[Tennenbaum et al]

Significance:

  • Learned topics reveal hidden, implicit semantic categories in the corpus
  • In many cases, we can represent documents with ~10² topics instead of ~10⁵ words
  • Especially important for short documents (e.g., emails): topics overlap when words don't!
SLIDE 19

Can we analyze roles and relationships between people by analyzing email word or topic distributions?

SLIDE 20

Author-Recipient-Topic model for Email

Latent Dirichlet Allocation (LDA) [Blei, Ng, Jordan, 2003] Author-Recipient Topic (ART) [McCallum, Corrada, Wang, 2004]

SLIDE 21

Enron Email Corpus

  • 250k email messages
  • 23k people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: debra.perlingiere@enron.com
To: steve.hooser@enron.com
Subject: Enron/TransAltaContract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions. DP

Debra Perlingiere
Enron North America Corp. Legal Department
1400 Smith Street, EB 3885
Houston, Texas 77002
dperlin@enron.com

SLIDE 22

Topics, and prominent sender/receivers discovered by ART

Top words within topic : Top author-recipients exhibiting this topic

[McCallum et al, 2004]

SLIDE 23

Topics, and prominent sender/receivers discovered by ART

Beck = “Chief Operations Officer”
Dasovich = “Government Relations Executive”
Shapiro = “Vice President of Regulatory Affairs”
Steffes = “Vice President of Government Affairs”

SLIDE 24

Discovering Role Similarity

connection strength (A, B) =

  • Traditional SNA: similarity in the recipients they sent email to
  • ART: similarity in authored topics, conditioned on recipient
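Either connection-strength measure can be realized as a vector similarity between per-person distributions. A sketch using cosine similarity over hypothetical topic distributions (the names echo the Enron slide, but the numbers are invented, not estimated from the corpus):

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

# Hypothetical per-person distributions over 3 topics (rows sum to 1),
# as might be estimated from each person's sent mail by a topic model.
topic_dist = {
    "beck":     [0.6, 0.3, 0.1],
    "dasovich": [0.1, 0.2, 0.7],
    "steffes":  [0.1, 0.3, 0.6],
}

print(cosine(topic_dist["dasovich"], topic_dist["steffes"]))  # similar roles
print(cosine(topic_dist["beck"], topic_dist["dasovich"]))     # dissimilar roles
```

With ART, the vectors would additionally be conditioned on the recipient, so two people count as similar only if they discuss similar topics with similar audiences.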

SLIDE 25

Co-Training for Semi-supervised document classification

Idea: take advantage of *redundancy*

SLIDE 26

Redundantly Sufficient Features

Professor Faloutsos my advisor


SLIDE 30

Co-Training

[Figure: two classifiers trained on different views of the same examples, Classifier1 → Answer1 and Classifier2 → Answer2]

Key idea: Classifier1 and Classifier2 must:
  1. Correctly classify labeled examples
  2. Agree on their classification of unlabeled examples
SLIDE 31

CoTraining Algorithm #1

[Blum&Mitchell, 1998]

Given: labeled data L, unlabeled data U
Loop:
  Train g1 (hyperlink classifier) using L
  Train g2 (page classifier) using L
  Allow g1 to label p positive, n negative examples from U
  Allow g2 to label p positive, n negative examples from U
  Add these self-labeled examples to L
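The loop above can be sketched directly. The version below substitutes a rote (memorization) learner for the hyperlink and page classifiers, and all feature strings are invented for illustration:

```python
# Sketch of the Blum & Mitchell co-training loop with a rote learner per view.
# Each round, up to p confident-positive and n confident-negative unlabeled
# examples move from U to L.

def rote_train(examples):
    """'Train' by memorizing view-feature -> label."""
    return {x: y for x, y in examples}

def confident_labels(model, xs, p, n):
    """Return (index, label) pairs the rote model has seen before."""
    out, pos, neg = [], 0, 0
    for i, x in enumerate(xs):
        y = model.get(x)
        if y == 1 and pos < p:
            out.append((i, 1)); pos += 1
        elif y == 0 and neg < n:
            out.append((i, 0)); neg += 1
    return out

def cotrain(L, U, p=1, n=1, iterations=10):
    """L: list of ((x1, x2), y); U: list of unlabeled (x1, x2) pairs."""
    L, U = list(L), list(U)
    for _ in range(iterations):
        if not U:
            break
        g1 = rote_train([(x1, y) for (x1, _), y in L])   # hyperlink view
        g2 = rote_train([(x2, y) for (_, x2), y in L])   # page view
        for g, view in ((g1, 0), (g2, 1)):
            chosen = confident_labels(g, [x[view] for x in U], p, n)
            for idx, label in sorted(chosen, reverse=True):  # pop high idx first
                L.append((U.pop(idx), label))                # self-labeled -> L
    return L

L0 = [(("link:my-advisor", "page:professor"), 1),
      (("link:click-here", "page:ad"), 0)]
U0 = [("link:my-advisor", "page:faculty-home"), ("link:promo", "page:ad")]
print(cotrain(L0, U0))
```

Note how the label crosses views: g1 recognizes the hyperlink "link:my-advisor" and thereby labels a page g2 has never seen, and vice versa. That cross-view transfer is the whole point of co-training.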

SLIDE 32

CoTraining: Experimental Results

  • begin with 12 labeled web pages (academic course)
  • provide 1,000 additional unlabeled web pages
  • average error: learning from labeled data 11.1%;
  • average error: cotraining 5.0%

Typical run:

SLIDE 33

Co-Training for Named Entity Extraction (i.e., classifying which strings refer to people, places, dates, etc.)

[Figure: Classifier1 sees the entity string itself ("New York"); Classifier2 sees its context ("I flew to ____ today"), both from the sentence "I flew to New York today."]

[Riloff&Jones 98; Collins et al., 98; Jones 05]

SLIDE 34

One result [Blum&Mitchell 1998]:

  • If
    – X1 and X2 are conditionally independent given Y
    – f is PAC learnable from noisy labeled data
  • Then
    – f is PAC learnable from a weak initial classifier plus unlabeled data

CoTraining setting:
  • wish to learn f: X → Y, given L and U drawn from P(X)
  • features describing X can be partitioned (X = X1 × X2) such that f can be computed from either X1 or X2

SLIDE 35

Co-Training Rote Learner

[Figure: bipartite graph whose nodes on one side are page features and on the other are hyperlink features (e.g., "my advisor"); a "+" label on one node propagates to every example in its connected component]


SLIDE 38

Expected Rote CoTraining error given m examples:

$$E[error] = \sum_j P(x \in g_j)\,\big(1 - P(x \in g_j)\big)^m$$

where g_j is the j-th connected component of the graph of L+U, and m is the number of labeled examples.

(CoTraining setting: learn f : X → Y where X = X1 × X2, with x drawn from an unknown distribution, and there exist g1, g2 such that for all x, g1(x1) = g2(x2) = f(x).)
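The expected-error formula can be checked numerically: an example is misclassified only if its connected component received no labeled example, so the error decays with m. The component probabilities below are made up for illustration:

```python
# E[error] = sum_j P(g_j) * (1 - P(g_j))**m  for component probabilities P(g_j)

def expected_rote_error(component_probs, m):
    """Expected rote co-training error with m labeled examples."""
    return sum(p * (1 - p) ** m for p in component_probs)

probs = [0.4, 0.3, 0.2, 0.1]   # P(x lands in component j); sums to 1
for m in (1, 5, 25):
    print(m, round(expected_rote_error(probs, m), 4))
```

With a single connected component the error is zero after one labeled example; many small components need proportionally more labels.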

SLIDE 39

How many unlabeled examples suffice?

Want to assure that connected components in the underlying distribution G_D are also connected components in the observed sample G_S.

O(log(N)/α) examples assure that, with high probability, G_S has the same connected components as G_D [Karger, 94], where N is the size of G_D and α is the min cut over all connected components of G_D.

SLIDE 40

PAC Generalization Bounds on CoTraining

[Dasgupta et al., NIPS 2001]

This theorem assumes X1 and X2 are conditionally independent given Y

SLIDE 41

Co-Training Theory

Final accuracy depends on:
  • # unlabeled examples
  • dependencies among input features
  • # redundantly sufficient inputs
  • # labeled examples
  • correctness of confidence assessments

How can we tune learning environment to enhance effectiveness of Co-Training?

best: inputs conditionally indep given class, increased number of redundant inputs, …

SLIDE 42

What if the CoTraining Assumption Is Not Perfectly Satisfied?

  • Idea: want classifiers that produce a maximally consistent labeling of the data
  • If learning is an optimization problem, what function should we optimize?

SLIDE 43

What Objective Function?

$$E = E_1 + E_2 + c_3 E_3 + c_4 E_4$$

$$E_1 = \sum_{\langle x,y \rangle \in L} (y - \hat{g}_1(x_1))^2 \qquad E_2 = \sum_{\langle x,y \rangle \in L} (y - \hat{g}_2(x_2))^2$$

$$E_3 = \sum_{x \in U} (\hat{g}_1(x_1) - \hat{g}_2(x_2))^2$$

$$E_4 = \left( \frac{1}{|L|} \sum_{\langle x,y \rangle \in L} y \;-\; \frac{1}{|L \cup U|} \sum_{x \in L \cup U} \frac{\hat{g}_1(x_1) + \hat{g}_2(x_2)}{2} \right)^2$$

E_1, E_2: error on labeled examples; E_3: disagreement over unlabeled; E_4: misfit to estimated class priors

SLIDE 44

What Function Approximators?

  • Same functional form as logistic regression
  • Use gradient descent to simultaneously learn g1 and g2, directly minimizing E = E1 + E2 + E3 + E4
  • No word independence assumption; use both labeled and unlabeled data

$$\hat{g}_1(x) = \frac{1}{1 + e^{-\sum_j w_{1,j} x_j}} \qquad \hat{g}_2(x) = \frac{1}{1 + e^{-\sum_j w_{2,j} x_j}}$$
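Gradient co-training on this squared-error objective can be sketched with one weight per view. This is a toy 1-D version: the data and learning rate are invented, and the class-prior term E4 is omitted for brevity:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Two logistic "views" g1(x1) = sigmoid(w1*x1), g2(x2) = sigmoid(w2*x2),
# trained to fit labeled data (E1, E2) and to agree on unlabeled data (E3).
L = [((2.0, 1.8), 1), ((-2.0, -1.5), 0), ((1.5, 2.2), 1), ((-1.8, -2.0), 0)]
U = [(2.2, 0.1), (-2.1, -0.2)]   # second view weakly informative on its own

w1 = w2 = 0.0
lr = 0.5
for _ in range(500):
    d1 = d2 = 0.0
    for (x1, x2), y in L:                      # E1, E2: labeled squared error
        p1, p2 = sigmoid(w1 * x1), sigmoid(w2 * x2)
        d1 += -2 * (y - p1) * p1 * (1 - p1) * x1
        d2 += -2 * (y - p2) * p2 * (1 - p2) * x2
    for x1, x2 in U:                           # E3: disagreement on unlabeled
        p1, p2 = sigmoid(w1 * x1), sigmoid(w2 * x2)
        d1 += 2 * (p1 - p2) * p1 * (1 - p1) * x1
        d2 += -2 * (p1 - p2) * p2 * (1 - p2) * x2
    w1 -= lr * d1
    w2 -= lr * d2

print(w1, w2)   # both weights should end up positive
```

The E3 term is what distinguishes this from two independent logistic regressions: on unlabeled examples each view is pulled toward the other's prediction.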

SLIDE 45

Classifying Jobs for FlipDog

X1: job title X2: job description

SLIDE 46

Gradient CoTraining

Classifying FlipDog job descriptions: SysAdmin vs. WebProgrammer

Final accuracy:
  • Labeled data alone: 86%
  • CoTraining: 96%

SLIDE 47

Gradient CoTraining

Classifying Capitalized sequences as Person Names

Error rates for three methods (using labeled data only; cotraining; cotraining without fitting class priors, E4) under two settings (25 labeled + 5000 unlabeled; 2300 labeled + 5000 unlabeled): .27, .13, .24, .11, .15. The cotraining results are sensitive to the weights of error terms E3 and E4.

E.g., “Company president Mary Smith said today…” (x1 = the capitalized sequence “Mary Smith”, x2 = its surrounding context)

SLIDE 48

CoTraining Summary

  • Unlabeled data improves supervised learning when example features are redundantly sufficient
    – Family of algorithms that train multiple classifiers
  • Theoretical results
    – Expected error for rote learning
    – If X1, X2 conditionally independent given Y, then:
      • PAC learnable from a weak initial classifier plus unlabeled data
      • disagreement between g1(x1) and g2(x2) bounds final classifier error
  • Many real-world problems of this type
    – Semantic lexicon generation [Riloff, Jones 99], [Collins, Singer 99], [Jones 05]
    – Web page classification [Blum, Mitchell 98]
    – Word sense disambiguation [Yarowsky 95]
    – Speech recognition [de Sa, Ballard 98]
    – Visual classification of cars [Levin, Viola, Freund 03]

SLIDE 49

Bootstrap learning algorithms that leverage redundancy

  • Classifying web pages [Blum&Mitchell 98; Slattery 99]
  • Classifying email [Kiritchenko&Matwin 01; Chan et al. 04]
  • Named entity extraction [Collins&Singer 99; Jones&Riloff 99]
  • Wrapper induction [Muslea et al., 01; Mohapatra et al. 04]
  • Word sense disambiguation [Yarowsky 96]
  • Discovering new word senses [Pantel&Lin 02]
  • Synonym discovery [Lin et al., 03]
  • Relation extraction [Brin et al.; Yangarber et al. 00]
  • Statistical parsing [Sarkar 01]
SLIDE 50

Read The Web course 10-709

  • Large scale web information extraction [Etzioni, et al. 05]
  • Graphical models for information extraction [Rosario, 05]
  • Statistical parsing [Collins, et al. 05]
  • Cotraining for web classification [Blum&Mitchell 98]
  • Bootstrapping for natural language learning [Eisner&Karakos, 05]
  • Semi-supervised learning for named entity extraction [Collins&Singer 99; Jones 05]
  • Automatic learning of hypernyms [Ng, 05]
  • Wrapper induction for extraction from structured web pages [Muslea et al., 01; Mohapatra et al. 04]

  • Learning to disambiguate word senses [Yarowsky 96]
  • Discovering new word senses [Pantel&Lin 02]
  • Synonym and ontology discovery [Lin et al., 03]
  • Relation extraction [Brin et al.; Yangarber et al. 00]
  • Latent Dirichlet Allocation [Blei, 03]
  • 1. Cover current research literature
  • 2. Build a system that continuously learns from the web by bootstrapping
SLIDE 51

Extracting Contact Information from the Web

To: “Andrew McCallum” mccallum@cs.umass.edu
Subject: ...

First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governor’s Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis,…
Key Words: information extraction, social network,…

Search for new people

Automatically extracted [McCallum 2004]

SLIDE 52

Results Summary

Contact info and name extraction performance (25 fields):

Token Acc: 94.50   Field Prec: 85.73   Field Recall: 76.33   Field F1: 80.76

Example keywords extracted:

Tom Mitchell: machine learning, cognitive states, learning apprentice, artificial intelligence
Deborah McGuiness: semantic web, description logics, knowledge representation, ontologies
Daphne Koller: Bayesian networks, relational models, probabilistic models, hidden variables
William Cohen: logic programming, text categorization, data integration, rule learning

SLIDE 53

What you should know

  • Statistical machine learning is having a major impact on Natural Language Processing
    – doc classification, named entity extraction, relation extraction, parsing, co-reference resolution, ontology generation, ...
  • Semi-supervised methods rely heavily on unlabeled data and redundancy
  • Potential for a never-ending language learning system?