SLIDE 1

Natural ___ Processing

Noah A. Smith School of Computer Science Carnegie Mellon University nasmith@cs.cmu.edu

SLIDE 2

This Talk

  • 1. A light discussion of micro-analysis
  • 2. Some recent developments in macro-analysis, specifically linking text and non-text

(Diagram: “micro” = words, sentences; “macro” = documents)

SLIDE 3

Text Mining/Information Retrieval vs. Natural Language Processing

  • Typical representation of a document’s text content: bag of words
  • Great for ranking documents given a query, for classifying documents, and for a number of other applications.
  • Engineering: Will (some) applications work better if we “square” the data and opt for more abstract representations of text?
  • “Turning unstructured data into structured data”
  • Cf. Claire’s talk yesterday!
SLIDE 4

Beyond Engineering

  • This isn’t just about how to build a better piece of software.
  • Old AI philosophers: Is a bag of words really “understanding”?
  • Computational linguistics: computational models for theorizing about the phenomenon of human language
  • Very hard problem: representing what a piece of text “means”
  • Long-running debate: linguistic theories and NLP
  • Less controversial: linguistic representations in NLP
  • Is there a parallel for computational social science?
SLIDE 5

Deeper, More Abstract, Structured Representations

  • Stemming, lemmatization, and (for other languages) morphological disambiguation
  • Syntactic disambiguation (parts of speech, phrase-structure parsing, dependency parsing)
  • Word sense disambiguation, semantic roles, predicate-argument structures
  • Named entity recognition, coreference resolution
  • Opinion, sentiment, and subjectivity analysis, discourse relations and parsing

Jan’s Continuum

SLIDE 6

Phrase-Structure Syntax

" This is not the sort of market to have a big position in , " said David Smith , who heads trading in all non-U.S. stocks . SINV " S-TPC NP-SBJ VP DT VBZ RB , NP-PRD NP PP SBAR DT NN NN NN NN IN TO VB DT JJ IN " VBD NNP NNP , WP VBZ IN DT JJ NNS . WHNP S * NP-SBJ * VP VP NP NP * PP VP * NP-SBJ NP * SBAR WHNP S NP-SBJ VP NP NP PP NP 28 NP S

SLIDE 7

Dependency Syntax

" This is not the sort of market to have a big position in , " said David Smith , who heads trading in all non-U.S. stocks . 28

" This is not the sort of market to have a big position in , " said David Smith , who heads trading in all non-U.S. stocks .

http://www.ark.cs.cmu.edu/TurboParser

SLIDE 8

Frame-Semantic Structure

“The professor chuckled with unabashed glee”

(Figure: SEMAFOR frame-semantic parse, evoking frames such as Make_noise, Emotion_directed, and People_by_vocation, with roles including Experiencer, State, Sound_source, Internal_cause, and Sound.)

http://www.ark.cs.cmu.edu/SEMAFOR

SLIDE 9

A Bit of History

  • Many of these are “old” problems.
  • Twenty years ago we started using text data to solve them:
    1. Pay experts to annotate examples.
    2. Use a combination of human ingenuity and statistics to build systems and/or do science.
  • Sound familiar?
  + Reusable data, clearly identified evaluation tasks, benchmarks → progress
  − Domain specificity, overcommitting to representations, nearsightedness

SLIDE 10

Comments on “Domain”

  • Huge challenge: models built for one domain don’t work well in others.
  • What’s a “domain”?
  • Huge effort expended on news and, later, biomedical articles.
  • Lesson: NLP people will follow the annotated datasets.
SLIDE 11

The Next Paradigm?

  • If we could do with less or no annotated data, maybe we could be more “agile” about text domains and representations.

  • Small, noisy datasets; active, semisupervised, and unsupervised learning
  • NB: this is where we do Bayesian statistics!
  • Can we get information about language from other data? (See part 2.)

1. Pay experts to annotate examples. 2. Use a combination of human ingenuity and statistics to build systems and/or do science.

SLIDE 12

Current Work (Yano, Resnik, and Smith, 2010)

  • Pilot study: can untrained annotators consistently annotate political bias?
  • What clues give away political bias in sentences from political blogs? (See Monroe, Colaresi, and Quinn, 2008)
  • We could probably define “political bias” under Jan’s subjectivity umbrella!
  • Sample of sentences from six political blogs (2008); not uniform
  • Amazon Mechanical Turk judgments of bias (buzzword: crowdsourcing)
  • Survey of basic political views of annotators
SLIDE 13

Example 1

Feminism has much to answer for denigrating men and encouraging women to seek independence whatever the cost to their families.

SLIDE 14

Example 2

They died because of the Bush administration’s hubris.

SLIDE 15

Example 3

In any event, for those of us who have followed this White House carefully during the entirety of the war, the charge is frankly absurd.

SLIDE 16

Other Observations

  • Bias is hard to label without context, but not always impossible.
  • Pairwise kappa of 0.5 to 0.55 (decent for “moderately difficult” tasks)
  • Annotators: more social liberals, more fiscal conservatives
  • Liberal blogs are liberal, conservative blogs are conservative
  • Liberals are quick to see conservative bias; conservatives see both!
SLIDE 17

Lexical Indicators of Bias

Overall                Liberal                Conservative        Not Sure Which
bad            0.60    Administration 0.28    illegal     0.40    pass      0.32
personally     0.56    Americans      0.24    Obama’s     0.38    bad       0.32
illegal        0.53    woman          0.24    corruption  0.32    sure      0.28
woman          0.52    single         0.24    rich        0.28    blame     0.28
single         0.52    personally     0.24    stop        0.26    they’re   0.24
rich           0.52    lobbyists      0.23    tax         0.25    happen    0.24
corruption     0.52    Republican     0.22    claimed     0.25    doubt     0.24
Administration 0.52    union          0.20    human       0.24    doing     0.24
Americans      0.51    torture        0.20    doesn’t     0.24    death     0.24
conservative   0.50    rich           0.20    difficult   0.24    actually  0.24
doubt          0.48    interests      0.20    Democrats   0.24    exactly   0.22
torture        0.47    doing          0.20    less        0.23    wrong     0.22

Most strongly biased words, ranked by relative frequency of receiving a bias mark, normalized by total frequency.

SLIDE 18

Lexical Indicators of Bias

(Same table as on the previous slide.)

SLIDE 19

Lexical Indicators of Bias

(Same table as on the previous slide.)

SLIDE 20

Current Work (Eisenstein, Yano, Smith, Cohen, and Xing, in progress)

  • Goal: find mentions of specific “persons of interest” in noisy multi-author text
  • Starting point: a list of names
  • Cf. noun phrase coreference resolution
  • Deal with misspellings, variations, titles, even sobriquets like “Obamessiah”
  • Exploit local and document context
  • Predominantly unsupervised approach (supervision is the list of names)
SLIDE 21

NLP Success Stories

? Search engines
✓ Translation (Google)
✓ Information extraction (text → databases)
✓ Question answering (recently: IBM’s Watson)
✓ Opinion mining

SLIDE 22

NLP: The Refrigerator of the Information Age

  • It enables all kinds of more exciting activities.
  • It does something you can’t do very well (cf. dishwashers).
  • When it’s working for you, it’s always quietly humming in the background.
  • You don’t think about it too often, and instructions never mention it.
  • When it stops working, you will notice.
  • Though expertise is required, there is very little glamor in manufacturing, maintaining, or improving it.

(Opinion)

SLIDE 23
  • 2. Adventures in Macro-Analysis: Linking Text to Other Data

Note to John: These may be superficial.

(Diagram: text linked to “data describing government activity in all parts of the policymaking process,” social behavior, and economic, financial, and political data.)

SLIDE 24
Recent Development: Text Regression

  • Linear regression:

    min_{w ∈ ℝ^d} (1/n) Σ_{i=1}^{n} error(y_i, w⊤x_i)

  • Each x_i is some representation of a document as a d-dimensional vector.
  • Each y_i is some output value that we seek to learn to predict.
  • w is a vector of weights.
  • Learning the model = picking w.
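
To make the objective concrete, here is a minimal sketch of text regression in Python, assuming scikit-learn is available; the ridge penalty, n-gram features, documents, and target values are illustrative stand-ins, not the exact setup of the papers cited below.

    # A minimal text-regression sketch: bag-of-words features, squared error
    # with an L2 penalty (ridge). All data below are made up.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import Ridge

    docs = ["an unusually fun summer blockbuster", "a slow , joyless documentary"]
    y = [50.0, 1.2]  # e.g., opening-weekend revenue in $M (made-up values)

    X = CountVectorizer(ngram_range=(1, 2)).fit_transform(docs)  # words + bigrams
    model = Ridge(alpha=1.0).fit(X, y)  # learning the model = picking w
    # model.coef_ is w; a prediction is w^T x (plus an intercept)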
SLIDE 25

Text Regression Examples

x                                          y
reviews of a film from newspaper critics   opening weekend revenue (Joshi, Das, Gimpel, and Smith, 2010)
a company’s annual 10-K report             volatility, a measure of financial risk (Kogan, Levin, Routledge, Sagi, and Smith, 2009)
political blog post                        total comment volume (Yano and Smith, 2010)
microblog                                  microblogger’s geographical coordinates (Eisenstein, O’Connor, and Smith, in progress)

SLIDE 26

Text Regression Examples

(Roadmap table repeated; first up: film reviews → opening weekend revenue.)

SLIDE 27

Data

  • Text: pre-release movie reviews from seven American newspapers (1,147 + 317 + 254 movies), during 2005-2009
  • Metadata: name, production house, genre(s), scriptwriter(s), director(s), country of origin, primary actors, release date, MPAA rating, running time, production budget (metacritic.com, the-numbers.com)
  • (State-of-the-art forecasters are based on these observables; see, e.g., Simonoff and Sparrow, 2000; Sharda and Delen, 2006)
  • Target: opening weekend gross revenue, number of opening weekend screens (the-numbers.com)

SLIDE 28

Experimental Results

                                      Total MAE ($M)   Per Screen MAE ($K)
Predict median                            10.521             6.642
Non-text                                   5.983             6.540
Words, bigrams, trigrams                   7.627             6.060
Non-text + words, bigrams, trigrams        5.750             6.052

SLIDE 29

Features ($M)

rating:  pg +0.085; adult -0.236; rate r -0.364
sequels: this series +13.925; the franchise +5.112; the sequel +4.224
people:  will smith +2.560; brittany +1.128; producer brian +0.486
genre:   testosterone +1.945; comedy for +1.143; a horror +0.595; documentary -0.037; independent -0.127
sent.:   best parts of +1.462; smart enough +1.449; a good thing +1.117; shame $ -0.098; bogeyman -0.689
plot:    torso +9.054; vehicle in 5.827; superhero $ 2.020

Also: “of the art,” “and cgi,” “shrek movies,” “voldemort,” “blockbuster,” “anticipation,” “summer movie”; “canne” is bad.

SLIDE 30

Text Regression Examples

(Roadmap table repeated; up next: 10-K reports → volatility.)

SLIDE 31

Why Finance?

  • I hate reading financial reports, but they contain crucial information about my investments, hence my future.

  • Natural language processing for boring texts seems like a good bet.
  • Finance researchers want to know:
  • Are financial reports worth the cost?
  • Are they informative?
  • Does this tell us anything about financial policy?
SLIDE 32
Volatility

  • Return on day t:

    r_t = (closingprice_t + dividends_t) / closingprice_{t−1} − 1

  • Sample standard deviation from day t − τ to day t:

    v_{[t−τ,t]} = √( Σ_{i=0}^{τ} (r_{t−i} − r̄)² / τ )

  • This is called measured volatility.
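
A minimal sketch of these two formulas in Python with NumPy; the prices and dividends are made-up sample values.

    import numpy as np

    def daily_returns(closing_prices, dividends):
        # r_t = (closingprice_t + dividends_t) / closingprice_{t-1} - 1
        p = np.asarray(closing_prices, dtype=float)
        d = np.asarray(dividends, dtype=float)
        return (p[1:] + d[1:]) / p[:-1] - 1.0

    def measured_volatility(returns):
        # Square root of the mean squared deviation of returns in the window.
        # (The slide normalizes by τ; np.mean divides by the number of terms,
        # a negligible difference for long windows.)
        r = np.asarray(returns, dtype=float)
        return np.sqrt(np.mean((r - r.mean()) ** 2))

    prices = [100.0, 101.5, 99.8, 102.3, 103.0]
    divs = [0.0, 0.0, 0.5, 0.0, 0.0]
    print(measured_volatility(daily_returns(prices, divs)))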

SLIDE 33

Important Property of Volatility

  • Autoregressive conditional heteroscedasticity: volatility tends to be stable (over horizons like one year).
  • v[t − τ, t] is a strong predictor of v[t, t + τ].
  • This is our strong baseline.
SLIDE 34

Data

  • Text: “Form 10-K” section 7 from publicly-traded American firms’ government-mandated disclosures (26,806 reports, ~250M words), during 1996-2006

  • sec.gov
  • www.ark.cs.cmu.edu/10K
  • Metadata: volatility in the year prior to the report (“history”), v[t - 1y, t]
  • Target: volatility in the year following the report, v[t, t + 1y]

Source: Center for Research in Security Prices U.S. Stocks Databases

SLIDE 35

Experimental Setup

  • Test on year Y.
  • Train on (Y - 5, Y - 4, Y - 3, Y - 2, Y - 1).
  • Six such splits.
  • Compare
  • history-only baseline;
  • text-only SVR, unigrams and bigrams, log(1 + frequency);
  • combined SVR.
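
A minimal sketch of the text-only setup in Python, assuming scikit-learn; the SVR variant, hyperparameters, and the two toy reports below are stand-ins for the actual experimental configuration.

    # Support vector regression on unigram/bigram counts with
    # log(1 + frequency) weighting, as described above. Data are made up.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVR

    reports = ["net loss and going concern doubt", "steady dividends and insurance income"]
    log_volatility = [-1.2, -2.5]  # made-up targets

    X = CountVectorizer(ngram_range=(1, 2)).fit_transform(reports).astype(float)
    X.data = np.log1p(X.data)  # log(1 + frequency) term weighting
    model = LinearSVR(C=1.0).fit(X, log_volatility)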
SLIDE 36

Dominant Weights (2000-4)

high volatility terms       low volatility terms
loss            0.025       net income           -0.021
net loss        0.017       rate                 -0.017
year #          0.016       properties           -0.014
expenses        0.015       dividends            -0.013
going concern   0.014       lower interest       -0.012
a going         0.013       critical accounting  -0.012
administrative  0.013       insurance            -0.011
personnel       0.013       distributions        -0.011

SLIDE 37

MSE of Log-Volatility

(Figure: MSE of log-volatility for History, Text, and Text + History models, by test year 2001-2006 and micro-average; lower is better.)

SLIDE 38

2002

  • Enron and other accounting scandals
  • Sarbanes-Oxley Act of 2002 (and other reforms)
  • Longer reports
  • Are the reports more informative after 2002? Because of Sarbanes-Oxley?
SLIDE 39

Text Regression Examples

(Roadmap table repeated; up next: blog posts → comment volume.)

SLIDE 40

Political Blogs

  • Arguably an influential medium in American politics.
  • Adamic and Glance (2005), Leskovec et al. (2007), and many others have considered the link and community structure.

  • We’re focusing on predicting how readers will respond.
  • Cf. ideological discourse models: Lin, Xing, and Hauptmann (2008)
SLIDE 41
SLIDE 42

Comments

(Figure: a blog post’s comment section.)

SLIDE 43

Data

  • Main Text: blog posts from five American political blogs, 2007-2008; 1000-2200 posts per blog, 110K-320K words per blog, 68-185 words per post on average (by blog).
  • Comments: text and author for each comment on the above posts; 30-200 comments per post, 20-40 words per comment.

  • www.ark.cs.cmu.edu/blog-data
SLIDE 44

Technical Digression

  • Blei et al.’s latent Dirichlet allocation model (Blei et al., 2003) is not so different from dimensionality reduction, but it has a more extensible probabilistic interpretation.
  • Quite useful for exploratory data analysis.
  • Easy to extend to model other observed and hidden variables.
  • I will skip the derivations and give the high-level idea:

    p(response | words) ∝ Σ_{topics, mixture} p(words, topics, mixture, response)
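
For flavor, a minimal sketch of plain LDA for exploratory analysis in Python, assuming scikit-learn; the extended models in this work (e.g., CommentLDA below) are not implemented here, and the posts are made up.

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    posts = [
        "obama clinton campaign hillary barack president",
        "iraq war military troops forces security",
        "health care tax spending economy insurance",
    ]
    X = CountVectorizer().fit_transform(posts)
    lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)
    # lda.components_[k] holds the word weights for topic k
    # (compare the topics on the next slide)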
SLIDE 45

1873  women black white men people liberal civil working woman rights
1730  obama clinton campaign hillary barack president presidential really senator democratic
1643  think people policy really way just good political kind going
1561  conservative party political democrats democratic republican republicans immigration gop right
1521  people city school college photo creative states license good time
1484  romney huckabee giuliani mitt mike rudy muslim church really republican
1478  iran world nuclear israel united states foreign war international iranian
1452  carbon oil trade emissions change climate energy human global world
1425  obama clinton win campaign mccain hillary primary voters vote race
1352  health economic plan care tax spending economy money people insurance
1263  iraq war military government american iraq troops forces security years
1246  administration bush congress torture law intelligence legal president cia government
1215  mccain john bush president campaign policy know george press man
1025  team game season defense good trade play player better best
1007  book times news read article post blog know media good

SLIDE 46

Predicting Volume

  • Which blog posts will attract more attention than the average?

               Precision   Recall
Naive Bayes      72.5       41.7
Poisson SLDA*    70.1       63.2
CommentLDA       70.2       68.8

*Similar to Blei and McAuliffe (2008), but mixture of Poissons.
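
As a point of reference, a minimal sketch of the Naive Bayes baseline for this binary task in Python, assuming scikit-learn; the posts and labels are invented.

    # Predict whether a post will draw more comments than average.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    posts = ["obama clinton debate tonight", "recipe for apple pie", "war funding vote"]
    above_average = [1, 0, 1]  # made-up labels

    X = CountVectorizer().fit_transform(posts)
    clf = MultinomialNB().fit(X, above_average)
    # clf.predict(...) flags posts expected to attract above-average attention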

SLIDE 47

Text Regression Examples

(Roadmap table repeated; up next: microblogs → geographical coordinates.)

SLIDE 48

“Dialect” from Twitter

(Figure: map over the continental United States, longitude roughly 130-60°W, latitude 25-50°N.)

SLIDE 49

Another Macro-Analysis Example

  • Linking microblog sentiment to other measures of public opinion
  • Warning: this is descriptive stuff!
SLIDE 50

Data

  • Text: A billion “tweets,” from 2008-2009. Average length about 11 words.
  • Time Series: (all data are freely available online)

Economic confidence:

  • Gallup Economic Confidence poll (daily, three-day averages)
  • ICS from Reuters/U. Michigan (monthly)

Politics:

  • Gallup’s poll of Obama approval rating
SLIDE 51

Technical Approach

  • Retrieve messages that match a keyword (e.g., jobs or obama).
  • Estimate daily opinion.
  • “Positive” and “negative” words come from the OpinionFinder lexicon (Wilson, Wiebe, and Hoffmann, 2005; 1600 types and 1200 types respectively).
  • Two parameters: number of days for moving average, and lead/lag.

x_t = count_t(positive word ∧ keyword) / count_t(negative word ∧ keyword)
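
A minimal sketch of this statistic in Python; the tiny word lists and messages stand in for the OpinionFinder lexicon and the Twitter data.

    # Daily sentiment ratio for a keyword: messages containing a positive
    # word over messages containing a negative word.
    positive = {"good", "hope", "win"}   # stand-in for ~1600 positive types
    negative = {"bad", "fear", "lose"}   # stand-in for ~1200 negative types

    def sentiment_ratio(messages, keyword):
        pos = neg = 0
        for msg in messages:
            words = set(msg.lower().split())
            if keyword in words:
                pos += bool(words & positive)
                neg += bool(words & negative)
        return pos / neg if neg else float("inf")

    day = ["jobs report looks good", "fear of losing jobs", "good news on jobs"]
    print(sentiment_ratio(day, "jobs"))  # 2.0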

SLIDE 52

Moving Averages: 1, 7, 30 past days

(Figure: sentiment ratio over January 2008 to November 2009, smoothed with 1-, 7-, and 30-day moving averages; y-axis from 1 to 5.)

SLIDE 53

Sentiment ratios (x); two different smoothing window and lead parameters

(Figure: sentiment ratio under k=15, lead=0 and k=30, lead=50, plotted over 2008-2009 against the Gallup Economic Confidence poll and the Michigan ICS (y).)

SLIDE 54

jobs and Gallup’s Economic Confidence Poll

(Figure: the two series overlaid; r = 0.794.)

SLIDE 55

Economic Confidence and Twitter

  • job and economy did not work well at all.
  • jobs ≠ job
SLIDE 56

(Figure: sentiment ratio for “obama” with 15-day smoothing (r = 0.725), fraction of messages mentioning “obama,” % support for Obama in 2008 election polls, and % presidential job approval (Gallup), over 2008-2009.)

SLIDE 57

Questions for You

  • Fill in the blank: computational linguistics is to NLP as computational social science is to ___ (and what is that relationship?)

  • What linguistic representations make sense for computational social science?
  • What are some applications that are not superficial?
SLIDE 58

Acknowledgments

  • Collaborators: Ramnath Balasubramanyan, Desai Chen, William Cohen, Dipanjan Das, Kevin Gimpel, Jacob Eisenstein, Mahesh Joshi, Dimitry Levin, André Martins, Brendan O’Connor, Bryan Routledge, Nathan Schneider, Eric Xing, Tae Yano (all CMU); Shimon Kogan (Texas); Jacob Sagi (Vanderbilt)
  • NSF, DARPA, IBM, Google, HP Labs, Yahoo

SLIDE 59

Lead/Lag

  • One thing to consider is the correlation between the text statistic (x) and the poll (y).
  • But we want forecasting!
  • We can track correlation for different “text lead/poll lag” values; a positive value means we are forecasting.
  • Does x_t predict y_{t+k}?
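
A minimal sketch of the lead/lag sweep in Python with NumPy; the series here are synthetic, with x standing in for the text statistic and y for the poll.

    import numpy as np

    def lead_lag_corr(x, y, lead):
        # Correlation of x_t with y_{t+lead}; positive lead = text leads poll.
        if lead > 0:
            x, y = x[:-lead], y[lead:]
        elif lead < 0:
            x, y = x[-lead:], y[:lead]
        return np.corrcoef(x, y)[0, 1]

    rng = np.random.default_rng(0)
    x = rng.random(200)                          # synthetic text statistic
    y = np.roll(x, 7) + 0.1 * rng.random(200)    # a "poll" that lags x by 7 days
    print(lead_lag_corr(x, y, lead=7))           # correlation peaks near lead 7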
SLIDE 60

(Figure: correlation against Gallup (left) and against the Michigan ICS (right) as a function of text lead / poll lag; Gallup curves for k=7, 15, 30, ICS curves for k=30, 60. Positive values mean text leads the poll; negative values mean the poll leads the text.)

SLIDE 61

(Figure: the smoothed sentiment-ratio series and the lead/lag correlation curves from the previous slides, shown together.)

SLIDE 62

Open Questions

  • Smoothing tends to lead to higher correlation. We do not know why.
  • Our sentiment statistic is noisy - is that a problem?
  • Data are not IID. What are appropriate training/tuning/testing strategies?
  • Intuitively, “check it tomorrow.”
  • But how to compare systems fairly, test for significant differences, etc.?
  • How do we know whether this is noise?
SLIDE 63

Conclusion

SLIDE 64

Text-Driven Forecasting

  • Input: documents
  • Output: a prediction about a future, measurable state of the world.
  • Attractions:
  • theory-neutral,
  • easy and natural to evaluate,
  • inexpensive, and
  • plenty of data (and no annotation needed).
  • Challenges: linguistic representations, noisy NLP, experimental design.