Introduction: Where do NLP and DM Meet? 1 7/26/2015 Slightly - PDF document

7/26/2015 Successful Data Mining Methods for NLP Jiawei Han (UIUC), Heng Ji (RPI), Yizhou Sun (NEU) http://hanj.cs.illinois.edu/slides/dmnlp15.pptx[pdf] http://nlp.cs.rpi.edu/paper/dmnlp15.pptx[pdf] 1 Introduction: Where do NLP and DM Meet? 1

7/26/2015 Slightly Different Research Philosophies  Data Mining (DM): High ‐ level (statistical)  NLP: Deep understanding of understanding, discovery and synthesis of individual words, phrases and the most salient information (“Macro ‐ sentences (“micro ‐ level”); focus on level”); historically more on structured and unstructured text data semi ‐ structured data Related to “Health Care Bill”  NewsNet (Tao et al., 2014) 3 DM Solution: Data to Networks to Knowledge (D2N2K)  Advantages of NLP  Construct graphs/networks with fine ‐ grained semantics from unstructured texts Use large ‐ scale annotations for real ‐ world data   Advantages of DM: Deep understanding through structured/correlation inference Using a structured representation (e.g., graph, network) as a bridge to capture  interactions between NLP and DM  Example: Heterogeneous Information Networks [Han et al., 2010; Sun et al., 2012] Data Networks Knowledge 4 2

7/26/2015 A Promising Direction: Integrating DM and NLP  Major theme of this tutorial Applying novel DM methods to solve traditional NLP problems   Integrating DM and NLP, transforming Data to Networks to Knowledge  Road Map of this tutorial Effective Network Construction by Leveraging Information Redundancy   Theme I: Phrase Mining and Topic Modeling from Large Corpora Theme II: Entity Extraction and Linking by Relational Graph Construction   Mining Knowledge from Structured Networks  Theme III: Search and Mining Structured Graphs and Heterogeneous Networks  Looking forward to the Future 5 Theme I: Phrase Mining and Topic Modeling from Large Corpora 6 3

7/26/2015 Why Phrase Mining?  Phrase: Minimal, unambiguous semantic unit; basic building block for information network and knowledge base  Unigrams vs. phrases  Unigrams (single words) are ambiguous  Example: “United”: United States? United Airline? United Parcel Service?  Phrase : A natural, meaningful, unambiguous semantic unit  Example: “United States” vs. “United Airline”  Mining semantically meaningful phrases  Transform text data from word granularity to phrase granularity  Enhance the power and efficiency at manipulating unstructured data using database technology 7 Mining Phrases: Why Not Just Use NLP Methods?  Phrase mining: Originated from the NLP community—“Chunking”  Model it as a sequence labeling problem (B ‐ NP, I ‐ NP, O, …) Need annotation and training  Annotate hundreds of POS tagged documents as training data  Train a supervised model based on part ‐ of ‐ speech features   Recent trend:  Use distributional features based on web n ‐ grams (Bergsma et al., 2010)  State ‐ of ‐ the ‐ art Performance: ~95% accuracy, ~88% phrase ‐ level F ‐ score Limitations  High annotation cost, not scalable to a new language, domain or genre  May not fit domain ‐ specific, dynamic, emerging applications  Scientific domains, query logs, or social media, e.g., Yelp, Twitter  Use only local features, no ranking, no links to topics  8 4

7/26/2015 Data Mining Approaches for Phrase Mining  General principle: Corpus ‐ based; fully exploit information redundancy and data ‐ driven criteria to determine phrase boundaries and salience; using local evidence to adjust corpus ‐ level data statistics Phrase Mining and Topic Modeling from Large Corpora   Strategy 1: Simultaneously Inferring Phrases and Topics  Bigram topical model [Wallach’06], topical n ‐ gram model [Wang, et al.’07], phrase discovering topic model [Lindsey, et al.’12]  Strategy 2: Post Topic Modeling Phrase Construction  Label topic [Mei et al.’07], TurboTopic [Blei & Lafferty’09], KERT [Danilevsky, et al.’14] Strategy 3: First Phrase Mining then Topic Modeling:  ToPMine [El ‐ kishky, et al., VLDB’15]   Integration of Phrase Mining with Document Segmentation  SegPhrase [Liu, et al., SIGMOD’15] 9 Strategy 1: Simultaneously Inferring Phrases and Topics Bigram Topic Model [Wallach’06]  Probabilistic generative model that conditions on previous word and topic when  drawing next word Topical N ‐ Grams (TNG) [Wang, et al.’07]  Probabilistic model that generates words in textual order  Create n ‐ grams by concatenating successive bigrams (a generalization of Bigram  Topic Model) Phrase ‐ Discovering LDA (PDLDA) [Lindsey, et al.’12]   Viewing each sentence as a time ‐ series of words, PDLDA posits that the generative parameter (topic) changes periodically  Each word is drawn based on previous m words (context) and current phrase topic  High model complexity: Tends to overfitting; High inference cost: Slow 10 5

7/26/2015 Strategy 2: Post Topic Modeling Phrase Construction  TurboTopics [Blei & Lafferty’09] – Phrase construction as a post ‐ processing step to Latent Dirichlet Allocation Perform Latent Dirichlet Allocation on corpus to assign each token a topic label   Merge adjacent unigrams with the same topic label by a distribution ‐ free permutation test on arbitrary ‐ length back ‐ off model  End recursive merging when all significant adjacent unigrams have been merged  KERT [Danilevsky et al.’14] – Phrase construction as a post ‐ processing step to Latent Dirichlet Allocation  Perform frequent pattern mining on each topic Perform phrase ranking based on four different criteria  11 Example of TurboTopics Perform LDA on corpus to assign each token a topic label  E.g., … phase 11 transition 11 …. game 153 theory 127 …   Then merge adjacent unigrams with same topic label 12 6

7/26/2015 Framework of KERT 1. Run bag ‐ of ‐ words model inference and assign topic label to each token 2. Extract candidate keyphrases within each topic Frequent pattern mining Frequent pattern mining 3. Rank the keyphrases in each topic Comparability property: directly compare phrases of mixed lengths kpRel KERT KERT KERT KERT [Zhao et al. 11] ( ‐ popularity) ( ‐ discriminativeness) ( ‐ concordance) [Danilevsky et al. 14] learning effective support vector machines learning learning classification text feature selection classification support vector machines selection probabilistic reinforcement learning selection reinforcement learning models identification conditional random fields feature feature selection algorithm mapping constraint satisfaction decision conditional random fields features task decision trees bayesian classification decision planning dimensionality reduction trees decision trees : : : : : 13 Strategy 3: First Phrase Mining then Topic Modeling  ToPMine [El ‐ Kishky et al. VLDB’15] First phrase construction, then topic mining   Contrast with KERT: topic modeling, then phrase mining  The ToPMine Framework:  Perform frequent contiguous pattern mining to extract candidate phrases and their counts Perform agglomerative merging of adjacent unigrams as guided by a significance  score—This segments each document into a “bag ‐ of ‐ phrases”  The newly formed bag ‐ of ‐ phrases are passed as input to PhraseLDA, an extension of LDA, that constrains all words in a phrase to each sharing the same latent topic 14 7

7/26/2015 Why First Phrase Mining then Topic Modeling ?  With Strategy 2, tokens in the same phrase may be assigned to different topics Ex. knowledge discovery using least squares support vector machine classifiers…   Knowledge discovery and support vector machine should have coherent topic labels Solution: switch the order of phrase mining and topic model inference  [ knowledge discovery] using [least [ knowledge discovery] using [least squares] [support vector machine] squares] [support vector machine] [classifiers] … [classifiers] … Techniques   Phrase mining and document segmentation  Topic model inference with phrase constraint 15 Phrase Mining: Frequent Pattern Mining + Statistical Analysis Quality phrases Based on significance score [Church et al.’91]: α (P 1 , P 2 ) ≈ (f(P 1 ● P 2 ) ̶ µ 0 (P 1 ,P 2 ))/ √ f(P 1 ● P 2 ) [Markov blanket] [feature selection] for [support vector Raw True Phrase freq. freq. machines] [support vector machine] 90 80 [knowledge discovery] using [least squares] [support 95 0 [vector machine] vector machine] [classifiers] [support vector] 100 20 …[support vector] for [machine learning]… 16 8

Introduction: Where do NLP and DM Meet? 1 7/26/2015 Slightly - PDF document

7/26/2015 Successful Data Mining Methods for NLP Jiawei Han (UIUC), Heng Ji (RPI), Yizhou Sun (NEU) http://hanj.cs.illinois.edu/slides/dmnlp15.pptx[pdf] http://nlp.cs.rpi.edu/paper/dmnlp15.pptx[pdf] 1 Introduction: Where do NLP and DM Meet?

SI485i : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

SI425 : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

Recurrent Neural Networks Graham Neubig Site https://phontron.com/class/nn4nlp2017/ NLP and

NLP: Two pictures Wordnet and Word Sense Problem NLP Disambiguation Semantics NLP Trinity

Ontologies for NLP NLP for Ontologies FOIS 2014 - LogOnto Workshop on Logics and Ontologies for

Checki king in and Treating High-Achievi ving Students Meet Meet you your r Doctor Doctor

NLP Programming Tutorial 4 - Word Segmentation Graham Neubig Nara Institute of Science and

Hybrid NLP Hybrid NLP O UTLINE O UTLINE Problems of Deep and Shallow Processing

Facing NLP German Rigau i Claramunt http://adimen.si.ehu.es/~rigau IXA group Departamento de

IXA pipes: Efficient and Ready to Use Multilingual NLP tools Rodrigo Agerri IXA NLP Group,

Natural Language Processing (NLP) In 11-711 Algorithms for NLP we take an

SI485i : NLP Set 12 Features and Prediction What is NLP, really? Many of our tasks boil down

What your Team needs to know and do at a CARA Meet PRIOR TO THE MEET 1. OBTAIN MEET SCHEDULE

Prominent Research Directions in NLP Alexander Panchenko Assistant Professor for NLP About

Deep Learning for NLP Kiran Vodrahalli Feb 11, 2015 Overview What is NLP? Natural

Capsule Networks for NLP Will Merrill Advanced NLP 10/25/18 Capsule Networks: A Better ConvNet

Extracting semantic relations from unlabeled text Chandra Prakash Vishal Kumar Gupta Mentor: Dr.

Biodiversity Disturbance Succession Works Cited Return to Table of Contents Slide

Data Mining: Concepts and Techniques Chapter 9 Graph mining and Social Network Analysis

The Longitudinal Aging Study Amsterdam and the challenge of informing policy and practice

Sec 4 Sec 4 & 5 & 5 Par arent ent Enga Engagement Session gement Session 18 Jan

How big will an epidemic be? Illuminations from a simple model Collective Dynamics Group,

REGULAR MEETING OF CITY COUNCIL October 7, 2013 at 12:00 p.m. With immediate adjournment to

I have nothing to disclose in relation to Kjell Torn, MD, PhD this topic Professor, Section

Introduction: Where do NLP and DM Meet? 1 7/26/2015 Slightly - PDF document

7/26/2015 Successful Data Mining Methods for NLP Jiawei Han (UIUC), Heng Ji (RPI), Yizhou Sun (NEU) http://hanj.cs.illinois.edu/slides/dmnlp15.pptx[pdf] http://nlp.cs.rpi.edu/paper/dmnlp15.pptx[pdf] 1 Introduction: Where do NLP and DM Meet?

SI485i : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

SI425 : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

Recurrent Neural Networks Graham Neubig Site https://phontron.com/class/nn4nlp2017/ NLP and

NLP: Two pictures Wordnet and Word Sense Problem NLP Disambiguation Semantics NLP Trinity

Ontologies for NLP NLP for Ontologies FOIS 2014 - LogOnto Workshop on Logics and Ontologies for

Checki king in and Treating High-Achievi ving Students Meet Meet you your r Doctor Doctor

NLP Programming Tutorial 4 - Word Segmentation Graham Neubig Nara Institute of Science and

Hybrid NLP Hybrid NLP O UTLINE O UTLINE Problems of Deep and Shallow Processing

Facing NLP German Rigau i Claramunt http://adimen.si.ehu.es/~rigau IXA group Departamento de

IXA pipes: Efficient and Ready to Use Multilingual NLP tools Rodrigo Agerri IXA NLP Group,

Natural Language Processing (NLP) In 11-711 Algorithms for NLP we take an

SI485i : NLP Set 12 Features and Prediction What is NLP, really? Many of our tasks boil down

What your Team needs to know and do at a CARA Meet PRIOR TO THE MEET 1. OBTAIN MEET SCHEDULE

Prominent Research Directions in NLP Alexander Panchenko Assistant Professor for NLP About

Deep Learning for NLP Kiran Vodrahalli Feb 11, 2015 Overview What is NLP? Natural

Capsule Networks for NLP Will Merrill Advanced NLP 10/25/18 Capsule Networks: A Better ConvNet

Extracting semantic relations from unlabeled text Chandra Prakash Vishal Kumar Gupta Mentor: Dr.

Biodiversity Disturbance Succession Works Cited Return to Table of Contents Slide

Data Mining: Concepts and Techniques Chapter 9 Graph mining and Social Network Analysis

The Longitudinal Aging Study Amsterdam and the challenge of informing policy and practice

Sec 4 Sec 4 &amp; 5 &amp; 5 Par arent ent Enga Engagement Session gement Session 18 Jan

How big will an epidemic be? Illuminations from a simple model Collective Dynamics Group,

REGULAR MEETING OF CITY COUNCIL October 7, 2013 at 12:00 p.m. With immediate adjournment to

I have nothing to disclose in relation to Kjell Torn, MD, PhD this topic Professor, Section

Sec 4 Sec 4 & 5 & 5 Par arent ent Enga Engagement Session gement Session 18 Jan