Constructing Structured Information Networks from Massive Text Corpora
Part II: Joint Extraction of Typed Entities and Relations
Effort-Light StructMine: Methodology
- Inputs: text corpus + knowledge bases
- Data-driven text segmentation (SIGMOD’15, WWW’16) → entity names & context units
- Partially-labeled corpus → learning a corpus-specific model (KDD’15, KDD’16, EMNLP’16, WWW’17) → structures from the remaining unlabeled data
- Closed-world assumption vs. open-world assumption
Effort-Light StructMine: Typing
- Pipeline: data-driven text segmentation (SIGMOD’15, WWW’16) → entity names & context units → partially-labeled corpus → learning a corpus-specific model (KDD’15, KDD’16, EMNLP’16, WWW’17) → structures from the remaining unlabeled data
- The typing step spans three tasks: Entity Recognition and Coarse-grained Typing (KDD’15), Fine-grained Entity Typing (KDD’16), Joint Entity and Relation Extraction (WWW’17)
Corpus to Structured Network: The Roadmap
- Entity Recognition and Coarse-grained Typing (KDD’15) (up next)
- Fine-grained Entity Typing (KDD’16)
- Joint Entity and Relation Extraction (WWW’17)
- Inputs: text corpus + knowledge bases
Recognizing Entities of Target Types in Text
Target types: food, location, person.
Example: “The best BBQ I’ve tasted in Phoenix! I had the pulled pork sandwich with coleslaw and baked beans for lunch. The owner is very nice. …”
- Food: BBQ, pulled pork sandwich, coleslaw, baked beans; Location: Phoenix; Person: owner
Traditional Named Entity Recognition (NER) Systems
- Heavy reliance on corpus-specific human labeling
- Training sequence models is slow
- A manual annotation interface, e.g., (McCallum & Li, 2003), (Finkel et al., 2005), (Ratinov & Roth, 2009), …
- Labeled example: “The best BBQ I’ve tasted in Phoenix” → O O Food O O O Location
- Sequence model training
- NER systems: Stanford NER, Illinois Name Tagger, IBM Alchemy APIs, …
Weak-Supervision Systems: Pattern-Based Bootstrapping
- Requires manual seed selection & mid-point checking
- Iterative loop: annotate corpus using seed entities → generate candidate patterns → score candidate patterns → select top patterns → apply patterns to find new entities (a sketch follows below)
- Seeds for Food: Pizza, French Fries, Hot Dog, Pancake, ...
- Learned patterns for Food: “the best <X> I’ve tried in”, “their <X> tastes amazing”, …
- e.g., (Etzioni et al., 2005), (Talukdar et al., 2010), (Gupta et al., 2014), (Mitchell et al., 2015), …
- Systems: CMU NELL, UW KnowItAll, Stanford DeepDive, Max-Planck PROSPERA, …
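To make the loop concrete, here is a minimal, self-contained sketch of pattern-based bootstrapping; the toy corpus, seeds, and frequency-based pattern scoring are illustrative assumptions, not any particular system’s implementation.

```python
import re
from collections import Counter

def bootstrap(corpus, seeds, rounds=3, top_k=2):
    """Pattern-based bootstrapping sketch: annotate with known entities,
    generate and score candidate patterns, keep the top ones, and apply
    them to extract new entities."""
    entities = set(seeds)
    for _ in range(rounds):
        # Generate candidate patterns by replacing matched entities with <X>.
        patterns = Counter()
        for sent in corpus:
            for e in entities:
                if e in sent:
                    patterns[sent.replace(e, "<X>")] += 1
        # Score patterns (here simply by frequency) and select the top ones.
        top_patterns = [p for p, _ in patterns.most_common(top_k)]
        # Apply the selected patterns to find new entities.
        for sent in corpus:
            for p in top_patterns:
                regex = "(.+?)".join(re.escape(part) for part in p.split("<X>"))
                m = re.fullmatch(regex, sent)
                if m and m.groups():
                    entities.update(g.strip() for g in m.groups())
    return entities

corpus = ["the best pizza I've tried in town",
          "the best hot dog I've tried in town",
          "their pancake tastes amazing",
          "their french fries tastes amazing"]
print(bootstrap(corpus, seeds={"pizza", "pancake"}))
```

Running it grows the Food set from the two seeds to also include “hot dog” and “french fries”; real systems add pattern-quality scoring and mid-point checking precisely because this loop is prone to semantic drift.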
Leveraging Distant Supervision
1. Detect entity names from text
2. Match name strings to KB entities
3. Propagate types to the un-matchable names

ID  Sentence
S1  Phoenix is my all-time favorite dive bar in New York City.
S2  The best BBQ I’ve tasted in Phoenix.
S3  Phoenix has become one of my favorite bars in NY.

The KB matches “New York City” → Location, while “BBQ” (Food) and the ambiguous “Phoenix” mentions (???) are un-matchable; their types must be inferred through contextual bridges such as “tasted in”, “is my all-time favorite dive bar in”, and “has become one of my favorite bars in”.
(Lin et al., 2012), (Ling et al., 2012), (Nakashole et al., 2013)
Current Distant Supervision: Limitation I
- 1. Context-agnostic type prediction
- Predict types for each mention regardless of context
- 2. Sparsity of contextual bridges
Current Distant Supervision: Limitation II
- 1. Context-agnostic type prediction
- 2. Sparsity of contextual bridges
- Some relational phrases are infrequent in the corpus → ineffective type propagation
ID  Sentence
S1  Phoenix is my all-time favorite dive bar in New York City.
S3  Phoenix has become one of my favorite bars in NY.
ClusType: Data-Driven Entity Mention Detection
- Quality criteria: corpus-level concordance and syntactic quality
- Concordance: significance of a merging between two sub-phrases (e.g., “support vector machine” shows good concordance)
- Pattern    Example
  (J*)N*     support vector machine
  VP         tasted in, damage on
  VW*(P)     train a classifier with
- Example segmentation: “The best BBQ I’ve tasted in Phoenix! I had the pulled pork sandwich with coleslaw and baked beans for lunch. … This place serves up the best cheese steak sandwich west of the Mississippi.”
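As a rough illustration of the “significance of a merging” idea, the sketch below scores how much more often two sub-phrases co-occur as a merged phrase than chance co-occurrence would predict; the counts and the z-score-style formula are illustrative assumptions, not the exact SIGMOD’15/WWW’16 measure.

```python
import math
from collections import Counter

def merge_significance(counts: Counter, total: int, left: str, right: str) -> float:
    """How surprising is it to see `left + right` merged as often as we do,
    compared to chance co-occurrence of the two sub-phrases?"""
    merged = f"{left} {right}"
    observed = counts[merged]
    expected = counts[left] * counts[right] / max(total, 1)
    if observed == 0 or expected == 0:
        return 0.0
    # z-score-like significance of the merging
    return (observed - expected) / math.sqrt(observed)

# Hypothetical corpus counts
counts = Counter({"support vector": 120, "machine": 300,
                  "support vector machine": 95})
print(merge_significance(counts, total=1_000_000,
                         left="support vector", right="machine"))
```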
My Solution: ClusType (KDD’15)
ID  Segmented Sentences
S1  Phoenix is my all-time favorite dive bar in New York City.
S2  The best BBQ I’ve tasted in Phoenix.
S3  Phoenix has become one of my favorite bars in NY.

A heterogeneous graph represents object interactions: entity mentions (S1: Phoenix, S2: Phoenix, S3: Phoenix, S2: BBQ, S1: New York City, S3: NY) are linked to their surface names and to the relation phrases they co-occur with (“is my all-time favorite dive bar in”, “tasted in”, “has become one of my favorite bars in”).

Putting two sub-tasks together:
1. Type label propagation (over correlated mentions)
2. Relation phrase clustering (grouping similar relation phrases)
Type Propagation in ClusType
- Smoothness assumption: if two objects are similar according to the graph, then their type labels should also be similar
- Edge weight W_ij encodes object similarity; the type vectors f_i and f_j of strongly connected objects are pushed together
- Example: “S1: Phoenix” and “S3: Phoenix” share the name “Phoenix” and appear with similar relation phrases (“is my all-time favorite dive bar in”, “has become one of my favorite bars in”), so type labels propagate between them (a sketch of this propagation follows below)
(Belkin & Niyogi, NIPS’01), (Ren et al., KDD’15)
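Below is a minimal sketch of graph-regularized label propagation in the spirit of the smoothness assumption; the normalization and fixed-seed update rule follow standard label propagation, not the full ClusType objective (which additionally clusters relation phrases and handles several node types).

```python
import numpy as np

def propagate_types(W, F0, seed_mask, alpha=0.85, n_iter=50):
    """Graph label propagation under the smoothness assumption (sketch).

    W         : (n, n) symmetric non-negative edge-weight matrix
    F0        : (n, k) initial type scores (rows of zeros for unmatched objects)
    seed_mask : (n,) bool, True for KB-matched (seed) objects
    """
    d = np.maximum(W.sum(axis=1), 1e-12)
    S = W / np.sqrt(np.outer(d, d))        # D^{-1/2} W D^{-1/2}

    F = F0.copy()
    for _ in range(n_iter):
        # Each object's type scores are pulled toward its neighbors' scores
        # (smoothness), while seed objects stay anchored to their KB labels.
        F = alpha * (S @ F) + (1 - alpha) * F0
        F[seed_mask] = F0[seed_mask]
    return F
```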
Relation Phrase Clustering in ClusType
- Example: “is my all-time favorite dive bar in” and “has become one of my favorite bars in” are similar relation phrases; clustering them lets the Location evidence from “New York City” reach the ambiguous “Phoenix” mentions (???)
(Ren et al., KDD’15)
- Two subtasks mutually enhance each other
- Two relation phrases should be grouped together if they have: 1. similar strings, 2. similar contexts, 3. similar types for entity arguments
- “Multi-view” clustering
ClusType: Comparing with State-of-the-Art Systems (F1 Score)
Methods                         NYT    Yelp   Tweet
Pattern (Stanford, CoNLL’14)    0.301  0.199  0.223
SemTagger (Utah, ACL’10)        0.407  0.296  0.236
NNPLB (UW, EMNLP’12)            0.637  0.511  0.246
APOLLO (THU, CIKM’12)           0.795  0.283  0.188
FIGER (UW, AAAI’12)             0.881  0.198  0.308
ClusType (KDD’15)               0.939  0.808  0.451
Precision (P) = #correctly-typed mentions / #system-recognized mentions
Recall (R) = #correctly-typed mentions / #ground-truth mentions
F1 score = 2PR / (P + R)
Baseline categories: pattern-based bootstrapping, label propagation, and classifiers with linguistic features.
Datasets: NYT has 118k news articles (1k manually labeled for evaluation); Yelp has 230k business reviews (2.5k manually labeled); Tweet has 302k tweets (3k manually labeled).
- vs. bootstrapping: context-aware prediction on “un-matchable” mentions
- vs. label propagation: groups similar relation phrases
- vs. FIGER: no reliance on complex feature engineering
Corpus to Structured Network: The Roadmap
- Entity Recognition and Coarse-grained Typing (KDD’15)
- Fine-grained Entity Typing (KDD’16) (up next)
- Joint Entity and Relation Extraction (WWW’17)
From Coarse-Grained Typing to Fine-Grained Entity Typing
ID  Sentence
S1  Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice.

- A few common types: Person, Location, Organization
- vs. a type hierarchy with 100+ types (from knowledge base): root → product, person, location, organization, ...; person → politician, artist, businessman, ...; artist → author, actor, singer, ...
(Ling et al., 2012), (Nakashole et al., 2013), (Yogatama et al., 2015)
Current Distant Supervision: Context-Agnostic Labeling
ID  Sentence
S1  Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice.
- Entity types from knowledge base for the entity Donald Trump: person, artist, actor, author, businessman, politician; all are attached to mention “S1: Donald Trump”, regardless of context
- Inaccurate labels in training data
- Prior work: all labels are “perfect”
My Solution: Partial Label Embedding (KDD’16)
- Pipeline: extract text features → “label noise reduction” with PLE → train classifiers on de-noised labeled data → prediction on new data (more effective classifiers)
ID  Sentence
S1  Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice.
- Candidate types for “S1: Donald Trump” (from KB): person, artist, actor, author, businessman, politician
- Text features: TOKEN_Donald, CONTEXT: television, CONTEXT: season, TOKEN_trump, SHAPE: AA
(Ren et al., KDD’16)
PLE: Modeling Clean and Noisy Mentions Separately
- For a clean mention, its “positive types” should be ranked higher than all its “negative types”
- For a noisy mention, its “best candidate type” should be ranked higher than all its “non-candidate types”
- Example noisy entity mention (S1: “Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice”); types in KB: person, artist, actor, author, businessman, politician
- Types ranked by score: (+) actor 0.88, (+) artist 0.74, (+) person 0.55, (+) author 0.41, (+) politician 0.33, (+) businessman 0.31; the “best” candidate type (actor) should rank above non-candidates: (-) singer, (-) coach, (-) doctor, (-) location, (-) organization
(Ren et al., KDD’16)
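A minimal sketch of the partial-label ranking idea for a single noisy mention; the hinge form and margin are my simplification, not the exact KDD’16 loss.

```python
import numpy as np

def partial_label_loss(scores, candidate_mask, margin=1.0):
    """Hinge-style partial-label loss for one noisy mention (sketch).

    scores         : (k,) model score for each of the k types
    candidate_mask : (k,) bool, True for types the KB assigns to the entity
    """
    best_candidate = scores[candidate_mask].max()     # "best" candidate type
    worst_offender = scores[~candidate_mask].max()    # highest non-candidate
    # The best candidate should out-rank every non-candidate type by a margin.
    return max(0.0, margin - best_candidate + worst_offender)

# Scores in the order actor, artist, person, author, politician, businessman,
# plus two non-candidate types (hypothetical numbers mirroring the slide).
scores = np.array([0.88, 0.74, 0.55, 0.41, 0.33, 0.31, 0.20, 0.10])
candidate_mask = np.array([True, True, True, True, True, True, False, False])
print(partial_label_loss(scores, candidate_mask))
```

For a clean mention, the same comparison runs between all of its positive types and all of its negative types.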
Type Inference in PLE
ID  Sentence
Si  President Trump gave an all-hands address to troops at the U.S. Central Command headquarters …

- Test mention Si_Trump is embedded by summing the vectors of its text features (e.g., “President”, “gave”, “speech”) in the low-dimensional vector space
- Nearby objects in that space include type and feature vectors (e.g., president, politician, person, actor, senator, gave, address, star, play) and the labeled mention Si: Ted Cruz (types in KB: person, politician)
- Inference: top-down nearest-neighbor search in the given type hierarchy (from knowledge base): root → product, person, location, organization; person → politician, artist, businessman, author, actor, singer, ...
(Ren et al., KDD’16)
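The top-down search can be sketched as follows; the hierarchy dictionary, cosine scoring, and stopping threshold are illustrative assumptions rather than the exact KDD’16 procedure.

```python
import numpy as np

def infer_type_path(mention_vec, type_vecs, children, root="root", threshold=0.0):
    """Walk the type hierarchy from the root, at each level picking the child
    type whose embedding is most similar to the mention embedding."""
    path, node = [], root
    while children.get(node):
        sims = {c: float(mention_vec @ type_vecs[c]) /
                   (np.linalg.norm(mention_vec) * np.linalg.norm(type_vecs[c]) + 1e-12)
                for c in children[node]}
        best, score = max(sims.items(), key=lambda kv: kv[1])
        if score < threshold:          # stop when no child is similar enough
            break
        path.append(best)
        node = best
    return path

# Toy two-level hierarchy and embeddings (hypothetical values).
type_vecs = {"person": np.array([1.0, 0.0]), "organization": np.array([0.0, 1.0]),
             "politician": np.array([0.9, 0.1]), "artist": np.array([0.5, 0.5])}
children = {"root": ["person", "organization"], "person": ["politician", "artist"]}
print(infer_type_path(np.array([0.95, 0.05]), type_vecs, children))  # ['person', 'politician']
```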
PLE: Performance of Fine-Grained Entity Typing
- Raw: candidate types from distant supervision
- WSABIE (Google, ACL’15): joint feature and type embedding
- PTE, Predictive Text Embedding (MSR, WWW’15): joint mention, feature and type embedding
- Both WSABIE and PTE suffer from “noisy” training labels
- PLE (KDD’16): partial-label loss for context-aware labeling

Accuracy on different type levels:
Method   Level-1  Level-2  Level-3
Raw      0.70     0.45     0.05
WSABIE   0.79     0.49     0.14
PTE      0.78     0.51     0.19
PLE      0.81     0.62     0.48

Accuracy = #mentions with all types correctly predicted / #mentions in the test set
OntoNotes public dataset (Weischedel et al., 2011; Gillick et al., 2014): 13,109 news articles, 77 annotated documents, 89 entity types
Corpus to Structured Network: The Roadmap
- Entity Recognition and Coarse-grained Typing (KDD’15)
- Fine-grained Entity Typing (KDD’16)
- Joint Entity and Relation Extraction (WWW’17) (up next)
Joint Extraction of Typed Entities and Relations
Example text: “The Women’s March was a worldwide protest on January 21, 2017. The protest was aimed at Donald Trump, the recently inaugurated president of the United States. The first protest was planned in Washington, D.C., and was known as the Women’s March on Washington.”

Goal: extract a typed network from the text, with entities such as Women’s March, Donald Trump, Washington D.C., Washington, and United States (entity types: Person, Location, Organization, Event) connected by relations such as aim at, president of, and located at.
Prior Work: Relation Extraction (RE)
- A spectrum from substantial human annotation to no human annotation: supervised RE systems → pattern-based bootstrapping RE systems → distantly-supervised RE systems
- Limitations along the spectrum: hard to port to different kinds of corpora; focus on “explicit” relation mentions; “semantic drift”; error propagation; noisy candidate type labels
Mintz et al. Distant supervision for relation extraction without labeled data. ACL, 2009. Etzioni et al. Web-scale information extraction in KnowItAll. WWW, 2004. Surdeanu et al. Multi-instance multi-label learning for relation extraction. EMNLP, 2012.
Prior Work: An “Incremental” System Pipeline
- Pipeline: entity mention detection → context-aware entity typing → relation mention detection → context-aware relation typing
- Example: “The Women’s March was a worldwide protest on January 21, 2017.” → candidate pairs (women, protest), (protest, January 21, 2017)
- Entity boundary errors: (women, protest) ✗
- Entity type errors: person ✗
- Relation mention errors: “is a” ✗
- Relation type errors ✗
Error propagation cascading down the pipeline
(Mintz et al., 2009), (Riedel et al., 2010), (Hoffmann et al., 2011), (Surdeanu et al., 2012), (Nagesh et al., 2014), …
My Solution: CoType (WWW’17)
(Ren et al., WWW’17)
- 1. Data-driven detection of entity and relation mentions
  - Data-driven text segmentation
  - Syntactic pattern learning from KBs
- 2. Joint typing of entity and relation mentions
  - Context-aware type modeling
  - Model entity-relation interactions
- Contrasted with the incremental pipeline (entity mention detection → context-aware entity typing → relation mention detection → context-aware relation typing)
Data-Driven Entity and Relation Detection
- Input: S2: “The protest was aimed at Donald Trump, the recently inaugurated president of the United States.”
- Steps: frequent pattern mining → segment quality estimation (phrase quality, e.g., “United States”: 0.9, “was aimed at”: 0.4, ...; POS-pattern quality, e.g., ADJ NN: 0.85, V PROP: 0.4, …) → POS-guided segmentation → quality re-estimation & re-segmentation (a sketch follows below)
- Detected candidate relation mentions: (S2: protest, Donald Trump), (S2: Donald Trump, United States)
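A minimal sketch of quality-guided segmentation as a dynamic program over precomputed phrase-quality scores; the scores and default penalties are hypothetical, and the POS-guided and re-estimation passes are omitted.

```python
def segment(tokens, quality, max_len=4):
    """Pick the segmentation of `tokens` that maximizes total phrase quality."""
    n = len(tokens)
    best = [0.0] + [float("-inf")] * n     # best[i]: best score of tokens[:i]
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for l in range(1, min(max_len, i) + 1):
            cand = " ".join(tokens[i - l:i])
            # Unknown single tokens get a small default; unknown multi-word
            # spans are penalized so only quality phrases are kept together.
            score = best[i - l] + quality.get(cand, 0.1 if l == 1 else -1.0)
            if score > best[i]:
                best[i], back[i] = score, i - l
    segs, i = [], n
    while i > 0:                            # recover segments via back-pointers
        segs.append(" ".join(tokens[back[i]:i]))
        i = back[i]
    return segs[::-1]

quality = {"united states": 0.9, "was aimed at": 0.4, "donald trump": 0.8}
print(segment("the protest was aimed at donald trump".split(), quality))
```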
Entity Mention Detection: Results
Method                        NYT    Wiki-KBP  BioInfer
FIGER segmenter [UW, 2012]    0.751  0.814     0.652
Our approach                  0.837  0.833     0.785
CoType: Co-Embedding for Typing Entities and Relations
(Ren et al., WWW’17)
- Object interactions are represented in a heterogeneous graph and then embedded into low-dimensional vector spaces
- Relation mentions, e.g., (“Barack Obama”, “US”, S1), (“Obama”, “Dream of My Father”, S2), (“Barack Obama”, “United States”, S3), with candidate relation types president_of, born_in, travel_to, author_of, None
- Entity mentions, e.g., S1_“Barack Obama”, S1_“US”, S2_“Obama”, S2_“Dream of My Father”, S3_“Barack Obama”, S3_“United States”, with candidate entity types person, politician, artist, book, LOC, ORG, None
- Text features, e.g., EM1_Obama, EM1_Barack Obama, HEAD_Obama, TOKEN_States, BETWEEN_president, BETWEEN_book, CONTEXT_president, CONTEXT_book
- In the entity mention embedding space, S1_Barack Obama lies near politician and S2_Obama near author; in the relation mention embedding space, (“Barack Obama”, “US”, S1) lies near president_of and BETWEEN_president; entity-relation interactions are modeled across the two spaces
Modeling Mention-Feature Co-occurrences
- Second-order proximity: mentions with similar distributions over text features should have similar types
- Vertices m_i and m_j have a large second-order proximity when they share many text features
- Example: relation mentions S6: (Trump, US) and S7: (Donald Trump, United States) share features TKN_Trump, BETWEEN_president, EM2_US / EM2_United States, so the president_of label of one informs the still-unlabeled other (???)
(Tang et al., WWW’15), (Ren et al. WWW’17)
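The sketch below embeds mention-feature co-occurrences with a LINE-style second-order-proximity objective and negative sampling; the toy mentions, features, and hyperparameters are assumptions for illustration, not the CoType training code.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, lr, n_neg = 16, 0.05, 3

mentions = ["S6:(Trump,US)", "S7:(Donald Trump,United States)"]
features = ["TKN_Trump", "BETWEEN_president", "EM2_US", "EM2_United States"]
edges = [(m, f) for m in mentions for f in features]   # shared co-occurrences

M = {m: rng.normal(scale=0.1, size=dim) for m in mentions}   # mention vectors
F = {f: rng.normal(scale=0.1, size=dim) for f in features}   # feature (context) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    for m, f in edges:
        # Positive pair: pull mention and feature vectors together.
        g = 1.0 - sigmoid(M[m] @ F[f])
        M[m] += lr * g * F[f]
        F[f] += lr * g * M[m]
        # Negative samples: push the mention away from random features.
        for f_neg in rng.choice(features, size=n_neg):
            g = -sigmoid(M[m] @ F[f_neg])
            M[m] += lr * g * F[f_neg]
            F[f_neg] += lr * g * M[m]

# Mentions that share features end up with similar embeddings.
a, b = M[mentions[0]], M[mentions[1]]
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

After training, the two mentions that share features have a high cosine similarity, which is what lets a type label on one transfer to the other.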
Challenge: Context-Agnostic Labeling
ID  Sentence
S2  The protest was aimed at Donald Trump, the recently inaugurated president of the United States.

- Entity 1: Donald Trump → KB entity E1: Donald J. Trump (E1 types: person, politician, businessman, author, actor)
- Entity 2: United States → KB entity E2: United States (E2 types: location, organization)
- Relations between E1 and E2 in the KB: president of, live in, born in
- All three become candidate type labels for the relation mention, regardless of its context
Context-Aware Type Modeling
Partial-label Loss Function
- Vector representation of the relation mention should be more similar to its “best” candidate type than to any non-candidate type
- i.e., the score for the “best” candidate type should exceed the maximal score over non-candidate types
(Ren et al. WWW’17)
Modeling Entity-Relation Interactions
Object “translating” assumption: for a relation mention z between entity arguments m1 and m2, vec(m1) ≈ vec(m2) + vec(z) in the low-dimensional vector space.
- Example: m1 = “USA” (country), m2 = “Washington D.C.” (city), z = capital_city_of form a positive relation triple; corrupted triples (e.g., swapping in “France” or “Paris”) serve as negative relation triples
- Error on a relation triple (z, m1, m2): the distance between vec(m1) and vec(m2) + vec(z)
(Bordes et al., NIPS’13), (Ren et al., WWW’17)
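A compact sketch of the triple error and a margin-based ranking loss over positive vs. corrupted triples (TransE-style; the margin and toy vectors are illustrative, not the exact CoType objective):

```python
import numpy as np

def triple_error(vec_m1, vec_m2, vec_z):
    """Translating-assumption error for a relation triple (z, m1, m2):
    small when vec(m1) ≈ vec(m2) + vec(z)."""
    return np.linalg.norm(vec_m1 - (vec_m2 + vec_z))

def margin_loss(pos, neg, margin=1.0):
    """Positive triples should have lower error than corrupted (negative) ones."""
    return max(0.0, margin + triple_error(*pos) - triple_error(*neg))

# Toy vectors: USA ≈ Washington D.C. + capital_city_of
usa   = np.array([1.0, 0.0])
dc    = np.array([0.4, 0.3])
cap   = np.array([0.6, -0.3])
paris = np.array([-0.5, 0.2])
print(triple_error(usa, dc, cap))                      # ~0 for the positive triple
print(margin_loss((usa, dc, cap), (usa, paris, cap)))  # > 0: corrupted triple not yet far enough
```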
Reducing Error Propagation: A Joint Optimization Framework
(Ren et al., WWW’17)
- Joint objective combines three parts: modeling types of entity mentions, modeling types of relation mentions, and modeling entity-relation interactions
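Schematically (in my own notation, not the paper’s symbols), the joint objective is simply the sum of the three component losses introduced on the preceding slides:

```latex
\mathcal{O} \;=\; \mathcal{O}_{\text{entity types}} \;+\; \mathcal{O}_{\text{relation types}} \;+\; \mathcal{O}_{\text{entity-relation interactions}}
```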
CoType: Comparing with State-of-the-Art RE Systems
- Given a candidate relation mention, predict its relation type if it expresses a relation of interest; otherwise, output “None”
[Figure: precision-recall curves comparing DeepWalk, DS+Logistic, LINE, MultiR, CoType-RM, and CoType]
- DS+Logistic (Stanford, ACL’09): logistic classifier on distant supervision (DS)
- MultiR (UW, ACL’11): handles inappropriate labels in DS
- DeepWalk (StonyBrook, KDD’14): homogeneous graph embedding
- LINE (MSR, WWW’15): joint feature & type embedding
- CoType-RM (WWW’17): only models relation mentions
- CoType (WWW’17): models entity-relation interactions
NYT public dataset (Riedel et al. 2010, Hoffmann et al., 2011): 1.18M sentences in the corpus, 395 manually annotated sentences for evaluation, 24 relation types
An Ongoing Application to Life Sciences
- Performance evaluation on BioInfer (Pyysalo et al., BMC Bioinformatics’07): relation classification accuracy = 61.7% (11% ↑ over the best-performing baseline)
- LifeNet: A Knowledge Exploration and Analytics System for Life Sciences
- LifeNet by Effort-Light StructMine (machine-created): 4 million+ PubMed papers, 1,000+ entity types, 400+ relation types, <1 hour on a single machine, 10,000x more facts
- BioInfer network by human labeling (Pyysalo et al., 2007): human-created, 1,100 sentences, 94 protein-protein interactions, 2,500 man-hours, 2,662 facts
(Ren et al., ACL’17 demo, under review)
Corpus to Structured Network: The Roadmap
- Entity Recognition and Coarse-grained Typing (KDD’15)
- Fine-grained Entity Typing (KDD’16)
- Joint Entity and Relation Extraction (WWW’17)
References I
- Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han. CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases. WWW, 2017.
- Xiang Ren, Ahmed El-Kishky, Heng Ji, Jiawei Han. Automatic Entity Recognition and Typing in Massive Text Data (Conference Tutorial). SIGMOD, 2016.
- Xiang Ren*, Wenqi He*, Meng Qu, Lifu Huang, Heng Ji, Jiawei Han. AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding. EMNLP, 2016.
- Xiang Ren*, Wenqi He*, Meng Qu, Heng Ji, Clare R. Voss, Jiawei Han. Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding. KDD, 2016.
- Xiang Ren, Wenqi He, Ahmed El-Kishky, Clare R. Voss, Heng Ji, Meng Qu, Jiawei Han. Entity Typing: A Critical Step for Mining Structures from Massive Unstructured Text (Invited Paper). MLG, 2016.
- Xiang Ren, A. El-Kishky, C. Wang, F. Tao, C. R. Voss, H. Ji, J. Han. ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering. KDD, 2015.
- Xiang Ren, Tao Cheng. Synonym Discovery for Structured Entities on Heterogeneous Graphs. WWW, 2015.
- Tarique A. Siddiqui*, Xiang Ren*, Aditya Parameswaran, Jiawei Han. FacetGist: Collective Extraction of Document Facets in Large Technical Corpora. CIKM, 2016.
- Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, Jiawei Han. Mining Quality Phrases from Massive Text Corpora. SIGMOD, 2015.
References II
- Marina Danilevsky, Chi Wang, Nihit Desai, Xiang Ren, Jingyi Guo, Jiawei Han. Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents. SDM, 2014.
- Xiang Ren, Yuanhua Lv, Kuansan Wang, Jiawei Han. Comparative Document Analysis for Large Text Corpora. WSDM, 2017.
- Jialu Liu, Xiang Ren, Jingbo Shang, Taylor Cassidy, Clare R. Voss, Jiawei Han. Representing Documents via Latent Keyphrase Inference. WWW, 2016.
- Hyungsul Kim, Xiang Ren, Yizhou Sun, Chi Wang, Jiawei Han. Semantic Frame-Based Document Representation for Comparable Corpora. ICDM, 2013.
- Xiang Ren, J. Liu, X. Yu, U. Khandelwal, Q. Gu, L. Wang, J. Han. ClusCite: Effective Citation Recommendation by Information Network-Based Clustering. KDD, 2014.
- X. Yu, Xiang Ren, Y. Sun, B. Sturt, U. Khandelwal, Q. Gu, B. Norick, J. Han. Personalized Entity Recommendation: A Heterogeneous Information Network Approach. WSDM, 2014.
- Xiang Ren, Yujing Wang, Xiao Yu, Jun Yan, Zheng Chen, Jiawei Han. Heterogeneous Graph-Based Intent Learning from Queries, Web Pages and Wikipedia Concepts. WSDM, 2014.
- X. Yu, Xiang Ren, Y. Sun, B. Sturt, U. Khandelwal, Q. Gu, B. Norick, J. Han. HeteRec: Entity Recommendation in Heterogeneous Information Networks with Implicit User Feedback. RecSys, 2013.
- Xiao Yu, Xiang Ren, Quanquan Gu, Yizhou Sun, Jiawei Han. Collaborative Filtering with Entity Similarity Regularization in Heterogeneous Information Networks. IJCAI-HINA, 2013.