6/27/13
Ontology Generation for Large Email Collections
Grace Hui Yang and Jamie Callan Carnegie Mellon University
Introduction
Subtasks in Ontology Learning
Supervised Hierarchical Clustering Framework
Experimental
Each sentence is parsed
An n-gram generator
Bigrams and trigrams
Longer Named Entities
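As a rough illustration of the n-gram generation step, the sketch below enumerates bigrams and trigrams from a tokenized sentence. The function names `ngrams` and `candidate_ngrams` and the whitespace tokenization are assumptions made for this example; the slides do not show the authors' implementation.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_ngrams(sentence, sizes=(2, 3)):
    """Collect bigrams and trigrams (by default) as concept candidates.

    Assumption: simple whitespace tokenization stands in for the parser
    mentioned in the slides.
    """
    tokens = sentence.split()
    candidates = []
    for n in sizes:
        candidates.extend(ngrams(tokens, n))
    return candidates
```

A sentence shorter than `n` tokens simply yields no n-grams of that size.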
Web-based POS error detection
Assumption:
snippets, a valid concept appears more than a threshold (4 in our case)
Remove POS errors
NN
Remove spelling errors
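The threshold assumption can be sketched as a simple frequency filter. This is a minimal sketch, assuming the search snippets have already been retrieved as plain strings and that occurrences are counted case-insensitively across all snippets; both the counting rule and the names `count_in_snippets` and `is_valid_concept` are hypothetical, and only the threshold value 4 comes from the slides.

```python
def count_in_snippets(candidate, snippets):
    """Count case-insensitive occurrences of a candidate across snippets."""
    needle = candidate.lower()
    return sum(snippet.lower().count(needle) for snippet in snippets)


def is_valid_concept(candidate, snippets, threshold=4):
    """Keep a candidate only if it appears more than `threshold` times.

    Assumption: snippets are pre-fetched strings; no web request is made.
    """
    return count_in_snippets(candidate, snippets) > threshold
```

Candidates that fail the check would be treated as likely POS or spelling errors and dropped.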
Species subsumes animal species subsumes bear species
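The example above suggests grouping noun phrases under a shared head noun. The sketch below shows one way this could look, assuming the head noun of an English noun phrase is its last token; the function name `group_by_head_noun` is hypothetical and the heuristic is a simplification, not the authors' stated procedure.

```python
from collections import defaultdict


def group_by_head_noun(phrases):
    """Map each head noun to the multi-word phrases it subsumes.

    Assumption: the head noun is the last whitespace-separated token.
    """
    groups = defaultdict(list)
    for phrase in phrases:
        tokens = phrase.lower().split()
        if not tokens:
            continue
        head = tokens[-1]
        if len(tokens) > 1:
            groups[head].append(" ".join(tokens))
        else:
            # A bare head noun creates the group without listing itself.
            groups.setdefault(head, [])
    return dict(groups)
```

With the phrases from the slide, "species" becomes the parent of "animal species" and "bear species".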
Their hypernym in WordNet is used to label the group
Different fragments are grouped
Problem
level are not grouped
In any clustering
metric to measure distance for those top-level nodes
$x^{(i)}_j$ and $x^{(i)}_k$ is $y^{(i)}_{jk} \in \{0, 1\}$,
$y^{(i)}_{jk} = 0$, if $x^{(i)}_j$ and $x^{(i)}_k$ are in the same group;
$y^{(i)}_{jk} = 1$, otherwise.
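The labeling rule above can be made concrete with a small sketch that derives the 0/1 pair labels from known group assignments at one level. The dictionary representation and the name `pairwise_labels` are assumptions for illustration only.

```python
from itertools import combinations


def pairwise_labels(group_of):
    """Return {(j, k): label} for every unordered pair of nodes.

    `group_of` maps each node to its group id (an assumed representation).
    The label is 0 when both nodes share a group and 1 otherwise.
    """
    labels = {}
    for j, k in combinations(sorted(group_of), 2):
        labels[(j, k)] = 0 if group_of[j] == group_of[k] else 1
    return labels
```

Labels of this form could then serve as training targets for a learned distance metric over node pairs.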
($x^{(i)}_j$, $x^{(i)}_k$) represents a set of pairwise underlying
($x^{(i+1)}_l$, $x^{(i+1)}_m$)