

SLIDE 1

Improve the Clustering of Short Texts Xia Hu

Outline

Introduction Proposed Framework Evaluation Conclusion and Future Work

Exploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge

Xia Hu,1,2 Nan Sun,1 Chao Zhang,1 Tat-Seng Chua1

1School of Computing

National University of Singapore

2School of Computer Science and Engineering

Beihang University

November 2, 2009

SLIDE 2

Outline

1. Introduction
2. Proposed Framework
3. Evaluation
4. Conclusion and Future Work

SLIDE 3

Aggregated Search

A form of organizing and browsing search results.

SLIDE 4

Short Texts

Short texts, such as search-result snippets, product descriptions, QA passages and image captions, play important roles in current Web and IR applications. Unlike standard texts, which contain many words, short texts consist of only a few phrases or 2–3 sentences, and therefore present great challenges in clustering. Problems: "data sparseness" & "semantic gap".

SLIDE 5

Related Work

Many methods have been proposed to improve the representation of standard text for clustering and classification, including "surface representation" [3,19] and "integrating world knowledge" [14]. Several clustering techniques have been employed to place search-engine snippets into highly relevant, topic-coherent groups [5,29]. World knowledge bases have been found useful in improving short text representation [1,23].

SLIDE 6

The General Framework

Fig: Framework for feature constructor

SLIDE 7

Hierarchical Resolution

“Jul 18, 2008 ... It is the best American film of the year so far and likely to remain that way. Christopher Nolan’s The Dark Knight is revelatory, visceral ...”

(Text
  (S July 18, 2008)
  (S ...)
  (S (NP (NP (NNP Christopher) (NNP Nolan) (POS 's))
         (NP (DT The) (NNP Dark) (NNP Knight)))
     (VP (VBZ is)
         (NP (NN revelatory) (JJ visceral)))))

Fig: Syntax tree of the snippet

SLIDE 8

Original Feature Extraction

Segment-level features. Phrase-level features.

Sentence 1: [NP July 18 2008]
Sentence 2: [NP It] [VP is] [NP the best American film] [PP of] [NP the year] [ADVP so far] and/CC [ADJP likely] [VP to remain] [NP that way]
Sentence 3: [NP Christopher Nolan 's] [NP The Dark Knight] [VP is] [NP revelatory visceral]

Word-level features.
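The chunked sentences above can be turned into phrase-level features mechanically. A minimal sketch, not from the paper — the bracket format is taken from the slide, but the function name and regex are illustrative:

```python
import re

def phrase_level_features(chunked_sentence):
    """Extract phrase-level features from a bracket-chunked sentence.

    Each chunk looks like "[NP the best American film]"; the chunk tag
    is dropped and the phrase text kept. Tokens outside brackets
    (e.g. "and/CC") are ignored at this level.
    """
    return [m.group(1).strip()
            for m in re.finditer(r"\[\w+ ([^\]]+)\]", chunked_sentence)]

s2 = ("[NP It] [VP is] [NP the best American film] [PP of] [NP the year] "
      "[ADVP so far] and/CC [ADJP likely] [VP to remain] [NP that way]")
print(phrase_level_features(s2))
# → ['It', 'is', 'the best American film', 'of', 'the year',
#    'so far', 'likely', 'to remain', 'that way']
```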

SLIDE 9

Feature Generation

Two steps:

1. Construction of basic features: seed phrases from internal semantics.
2. Generation of external features: features from world knowledge bases.

SLIDE 10

Seed phrases selection (I)

There are redundancies between phrase-level features and segment-level features. We propose to measure the semantic similarity between the two kinds of features to eliminate information redundancy. For Wikipedia, we download the XML corpus, remove the XML tags and create a Solr index of all articles.

SLIDE 11

Seed phrases selection (II)

Let P denote a segment-level feature, P = {p1, p2, ..., pn}. We calculate the semantic similarity between pi and the other features in {p1, p2, ..., pn} as InfoScore(pi). The p* which has the largest similarity with the other features in P is removed as the redundant feature.

SLIDE 12

Seed phrases selection (III)

Given two phrases pi and pj, variants of three popular co-occurrence measures [6] are defined as below:

$$\mathrm{WikiDice}(p_i, p_j) = \begin{cases} 0, & \text{if } f(p_i \mid p_j) = 0 \text{ or } f(p_j \mid p_i) = 0 \\[4pt] \dfrac{f(p_i \mid p_j) + f(p_j \mid p_i)}{f(p_i) + f(p_j)}, & \text{otherwise} \end{cases} \quad (1)$$

where WikiDice is a variant of the Dice coefficient.

$$\mathrm{WikiJaccard}(p_i, p_j) = \frac{\min\bigl(f(p_i \mid p_j), f(p_j \mid p_i)\bigr)}{f(p_i) + f(p_j) - \max\bigl(f(p_i \mid p_j), f(p_j \mid p_i)\bigr)}, \quad (2)$$

where WikiJaccard is a variant of the Jaccard coefficient.
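Reading f(p) as the number of index articles matching phrase p and f(pi | pj) as a conditional co-occurrence count (an assumed interpretation of the counts), Equations 1 and 2 can be sketched as:

```python
def wiki_dice(f_i, f_j, f_i_given_j, f_j_given_i):
    """Variant of the Dice coefficient (Eq. 1): zero when either
    conditional count is zero, otherwise the summed conditional
    counts over the total document frequencies."""
    if f_i_given_j == 0 or f_j_given_i == 0:
        return 0.0
    return (f_i_given_j + f_j_given_i) / (f_i + f_j)

def wiki_jaccard(f_i, f_j, f_i_given_j, f_j_given_i):
    """Variant of the Jaccard coefficient (Eq. 2)."""
    denom = f_i + f_j - max(f_i_given_j, f_j_given_i)
    return min(f_i_given_j, f_j_given_i) / denom if denom else 0.0
```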

SLIDE 13

Seed phrases selection (IV)

$$\mathrm{WikiOverlap}(p_i, p_j) = \frac{\min\bigl(f(p_i \mid p_j), f(p_j \mid p_i)\bigr)}{\min\bigl(f(p_i), f(p_j)\bigr)}, \quad (3)$$

where WikiOverlap is a variant of the Overlap (Simpson) coefficient. A linear normalization formula is defined below:

$$WD_{ij} = \frac{\mathrm{WikiDice}_{ij} - \min(\mathrm{WikiDice}_k)}{\max(\mathrm{WikiDice}_k) - \min(\mathrm{WikiDice}_k)}, \quad (4)$$

A linear combination is then used to incorporate the three similarity measures into an overall semantic similarity between two phrases pi and pj, as follows:

$$\mathrm{WikiSem}(p_i, p_j) = (1 - \alpha - \beta)\,WD_{ij} + \alpha\,WJ_{ij} + \beta\,WO_{ij}, \quad (5)$$

where α and β weight the importance of the three similarity measures.
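Equations 3–5 can be sketched as follows; the α and β values here are illustrative defaults, not the paper's tuned settings:

```python
def wiki_overlap(f_i, f_j, f_i_given_j, f_j_given_i):
    """Variant of the Overlap (Simpson) coefficient (Eq. 3)."""
    denom = min(f_i, f_j)
    return min(f_i_given_j, f_j_given_i) / denom if denom else 0.0

def linear_normalize(scores):
    """Min-max normalization of a list of pairwise scores (Eq. 4)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def wiki_sem(wd, wj, wo, alpha=0.3, beta=0.3):
    """Linear combination of the three normalized measures (Eq. 5)."""
    return (1 - alpha - beta) * wd + alpha * wj + beta * wo
```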

SLIDE 14

Seed phrases selection (V)

For each segment-level feature, we rank its child phrase-level features by the information score computed from the similarity defined in Equation 5:

$$\mathrm{InfoScore}(p_i) = \sum_{j=1,\, j \neq i}^{n} \mathrm{WikiSem}(p_i, p_j). \quad (6)$$

Finally, we remove the phrase-level feature p*, which contributes the most duplicated information to the segment-level feature P:

$$p^* = \arg\max_{p_i \in \{p_1, p_2, \ldots, p_n\}} \mathrm{InfoScore}(p_i). \quad (7)$$
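Equations 6 and 7 amount to dropping the phrase most similar to its siblings. A sketch, with a toy word-overlap similarity standing in for WikiSem:

```python
def info_score(i, phrases, sim):
    """Eq. 6: total similarity of phrase i to all other phrases."""
    return sum(sim(phrases[i], phrases[j])
               for j in range(len(phrases)) if j != i)

def remove_redundant(phrases, sim):
    """Eq. 7: drop the phrase carrying the most duplicated information."""
    scores = [info_score(i, phrases, sim) for i in range(len(phrases))]
    p_star = scores.index(max(scores))
    return [p for k, p in enumerate(phrases) if k != p_star]

# toy similarity: shared-word count, purely illustrative
toy_sim = lambda a, b: len(set(a.split()) & set(b.split()))
print(remove_redundant(["dark knight", "dark film", "the year"], toy_sim))
# → ['dark film', 'the year']
```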

SLIDE 15

Background Knowledge Bases

Wikipedia, as background knowledge, has wider knowledge coverage than WordNet and is regularly updated to reflect recent events. On the other hand, as the construction of WordNet follows a theoretical model and corpus evidence, it contains rich lexical semantic knowledge.

SLIDE 16

Feature Generator

Algorithm 1: GenerateFeatures(S)

input : a set S of seed phrases
output: external features EF

EF ← null
for seed phrase s ∈ S do
    if s.non-stop > 1 then
        if s ∈ Segment level then
            s.Query ← SolrSyntax(s, OR)
        else
            s.Query ← SolrSyntax(s, AND)
        WikiPages ← Retrieve(s.Query)
        EF ← EF + Analyze(WikiPages)
    else
        EF ← EF + WordNet.Synsets(s)
return EF

Fig: External feature generation scheme
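A Python sketch of Algorithm 1. The tiny stop-word list and the Solr/WordNet components are stand-ins (assumptions), wired as callables so the control flow matches the pseudocode:

```python
def generate_features(seeds, segment_level, solr_search, wordnet_synsets):
    """Sketch of Algorithm 1 (GenerateFeatures).

    seeds          : list of seed phrases
    segment_level  : set of seeds that are segment-level features
    solr_search    : callable(query) -> external features from Wikipedia
    wordnet_synsets: callable(word) -> synset-based features
    """
    ef = []
    for s in seeds:
        # tiny illustrative stop list; the paper's list is not given here
        non_stop = [w for w in s.split() if w.lower() not in {"the", "of", "a"}]
        if len(non_stop) > 1:                    # multi-word seed -> Wikipedia
            op = " OR " if s in segment_level else " AND "
            ef.extend(solr_search(op.join(non_stop)))
        else:                                    # single word -> WordNet
            ef.extend(wordnet_synsets(s))
    return ef
```

Segment-level seeds are queried with OR and the rest with AND; seeds with a single non-stop word fall back to WordNet synsets, as on the slide.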

SLIDE 17

Feature Selection (I)

Feature filtering for unstructured features:

• Remove features generated from a too-general seed phrase that returns a large number (more than 10,000) of articles from the index corpus.
• Transform features used for Wikipedia management or administration, e.g. "List of hotels" → "hotels", "List of twins" → "twins".
• Apply phrase sense stemming using the Porter stemmer [24], e.g. "fictional books" → "fiction book".
• Remove features related to chronology, e.g. "year", "decade" and "centuries".
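The first, second and fourth filtering rules can be sketched as below; the Porter-stemming step is omitted rather than re-implemented, and the threshold and helper names are illustrative:

```python
import re

CHRONOLOGY = {"year", "years", "decade", "decades", "century", "centuries"}

def filter_feature(feature, doc_freq, max_hits=10_000):
    """Sketch of the filtering rules for unstructured external features.

    Returns the cleaned feature, or None if it should be dropped.
    doc_freq is the number of index articles the generating seed matched.
    """
    if doc_freq > max_hits:                       # rule 1: too-general seed
        return None
    feature = re.sub(r"^List of ", "", feature)   # rule 2: admin pages
    if feature.lower() in CHRONOLOGY:             # rule 4: chronology
        return None
    return feature                                # rule 3 (stemming) omitted
```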

SLIDE 18

Feature Selection (II)

To avoid “curse of dimensionality”:

The number of external features we need to collect is determined by:

$$n_2 = n_1 \times \frac{\theta}{1 - \theta}. \quad (8)$$

Select one external feature for each seed phrase:

$$f_i^* = \arg\max_{f_{ij} \in \{f_{i1}, f_{i2}, \ldots, f_{ik}\}} \mathrm{tfidf}(f_{ij}). \quad (9)$$

The top n2 − m features are extracted from the remaining external features based on their frequency.

SLIDE 19

Data Sets (I)

Reuters-21578: We remove the texts which contain more than 50 words and filter out clusters with fewer than 5 texts or more than 500 texts. This leaves 19 clusters comprising 879 texts. The number of texts in each cluster ranges from 6 (the cluster "income") to 438 (the cluster "acq").
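The preprocessing just described — keep only texts of at most 50 words, then keep only clusters holding between 5 and 500 of them — can be sketched as follows (function and variable names are illustrative):

```python
def build_short_text_subset(texts, min_cluster=5, max_cluster=500,
                            max_words=50):
    """Sketch of the Reuters-21578 preprocessing: keep short texts only,
    then drop clusters that end up too small or too large.

    texts: list of (cluster_label, text) pairs
    """
    short = [(c, t) for c, t in texts if len(t.split()) <= max_words]
    counts = {}
    for c, _ in short:
        counts[c] = counts.get(c, 0) + 1
    return [(c, t) for c, t in short
            if min_cluster <= counts[c] <= max_cluster]
```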

SLIDE 20

Data Sets (II)

The Web Dataset is built to simulate a real web application. As users' interests are varied, we choose queries of different lengths according to the statistics of Google Trends during Nov. 26th 2007 – Nov. 25th 2008.

Query length   One      Two      Three    More
Count          4552     19762    6992     5290
Percentage     12.4%    54.0%    19.1%    14.5%

Ten hot queries are selected.

Tab: The selected hot queries in Web Dataset

NFL · Amazing Grace · Green Bay · Fox News Channel · 60 Minutes · New York Giants · Total Eclipse · The Dark Knight · Black Friday · National Economic Council

SLIDE 21

Clustering Methods and Evaluation Criteria

K-means and EM are employed in this study. Six different text representation methods are compared, as defined below:

• BOW (baseline 1): traditional "bag of words" model with the tf-idf weighting schema.
• BOW+WN (baseline 2): BOW integrated with additional features from WordNet, as presented in [14].
• BOW+Wiki (baseline 3): BOW integrated with additional features from Wikipedia, as presented in [1].
• BOW+Know (baseline 4): BOW integrated with additional features from both Wikipedia and WordNet, as in baselines 2 and 3.
• BOF: the bag of original features extracted with the hierarchical view.
• SemKnow: our proposed framework.

We evaluate the performance of the methods using the F1-measure and Average Accuracy.
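The slide does not spell out which F1-measure variant is used; one common choice for clustering evaluation is the pairwise F1, sketched here under that assumption:

```python
from itertools import combinations

def pairwise_f1(pred, truth):
    """Pairwise clustering F1: a pair of items counts as a positive
    when a clustering places both items in the same cluster.

    pred, truth: lists of cluster ids, aligned by item index.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred)), 2):
        same_p = pred[i] == pred[j]
        same_t = truth[i] == truth[j]
        tp += same_p and same_t
        fp += same_p and not same_t
        fn += same_t and not same_p
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)
```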

SLIDE 22

Performance Evaluation

Tab: Results using k-means algorithm

Method      Reuters-21578                          Web Dataset
            F1 (Impr)        AveAcc (Impr)         F1 (Impr)         AveAcc (Impr)
BOW         0.471 (N.A.)     0.550 (N.A.)          0.491 (N.A.)      0.563 (N.A.)
BOW+WN      0.473 (+0.43%)   0.552 (+0.26%)        0.530 (+8.01%)    0.576 (+2.30%)
BOW+Wiki    0.481 (+2.03%)   0.563 (+2.18%)        0.556 (+13.38%)   0.584 (+3.85%)
BOW+Know    0.489 (+3.75%)   0.566 (+2.86%)        0.558 (+13.79%)   0.583 (+3.70%)
BOF         0.473 (+0.33%)   0.551 (+0.19%)        0.520 (+5.95%)    0.570 (+1.24%)
SemKnow     0.497 (+5.41%)   0.572 (+3.98%)        0.583 (+18.81%)   0.586 (+4.11%)

Tab: Results using EM algorithm

Method      Reuters-21578                          Web Dataset
            F1 (Impr)        AveAcc (Impr)         F1 (Impr)         AveAcc (Impr)
BOW         0.516 (N.A.)     0.579 (N.A.)          0.521 (N.A.)      0.608 (N.A.)
BOW+WN      0.525 (+1.72%)   0.585 (+0.99%)        0.540 (+3.59%)    0.626 (+3.02%)
BOW+Wiki    0.540 (+4.74%)   0.598 (+3.39%)        0.550 (+5.50%)    0.629 (+3.44%)
BOW+Know    0.542 (+5.13%)   0.607 (+4.54%)        0.556 (+6.74%)    0.635 (+4.41%)
BOF         0.520 (+0.82%)   0.594 (+2.63%)        0.536 (+2.73%)    0.624 (+2.55%)
SemKnow     0.548 (+6.28%)   0.622 (+7.51%)        0.569 (+9.07%)    0.670 (+10.20%)

SLIDE 23

Effect of External Features

Impact of the parameter θ on the Reuters and Web datasets, using k-means and EM respectively.

SLIDE 24

Optimal Results

Tab: Optimal results using two algorithms

k-means         Reuters                           Web Dataset
                BOW            Optimal            BOW            Optimal
F1 (Impr)       0.471 (N.A.)   0.530 (+12.35%)    0.491 (N.A.)   0.640 (+30.39%)
AveAcc (Impr)   0.550 (N.A.)   0.604 (+9.72%)     0.563 (N.A.)   0.607 (+7.83%)

EM              Reuters                           Web Dataset
                BOW            Optimal            BOW            Optimal
F1 (Impr)       0.516 (N.A.)   0.578 (+12.02%)    0.521 (N.A.)   0.602 (+16.14%)
AveAcc (Impr)   0.579 (N.A.)   0.672 (+15.40%)    0.608 (N.A.)   0.709 (+16.56%)

SLIDE 25

Conclusion

In this study, we proposed a novel framework to improve the clustering accuracy of short texts by exploiting internal and external semantics. The combination of internal and external semantics tackles the problems of data sparseness and semantic gap in short texts. Empirical evaluations demonstrated that our framework significantly outperformed all the baselines, including previously proposed knowledge-based short-text clustering methods, on two datasets.

SLIDE 26

Future Work

As this work targets aggregated search, the efficiency of the whole framework should be optimized for real applications. Moreover, we will explore more NLP and information retrieval tasks using the internal and external semantics generated by our proposed framework.

SLIDE 27

Thank you!