Automatic construction of distributional thesaurus (for multiple languages)

1. Automatic construction of distributional thesaurus (for multiple languages). Zheng ZHANG, 1st year PhD student, ILES, LIMSI. 28/03/2017

2. What is a distributional thesaurus? • For a given input word, a distributional thesaurus identifies semantically similar words, based on the assumption that they share a similar distribution. • Distributional assumption: in practice, two words are considered similar if their occurrences share similar contexts. Ref. Vincent Claveau, Ewa Kijak. Distributional Thesauri for Information Retrieval and Vice Versa.
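
The talk does not give an implementation; purely as an illustration, here is a minimal Python sketch of the distributional assumption: build a context vector for each word from windowed co-occurrence counts, then rank candidate thesaurus entries by cosine similarity. All function names below are mine.

```python
# Sketch: distributional similarity from windowed co-occurrence counts.
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(tokens, window=2):
    """Map each word to a bag of the words co-occurring within +/- window."""
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vectors[w][tokens[j]] += 1
    return vectors

def cosine(u, v):
    num = sum(u[w] * v[w] for w in set(u) & set(v))
    den = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return num / den if den else 0.0

def nearest(word, vectors, k=5):
    """Thesaurus lookup: rank other words by context similarity to `word`."""
    return sorted(((cosine(vectors[word], v), w)
                   for w, v in vectors.items() if w != word), reverse=True)[:k]
```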

3. Why do we need it? • It is useful for alleviating data sparseness in many NLP applications. • It is useful for completing lexical resources. Ref. Enrique Henestroza Anguiano, Pascal Denis. FreDist: Automatic Construction of Distributional Thesauri for French.

4. Contexts • These contexts are typically co-occurring words within a limited window around the considered word, or syntactically linked words. Ref. http://nlp.stanford.edu:8080/corenlp/process
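
The slide points to Stanford CoreNLP for the syntactic variant. As a hedged sketch of the same idea, dependency-linked (word, context) pairs can be extracted with spaCy instead; this assumes the en_core_web_sm model is installed, and the function name is mine.

```python
# Sketch: syntactically linked contexts via a dependency parse.
# spaCy stands in here for the CoreNLP demo the slide links to.
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_contexts(sentence):
    """Yield (word, context) pairs where the context is a syntactic neighbour."""
    for token in nlp(sentence):
        for child in token.children:
            yield token.lemma_, f"{child.dep_}:{child.lemma_}"
        if token.head is not token:  # the root is its own head
            yield token.lemma_, f"{token.dep_}^-1:{token.head.lemma_}"
```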

5. A new context: graph-of-words • A graph whose vertices represent the unique terms of the document and whose edges represent co-occurrences between terms within a fixed-size sliding window. • Example: “This is an example about how to generate a graph.” (window size = 4) Ref. Rousseau F., Vazirgiannis M. (2015). Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction. https://safetyapp.shinyapps.io/GoWvis/
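
A minimal networkx sketch of this construction, assuming a window of size w links each term to the w-1 terms that follow it (the slide leaves this detail open):

```python
# Sketch: graph-of-words for the slide's example sentence (window size 4).
# Edge weights count how often two terms co-occur within the window.
import networkx as nx

def graph_of_words(tokens, window=4):
    g = nx.Graph()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if g.has_edge(w, tokens[j]):
                g[w][tokens[j]]["weight"] += 1
            else:
                g.add_edge(w, tokens[j], weight=1)
    return g

g = graph_of_words("this is an example about how to generate a graph".split())
print(g.number_of_nodes(), g.number_of_edges())
```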

6. Graph attributes: k-core • A subgraph H_k = (V′, E′), induced by the subset of vertices V′ ⊆ V (and a fortiori by the subset of edges E′ ⊆ E), is called a k-core, or a core of order k, iff ∀v ∈ V′, deg_H_k(v) ≥ k and H_k is the maximal subgraph with this property, i.e. it cannot be augmented without losing it. • In other words, the k-core of a graph corresponds to the maximal connected subgraph whose vertices are all of degree at least k within the subgraph. Ref. Rousseau F., Vazirgiannis M. (2015). Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction. Michalis Vazirgiannis. Text Mining: An Introduction. 2017 Data Science Winter School, Beijing, China.
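
Since networkx ships k-core extraction, the main-core retention step of Rousseau & Vazirgiannis (2015) can be sketched in a few lines (the helper name is mine):

```python
# Sketch: keep only the main core (the core of largest order k) of a
# graph-of-words; its vertices serve as the extracted keywords.
import networkx as nx

def main_core_keywords(g):
    g = g.copy()
    g.remove_edges_from(nx.selfloop_edges(g))  # k_core rejects self-loops
    k = max(nx.core_number(g).values())        # order of the innermost core
    return k, sorted(nx.k_core(g).nodes())     # k_core defaults to the main core
```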

7. Why might graph-of-words be a good choice? • Graph-of-words takes into account word co-occurrence and, optionally, word order (compared with bag-of-words). • K-core: within one core, all vertices contribute equally to the subgraph (compared with the centrality measures used by PageRank & HITS). • K-cores are adaptive. • The main core has been shown to perform well in information retrieval. Ref. Rousseau F., Vazirgiannis M. (2015). Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction.
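
To make the contrast with centrality concrete, a small hypothetical comparison (any graph would do): core numbers group many vertices onto the same few levels, whereas PageRank gives each vertex its own score.

```python
# Sketch: discrete core levels vs. per-vertex centrality scores.
import networkx as nx

g = nx.karate_club_graph()          # toy graph that ships with networkx
cores = nx.core_number(g)           # vertex -> order of its innermost core
ranks = nx.pagerank(g)              # vertex -> centrality score
print(sorted(set(cores.values())))  # a handful of discrete core levels
print(len(set(ranks.values())))     # close to one distinct score per vertex
```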

8. Difficulty: optimization for big data • Texts: multiprocessing • Encode each text chunk with local ids • Merge the local id-word dictionaries into a universal id-word dictionary • Re-encode the local texts with the universal ids • “MapReduce-like” multiprocessing to prepare the edge files (see the sketch below) • Example: “This is an example about how to generate a graph.” (window size = 2) • Edges of window size n = edges of distance 2 + … + edges of distance n
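
Reading the slide's "distance d" as a pair of words spanning d tokens (so distance 2 means adjacent words), the decomposition lets each distance be handled by an independent worker and the resulting edge files merged afterwards. A hedged multiprocessing sketch; the function names are mine:

```python
# Sketch: the edge list for window size n is the union of the edge lists
# for pair distances 2..n, so each distance can go to a separate worker.
from multiprocessing import Pool

def edges_at_distance(args):
    tokens, d = args
    # Pairs whose positions span exactly d tokens (distance 2 = adjacent).
    return [(tokens[i], tokens[i + d - 1]) for i in range(len(tokens) - d + 1)]

def edges_for_window(tokens, n):
    with Pool() as pool:
        parts = pool.map(edges_at_distance, [(tokens, d) for d in range(2, n + 1)])
    return [e for part in parts for e in part]

if __name__ == "__main__":  # guard needed on spawn-based platforms
    toks = "this is an example about how to generate a graph".split()
    print(edges_for_window(toks, 4))
```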

9. Multiple languages (ideas) • Use a small bilingual dictionary to generate a mixed-language text (see the sketch below) • Find common graph patterns across multiple languages Ref. Stephan Gouws, Anders Søgaard. Simple Task-Specific Bilingual Word Embeddings.
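
A toy sketch of the first idea, in the spirit of Gouws & Søgaard: randomly substitute translations from a small bilingual lexicon so that words from both languages share contexts in a single corpus. The dictionary and replacement rate below are invented for illustration.

```python
# Sketch: splice dictionary translations into a monolingual token stream.
import random

def mix_text(tokens, bilingual_dict, rate=0.5, seed=0):
    rng = random.Random(seed)
    # Each in-dictionary word is translated with probability `rate`.
    return [bilingual_dict.get(w, w) if rng.random() < rate else w
            for w in tokens]

fr_en = {"exemple": "example", "graphe": "graph"}  # toy dictionary
print(mix_text("ceci est un exemple de graphe".split(), fr_en))
```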

10. Future work • word2vec: a graph-of-words (GoW) model architecture • Using graph-of-words for other tasks (e.g. identifying parallel sentences in comparable corpora, the BUCC 2017 shared task) • From distributional thesaurus to semantic classes

11. Merci
