Induction and embedding of linguistic structures from text

Alexander Panchenko, "Induction and embedding of linguistic structures from text", November 7, 2018. Overview. Making induced senses interpretable [Panchenko et al.,


  1. Inducing word sense representations: sense embeddings using retrofitting. Word Sense Disambiguation: (1) context extraction: use the context words around the target word; (2) context filtering: based on each context word's relevance for disambiguation; (3) sense choice in context: maximise the similarity between a context vector and a sense vector.
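
A minimal Python sketch of the three disambiguation steps, under assumed toy inputs: the 2-d word and sense vectors, the window size, and the filtering rule (keep the context words whose similarity differs most across the candidate senses) are hypothetical choices for illustration, not the exact procedure behind the slide.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def disambiguate(tokens, target_index, word_vectors, sense_vectors, window=3, top_n=2):
    # 1) Context extraction: known words within `window` positions of the target.
    start = max(0, target_index - window)
    context = [
        w for i, w in enumerate(tokens[start:target_index + window + 1], start=start)
        if i != target_index and w in word_vectors
    ]
    if not context:
        return None

    # 2) Context filtering (assumed rule): keep the words whose similarity to the
    #    candidate senses varies most, i.e. the most discriminative ones.
    def relevance(w):
        sims = [cosine(word_vectors[w], s) for s in sense_vectors.values()]
        return max(sims) - min(sims)

    context = sorted(context, key=relevance, reverse=True)[:top_n]

    # 3) Sense choice: the sense vector most similar to the mean context vector.
    ctx = mean_vector([word_vectors[w] for w in context])
    return max(sense_vectors, key=lambda sense: cosine(ctx, sense_vectors[sense]))


# Hypothetical 2-d toy vectors: dimension 0 ~ "river", dimension 1 ~ "money".
WORDS = {"river": [1.0, 0.0], "water": [0.9, 0.1], "money": [0.0, 1.0],
         "deposit": [0.1, 0.9], "the": [0.5, 0.5]}
SENSES = {"bank#0": [1.0, 0.1], "bank#1": [0.1, 1.0]}

print(disambiguate("the river bank had water".split(), 2, WORDS, SENSES))   # bank#0
print(disambiguate("deposit money in the bank".split(), 4, WORDS, SENSES))  # bank#1
```

The filtering step drops words such as "the" that are equally similar to every sense and therefore carry no disambiguation signal.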

  7. Inducing word sense representations: evaluation, key results. Word Sense Induction task [Pelevina et al., 2016]: SemEval'13 dataset; performs comparably to the state of the art (as of 2016), including neural models. Semantic Similarity task [Remus & Biemann, 2018]: SimLex, WordSim353, MEN and other datasets; improves the results compared to the original word embeddings across different models (GloVe, word2vec, …).

  8. Word sense induction

  11. Word sense induction: a lexical sample WSI task. Target word, e.g. "bank". Contexts where the word occurs, e.g.: "river bank is a slope beside a body of water"; "bank is a financial institution that accepts deposits"; "Oh, the bank was robbed. They took about a million dollars."; "bank of Elbe is a good and popular hangout spot complete with good food and fun". You need to group the contexts by senses: sense 1: "river bank is a slope beside a body of water", "bank of Elbe is a good and popular hangout spot complete with good food and fun"; sense 2: "bank is a financial institution that accepts deposits", "Oh, the bank was robbed. They took about a million dollars."

  14. Word sense induction: sense induction using clustering. Pipeline: text contexts → representation of each context in a vector space → clustering of the contexts in the vector space of a word → context clusters corresponding to word senses. Representation: sparse vector model (TF-IDF, etc.); weighted (TF-IDF, χ², etc.) sum of word embeddings; sentence embeddings (InferSent, Skip-Thoughts, doc2vec, etc.). Clustering: Affinity Propagation, agglomerative clustering, k-means.
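
The clustering pipeline can be sketched in a few lines of Python. This assumes the simplest representation from the slide's list (an unweighted average of word embeddings rather than a TF-IDF- or χ²-weighted sum) and plain k-means with a deterministic farthest-first initialisation; the word vectors are hypothetical, and k must be chosen in advance and must not exceed the number of contexts.

```python
import math
import re


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def embed_context(text, word_vectors):
    """Unweighted average of the embeddings of the known words in a context."""
    known = [word_vectors[w] for w in re.findall(r"[a-z]+", text.lower()) if w in word_vectors]
    return mean_vector(known) if known else None


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


def induce_senses(contexts, word_vectors, k):
    """Group the contexts of one target word into k clusters (= induced senses)."""
    embedded = [(i, embed_context(c, word_vectors)) for i, c in enumerate(contexts)]
    embedded = [(i, v) for i, v in embedded if v is not None]
    labels = kmeans([v for _, v in embedded], k)
    senses = {}
    for (i, _), label in zip(embedded, labels):
        senses.setdefault(label, []).append(contexts[i])
    return list(senses.values())


# Hypothetical 2-d vectors: dimension 0 ~ "riverside", dimension 1 ~ "finance".
VECTORS = {"river": [1.0, 0.0], "slope": [0.9, 0.2], "water": [0.95, 0.1],
           "elbe": [0.8, 0.1], "financial": [0.0, 1.0], "deposits": [0.1, 0.9],
           "robbed": [0.2, 0.8], "dollars": [0.1, 1.0]}
CONTEXTS = [
    "river bank is a slope beside a body of water",
    "bank is a financial institution that accepts deposits",
    "Oh, the bank was robbed. They took about a million dollars.",
    "bank of Elbe is a good and popular hangout spot complete with good food and fun",
]
for sense in induce_senses(CONTEXTS, VECTORS, k=2):
    print(sense)
```

Swapping `embed_context` for a sparse TF-IDF vector or a sentence encoder changes only the representation step; the clustering step is untouched.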

  15. Word sense induction: sense induction using neighbors. (1) Get the neighbors of a target word, e.g. "bank": lender, river, citybank, slope, … (2) Get the words similar to "bank" and dissimilar to "lender": river, slope, land, … (3) Compute distances to "lender" and "river".
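
Steps 1 and 2 can be illustrated with cosine nearest neighbours and a vector-offset query, unit(bank) − unit(lender), similar in spirit to the positive/negative queries offered by word2vec toolkits. The vectors below are hypothetical, and the offset formulation is an assumption about how "similar to bank, dissimilar to lender" is computed.

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def neighbours(word, vectors, topn=5):
    """Step 1: nearest neighbours of `word` by cosine similarity."""
    candidates = [w for w in vectors if w != word]
    return sorted(candidates, key=lambda w: cosine(vectors[w], vectors[word]), reverse=True)[:topn]


def similar_but_not(word, opposite, vectors, topn=5):
    """Step 2: words close to `word` but far from `opposite` (vector-offset query)."""
    query = [a - b for a, b in zip(unit(vectors[word]), unit(vectors[opposite]))]
    candidates = [w for w in vectors if w not in (word, opposite)]
    return sorted(candidates, key=lambda w: cosine(vectors[w], query), reverse=True)[:topn]


# Hypothetical 2-d vectors: dimension 0 ~ "geography", dimension 1 ~ "finance".
VECTORS = {
    "bank": [0.7, 0.7], "lender": [0.1, 1.0], "citybank": [0.2, 0.95],
    "river": [1.0, 0.1], "slope": [0.9, 0.2], "land": [0.8, 0.3],
}
print(neighbours("bank", VECTORS))
print(similar_but_not("bank", "lender", VECTORS, topn=3))  # ['river', 'slope', 'land']
```

Subtracting the "lender" direction removes the financial component of "bank", so the remaining neighbours come from the other sense.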

  19. Word sense induction: graph-vector sense induction. (1) For the i-th neighbor of the target word w among its k neighbors: get a pair of opposite words for the i-th neighbor, (w_j, w_k); add them as nodes: V = V ∪ {w_j, w_k}; remember the pair as an anti-edge: A = A ∪ {(w_j, w_k)}. (2) Build an ego network G = (V, E) of the word w: E is computed from word similarities and then pruned with the anti-edge constraints: E = E ∖ A. (3) Cluster the ego network of the word w. (4) Find cluster labels by finding the central nodes in each cluster.
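
The four steps can be sketched as follows, assuming the opposite pairs have already been found (e.g. with a "similar to the target, dissimilar to the neighbour" query) and reusing the nodes and several anti-edges of the "java" example. The vectors, the 0.5 similarity threshold, connected components in place of a proper graph clustering algorithm, and weighted degree as the centrality measure are all illustrative assumptions. Python is deliberately given a small hypothetical "coffee" component so that the anti-edges have spurious links to prune.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def graph_vector_senses(opposite_pairs, vectors, threshold=0.5, prune=True):
    # Step 1: every opposite pair contributes two nodes (V) and one anti-edge (A).
    nodes = sorted({w for pair in opposite_pairs for w in pair})
    anti_edges = {frozenset(pair) for pair in opposite_pairs}

    # Step 2: similarity edges (E), then E = E \ A when pruning is on.
    weights = {}
    for a, b in combinations(nodes, 2):
        edge = frozenset((a, b))
        sim = cosine(vectors[a], vectors[b])
        if sim >= threshold and not (prune and edge in anti_edges):
            weights[edge] = sim
    adjacency = {n: set() for n in nodes}
    for edge in weights:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Step 3: cluster the ego network; connected components stand in for a
    # real graph clustering algorithm here.
    seen, senses = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], []
        while stack:
            node = stack.pop()
            cluster.append(node)
            for other in adjacency[node] - seen:
                seen.add(other)
                stack.append(other)
        # Step 4: label the cluster with its most central node (weighted degree).
        label = max(cluster, key=lambda n: sum(weights[frozenset((n, m))] for m in adjacency[n]))
        senses.append((label, sorted(cluster)))
    return senses


# Hypothetical 3-d vectors: dimensions ~ programming, island, coffee.
VECTORS = {
    "Python": [0.8, 0.0, 0.6], "Ruby": [0.9, 0.1, 0.0], "C++": [1.0, 0.0, 0.0],
    "Borneo": [0.0, 1.0, 0.0], "Arabica": [0.0, 0.0, 1.0], "Robusta": [0.0, 0.1, 0.9],
}
PAIRS = [("Python", "Borneo"), ("C++", "Borneo"), ("Arabica", "Python"),
         ("Robusta", "Python"), ("Ruby", "Arabica")]
for label, members in graph_vector_senses(PAIRS, VECTORS):
    print(label, members)
```

With `prune=False` the programming and coffee senses collapse into a single component, which is exactly what the anti-edge constraint E = E ∖ A prevents.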

  20. Word sense induction: graph-vector sense induction. Get the neighbors of a target word, e.g. "java": Python, Borneo, C++, Sumatra, Arabica, Robusta, Ruby, JavaScript, Bali, …

  21. Word sense induction: graph-vector sense induction. Get the neighbors of a target word, e.g. "java", each paired with its opposite: Python ≠ Borneo, Borneo ≠ Scala, C++ ≠ Borneo, Sumatra ≠ highway, Arabica ≠ Python, Robusta ≠ Python, Ruby ≠ Arabica, Bali ≠ North.

  22. Word sense induction: graph-vector sense induction. Nodes: Python, Borneo, C++, Arabica, Robusta, Ruby.

  23. Word sense induction: sense induction example

  28. Word sense induction: datasets. (1) SemEval 2007; (2) SemEval 2010; (3) RUSSE 2018; (4) SemEval 2019 Task 2, Subtask 1: clustering of verb occurrences. Assign occurrences of the target verbs to a number of clusters, in such a way that verbs belonging to the same cluster evoke the same frame type. Gold annotations for this subtask are based on FrameNet. Examples: "Trump leads the world, backward."; "Disrespecting international laws leads to many complications."; "Rosenzweig heads the climate impacts section at NASA's Goddard Institute."

  30. Word sense induction: semantic roles. Semantic frame "Abandonment" from FrameNet.

  31. Word sense induction: semantic classes. A semantic class contains words that share a semantic feature. Examples of concrete semantic classes: people, plants, animals, materials, programming languages. Examples of abstract semantic classes: qualities, actions, processes.

  32. Word sense induction: sample of induced sense inventory.
      Word sense | Local sense cluster: related senses | Hypernyms
      mango#0 | peach#1, grape#0, plum#0, apple#0, apricot#0, watermelon#1, banana#1, coconut#0, pear#0, fig#0, melon#0, mangosteen#0, … | fruit#0, food#0, …
      apple#0 | mango#0, pineapple#0, banana#1, melon#0, grape#0, peach#1, watermelon#1, apricot#0, cranberry#0, pumpkin#0, mangosteen#0, … | fruit#0, crop#0, …
      Java#1 | C#4, Python#3, Apache#3, Ruby#6, Flash#1, C++#0, SQL#0, ASP#2, Visual Basic#1, CSS#0, Delphi#2, MySQL#0, Excel#0, Pascal#0, … | programming language#3, language#0, …
      Python#3 | PHP#0, Pascal#0, Java#1, SQL#0, Visual Basic#1, C++#0, JavaScript#0, Apache#3, Haskell#5, .NET#1, C#4, SQL Server#0, … | language#0, technology#0, …

  33. Word sense induction: sample of induced semantic classes.
      ID | Global sense cluster: semantic class | Hypernyms
      1 | peach#1, banana#1, pineapple#0, berry#0, blackberry#0, grapefruit#0, strawberry#0, blueberry#0, mango#0, grape#0, melon#0, orange#0, pear#0, plum#0, raspberry#0, watermelon#0, apple#0, apricot#0, watermelon#0, pumpkin#0, berry#0, mangosteen#0, … | vegetable#0, fruit#0, crop#0, ingredient#0, food#0, …
      2 | C#4, Basic#2, Haskell#5, Flash#1, Java#1, Pascal#0, Ruby#6, PHP#0, Ada#1, Oracle#3, Python#3, Apache#3, Visual Basic#1, ASP#2, Delphi#2, SQL Server#0, CSS#0, AJAX#0, JavaScript#0, SQL Server#0, Apache#3, Delphi#2, Haskell#5, .NET#1, CSS#0, … | programming language#3, technology#0, language#0, format#2, app#0

  34. Word sense induction: induction of semantic classes. Pipeline: text corpus → word sense induction from the text corpus → induced word senses → representing senses with ego networks → sense ego-networks → sense graph construction → global sense graph → clustering of word senses → global sense clusters → labeling sense clusters with hypernyms (input: noisy hypernyms) → semantic classes and cleansed hypernyms.

  35. Word sense induction: induction of sense semantic classes. Filtering noisy hypernyms with semantic classes, LREC'18 [Panchenko et al., 2018]. [Figure: a sense cluster (pear#0, mangosteen#0, apple#2, mango#0) linked to hypernyms (city#2, fruit#1, food#0); the hypernym links are marked as added, removed, missing, or wrong.]
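
One simple way to use a sense cluster for cleaning hypernyms is a majority vote: hypernyms shared by enough senses in the cluster are propagated to all of them (filling in missing ones), and the rest are dropped (removing wrong ones). The sketch below assumes exactly that rule with a hypothetical 50% threshold and hypothetical noisy input built from the words in the figure; it illustrates the idea and is not the LREC'18 procedure itself.

```python
from collections import Counter


def label_sense_cluster(cluster, hypernyms, min_share=0.5):
    """Hypernyms shared by at least `min_share` of the senses in the cluster."""
    counts = Counter(h for sense in cluster for h in set(hypernyms.get(sense, ())))
    return {h for h, c in counts.items() if c / len(cluster) >= min_share}


def filter_hypernyms(cluster, hypernyms, min_share=0.5):
    """Give every sense the cluster labels and report what changed per sense."""
    labels = label_sense_cluster(cluster, hypernyms, min_share)
    changes = {}
    for sense in cluster:
        old = set(hypernyms.get(sense, ()))
        changes[sense] = {"added": labels - old, "removed": old - labels}
    return labels, changes


CLUSTER = ["pear#0", "mangosteen#0", "apple#2", "mango#0"]
NOISY = {  # hypothetical noisy hypernym database
    "pear#0": {"fruit#1", "food#0"},
    "mangosteen#0": {"fruit#1"},
    "apple#2": {"fruit#1", "food#0"},
    "mango#0": {"fruit#1", "food#0", "city#2"},
}
labels, changes = filter_hypernyms(CLUSTER, NOISY)
print(sorted(labels))                    # ['food#0', 'fruit#1']
print(changes["mangosteen#0"]["added"])  # {'food#0'}
print(changes["mango#0"]["removed"])     # {'city#2'}
```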

  36. Word sense induction: global sense clustering. http://panchenko.me/data/joint/nodes20000-layers7

  38. Word sense induction: induction of sense semantic classes. Filtering of a noisy hypernymy database with semantic classes, LREC'18 [Panchenko et al., 2018].
      Method | Precision | Recall | F-score
      Original Hypernyms (Seitner et al., 2016) | 0.475 | 0.546 | 0.508
      Semantic Classes (coarse-grained) | 0.541 | 0.679 | 0.602
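
The F-score column is the harmonic mean F = 2PR / (P + R) of the two preceding columns; recomputing it is a quick consistency check on the table.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


ROWS = {
    "Original Hypernyms (Seitner et al., 2016)": (0.475, 0.546),
    "Semantic Classes (coarse-grained)": (0.541, 0.679),
}
for name, (p, r) in ROWS.items():
    print(f"{name}: F1 = {f_score(p, r):.3f}")  # 0.508 and 0.602
```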

  39. Making induced senses interpretable

  40. Making induced senses interpretable: knowledge-based sense representations are interpretable.

  41. Making induced senses interpretable: most knowledge-free sense representations are uninterpretable.

  43. Making induced senses interpretable: hypernymy prediction in context, EMNLP'17 [Panchenko et al., 2017b].

  44. Induction of semantic frames

  45. Induction of semantic frames: FrameNet frame "Kidnapping".

  46. Induction of semantic frames: frame induction as triclustering, ACL'2018 [Ustalov et al., 2018a]. Example of an LU tricluster corresponding to the "Kidnapping" frame from FrameNet:
      FrameNet | Role | Lexical units (LU)
      Perpetrator | Subject | kidnapper, alien, militant
      FEE | Verb | snatch, kidnap, abduct
      Victim | Object | son, people, soldier, child

  47. Induction of semantic frames: SVO triple elements

  48. Induction of semantic frames: an SVO triple graph. [Figure: a graph whose nodes are subject|verb|object triples such as mayor|lead|city, Governor|lead|state, General|command|Department, President|chair|committee, director|head|agency, officer|head|team and chairman|run|committee.]

  51. Induction of semantic frames: Triframes frame induction.
      Input: an embedding model v ∈ V → v⃗ ∈ ℝ^d, a set of SVO triples T ⊆ V³, the number of nearest neighbors k ∈ ℕ, a graph clustering algorithm Cluster.
      Output: a set of triframes F.
      S ← {t → t⃗ ∈ ℝ^{3d} : t ∈ T}
      E ← {(t, t′) ∈ T² : t′ ∈ NN_k^S(t⃗), t ≠ t′}
      F ← ∅
      for all C ∈ Cluster(T, E) do
          f_s ← {s ∈ V : (s, v, o) ∈ C}
          f_v ← {v ∈ V : (s, v, o) ∈ C}
          f_o ← {o ∈ V : (s, v, o) ∈ C}
          F ← F ∪ {(f_s, f_v, f_o)}
      return F
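
The pseudocode translates almost line by line into Python. In this sketch the embedding model is a hypothetical dictionary of 2-d vectors, nearest neighbours use cosine similarity, and connected components stand in for the pluggable Cluster algorithm; with k = 1 each triple links only to its single nearest neighbour.

```python
import math
from collections import defaultdict


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def connected_components(nodes, edges):
    """Placeholder for Cluster(T, E); any graph clustering algorithm fits here."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for other in adjacency[node] - seen:
                seen.add(other)
                stack.append(other)
        clusters.append(component)
    return clusters


def triframes(triples, embedding, k=1, cluster=connected_components):
    # S: embed each (s, v, o) triple as the concatenation [s; v; o] in R^{3d}.
    S = {t: embedding[t[0]] + embedding[t[1]] + embedding[t[2]] for t in triples}
    # E: connect each triple t to its k nearest neighbours t' != t in S.
    edges = set()
    for t in triples:
        others = [u for u in triples if u != t]
        others.sort(key=lambda u: cosine(S[t], S[u]), reverse=True)
        edges.update((t, u) for u in others[:k])
    # F: one triframe (subjects, verbs, objects) per cluster of triples.
    frames = []
    for C in cluster(triples, edges):
        f_s = {s for s, _, _ in C}
        f_v = {v for _, v, _ in C}
        f_o = {o for _, _, o in C}
        frames.append((f_s, f_v, f_o))
    return frames


EMB = {  # hypothetical 2-d vectors: dimension 0 ~ "leading", 1 ~ "buying"
    "mayor": [1.0, 0.0], "president": [1.0, 0.1], "chairman": [0.9, 0.1],
    "lead": [1.0, 0.0], "chair": [0.9, 0.2],
    "city": [1.0, 0.1], "state": [0.9, 0.1], "committee": [1.0, 0.2],
    "company": [0.0, 1.0], "firm": [0.1, 1.0], "buy": [0.0, 1.0],
    "purchase": [0.1, 0.9], "land": [0.1, 1.0], "share": [0.0, 1.0],
}
TRIPLES = [("mayor", "lead", "city"), ("president", "lead", "state"),
           ("chairman", "chair", "committee"), ("company", "buy", "land"),
           ("firm", "purchase", "share")]
for frame in triframes(TRIPLES, EMB):
    print(frame)
```

Because each triple is embedded as one 3d-dimensional vector, the subject, verb and object slots are clustered jointly rather than independently.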

  52. Induction of semantic frames: example of an extracted frame. Frame #848. Subjects: Company, firm, company. Verbs: buy, supply, discharge, purchase, expect. Objects: book, supply, house, land, share, company, grain, which, item, product, ticket, work, this, equipment, House, it, film, water, something, she, what, service, plant, time.

  53. Induction of semantic frames: example of an extracted frame. Frame #849. Subjects: student, scientist, we, pupil, member, company, man, nobody, you, they, US, group, it, people, Man, user, he. Verbs: do, test, perform, execute, conduct. Objects: experiment, test.

  54. Induction of semantic frames: example of an extracted frame. Frame #3207. Subjects: people, we, they, you. Verbs: feel, seek, look, search. Objects: housing, inspiration, gold, witness, partner, accommodation, Partner.

  57. Induction of semantic frames: evaluation settings. Evaluation datasets:
      Dataset | # instances | # unique | # clusters
      FrameNet Triples | 99,744 | 94,170 | 383
      Poly. Verb Classes | 246 | 110 | 62
      Quality measures: nmPU, normalized modified purity; niPU, normalized inverse purity.
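
For reference, plain purity and inverse purity (and their harmonic mean) can be computed as below. These are the unnormalized, unmodified textbook versions, assumed here for illustration; the normalized modified variants used in the evaluation differ in details (for example, how singleton clusters are counted) that this sketch does not reproduce.

```python
def purity(clusters, gold):
    """(1/N) * sum over clusters of the largest overlap with any gold class."""
    n = sum(len(c) for c in clusters)
    return sum(max(len(c & g) for g in gold) for c in clusters) / n


def inverse_purity(clusters, gold):
    """(1/N) * sum over gold classes of the largest overlap with any cluster."""
    n = sum(len(g) for g in gold)
    return sum(max(len(c & g) for c in clusters) for g in gold) / n


def purity_f1(clusters, gold):
    """Harmonic mean of purity and inverse purity."""
    pu, ipu = purity(clusters, gold), inverse_purity(clusters, gold)
    return 2 * pu * ipu / (pu + ipu)


CLUSTERS = [{"a", "b", "c"}, {"d", "e"}]  # hypothetical induced clusters
GOLD = [{"a", "b"}, {"c", "d", "e"}]      # hypothetical gold classes
print(purity(CLUSTERS, GOLD), inverse_purity(CLUSTERS, GOLD), purity_f1(CLUSTERS, GOLD))
```

Purity penalises clusters that mix gold classes, while inverse purity penalises gold classes split across clusters; reporting both guards against trivial all-singleton or all-in-one solutions.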

  58. Induction of semantic frames: results, comparison to the state of the art. F1-scores for verbs, subjects, objects, and frames.

  59. Graph embeddings

  60. Graph embeddings: text is a sparse symbolic representation. Image source: https://www.tensorflow.org/tutorials/word2vec

  62. Graph embeddings: a graph is a sparse symbolic representation.

  63. Graph embeddings: embedding a graph into a vector space. From a survey on graph embeddings [Hamilton et al., 2017].

  64. Graph embeddings: learning with an "autoencoder". From a survey on graph embeddings [Hamilton et al., 2017].

  65. Graph embeddings: some established approaches. From a survey on graph embeddings [Hamilton et al., 2017].

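  67. Graph embeddings: graph embeddings using similarities. Given a tree (V, E):
      Leacock-Chodorow (LCH) similarity measure: $\mathrm{sim}(v_i, v_j) = -\log \frac{\mathrm{shortest\_path\_distance}(v_i, v_j)}{2h}$, where $h$ is the depth of the tree.
      Jiang-Conrath (JCN) similarity measure: $\mathrm{sim}(v_i, v_j) = \frac{2 \ln P_{lcs}(v_i, v_j)}{\ln P(v_i) + \ln P(v_j)}$, where $P(v)$ is the corpus-estimated probability of node $v$ and $P_{lcs}(v_i, v_j)$ is the probability of the lowest common subsumer of $v_i$ and $v_j$. (As written, this expression is Lin's information-content similarity; the Jiang-Conrath similarity is usually defined as $\frac{1}{2 \ln P_{lcs}(v_i, v_j) - \ln P(v_i) - \ln P(v_j)}$.)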
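
The LCH formula can be evaluated on a hypothetical toy taxonomy. Here shortest paths are counted in edges, h is the number of edges on the longest root-to-node path, and the logarithm is natural; library implementations may count path length and depth slightly differently, so their absolute values need not match this sketch.

```python
import math
from collections import deque

# Hypothetical toy taxonomy, stored as child -> parent.
PARENT = {"animal": "entity", "plant": "entity",
          "dog": "animal", "cat": "animal", "tree": "plant"}


def undirected(parent):
    """Adjacency sets for the tree, ignoring edge direction."""
    adjacency = {}
    for child, par in parent.items():
        adjacency.setdefault(child, set()).add(par)
        adjacency.setdefault(par, set()).add(child)
    return adjacency


def shortest_path_distance(adjacency, a, b):
    """Number of edges on the shortest path from a to b (breadth-first search)."""
    distance = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return distance[node]
        for other in adjacency[node]:
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    raise ValueError(f"{b!r} is not reachable from {a!r}")


def tree_depth(parent):
    """h: number of edges on the longest path from the root to any node."""
    def height(node):
        return 0 if node not in parent else 1 + height(parent[node])
    return max(height(node) for node in parent)


def lch(a, b, parent=PARENT):
    if a == b:
        raise ValueError("the formula needs two distinct nodes (distance > 0)")
    d = shortest_path_distance(undirected(parent), a, b)
    h = tree_depth(parent)
    return math.log(2 * h / d)  # equals -log(d / (2h))


print(round(lch("dog", "cat"), 3))   # 0.693 (= log 2)
print(round(lch("dog", "tree"), 3))  # 0.0
```

Each such query runs a graph search, which is the cost that the speed comparison on the following slides measures against a single dot product between embeddings.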

  68. Graph embeddings: graph embeddings using similarities. path2vec model (arxiv.org/abs/1808.05611): $L = \frac{1}{|T|} \sum_{(v_i, v_j) \in T} \left( (\mathbf{v}_i^\top \mathbf{v}_j - \mathrm{sim}(v_i, v_j))^2 + \alpha \mathbf{v}_i^\top \mathbf{v}_{in} + \alpha \mathbf{v}_j^\top \mathbf{v}_{jm} \right)$, where $\mathrm{sim}(v_i, v_j)$ is the value of a "gold" similarity measure between a pair of nodes $(v_i, v_j)$; $\mathbf{v}_i$ is the embedding of a node; $T$ is a training batch; $v_{in}$ is a random adjacent node of $v_i$ (and $v_{jm}$ of $v_j$); $\alpha$ is a small regularization coefficient, e.g. 0.001.

  69. Graph embeddings: speedup, graph vs. embeddings. Computation of 82,115 pairwise similarities:
      Model | Running time
      LCH in NLTK | 30 sec.
      JCN in NLTK | 6.7 sec.
      FSE embeddings | 0.713 sec.
      path2vec and other float vectors | 0.007 sec.

  70. Graph embeddings: results, goodness of fit. Spearman correlation scores with WordNet similarities on SimLex999 noun pairs (selection of synsets):
      Model | JCN-SemCor | JCN-Brown | LCH
      WordNet | 1.0 | 1.0 | 1.0
      Node2vec | 0.655 | 0.671 | 0.724
      Deepwalk | 0.775 | 0.774 | 0.868
      FSE | 0.830 | 0.820 | 0.900
      path2vec | 0.917 | 0.914 | 0.934
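
Goodness of fit in this table is a Spearman rank correlation between model similarities and the gold WordNet similarities. A minimal sketch, assuming the model similarity is the dot product of node embeddings (as in the path2vec loss) and using hypothetical vectors and gold scores; ties receive average ranks, and constant inputs are not handled.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for position in range(i, j + 1):
            ranks[order[position]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def goodness_of_fit(pairs, gold_similarity, vectors):
    """Spearman correlation between embedding dot products and gold graph similarities."""
    model = [sum(a * b for a, b in zip(vectors[u], vectors[v])) for u, v in pairs]
    gold = [gold_similarity[pair] for pair in pairs]
    return spearman(model, gold)


VECTORS = {"dog": [1.0, 0.0], "cat": [0.9, 0.1], "tree": [0.0, 1.0]}  # hypothetical
PAIRS = [("dog", "cat"), ("dog", "tree"), ("cat", "tree")]
GOLD = {("dog", "cat"): 0.69, ("dog", "tree"): 0.0, ("cat", "tree"): 0.1}  # hypothetical
print(goodness_of_fit(PAIRS, GOLD, VECTORS))  # ≈ 1.0
```

A score of 1.0 for WordNet itself in the table follows directly from this definition: the gold similarities are perfectly rank-correlated with themselves.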

  71. Graph embeddings: results on the SimLex999 dataset. Spearman correlations with human SimLex999 noun similarities:
      Model | Correlation
      Raw WordNet JCN-SemCor | 0.487
      Raw WordNet JCN-Brown | 0.495
      Raw WordNet LCH | 0.513
      node2vec [Grover & Leskovec, 2016] | 0.450
      Deepwalk [Perozzi et al., 2014] | 0.533
      FSE [Subercaze et al., 2015] | 0.556
      path2vec JCN-SemCor | 0.549
      path2vec JCN-Brown | 0.540
      path2vec LCH | 0.540

  73. Conclusion
