

  1. CSEP 517 Natural Language Processing: Word Embeddings
     Luke Zettlemoyer (slides adapted from Danqi Chen, Greg Durrett, Chris Manning, Dan Jurafsky)

  2. How to represent words?
     N-gram language models: P(w ∣ it is 76 F and), i.e., "It is 76 F and ___." The model assigns a probability to every word in the vocabulary, e.g. [0.0001, 0.1, 0, 0, 0.002, …, 0.3, …, 0], with high mass on plausible continuations like "sunny" and near-zero mass on implausible ones like "red".
     Text classification: P(y = 1 ∣ x) = σ(θ⊺w + b). "I like this movie." 👍 becomes a bag-of-words vector w(1) = [0, 1, 0, 0, 0, …, 1, …, 1]; "I don't like this movie." 👎 becomes w(2) = [0, 1, 0, 1, 0, …, 1, …, 1]. The two vectors differ only in the dimension for "don't".
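To make the classifier concrete, here is a minimal sketch under assumptions not in the slides: a tiny made-up vocabulary and hand-set weights, with P(y = 1 ∣ x) = σ(θ⊺w + b) over a bag-of-words vector w.

```python
import numpy as np

# A minimal sketch (vocabulary and weights are illustrative, not from
# the slides): bag-of-words features plus a logistic classifier.
vocab = ["movie", "i", "this", "don't", "like"]
theta = np.array([0.0, 0.0, 0.0, -2.0, 1.5])  # "don't" pushes negative
b = 0.0

def featurize(text: str) -> np.ndarray:
    tokens = text.lower().rstrip(".").split()
    return np.array([1.0 if w in tokens else 0.0 for w in vocab])

def p_positive(text: str) -> float:
    w = featurize(text)
    return 1.0 / (1.0 + np.exp(-(theta @ w + b)))

print(p_positive("I like this movie."))        # ~0.82
print(p_positive("I don't like this movie."))  # ~0.38: only "don't" differs
```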

  3. Representing words as discrete symbols
     In traditional NLP, we regard words as discrete symbols: hotel, conference, motel (a localist representation).
     Words can be represented by one-hot vectors (a single 1, the rest 0's):
     hotel = [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
     motel = [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
     Vector dimension = number of words in vocabulary (e.g., 500,000)
     Challenge: how do we compute the similarity of two words?
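The challenge is easy to see in code. A minimal sketch (the vocabulary indices match the one-hot vectors above but are otherwise arbitrary): any two distinct one-hot vectors are orthogonal, so the representation carries no similarity signal at all.

```python
import numpy as np

V = 16
hotel = np.zeros(V); hotel[11] = 1.0
motel = np.zeros(V); motel[3] = 1.0

# "hotel" looks no more similar to "motel" than to any other word:
print(hotel @ motel)  # 0.0
print(hotel @ hotel)  # 1.0: only self-similarity survives
```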

  4. Representing words by their context
     Distributional hypothesis: words that occur in similar contexts tend to have similar meanings.
     • "You shall know a word by the company it keeps" (J. R. Firth, 1957)
     • One of the most successful ideas of modern statistical NLP!
     These context words will represent banking.

  5. Distributional hypothesis
     What does "tejuino" mean? Consider the contexts it appears in:
     C1: A bottle of ___ is on the table.
     C2: Everybody likes ___.
     C3: Don't have ___ before you drive.
     C4: We make ___ out of corn.

  6. Distributional hypothesis
     C1: A bottle of ___ is on the table.
     C2: Everybody likes ___.
     C3: Don't have ___ before you drive.
     C4: We make ___ out of corn.

                  C1   C2   C3   C4
     tejuino       1    1    1    1
     loud          0    0    0    0
     motor-oil     1    0    0    0
     tortillas     0    1    0    1
     choices       0    1    0    0
     wine          1    1    1    0

     "Words that occur in similar contexts tend to have similar meanings": tejuino's row overlaps most with wine's, so we guess it is a drink. (A quick check with dot products follows this table.)
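A minimal sketch of that check, using plain dot products over the rows of the table above to count shared contexts:

```python
import numpy as np

# Rows are words, columns are contexts C1..C4 from the table.
words = ["tejuino", "loud", "motor-oil", "tortillas", "choices", "wine"]
counts = np.array([
    [1, 1, 1, 1],   # tejuino
    [0, 0, 0, 0],   # loud
    [1, 0, 0, 0],   # motor-oil
    [0, 1, 0, 1],   # tortillas
    [0, 1, 0, 0],   # choices
    [1, 1, 1, 0],   # wine
])

# Dot product counts shared contexts with "tejuino": wine shares 3 of 4.
tejuino = counts[0]
for word, row in zip(words[1:], counts[1:]):
    print(word, int(tejuino @ row))
```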

  7. Words as vectors
     • We'll build a new model of meaning focusing on similarity
     • Each word is a vector
     • Similar words are "nearby in space"
     • A first solution: we can just use context vectors to represent the meaning of words!
     • Word-word co-occurrence matrix (a construction sketch follows):
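A minimal sketch of building such a matrix, assuming a toy three-sentence corpus and a symmetric window of 2 words (both are illustrative choices, not from the slides):

```python
import numpy as np

corpus = [
    "i like deep learning".split(),
    "i like nlp".split(),
    "i enjoy flying".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
window = 2

# M[i, j] counts how often word j occurs within the window around word i.
M = np.zeros((len(vocab), len(vocab)), dtype=int)
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                M[idx[w], idx[sent[j]]] += 1

print(vocab)
print(M)
```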

  8. Words as vectors
     cos(u, v) = (u · v) / (‖u‖ ‖v‖) = (Σᵢ uᵢvᵢ) / (√(Σᵢ uᵢ²) · √(Σᵢ vᵢ²)), where the sums run over i = 1, …, V
     What is the range of cos(·)?

  9. Words as vectors
     Problem: not all counts are equal; words can randomly co-occur.
     • Solution: re-weight by how likely it is for the two words to co-occur by chance alone.
     • PPMI = Positive Pointwise Mutual Information:
       PMI(w, c) = log₂ ( P(w, c) / (P(w) P(c)) ),  PPMI(w, c) = max(0, PMI(w, c))
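A minimal sketch of PPMI re-weighting, suitable for a count matrix like the one built in the co-occurrence sketch above (the toy matrix at the end is illustrative):

```python
import numpy as np

def ppmi(M: np.ndarray) -> np.ndarray:
    total = M.sum()
    p_wc = M / total                       # joint P(w, c)
    p_w = p_wc.sum(axis=1, keepdims=True)  # marginal P(w)
    p_c = p_wc.sum(axis=0, keepdims=True)  # marginal P(c)
    expected = p_w * p_c                   # co-occurrence expected by chance
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_wc > 0, p_wc / expected, 1.0)
    return np.maximum(np.log2(ratio), 0.0) # clip negative PMI to 0

# Toy matrix: diagonal pairs co-occur more than chance predicts and get
# positive PPMI; off-diagonal pairs fall below chance and are clipped to 0.
M = np.array([[8, 2],
              [2, 8]])
print(ppmi(M))
```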

  10. Sparse vs. dense vectors
     • Still, the vectors we get from a word-word co-occurrence matrix are sparse (mostly 0's) and long (vocabulary-sized).
     • Alternative: we want to represent words as short (50-300 dimensional) and dense (real-valued) vectors.
     • The focus of this lecture
     • The basis of all modern NLP systems
