Convolutional Neural Networks for Language


  1. Convolutional Neural Networks for Language
  CS 6956: Deep Learning for NLP

  2. Features from text
  Example: sentiment classification. The goal: is the sentiment of a sentence positive, negative, or neutral?
  "The film is fun and is host to some truly excellent sequences"
  Approach: train a multiclass classifier. What features?

  3. Features from text
  Example: sentiment classification. The goal: is the sentiment of a sentence positive, negative, or neutral?
  "The film is fun and is host to some truly excellent sequences"
  Approach: train a multiclass classifier. What features?
  Some words and ngrams are informative, while some are not.

  4. Features from text
  Example: sentiment classification. The goal: is the sentiment of a sentence positive, negative, or neutral?
  "The film is fun and is host to some truly excellent sequences"
  Approach: train a multiclass classifier. What features?
  Some words and ngrams are informative, while some are not. We need to:
  1. Identify informative local information
  2. Aggregate it into a fixed-size vector representation
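As a minimal illustration of the two needs above (my own code, not from the slides), classical ngram featurization identifies local information by enumerating ngrams and aggregates them into a fixed-size count vector over a vocabulary:

```python
# A sketch of classical ngram feature extraction for the example
# sentence: each unigram/bigram becomes a feature, and the sentence is
# represented as a fixed-size count vector over a vocabulary.

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "The film is fun and is host to some truly excellent sequences"
tokens = sentence.lower().split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)

# Fixing a vocabulary (here: all ngrams seen) gives a fixed-size vector.
vocab = sorted(set(unigrams + bigrams))
features = [sum(1 for g in unigrams + bigrams if g == v) for v in vocab]
```

The weakness this motivates: the vocabulary decides in advance which ngrams exist, whereas a CNN learns which local patterns are informative.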

  5. Convolutional Neural Networks
  Designed to:
  1. Identify local predictors in a larger input
  2. Pool them together to create a feature representation
  3. And possibly repeat this in a hierarchical fashion
  In the NLP context, this helps identify predictive ngrams for a task.
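A minimal numpy sketch of this recipe (the names and shapes are my own assumptions, not from the slides): one filter scores every window of k consecutive word vectors, and max pooling aggregates the scores into a single feature regardless of sentence length:

```python
import numpy as np

# One embedded sentence and one learned filter (random stand-ins here).
rng = np.random.default_rng(0)
seq_len, emb_dim, k = 8, 4, 3                 # 8 words, 4-dim embeddings, width-3 filter
X = rng.standard_normal((seq_len, emb_dim))   # sentence as a matrix of word vectors
w = rng.standard_normal((k, emb_dim))         # filter spanning k consecutive words

# Convolution: dot product of the filter with each window of k words.
scores = np.array([np.sum(w * X[i:i + k]) for i in range(seq_len - k + 1)])

# Pooling: keep the strongest response, giving one feature per filter
# that is independent of the sentence length.
feature = scores.max()
```

In practice a CNN uses many such filters, yielding a fixed-size feature vector with one entry per filter.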

  6. Overview
  • Convolutional Neural Networks: A brief history
  • The two operations in a CNN
    – Convolution
    – Pooling
  • Convolution + Pooling as a building block
  • CNNs in NLP
  • Recurrent networks vs Convolutional networks

  7. Overview
  • Convolutional Neural Networks: A brief history
  • The two operations in a CNN
    – Convolution
    – Pooling
  • Convolution + Pooling as a building block
  • CNNs in NLP
  • Recurrent networks vs Convolutional networks

  8. Convolutional Neural Networks: Brief history
  First arose in the context of vision.
  • Hubel and Wiesel, 1950s/60s: the mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field.
  • Fukushima 1980, Neocognitron: directly inspired by Hubel and Wiesel.
    – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers.
    – Two operations: a convolutional layer that reacts to specific patterns, and a down-sampling layer that aggregates information.
  • LeCun, 1989-today, the Convolutional Neural Network: a supervised version.
    – Related to convolution kernels in computer vision.
    – Very successful on handwriting recognition and other computer vision tasks.
  • Has become better over recent years with more data and computation.
    – Krizhevsky et al. 2012: object detection with ImageNet.
    – The de facto feature extractor for computer vision.

  9. Convolutional Neural Networks: Brief history
  First arose in the context of vision.
  • Hubel and Wiesel, 1950s/60s: the mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field.
  (Torsten Wiesel and David H. Hubel, Nobel Prize in Physiology or Medicine, 1981.)

  10. Convolutional Neural Networks: Brief history
  First arose in the context of vision.
  • Hubel and Wiesel, 1950s/60s: the mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field.
  • Fukushima 1980, Neocognitron: directly inspired by Hubel and Wiesel.
    – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers.
    – Two operations: 1. a convolutional layer that reacts to specific patterns, and 2. a down-sampling layer that aggregates information.

  11. Convolutional Neural Networks: Brief history
  First arose in the context of vision.
  • Hubel and Wiesel, 1950s/60s: the mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field.
  • Fukushima 1980, Neocognitron: directly inspired by Hubel and Wiesel.
    – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers.
    – Two operations: a convolutional layer that reacts to specific patterns, and a down-sampling layer that aggregates information.
  • LeCun, 1989-today, the Convolutional Neural Network: a supervised version.
    – Related to convolution kernels in computer vision.
    – Success with handwriting recognition and other computer vision tasks.

  12. Convolutional Neural Networks: Brief history
  First arose in the context of vision.
  • Hubel and Wiesel, 1950s/60s: the mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field.
  • Fukushima 1980, Neocognitron: directly inspired by Hubel and Wiesel.
    – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers.
    – Two operations: a convolutional layer that reacts to specific patterns, and a down-sampling layer that aggregates information.
  • LeCun, 1989-today, the Convolutional Neural Network: a supervised version.
    – Related to convolution kernels in computer vision.
    – Success with handwriting recognition and other computer vision tasks.
  • Has become better over recent years with more data and computation.
    – Krizhevsky et al. 2012: object detection with ImageNet.
    – The de facto feature extractor for computer vision.

  13. Convolutional Neural Networks: Brief history
  • Introduced to NLP by Collobert et al., 2011.
    – Used as a feature extraction system for semantic role labeling.
  • Since then, applied to several other tasks such as sentiment analysis and question classification.
    – Kalchbrenner et al. 2014, Kim 2014.

  14. CNN terminology
  The terminology shows the field's computer vision and signal processing origins.
  • Filter
    – A function that transforms an input matrix/vector into a scalar feature.
    – A filter is a learned feature detector.
  • Channel
    – In computer vision, color images have red, blue, and green channels.
    – In general, a channel represents a medium that captures information about an input independent of other channels.
    – For example, different kinds of word embeddings could be different channels.
    – Channels could themselves be produced by previous convolutional layers.
  • Receptive field
    – The region of the input that a filter currently focuses on.

  15. CNN terminology
  The terminology shows the field's computer vision and signal processing origins.
  • Filter
    – A function that transforms an input matrix/vector into a scalar feature.
    – A filter is a learned feature detector (also called a feature map).
  • Channel
    – In computer vision, color images have red, blue, and green channels.
    – In general, a channel represents a medium that captures information about an input independent of other channels.
    – For example, different kinds of word embeddings could be different channels.
    – Channels could themselves be produced by previous convolutional layers.
  • Receptive field
    – The region of the input that a filter currently focuses on.

  16. CNN terminology
  The terminology shows the field's computer vision and signal processing origins.
  • Filter
    – A function that transforms an input matrix/vector into a scalar feature.
    – A filter is a learned feature detector (also called a feature map).
  • Channel
    – In computer vision, color images have red, blue, and green channels.
    – In general, a channel represents a medium that captures information about an input independent of other channels.
    – For example, different kinds of word embeddings could be different channels.
    – Channels could themselves be produced by previous convolutional layers.
  • Receptive field
    – The region of the input that a filter currently focuses on.

  17. CNN terminology
  The terminology shows the field's computer vision and signal processing origins.
  • Filter
    – A function that transforms an input matrix/vector into a scalar feature.
    – A filter is a learned feature detector (also called a feature map).
  • Channel
    – In computer vision, color images have red, blue, and green channels.
    – In general, a channel represents a "view of the input" that captures information about an input independent of other channels.
    – For example, different kinds of word embeddings could be different channels.
    – Channels could themselves be produced by previous convolutional layers.
  • Receptive field
    – The region of the input that a filter currently focuses on.
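The three terms can be made concrete at the level of array shapes. In this sketch (all names and sizes are my own assumptions, not from the slides), each filter spans all channels and a receptive field of k consecutive words, producing one scalar feature per position:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, emb_dim, k = 10, 4, 3   # 10 words, 4-dim embeddings, width-3 receptive field
n_channels, n_filters = 2, 5     # e.g. two different embedding tables, five filters

# Two channels: two "views" of the same sentence.
X = rng.standard_normal((n_channels, seq_len, emb_dim))

# Each filter spans all channels and k words, and maps its receptive
# field to a single scalar feature.
W = rng.standard_normal((n_filters, n_channels, k, emb_dim))

out = np.zeros((n_filters, seq_len - k + 1))
for f in range(n_filters):
    for i in range(seq_len - k + 1):   # receptive field = words i .. i+k-1
        out[f, i] = np.sum(W[f] * X[:, i:i + k])
```

The output has one row per filter and one column per position of the receptive field, which is the shape a pooling layer would then reduce.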

  18. Overview
  • Convolutional Neural Networks: A brief history
  • The two operations in a CNN
    – Convolution
    – Pooling
  • Convolution + Pooling as a building block
  • CNNs in NLP
  • Recurrent networks vs Convolutional networks

  19. What is a convolution?
  Let's see this using an example with vectors. We will generalize this to matrices and beyond, but the general idea remains the same.

  20. What is a convolution? An example using vectors
  A vector 𝐲 = [2, 3, 1, 3, 2, 1]

  21. What is a convolution? An example using vectors
  A vector 𝐲 = [2, 3, 1, 3, 2, 1]
  A filter 𝐠 of size 𝑘 = [1, 2, 1]. Here, the filter size is 3.

  22. What is a convolution? An example using vectors
  A vector 𝐲 = [2, 3, 1, 3, 2, 1]
  A filter 𝐠 of size 𝑘 = [1, 2, 1]
  The output is also a vector: output_i = Σ_{j=1..k} g_j · y_{i+j−1}

  23. What is a convolution? An example using vectors
  A vector 𝐲 = [2, 3, 1, 3, 2, 1]
  A filter 𝐠 of size 𝑘 = [1, 2, 1]
  The output is also a vector: output_i = Σ_{j=1..k} g_j · y_{i+j−1}
  The filter moves across the vector. At each position, the output is the dot product of the filter with a slice of the vector of that size.
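Working through the slide's numbers in a short script (the list-based implementation is mine; the vector and filter values come from the slides):

```python
# Sliding dot product of filter g over vector y, as described above:
# at position i, take the length-3 slice of y starting at i and dot it
# with g.
y = [2, 3, 1, 3, 2, 1]
g = [1, 2, 1]
k = len(g)

output = [sum(gj * yj for gj, yj in zip(g, y[i:i + k]))
          for i in range(len(y) - k + 1)]
print(output)  # [9, 8, 9, 8]
```

For instance, the first entry is 1·2 + 2·3 + 1·1 = 9, and the output has len(y) − k + 1 = 4 entries.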
