SLIDE 1

Text Understanding from Scratch

Xiang Zhang and Yann LeCun

Article presented by Chad DeChant

SLIDE 2

SLIDE 3

SLIDE 4

Paper Highlights

“Text understanding...without artificially embedding knowledge about words, phrases, sentences or any other syntactic or semantic structures associated with a language.”

  • Input is only characters, not words
  • No knowledge of syntax or semantic structures is hardwired in

  • Easily modified for other languages
SLIDE 5

Input

Alphabet size: 69 characters

abcdefghijklmnopqrstuvwxyz0123456789
-,;.!?:'"/\|_@#$%^&*~`+-=<>()[]{}

Length of input: L = 1014
Frame size: M = 69
Input is a set of frames of size M × L
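A minimal sketch of the character quantization above: one-hot frames over the 69-character alphabet, with the text truncated or zero-padded to length L = 1014. The function name and the all-zero treatment of out-of-alphabet characters are my choices for illustration.

```python
# One-hot character quantization: each character becomes a frame of size M = 69;
# the input text is truncated or zero-padded to length L = 1014.
# Note the dash appears twice in the slide's alphabet listing, so the later
# index wins in the lookup table.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{}"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
M, L = len(ALPHABET), 1014

def quantize(text):
    """Return an M x L list-of-lists one-hot encoding of `text`."""
    frames = [[0] * L for _ in range(M)]
    for pos, ch in enumerate(text.lower()[:L]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:  # characters outside the alphabet stay all-zero
            frames[idx][pos] = 1
    return frames
```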

SLIDE 6

ConvNet Design

SLIDE 7

ConvNet Layers

  • Convolutional layers
  • Fully connected layers
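The convolutional layers shrink the temporal dimension before the fully connected layers. A sketch of that length arithmetic, assuming the paper's large-model configuration (kernel sizes 7, 7, 3, 3, 3, 3, with non-overlapping max-pooling of size 3 after layers 1, 2, and 6):

```python
# Temporal length of the feature maps after each convolutional layer:
# a valid convolution with kernel k removes k - 1 frames, and
# non-overlapping max-pooling of size p divides the length by p.
def output_length(l0, layers):
    length = l0
    for kernel, pool in layers:
        length -= kernel - 1   # valid convolution, stride 1
        if pool:
            length //= pool    # non-overlapping max-pooling
    return length

# (kernel size, pooling size); 0 means no pooling after that layer
LAYERS = [(7, 3), (7, 3), (3, 0), (3, 0), (3, 0), (3, 3)]
print(output_length(1014, LAYERS))  # 34 frames reach the fully connected layers
```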

SLIDE 8

Training

  • SGD with minibatch size 128
  • Momentum
  • Rectified Linear Units
  • Torch 7
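The update rule behind the first two bullets can be sketched on a toy one-dimensional loss. The hyperparameter values here are illustrative, not the paper's exact schedule:

```python
# Classic SGD-with-momentum update, shown on the toy loss f(w) = w^2
# (gradient 2w). In the paper this runs over minibatches of size 128;
# here a single deterministic gradient keeps the example self-contained.
def sgd_momentum(w, steps, lr=0.01, momentum=0.9):
    v = 0.0
    for _ in range(steps):
        grad = 2 * w                 # gradient of f(w) = w^2
        v = momentum * v - lr * grad # velocity accumulates past gradients
        w += v
    return w

w_final = sgd_momentum(5.0, 200)  # w decays toward the minimum at 0
```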
SLIDE 9

Learning

Selected kernel weights from the first layer

  • The network learned to attach more importance to letters than to other characters

SLIDE 10

Learning

“We hypothesize that when trained from raw characters, temporal ConvNet is able to learn the hierarchical representations of words, phrases, and sentences in order to understand text.”

Selected kernel weights from the first layer

SLIDE 11

Data Augmentation with Thesaurus

Improve generalization by increasing the number of training examples.

  1. Choose the number of words r to be replaced: P[r] ~ p^r
  2. Choose the index s of the replacement word in its thesaurus entry: P[s] ~ q^s

p = q = 0.5 (a geometric distribution)
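The two geometric draws can be sketched as repeated coin flips; the helper and function names below are mine, and which r words get replaced is chosen by shuffling here, one reasonable reading of the procedure:

```python
import random

# Sample from P[k] ~ p^k, k = 0, 1, 2, ...: keep incrementing k while a
# biased coin lands heads. With p = 0.5, each extra replacement (or each
# step down the synonym list) is half as likely as the previous one.
def geometric(p, rng):
    k = 0
    while rng.random() < p:
        k += 1
    return k

def augment(words, thesaurus, p=0.5, q=0.5, rng=random):
    """Replace r words (r ~ geometric p), each with synonym s (s ~ geometric q)."""
    words = list(words)
    replaceable = [i for i, w in enumerate(words) if w in thesaurus]
    rng.shuffle(replaceable)
    r = min(geometric(p, rng), len(replaceable))
    for i in replaceable[:r]:
        synonyms = thesaurus[words[i]]
        s = min(geometric(q, rng), len(synonyms) - 1)
        words[i] = synonyms[s]
    return words
```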

SLIDE 12

Dataset and Results

“The unfortunate fact in [the] literature is that there is no openly accessible dataset that is large enough or with labels of sufficient quality for us...”

SLIDE 13

Dataset and Results

Several new datasets for:

  • Sentiment analysis
  • Text categorization
  • Ontology classification

SLIDE 14

Comparisons

Performance comparisons only against their own implementations of:

  • Bag of words: most common 5,000 words from each dataset
  • word2vec: the same 5,000 vectors, trained on the Google News corpus, used for all dataset comparisons

No comparisons against state-of-the-art methods
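The bag-of-words baseline can be sketched as a count vector over the most frequent words (5,000 in the paper; a toy vocabulary and my own helper names here):

```python
from collections import Counter

# Build a vocabulary of the most common words in a corpus, then represent
# each document as a vector of counts over that fixed vocabulary.
def build_vocab(corpus, size):
    counts = Counter(w for doc in corpus for w in doc.lower().split())
    return [w for w, _ in counts.most_common(size)]

def bag_of_words(doc, vocab):
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

corpus = ["the movie was great", "the movie was terrible", "great plot"]
vocab = build_vocab(corpus, 5)
```

Words outside the vocabulary are simply dropped, which is why a fixed cutoff like 5,000 discards rare-word information the character-level model still sees.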

SLIDE 15

Amazon review sentiment analysis

A very large dataset

Input text: Amazon reviews between 100 and 1000 characters

SLIDE 16

Amazon review results

SLIDE 17

Amazon review results

Other results for comparison: movie sentiment analysis

From Kalchbrenner, Grefenstette, Blunsom, “A Convolutional Neural Network for Modelling Sentences”, 2014

SLIDE 18

Yahoo answers topic dataset

Input text: Question title, question text, best answer

SLIDE 19

Yahoo Answers results

SLIDE 20

Yahoo Answers results

Other results for comparison: 6-way question classification

From Kalchbrenner, Grefenstette, Blunsom, “A Convolutional Neural Network for Modelling Sentences”, 2014

SLIDE 21

DBpedia Ontology Classification

Input text: title and abstract, length ≤ 1014 characters

SLIDE 22

DBpedia Ontology Results

SLIDE 23

News categorization results

Input text: title of article and description, length ≤ 1014 chars

SLIDE 24

News categorization in Chinese

Extend the model to work with Chinese:

Segment the text, then transliterate to pinyin:

我常常跟朋友看电影
我 常常 跟 朋友 看 电影
wo3 chang2chang2 gen1 peng2you3 kan4 dian4ying3
(“I often see movies with friends”)
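The segment-then-transliterate step can be sketched with a toy lookup table. Real pipelines use a word segmenter and a full pinyin dictionary (pinyin is context-dependent); the mapping below covers only the example sentence:

```python
# Toy pinyin transliteration: map each character of a pre-segmented word
# to its tone-numbered pinyin. Covers only the slide's example sentence.
PINYIN = {"我": "wo3", "常": "chang2", "跟": "gen1", "朋": "peng2",
          "友": "you3", "看": "kan4", "电": "dian4", "影": "ying3"}

def transliterate(segments):
    """Join per-character pinyin within a word; separate words by spaces."""
    return " ".join("".join(PINYIN[ch] for ch in word) for word in segments)

segments = ["我", "常常", "跟", "朋友", "看", "电影"]
print(transliterate(segments))
# wo3 chang2chang2 gen1 peng2you3 kan4 dian4ying3
```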

SLIDE 25

News categorization in Chinese

Input text: title of article and content, 100 ≤ length ≤ 1014 chars

SLIDE 26

Conclusions & Speculations

  • Good results
  • End to end learning
  • New datasets
SLIDE 27

Conclusions & Speculations

SLIDE 28

Conclusions & Speculations

Reinventing the wheel?

“Text understanding...without artificially embedding knowledge about words, phrases, sentences or any other syntactic or semantic structures associated with a language.”

SLIDE 29

Thank you