@EVANAHARI, YOW! AUSTRALIA 2016
Something Old, Something New
A Talk about NLP for the Curious
Jabberwocky

’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.
– Lewis Carroll from Through the Looking-Glass and What Alice Found There, 1871
Why are these monkeys following me? Arrfff! LOL
Phonology − organization of sounds
Morphology − construction of words
Syntax − creation of valid sentences/phrases and identifying the structural roles of words in them
Semantics − finding the meaning of words/phrases/sentences
Pragmatics − situational meaning of sentences
Discourse − how the order of sentences affects interpretation
World knowledge − mapping to general world knowledge
Context awareness − the hardest part…?
Lexical analysis
Syntactic analysis
Semantic analysis
Discourse integration
Pragmatic analysis
“You shall know a word by the company it keeps”
– J. R. Firth, 1957
A language model is a function that captures the statistical characteristics of the word-sequence distribution in a language
A 10-word sequence from a 100,000-word vocabulary → 100,000^10 = 10^50 possible sequences
Vocabulary:
Happy = [100000]
birthday = [010000]
to = [001000]
you = [000100]
dear = [000010]
“name” = [000001]

Sample text:
Happy birthday to you = [111100]
Happy birthday to you = [111100]
Happy birthday dear “name” = [110011]
Happy birthday to you = [111100]

Term frequency = [443311]
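The term-frequency vector above is just the element-wise sum of the one-hot vectors of all tokens. A minimal sketch (the tokenization by whitespace is an assumption):

```python
from collections import Counter

# Vocabulary in the order used on the slide.
vocab = ["Happy", "birthday", "to", "you", "dear", '"name"']

sample = (
    "Happy birthday to you "
    "Happy birthday to you "
    'Happy birthday dear "name" '
    "Happy birthday to you"
)

def one_hot(word):
    """One-hot vector: 1 at the word's vocabulary index, 0 elsewhere."""
    return [1 if w == word else 0 for w in vocab]

# Term frequency = sum of one-hot vectors, i.e. a count per vocabulary entry.
counts = Counter(sample.split())
tf = [counts[w] for w in vocab]
print(tf)  # [4, 4, 3, 3, 1, 1]
```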
“Hello everyone who is eager to learn NLP!”
Predicting the next word x based on probabilities
P(xi | xi-(n-1), …, xi-1) = count(xi-(n-1), …, xi-1, xi) / count(xi-(n-1), …, xi-1)
Examples: “… + francisco”, …
Common pattern: inference by applying various models
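A minimal bigram (n = 2) sketch of the count-based formula above. The toy corpus is an assumption, chosen so that “francisco” always follows “san”:

```python
from collections import Counter

# Tiny toy corpus (an illustrative assumption, not from the talk).
corpus = "i live in san francisco . san francisco is foggy . i live in sydney .".split()

# P(x_i | x_{i-1}) = count(x_{i-1}, x_i) / count(x_{i-1})
bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs
prefix = Counter(corpus[:-1])                # counts of each word as a prefix

def p_next(prev, word):
    """Probability of `word` following `prev` under the bigram model."""
    return bigrams[(prev, word)] / prefix[prev]

print(p_next("san", "francisco"))  # 1.0 — "san" is always followed by "francisco"
```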
Features: Round, Red, HasLeaf
Sample data (Color → Apple?):
Red     No
Green   Yes
Yellow  Yes
Red     Yes
Red     Yes
Green   Yes
Yellow  No
Yellow  No
Red     Yes
Yellow  Yes
Red     No
Green   Yes
Green   Yes
Yellow  No
Feature       No   Yes   Fraction
Green         0    4     4/14 = 0.29
Yellow        3    2     5/14 = 0.36
Red           2    3     5/14 = 0.36
Grand Total   5    9     5/14 = 0.36, 9/14 = 0.64
Incoming fruit text says “red” - is it about an apple?
P(Yes | Red) = P(Red | Yes) × P(Yes) / P(Red)
P(Red | Yes) = 3/9 = 0.33
P(Yes) = 9/14 = 0.64
P(Red) = 5/14 = 0.36
P(Yes | Red) = 0.33 × 0.64 / 0.36 ≈ 0.60
60% chance it’s about an apple!
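The Bayes calculation above can be reproduced directly from the 14 sample rows; a minimal sketch:

```python
# The 14 (color, is-apple?) rows from the sample data.
data = [("Red", "No"), ("Green", "Yes"), ("Yellow", "Yes"), ("Red", "Yes"),
        ("Red", "Yes"), ("Green", "Yes"), ("Yellow", "No"), ("Yellow", "No"),
        ("Red", "Yes"), ("Yellow", "Yes"), ("Red", "No"), ("Green", "Yes"),
        ("Green", "Yes"), ("Yellow", "No")]

n = len(data)                                                    # 14
p_yes = sum(1 for _, a in data if a == "Yes") / n                # 9/14
p_red = sum(1 for c, _ in data if c == "Red") / n                # 5/14
p_red_given_yes = (sum(1 for c, a in data if c == "Red" and a == "Yes")
                   / sum(1 for _, a in data if a == "Yes"))      # 3/9

# Bayes' rule: P(Yes | Red) = P(Red | Yes) * P(Yes) / P(Red)
p_yes_given_red = p_red_given_yes * p_yes / p_red
print(round(p_yes_given_red, 2))  # 0.6
```

Note that the exact value is 3/5 = 0.6; the slide's 0.33 × 0.64 / 0.36 is the same quantity after rounding the intermediate fractions.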
Things to Consider:
- most…
- smoothing
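Smoothing matters because a feature value never seen with a class gives a zero count, which zeroes out the whole Naive Bayes product. A common remedy is add-one (Laplace) smoothing; the counts below are hypothetical:

```python
# Hypothetical feature counts within one class (e.g. "Yes"); not from the slides.
counts = {"Red": 3, "Green": 4, "Yellow": 2}
total = sum(counts.values())   # 9 observations in this class
vocab_size = 4                 # distinct feature values, incl. an unseen "Purple"

def p_smoothed(feature):
    """Add-one smoothed estimate: (count + 1) / (total + vocab_size)."""
    return (counts.get(feature, 0) + 1) / (total + vocab_size)

print(p_smoothed("Purple"))  # ≈ 0.077 instead of an impossible 0.0
```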
Things to Consider:
One-hot vectors: [0 0 0 1] vs [0 1 0 0]
Dense vectors: [2 3 8 1] vs [7 5 6 2]
A vector space, where each dimension is a feature of a word
Captures semantic characteristics
Features with continuous values
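One-hot vectors are mutually orthogonal, so their similarity is always zero; dense continuous vectors allow graded similarity. A sketch using cosine similarity (the dense values are purely illustrative, not real word vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# One-hot vectors: always orthogonal, so similarity carries no information.
print(cosine([0, 0, 0, 1], [0, 1, 0, 0]))  # 0.0

# Dense vectors: similarity is graded and can reflect semantic closeness.
print(round(cosine([2, 3, 8, 1], [7, 5, 6, 2]), 2))  # 0.84
```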
NNLM (Neural Network Language Model)
distributed representation
model - even for very large data sets
Diagram borrowed from Mikolov et al.’s paper
Approximate the full t-word history with only n words, to gain the simplicity of n-grams
Ck contains the learned feature vector for word k
P(w1, w2, …, wt) = P(w1) P(w2|w1) P(w3|w1, w2) … P(wt|w1, …, wt-1)

P(wt = k | wt-n+1, …, wt-1) = e^ak / SUM(i=1 to N) e^ai

where

ak = bk + SUM(i=1 to h) Wki tanh(ci + SUM(j=1 to (n-1)d) Vij xj)

x = (Cwt-n+1,1, …, Cwt-n+1,d, Cwt-n+2,1, …, Cwt-1,1, …, Cwt-1,d)
Diagram borrowed from Bengio et al.’s paper
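The NNLM forward pass from Bengio et al. can be sketched in a few lines of NumPy. All sizes and the random weights below are invented for illustration; a real model would learn C, V, W, b, c by gradient descent:

```python
import numpy as np

# Illustrative sizes: vocabulary N, embedding dim d, hidden units h, n-gram order n.
rng = np.random.default_rng(0)
N, d, h, n = 10, 4, 5, 3

C = rng.normal(size=(N, d))              # C[k] = learned feature vector for word k
V = rng.normal(size=(h, (n - 1) * d))    # input-to-hidden weights
c = rng.normal(size=h)                   # hidden bias
W = rng.normal(size=(N, h))              # hidden-to-output weights
b = rng.normal(size=N)                   # output bias

def next_word_probs(context):
    """P(wt | wt-n+1, ..., wt-1): softmax over a = b + W tanh(c + V x)."""
    x = np.concatenate([C[w] for w in context])  # concatenated feature vectors
    a = b + W @ np.tanh(c + V @ x)
    e = np.exp(a - a.max())                      # numerically stable softmax
    return e / e.sum()

probs = next_word_probs([3, 7])  # indices of the n-1 = 2 previous words
print(probs.sum())               # a valid distribution: sums to 1 (up to float error)
```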
ABBYY, Angoss, Attensity, AUTINDEX, Autonomy, Averbis, Basis Technology, Clarabridge, Complete Discovery Source, Endeca Technologies, Expert System S.p.A., FICO Score, General Sentiment, IBM LanguageWare, IBM SPSS, Insight, LanguageWare, Language Computer Corporation, Lexalytics, LexisNexis, Luminoso, Mathematica, MeaningCloud, Medallia, Megaputer Intelligence, NetOwl, RapidMiner, SAS Text Miner and Teragram, Semantria, Smartlogic, StatSoft, Sysomos, WordStat, Xpresso, …
No animals were harmed during this photo shoot
scale
tuning knobs, better syntax parsing models, very recently large scale too
forgiving
distributions within a language
More open source tools and frameworks and generated distributed representations available to all
Vote!