
Statistical NLP, Spring 2010 - Lecture 15: Grammar Induction - Dan Klein, UC Berkeley



Supervised Learning
- Systems duplicate correct analyses from training data
- Hand-annotation of data:
  - Time-consuming
  - Expensive
  - Hard to adapt for new purposes (tasks, languages, domains, etc.)
- Corpus availability drives research, not tasks
- Example: Penn Treebank
  - 50K sentences
  - Hand-parsed over several years

Unsupervised Learning
- Systems take raw data and automatically detect patterns
- Why unsupervised learning?
  - More data than annotation
  - Insights into machine learning, clustering
  - Kids learn some aspects of language entirely without supervision
- Here: unsupervised learning
  - Work purely from the forms of the utterances
  - Neither assume nor exploit prior meaning or grounding [cf. Feldman et al.]

Unsupervised Parsing?
- Start with raw text, learn syntactic structure
- Some have argued that learning syntax from positive data alone is impossible:
  - Gold, 1967: non-identifiability in the limit
  - Chomsky, 1980: the poverty of the stimulus
- Many others have felt it should be possible:
  - Lari and Young, 1990
  - Carroll and Charniak, 1992
  - Alex Clark, 2001
  - Mark Paskin, 2001
  - ... and many more, but it didn't work well (or at all) until the past few years
- Surprising result: it's possible to get entirely unsupervised parsing to (reasonably) work well!

Learnability
- Learnability: formal conditions under which a class of languages can be learned in some sense
- Setup:
  - The class of languages is ℒ
  - The learner is some algorithm H
  - The learner sees a sequence X of strings x_1 ... x_n
  - H maps sequences X to languages L in ℒ
- Question: for what classes do learners exist?

Learnability: [Gold 67]
- Criterion: identification in the limit
- A presentation of L is an infinite sequence of x's from L in which each x occurs at least once
- A learner H identifies L in the limit if, for any presentation of L, from some point n onward H always outputs L
- A class ℒ is identifiable in the limit if there is some single H which correctly identifies in the limit any L in ℒ
- Example: ℒ = {{a}, {a,b}} is learnable in the limit
- Theorem [Gold 67]: any ℒ which contains all finite languages and at least one infinite language (i.e. is superfinite) is unlearnable in this sense
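Restating the slide's definitions in compact form (the notation below is mine, not the lecture's):

  \text{Presentation of } L: \text{ an infinite sequence } x_1, x_2, \ldots \text{ from } L \text{ with } \{x_i : i \ge 1\} = L
  H \text{ identifies } L \text{ in the limit} \iff \text{for every presentation of } L,\ \exists n_0\ \forall n \ge n_0:\ H(x_1, \ldots, x_n) = L
  \mathcal{L} \text{ is identifiable in the limit} \iff \text{some single } H \text{ identifies every } L \in \mathcal{L} \text{ in the limit}
  \text{[Gold 67]: } \{\text{all finite languages}\} \subseteq \mathcal{L} \text{ and some infinite } L_\infty \in \mathcal{L}\ \Rightarrow\ \mathcal{L} \text{ is not identifiable in the limit}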

Learnability: [Gold 67], proof sketch
- Assume ℒ is superfinite
- Then there exists a chain L_1 ⊂ L_2 ⊂ ... ⊂ L_∞
- Take any learner H assumed to identify ℒ
- Construct the following misleading sequence:
  - Present strings from L_1 until it outputs L_1
  - Present strings from L_2 until it outputs L_2
  - ...
- This is a presentation of L_∞, but H won't identify L_∞

Learnability: [Horning 69]
- Problem: IIL requires that H succeed on each presentation, even the weird ones
- Another criterion: measure-one identification
  - Assume a distribution P_L(x) for each L
  - Assume P_L(x) puts non-zero mass on all and only x in L
  - Assume an infinite presentation X drawn i.i.d. from P_L(x)
  - H measure-one identifies L if the probability of drawing an X from which H identifies L is 1
- [Horning 69]: PCFGs can be identified in this sense
- Note: there can be misleading sequences, they just have to be (infinitely) unlikely

Learnability: [Horning 69], proof sketch
- Assume ℒ is a recursively enumerable set of recursive languages (e.g. the set of PCFGs)
- Assume an ordering on all strings x_1 < x_2 < ...
- Define: two sequences A and B agree through n if for all x < x_n, x is in A ⇔ x is in B
- Define the error set E(L,n,m): all sequences whose first m elements do not agree with L through n
  - These are the sequences which contain early strings outside of L (can't happen) or fail to contain all the early strings in L (happens less as m increases)
- Claim: P(E(L,n,m)) goes to 0 as m goes to ∞
- Let d_L(n) be the smallest m such that P(E) < 2^-n
- Let d(n) be the largest d_L(n) among the first n languages
- Learner: after d(n) examples, pick the first L that agrees with the evidence through n
- This can only fail for a sequence X if X keeps showing up in E(L,n,d(n)), which happens infinitely often with probability zero (we skipped some details)

Learnability
- Gold's result says little about real learners (the requirements of IIL are way too strong)
- Horning's algorithm is completely impractical (it needs astronomical amounts of data)
- Even measure-one identification doesn't say anything about tree structures (or even densities over strings)
  - It only talks about learning grammatical sets
  - Strong generative vs. weak generative capacity

Unsupervised Tagging?
- AKA part-of-speech induction
- Task:
  - Raw sentences in
  - Tagged sentences out
- Obvious thing to do:
  - Start with a (mostly) uniform HMM
  - Run EM
  - Inspect results

EM for HMMs: Process
- Alternate between recomputing distributions over hidden variables (the tags) and re-estimating parameters
- Crucial step: we want to tally up how many (fractional) counts of each kind of transition and emission we have under the current parameters (the count formulas on the slide did not survive extraction)
- Same quantities we needed to train a CRF!
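As a rough illustration of that crucial step, here is a minimal forward-backward E-step for a single sentence, accumulating fractional emission and transition counts under the current HMM parameters. This is my own sketch in NumPy, not the lecture's code; the variable names and interfaces are assumptions.

    # Minimal E-step sketch: fractional HMM counts via forward-backward.
    import numpy as np

    def expected_counts(sentence, pi, A, B):
        """sentence: list of word ids; pi: (T,) initial tag probs;
        A: (T, T) tag transition probs; B: (T, V) emission probs."""
        T, N = len(pi), len(sentence)
        # Forward pass: alpha[i, t] = P(w_1..w_i, tag_i = t)
        alpha = np.zeros((N, T))
        alpha[0] = pi * B[:, sentence[0]]
        for i in range(1, N):
            alpha[i] = (alpha[i - 1] @ A) * B[:, sentence[i]]
        # Backward pass: beta[i, t] = P(w_{i+1}..w_N | tag_i = t)
        beta = np.zeros((N, T))
        beta[-1] = 1.0
        for i in range(N - 2, -1, -1):
            beta[i] = A @ (B[:, sentence[i + 1]] * beta[i + 1])
        Z = alpha[-1].sum()  # sentence likelihood
        # Fractional emission counts from gamma[i, t] = P(tag_i = t | sentence)
        gamma = alpha * beta / Z
        emit_counts = np.zeros_like(B)
        for i, w in enumerate(sentence):
            emit_counts[:, w] += gamma[i]
        # Fractional transition counts from xi[t, t'] at each position
        trans_counts = np.zeros_like(A)
        for i in range(N - 1):
            trans_counts += np.outer(alpha[i], B[:, sentence[i + 1]] * beta[i + 1]) * A / Z
        return emit_counts, trans_counts

The M-step then just renormalizes these accumulated counts (per previous tag for the transitions, per tag for the emissions), exactly as in supervised HMM estimation but with fractional rather than observed counts.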

Merialdo: Setup
- Some (discouraging) experiments [Merialdo 94]
- Setup:
  - You know the set of allowable tags for each word
  - Learn a supervised model on k training sentences
    - Learn P(w|t) on these examples
    - Learn P(t|t_-1, t_-2) on these examples
  - On n > k sentences, re-estimate with EM
- Note: we know the allowed tags but not their frequencies

Merialdo: Results
[Results plot from the slide did not survive extraction.]

Distributional Clustering
- Three main variants on the same idea:
  - Pairwise similarities and heuristic clustering
    - E.g. [Finch and Chater 92]
    - Produces dendrograms
  - Vector space methods
    - E.g. [Schütze 93]
  - Probabilistic methods
    - Various formulations, e.g. [Lee and Pereira 99]
- Models of ambiguity
[Example contexts shown on the slide did not survive extraction.]

Nearest Neighbors
[Figure of nearest-neighbor word lists; did not survive extraction.]

Dendrograms
[Dendrogram figure; did not survive extraction.]
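To make the "pairwise similarities" variant concrete, here is a minimal sketch (my own, with the representation and similarity measure chosen as assumptions rather than taken from the cited papers): represent each word by counts of the words that appear around it, then compare words by cosine similarity. Running agglomerative clustering over these similarities is what yields dendrograms like the ones referenced above.

    # Sketch: distributional context vectors and cosine nearest neighbors.
    from collections import Counter, defaultdict
    import math

    def context_vectors(sentences, window=1):
        """Count the words appearing within `window` positions of each word."""
        vecs = defaultdict(Counter)
        for sent in sentences:
            for i, w in enumerate(sent):
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        vecs[w][sent[j]] += 1
        return vecs

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u if k in v)
        nu = math.sqrt(sum(c * c for c in u.values()))
        nv = math.sqrt(sum(c * c for c in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def nearest_neighbors(word, vecs, k=5):
        sims = [(cosine(vecs[word], vecs[w]), w) for w in vecs if w != word]
        return sorted(sims, reverse=True)[:k]

For example, nearest_neighbors("president", context_vectors(corpus)) would return the distributionally most similar words in a (hypothetical) tokenized corpus; feeding the full similarity matrix to a hierarchical clustering routine produces the dendrogram view.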
