Recap LING572 Advanced Statistical Methods for NLP January 23, - PowerPoint PPT Presentation

Recap
LING572 Advanced Statistical Methods for NLP
January 23, 2020

Outline
● Summary of the material so far
● Reading materials
● Math formulas

So far
● Introduction:
  – Course overview
  – Information theory
  – Overview of classification task
● Basic classification algorithms:
  – Decision tree
  – Naïve Bayes
  – kNN
● Feature selection, chi-square test and recap
● Hw1-Hw3

Main steps for solving a classification task
● Prepare the data:
  – Reformulate the task into a learning problem
  – Define features
  – Feature selection
  – Form feature vectors
● Train a classifier with the training data
● Run the classifier on the test data
● Evaluation

Comparison of 3 Learners

                  kNN                       Decision Tree                             Naïve Bayes
Modeling          Vote by your neighbors    Vote by your groups                       Choose the c that maximizes P(c | x)
Training          None                      Build a decision tree                     Learn P(c) and P(f | c)
Decoding          Find neighbors            Traverse the tree                         Calculate P(c)P(x | c)
Hyperparameters   K; similarity fn          Max depth; split function; thresholds     Delta for smoothing
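To make the kNN column concrete, here is a minimal sketch of its decoding step: no training, find the K nearest neighbors under a similarity function, then vote. The function names, the toy vectors, and the labels are illustrative assumptions, not part of the course materials.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_u * norm_v)

def knn_classify(x, train_vectors, train_labels, k=3, similarity=cosine):
    """kNN decoding: rank training instances by similarity, vote among the top k."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: similarity(x, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: rows are feature vectors, labels are made up.
train_vectors = [[1, 0, 0], [2, 1, 0], [0, 1, 3], [0, 0, 2]]
train_labels = ["sports", "sports", "politics", "politics"]
print(knn_classify([3, 1, 0], train_vectors, train_labels, k=3))
```

The two hyperparameters from the table appear directly as the `k` and `similarity` arguments.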

Implementation issues
● Taking the log:
  $$\log\Big(P(c) \prod_i P(f_i \mid c)\Big) = \log P(c) + \sum_i \log P(f_i \mid c)$$
● Ignoring some constants:
  $$P(d_i \mid c) = P(|d_i|)\, |d_i|! \prod_{k=1}^{|V|} \frac{P(w_k \mid c)^{N_{ik}}}{N_{ik}!}$$
● Increasing small numbers before dividing:
  $$\log P(x, c_1) = -200; \quad \log P(x, c_2) = -201$$
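The first and third points can be sketched together as follows. This is a minimal illustration, assuming natural logarithms and hypothetical function names: joint scores are summed in log space, and the largest log score is subtracted before exponentiating so that the division that yields P(c | x) does not underflow.

```python
import math

def log_joint(log_prior, log_likelihoods, features):
    """log P(x, c) = log P(c) + sum_i log P(f_i | c) over the features in x."""
    return log_prior + sum(log_likelihoods[f] for f in features)

def posteriors_from_log_joints(log_joints):
    """Convert log P(x, c) values into P(c | x) without underflow.

    Subtracting the maximum "increases the small numbers before dividing":
    the ratios are unchanged, but exp() stays away from zero.
    """
    m = max(log_joints)
    shifted = [math.exp(lj - m) for lj in log_joints]
    total = sum(shifted)
    return [s / total for s in shifted]

# The slide's example values.
print(posteriors_from_log_joints([-200.0, -201.0]))

# Same ratio, but here exp() of the raw values would underflow to 0.0.
print(posteriors_from_log_joints([-1000.0, -1001.0]))
```

Both calls give the same posteriors, because only the difference between the two log scores matters.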

Implementation issues (cont)
● Reformulate the formulas:
  $$P(d_i, c) = P(c) \prod_{w_k \in d_i} P(w_k \mid c) \prod_{w_k \notin d_i} \big(1 - P(w_k \mid c)\big)$$
  $$= P(c) \prod_{w_k \in d_i} \frac{P(w_k \mid c)}{1 - P(w_k \mid c)} \prod_{w_k} \big(1 - P(w_k \mid c)\big)$$
● Store useful intermediate results: $\prod_{w_k} \big(1 - P(w_k \mid c)\big)$
● Vectorize! (e.g. entropy)
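A rough sketch of the reformulation, in log space: the product over the whole vocabulary is computed once per class and cached, so scoring a document only touches the words it contains. The function names and the dictionary layout are assumptions made for illustration, and the probabilities are assumed to be already smoothed (strictly between 0 and 1).

```python
import math

def train_cache(p_word_given_c):
    """Precompute the document-independent parts for one class c.

    p_word_given_c: dict mapping word -> P(w_k | c), each in (0, 1).
    Returns (log_odds, log_all_absent), where
      log_odds[w]    = log P(w | c) - log(1 - P(w | c))
      log_all_absent = sum over the vocabulary of log(1 - P(w | c))
    """
    log_odds = {w: math.log(p) - math.log(1.0 - p) for w, p in p_word_given_c.items()}
    log_all_absent = sum(math.log(1.0 - p) for p in p_word_given_c.values())
    return log_odds, log_all_absent

def log_joint_fast(doc_words, log_prior, log_odds, log_all_absent):
    """log P(d, c) using the second form: loop only over words in the document."""
    present = set(doc_words)
    return log_prior + log_all_absent + sum(log_odds[w] for w in present if w in log_odds)

def log_joint_naive(doc_words, log_prior, p_word_given_c):
    """log P(d, c) using the first form: loop over the whole vocabulary."""
    present = set(doc_words)
    total = log_prior
    for w, p in p_word_given_c.items():
        total += math.log(p) if w in present else math.log(1.0 - p)
    return total

# Hypothetical three-word vocabulary for one class.
probs = {"a": 0.2, "b": 0.5, "c": 0.9}
odds, all_absent = train_cache(probs)
doc = ["a", "c", "a"]
print(log_joint_naive(doc, math.log(0.5), probs))
print(log_joint_fast(doc, math.log(0.5), odds, all_absent))
```

The two functions agree up to floating-point error; the fast version costs time proportional to the document length rather than to |V|.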

Lessons learned
● Don’t follow the formulas blindly. Vectorize when possible.
● Ex1: Multinomial NB
  $$P(c) \prod_{k=1}^{|V|} P(w_k \mid c)^{N_{ik}}$$
● Ex2: cosine function for kNN
  $$\cos(d_i, d_j) = \frac{\sum_k d_{i,k}\, d_{j,k}}{\sqrt{\sum_k d_{i,k}^2}\, \sqrt{\sum_k d_{j,k}^2}}$$
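One way the two examples, plus the entropy case mentioned earlier, might be vectorized is sketched below. It assumes NumPy, which the slides do not name, and the function names and array layouts are illustrative choices. Each function replaces an explicit loop over k with a single matrix operation.

```python
import numpy as np

def multinomial_nb_log_scores(counts, log_prior, log_cond):
    """Log of P(c) * prod_k P(w_k | c)^N_ik for every document and class.

    counts:    (n_docs, |V|) array of word counts N_ik
    log_prior: (n_classes,) array of log P(c)
    log_cond:  (n_classes, |V|) array of log P(w_k | c)
    Returns an (n_docs, n_classes) array of log scores.
    """
    return counts @ log_cond.T + log_prior

def cosine_matrix(A, B):
    """Pairwise cosine between rows of A (n, d) and rows of B (m, d).

    Assumes no row is all zeros; a zero row would produce NaN.
    """
    a_norm = np.linalg.norm(A, axis=1, keepdims=True)
    b_norm = np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T) / (a_norm * b_norm.T)

def entropy(p):
    """Entropy in bits of a probability vector, treating 0 * log 0 as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

# Hypothetical toy inputs.
counts = np.array([[2.0, 1.0]])
log_prior = np.log(np.array([0.5, 0.5]))
log_cond = np.log(np.array([[0.5, 0.5], [0.9, 0.1]]))
print(multinomial_nb_log_scores(counts, log_prior, log_cond))
print(cosine_matrix(np.array([[1.0, 0.0], [1.0, 1.0]]), np.array([[0.0, 1.0], [2.0, 2.0]])))
print(entropy([0.5, 0.5]))
```

In the kNN case, the row norms of the training matrix can also be computed once and reused across all test instances.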

Next
● Next unit (2.5 weeks): two more advanced methods:
  – MaxEnt (aka multinomial logistic regression)
  – CRF (Conditional Random Fields)
● Focus:
  – Main intuition, final formulas used for training and testing
  – Mathematical foundation
  – Implementation issues

Reading material

The purpose of having reading material
● Something to rely on besides the slides
● Reading before class could be beneficial
● Papers (not textbooks; some blog posts) could be the main source of information in the future

Problems with the reading material
● The authors assume that you know the algorithm already:
  – Little background info
  – Page limit
  – Style
● The notation problem
➔ It could take a long time to understand everything

Some tips
● Look at several papers and slides at the same time
● Skim through the papers first to get the main idea
● Go to class and understand the slides
● Then go back to the papers (if you have time)
● Focus on the main ideas. It’s ok if you don’t understand all the details in the paper.

Math formulas

The goal of LING572
● Understand ML algorithms
  – The core of the algorithms
  – Implementation: e.g., efficiency issues
● Learn how to use the algorithms:
  – Reformulate a task into a learning problem
  – Select features
  – Write pre- and post-processing modules

Understanding ML methods
● 1: never heard about it
● 2: know very little
● 3: know the basics
● 4: understand the algorithm (modeling, training, testing)
● 5: have implemented the algorithm
● 6: know how to modify/extend the algorithm
➔ Our goal:
  – kNN, DT, NB: 5
  – MaxEnt, CRF, SVM, NN: 3-4
Math is important for 4-6, especially for 6.

Why are math formulas hard?
● Notation, notation, notation.
  – Same meaning, different notation: $f_k$, $w_k$, $t_k$
● Calculus, probability, statistics, optimization theory, linear programming, …
● People often have typos in their formulas.
● A lot of formulas to digest in a short period of time.

Some tips
● No need to memorize the formulas
● Determine which part of the formulas matters
  $$P(d_i \mid c) = P(|d_i|)\, |d_i|! \prod_{k=1}^{|V|} \frac{P(w_k \mid c)^{N_{ik}}}{N_{ik}!}$$
  $$\mathrm{classify}(d_i) = \arg\max_c P(c)\, P(d_i \mid c)$$
  $$\mathrm{classify}(d_i) = \arg\max_c P(c) \prod_{k=1}^{|V|} P(w_k \mid c)^{N_{ik}}$$
● It’s normal if you do not understand it the 1st/2nd time around.
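A small check of why the simplified classify formula is safe: the length term and the factorials depend only on the document, not on the class, so they shift every class score by the same amount and cannot change the arg max. The sketch below assumes P(|d_i|) is the same for all classes (passed in as a constant), and the function names and toy parameters are hypothetical.

```python
import math

def full_log_prob(counts, log_prior, log_cond, log_p_len=0.0):
    """log [ P(c) * P(|d|) * |d|! * prod_k P(w_k | c)^N_k / N_k! ]."""
    n = sum(counts)
    # Class-independent part: log P(|d|) + log |d|! - sum_k log N_k!
    const = log_p_len + math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_prior + const + sum(c * lp for c, lp in zip(counts, log_cond))

def simplified_score(counts, log_prior, log_cond):
    """log [ P(c) * prod_k P(w_k | c)^N_k ], the part that matters for arg max."""
    return log_prior + sum(c * lp for c, lp in zip(counts, log_cond))

def classify(counts, log_priors, log_conds, score):
    """Index of the class with the highest score."""
    return max(range(len(log_priors)), key=lambda c: score(counts, log_priors[c], log_conds[c]))

# Hypothetical two-class model over a three-word vocabulary.
counts = [2, 0, 1]
log_priors = [math.log(0.5), math.log(0.5)]
log_conds = [
    [math.log(0.6), math.log(0.3), math.log(0.1)],
    [math.log(0.1), math.log(0.3), math.log(0.6)],
]
print(classify(counts, log_priors, log_conds, full_log_prob))
print(classify(counts, log_priors, log_conds, simplified_score))
```

Both calls pick the same class, and the gap between the full and simplified scores is identical for every class.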

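Understanding a formula
$$P(w_t \mid c_j) = \frac{1 + \sum_{i=1}^{|D|} N_{it}\, P(c_j \mid d_i)}{|V| + \sum_{s=1}^{|V|} \sum_{i=1}^{|D|} N_{is}\, P(c_j \mid d_i)}$$
Without the smoothing terms (the 1 in the numerator and the |V| in the denominator):
$$P(w_t \mid c_j) = \frac{\sum_{i=1}^{|D|} N_{it}\, P(c_j \mid d_i)}{\sum_{s=1}^{|V|} \sum_{i=1}^{|D|} N_{is}\, P(c_j \mid d_i)}$$
Writing the denominator as $Z(c_j)$:
$$= \frac{\sum_{i=1}^{|D|} N_{it}\, P(c_j \mid d_i)}{Z(c_j)}$$
When each $P(c_j \mid d_i)$ is 0 or 1, as with labeled training data, only the documents in $D(c_j)$ contribute:
$$= \frac{\sum_{d_i \in D(c_j)} N_{it}}{Z(c_j)}$$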
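Read as code, the formula is a count-and-normalize step. The sketch below assumes hard labels, so P(c_j | d_i) is 1 for documents of class c_j and 0 otherwise, and it uses the add-one smoothed form. The function name and the data layout (one word-count dictionary per document) are illustrative assumptions.

```python
def estimate_word_probs(docs, labels, vocab, target_class):
    """Add-one smoothed P(w_t | c_j) from hard-labelled documents.

    docs:   list of dicts mapping word -> count (N_it)
    labels: list of class labels, one per document
    vocab:  list of vocabulary words (defines |V|)
    """
    counts = {w: 0 for w in vocab}
    for doc, label in zip(docs, labels):
        if label != target_class:  # P(c_j | d_i) = 0: document contributes nothing
            continue
        for w, n in doc.items():
            if w in counts:
                counts[w] += n     # sum of N_it over d_i in D(c_j)
    z = sum(counts.values())       # Z(c_j): total word count in class c_j
    return {w: (1 + counts[w]) / (len(vocab) + z) for w in vocab}

# Hypothetical toy corpus.
docs = [{"a": 2, "b": 1}, {"b": 3}, {"a": 1}]
labels = ["x", "y", "x"]
vocab = ["a", "b", "c"]
print(estimate_word_probs(docs, labels, vocab, "x"))
```

The unseen word "c" still gets nonzero probability because of the 1 in the numerator, and the estimates sum to 1 over the vocabulary because of the |V| in the denominator.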

Next Week
● On to MaxEnt!
Don’t forget: reading assignment due Tuesday at 11AM!
