
Machine Learning for Computational Linguistics

Unsupervised learning Çağrı Çöltekin

University of Tübingen Seminar für Sprachwissenschaft

May 24, 2016


Homework 1

Common confusions (mainly about bigrams):

▶ Word n-grams (typically) do not cross sentence boundaries
▶ Order is important in a bigram
▶ When calculating conditional probabilities and PMI for bigrams, you need to use the probability of a word given that it is the first/second word in the bigram, not its unigram probability
▶ The base of the logarithm does not matter for information-theoretic measures. The base only changes the ‘unit’. As long as you are consistent, using any base is fine.
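Below is a small sketch (not part of any homework solution; the toy sentence is made up) illustrating the last two points: PMI computed from the bigram's positional probabilities rather than unigram probabilities, and the fact that changing the log base only rescales the value by a constant.

    import math
    from collections import Counter

    tokens = "the cat sat on the mat the cat slept".split()
    bigrams = list(zip(tokens, tokens[1:]))

    bigram_counts = Counter(bigrams)
    first_counts = Counter(w1 for w1, _ in bigrams)    # word as first element of a bigram
    second_counts = Counter(w2 for _, w2 in bigrams)   # word as second element of a bigram
    n = len(bigrams)

    def pmi(w1, w2, base=2):
        # positional probabilities, not unigram probabilities
        p_joint = bigram_counts[(w1, w2)] / n
        p_first = first_counts[w1] / n
        p_second = second_counts[w2] / n
        return math.log(p_joint / (p_first * p_second), base)

    print(pmi("the", "cat", base=2))        # PMI in bits
    print(pmi("the", "cat", base=math.e))   # the same value in nats (= bits * ln 2)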


Projects

▶ Please send me a short project proposal document (about one page) by June 13 with
  ▶ the list of the project members
  ▶ a title and a brief description
  ▶ whether you have already obtained data for the project or not
  ▶ the methods you intend to apply
▶ and let me know as soon as you have formed your project team


Supervised learning

▶ The methods we studied so far are instances of supervised learning
▶ In supervised learning, we have a set of predictors x, and want to predict a response or outcome variable y
▶ During training we have access to both input and output variables
▶ Typically, training consists of estimating the parameters w of a model
▶ During prediction, we are given x and make predictions based on what we learned (e.g., parameter estimates) during training


Supervised learning: regression

[Figure: scatter plot of a quantitative response y against a predictor x]

▶ The response (outcome) variable (y) is a quantitative variable.
▶ Given the features (x) we want to predict the value of y


Supervised learning: classification

[Figure: instances labeled positive (+) and negative (−) in the (x1, x2) plane, with one unlabeled instance marked ?]

▶ The response (outcome) is a label. In the example: positive + or negative −
▶ Given the features (x1 and x2), we want to predict the label of an unknown instance ?


Supervised learning: estimating parameters

▶ Most models/methods estimate a set of parameters w during training
▶ Often we find the parameters that minimize a cost function J(w)
▶ For least-squares regression: J(w) = ∑ᵢ (ŷᵢ − yᵢ)²
▶ For logistic regression, the negative log likelihood: J(w) = −log L(w)
▶ If the cost function is convex we can find a global minimum using analytic solutions, or search methods such as gradient descent
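A minimal sketch (with made-up data and an arbitrary learning rate) of minimizing the least-squares cost J(w) = ∑ᵢ (ŷᵢ − yᵢ)² by gradient descent, compared against the analytic solution:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.c_[np.ones(50), rng.normal(size=50)]   # design matrix with an intercept column
    y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

    w = np.zeros(2)        # parameters to estimate
    eta = 0.001            # learning rate (an assumption, not from the slides)
    for _ in range(1000):
        residuals = X @ w - y          # y_hat - y
        grad = 2 * X.T @ residuals     # gradient of J(w)
        w -= eta * grad

    print(w)                                     # gradient descent estimate
    print(np.linalg.lstsq(X, y, rcond=None)[0])  # analytic least-squares solution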


Regularization

▶ To counteract overfitting to the training data, we typically modify the objective functions to restrict the space of the parameters
▶ Common regularization methods are
  ▶ L1 regularization: minimize J(w) + λ∥w∥₁
  ▶ L2 regularization: minimize J(w) + λ∥w∥₂²
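A minimal sketch of how the two penalties above modify a least-squares cost; the data and the value of λ are made up for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 3))
    y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(scale=0.1, size=30)
    lam = 0.5   # regularization strength lambda (arbitrary)

    def J(w):      # unregularized least-squares cost
        return np.sum((X @ w - y) ** 2)

    def J_l1(w):   # L1-regularized objective
        return J(w) + lam * np.sum(np.abs(w))

    def J_l2(w):   # L2-regularized objective
        return J(w) + lam * np.sum(w ** 2)

    w = np.array([1.0, 0.0, -2.0])
    print(J(w), J_l1(w), J_l2(w))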


Unsupervised learning

▶ In unsupervised learning, we do not have labels
▶ Our aim is to find useful patterns/structure in the data
▶ Typical unsupervised methods include
  ▶ Clustering: find related groups of instances
  ▶ Density estimation: find a probability distribution that explains the data
  ▶ Dimensionality reduction: find an accurate/useful lower-dimensional representation of the data
▶ Evaluation is difficult: we do not have ‘true’ labels/values
▶ Sometimes unsupervised methods can be used in conjunction with supervised methods


Clustering

▶ Our aim is to find groups of instances/items that are similar to each other
▶ Clustering similar languages, dialects, documents, users/authors …
▶ The distance measure is important (but also application specific)
▶ Clustering can be hierarchical or non-hierarchical
▶ Clustering can be bottom-up (agglomerative) or top-down (divisive)
▶ For most (useful) problems we cannot find globally optimal solutions; we often rely on greedy algorithms that find local optima.


Clustering example in two dimensions

[Figure: unlabeled data points in the (x1, x2) plane]

▶ Unlike classification we do not have labels
▶ We want to find ‘natural’ groups in the data
▶ Intuitively, similar or closer data points are grouped together


Similarity and distance

▶ The notion of distance (similarity) is very important in clustering. A distance measure D
  ▶ is symmetric: D(a, b) = D(b, a)
  ▶ is non-negative: D(a, b) ⩾ 0 for all a, b, and D(a, b) = 0 iff a = b
  ▶ obeys the triangle inequality: D(a, b) + D(b, c) ⩾ D(a, c)
▶ The choice of distance is application specific
▶ A few common choices:
  ▶ Euclidean distance: ∥a − b∥ = √(∑ⱼ₌₁ᵏ (aⱼ − bⱼ)²)
  ▶ Manhattan distance: ∥a − b∥₁ = ∑ⱼ₌₁ᵏ |aⱼ − bⱼ|
▶ We will often be faced with defining distance measures between linguistic units (letters, words, sentences, documents, …)
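A minimal sketch of the two distance measures above, using numpy:

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 0.0, 3.0])

    euclidean = np.sqrt(np.sum((a - b) ** 2))   # ||a - b||
    manhattan = np.sum(np.abs(a - b))           # ||a - b||_1

    # the same values via numpy's norm function
    assert np.isclose(euclidean, np.linalg.norm(a - b))
    assert np.isclose(manhattan, np.linalg.norm(a - b, ord=1))
    print(euclidean, manhattan)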


How to do clustering

Most clustering algorithms try to minimize the scatter within each cluster, which is equivalent to maximizing the scatter between clusters.

[Figure: example clustering of points in the (x1, x2) plane]

Within-cluster scatter:  ½ ∑ₖ₌₁ᴷ ∑_{C(a)=k} ∑_{C(b)=k} d(a, b)
Between-cluster scatter: ½ ∑ₖ₌₁ᴷ ∑_{C(a)=k} ∑_{C(b)≠k} d(a, b)

Exact solution (finding the global optimum) is not possible for realistic data. We use methods that find a local minimum.


K-means clustering

K-means is a popular method for clustering.

1. Randomly choose centroids m1, …, mK, representing K clusters
2. Repeat until convergence
  ▶ Assign each data point to the cluster of the nearest centroid
  ▶ Re-calculate the centroid locations based on the assignments

Effectively, we are finding a local minimum of the sum of squared Euclidean distances within each cluster:

½ ∑ₖ₌₁ᴷ ∑_{C(a)=k} ∑_{C(b)=k} ∥a − b∥²
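A minimal sketch of the two-step K-means loop described above, on made-up two-dimensional data; K, the initialization, and the number of iterations are arbitrary choices here:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=(0, 0), size=(50, 2)),
                   rng.normal(loc=(5, 5), size=(50, 2))])
    K = 2

    # 1. Randomly choose centroids (here: K random data points)
    centroids = X[rng.choice(len(X), size=K, replace=False)]

    for _ in range(20):   # in practice: repeat until the assignments stop changing
        # Assign each data point to the cluster of the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-calculate the centroid locations based on the assignments
        centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])

    print(centroids)      # approximately (0, 0) and (5, 5)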


K-means clustering: visualization

[Figure: two panels showing the data points and the cluster centroids over successive iterations]

▶ The data
▶ Set cluster centroids randomly
▶ Assign data points to the closest centroid
▶ Recalculate the centroids


K-means: issues

▶ K-means requires the data points to be in a Euclidean space
▶ K-means is sensitive to outliers
▶ The results are highly sensitive to initialization
  ▶ There are some smarter ways to select initial points
  ▶ One can do multiple initializations, and pick the best (with the lowest within-group sum of squares)
▶ It works well with approximately equal-sized, round-shaped clusters
▶ We need to specify the number of clusters in advance


How many clusters?

▶ The number of clusters is defined for some problems, e.g., classifying news into a fixed set of topics/interests
▶ For others, there is no clear way to select the best number of clusters
▶ The error (within-cluster scatter) always decreases with an increasing number of clusters; using a test set or cross validation is not useful either
▶ A common approach is clustering for multiple K values, and picking the value where there is an ‘elbow’ in the graph of the error function against K


How many clusters?

[Plot: within-cluster error J(w) for K = 1, …, 9; the ‘elbow’ of the curve suggests a suitable number of clusters]
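A minimal sketch of producing such a curve: run K-means for a range of K values and record the within-cluster sum of squares (inertia); the data here is made up.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=c, size=(40, 2)) for c in [(0, 0), (6, 0), (3, 5)]])

    for k in range(1, 10):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        print(k, km.inertia_)   # plot these values against k and look for the 'elbow'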


K-medoids

▶ K-means requires the data to be in a Euclidean space
▶ Sometimes we only have distances between the data points; the features do not lie in a Euclidean space
▶ The K-medoids algorithm is a variation of K-means
▶ Instead of calculating centroids, we try to find the most typical data point (the medoid) of each cluster at each iteration
▶ It is less sensitive to outliers
▶ It is computationally more expensive than K-means


Density estimation

▶ K-means treats all data points in a cluster equally
▶ A ‘soft’ version of K-means is density estimation for Gaussian mixtures, where
  ▶ we assume the data comes from a mixture of K Gaussian distributions
  ▶ we try to find the parameters of each distribution that maximize the probability of the data
▶ Unlike K-means, a mixture of Gaussians assigns to each data point a probability of belonging to each of the clusters
▶ It is typically estimated using the expectation-maximization (EM) algorithm


Density estimation using the EM algorithm

▶ The EM algorithm (or its variations) is used in many (unsupervised) learning models with latent/hidden variables
▶ It is closely related to the K-means algorithm

1. Randomly initialize the parameters of K Gaussian distributions (µ, Σ)
2. Iterate until convergence:
  E-step: Compute the probability of each data point belonging to each cluster, given the parameters
  M-step: Re-estimate the mixture density parameters using the probabilities estimated in the E-step
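A minimal sketch of this ‘soft’ clustering using scikit-learn's EM-based Gaussian mixture implementation on made-up data (the number of components is an arbitrary choice here):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=(0, 0), size=(100, 2)),
                   rng.normal(loc=(4, 4), size=(100, 2))])

    gm = GaussianMixture(n_components=2, covariance_type='full', random_state=0).fit(X)
    print(gm.means_)                 # estimated cluster means (mu)
    print(gm.predict_proba(X[:5]))   # soft assignments: P(cluster | data point)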


Hierarchical clustering

▶ Instead of a flat division into clusters as in K-means, hierarchical clustering builds a hierarchy based on the similarity of the data points
▶ There are two main ‘modes of operation’:
  Bottom-up or agglomerative clustering starts with individual data points, and merges until a single root node is reached
  Top-down or divisive clustering starts with a single cluster, and splits until all leaves are single data points
▶ Hierarchical clustering operates on differences (dissimilarities)
▶ The result is a binary tree called a dendrogram
▶ Dendrograms are easy to interpret (especially if the data is hierarchical)
▶ The algorithm does not commit to a number of clusters K from the start; the dendrogram can be ‘cut’ at any height for a particular number of clusters


Agglomerative clustering

1. Compute the similarity/distance matrix
2. Assign each data point to its own cluster
3. Repeat until no cluster is left to merge
  ▶ Pick the two clusters that are most similar to each other
  ▶ Merge them into a single cluster

[Figure: dendrogram over five data points (1–5)]


Agglomerative clustering demonstration

[Figure: five points (1–5) in the (x1, x2) plane and the dendrogram built by merging them step by step]
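A minimal sketch of agglomerative clustering with SciPy on made-up points (the choice of ‘average’ linkage is arbitrary; see the next slide for the options):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    X = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0]])

    distances = pdist(X, metric='euclidean')   # condensed pairwise distance matrix
    Z = linkage(distances, method='average')   # merge history, used to draw a dendrogram
    print(Z)
    print(fcluster(Z, t=3, criterion='maxclust'))   # 'cut' the dendrogram into 3 clusters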


How to calculate between cluster distances

Complete: maximal inter-cluster distance
Single: minimal inter-cluster distance
Average: mean inter-cluster distance
Centroid: distance between the centroids

[Figure: the four linkage criteria illustrated on five points in the (x1, x2) plane]

Note: single linkage tends to produce unbalanced trees.


Clustering: some closing notes

▶ We do not have proper evaluation procedures for clustering results (or for unsupervised learning in general)
▶ Clustering is typically unstable: slight changes in the data or parameter choices may change the results drastically
▶ Approaches against instability include some validation methods, or producing ‘probabilistic’ dendrograms by running clustering with different options


Principal component analysis

▶ Principal component analysis (PCA) is a method for dimensionality reduction
▶ PCA maps the original data into a lower-dimensional space by a linear transformation (rotation)
▶ The transformed variables retain most of the variation (= information) in the input
▶ PCA can be used for
  ▶ visualization
  ▶ data compression
  ▶ reducing the dimensionality of the input for use in supervised methods
  ▶ eliminating noise

PCA: A toy example

[Figure: three data points p1, p2, p3 lying on a line through the origin in the (x1, x2) plane]

Questions:

▶ How many dimensions do we have?
▶ How many dimensions do we need?
▶ A short digression: calculate the covariance matrix

  Σ = [ 18/3    8  ]
      [   8   32/3 ]


PCA: A toy example (2)

[Figure: the same three points p1, p2, p3 in the (x1, x2) plane]

What if we reduce the data to the following (rotated) coordinates?

[Figure: p1, p2, p3 on the z1 axis at −5, 0, and 5 (z2 = 0 for all three points)]

Going back to the original coordinates is easy; rotate using

  A = [ cos θ  −sin θ ] = [ 3/5  −4/5 ]
      [ sin θ   cos θ ]   [ 4/5   3/5 ]

  p1 = A × (−5, 0)ᵀ = (−3, −4)ᵀ
  p2 = A × ( 0, 0)ᵀ = ( 0,  0)ᵀ
  p3 = A × ( 5, 0)ᵀ = ( 3,  4)ᵀ

We can recover the original points perfectly. In this example the inherent dimensionality of the data is only 1.
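A minimal numpy sketch verifying the reconstruction above: rotating the reduced coordinates (z1, with z2 = 0) back to the original (x1, x2):

    import numpy as np

    A = np.array([[3/5, -4/5],
                  [4/5,  3/5]])      # rotation matrix with cos(theta) = 3/5

    Z = np.array([[-5.0, 0.0],       # p1, p2, p3 in the rotated coordinates
                  [ 0.0, 0.0],
                  [ 5.0, 0.0]])

    X = Z @ A.T                      # back to the original coordinates
    print(X)                         # [[-3, -4], [0, 0], [3, 4]]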


PCA: A toy example (3)

[Figure: three points p1, p2, p3 that are nearly, but not exactly, on a line in the (x1, x2) plane]

▶ What if the variables were not perfectly but strongly correlated?
▶ We could still do a similar transformation:

[Figure: p1, p2, p3 in the rotated (z1, z2) coordinates, with small non-zero z2 values]

▶ Discarding z2 results in a small reconstruction error: p1 ≈ A × (−5, 0)ᵀ = (−3, −4)ᵀ
▶ Note: z1 (and also z2) is a linear combination of the original variables


Why do we want to reduce the dimensionality

▶ Visualizing high-dimensional data becomes possible
▶ If we use the data for supervised learning, we avoid ‘the curse of dimensionality’
▶ Decorrelation is useful in some applications
▶ We compress the data (in a lossy way)
▶ We eliminate noise (assuming a high signal-to-noise ratio)


Different views on PCA

[Figure: data points in the (x1, x2) plane with the first principal component (PC1) drawn along the direction of largest variance]

▶ Find the direction of the largest variance
▶ Find the projection with the least reconstruction error
▶ Find a lower-dimensional latent Gaussian variable such that the observed variable is a mapping of the latent variable to a higher-dimensional space (with added noise).


How to find PCs

▶ When viewed as maximizing the variance or minimizing the reconstruction error, we can write the appropriate objective function and find the vectors that optimize it
▶ In the latent variable interpretation, we can use EM as in estimating mixtures of Gaussians
▶ It turns out that the principal components are the eigenvectors of the correlation matrix, where large eigenvalues correspond to components with large variation
▶ A numerically stable way to obtain the principal components is doing singular value decomposition (SVD) on the input data


PCA as matrix factorization (eigenvalue decomposition)

▶ One can compute PCA by decomposing the covariance matrix (note Σ = XᵀX) as
  Σ = UΛUᵀ
  ▶ the columns of U are the principal components (eigenvectors)
  ▶ Λ is a diagonal matrix of eigenvalues
▶ Another option is SVD, which factorizes the input matrix X (k variables × n data points) as
  X = UDV*
  ▶ U (k × k) contains the eigenvectors as before
  ▶ D (k × k) is a diagonal matrix with D² = Λ
  ▶ V* is a k × n unitary matrix

* The above is correct for standardized variables; otherwise the formulas get slightly more complicated.
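A minimal numpy sketch of PCA via SVD; the data here is made up and stored with data points as rows (the usual numpy convention), i.e., transposed with respect to the k × n layout above:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # make two variables strongly correlated

    Xc = X - X.mean(axis=0)                  # variables must be centered
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

    components = Vt                          # rows are the principal components
    eigenvalues = d ** 2                     # correspond to the variance along each component
    scores = Xc @ Vt.T                       # the data in the new coordinates

    print(eigenvalues / eigenvalues.sum())   # fraction of variance per component
    # sanity check: the eigenvalues of X^T X match d squared
    print(np.linalg.eigvalsh(Xc.T @ Xc)[::-1])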

A practical example

(with simplified/fake data)

▶ Our data consists of ‘measurements’ from the speech signal of instances of two vowels; we have 12 measurements for each vowel instance

    5.19  4.33  14.76  30.08  14.73   7.06  15.56  24.46   8.51  …
    2.99  5.25  11.69  19.27  18.02  11.04  13.34  38.13   8.70  …
    6.25  6.05  13.88  19.26  17.81   6.95  12.58  39.74   9.58  …
    7.24  5.43  15.15  18.93  15.69  10.18  14.89  34.86  10.03  …
    6.07  6.27  13.34  17.60  19.98  11.04  13.28  36.02   8.66  …
    …

▶ How do we visualize this data?
▶ Are all 12 variables useful?


A practical example

Visualizing with pairwise scatter plots

[Figure: pairwise scatter plots of the first four variables V1–V4]


A practical example

Plotting the first two principal components

[Figure: the data plotted on the first two principal components, PC1 against PC2]


A practical example

Biplot

[Figure: biplot of the 40 instances on PC1 and PC2, with arrows showing the loadings of the variables V1–V12]


A practical example

How many components to keep? (scree plot)

[Figure: scree plot of the variances of the first 10 principal components]


Some practical notes on PCA

▶ Variables need to be centered
▶ The scales of the variables matter; standardizing may be a good idea depending on the units/scales of the individual variables
▶ The sign of a principal component (vector) is not important
▶ If there are more variables than data points, we can still calculate the principal components, but there will be at most n − 1 PCs
▶ PCA will be successful if the variables are linearly correlated; there are extensions for dealing with nonlinearities (e.g., kernel PCA, ICA)


Unsupervised learning: a summary (so far)

▶ In unsupervised learning, we do not have labels. Our aim is to find/exploit (latent) structure in the data
▶ We studied a number of related methods
  Clustering finds groups in the data
  Mixture densities are a ‘soft’ version of clustering, assuming the data is generated by a number of distributions
  Dimensionality reduction methods try to summarize the data with fewer variables/dimensions
▶ The evaluation of unsupervised methods is problematic, without knowing what exactly we should find in the data


Exercises with unsupervised learning

You can find the data set we will use on the course web page. The data is a matrix with a phoneme on each row and a context in each column. The cells are counts of the phoneme observed in the indicated context.

▶ Try both k-means and hierarchical clustering on the data set
▶ You can use
  ▶ R: kmeans and hclust (you also need dist for calculating distances)
  ▶ Python: sklearn.cluster
▶ You may want to compare your results with the IPA chart to see whether the clustering you observe has any linguistic basis
▶ Try different hierarchical clustering methods
▶ Try with and without normalization of the counts
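A possible starting point in Python (the file name is hypothetical; adapt it to the data set from the course web page):

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from scipy.cluster.hierarchy import linkage, dendrogram
    from scipy.spatial.distance import pdist

    counts = pd.read_csv("phoneme_contexts.csv", index_col=0)   # rows: phonemes, columns: contexts

    # optional normalization: relative frequencies per phoneme
    normalized = counts.div(counts.sum(axis=1), axis=0)

    # k-means (the number of clusters is a guess; try several values)
    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(normalized)
    print(dict(zip(normalized.index, km.labels_)))

    # hierarchical clustering; try different methods ('average', 'complete', 'single', ...)
    Z = linkage(pdist(normalized, metric='euclidean'), method='average')
    dendrogram(Z, labels=list(normalized.index))   # compare the groups with the IPA chart
    plt.show()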


Derivation of PCA by maximizing the variance

▶ We focus on the first PC (z1), which maximizes the variance of the data projected onto it
▶ We are interested only in the direction, so we choose z1 to be a unit vector (∥z1∥ = 1)
▶ Remember that to project a vector onto another we simply use the dot product, so the projected data points are z1ᵀxi for i = 1, . . . , N
▶ The variance of the projected data points (that we want to maximize) is

  σ²(z1) = (1/N) ∑ᵢ₌₁ᴺ (z1ᵀxi − z1ᵀx̄)² = z1ᵀ Σx z1

  where Σx is the covariance matrix of the unprojected data


Derivation of PCA by maximizing the variance (cont.)

▶ The problem becomes: maximize z1ᵀ Σ z1 subject to the constraint ∥z1∥² = z1ᵀz1 = 1
▶ Turning it into an unconstrained optimization problem with a Lagrange multiplier, we maximize

  z1ᵀ Σ z1 + λ1 (1 − z1ᵀz1)

▶ Taking the derivative and setting it to 0 gives us

  Σ z1 = λ1 z1

  Note: by definition, z1 is an eigenvector of Σ, and λ1 is the corresponding eigenvalue
▶ z1 is the first principal component; we can now compute the second principal component with the constraint that it has to be orthogonal to the first one
