
Statistical Natural Language Processing

Unsupervised machine learning
Çağrı Çöltekin

University of Tübingen, Seminar für Sprachwissenschaft

Summer Semester 2017

Outline: Recap, Clustering, PCA, Autoencoders, Practical matters

Supervised learning

  • The methods we studied so far are instances of supervised learning
  • In supervised learning, we have a set of predictors x, and want to predict a response or outcome variable y
  • During training, we have both input and output variables
  • Training consists of estimating the parameters w of a model
  • During prediction, we are given x and make predictions based on the model we learned


Supervised learning: regression

[figure: data points and a fitted regression line in the x–y plane]

  • The response (outcome) variable (y) is a quantitative variable
  • Given the features (x), we want to predict the value of y


Supervised learning: classification

[figure: instances labeled + and − in the x1–x2 plane, with an unknown instance marked ?]

  • The response (outcome) is a label. In the example: positive (+) or negative (−)
  • Given the features (x1 and x2), we want to predict the label of an unknown instance (?)


Supervised learning: estimating parameters

  • Most models/methods estimate a set of parameters w during training
  • Often we find the parameters that minimize a loss function
    – For least-squares regression: J(w) = Σᵢ (ŷᵢ − yᵢ)² + ∥w∥
    – For logistic regression: the negative log likelihood, J(w) = −log L(w) + ∥w∥
  • If the loss function is convex, we can find the global minimum using analytic solutions; otherwise we use search methods such as gradient descent
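For concreteness, here is a minimal sketch (not from the slides) of such a loss and a plain gradient-descent loop; the squared penalty on w, the learning rate, and the synthetic data are illustrative choices.

```python
import numpy as np

# Minimal sketch: a regularized least-squares loss
# J(w) = sum_i (y_hat_i - y_i)^2 + lam * ||w||^2, minimized by gradient descent.
def loss(X, y, w, lam=0.1):
    residuals = X @ w - y
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)

def gradient_step(X, y, w, lam=0.1, lr=0.001):
    grad = 2 * X.T @ (X @ w - y) + 2 * lam * w   # gradient of J(w)
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
for _ in range(1000):
    w = gradient_step(X, y, w)
print(loss(X, y, w), w)   # w should end up close to (1, -2, 0.5)
```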


Models with hidden variables

Hidden Markov models

[figure: a hidden Markov model with hidden states q0, q1, q2, …, qT emitting one observation per time step 1, 2, …, T]

  • HMMs, or other models with hidden variables, can be learned without labels
  • Unsupervised learning is essentially learning the hidden variables


Unsupervised learning

  • In unsupervised learning, we do not have labels
  • Our aim is to find useful patterns/structure in the data
  • Typical unsupervised methods include
    – Clustering: find related groups of instances
    – Density estimation: find a probability distribution that explains the data
    – Dimensionality reduction: find an accurate/useful lower dimensional representation of the data
  • All can be cast as graphical models with hidden variables
  • Evaluation is difficult: we do not have ‘true’ labels/values


Clustering: why do we do it?

  • The aim is to find groups of instances/items that are similar to each other
  • Applications include
    – Clustering languages or dialects for determining their relations
    – Clustering (literary) texts, e.g., for authorship attribution
    – Clustering words, e.g., for better parsing
    – Clustering documents, e.g., news into topics
    – …



Clustering

  • Clustering can be hierarchical or non-hierarchical
  • Clustering can be bottom-up (agglomerative) or top-down (divisive)
  • For most (useful) problems we cannot find globally optimum solutions; we often rely on greedy algorithms that find a local minimum
  • The measure of distance or similarity between the items is important


Clustering in two-dimensional space

[figure: unlabeled data points in the x1–x2 plane forming a few visible groups]

  • Unlike classification, we do not have labels
  • We want to find ‘natural’ groups in the data
  • Intuitively, similar or closer data points are grouped together


Similarity and distance

  • The notion of distance (similarity) is important in clustering. A distance measure D
    – is symmetric: D(a, b) = D(b, a)
    – is non-negative: D(a, b) ⩾ 0 for all a, b, and D(a, b) = 0 iff a = b
    – obeys the triangle inequality: D(a, b) + D(b, c) ⩾ D(a, c)
  • The choice of distance is application specific
  • We will often be faced with defining distance measures between linguistic units (letters, words, sentences, documents, …)

Distance measures in Euclidean space

  • Euclidean distance:  ∥a − b∥ = √( Σⱼ₌₁ᵏ (aⱼ − bⱼ)² )
  • Manhattan distance:  ∥a − b∥₁ = Σⱼ₌₁ᵏ |aⱼ − bⱼ|
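A minimal illustration of the two measures with NumPy; the vectors a and b are made up for the example.

```python
import numpy as np

# Two toy vectors, purely for illustration
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # ||a - b||
manhattan = np.sum(np.abs(a - b))           # ||a - b||_1
print(euclidean, manhattan)

# Equivalent shortcuts: np.linalg.norm(a - b) and np.linalg.norm(a - b, ord=1)
```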


How to do clustering

Most clustering algorithms try to minimize the scatter within each cluster, which is equivalent to maximizing the scatter between clusters.

  within-cluster scatter:   Σₖ₌₁ᴷ Σ_{C(a)=k} Σ_{C(b)=k} d(a, b)

  between-cluster scatter:  Σₖ₌₁ᴷ Σ_{C(a)=k} Σ_{C(b)≠k} d(a, b)


K-means clustering

K-means is a popular method for clustering.

  • 1. Randomly choose centroids m1, …, mK, representing K clusters
  • 2. Repeat until convergence
    – Assign each data point to the cluster of the nearest centroid
    – Re-calculate the centroid locations based on the assignments

Effectively, we are finding a local minimum of the sum of squared Euclidean distances within each cluster:

  (1/2) Σₖ₌₁ᴷ Σ_{C(a)=k} Σ_{C(b)=k} ∥a − b∥²

* Note the similarity with the EM algorithm
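The two steps above can be written down directly; the following is a rough NumPy sketch (not an optimized or robust implementation: empty clusters, for instance, are not handled), with K, the iteration limit, and the synthetic data chosen for illustration.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]  # 1. random init
    for _ in range(n_iter):                                   # 2. repeat
        # assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recalculate the centroids from the assignments
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centroids, centroids):             # converged
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centroids = kmeans(X, K=2)
```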


K-means clustering: visualization

[figure: step-by-step illustration of K-means on 2D data]

  • The data
  • Set cluster centroids randomly
  • Assign data points to the closest centroid
  • Recalculate the centroids



K-means: issues

  • K-means requires the data to be in a Euclidean space
  • K-means is sensitive to outliers
  • The results are sensitive to initialization
    – There are some smarter ways to select initial points
    – One can do multiple initializations, and pick the best (with the lowest within-group sum of squares)
  • It works well with approximately equal-size, round-shaped clusters
  • We need to specify the number of clusters in advance


How many clusters?

  • The number of clusters is defined for some problems, e.g., classifying news into a fixed set of topics/interests
  • For others, there is no clear way to select the best number of clusters
  • The error (within-cluster scatter) always decreases with an increasing number of clusters, so using a test set or cross-validation is not useful either
  • A common approach is clustering for multiple K values, and picking the K where there is an ‘elbow’ in the graph of the error function (see the sketch below)
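A sketch of this ‘elbow’ heuristic with scikit-learn, assuming X is the feature matrix (synthetic here); KMeans.inertia_ is the within-cluster sum of squared distances.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 4, 8)])

errors = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    errors.append(km.inertia_)        # within-cluster sum of squared distances

# Look for the K after which the error stops dropping sharply (the 'elbow').
for k, e in zip(range(1, 10), errors):
    print(k, round(e, 1))
```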



How many clusters?

[figure: within-cluster error J(w) plotted against the number of clusters K (K = 1…9), showing an ‘elbow’]

This plot is sometimes called a scree plot.


K-medoids

  • The K-medoids algorithm is a variation of K-means
  • Instead of calculating centroids, we try to find the most typical data point (the medoid) of each cluster at each iteration
  • K-medoids can work with distances alone; it does not need feature vectors to be in a Euclidean space
  • It is less sensitive to outliers
  • It is computationally more expensive than K-means
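A rough sketch of a K-medoids-style update (not the full PAM algorithm): note that it only needs a pairwise distance matrix D, not feature vectors. Sizes and the stopping rule are illustrative, and empty clusters are not handled.

```python
import numpy as np

def k_medoids(D, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=K, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)        # assign to the nearest medoid
        new_medoids = medoids.copy()
        for k in range(K):
            members = np.where(labels == k)[0]
            # new medoid: the member with the smallest total distance to the rest
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[costs.argmin()]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break
        medoids = new_medoids
    return labels, medoids
```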


Density estimation

  • K-means treats all data points in a cluster equally
  • A ‘soft’ version of K-means is density estimation with Gaussian mixtures, where
    – we assume the data comes from a mixture of K Gaussian distributions
    – we try to find the parameters of each distribution (instead of centroids) that maximize the likelihood of the data
  • Unlike K-means, a mixture of Gaussians assigns probabilities for each data point belonging to each of the clusters
  • It is typically estimated using the expectation-maximization (EM) algorithm


Density estimation using the EM algorithm

  • The EM algorithm (or its variations) is used for learning models with latent/hidden variables
  • It is closely related to the K-means algorithm
  • 1. Initialize the parameters (e.g., randomly) of K multivariate normal distributions (µ, Σ)
  • 2. Iterate until convergence:
    E-step: given the parameters, compute the membership ‘weights’, the probability of each data point belonging to each distribution
    M-step: re-estimate the mixture density parameters using the membership weights calculated in the E-step
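In practice one rarely codes this by hand; a minimal sketch with scikit-learn's GaussianMixture (which runs EM internally) on synthetic data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gmm.fit(X)                      # EM runs inside fit()

soft = gmm.predict_proba(X)     # membership probabilities (the 'soft' assignments)
hard = gmm.predict(X)           # hard cluster labels, comparable to K-means output
print(gmm.means_)               # estimated means of the two Gaussians
```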


Hierarchical clustering

  • Instead of a flat division into clusters as in K-means, hierarchical clustering builds a hierarchy based on the similarity of the data points
  • There are two main ‘modes of operation’:
    Bottom-up or agglomerative clustering
      • starts with individual data points,
      • merges the clusters until all data is in a single cluster
    Top-down or divisive clustering
      • starts with a single cluster,
      • and splits until all leaves are single data points


Hierarchical clustering

  • Hierarchical clustering operates on differences
  • The result is a binary tree called a dendrogram
  • Dendrograms are easy to interpret (especially if the data is hierarchical)
  • The algorithm does not commit to the number of clusters K from the start; the dendrogram can be ‘cut’ at any height for determining the clusters


Agglomerative clustering

  • 1. Compute the similarity/distance matrix
  • 2. Assign each data point to its own cluster
  • 3. Repeat until there are no clusters left to merge
    – Pick the two clusters that are most similar to each other
    – Merge them into a single cluster

[figure: dendrogram over five data points, built by successive merges]
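A minimal sketch of this bottom-up procedure with SciPy; linkage performs the merge loop above, and the method argument selects how between-cluster distances are computed (see the later slide on linkage criteria). The data is synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(4, 0.5, (10, 2))])

Z = linkage(X, method='average', metric='euclidean')   # merge history (the dendrogram)
labels = fcluster(Z, t=2, criterion='maxclust')         # 'cut' the dendrogram into 2 clusters
print(labels)

# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree with matplotlib.
```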


Agglomerative clustering demonstration

[figure: five data points in the x1–x2 plane and the dendrogram produced by agglomerative clustering]



How to calculate between-cluster distances

  Complete: maximal inter-cluster distance
  Single: minimal inter-cluster distance
  Average: mean inter-cluster distance
  Centroid: distance between the centroids

Note: we only need distances; (feature) vectors are not necessary


Clustering: some closing notes

  • We do not have proper evaluation procedures for clustering results (for unsupervised learning in general)
  • Clustering is typically unstable: slight changes in the data or in parameter choices may change the results drastically
  • Approaches against instability include some validation methods, or producing ‘probabilistic’ dendrograms by running clustering with different options


Principal Component Analysis

  • Principal component analysis (PCA) is a method of dimensionality reduction
  • PCA maps the original data into a lower dimensional space by a linear transformation (rotation)
  • The transformed variables retain most of the variation (= information) in the input
  • PCA can be used for
    – visualization
    – data compression
    – reducing dimensionality for use in supervised methods
    – eliminating noise


PCA: a toy example

[figure: three data points p1 = (−3, −4), p2 = (0, 0), p3 = (3, 4) lying on a line through the origin of the x1–x2 plane]

Questions:

  • How many dimensions do we have?
  • How many dimensions do we need?
  • Short digression: calculate the covariance matrix

      Σ = [ 18/3    8  ]
          [   8   32/3  ]


PCA: A toy example (2)

[figure: the same three points p1, p2, p3 in the x1–x2 plane]

What if we reduce the data to the coordinates z1, z2?

[figure: the points in the z1–z2 plane: p1 = (−5, 0), p2 = (0, 0), p3 = (5, 0)]

Going back to the original coordinates is easy, rotate using:

  A = [ cos θ  −sin θ ]  =  [ 3/5  −4/5 ]
      [ sin θ   cos θ ]     [ 4/5   3/5 ]

  p1 = A × [−5, 0]ᵀ = [−3, −4]ᵀ
  p2 = A × [ 0, 0]ᵀ = [ 0,  0]ᵀ
  p3 = A × [ 5, 0]ᵀ = [ 3,  4]ᵀ

We can recover the original points perfectly. In this example the inherent dimensionality of the data is only 1.
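A quick numerical check of the rotation above (the matrix A and the z coordinates are taken from the slide):

```python
import numpy as np

A = np.array([[3/5, -4/5],
              [4/5,  3/5]])
Z = np.array([[-5, 0], [0, 0], [5, 0]], dtype=float)   # (z1, z2) for p1, p2, p3

X = Z @ A.T      # rotate back to the original coordinates
print(X)         # [[-3. -4.] [ 0.  0.] [ 3.  4.]] -- the original p1, p2, p3
```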



PCA: A toy example (3)

[figure: the same three points, now only approximately on a line in the x1–x2 plane]

  • What if the variables were not perfectly but strongly correlated?
  • We could still do a similar transformation:

[figure: the points in the z1–z2 plane, with small non-zero z2 values]

  • Discarding z2 results in a small reconstruction error: p1 ≈ A × [−5, 0]ᵀ = [−3, −4]ᵀ
  • Note: z1 (also z2) is a linear combination of the original variables


Why do we want to reduce the dimensionality

  • Visualizing high-dimensional data becomes possible
  • If we use the data for supervised learning, we avoid ‘the curse of dimensionality’
  • Decorrelation is useful in some applications
  • We compress the data (in a lossy way)
  • We eliminate noise (assuming a high signal-to-noise ratio)


Different views on PCA

[figure: 2D data in the x1–x2 plane with the first principal component (PC1) drawn along the direction of largest variance]

  • Find the direction of the largest variance
  • Find the projection with the least reconstruction error
  • Find a lower dimensional latent Gaussian variable such that the observed variable is a mapping of the latent variable to a higher dimensional space (with added noise)


How to find PCs

  • When viewed as maximizing variance or reducing the reconstruction error, we can write the appropriate objective function and find the vectors that minimize it
  • In the latent variable interpretation, we can use EM as in estimating mixtures of Gaussians
  • The principal components are the eigenvectors of the correlation matrix, where large eigenvalues correspond to components with large variation
  • A numerically stable way to obtain principal components is doing singular value decomposition (SVD) on the input data


PCA as matrix factorization (eigenvalue decomposition)

  • One can compute PCA by decomposing the covariance matrix (note Σ = XᵀX) as

      Σ = U Λ Uᵀ

    – the columns of U are the principal components (eigenvectors)
    – Λ is a diagonal matrix of eigenvalues
  • Another option is SVD, which factorizes the input matrix X (k variables × n data points) as

      X = U D V*

    – U (k × k) contains the eigenvectors as before
    – D (k × k) is a diagonal matrix with D² = Λ
    – V* is a k × n unitary matrix
* The above is correct for standardized variables, otherwise the formulas get slightly more complicated.
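A minimal NumPy sketch of PCA via SVD, using the more common (n data points × k variables) orientation, so the principal components appear as the rows of Vᵀ rather than the columns of U; the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)   # make two variables correlated

Xc = X - X.mean(axis=0)                          # center (standardize if appropriate)
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                                  # rows are the principal components
explained_variance = d ** 2 / (len(X) - 1)       # corresponds to the eigenvalues
Z = Xc @ Vt.T                                    # data projected onto the PCs
print(explained_variance)
```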

A practical example

(with simplified/fake data)

  • Our data consists of ‘measurements’ from the speech signal of instances of two vowels; we have 12 measurements for each vowel instance

      5.19  4.33 14.76 30.08 14.73  7.06 15.56 24.46  8.51 …
      2.99  5.25 11.69 19.27 18.02 11.04 13.34 38.13  8.70 …
      6.25  6.05 13.88 19.26 17.81  6.95 12.58 39.74  9.58 …
      7.24  5.43 15.15 18.93 15.69 10.18 14.89 34.86 10.03 …
      6.07  6.27 13.34 17.60 19.98 11.04 13.28 36.02  8.66 …
       ⋮

  • How do we visualize this data?
  • Are all 12 variables useful?
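One way this could be done with scikit-learn's PCA, with a random placeholder standing in for the 12-dimensional measurements: project onto the first two components for plotting and inspect the per-component variances (the quantities behind the scree plot shown later).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))          # placeholder for the 12 measurements

pca = PCA()                             # keep all components
Z = pca.fit_transform(X)                # Z[:, 0] and Z[:, 1] give PC1 and PC2

print(pca.explained_variance_[:5])      # per-component variances (scree plot values)
# import matplotlib.pyplot as plt; plt.scatter(Z[:, 0], Z[:, 1])   # PC1 vs PC2
```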


A practical example

Visualizing with pairwise scatter plots

[figure: pairwise scatter plots of the first four variables V1–V4]


A practical example

Plotting the first two principal components

[figure: the data plotted on the first two principal components, PC1 vs. PC2]



A practical example

How many components to keep? (scree plot)

[figure: scree plot of the variances of components 1–10]


Some practical notes on PCA

  • Variables need to be centered
  • The scales of the variables matter; standardizing may be a good idea depending on the units/scales of the individual variables
  • The sign of a principal component (vector) is not important
  • If there are more variables than data points, we can still calculate the principal components, but there will be at most n − 1 PCs
  • PCA will be successful if the variables are linearly correlated; there are extensions for dealing with nonlinearities (e.g., kernel PCA, ICA)


Unsupervised learning: a summary (so far)

  • In unsupervised learning, we do not have labels. Our aim is to find/exploit (latent) structure in the data
  • We studied a number of related methods
    – Clustering finds groups in the data
    – Mixture densities are a ‘soft’ version of clustering, assuming the data is generated by a number of distributions
    – Dimensionality reduction methods try to summarize the data with fewer variables/dimensions
  • The evaluation of unsupervised methods is problematic, without knowing what exactly we should find in the data


Unsupervised learning in ANNs

  • Restricted Boltzmann machines (RBMs)
    similar to the latent variable models (e.g., Gaussian mixtures): consider the representations learned by hidden layers as hidden variables (h), and learn p(x, h) so as to maximize the probability of the (unlabeled) data
  • Autoencoders
    train a constrained feed-forward network to reproduce its input at its output


Restricted Boltzmann machines (RBMs)

[figure: an RBM with visible units x1–x4 and hidden units h1–h4, connected by weights W; no links within a layer]

  • RBMs are unsupervised latent variable models; they learn only from unlabeled data
  • They are generative models of the joint probability p(h, x)
  • They correspond to undirected graphical models
  • No links within layers
  • The aim is to learn useful features (h)

* Biases are omitted in the diagrams and the formulas for simplicity.

The distribution defined by RBMs

[figure: the same RBM, visible units x connected to hidden units h by weights W]

  p(h, x) = e^(hᵀWx) / Z

This calculation is intractable (Z is difficult to calculate). But the conditional distributions are easy to calculate:

  p(h|x) = Πⱼ p(hⱼ|x),   where  p(hⱼ = 1 | x) = 1 / (1 + e^(−Wⱼx))
  p(x|h) = Πₖ p(xₖ|h),   where  p(xₖ = 1 | h) = 1 / (1 + e^(−Wₖᵀh))

Learning in RBMs

  • We want to maximize the probability the model assigns to the input, p(x), or equivalently minimize − log p(x)
  • In general, this is computationally expensive
  • The contrastive divergence algorithm is a well-known algorithm that efficiently finds an approximate solution (a rough sketch follows)
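A very rough sketch of one CD-1 update for a binary RBM under the formulation above (biases omitted, as in the slides); the sizes, learning rate, and random data are made up, and this is not a faithful reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = 0.01 * rng.normal(size=(8, 20))              # 8 hidden units, 20 visible units
x = rng.integers(0, 2, size=20).astype(float)    # one (binary) data vector

# positive phase: hidden probabilities given the data, then a sample
ph = sigmoid(W @ x)
h = (rng.random(8) < ph).astype(float)

# negative phase: reconstruct the visible units, then the hidden probabilities again
px = sigmoid(W.T @ h)
x_neg = (rng.random(20) < px).astype(float)
ph_neg = sigmoid(W @ x_neg)

# CD-1 weight update: difference of the two outer products
lr = 0.1
W += lr * (np.outer(ph, x) - np.outer(ph_neg, x_neg))
```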


Autoencoders

[figure: an autoencoder with encoder weights W mapping inputs x1–x5 to hidden units h1–h3, and decoder weights W* mapping back to the reconstructions x̂1–x̂5]

  • Autoencoders are standard feed-forward networks
  • The main difference is that they are trained to predict their input (they try to learn the identity function)
  • The aim is to learn useful representations of the input at the hidden layer
  • Typically the weights are tied (W* = Wᵀ)
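As a concrete (hypothetical) illustration, a minimal autoencoder in PyTorch: a 5-dimensional input is squeezed through a 3-unit hidden layer and trained to reconstruct itself. The sizes, activation, optimizer, and epoch count are arbitrary choices, and the weights are not tied here (tying would reuse the transposed encoder weight matrix in the decoder).

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_in=5, n_hidden=3):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))   # hidden representation
        return self.decoder(h)               # reconstruction x_hat

X = torch.randn(256, 5)                      # unlabeled data
model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), X)              # the target is the input itself
    loss.backward()
    opt.step()
```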



Under-complete autoencoders

[figure: an autoencoder with five inputs, three hidden units, and five outputs]

  • An autoencoder is said to be under-complete if there are fewer hidden units than inputs
  • The network is forced to learn a compact representation of the input (compress)
  • An autoencoder with a single hidden layer is equivalent to PCA
  • We need multiple layers for learning non-linear features


Over-complete autoencoders

[figure: an autoencoder with three inputs, five hidden units, and three outputs]

  • An autoencoder is said to be over-complete if there are more hidden units than inputs
  • The network can normally memorize the input perfectly
  • This type of network is useful if trained with a regularization term resulting in sparse hidden units (e.g., L1 regularization)


Denoising autoencoders

[figure: a denoising autoencoder: a corrupted copy of the input x is encoded to h and decoded to the reconstruction x̂]

  • Instead of providing the exact input, we introduce noise by
    – randomly setting some inputs to 0 (dropout)
    – adding random (Gaussian) noise
  • The network is still expected to reconstruct the original input (without noise)
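The denoising variant changes only the training input: reusing the model and loss from the autoencoder sketch above, one might corrupt the input with dropout-style masking while keeping the clean input as the target.

```python
import torch

noisy_X = X * (torch.rand_like(X) > 0.3).float()   # zero out roughly 30% of the inputs
loss = loss_fn(model(noisy_X), X)                   # the target is still the clean X
loss.backward()
```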


Unsupervised pre-training

  • A common use case for RBMs and autoencoders is as pre-training methods for supervised networks
  • Autoencoders or RBMs are trained using unlabeled data
  • The weights learned during the unsupervised training are used for initializing the weights of a supervised network
  • This approach has been one of the reasons for the success of deep networks


Deep unsupervised learning

  • Both autoencoders and RBMs can be ‘stacked’
  • Learn the weights of the first hidden layer from the data
  • Freeze the weights, and using the hidden layer activations as input, train another hidden layer, …
  • This approach is called greedy layer-wise training
  • In the case of RBMs, the resulting networks are called deep belief networks
  • Deep autoencoders are called stacked autoencoders


Summary

  • Unsupervised methods try to discover ‘hidden’ structure in the data
  • Clustering is used for finding groups in the data without labels
  • Dimensionality reduction transforms the data into a lower dimensional space while keeping most of the information in the original data
  • RBMs and autoencoders learn (typically lower dimensional, dense, continuous) representations of the input that are useful in other tasks

Next:
  Today(?) Distributed representations
  Wed(?) Text classification


Exam

  • On Wednesday, July 26
  • It should take about an hour
  • Mix of true/false questions and long-answer questions
  • Your main source is the course slides; the recommended reading will help
  • Understanding the graded/ungraded exercises is important
  • The focus will be on NLP methods/applications
  • Questions measure your understanding of the methods/topics; there is no emphasis on memorizing
  • You can bring an A4 cheat sheet, both sides are OK; it should be readable to the unaided eye


Last two graded assignments

  • Assignment 2 is posted today
  • Two deadlines

    Jul 31: you get a 1-point bonus and detailed feedback; an initial attempt may be helpful for the exam
    Oct 2: full grade, but no feedback

  • Assignment 3 will be posted next week
    – Topic will be sentiment analysis
    – Similar two-deadline schedule



Derivation of PCA by maximizing the variance

  • We focus on the first PC (z1), which maximizes the variance of the data projected onto it
  • We are interested only in the direction, so we choose z1 to be a unit vector (∥z1∥ = 1)
  • Remember that to project a vector onto another, we simply use the dot product. So the projected data points are z1ᵀxᵢ for i = 1, …, N
  • The variance of the projected data points (that we want to maximize) is

      σ²_z1 = (1/N) Σᵢ (z1ᵀxᵢ − z1ᵀx̄)² = z1ᵀ Σₓ z1

    where Σₓ is the covariance matrix of the unprojected data


Derivation of PCA by maximizing the variance (cont.)

  • The problem becomes: maximize z1ᵀ Σ z1 subject to the constraint ∥z1∥ = z1ᵀz1 = 1
  • Turning it into an unconstrained optimization problem with Lagrange multipliers, we maximize

      z1ᵀ Σ z1 + λ1 (1 − z1ᵀz1)

  • Taking the derivative and setting it to 0 gives us

      Σ z1 = λ1 z1

    Note: by definition, z1 is an eigenvector of Σ, and λ1 is the corresponding eigenvalue
  • z1 is the first principal component; we can now compute the second principal component with the constraint that it has to be orthogonal to the first one
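A numerical sanity check of this result with NumPy and synthetic data: the eigenvector of the covariance matrix with the largest eigenvalue does maximize the projected variance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=1000)

Sigma = np.cov(X, rowvar=False)             # covariance of the unprojected data
eigvals, eigvecs = np.linalg.eigh(Sigma)    # eigen-decomposition of the symmetric matrix
z1 = eigvecs[:, eigvals.argmax()]           # first principal component

projected_var = np.var(X @ z1)              # variance of the projections z1ᵀxᵢ
print(projected_var, eigvals.max())         # the two values should approximately match
```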
