

slide-1
SLIDE 1

Supervised and unsupervised learning.

Petr Pošík
Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Cybernetics

This lecture is based on the book Ten Lectures on Statistical and Structural Pattern Recognition by Michail I. Schlesinger and Václav Hlaváč (Kluwer, 2002). (The Czech edition was published by ČVUT Press in 1999 under the title Deset přednášek z teorie statistického a strukturálního rozpoznávání.)

slide-2
SLIDE 2

Learning

  • Learning: Decision strategy design, Learning as parameter estimation, Learning as optimal strategy selection, Several surrogate criteria, Learning revisited
  • Unsupervised Learning
  • Clustering
  • Summary


slide-3
SLIDE 3

Decision strategy design


Using an observation x ∈ X of an object of interest with a hidden state k ∈ K, we should design a decision strategy q : X → D that is optimal with respect to a certain criterion.

slide-4
SLIDE 4

Decision strategy design


Bayesian decision theory requires the complete statistical information pXK(x, k) about the object of interest to be known, and a suitable penalty function W : K × D → R must be provided.

Non-Bayesian decision theory studies tasks for which some of the above information is not available.

In practical applications, typically, none of the probabilities are known! The designer is only provided with the training (multi)set T = {(x1, k1), (x2, k2), . . . , (xl, kl)} of examples.

It is simpler to provide good examples than to obtain a complete or partial statistical model, build general theories, or create explicit descriptions of concepts (hidden states).

The aim is to find definitions of concepts (classes, hidden states) which are
  • complete (all positive examples are covered), and
  • consistent (no negative examples are covered).

Since the training (multi)set is finite, the concept description found is only a hypothesis.

slide-5
SLIDE 5

Decision strategy design



When do we need to use learning?

When our knowledge about the recognized object is insufficient to solve the pattern recognition (PR) task.

Most often, we have insufficient knowledge about pX|K(x|k).

slide-6
SLIDE 6

Decision strategy design



How do we proceed?

slide-7
SLIDE 7

Learning as parameter estimation


1. Assume pXK(x, k) has a particular form (e.g. Gaussian, mixture of Gaussians, piece-wise constant) with a small number of parameters Θk.
2. Estimate the values of the parameters Θk using the training set T.
3. Solve the classifier design problem as if the estimated p̂XK(x, k) were the true (and unknown) pXK(x, k).
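To make steps 2 and 3 concrete, the following is a minimal sketch of this plug-in approach, assuming Gaussian class-conditional densities pX|K(x|k) = N(x|µk, Σk); the function names and the use of NumPy are illustrative choices, not part of the original lecture.

```python
import numpy as np

def fit_gaussian_per_class(X, k, n_classes):
    """Step 2: ML estimates of the prior, mean and covariance for each class."""
    params = []
    for c in range(n_classes):
        Xc = X[k == c]
        prior = len(Xc) / len(X)                      # estimate of pK(c)
        mu = Xc.mean(axis=0)                          # estimate of mu_c
        Sigma = np.cov(Xc, rowvar=False, bias=True)   # ML estimate (divides by n)
        params.append((prior, mu, Sigma))
    return params

def log_gaussian(x, mu, Sigma):
    """Log density of N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(Sigma, diff))

def classify(x, params):
    """Step 3: plug-in Bayes decision maximizing the estimated pK(k) pX|K(x|k)."""
    scores = [np.log(prior) + log_gaussian(x, mu, Sigma) for prior, mu, Sigma in params]
    return int(np.argmax(scores))
```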

slide-8
SLIDE 8

Learning as parameter estimation


Pros and cons:

If the true pXK(x, k) does not have the assumed form, the resulting strategy q′(x) can be arbitrarily bad, even if the training set size l approaches infinity.

Implementation is often straightforward, especially if the features are assumed to be conditionally independent within each class (naive Bayes classifier).

slide-9
SLIDE 9

Learning as optimal strategy selection


Choose a class Q of strategies qΘ : X → D. The class Q is usually given as a parametrized set of strategies of the same kind, i.e. qΘ(x, Θ1, . . . , Θ|K|).

The problem can be formulated as a non-Bayesian task with non-random interventions:

The unknown parameters Θk are the non-random interventions.

The probabilities pX|K,Θ(x|k, Θk) must be known.

The solution may be, e.g., a strategy that minimizes the maximal probability of incorrect decision over all Θk, i.e. a strategy that minimizes the probability of incorrect decision under the worst possible parameter setting.

But even this minimal probability may not be low enough—this happens especially in cases when the class Q of strategies is too broad.

It is necessary to narrow the set of possible strategies using additional information—the training (multi)set T.

Learning then amounts to selecting a particular strategy q∗Θ from the a priori known set Q using the information provided by the training set T.

A natural criterion for the selection of one particular strategy is the risk R(qΘ), but it cannot be computed because pXK(x, k) is unknown.

The strategy q∗Θ ∈ Q is thus chosen by minimizing some other, surrogate criterion on the training set which approximates R(qΘ).

The choice of the surrogate criterion determines the learning paradigm.

slide-10
SLIDE 10

Several surrogate criteria


All the following surrogate criteria can be computed using the training data T.

Learning as parameter estimation
  • according to the maximum likelihood,
  • according to a non-random training set.

Learning as optimal strategy selection
  • by minimization of the empirical risk,
  • by minimization of the structural risk.

slide-11
SLIDE 11

Several surrogate criteria


Learning as parameter estimation according to the maximum likelihood:

The likelihood of an instance of the parameters Θ = (Θk : k ∈ K) is the probability of T given Θ:

L(Θ) = p(T|Θ) = ∏_{(xi,ki)∈T} pK(ki) pX|K(xi|ki, Θki)

Learning then means finding Θ∗ that maximizes the probability of T:

Θ∗ = (Θ∗k : k ∈ K) = arg max_Θ L(T, Θ),

which can be decomposed into

Θ∗k = arg max_{Θk} ∑_{x∈X} α(x, k) log pX|K(x|k, Θk),

where α(x, k) is the frequency of the pair (x, k) in T (i.e. T is a multiset).

The recognition is then performed according to qΘ(x, Θ∗).
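For a discrete observation space the maximization above has a closed-form solution: the ML estimate of a categorical pX|K(x|k) is the relative frequency of x among the training examples of state k, and similarly for pK(k). A tiny illustrative sketch with hypothetical helper names:

```python
from collections import Counter

def ml_estimate_discrete(T):
    """T is a list of (x, k) pairs with discrete x.
    Returns ML estimates of pK(k) and pX|K(x|k) as dictionaries."""
    n = len(T)
    count_k = Counter(k for _, k in T)    # number of examples of each state k
    count_xk = Counter(T)                 # alpha(x, k): frequency of the pair in T
    p_k = {k: c / n for k, c in count_k.items()}
    p_x_given_k = {(x, k): c / count_k[k] for (x, k), c in count_xk.items()}
    return p_k, p_x_given_k
```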


slide-12
SLIDE 12

Several surrogate criteria


Learning as parameter estimation according to a non-random training set:

Used when random examples are not easy to obtain, e.g. in the recognition of images.

T is carefully crafted by the designer:
  ✓ it should cover the whole recognized domain,
  ✓ the examples should be typical (“quite probable”) prototypes.

Let T(k), k ∈ K, be the subset of the training set T with the examples for state k. Then

Θ∗k = arg max_{Θk} min_{x∈T(k)} pX|K(x|k, Θk)

Note that Θ∗ does not depend on the frequencies of (x, k) in T (i.e. T is a set).


slide-13
SLIDE 13

Several surrogate criteria


Learning as optimal strategy selection by minimization of the empirical risk:

Given: a set Q of parametrized strategies q(x, Θ) and a penalty function W(k, d).

The quality of each strategy q ∈ Q (i.e. the quality of each parameter set Θ) could be described by the risk

R(Θ) = R(q) = ∑_{k∈K} ∑_{x∈X} pXK(x, k) W(k, q(x, Θ)),

but pXK is unknown.

We thus use the empirical risk Remp (the training set error):

Remp(Θ) = Remp(q) = (1/|T|) ∑_{(xi,ki)∈T} W(ki, q(xi, Θ)).

The strategy q(x, Θ∗) with Θ∗ = arg min_Θ Remp(Θ) is then used.

Examples: Perceptron, neural networks (backprop.), classification trees, ...
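As an illustration, the empirical risk is straightforward to compute from the training set once a strategy and a penalty function are fixed; the sketch below uses the 0/1 penalty, for which Remp is simply the training error rate (the names are illustrative, not from the slides).

```python
import numpy as np

def empirical_risk(strategy, X, k, W):
    """Remp(q) = (1/|T|) * sum over (x_i, k_i) in T of W(k_i, q(x_i))."""
    return float(np.mean([W(ki, strategy(xi)) for xi, ki in zip(X, k)]))

def zero_one_penalty(k, d):
    """W(k, d) = 0 for a correct decision, 1 for an incorrect one."""
    return 0.0 if d == k else 1.0
```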


slide-14
SLIDE 14

Several surrogate criteria


Learning as optimal strategy selection by minimization of the structural risk:

Based on Vapnik-Chervonenkis theory

Examples: Optimal separating hyperplane, support vector machine (SVM)

slide-15
SLIDE 15

Learning revisited


Do we need learning? When?

If we are about to solve one particular task which is sufficiently known to us, we should try to develop a recognition method without learning.

If we are about to solve a task belonging to a well-defined class (we just do not know which particular task from the class we shall solve), we should develop a recognition method with learning.

slide-16
SLIDE 16

Learning revisited


The designer

should understand all the varieties of the task class, i.e.

should find a solution to the whole class of problems.

slide-17
SLIDE 17

Learning revisited


The solution

is a parametrized strategy and

its parameters are learned from the training (multi)set.

slide-18
SLIDE 18

Learning revisited


Supervised learning is the topic of several upcoming lectures:

Decision trees and decision rules.

Linear classifiers.

AdaBoost.

slide-19
SLIDE 19

Unsupervised Learning


slide-20
SLIDE 20

Do we need the teacher?


No learning:

Use q for recognition.

x → q(x) → d

slide-21
SLIDE 21

Do we need the teacher?



Supervised learning:

First, use T to learn Θ.

Then, use qΘ for recognition.

(xT, kT) → Learning → Θ;   x → q(x, Θ) → d

slide-22
SLIDE 22

Do we need the teacher?



Unsupervised learning???

First, predict the sequence K̃0, i.e. k0i = R(xi, Θ0) for xi ∈ X̃.

Then iteratively update Θt = L(X̃, K̃t−1).

x → R(x, Θ) → k;   (x, k) → L(x, k) → Θ, fed back into R

Does not work as expected if models like the perceptron are used.

Works for quadratic clustering (k-means, ISODATA, EM).

slide-23
SLIDE 23

K-means (ISODATA)


Assume:

An object can be in one of the |K| states with equal probabilities.

All pX|K(x|k) are isotropic Gaussians: pX|K(x|k) = N (x|µk, σI).

slide-24
SLIDE 24

K-means (ISODATA)


Recognition:

The task is to decide the state k for each x, assuming all µk are known.

The Bayesian strategy (minimizing the probability of error):

q∗(x) = arg min_{k∈K} (x − µk)²

If µk, k ∈ K, are not known, it is a parametrized strategy qΘ(x), where Θ = (µ1, . . . , µK).

slide-25
SLIDE 25

K-means (ISODATA)



Learning:

Find the maximum-likelihood estimates of µk based on the known (x1, k1), . . . , (xl, kl):

µ∗k = (1/|Ik|) ∑_{i∈Ik} xi,

where Ik is the set of indices of the training examples belonging to state k.

slide-26
SLIDE 26

K-means algorithm


Algorithm K-means [Mac67]

K is the a priori given number of clusters.

Algorithm:
1. Choose K centroids µk (in almost any way, but every cluster should have at least one example).
2. For all x, assign x to its closest µk.
3. Compute the new position of each centroid µk based on all examples xi, i ∈ Ik, in cluster k.
4. If the positions of the centroids changed, repeat from step 2.
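A minimal NumPy sketch of steps 1–4; the particular initialization (K distinct training points chosen at random) and the handling of empty clusters are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Choose K centroids, here K distinct training points picked at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(max_iter):
        # 2. Assign every example to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each centroid from the examples in its cluster
        #    (an empty cluster keeps its previous centroid).
        new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                  else centroids[k] for k in range(K)])
        # 4. Stop when the centroid positions no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```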

slide-27
SLIDE 27

K-means algorithm


Algorithm features:

The algorithm minimizes the intracluster variance

J = ∑_{j=1}^{K} ∑_{i=1}^{nj} ‖xi,j − cj‖²,   (1)

where nj is the number of examples in cluster j and cj is its centroid.

The algorithm is fast, but each run can converge to a different local optimum of J.

[Mac67] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297, Berkeley, 1967. University of California Press.

slide-28
SLIDE 28

Illustration

[Figure: K-means clustering, iteration 1]

slide-29
SLIDE 29

Illustration

[Figure: K-means clustering, iteration 2]

slide-30
SLIDE 30

Illustration

[Figure: K-means clustering, iteration 3]

slide-31
SLIDE 31

Illustration

[Figure: K-means clustering, iteration 4]

slide-32
SLIDE 32

Illustration

[Figure: K-means clustering, iteration 5]

slide-33
SLIDE 33

Illustration

[Figure: K-means clustering, iteration 6]

slide-34
SLIDE 34

General mixture distributions


Assume the data are samples from a distribution factorized as pXK(x, k) = pK(k) pX|K(x|k), i.e. pX(x) = ∑_{k∈K} pK(k) pX|K(x|k), and that the distribution is known.

slide-35
SLIDE 35

General mixture distributions


Recognition:

Let's define the result of recognition not as a single decision for some state k, but rather as the set of posterior probabilities (sometimes called responsibilities) for all k given xi,

γk(xi) = pK|X(k|xi) = pK(k) pX|K(xi|k) / ∑_{j∈K} pK(j) pX|K(xi|j),

i.e. the probability that the object was in state k when observation xi was made.

The γk(x) functions can be viewed as discriminant functions.

slide-36
SLIDE 36

General mixture distributions (cont.)


Learning:

Given the training multiset T = ((xi, ki))_{i=1}^{n} (or the respective γk(xi) instead of ki), assume γk(x) is known, pK(k) are not known, and pX|K(x|k) are known except for the parameter values Θk, i.e. we shall write pX|K(x|k, Θk).

Let the object model m be the “set” of all unknown parameters, m = (pK(k), Θk), k ∈ K.

slide-37
SLIDE 37

General mixture distributions (cont.)



The log-likelihood of the model m:

log L(m) = log ∏_{i=1}^{n} pXK(xi, ki) = ∑_{i=1}^{n} log pK(ki) + ∑_{i=1}^{n} log pX|K(xi|ki, Θki)

The log-likelihood using γ:

log L(m) = ∑_{i=1}^{n} ∑_{k∈K} γk(xi) log pK(k) + ∑_{i=1}^{n} ∑_{k∈K} γk(xi) log pX|K(xi|k, Θk)

We search for the optimal model using maximum likelihood:

m∗ = (p∗K(k), Θ∗k) = arg max_m log L(m),

i.e. we compute

p∗K(k) = (1/n) ∑_{i=1}^{n} γk(xi)

and solve |K| independent tasks

Θ∗k = arg max_{Θk} ∑_{i=1}^{n} γk(xi) log pX|K(xi|k, Θk).

slide-38
SLIDE 38

Expectation Maximization Algorithm


Unsupervised learning algorithm [DLR77] for general mixture distributions:
1. Initialize the model parameters m = (pK(k), Θk) for all k.
2. Perform the recognition task, i.e. assuming m is known, compute
   γk(xi) = p̂K|X(k|xi) = pK(k) pX|K(xi|k, Θk) / ∑_{j∈K} pK(j) pX|K(xi|j, Θj).
3. Perform the learning task, i.e. assuming γk(xi) are known, update the ML estimates of the model parameters pK(k) and Θk for all k:
   pK(k) = (1/n) ∑_{i=1}^{n} γk(xi),
   Θk = arg max_{Θk} ∑_{i=1}^{n} γk(xi) log pX|K(xi|k, Θk).
4. Iterate steps 2 and 3 until the model stabilizes.

slide-39
SLIDE 39

Expectation Maximization Algorithm


Features:

The algorithm does not specify how to update Θk in step 3; that depends on the chosen form of pX|K.

The model created in iteration t is always at least as good as the model from iteration t − 1, i.e. L(m) = p(T|m) does not decrease.

[DLR77] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1–38, 1977.

slide-40
SLIDE 40

Special Case: Gaussian Mixture Model


Each kth component is a Gaussian distribution:

N(x|µk, Σk) = (2π)^(−D/2) |Σk|^(−1/2) exp{−(1/2)(x − µk)ᵀ Σk⁻¹ (x − µk)}

Gaussian Mixture Model (GMM):

p(x) = ∑_{k=1}^{K} pK(k) pX|K(x|k, Θk) = ∑_{k=1}^{K} αk N(x|µk, Σk),

assuming ∑_{k=1}^{K} αk = 1 and 0 ≤ αk ≤ 1.

slide-41
SLIDE 41

EM for GMM


1. Initialize the model parameters m = (pK(k), µk, Σk) for all k.
2. Perform the recognition task as in the general case, i.e. assuming m is known, compute
   γk(xi) = p̂K|X(k|xi) = pK(k) pX|K(xi|k, Θk) / ∑_{j∈K} pK(j) pX|K(xi|j, Θj) = αk N(xi|µk, Σk) / ∑_{j∈K} αj N(xi|µj, Σj).
3. Perform the learning task, i.e. assuming γk(xi) are known, update the ML estimates of the model parameters αk, µk and Σk for all k:
   αk = pK(k) = (1/n) ∑_{i=1}^{n} γk(xi),
   µk = ∑_{i=1}^{n} γk(xi) xi / ∑_{i=1}^{n} γk(xi),
   Σk = ∑_{i=1}^{n} γk(xi)(xi − µk)(xi − µk)ᵀ / ∑_{i=1}^{n} γk(xi).
4. Iterate steps 2 and 3 until the model stabilizes.

slide-42
SLIDE 42

Example: Source data

Source data generated from 3 Gaussians.

slide-43
SLIDE 43

Example: Input to EM algorithm

The data were given to the EM algorithm as an unlabeled dataset.

slide-44
SLIDE 44

Example: EM Iterations


slide-45
SLIDE 45

Example: EM Iterations


slide-46
SLIDE 46

Example: EM Iterations


slide-47
SLIDE 47

Example: EM Iterations


slide-48
SLIDE 48

Example: EM Iterations


slide-49
SLIDE 49

Example: EM Iterations


slide-50
SLIDE 50

Example: EM Iterations


slide-51
SLIDE 51

Example: EM Iterations


slide-52
SLIDE 52

Example: EM Iterations


slide-53
SLIDE 53

Example: EM Iterations


slide-54
SLIDE 54

Example: EM Iterations


slide-55
SLIDE 55

Example: EM Iterations


slide-56
SLIDE 56

Example: Ground Truth and EM Estimate



The ground truth (left) and the EM estimate (right) are very close because

we have enough data,

we know the right number of components, and

we were lucky that EM converged to the right local optimum of the likelihood function.

slide-57
SLIDE 57

What is unsupervised learning?


A strict view:

Only those algorithms conforming to the scheme Θt = L(X̃, R(X̃, Θt−1)), i.e.
only k-means, ISODATA, the EM algorithm, ...
slide-58
SLIDE 58

What is unsupervised learning?



A broader view:

any algorithm that analyses a dataset and extracts potentially usable information just on the basis of x, i.e. without knowing k.

Clustering creates dissimilar groups of similar objects.

Vector quantization searches for several typical prototypes, reduces amount of data.

Outlier detection searches for unusual (non-probable) examples.

Dimensionality reduction chooses/creates a low number of (artificial) variables that describe the data sufficiently well.

Feature extraction derives new features describing the data.

slide-59
SLIDE 59

What is unsupervised learning?


Intuition:

based on the “structure” hidden in the data, we want to derive new features, but

we often do not know what these features should mean.

slide-60
SLIDE 60

Clustering


slide-61
SLIDE 61

Clustering


The goal is to assign a set of objects into groups (called clusters) so that objects in the same cluster are more similar to each other than to objects from different clusters. (The result of clustering does not need to be a rule for how to assign points to clusters.)

A cluster is a set of objects which
1. are similar to each other, and
2. are dissimilar to objects from other clusters.

Issues:

What do we mean by “similar” and “dissimilar”?

What is the right number of clusters?

Cluster analysis studies algorithms for

cluster formation, i.e. how to divide a set of objects into clusters,

segmentation, i.e. how to describe borders of individual clusters or their prototypes (centroids),

cluster labeling, i.e. how to assign meaningful labels to individual clusters.

slide-62
SLIDE 62

Similarity


Similarity (or rather dissimilarity) is usually expressed by a distance:

Minkowski metric: d(x1, x2) = (∑_{i=1}^{D} |x1,i − x2,i|^q)^{1/q}
  • q = 1: L1, Manhattan, city-block, postman metric
  • q = 2: L2, Euclidean distance
  • q = ∞: L∞

Cosine distance: e.g. to assess the similarity of 2 documents (word frequencies): d(x1, x2) = x1ᵀx2 / (|x1| |x2|)

Mahalanobis distance: a covariance-matrix-driven metric (see the multivariate Gaussian distribution): d(x1, x2) = √((x1 − x2)ᵀ Σ⁻¹ (x1 − x2))
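The three (dis)similarity measures above translate directly into code; a small illustrative sketch (note that the cosine expression on the slide is a similarity, so 1 minus it is commonly used as the actual distance):

```python
import numpy as np

def minkowski(x1, x2, q=2):
    """Minkowski distance; q=1 is the Manhattan and q=2 the Euclidean distance."""
    return np.sum(np.abs(x1 - x2) ** q) ** (1.0 / q)

def cosine_similarity(x1, x2):
    """Cosine of the angle between x1 and x2 (e.g. word-frequency vectors of two documents)."""
    return float(x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2)))

def mahalanobis(x1, x2, Sigma):
    """Mahalanobis distance driven by the covariance matrix Sigma."""
    diff = x1 - x2
    return float(np.sqrt(diff @ np.linalg.solve(Sigma, diff)))
```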
slide-63
SLIDE 63

Clustering algorithms


Hierarchical algorithms build the clusters incrementally using already found clusters, changing their number.
  • Agglomerative (bottom-up)
  • Divisive (top-down)

Partitioning algorithms iteratively refine an a priori given number of clusters.
  • K-means
  • ISODATA
  • fuzzy C-means
  • EM algorithm for a mixture of Gaussians

Graph-theory methods: minimum spanning tree.

Spectral clustering: based on the distance matrix, these methods reduce the space dimensionality; clustering is performed in the lower-dimensional space.

Kohonen maps (self-organizing maps).

slide-64
SLIDE 64

Hierarchical clustering


Agglomerative clustering [Joh67]

n objects, an n × n distance matrix

Algorithm:
1. Create n clusters, each with 1 object.
2. Find the closest pair of clusters and join them.
3. Update the distance matrix for the newly created cluster.
4. Iterate steps 2 and 3 until there is only 1 cluster left.

We need to specify what a distance between 2 clusters means!!!
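A compact pure-Python sketch of the agglomerative procedure, parameterized by the cluster-to-cluster distance (min over pairwise object distances gives single linkage, max gives complete linkage). For simplicity it recomputes the linkage from the original distance matrix D instead of updating the matrix incrementally as in step 3; the function and parameter names are illustrative.

```python
def agglomerative(D, linkage=min):
    """D is an n x n matrix of pairwise object distances.
    Returns the sequence of merges as (cluster_a, cluster_b, distance)."""
    n = len(D)
    clusters = {i: [i] for i in range(n)}       # 1. one cluster per object
    merges = []
    while len(clusters) > 1:                     # 4. repeat until one cluster is left
        # 2. find the closest pair of clusters under the chosen linkage
        (a, b), d = min(
            (((a, b), linkage(D[i][j] for i in clusters[a] for j in clusters[b]))
             for a in clusters for b in clusters if a < b),
            key=lambda pair: pair[1])
        # 3. join the pair; distances to the new cluster follow from D above
        clusters[a].extend(clusters[b])
        del clusters[b]
        merges.append((a, b, d))
    return merges
```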

slide-65
SLIDE 65

Hierarchical clustering



Divisive clustering:
1. Assign all points to 1 big cluster.
2. Divide the cluster into several (2?) smaller clusters, e.g. using k-means.
3. Apply step 2 to each cluster that is still a candidate for splitting.

[Joh67] Stephen Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, 1967.

slide-66
SLIDE 66

Example


[Figure: dendrogram for 22 cases (complete linkage, Euclidean distances) over car models: Porsche, Corvette, Eagle, Isuzu, Ford, Buick, Toyota, Mazda, Volvo, Saab, BMW, Mercedes, Audi, Nissan, Mitsubishi, Pontiac, Honda, VW, Dodge, Chrysler, Olds, Acura; the axis shows the linkage distance, and cutting the tree yields 5 or 4 clusters.]

slide-67
SLIDE 67

Example


[Figure: result of the hierarchical-linkage clustering; the car models plotted in the plane of Factor 1 vs. Factor 2.]

slide-68
SLIDE 68

Summary


slide-69
SLIDE 69

Summary


Learning: Needed when we do not have sufficient statistical info for recognition. Approaches:

Assume pXK has a certain form and use T to estimate its parameters.

Assume the right strategy is in a particular set and use T to choose it.

There are several learning paradigms depending on the choice of criterion used instead of Bayesian risk.

slide-70
SLIDE 70

Summary


Unsupervised learning: What if we use the prediction from the recognition as the information from the teacher and re-estimate the recognizer iteratively?

Fails for algorithms like Rosenblatt's perceptron.

Works for the problem of quadratic clustering.

Expectation-Maximization algorithm!!!

slide-71
SLIDE 71

Summary


Unsupervised learning – a broader view: any analytical procedure that does not use the labels of the training data.

Vector quantization, outlier/novelty detection, dimensionality reduction, feature extraction, ...

Clustering

hierarchical vs. partitioning

slide-72
SLIDE 72

Reference


[DLR77] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1–38, 1977.
[Joh67] Stephen Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, 1967.
[Mac67] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297, Berkeley, 1967. University of California Press.