Projects
- Chandrasekar, Arun Kumar, Group 17
- Nearly all groups have submitted a proposal.
- May 21: Each person gives one slide, 15 min/group.
Machine learning versus knowledge based (perceived comparison)
First principles vs data driven, along five axes (first-principles entry first, data-driven entry second):
- Data: small data vs. big data to train.
- Domain expertise: high reliance on domain expertise vs. results with little domain knowledge.
- Fidelity/robustness: universal, can handle non-linear complex relations vs. limited by the range of values spanned by the training data.
- Adaptability: complex and time-consuming derivation to use new relations vs. rapid adaptation to new problems.
- Interpretability: parameters are physical! vs. physically agnostic, limited by the rigidity of the functional form.
Supervised learning
$y = \mathbf{w}^{\mathsf T}\mathbf{x}$, with a labelled training set $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), (\mathbf{x}_3, y_3)\}$: we are given the two classes.
Without labels, the training set contains only the inputs, e.g. $\{(x^{(1)}_1, x^{(1)}_2), (x^{(2)}_1, x^{(2)}_2), (x^{(3)}_1, x^{(3)}_2)\}$.
Unsupervised learning
Unsupervised machine learning infers a function that describes hidden structure from "unlabeled" data (a classification or categorization is not included in the observations). Since the examples given to the learner are unlabeled, there is no straightforward evaluation of the accuracy of the structure output by the algorithm, which is one way of distinguishing unsupervised learning from supervised learning. Here we are not primarily interested in prediction. Supervised learning covers all classification and regression, $y = \mathbf{w}^{\mathsf T}\mathbf{x}$: there, prediction is important.
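To make the contrast concrete, here is a minimal NumPy sketch (all data synthetic and hypothetical): with labels we can fit $y = \mathbf{w}^{\mathsf T}\mathbf{x}$ by least squares; without labels we only have the inputs and must look for structure in them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Supervised: inputs X with labels y -> fit y = w^T x by least squares.
X = rng.normal(size=(100, 2))            # 100 observations, 2 features
w_true = np.array([1.5, -0.7])
y = X @ w_true + 0.1 * rng.normal(size=100)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated weights:", w_hat)       # close to w_true

# Unsupervised: the same inputs X, but no labels y.
# All we can do is look for structure in X itself (e.g. clusters).
print("unlabeled data shape:", X.shape)
```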
Unsupervised learning
- Unsupervised learning is more subjective than supervised learning, as there is no simple goal for the analysis, such as prediction of a response.
- But techniques for unsupervised learning are of growing importance in several fields:
  – subgroups of breast cancer patients grouped by their gene expression measurements,
  – groups of shoppers characterized by their browsing and purchase histories,
  – movies grouped by the ratings assigned by movie viewers.
- It is often easier to obtain unlabeled data (from a lab instrument or a computer) than labeled data, which can require human intervention.
  – For example, it is difficult to automatically assess the overall sentiment of a movie review: is it favorable or not?
K-means
Goal: minimize the average squared distance between points and their nearest representatives:
$$\min_{\{r_{nk}\},\{\boldsymbol{\mu}_k\}} \; J = \sum_{n=1}^{N}\sum_{k=1}^{K} r_{nk}\,\lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2 \qquad (9.1)$$
the sum of the squares of the distances of each data point to its assigned prototype $\boldsymbol{\mu}_k$. The centers carve $\mathbb{R}^p$ up into $K$ convex regions: $\boldsymbol{\mu}_j$'s region consists of the points for which $\boldsymbol{\mu}_j$ is the closest center.
K-means
$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \lVert \mathbf{x}_n - \boldsymbol{\mu}_j \rVert^2 \\ 0 & \text{otherwise} \end{cases} \qquad (9.2)$$
Now consider the optimization of the $\boldsymbol{\mu}_k$ with the $r_{nk}$ held fixed. The objective $J$ is a quadratic function of $\boldsymbol{\mu}_k$; setting its derivative with respect to $\boldsymbol{\mu}_k$ to zero gives
$$2\sum_{n=1}^{N} r_{nk}(\mathbf{x}_n - \boldsymbol{\mu}_k) = 0 \qquad (9.3)$$
which we can easily solve for $\boldsymbol{\mu}_k$ to give
$$\boldsymbol{\mu}_k = \frac{\sum_n r_{nk}\,\mathbf{x}_n}{\sum_n r_{nk}} \qquad (9.4)$$
The denominator in this expression is equal to the number of points assigned to cluster $k$. In short: solve for $r_{nk}$ by assigning each point to its nearest center, and solve for $\boldsymbol{\mu}_k$ by differentiating $J$ with respect to $\boldsymbol{\mu}_k$.
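A minimal NumPy sketch of these two alternating updates, the assignment step (9.2) and the mean update (9.4); the function name `kmeans` and the initialization at randomly chosen data points are my own choices, not from the slides.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means: alternate the assignment step (9.2) and the mean update (9.4)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialize the K centers at randomly chosen data points.
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    assign = np.full(len(X), -1)
    for _ in range(n_iters):
        # Assignment step (9.2): r_nk = 1 for the nearest center, 0 otherwise.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
        new_assign = d2.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
        # Update step (9.4): each center becomes the mean of its assigned points.
        for k in range(K):
            if np.any(assign == k):
                mu[k] = X[assign == k].mean(axis=0)
    J = ((X - mu[assign]) ** 2).sum()   # objective (9.1)
    return mu, assign, J
```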
K-means
Old Faithful, K-means (from Murphy)
The progress of the K-means algorithm with K = 3. Each point is first randomly assigned to a cluster, and the cluster centroids are computed; these are shown as large colored disks. Initially the centroids are almost completely overlapping because the initial cluster assignments were chosen at random. Each point is then assigned to the nearest centroid, the centroids are recomputed from the new assignments, and the points are reassigned to new cluster centroids; the two steps are iterated until the assignments stop changing.
Example
Likely From Hastie book
Different starting values
K-means clustering performed six times on the data from the previous figure with K = 3, each time with a different random assignment of the observations in Step 1 of the K-means algorithm. Above each plot is the value of the objective (4). Three different local optima were obtained, one of which resulted in a smaller value of the objective and provides better separation between the clusters. Those labeled in red all achieved the same best solution, with an objective value of 235.8.
Likely From Hastie book
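A sketch of the restart strategy the figure illustrates, assuming scikit-learn is available; the blob data here are synthetic stand-ins, not the figure's data set.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [3, 3], [0, 4])])   # 150 points, 3 blobs

# K-means only finds a local optimum of the objective, so run it from
# several random initializations and keep the best (lowest inertia).
best = min(
    (KMeans(n_clusters=3, n_init=1, init="random", random_state=s).fit(X)
     for s in range(6)),
    key=lambda km: km.inertia_,
)
print("best objective:", best.inertia_)
```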
Vector Quantization VQ
Murphy book Fig 11.12 vqdemo.m
Each pixel $x_i$ is represented by a codebook of $K$ entries $\mu_k$: $\text{encode}(x_i) = \arg\min_k \lVert x_i - \mu_k \rVert$.
Consider $N = 64$k observations of $D = 1$ (grayscale) dimension at $C = 8$ bits each: storing the raw data costs $NC \approx 512$k bits, whereas after quantization only $N \log_2 K + KC$ bits are needed. $K = 4$ gives about 128k bits, a factor of 4 compression.
Mixtures of Gaussians (1)
Single Gaussian Mixture of two Gaussians
Old Faithful geyser: the time between eruptions has a bimodal distribution, with the mean interval being either 65 or 91 minutes, and is dependent on the length of the prior eruption. Within a margin of error of ±10 minutes, Old Faithful will erupt either 65 minutes after an eruption lasting less than 2½ minutes, or 91 minutes after an eruption lasting more than 2½ minutes.
Mixtures of Gaussians (2)
Combine simple models into a complex model:
$$p(\mathbf{x}) = \sum_{k=1}^{K} \underbrace{\pi_k}_{\text{mixing coefficient}}\;\underbrace{\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}_{\text{component}}, \qquad \text{here } K = 3.$$
Mixtures of Gaussians (3)
– " # = ∑&
' (&)(#; ,&, Σ&)
– Un-observed – Often hidden
p(z)p(x|z) N iid {xn} with latent {zn}
! " #$ = 1 = '("; *$, ,$) ! " . = !(", .)= !(")= Responsibilities / #$ = ! #$ = 1 " =
Mixture of Gaussians
$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$
$$p(\mathbf{x}) = \sum_{\mathbf{z}} p(\mathbf{z})\,p(\mathbf{x}\mid\mathbf{z}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$
$$\gamma(z_k) \equiv p(z_k = 1\mid\mathbf{x}) = \frac{p(z_k=1)\,p(\mathbf{x}\mid z_k=1)}{\sum_{j=1}^{K} p(z_j=1)\,p(\mathbf{x}\mid z_j=1)} = \frac{\pi_k\,\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j\,\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$$
The joint factorizes as $p(\mathbf{z})\,p(\mathbf{x}\mid\mathbf{z})$; we observe $N$ i.i.d. $\{\mathbf{x}_n\}$ with latent $\{\mathbf{z}_n\}$.
Max Likelihood
$$\ln p(\mathbf{X}\mid\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln\!\left[\sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}_n; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\right]$$
for $N$ i.i.d. observations $\{\mathbf{x}_n\}$ with latent $\{\mathbf{z}_n\}$; maximize this with respect to the component Gaussians and the mixing coefficients.
EM Gauss Mix
1. Initialize the means $\boldsymbol{\mu}_k$, covariances $\boldsymbol{\Sigma}_k$ and mixing coefficients $\pi_k$, and evaluate the initial value of the log likelihood.
2. E step: evaluate the responsibilities using the current parameter values,
$$\gamma(z_{nk}) = \frac{\pi_k\, \mathcal{N}(\mathbf{x}_n\mid\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(\mathbf{x}_n\mid\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)} \qquad (9.23)$$
3. M step: re-estimate the parameters using the current responsibilities,
$$\boldsymbol{\mu}_k^{\text{new}} = \frac{1}{N_k}\sum_{n=1}^{N}\gamma(z_{nk})\,\mathbf{x}_n \qquad (9.24)$$
$$\boldsymbol{\Sigma}_k^{\text{new}} = \frac{1}{N_k}\sum_{n=1}^{N}\gamma(z_{nk})\,(\mathbf{x}_n - \boldsymbol{\mu}_k^{\text{new}})(\mathbf{x}_n - \boldsymbol{\mu}_k^{\text{new}})^{\mathsf T} \qquad (9.25)$$
$$\pi_k^{\text{new}} = \frac{N_k}{N} \qquad (9.26) \qquad \text{where}\quad N_k = \sum_{n=1}^{N}\gamma(z_{nk}) \qquad (9.27)$$
4. Evaluate the log likelihood
$$\ln p(\mathbf{X}\mid\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi}) = \sum_{n=1}^{N}\ln\!\left\{\sum_{k=1}^{K}\pi_k\,\mathcal{N}(\mathbf{x}_n\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)\right\}$$
and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied, return to step 2.
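A compact NumPy/SciPy sketch of these four steps; the function name `em_gmm`, the initialization at random data points, and the small diagonal jitter added to the covariances are my additions, not part of the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, tol=1e-6, seed=0):
    """EM for a Gaussian mixture, following the updates (9.23)-(9.27)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # 1. Initialize means, covariances and mixing coefficients.
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    pi = np.full(K, 1.0 / K)
    ll_old = -np.inf
    for _ in range(n_iters):
        # 2. E step: responsibilities (9.23).
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k]) for k in range(K)
        ])                                          # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # 3. M step: re-estimate parameters (9.24)-(9.27).
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
        # 4. Evaluate the log likelihood and check for convergence.
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return pi, mu, Sigma, gamma
```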
General EM
Given a joint distribution $p(\mathbf{X}, \mathbf{Z}\mid\boldsymbol{\theta})$ over observed variables $\mathbf{X}$ and latent variables $\mathbf{Z}$, governed by parameters $\boldsymbol{\theta}$, the goal is to maximize the likelihood function $p(\mathbf{X}\mid\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$.
M step:
$$\boldsymbol{\theta}^{\text{new}} = \arg\max_{\boldsymbol{\theta}}\; \mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}) \qquad (9.32)$$
where
$$\mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}) = \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X}, \boldsymbol{\theta}^{\text{old}})\, \ln p(\mathbf{X}, \mathbf{Z}\mid\boldsymbol{\theta}) \qquad (9.33)$$
If the convergence criterion is not satisfied, then let
$$\boldsymbol{\theta}^{\text{old}} \leftarrow \boldsymbol{\theta}^{\text{new}} \qquad (9.34)$$
and return to step 2.
EM in general
$$p(\mathbf{X}\mid\boldsymbol{\theta}) = \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z}\mid\boldsymbol{\theta}) \qquad (9.69)$$
The log likelihood decomposes as
$$\ln p(\mathbf{X}\mid\boldsymbol{\theta}) = \mathcal{L}(q, \boldsymbol{\theta}) + \mathrm{KL}(q\,\|\,p) \qquad (9.70)$$
where we have defined
$$\mathcal{L}(q, \boldsymbol{\theta}) = \sum_{\mathbf{Z}} q(\mathbf{Z}) \ln\!\left\{\frac{p(\mathbf{X}, \mathbf{Z}\mid\boldsymbol{\theta})}{q(\mathbf{Z})}\right\} \qquad (9.71)$$
$$\mathrm{KL}(q\,\|\,p) = -\sum_{\mathbf{Z}} q(\mathbf{Z}) \ln\!\left\{\frac{p(\mathbf{Z}\mid\mathbf{X}, \boldsymbol{\theta})}{q(\mathbf{Z})}\right\} \qquad (9.72)$$
using
$$\ln p(\mathbf{X}, \mathbf{Z}\mid\boldsymbol{\theta}) = \ln p(\mathbf{Z}\mid\mathbf{X}, \boldsymbol{\theta}) + \ln p(\mathbf{X}\mid\boldsymbol{\theta}) \qquad (9.73)$$
Setting $q(\mathbf{Z}) = p(\mathbf{Z}\mid\mathbf{X}, \boldsymbol{\theta}^{\text{old}})$ in the E step,
$$\mathcal{L}(q, \boldsymbol{\theta}) = \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X}, \boldsymbol{\theta}^{\text{old}}) \ln p(\mathbf{X}, \mathbf{Z}\mid\boldsymbol{\theta}) - \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X}, \boldsymbol{\theta}^{\text{old}}) \ln p(\mathbf{Z}\mid\mathbf{X}, \boldsymbol{\theta}^{\text{old}}) = \mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}) + \text{const} \qquad (9.74)$$
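A tiny numerical check of the decomposition (9.70): for a discrete latent Z with made-up joint probabilities and an arbitrary q(Z), the bound (9.71) and the KL term (9.72) sum exactly to ln p(X|θ).

```python
import numpy as np

# Toy check of ln p(X|theta) = L(q, theta) + KL(q || p(Z|X, theta))
# for a discrete latent Z with 3 states (all numbers made up for illustration).
p_XZ = np.array([0.10, 0.25, 0.05])          # joint p(X, Z=k) at the observed X
p_X = p_XZ.sum()                             # marginal p(X), eq. (9.69)
p_Z_given_X = p_XZ / p_X                     # posterior p(Z|X)

q = np.array([0.5, 0.3, 0.2])                # any distribution over Z

L = np.sum(q * np.log(p_XZ / q))             # lower bound, eq. (9.71)
KL = -np.sum(q * np.log(p_Z_given_X / q))    # eq. (9.72), always >= 0
print(np.log(p_X), L + KL)                   # the two numbers agree
```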
Gaussian Mixtures
Hierarchical Clustering
- K-means clustering requires us to pre-specify the number of clusters K (there are strategies for choosing K).
- Hierarchical clustering is an alternative approach which does not require that we commit to a particular choice of K.
- Here we describe bottom-up or agglomerative clustering. This is the most common type of hierarchical clustering, and refers to the fact that a dendrogram is built starting from the leaves and combining clusters up to the trunk.
Hierarchical Clustering Algorithm
The approach in words: start with each observation in its own cluster; identify the two closest clusters and fuse them; repeat until all observations belong to one cluster (a sketch using SciPy follows the figure below).
[Figure: example dendrogram over five points A, B, C, D, E, with fusion heights 1 to 4 and leaf order D, E, B, A, C.]
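A short sketch of agglomerative clustering with complete linkage and Euclidean distance, assuming SciPy is available; the data are synthetic stand-ins for the slide's 45 observations.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
# Synthetic stand-in for the slide's data: 45 points from three groups.
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(15, 2))
               for c in ([0, 0], [-4, 2], [2, -3])])

# Agglomerative clustering with complete linkage and Euclidean distance.
Z = linkage(X, method="complete", metric="euclidean")

# Cutting the tree at a chosen height (or number of clusters) gives flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# dendrogram(Z) would draw the tree (requires matplotlib).
```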
An Example
[Scatter plot of the observations in the (X1, X2) plane.]
45 observations generated in 2-dimensional space. In reality there are three distinct classes, shown in separate colors. However, we will treat these class labels as unknown and will seek to cluster the observations in order to discover the classes from the data.
Example
Left: dendrogram obtained from hierarchically clustering the data from the previous slide, with complete linkage and Euclidean distance. Center: the dendrogram cut at a height of 9 (indicated by the dashed line); this cut results in two distinct clusters, shown in different colors. Right: the dendrogram cut at a height of 5; this cut results in three distinct clusters, shown in different colors. Note that the colors were not used in clustering, but are simply used for display purposes in this figure.
K-means clustering
K = 2, K = 3, K = 4: a simulated data set with 150 observations in 2-dimensional space, clustered with different values of K, the number of clusters. The color of each observation indicates the cluster to which it was assigned using the K-means clustering algorithm. Note that there is no ordering of the clusters, so the cluster coloring is arbitrary. These cluster labels were not used in clustering; instead, they are the outputs of the clustering procedure.
NOT INTERESTING
Properties of the Algorithm
$$\frac{1}{|C_k|}\sum_{i, i' \in C_k}\sum_{j=1}^{p}(x_{ij} - x_{i'j})^2 = 2\sum_{i \in C_k}\sum_{j=1}^{p}(x_{ij} - \bar{x}_{kj})^2,$$
where $\bar{x}_{kj} = \frac{1}{|C_k|}\sum_{i \in C_k} x_{ij}$ is the mean for feature $j$ in cluster $C_k$.
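A quick numerical check of this identity on made-up data: the average of all pairwise squared distances within a cluster equals twice the sum of squared deviations from the cluster means.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 3))        # one cluster C_k with |C_k| = 7 points, p = 3

# Left-hand side: average pairwise squared Euclidean distance within the cluster.
lhs = ((X[:, None, :] - X[None, :, :]) ** 2).sum() / len(X)

# Right-hand side: twice the squared deviations from the cluster means.
rhs = 2 * ((X - X.mean(axis=0)) ** 2).sum()

print(lhs, rhs)                    # identical up to rounding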
However, the algorithm is not guaranteed to find the global minimum. Why not?
K-Means Clustering Algorithm
1. Randomly assign a number, from 1 to K, to each of the observations. These serve as initial cluster assignments.
2. Iterate until the cluster assignments stop changing:
2.1 For each of the K clusters, compute the cluster centroid. The kth cluster centroid is the vector of the p feature means for the observations in the kth cluster.
2.2 Assign each observation to the cluster whose centroid is closest (where closest is defined using Euclidean distance).
Clustering
- Clustering refers to a very broad set of techniques for finding subgroups, or clusters, in a data set.
- We seek a partition of the data into distinct groups so that the observations within each group are quite similar to each other.
- To make this concrete, we must define what it means for two or more observations to be similar or different.
- Indeed, this is often a domain-specific consideration that must be made based on knowledge of the data being studied.
Mixture of Experts
Figure 11.6 (a) Some data fit with three separate regression lines. (b) Gating functions for three different "experts". (c) The conditionally weighted average of the three expert predictions. Figure generated by mixexpDemo.
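A small sketch of the conditionally weighted average in panel (c): linear experts combined by softmax gating weights that depend on x. All weights here are made up for illustration; this is not the mixexpDemo code.

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

# Three linear "experts" and a softmax gating network over scalar inputs x.
W_expert = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 2.0]])   # slope, intercept per expert
V_gate = np.array([[-8.0, -2.0], [0.0, 0.0], [8.0, -2.0]])   # gating slope, intercept

x = np.linspace(-1.0, 1.0, 5)
phi = np.column_stack([x, np.ones_like(x)])                   # [x, 1] features

expert_pred = phi @ W_expert.T             # (N, 3): each expert's prediction
gates = softmax(phi @ V_gate.T)            # (N, 3): mixing weights, depend on x
mean_pred = (gates * expert_pred).sum(axis=1)   # conditionally weighted average
print(mean_pred)
```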
Averaging during training can pull a predictor toward a worse answer for cases where it is already doing better than the average.
A picture of why averaging is bad
[Diagram: the target value, the output y_i of predictor i, and the average of the other predictors.] Do we really want to move the output of predictor i away from the target value?
{X, Z}: complete data; {X}: incomplete data; responsibilities.
Mixture of Experts
Figure 11.8 (a) Some data from a simple forwards model. (b) Some data from the inverse model, fit with a mixture of 3 linear regressions. Training points are color coded by their responsibilities. (c) The predictive mean (red cross) and mode (black square). Based on Figures 5.20 and 5.21 of (Bishop 2006b). Figure generated by mixexpDemoOneToMany.

Two clustering methods
- In K-means clustering, we seek to partition the observations into a pre-specified number of clusters.
- In hierarchical clustering, we do not know in advance how many clusters we want; in fact, we end up with a tree-like visual representation of the observations, called a dendrogram, that allows us to view at once the clusterings obtained for each possible number of clusters, from 1 to n.