
SLIDE 1

CS 6316 Machine Learning

Generative Models

Yangfeng Ji

Department of Computer Science, University of Virginia

SLIDE 2

Basic Definition

SLIDE 3

Data generation process

An idealized process to illustrate the relations among the domain set X, the label set Y, and the training set S:

  • 1. assume a probability distribution D over the domain set X
  • 2. sample an instance x ∈ X according to D
  • 3. annotate it using the labeling function f as y = f(x)

[From Lecture 02]

SLIDE 4

Example

Here is a data generation model

   p(x) = 0.6 · N(x; µ+, Σ+) + 0.4 · N(x; µ-, Σ-)   (1)

where the first component corresponds to y = +1 and the second to y = −1, with

◮ µ+ = [2, 0]ᵀ and Σ+ = [1.0 0.8; 0.8 2.0]
◮ µ- = [−2, 0]ᵀ and Σ- = [2.0 0.6; 0.6 1.0]
SLIDE 5

Example (II)

The data generation model can also be represented with the following components

   p(y = +1) = 0.6                    (2)
   p(y = −1) = 1 − p(y = +1) = 0.4    (3)
   p(x | y = +1) = N(x; µ+, Σ+)       (4)
   p(x | y = −1) = N(x; µ-, Σ-)       (5)

SLIDES 6-8

Data Generation

The specific data generation process: for each data point

  • 1. Randomly select a value of y ∈ {+1, −1} based on

       p(y = +1) = 0.6,  p(y = −1) = 0.4   (6)

  • 2. Sample x from the corresponding component based on the value of y

       p(x | y) = N(x; µ+, Σ+) if y = +1,  N(x; µ-, Σ-) if y = −1   (7)

  • 3. Add (x, y) to S, go to step 1
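To make the three steps concrete, here is a minimal numpy sketch of this generation process, using the parameters from equation (1); the names (generate, mu_pos, and so on) are illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters from equation (1)
mu_pos, Sigma_pos = np.array([2.0, 0.0]), np.array([[1.0, 0.8], [0.8, 2.0]])
mu_neg, Sigma_neg = np.array([-2.0, 0.0]), np.array([[2.0, 0.6], [0.6, 1.0]])

def generate(m):
    """Draw m labeled samples following the three-step process above."""
    S = []
    for _ in range(m):
        # Step 1: sample the label y with p(y = +1) = 0.6
        y = +1 if rng.random() < 0.6 else -1
        # Step 2: sample x from the Gaussian component selected by y
        mu, Sigma = (mu_pos, Sigma_pos) if y == +1 else (mu_neg, Sigma_neg)
        x = rng.multivariate_normal(mu, Sigma)
        # Step 3: add (x, y) to S
        S.append((x, y))
    return S

S = generate(1000)
```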

SLIDE 9

Illustration

With N = 1000 samples drawn from this process: 588 positive samples and 412 negative samples [scatter plot not reproduced]

SLIDES 10-11

Discriminative Models for Classification

◮ Discriminative models directly give predictions on the target variable (e.g., y)

◮ Example: logistic regression

   p(y | x) = σ(y⟨w, x⟩) = 1 / (1 + e^(−y⟨w, x⟩))   (8)

   where w is the model parameter

◮ Other examples
   ◮ AdaBoost (lecture 05)
   ◮ SVMs (lecture 07)
   ◮ Feed-forward neural networks (lecture 08)
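For contrast with the generative approach developed next, here is a minimal sketch of evaluating the logistic-regression probability in equation (8); the weight vector below is an arbitrary placeholder, as if already trained.

```python
import numpy as np

def logistic_prob(w, x, y):
    """p(y | x) = 1 / (1 + exp(-y * <w, x>)), as in equation (8)."""
    return 1.0 / (1.0 + np.exp(-y * np.dot(w, x)))

w = np.array([1.0, -0.5])       # placeholder model parameter
x = np.array([2.0, 0.3])
print(logistic_prob(w, x, +1))  # probability of the positive label
```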

SLIDES 12-14

Generative Models for Classification

◮ Basic idea: building a classifier by simulating the data generation process

◮ For the binary classification problem, recall the basic components of the data generation process
   ◮ p(y) where y ∈ {−1, +1}
   ◮ p(x | y = +1) where x ∈ R^d
   ◮ p(x | y = −1) where x ∈ R^d

◮ Challenge in machine learning: we do not know any of them; instead we have the samples S from this distribution
   ◮ This has always been our assumption in machine learning: we have no idea about the true data distribution

SLIDE 15

Generative Models for Classification (II)

We use a set of distributions q(·) to approximate the true distribution p(·)

   Data Generation Model    Generative Model
   p(y)                     q(y)
   p(x | y = +1)            q(x | y = +1)
   p(x | y = −1)            q(x | y = −1)

SLIDE 16

Learning with Generative Models

  • 1. Define distributions for all components
  • 2. Estimate the parameters for each component distribution

SLIDES 17-20

Defining Distributions

A typical way of defining distributions for generative models is based on our understanding of the problem

◮ Output domain y ∈ {+1, −1}: Bernoulli distribution

   p(y) = Bern(y; α) = α^δ(y=+1) (1 − α)^δ(y=−1)   (9)

   where α ∈ (0, 1) is the parameter

◮ Input domain x ∈ R^d: Gaussian distribution

   p(x | y = +1) = N(x; µ+, Σ+)   (10)

   where µ+ and Σ+ are the parameters

◮ Similarly, for p(x | y = −1)

   p(x | y = −1) = N(x; µ-, Σ-)   (11)

   where µ- and Σ- are the parameters
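A minimal sketch of these modeling choices, with placeholder parameter values: the Bernoulli mass function of equation (9), where δ(·) is the indicator, and the class-conditional Gaussian densities of equations (10)-(11) via scipy.

```python
from scipy.stats import multivariate_normal

alpha = 0.6   # Bernoulli parameter, placeholder value

def q_y(y):
    """Bernoulli over {+1, -1}, equation (9)."""
    return alpha if y == +1 else 1.0 - alpha

def q_x_given_y(x, y, mu_pos, Sigma_pos, mu_neg, Sigma_neg):
    """Class-conditional Gaussian density, equations (10)-(11)."""
    mu, Sigma = (mu_pos, Sigma_pos) if y == +1 else (mu_neg, Sigma_neg)
    return multivariate_normal.pdf(x, mean=mu, cov=Sigma)
```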

SLIDES 21-22

Parameter Estimation

◮ The collection of the parameters

   θ = {α, µ+, Σ+, µ-, Σ-}   (12)

◮ Training data S = {(x1, y1), . . . , (xm, ym)}

◮ Learning algorithm: Maximum Likelihood Estimation (MLE)

SLIDES 23-25

Maximum Likelihood Estimation (MLE)

MLE defined on the whole distribution q(x, y)

   θ ← argmax_θ′ ∑_{i=1}^m log q(xi, yi; θ′)   (13)

Based on the chain rule of probability

   q(x, y; θ) = q(y; α) q(x | y; µ_y, Σ_y)   (14)

Therefore

   θ̂ ← argmax_θ [ ∑_{i=1}^m log q(yi; α) + ∑_{i=1}^m log q(xi | yi; µ_{yi}, Σ_{yi}) ]

where the last term has two components, depending on the value of y

SLIDES 26-28

MLE: Bernoulli Distribution

Recalling the definition of the Bernoulli distribution, we have

   ∑_{i=1}^m log q(yi; α) = ∑_{i=1}^m { δ(yi = +1) log α + δ(yi = −1) log(1 − α) }   (15)

Then the value of α can be estimated by setting the derivative to zero

   d/dα ∑_{i=1}^m log q(yi; α) = ∑_{i=1}^m δ(yi = +1) / α − ∑_{i=1}^m δ(yi = −1) / (1 − α) = 0   (16)

therefore

   α = (∑_{i=1}^m δ(yi = +1)) / m   (17)

SLIDES 29-32

MLE: Gaussian Distribution

The definition of the multivariate Gaussian distribution

   q(x | y; µ, Σ) = 1 / √((2π)^d |Σ|) · exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))   (18)

◮ For y = +1, MLE on µ+ and Σ+ will only consider the samples x with y = +1 (call this subset S+)

◮ MLE on µ+

   µ+ = (1/|S+|) ∑_{xi ∈ S+} xi   (19)

◮ MLE on Σ+

   Σ+ = (1/|S+|) ∑_{xi ∈ S+} (xi − µ+)(xi − µ+)ᵀ   (20)

◮ Exercise: prove equations 19 and 20 with d = 1
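Here is a minimal numpy sketch of the estimators in equations (17), (19), and (20), applied to an array of labeled samples; the function name and array shapes are illustrative.

```python
import numpy as np

def mle_estimates(X, y):
    """X: (m, d) array of inputs; y: (m,) array of labels in {+1, -1}."""
    # Equation (17): alpha is the fraction of positive labels
    alpha = np.mean(y == +1)
    # Equations (19)-(20), computed per class
    X_pos, X_neg = X[y == +1], X[y == -1]
    mu_pos = X_pos.mean(axis=0)
    Sigma_pos = (X_pos - mu_pos).T @ (X_pos - mu_pos) / len(X_pos)
    mu_neg = X_neg.mean(axis=0)
    Sigma_neg = (X_neg - mu_neg).T @ (X_neg - mu_neg) / len(X_neg)
    return alpha, mu_pos, Sigma_pos, mu_neg, Sigma_neg
```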

SLIDE 33

Example: Parameter Estimation

Given N = 1000 samples, here are the parameters

   Parameter   p(·)                 q(·)
   µ+          [2, 0]ᵀ              [1.95, −0.11]ᵀ
   Σ+          [1.0 0.8; 0.8 2.0]   [0.88 0.74; 0.74 1.97]
   µ-          [−2, 0]ᵀ             [−2.08, 0.08]ᵀ
   Σ-          [2.0 0.6; 0.6 1.0]   [1.88 0.55; 0.55 1.07]
SLIDES 34-36

Prediction

◮ For a new data point x′, the prediction is given as

   q(y′ | x′) = q(y′) q(x′ | y′) / q(x′) ∝ q(y′) q(x′ | y′)   (21)

   There is no need to compute q(x′)

◮ Prediction rule

   y′ = +1 if q(y′ = +1 | x′) > q(y′ = −1 | x′), and y′ = −1 if q(y′ = +1 | x′) < q(y′ = −1 | x′)   (22)

◮ Although equation 22 looks like the one used in the Bayes optimal predictor, the prediction power is limited by how well

   q(y′ | x′) ≈ p(y | x)   (23)

   Again, we do not know p(·)
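A minimal sketch of the prediction rule in equations (21)-(22), reusing the parameters returned by the MLE sketch above; multivariate_normal is from scipy.

```python
from scipy.stats import multivariate_normal

def predict(x_new, alpha, mu_pos, Sigma_pos, mu_neg, Sigma_neg):
    """Return +1 or -1 by comparing q(y') q(x' | y'), equation (22)."""
    score_pos = alpha * multivariate_normal.pdf(x_new, mu_pos, Sigma_pos)
    score_neg = (1 - alpha) * multivariate_normal.pdf(x_new, mu_neg, Sigma_neg)
    return +1 if score_pos > score_neg else -1   # q(x') cancels, equation (21)
```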

SLIDE 37

Naive Bayes Classifiers

SLIDE 38

Number of Parameters

Assume x = (x·,1, . . . , x·,d) ∈ R^d; then the number of parameters in q(x, y) is

◮ q(y): 1 (α)
◮ q(x | y = +1):
   ◮ µ+ ∈ R^d: d parameters
   ◮ Σ+ ∈ R^{d×d}: d² parameters
◮ q(x | y = −1): d² + d parameters

In total, we have 2d² + 2d + 1 parameters

SLIDES 39-40

Challenge of Parameter Estimation

◮ When d = 100, we have 2d² + 2d + 1 = 20201 parameters

◮ A closer look at the covariance matrix Σ in a multivariate Gaussian distribution

   Σ = [σ²_{1,1} · · · σ²_{1,d}; . . . ; σ²_{d,1} · · · σ²_{d,d}]   (24)

◮ To reduce the number of parameters, we assume

   σ²_{i,j} = 0 if i ≠ j   (25)

SLIDE 41

Diagonal Covariance Matrix

With the diagonal covariance matrix

   Σ = [σ²_{1,1} · · · 0; . . . ; 0 · · · σ²_{d,d}]   (26)

the multivariate Gaussian distribution can be rewritten with

   |Σ| = ∏_{j=1}^d σ²_{j,j}   (27)

   (x − µ)ᵀ Σ⁻¹ (x − µ) = ∑_{j=1}^d (x·,j − µj)² / σ²_{j,j}   (28)

SLIDES 42-46

Diagonal Covariance Matrix (II)

In other words,

   q(x | y; µ, Σ) = ∏_{j=1}^d q(x·,j | y; µj, σ²_{j,j})   (29)

◮ Conditional independence: equation 29 means that, given y, each component xj is independent of the other components

◮ This is a strong and naive assumption about q(x | ·)

◮ Together with q(y), this generative model is called the Naive Bayes classifier

◮ Parameter estimation can be done per dimension
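A minimal sketch of per-dimension estimation under the conditional-independence assumption (29): each class keeps only a mean and a variance per feature rather than a full covariance matrix; names and shapes are illustrative.

```python
import numpy as np

def naive_bayes_estimates(X, y):
    """Per-dimension Gaussian Naive Bayes estimates for labels in {+1, -1}."""
    params = {}
    for label in (+1, -1):
        Xc = X[y == label]
        # Only d means and d variances per class; off-diagonal terms are assumed 0
        params[label] = (Xc.mean(axis=0), Xc.var(axis=0))
    alpha = np.mean(y == +1)
    return alpha, params
```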

SLIDE 47

Example: Parameter Estimation

Given N = 1000 samples, here are the parameters

   Parameter   p(·)                 q(·)                     Naive Bayes
   µ+          [2, 0]ᵀ              [1.95, −0.11]ᵀ           [1.95, −0.11]ᵀ
   Σ+          [1.0 0.8; 0.8 2.0]   [0.88 0.74; 0.74 1.97]   [0.88 0; 0 1.97]
   µ-          [−2, 0]ᵀ             [−2.08, 0.08]ᵀ           [−2.08, 0.08]ᵀ
   Σ-          [2.0 0.6; 0.6 1.0]   [1.88 0.55; 0.55 1.07]   [1.88 0; 0 1.07]
SLIDE 48

Latent Variable Models

SLIDES 49-51

Data Generation Model, Revisited

Consider the following model again, without any label information

   p(x) = α · N(x; µ1, Σ1) + (1 − α) · N(x; µ2, Σ2)   (30)

   where the first term is component c = 1 and the second is component c = 2

◮ No labeling information
◮ Instead of having two classes, it now has two components c ∈ {1, 2}
◮ It is a specific case of Gaussian mixture models
   ◮ A mixture model with two Gaussian components

SLIDES 52-54

Data Generation

The data generation process: for each data point

  • 1. Randomly select a component c based on

       p(c = 1) = α,  p(c = 2) = 1 − α   (31)

  • 2. Sample x from the corresponding component c

       p(x | c) = N(x; µ1, Σ1) if c = 1,  N(x; µ2, Σ2) if c = 2   (32)

  • 3. Add x to S, go to step 1

SLIDE 55

Illustration

Here is an example data set S with 1,000 samples; no label information is available [scatter plot not reproduced]

SLIDES 56-59

The Learning Problem

Consider using the following distribution to fit the data S

   q(x) = α · N(x; µ1, Σ1) + (1 − α) · N(x; µ2, Σ2)   (33)

◮ This is a density estimation problem, one of the unsupervised learning problems

◮ The number of components in q(x) is part of the assumption, based on our understanding of the data

◮ Without knowing the true data distribution, the number of components is treated as a hyper-parameter (predetermined before learning)

SLIDE 60

Parameter Estimation

◮ Based on the general form of GMMs, the parameters are θ = {α, µ1, Σ1, µ2, Σ2}

◮ Given a set of training examples S = {x1, . . . , xm}, the straightforward method is MLE

   L(θ) = ∑_{i=1}^m log q(xi; θ) = ∑_{i=1}^m log [ α · N(xi; µ1, Σ1) + (1 − α) · N(xi; µ2, Σ2) ]   (34)

◮ Learning: θ ← argmax_θ′ L(θ′)
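A minimal sketch of the log-likelihood in equation (34) for a two-component GMM; a numerically safer version would use log-sum-exp, but the direct form mirrors the equation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, alpha, mu1, Sigma1, mu2, Sigma2):
    """L(theta) = sum_i log(alpha * N(x_i; mu1, Sigma1) + (1 - alpha) * N(x_i; mu2, Sigma2))."""
    p1 = multivariate_normal.pdf(X, mu1, Sigma1)   # vectorized over the rows of X
    p2 = multivariate_normal.pdf(X, mu2, Sigma2)
    return np.sum(np.log(alpha * p1 + (1 - alpha) * p2))
```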

SLIDES 61-63

Singularity in GMM Parameter Estimation

Singularity happens when one of the mixture components only captures a single data point, which eventually leads the (log-)likelihood to ∞

◮ It is easy to overfit the training set using GMMs, for example when K = m

◮ This issue does not exist when estimating parameters for a single Gaussian distribution

SLIDE 64

Gradient-based Learning

Recall the definition of L(θ)

   L(θ) = ∑_{i=1}^m log [ α · N(xi; µ1, Σ1) + (1 − α) · N(xi; µ2, Σ2) ]   (35)

◮ There is no closed-form solution of ∇L(θ) = 0
   ◮ E.g., the value of α depends on {µc, Σc}²_{c=1}, and vice versa

◮ Gradient-based learning is still feasible as

   θ^(new) ← θ^(old) + η · ∇L(θ^(old))

SLIDES 65-67

Latent Variable Models

To rewrite equation 33 into a full probabilistic form, we introduce a random variable z ∈ {1, 2}, with

   q(z = 1) = α,  q(z = 2) = 1 − α   (36)

or

   q(z) = α^δ(z=1) (1 − α)^δ(z=2)   (37)

◮ z is a random variable that indicates the mixture component for x (a similar role as y in the classification problem)

◮ z is not directly observed in the data; therefore it is a latent (random) variable

SLIDES 68-69

GMM with Latent Variable

With the latent variable z, we can rewrite the probabilistic model as a joint distribution over x and z

   q(x, z) = q(z) q(x | z)
           = α^δ(z=1) · N(x; µ1, Σ1)^δ(z=1) · (1 − α)^δ(z=2) · N(x; µ2, Σ2)^δ(z=2)   (38)

And the marginal probability q(x) is the same as in equation 33

   q(x) = q(z = 1) q(x | z = 1) + q(z = 2) q(x | z = 2)
        = α · N(x; µ1, Σ1) + (1 − α) · N(x; µ2, Σ2)   (39)

SLIDES 70-71

Parameter Estimation: MLE?

For each xi, we introduce a latent variable zi as the mixture component indicator; then the log-likelihood is defined as

   ℓ(θ) = ∑_{i=1}^m log q(xi, zi)
        = ∑_{i=1}^m log [ α^δ(zi=1) · N(xi; µ1, Σ1)^δ(zi=1) · (1 − α)^δ(zi=2) · N(xi; µ2, Σ2)^δ(zi=2) ]   (40)
        = ∑_{i=1}^m [ δ(zi = 1) log α + δ(zi = 1) log N(xi; µ1, Σ1) + δ(zi = 2) log(1 − α) + δ(zi = 2) log N(xi; µ2, Σ2) ]

Question: we already know that zi is a random variable, but is E[δ(zi = 1)] = α?

SLIDE 72

EM Algorithm

SLIDES 73-75

Basic Idea

◮ The key challenge of GMM with latent variables is that we do not know the distributions of {zi}

◮ The basic idea of the EM algorithm is to alternately address the challenge between

   {zi}_{i=1}^m ⇔ θ = {α, µ1, Σ1, µ2, Σ2}   (41)

◮ Basic procedure
  • 1. Fix θ, estimate the distributions of {zi}_{i=1}^m
  • 2. Fix the distributions of {zi}_{i=1}^m, estimate the value of θ
  • 3. Go back to step 1

SLIDE 76

How to Estimate zi?

Fixing θ, we can estimate the distribution of each zi as (with equations 38 and 39)

   q(zi | xi) = q(xi, zi) / q(xi)   (42)

In particular, we have

   q(zi = 1 | xi) = α · N(xi; µ1, Σ1) / [ α · N(xi; µ1, Σ1) + (1 − α) · N(xi; µ2, Σ2) ]   (43)
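A minimal sketch of the posterior responsibility in equation (43); gamma[i] below is q(zi = 1 | xi) for each row of X.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, alpha, mu1, Sigma1, mu2, Sigma2):
    """Equation (43): posterior responsibility of component 1 for each x_i."""
    p1 = alpha * multivariate_normal.pdf(X, mu1, Sigma1)
    p2 = (1 - alpha) * multivariate_normal.pdf(X, mu2, Sigma2)
    return p1 / (p1 + p2)   # gamma_i = q(z_i = 1 | x_i)
```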

SLIDES 77-79

Expectation

Let γi be the expectation of zi under the distribution q(zi | xi)

   E[zi] = γi   (44)

◮ Since zi is a Bernoulli random variable, we also have q(zi = 1 | xi) = γi

◮ Furthermore, the expectation of δ(zi = 1) under the distribution q(zi | xi) is

   E[δ(zi = 1)] = 1 · q(zi = 1 | xi) + 0 · q(zi = 2 | xi) = q(zi = 1 | xi) = γi   (45)

SLIDES 80-82

Parameter Estimation (I)

Given

   ℓ(θ) = ∑_{i=1}^m [ δ(zi = 1) log α + δ(zi = 1) log N(xi; µ1, Σ1) + δ(zi = 2) log(1 − α) + δ(zi = 2) log N(xi; µ2, Σ2) ]   (46)

To maximize ℓ(θ) with respect to α, we set the derivative to zero

   ∑_{i=1}^m [ δ(zi = 1)/α − δ(zi = 2)/(1 − α) ] = 0   (47)

and

   α | z = ∑_{i=1}^m δ(zi = 1) / ∑_{i=1}^m (δ(zi = 1) + δ(zi = 2)) = (∑_{i=1}^m δ(zi = 1)) / m   (48)

which is similar to the classification example, except that zi is a random variable

SLIDES 83-84

Parameter Estimation (II)

Without going through the details, the estimates of the mean and covariance take similar forms. For example, for the first component we have

   µ1 | z = (1/m) ∑_{i=1}^m δ(zi = 1) xi   (49)

   Σ1 | z = (1/m) ∑_{i=1}^m δ(zi = 1)(xi − µ1)(xi − µ1)ᵀ   (50)

Question: how do we eliminate the randomness in α, µ1, Σ1 (and similarly in µ2, Σ2)?

SLIDES 85-86

Expectation (II)

With E[δ(zi = 1)] = γi, we have

   α = E[α | z] = (1/m) ∑_{i=1}^m E[δ(zi = 1)] = (1/m) ∑_{i=1}^m γi   (51)

Similarly, we have

   µ1 = (1/m) ∑_{i=1}^m γi xi                        µ2 = (1/m) ∑_{i=1}^m (1 − γi) xi

   Σ1 = (1/m) ∑_{i=1}^m γi (xi − µ1)(xi − µ1)ᵀ       Σ2 = (1/m) ∑_{i=1}^m (1 − γi)(xi − µ2)(xi − µ2)ᵀ   (52)

SLIDE 87

The EM Algorithm, Review

The algorithm iteratively runs the following two steps:

E-step: Given θ, for each xi, estimate the distribution of the corresponding latent variable zi

   q(zi | xi) = q(xi, zi) / q(xi)   (53)

   and its expectation γi

M-step: Given the expectations {γi}_{i=1}^m, maximize the log-likelihood function ℓ(θ) and estimate the parameter θ
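Putting the steps together, here is a minimal EM sketch for the two-component GMM. The initialization and iteration count are placeholder choices, and the M-step here normalizes the means and covariances by each component's total responsibility ∑γi (the standard EM update), rather than by m as in the compact forms of equations (49)-(52).

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm2(X, n_iters=100):
    """EM for a two-component GMM, following the updates on slides 76-86."""
    m, d = X.shape
    rng = np.random.default_rng(0)
    # Placeholder initialization: two random data points as means, identity covariances
    alpha = 0.5
    mu1, mu2 = X[rng.integers(m)], X[rng.integers(m)]
    Sigma1 = Sigma2 = np.eye(d)
    for _ in range(n_iters):
        # E-step, equation (43)
        p1 = alpha * multivariate_normal.pdf(X, mu1, Sigma1)
        p2 = (1 - alpha) * multivariate_normal.pdf(X, mu2, Sigma2)
        gamma = p1 / (p1 + p2)
        # M-step: equation (51) for alpha; responsibility-weighted means and covariances
        alpha = gamma.mean()
        w1, w2 = gamma.sum(), (1 - gamma).sum()
        mu1 = (gamma[:, None] * X).sum(axis=0) / w1
        mu2 = ((1 - gamma)[:, None] * X).sum(axis=0) / w2
        Sigma1 = ((gamma[:, None] * (X - mu1)).T @ (X - mu1)) / w1
        Sigma2 = (((1 - gamma)[:, None] * (X - mu2)).T @ (X - mu2)) / w2
    return alpha, mu1, Sigma1, mu2, Sigma2
```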

SLIDE 88

Illustration

[EM illustration from Bishop, 2006, page 437; figure not reproduced]

SLIDE 89

Variational Inference (Optional)

SLIDE 90

The Computation of q(z | x)

◮ In the previous example, we were able to compute the analytic solution of q(z | x) as

   q(z | x) = q(x, z) / q(x)   (54)

   where q(x) = ∑_z q(x, z)

◮ Challenge: unlike the simple case in GMMs, usually q(x) is difficult to compute

   q(x) = ∑_z q(x, z)        (discrete)     (55)
   q(x) = ∫_z q(x, z) dz     (continuous)   (56)

SLIDES 91-92

Solution

◮ Instead of computing q(x) and then q(z | x), we propose another distribution q′(z | x) to approximate q(z | x)

   q′(z | x) ≈ q(z | x)   (57)

   where q′(z | x) should be simple enough to facilitate the computation

◮ The objective for finding a good approximation is the Kullback-Leibler (KL) divergence

   KL(q′ ∥ q) = ∑_z q′(z | x) log [ q′(z | x) / q(z | x) ]        (discrete)
   KL(q′ ∥ q) = ∫_z q′(z | x) log [ q′(z | x) / q(z | x) ] dz     (continuous)

SLIDES 93-96

KL Divergence

◮ KL(q′ ∥ q) ≥ 0, and the equality holds if and only if q′ = q

◮ Consider the continuous case for visualization purposes:

   KL(q′ ∥ q) = ∫_z q′(z | x) log [ q′(z | x) / q(z | x) ] dz   (58)

◮ Regardless of what q(z | x) looks like, we are free to define q′(z | x) for simplicity

◮ Because of the q(z | x) term in equation 58, the challenge still exists

SLIDES 97-100

ELBo

The learning objective for q′(z | x) is

   KL(q′ ∥ q) = ∫_z q′(z | x) log [ q′(z | x) / q(z | x) ] dz
              = ∫_z q′(z | x) log [ q′(z | x) q(x) / q(z, x) ] dz
              = ∫_z q′(z | x) log [ q′(z | x) q(x) / (q(x | z) q(z)) ] dz
              = ∫_z q′(z | x) [ −log q(x | z) + log (q′(z | x) / q(z)) + log q(x) ] dz
              = −E[log q(x | z)] + KL(q′(z | x) ∥ q(z)) + log q(x)
              = −ELBo + log q(x)

where ELBo = E[log q(x | z)] − KL(q′(z | x) ∥ q(z)). Since log q(x) does not depend on q′, minimizing KL(q′ ∥ q) is equivalent to maximizing the Evidence Lower Bound (ELBo)
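As a numerical sanity check on the identity KL(q′ ∥ q) = −ELBo + log q(x), here is a minimal discrete example with z taking two values; all distribution values are arbitrary placeholders.

```python
import numpy as np

# Placeholder joint q(x, z) for one fixed x and z in {0, 1}
q_joint = np.array([0.12, 0.28])    # q(x, z = 0), q(x, z = 1)
q_x = q_joint.sum()                 # q(x) = sum_z q(x, z)
q_z_given_x = q_joint / q_x         # exact posterior q(z | x)
q_z = np.array([0.5, 0.5])          # prior q(z), placeholder
q_x_given_z = q_joint / q_z         # q(x | z) = q(x, z) / q(z)

q_prime = np.array([0.4, 0.6])      # an arbitrary approximation q'(z | x)

kl = np.sum(q_prime * np.log(q_prime / q_z_given_x))
elbo = np.sum(q_prime * np.log(q_x_given_z)) - np.sum(q_prime * np.log(q_prime / q_z))

print(np.isclose(kl, -elbo + np.log(q_x)))   # True: the identity holds
```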

SLIDE 101

Reference

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.