

SLIDE 1
Machine Learning 10-701

Tom M. Mitchell
Machine Learning Department
Carnegie Mellon University
March 31, 2011

Today: Learning representations III

  • Deep Belief Networks
  • ICA
  • CCA
  • Neuroscience example
  • Latent Dirichlet Allocation

Readings:

Deep Belief Networks

  • Problem: training networks with many hidden layers doesn't work very well
    – local minima, and very slow training if initialized with zero weights
  • Deep belief networks
    – autoencoder networks to learn low-dimensional encodings
    – but more layers, to learn better encodings

[Hinton & Salakhutdinov, Science, 2006]

SLIDE 2
[Figure: an original image; its reconstruction from a 2000-1000-500-30 DBN; and its reconstruction from linear PCA. [Hinton & Salakhutdinov, 2006]]

Deep Belief Networks

[Figure: encoding of digit images in two dimensions, logistic versus linear transformations: a 784-2 linear encoding (PCA) versus a 784-1000-500-250-2 DBN. [Hinton & Salakhutdinov, 2006]]

SLIDE 3

Restricted Boltzmann Machine

  • Bipartite graph (visible units v1 … vn, hidden units h1, h2, h3), logistic activation
  • Inference: fill in any nodes, estimate the other nodes (the conditionals are written out below)
  • Consider the case where the vi, hj are boolean variables
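For reference, "bipartite graph, logistic activation" with boolean units gives the standard RBM conditionals (these equations are standard background, not spelled out on the slide):

  P(h_j = 1 | v) = σ(c_j + Σ_i W_ij v_i)
  P(v_i = 1 | h) = σ(b_i + Σ_j W_ij h_j),   where σ(x) = 1 / (1 + e^(−x))

Each side is conditionally independent given the other, which is why "fill in any nodes, estimate the other nodes" reduces to a matrix multiply followed by a squashing function.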

Deep Belief Networks: Training

[Figure: greedy layer-wise training: learn a stack of RBMs one layer at a time, unroll the stack into an encoder-decoder network, then fine-tune the whole network with backpropagation. [Hinton & Salakhutdinov, 2006]]
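Below is a minimal sketch of the basic training move, one contrastive-divergence (CD-1) update for a binary RBM in NumPy. The shapes, learning rate, and initialization are illustrative assumptions, not values from the slides; Hinton & Salakhutdinov stack such layers and then fine-tune.

  import numpy as np

  rng = np.random.default_rng(0)

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def cd1_step(W, b, c, v0, lr=0.1):
      # Up-pass: hidden probabilities, then a binary sample, given data v0
      ph0 = sigmoid(v0 @ W + c)                    # (batch, n_hidden)
      h0 = (rng.random(ph0.shape) < ph0) * 1.0
      # Down-pass: reconstruct visibles, then re-infer hidden probabilities
      pv1 = sigmoid(h0 @ W.T + b)                  # (batch, n_visible)
      ph1 = sigmoid(pv1 @ W + c)
      # CD-1 update: data statistics minus reconstruction statistics
      W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
      b += lr * (v0 - pv1).mean(axis=0)
      c += lr * (ph0 - ph1).mean(axis=0)

  # Small random initialization (not zeros, which trains very slowly)
  W = 0.01 * rng.normal(size=(784, 500))
  b, c = np.zeros(784), np.zeros(500)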

SLIDE 4

Independent Components Analysis (ICA)

  • PCA seeks orthogonal directions <Y1 … YM> in feature

space X that minimize reconstruction error

  • ICA seeks directions <Y1 … YM> that are most statistically
  • independent. I.e., that minimize I(Y), the mutual

information between the Yj :

x x
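A quick way to see the PCA/ICA contrast in code, as a hedged sketch using scikit-learn's FastICA (the sources, mixing matrix, and component counts are made up for illustration):

  import numpy as np
  from sklearn.decomposition import FastICA, PCA

  rng = np.random.default_rng(0)
  t = np.linspace(0, 8, 2000)
  # Two independent, non-Gaussian sources, observed only as linear mixtures
  S = np.c_[np.sign(np.sin(3 * t)), rng.laplace(size=t.size)]
  X = S @ np.array([[1.0, 0.5], [0.5, 1.0]])

  # PCA: orthogonal directions minimizing reconstruction error
  Y_pca = PCA(n_components=2).fit_transform(X)
  # ICA: directions chosen to make the recovered Y_j maximally independent
  Y_ica = FastICA(n_components=2, random_state=0).fit_transform(X)

Y_ica recovers the sources up to permutation and scaling; Y_pca generally does not.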

Dimensionality reduction across multiple datasets

  • Given data sets A and B, find linear projections of each into a common lower dimensional space!
    – Generalized SVD: minimize squared reconstruction errors of both
    – Canonical correlation analysis: maximize correlation of A and B in the projected space

[Figure: data set A and data set B each project into a learned shared representation]

SLIDE 5

[slide courtesy of Indra Rustandi]

An Example Use of CCA

[Figure: a generative theory maps an arbitrary word to a representation of that word, and from the representation to predicted brain activity]

SLIDE 6
[Figure: fMRI activation for “bottle”; mean activation averaged over 60 different stimuli; and “bottle” minus the mean activation. Color scale: below average to high.]

Idea: Predict neural activity from corpus statistics of the stimulus word

[Figure: the word “telephone” → statistical features from a trillion-word text corpus → a mapping learned from fMRI data → predicted activity for “telephone”]

[Mitchell et al., Science, 2008]


slide-7
SLIDE 7

7

Semantic feature values, “celery”:
  eat 0.8368, taste 0.3461, fill 0.3153, see 0.2430, clean 0.1145, open 0.0600, smell 0.0586, touch 0.0286, … drive 0.0000, wear 0.0000, lift 0.0000, break 0.0000, ride 0.0000

Semantic feature values, “airplane”:
  ride 0.8673, see 0.2891, say 0.2851, near 0.1689, open 0.1228, hear 0.0883, run 0.0771, lift 0.0749, … smell 0.0049, wear 0.0010, taste 0.0000, rub 0.0000, manipulate 0.0000

Predicted Activation is Sum of Feature Contributions

Predicted “celery” = 0.84 × (contribution image for “eat”) + 0.35 × (contribution image for “taste”) + 0.32 × (contribution image for “fill”) + …

The feature values feat(celery) (0.84, 0.35, 0.32, …) come from corpus statistics; the per-voxel contribution coefficients (e.g., c_14382,eat, the learned weight of “eat” at voxel 14382) are learned from fMRI data: 500,000 learned parameters in all.
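In code, the model is a single linear map from feature space to voxel space. A hedged sketch (shapes follow the slide's numbers, 25 features × 20,000 voxels = 500,000 coefficients; the data here are random placeholders, and the ridge fit is one reasonable way to estimate the weights, not necessarily the paper's exact procedure):

  import numpy as np

  rng = np.random.default_rng(0)
  F = rng.random((58, 25))       # feat(w): corpus-statistics features, 58 training words
  Y = rng.random((58, 20000))    # observed fMRI image (flattened voxels) per word

  # One coefficient per (feature, voxel): 25 x 20,000 = 500,000 parameters,
  # fit by ridge-regularized least squares
  lam = 1.0
  C = np.linalg.solve(F.T @ F + lam * np.eye(25), F.T @ Y)   # (25, 20000)

  def predict_image(feat_w):
      # Predicted activation = sum over features of value × learned contribution image
      return feat_w @ C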

SLIDE 8
[Figure: predicted and observed fMRI images for “celery” and “airplane” after training on 58 other words. Color scale: below average to high.]

Evaluating the Computational Model

  • Train it using 58 of the 60 word stimuli
  • Apply it to predict fMRI images for the other 2 words
  • Test: show it the observed images for the 2 held-out words, and make it predict which is which (as in the matching sketch below)

1770 test pairs in leave-2-out (all C(60,2) = 1770 pairs):
  – Random guessing → 0.50 accuracy
  – Accuracy above 0.61 is significant (p < 0.05)

[Figure: which observed image is “celery”, and which is “airplane”?]
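A minimal sketch of the leave-2-out decision rule. The slides don't specify the similarity measure; cosine similarity is one common choice and is an assumption here:

  import numpy as np

  def cosine(a, b):
      return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

  def pair_correct(pred1, pred2, obs1, obs2):
      # True if the correct assignment of the two held-out images
      # scores higher than the swapped assignment
      correct = cosine(pred1, obs1) + cosine(pred2, obs2)
      swapped = cosine(pred1, obs2) + cosine(pred2, obs1)
      return correct > swapped

Accuracy is then the fraction of the 1770 held-out pairs for which pair_correct is True.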

SLIDE 9

Q4: What are the actual semantic primitives from which neural encodings are composed?

[Figure: a word's 25 verb co-occurrence counts (??!?) are used as features to predict its neural representation]

Alternative semantic feature sets

  PREDEFINED corpus features                            Mean Acc.
  25 verb co-occurrences                                .79
  486 verb co-occurrences                               .79
  50,000 word co-occurrences                            .76
  300 Latent Semantic Analysis features                 .73
  50 corpus features from Collobert & Weston ICML08     .78
  218 features collected using Mechanical Turk*         .83
  20 features discovered from the data**                .87

  * developed by Dean Pomerleau   ** developed by Indra Rustandi

SLIDE 10

Discovering a shared semantic basis

[Figure: word w → 218 base features → 20 learned* latent intermediate semantic features (independent of study/subject) → separate learned mappings that predict the neural representation for each subject: subj 1, word+picture; … subj 9, word+picture; subj 10, word only; … subj 20, word only (specific to study/subject)]

* trained using Canonical Correlation Analysis

[Rustandi et al., 2009]

Multi-study (WP+WO), multi-subject (9+11) CCA: top stimulus words

                        component 1   component 2    component 3   component 4
  most active stimuli   apartment     screwdriver    telephone     pants
                        church        pliers         butterfly     dress
                        closet        refrigerator   bicycle       glass
                        house         knife          beetle        coat
                        barn          hammer         dog           chair

  shelter? manipulation? things that touch me?

SLIDE 11

[Figure: Multi-study (WP+WO), multi-subject (9+11) CCA component 1, shown for Subject 1 with Word-Picture stimuli and for Subject 1 with Word-ONLY stimuli]