SLIDE 1

Descriptive and Exploratory Methods

Léon Bottou

largely copied from Mireille Summa-Gettler's lectures (in French)

COS 424 – 3/23/2010

SLIDE 2

Agenda

Goals
– Classification, clustering, regression, other.
Representation
– Parametric vs. kernels vs. nonparametric
– Probabilistic vs. nonprobabilistic
– Linear vs. nonlinear
– Deep vs. shallow
Capacity Control
– Explicit: architecture, feature selection
– Explicit: regularization, priors
– Implicit: approximate optimization
– Implicit: Bayesian averaging, ensembles
Operational Considerations
– Loss functions
– Budget constraints
– Online vs. offline
Computational Considerations
– Exact algorithms for small datasets.
– Stochastic algorithms for big datasets.
– Parallel algorithms.

Today’s topic fits poorly in this picture.

SLIDE 3

Introduction

Predictive methods
– Construct models using examples (the training set).
– Hope that they work well in future situations (e.g. on a testing set).
Descriptive methods
– Describe the distribution of examples.
– Investigate the geometry of the data.
– Hope to acquire insights about the underlying phenomenon.

SLIDE 4

A catalog of descriptive methods

Clustering methods
– K-means, K-medoids, Gaussian mixtures...
– Hierarchical clustering...
Projection methods
– Principal component analysis (PCA) [Hotelling, 30s]
– Correspondence analysis (CA) [Benzécri, 60s]
– Multiple correspondence analysis (MCA)
– Canonical correlation analysis (CCA), ...

Embedding methods
– Kernel PCA
– Locally linear embedding (LLE)
– ISOMAP

SLIDE 5

I. Principal Component Analysis

SLIDE 6

Sparkling water springs

Observations
– 21 sparkling water springs in France.
Continuous variables
– 8 ion concentrations (calcium, magnesium, ...)
– Price per liter.
Categorical variables
– Total minerality (low, medium, high)
– Compliance with regulations (yes, no)
– Region (Alps, Auvergne, Languedoc, ...)

SLIDE 7

Sparkling water springs

[Data table, unreadable in this transcript: the 21 springs and their measurements. The eight ion concentrations are the active variables; the remaining columns are supplementary variables.]

SLIDE 8

Elementary planes

Pairwise graphs are not informative

SLIDE 9

Approximate a data cloud by its projection

High dimensional cloud. Low dimensional projection.

SLIDE 10

Some projections are more informative

The main idea of PCA is the determination of a good projection.

SLIDE 11

One data table, two data clouds

SLIDE 12

PCA projection of the 21 rows

SLIDE 13

PCA projection of the 8 columns

SLIDE 14

Summary

Principal component analysis
– Table of n observations represented by p continuous variables.
– Cloud of n row-points (observations) in dimension p.
– Cloud of p column-points (variables) in dimension n.
– Search for the "best" projection of each cloud.
Interpretation
– Identify similar observations.
– Identify similar variables.
Best projection?

SLIDE 15

Distance

Distances
– A good projection reveals whether two points were close or distant.
– We would like to use the convenient Euclidean distance.
– Variables often have very different numerical ranges.

Correlation PCA
– Normalize the mean and standard deviation of each variable: x_ij = (z_ij − z̄_j) / σ_j.
– This is the default and this is what we discuss today.
Covariance PCA
– Normalize the mean of each variable, x_ij = z_ij − z̄_j, but not the standard deviation.
– This is sometimes useful.
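The two normalizations can be sketched in a few lines of NumPy; the small matrix Z below is a hypothetical stand-in for a table of n observations by p continuous variables.

```python
import numpy as np

# Hypothetical data table Z: n observations (rows) x p continuous variables (columns).
Z = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 620.0]])

# Correlation PCA (the default): center each variable and divide by its
# standard deviation, so variables with large numerical ranges do not dominate.
X = (Z - Z.mean(axis=0)) / Z.std(axis=0)

# Covariance PCA: center only, keep the original scales.
X_cov = Z - Z.mean(axis=0)
```

After this step every column of X has mean 0 and standard deviation 1, which is what the distance arguments on the next slides assume.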

SLIDE 16

Normed centered data

[Data table, unreadable in this transcript: the springs table after centering and normalizing each variable.]

SLIDE 17

Preserve distances

Projection contracts the distances

PCA criterion: maximize the sum of squared projected distances

max_H  Σ_{i=1}^n Σ_{i'=1}^n d²_H(i, i')
SLIDE 18

Maximize dispersion

Observe

Σ_{i,i'} (x_i − x_{i'})² = Σ_{i,i'} ((x_i − x̄) − (x_{i'} − x̄))² = ··· = 2n Σ_i (x_i − x̄)² = 2n² Var(x).

Equivalent PCA criterion: maximize dispersion
– Maximize the average squared distance to the cloud mean G.

max_H  (1/n) Σ_{i=1}^n d²_H(i, G)

Equivalent PCA criterion: maximize variance
– Maximize the variance of the projected points.

SLIDE 19

First factorial axis

– Pick a unit vector u.
– Project the x_i on the line with direction u.
– Find u that maximizes the dispersion.

max_u  C(u) = Σ_{i=1}^n (u⊤x_i − u⊤x̄)²   subject to   u⊤u = 1

The constraint means that u lives on the unit sphere. At the optimum u*, the gradient of the dispersion must be orthogonal to the surface of the sphere; otherwise we would be able to find a better solution by slightly moving u* along the projection of the gradient. Therefore there exists a "Lagrange multiplier" λ such that

dC(u*)/du − λ u* = 0.

This leads to the necessary condition

( (1/n) Σ_i (x_i − x̄)(x_i − x̄)⊤ ) u* = λ u*.

Therefore u* must be an eigenvector of the covariance matrix. The best one is associated with the largest eigenvalue. The orientation is arbitrary!
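This eigenvector condition can be checked numerically. Below is a sketch with a synthetic toy cloud (my own stand-in data): the direction of maximal dispersion comes out as the eigenvector of the covariance matrix with the largest eigenvalue, up to an arbitrary sign.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cloud: strong spread along one direction, then rotated by 45 degrees,
# so the true first factorial axis is (1, 1)/sqrt(2) up to sign.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = (rng.normal(size=(500, 2)) * np.array([3.0, 0.3])) @ R.T

# Covariance matrix of the centered cloud.
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / len(Xc)

# First factorial axis: eigenvector with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh returns ascending eigenvalues
u1 = eigvecs[:, -1]                        # orientation is arbitrary!
```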

SLIDE 20

Successive factorial axes

First factorial axis
– Direction with maximal dispersion.
– Eigenvector of the covariance matrix Σ with the highest eigenvalue.
Second factorial axis
– Direction with maximal dispersion orthogonal to the first axis.
– Eigenvector of Σ with the second highest eigenvalue.
Third factorial axis
– Direction with maximal dispersion orthogonal to the first two axes.
– Eigenvector of Σ with the third highest eigenvalue.
etc.
Basis change
– The basis formed by the p variables is replaced by the basis formed by the p principal axes.

SLIDE 21

Factorial coordinates of the rows

[Table, unreadable in this transcript: the factorial coordinates ψ_αi of the 21 springs.]

SLIDE 22

First factorial plane

[Figure, unreadable in this transcript: the 21 springs plotted in the first factorial plane.]

SLIDE 23

Approximate reconstruction of distances

Distance computed with the first two axes:

d²(Arcens, Arvie) ≈ (−0.14 − 2.02)² + (1.41 + 1.81)² = 15.03

Distance computed with the first three axes:

d²(Arcens, Arvie) ≈ (−0.14 − 2.02)² + (1.41 + 1.81)² + (−0.04 − 2.02)² = 19.28

Distance computed with all eight axes:

d²(Arcens, Arvie) = (−0.14 − 2.02)² + ··· + (0.01 − 0.07)² = 21.93

Same as the distance computed on the normed centered data.

SLIDE 24

Factorial axes and factors

Two views of a factorial axis α
– Factorial axis unit vector u_α.
– Factor ψ_αi = Σ_{j=1}^p u_αj x_ij.
– E(ψ_α) = 0,  Var(ψ_α) = λ_α.

[Table, unreadable in this transcript: the factors Ψ of the springs data.]
SLIDE 25

Reconstruction

– On the factorial axis α:

Σ_{i=1}^n Σ_{i'=1}^n d²_α(i, i') = 2n Σ_{i=1}^n d²_α(i, G) = 2n Σ_{i=1}^n ψ²_αi = 2n² Var(ψ_α) = 2n² λ_α

– In the full space R^p:

Σ_{i=1}^n Σ_{i'=1}^n d²(i, i') = Σ_{α=1}^p Σ_{i=1}^n Σ_{i'=1}^n d²_α(i, i') = 2n² Σ_{α=1}^p λ_α = 2n² p

– Percent reconstruction on the first axis: λ₁ / p.
– Percent reconstruction on the first two axes: (λ₁ + λ₂) / p.
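For correlation PCA the eigenvalues of the correlation matrix sum to p, so the percent-reconstruction formulas are one line each. A sketch on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(50, 4))
Z[:, 1] += 2.0 * Z[:, 0]          # correlate two variables so the first axis dominates

# Correlation PCA: eigenvalues of the correlation matrix of the normed data.
X = (Z - Z.mean(axis=0)) / Z.std(axis=0)
lam = np.linalg.eigvalsh(X.T @ X / len(X))[::-1]   # descending lambda_alpha

p = Z.shape[1]
pct_axis1 = lam[0] / p                 # percent reconstruction on the first axis
pct_plane = (lam[0] + lam[1]) / p      # percent reconstruction on the first plane
```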

SLIDE 26

Computation of the PCA

Singular Value Decomposition of X = V D U⊤
– X: n × p data matrix of rank r.
– V: n × r orthogonal matrix.
– D: r × r diagonal matrix with elements (... √λ_α ...).
– U: p × r orthogonal matrix.
Row PCA
– Σ_row = X⊤X = U D V⊤ V D U⊤ = U D² U⊤.
– The unit vectors u_α are the columns of U.
– The factors ψ_α are the columns of X U = V D U⊤U = V D.
Column PCA
– Σ_col = X X⊤ = V D U⊤U D V⊤ = V D² V⊤.
– The unit vectors v_α are the columns of V.
– The factors ϕ_α are the columns of X⊤V = U D V⊤V = U D.
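This computation can be sketched directly with NumPy's thin SVD (synthetic data standing in for the 21 × 8 normed table); the row factors come out as the columns of V D and the column factors as the columns of U D.

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(21, 8))                # hypothetical stand-in data
X = (Z - Z.mean(axis=0)) / Z.std(axis=0)    # normed centered table

# Thin SVD: X = V D U^T with D = diag(sqrt(lambda_alpha)).
V, d, Ut = np.linalg.svd(X, full_matrices=False)
U = Ut.T

psi = X @ U        # row factors: the columns of V D
phi = X.T @ V      # column factors: the columns of U D
```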

SLIDE 27

Transition relations

Relation between row PCA and column PCA
– The following relations can be derived from the SVD equations.

ϕ_α = (1/√λ_α) X⊤ψ_α        ψ_α = (1/√λ_α) X ϕ_α        u_α = (1/√λ_α) ϕ_α        v_α = (1/√λ_α) ψ_α

Proof example
– Let e_α be a basis vector of R^r and write

X ϕ_α = (V D U⊤)(U D e_α) = V D D e_α = √λ_α V D e_α = √λ_α ψ_α.

SLIDE 28

Summary

Normalization
– Correlation PCA: normalize mean and sdev (the default).
– Covariance PCA: normalize mean only (sometimes).
PCA is a change of basis
– First factorial axis: direction with maximal dispersion.
– First factorial plane: plane with maximal dispersion.
Factor α
– Coordinates ψ_αi of all the observations on the axis α.
Dispersion
– Dispersion of the cloud on the first principal axis = variance of the first factor.
– Dispersion of the cloud on the first principal plane = sum of the variances of the first and second factors.

SLIDE 29

Contributions

Contribution of an observation to an axis
– How much does an observation contribute to the definition of the axis?

CTR_α(i) = ψ²_αi / Σ_{i'=1}^n ψ²_αi'

Contribution of a variable to an axis
– How much does a variable contribute to the definition of the axis?
– Same thing using the PCA of the column points.

CTR_α(j) = ϕ²_αj / Σ_{j'=1}^p ϕ²_αj'

Both types of contributions help interpret the axes.
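A minimal sketch of the contribution formula, with a hypothetical factor vector for one axis; the contributions are fractions that sum to one.

```python
import numpy as np

# Hypothetical coordinates psi_alpha,i of five observations on axis alpha.
psi_alpha = np.array([0.5, -2.0, 0.1, 1.0, -0.3])

# CTR_alpha(i): squared coordinate as a fraction of the axis dispersion.
ctr = psi_alpha**2 / np.sum(psi_alpha**2)
```

The same formula applied to the column factors ϕ gives the variable contributions.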

SLIDE 30

Contributions of observations

[Table, unreadable in this transcript: the contributions CTR of each spring.]

– St Yorre and Vichy-Célestins contribute most to the first axis.
– Quézac, San Pellegrino, and Puits St-Georges contribute most to the second axis.

SLIDE 31

Contributions of variables

[Table, unreadable in this transcript: the contributions of each variable.]

– Bicarbonates, Fluorures, Sodium, and Potassium contribute most to the first axis.
– Calcium and Magnesium contribute most to the second axis.

SLIDE 32

Simultaneously plotting rows and columns

Projecting the original axes
– Project the points e_j = (..., 0, 1, 0, 0, ...), i.e. the tips of the corresponding unit vectors.

ψ_α,[j] = Σ_{j'=1}^p 1{j = j'} u_αj' = u_αj

– Relation with the column PCA:

ψ_α,[j] = u_αj = (1/√λ_α) ϕ_αj

Correlation between variables and factorial axes
– The projected original axes reveal, up to a factor √λ_α, the correlation between the factorial axes and the original variables:

Correl(x_·j, ψ_α) / √λ_α = Covar(x_·j, ψ_α) / λ_α = e⊤_j Σ u_α / λ_α = e⊤_j λ_α u_α / λ_α = u_αj = ψ_α,[j]

SLIDE 33

Simultaneously plotting rows and columns

– This particular plot uses a different scale for the points and the axes! The circle indicates the location of the "unit" sphere.
– Compare with the column PCA (slide 13).

SLIDE 34

Continuous supplementary variables

[Table, unreadable in this transcript: the springs table with price as a supplementary variable.]

Price axis
– Use the correlation formula:

ψ_α,[price] = Correl(price, ψ_α) / √λ_α

Relation with the unit sphere:

Σ_{α=1}^p ψ²_α,[j] = 1        Σ_{α=1}^p ψ²_α,[price] < 1

SLIDE 35

Continuous supplementary variables

Price

– The price of a sparkling spring water bears little relation to its mineral content.

SLIDE 36

Categorical supplementary variables

[Figure residue, unreadable in this transcript.]

Partition the observations in groups
– e.g. one group per region β.

Per-category barycenter:

ψ̄_α,[β] = (1/n_β) Σ_{i∈β} ψ_αi

Per-category variance ellipsoid:

Σ_[β] = (1/n_β) Σ_{i∈β} (ψ_i − ψ̄_[β])(ψ_i − ψ̄_[β])⊤

Use the central limit theorem to ascertain whether a group effect is significant!

SLIDE 37

Groups are often more interesting

Rhône-Alpes, Auvergne

– The barycenter of the six "Rhône-Alpes" springs is close to the origin.
– The barycenter of the five "Auvergne" springs is high on the first axis.
SLIDE 38

Supplementary observations

What about Champagne?

Calcium        86 mg/l    (Jos et al., Talanta 63, 2004)
Magnesium      83 mg/l    (Jos et al., Talanta 63, 2004)
Potassium     339 mg/l    (Jos et al., Talanta 63, 2004)
Bicarbonate  1229 mg/l    (estimated: mean bicarbonates in water)
Sulfates       80 mg/l    (estimated: typical sulfites in wine)
Fluorures     1.5 mg/l    (estimated: mean fluorures in water)
Sodium         10 mg/l    (Jos et al., Talanta 63, 2004)
Nitrates        2 mg/l    (estimated: mean nitrates in spring water)

Changes of basis

[Figure, unreadable in this transcript: Champagne expressed in the factorial basis Ψ.]

SLIDE 39

Supplementary observations

Price

Champagne

– The price of Champagne is not related to its mineral content (surprise!)
– How do we know if the Champagne point is meaningful?

SLIDE 40

Squared cosines

Quality of the projection on a factorial axis

QLT_α(i) = cos²(a) = ψ²_αi / d²(i, G) = ψ²_αi / Σ_{α'=1}^p ψ²_α'i

Quality of the projection on a factorial plane
– Squared cosines add up:

QLT_αα'(i) = (ψ²_αi + ψ²_α'i) / d²(i, G) = QLT_α(i) + QLT_α'(i)

SLIDE 41

Squared cosines

[Table, unreadable in this transcript: the squared cosines of each spring.]

– Pyrénées, Quézac, St-Yorre, Vernière, and Vichy-Célestins are well represented in the first factorial plane.
– Champagne is far behind but not the worst.

SLIDE 42

Squared cosines

Champagne


SLIDE 43

How many axes?

For visualisation
– Always investigate at least the first three factorial axes.
– Plot multiple factorial planes (1×2, 2×3, 1×3, ...)
Heuristics
– Search for the "elbow" in the plot of decreasing eigenvalues.
– Discard axes with eigenvalue smaller than 1.
Stability
– Evaluated with bootstrap procedures.

SLIDE 44

How many axes?

Percentages of variance can be misleading
– Two examples with the same information but different noise levels: the original data, and the same data with 10 additional random columns.

SLIDE 45

II. Correspondence Analysis

SLIDE 46

Hair and eye colors

We know, for 592 English women,
– the color of their eyes,
– the color of their hair.

[Contingency table [x_ij], unreadable in this transcript.]

Marginal totals:

x_i• = Σ_{j=1}^p x_ij        x_•j = Σ_{i=1}^n x_ij        x_•• = Σ_{i=1}^n Σ_{j=1}^p x_ij

SLIDE 47

Row and column profiles

Rows and columns are best described by their histograms.

Row profiles

r_ij = x_ij / x_i•        m_i = x_i• / x_••        c_j = x_•j / x_••

Column profiles
– Column profile of column j: the histogram x_ij / x_•j, i = 1 ... n, with the same masses m_i and c_j.

SLIDE 48

Row and column profiles

Row profiles

r_ij = x_ij / x_i•        m_i = x_i• / x_••        c_j = x_•j / x_••

Remarks
– The "mass" m_i indicates the relative importance of each row.
– The "average row profile" is not the plain mean of the row profiles.
– The "average row profile" is the weighted mean of the row profiles:

(1/n) Σ_{i=1}^n r_ij ≠ c_j        Σ_{i=1}^n m_i r_ij = Σ_{i=1}^n (x_i• / x_••)(x_ij / x_i•) = c_j

– PCA on the row profiles?
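These definitions can be sketched on a small hypothetical contingency table; the weighted mean of the row profiles recovers the average row profile c_j, while the plain mean generally does not.

```python
import numpy as np

# Hypothetical contingency table of counts.
X = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0]])

x_i = X.sum(axis=1)        # row totals x_i.
x_tot = X.sum()            # grand total x_..

R = X / x_i[:, None]       # row profiles r_ij (each row is a histogram)
m = x_i / x_tot            # row masses m_i
c = X.sum(axis=0) / x_tot  # column masses c_j = average row profile
```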

SLIDE 49

Centering the columns

Subtracting the average row profile

r_ij − c_j = x_ij / x_i• − x_•j / x_••

– This is smarter than subtracting the column average: aggregating two identical rows does not change the result.
– This table is centered only if we take the masses into account. From now on we must always take the masses into account, for instance when computing covariances.

SLIDE 50

Rescaling the columns (1)

Standard deviation of the columns?
– The standard deviation of a column is a bad measure: the difference between 43% and 44% is a small difference, while the difference between 1% and 2% is a big difference.
The binomial argument
– The x_ij count events whose probability is roughly c_j.
– The standard deviation of the r_ij would then be √(c_j (1 − c_j)).
– The c_j are usually well below one because Σ_j c_j = 1.
– Conclusion: divide the columns by √c_j.
– This will all make more sense later...

SLIDE 51

Rescaling the columns (2)

Normalized row profiles

y_ij = (r_ij − c_j) / √c_j = (1/√c_j) (x_ij / x_i• − x_•j / x_••)

Euclidean distance between two normalized rows:

d²(i, i') = Σ_{j=1}^p (r_ij − r_i'j)² / c_j = Σ_{j=1}^p (x_•• / x_•j) (x_ij / x_i• − x_i'j / x_i'•)²

– This is called the χ² distance.
– This will all make more sense later...
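The χ² distance is a one-liner once the profiles and the column masses are in hand; the table below is hypothetical.

```python
import numpy as np

# Hypothetical contingency table.
X = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0],
              [ 5.0, 10.0, 85.0]])

R = X / X.sum(axis=1)[:, None]   # row profiles
c = X.sum(axis=0) / X.sum()      # column masses

def chi2_dist2(i, i2):
    """Squared chi-squared distance between the profiles of rows i and i2."""
    return float(np.sum((R[i] - R[i2])**2 / c))
```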

SLIDE 52

Principal component analysis

– Compute the covariance matrix (weighted by the masses).
– Diagonalize and project on the first two axes.
– This does not seem very useful...

SLIDE 53

Principal component analysis

– Compute the covariance matrix (weighted by the masses).
– Diagonalize and project on the first two axes.
– This does not seem very useful...

[Figure, unreadable in this transcript.]

SLIDE 54

Placing the columns in the row space

SLIDE 55

Placing the columns in the row space

SLIDE 56

Placing the columns in the row space

SLIDE 57

Placing the rows in the column space

– Same thing with the columns, including centering and rescaling.
– Same shapes, different scale...

SLIDE 58

Simultaneous representation

SLIDE 59

Summary

[Diagram, unreadable in this transcript: an n × p contingency table yields a cloud G_p of p column-points in R^n and a cloud G_n of n row-points in R^p; each cloud is centered and rescaled (quite differently!) before the simultaneous representation of rows and columns.]

SLIDE 60

Duality of the row and column analysis

Standard PCA
– Center and rescale the columns (mean and sdev).
– Diagonalize the row covariance of the normalized table.
– Diagonalize the column covariance of the same normalized table.
– Dual representations arise from the properties of diagonalization.
Correspondence Analysis
– Center and rescale the columns, diagonalize the row covariance.
– Center and rescale the rows, diagonalize the column covariance.
– Why do we get a dual representation?

SLIDE 61

Duality of the row and column analysis

Weighted covariance
– We diagonalize the weighted covariance matrix Σ = Y⊤ D_m Y, where Y is the normalized row profile matrix and D_m = diag(m_1 ... m_n).
– We can write Σ = Z⊤Z with Z = D_m^{1/2} Y.

Divergence matrix Z for the row analysis:

z_ij = √(m_i / c_j) (x_ij / x_i• − x_•j / x_••) = √(x_i• / x_•j) (x_ij / x_i• − x_•j / x_••) = (x_ij − x_i• x_•j / x_••) / √(x_i• x_•j) = (x_ij / x_•• − m_i c_j) / √(m_i c_j)

Divergence matrix Z for the column analysis:

z_ij = √(c_j / m_i) (x_ij / x_•j − x_i• / x_••) = √(x_•j / x_i•) (x_ij / x_•j − x_i• / x_••) = (x_ij − x_i• x_•j / x_••) / √(x_i• x_•j) = (x_ij / x_•• − m_i c_j) / √(m_i c_j)

The dual representation exists because this is the same matrix.

SLIDE 62

The χ2 test of independence

– The real data table: x_ij.
– The theoretical data table assuming independence: x_•• × m_i c_j.
– The inertia

I = Σ_ij z²_ij = Σ_ij (1 / (m_i c_j)) (x_ij / x_•• − m_i c_j)²

measures how dependent the rows and columns are; up to the factor x_••, it is the χ² statistic of the independence test.
– Correspondence analysis finds the axes that best display this dependence!

SLIDE 63

The numerical recipe

Compute the divergence matrix:

z_ij = (x_ij / x_•• − m_i c_j) / √(m_i c_j)

Singular value decomposition:

Z = V D U⊤    with    D = diag(√λ_α)

Compute the factors:

ψ = D_m^{−1/2} V D        ϕ = D_c^{−1/2} U D

Transition relations:

ψ_αi = (1/√λ_α) Σ_{j=1}^p (x_ij / x_i•) ϕ_αj        ϕ_αj = (1/√λ_α) Σ_{i=1}^n (x_ij / x_•j) ψ_αi
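The whole recipe fits in a few NumPy lines; the contingency table below is hypothetical, and the factors are scaled exactly as in the formulas above.

```python
import numpy as np

# Hypothetical contingency table (e.g. hair color x eye color counts).
X = np.array([[20.0,  5.0,  2.0],
              [10.0, 30.0,  8.0],
              [ 3.0, 12.0, 40.0]])

P = X / X.sum()                  # x_ij / x_..
m = P.sum(axis=1)                # row masses m_i
c = P.sum(axis=0)                # column masses c_j

# Divergence matrix z_ij = (x_ij/x_.. - m_i c_j) / sqrt(m_i c_j).
Z = (P - np.outer(m, c)) / np.sqrt(np.outer(m, c))

# SVD: Z = V D U^T with D = diag(sqrt(lambda_alpha)).
V, d, Ut = np.linalg.svd(Z, full_matrices=False)

# Factors psi = D_m^{-1/2} V D and phi = D_c^{-1/2} U D.
psi = V * d / np.sqrt(m)[:, None]
phi = Ut.T * d / np.sqrt(c)[:, None]
```

The total inertia is Σ_α λ_α = Σ_ij z²_ij, and the transition relations hold on the non-trivial axes (the smallest singular value of Z is always zero).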

SLIDE 64

III. Multiple Correspondence Analysis

SLIDE 65

Polling n subjects with a questionnaire

[Figure, unreadable in this transcript: the questionnaire answers encoded as an indicator table, one row per subject and one 0/1 column per answer modality.]
SLIDE 66

Burt table

Multiple contingency table

SLIDE 67

Encoding the Burt table

[Figure, unreadable in this transcript: how the Burt table is built from the questionnaire blocks.]

SLIDE 68

Multiple correspondence analysis

SLIDE 69

Transition relations for MCA

ϕ_αj = (1/√λ_α) × (mean of the coordinates ψ_αi of the subjects who chose modality j)

SLIDE 70

Transition relations for MCA

ψ_αi = (1/√λ_α) × (mean of the coordinates ϕ_αj of the modalities selected by subject i)

SLIDE 71

Essential properties

1. A modality is the mean of the subjects that selected it, up to the √λ_α coefficient.
2. The weighted barycenter of the modalities of a question is the origin.

SLIDE 72

Inertia

Matrix X contains only zeroes and ones:

x_i• = Q (the number of questions),    x_•• = nQ,    m_i = 1/n,    c_j = x_•j / (nQ)

Inertia for modality j:

I_j = Σ_i (c_j / m_i) (x_ij / x_•j − m_i)² = (1/Q) (1 − x_•j / n).

The inertia increases when few subjects pick the modality.

Inertia for variable q:

I_q = Σ_{j∈q} I_j = (1/Q) Σ_{j∈q} (1 − x_•j / n) = (p_q − 1) / Q,

where p_q is the number of modalities for the question. The inertia increases when the variable has many modalities.

Total inertia:

I = Σ_q I_q = p/Q − 1.
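These inertia formulas can be checked on a tiny hypothetical disjunctive table (n = 6 subjects, Q = 2 questions with 3 and 2 modalities, so p = 5 columns):

```python
import numpy as np

# Hypothetical 0/1 disjunctive table: each row has exactly Q = 2 ones.
X = np.array([[1, 0, 0, 1, 0],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1]], dtype=float)

n, p = X.shape
Q = 2

P = X / X.sum()                  # x_ij / x_.. with x_.. = nQ
m = P.sum(axis=1)                # row masses, all equal to 1/n
c = P.sum(axis=0)                # column masses c_j = x_.j / (nQ)

# Divergence matrix as in correspondence analysis.
Z = (P - np.outer(m, c)) / np.sqrt(np.outer(m, c))

# Per-modality inertias I_j; their sum is the total inertia p/Q - 1.
I_j = (Z**2).sum(axis=0)
```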

SLIDE 73

Practical consequences

Tricks that improve the MCA results

The inertia of a modality increases when few subjects pick it.
⇒ Aggregate rare modalities, or make them supplementary.
⇒ Bin continuous variables using quantiles.

The inertia of a question increases when it has many possible answers.
⇒ Balance the number of modalities across variables.

SLIDE 74

Equivalences for the case of two variables

SLIDE 75

Supplementary elements

[Figure, unreadable in this transcript: the data table split into active elements and supplementary elements (rows of supplementary subjects; columns of supplementary categorical and continuous variables), all projected into the factorial plane (F1, F2).]
SLIDE 76

IV. Application example: Semiometrie

SLIDE 77

Semiometry

Introduced by J.-F. Steiner in the 70s, semiometry is the use of words to describe lifestyles and values. [Photo: Ludovic Lebart.]

SLIDE 78

Semiometry

The basic idea is to insert in a marketing questionnaire a series of questions consisting uniquely of words.

Questionnaires in 5 languages

FRENCH          ENGLISH       GERMAN             SPANISH        ITALIAN
l'absolu        absolute      absolut            el absoluto    l'assoluto
l'acharnement   persistence   hartnaeckig        el empeno      l'accanimento
acheter         to buy        kaufen             comprar        comprare
admirer         to admire     bewundern          admirar        ammirare
adorer          to love       anbeten            adorar         adorare
l'ambition      ambition      der ehrgeiz        la ambicion    l'ambizione
l'âme           soul          die seele          el alma        l'anima
l'amitié        friendship    die freundschaft   la amistad     l'amicizia
l'angoisse      anguish       die angst          la angustia    l'angoscia
un animal       animal        ein tier           un animal      un animale
un arbre        tree          ein baum           un arbol       un albero
l'argent        silver        das geld           el dinero      il denaro
une armure      armour        die ruestung       una armadura   un'armatura
l'art           art           die kunst          el arte        l'arte

SLIDE 79

Semiometry

The subjects must rate these words on a seven-level scale,
– from most disagreeable or unpleasant
– to most agreeable or pleasant.

[Facsimile of a questionnaire, not reproduced in this transcript.]

SLIDE 80

Principal component analysis

– The first axis is not interesting: it just orders words from "bad" to "good", and we already know which words are "bad" or "good".
– The next five axes are highly meaningful. They are robust across studies, across languages, and across countries.

SLIDE 81

Semiometric plane (2,3)

[Word plot, not reproduced in this transcript. Axis 2 opposes Duty and Pleasure; axis 3 opposes Attachment and Detachment.]

SLIDE 82

Semiometric plane (2,4)

[Word plot, not reproduced in this transcript. Axis 2 opposes Duty and Pleasure; axis 4 opposes Mind and Matter.]

SLIDE 83

Semiometric plane (2,5)

[Word plot, not reproduced in this transcript. Axis 2 opposes Duty and Pleasure; axis 5 opposes Heart and Reason.]

SLIDE 84

Semiometric plane (2,6)

[Word plot, not reproduced in this transcript. Axis 2 opposes Duty and Pleasure; axis 6 opposes Humility and Sovereignty.]

SLIDE 85

How is this useful?

Politics
Plot groups of voters as supplementary variables:
– those who say they vote for you;
– those who say they are undecided;
– those who say they'll never vote for you;
– those who say they vote for someone else...
Determine a target population of voters to convert. Read which keywords make them tick...
Your competitors are also using the same methods. Your competitors are tracking your moves. The semiotic space is a political chess board.

SLIDE 86

How is this useful?

Marketing
Plot groups of customers as supplementary variables:
– those who buy your product again and again;
– those who buy your product sometimes;
– those who buy your competitors' products;
– etc.
Determine a target population of customers to convince. Read how to advertise most effectively...
Your competitors are also using the same methods. Your competitors are tracking your moves. The semiotic space is a marketing chess board.
