Overview DS GA 1002 Statistical and Mathematical Models



SLIDE 1

Overview

DS GA 1002 Statistical and Mathematical Models

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall16 Carlos Fernandez-Granda

SLIDE 2

Probability and statistics

◮ Probability: Framework for dealing with uncertainty

◮ Statistics: Framework for extracting information from data by making probabilistic assumptions

SLIDE 3

Probability

◮ Probability basics: Probability spaces, conditional probability, independence

◮ Random variables: continuous/discrete, important distributions, generation of random variables

◮ Random vectors, sequences and processes: continuous/discrete, Markov chains, graphical models

SLIDE 4

Probability

◮ Expectation: definition, Markov and Chebyshev inequalities, conditional expectation, correlation, covariance matrices, autocovariance function

◮ Convergence of random sequences: convergence in probability/distribution, law of large numbers, central limit theorem

◮ Estimation: mean square error, maximum likelihood, maximum a posteriori

SLIDE 5

Statistics

Parametric vs nonparametric techniques applied to

◮ Learning statistical models

◮ Confidence intervals

◮ Hypothesis testing

SLIDE 6

Statistics

◮ Bayesian statistics

◮ Regression

◮ Supplementary topics: analysis of variance, experimental design, random projections, robust statistics

SLIDE 7

Why should I take this course?

SLIDE 8

To understand probabilistic models

SLIDE 9

United States presidential election

◮ Indirect election: citizens of the US cast ballots for electors in the Electoral College

◮ These electors vote for the President and Vice President

◮ Number of electors per state = number of its members of Congress (Washington D.C. gets 3)

◮ Except in Maine and Nebraska, all electors in a state go to the candidate who wins the state
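The elector count described above can be checked with a few lines of arithmetic. This is a minimal sketch: the seat totals (435 House seats, 100 Senate seats) are standard figures not stated on the slide, and the names `electors_for_state`, `total_electors` and `majority` are illustrative.

```python
# Seat totals are standard figures, not taken from the slide.
HOUSE_SEATS = 435    # voting members of the House of Representatives
SENATE_SEATS = 100   # two senators for each of the 50 states
DC_ELECTORS = 3      # Washington D.C., as stated on the slide


def electors_for_state(house_seats: int) -> int:
    """Electors for one state: its House seats plus its two senators."""
    return house_seats + 2


total_electors = HOUSE_SEATS + SENATE_SEATS + DC_ELECTORS
majority = total_electors // 2 + 1  # smallest number of electors that wins

print(total_electors)  # 538
print(majority)        # 270
```

The total of 538 electors is where the fivethirtyeight.com site gets its name.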

SLIDE 10

538 probabilistic model (from fivethirtyeight.com)

Aim: Predict the election result using poll data

Probabilistic models make it possible to take into account that

◮ Polls have different sample sizes

◮ Some pollsters are unreliable

◮ In some states there may be few polls (especially at the start of the campaign)

◮ Historic trends in each state are important

◮ Polls from states with similar demographics are correlated

◮ Additional information (approval ratings, contributions, party identification, ...) can be useful

In addition, probabilistic models quantify the uncertainty of the prediction
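To make two of the points above concrete (different sample sizes, and quantifying uncertainty), here is a minimal sketch for a single state. It is not the FiveThirtyEight model: the poll numbers are hypothetical, the polls are simply pooled by sample size, and the uncertainty comes from a Gaussian approximation to the binomial. The names `pooled_estimate` and `win_probability` are illustrative.

```python
import math
import random

# Hypothetical polls for one state: (share for candidate A, sample size).
polls = [(0.52, 800), (0.49, 400), (0.51, 1200)]


def pooled_estimate(polls):
    """Sample-size-weighted average share and its binomial standard error."""
    total_n = sum(n for _, n in polls)
    share = sum(p * n for p, n in polls) / total_n
    std_err = math.sqrt(share * (1 - share) / total_n)
    return share, std_err


def win_probability(share, std_err, n_sims=20000, seed=0):
    """Monte Carlo estimate of P(true share > 0.5) under a Gaussian approximation."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(share, std_err) > 0.5 for _ in range(n_sims))
    return wins / n_sims


share, std_err = pooled_estimate(polls)
print(f"pooled share: {share:.3f} +/- {std_err:.3f}")
print(f"estimated win probability: {win_probability(share, std_err):.2f}")
```

Larger polls get more weight in the pooled share, and the output is a probability of winning, not a single yes/no prediction. Pollster reliability, state correlations and historic trends would each need additional modeling that this sketch leaves out.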

SLIDE 11

538 probabilistic model (from fivethirtyeight.com)

SLIDE 12

To understand statistical methodology

SLIDE 13

Polio vaccine

◮ Poliomyelitis is an infectious disease that induces paralysis and can be lethal

◮ It has almost been eradicated by vaccination (98 cases in 2015, down from 350 000 in 1988)

◮ The first vaccine was developed in 1952 by Jonas Salk and collaborators

◮ Two experiments were carried out to evaluate whether the vaccine was effective

SLIDE 14

Polio vaccine

◮ Experiment 1: Students in 2nd grade with consent of their parents were vaccinated. Students in 1st and 3rd grade were not.

◮ Experiment 2: A group of children, whose parents consented, was randomly divided in half to form the treatment and control groups.

Experiment 1
              Size      Rate
Treatment     225 000   25
Control       725 000   54
No consent    125 000   44

Experiment 2
              Size      Rate
Treatment     200 000   28
Control       200 000   71
No consent    350 000   46
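The two experiments above can be compared numerically. The sketch below computes the control-to-treatment rate ratio and a pooled two-proportion z-statistic for each experiment. It assumes that "Rate" means polio cases per 100 000 children, which the slide does not state. The z-test is only one reasonable choice of analysis, and the function name `two_proportion_z` is illustrative.

```python
import math

# (group size, rate) pairs copied from the slide.
# Assumption: "Rate" is the number of polio cases per 100 000 children.
experiment_1 = {"treatment": (225_000, 25), "control": (725_000, 54), "no consent": (125_000, 44)}
experiment_2 = {"treatment": (200_000, 28), "control": (200_000, 71), "no consent": (350_000, 46)}


def two_proportion_z(group_a, group_b, per=100_000):
    """Pooled z-statistic for the difference in rates (group_b minus group_a)."""
    (n_a, rate_a), (n_b, rate_b) = group_a, group_b
    p_a, p_b = rate_a / per, rate_b / per
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err


for name, exp in [("Experiment 1", experiment_1), ("Experiment 2", experiment_2)]:
    ratio = exp["control"][1] / exp["treatment"][1]
    z = two_proportion_z(exp["treatment"], exp["control"])
    print(f"{name}: control/treatment rate ratio = {ratio:.2f}, z = {z:.1f}")
```

Under these assumptions the gap between control and treatment is larger in the randomized Experiment 2. One plausible reading is that the Experiment 1 control group mixes children whose parents would and would not have consented, so treatment and control are not directly comparable there.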

SLIDE 15

To understand machine-learning algorithms

SLIDE 16

Quadratic discriminant analysis

Labeled data

SLIDE 17

Quadratic discriminant analysis

Aim: Classify unlabeled examples

SLIDE 18

Quadratic discriminant analysis

Quadratic discriminant analysis fits a Gaussian distribution to each class
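The idea of fitting one Gaussian per class can be sketched in pure Python for two-dimensional data. This is a minimal illustration under stated assumptions: the labeled points are hypothetical (not the data on the slides), class priors are taken to be the class frequencies, and covariances are maximum-likelihood estimates. The function names are illustrative.

```python
import math


def fit_gaussian(points):
    """Maximum-likelihood mean and 2x2 covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)


def log_density(point, mean, cov):
    """Log of the 2D Gaussian density with the given mean and covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via the closed-form 2x2 inverse.
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2 * math.pi)


def qda_posteriors(point, classes):
    """Posterior probability of each class, with priors equal to class frequencies."""
    total = sum(len(pts) for pts in classes.values())
    log_scores = {}
    for label, pts in classes.items():
        mean, cov = fit_gaussian(pts)
        log_scores[label] = math.log(len(pts) / total) + log_density(point, mean, cov)
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}


# Hypothetical labeled data, not the points shown on the slides.
labeled = {
    "red": [(1.0, 1.2), (1.5, 0.8), (0.8, 1.9), (1.3, 1.4), (2.0, 1.0)],
    "blue": [(4.0, 4.5), (5.2, 3.8), (4.6, 5.5), (3.9, 3.6), (5.0, 4.9)],
}
print(qda_posteriors((1.2, 1.1), labeled))
print(qda_posteriors((4.5, 4.4), labeled))
```

Because each class has its own covariance matrix, the boundary between classes is a quadratic curve, which gives the method its name. The returned posteriors are the kind of per-class percentage that a classifier can report for each unlabeled example.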

SLIDE 19

Quadratic discriminant analysis

Results: red (99.9 %), blue (55.8 %), blue (97.2 %)