SLIDE 1
Overview
DS GA 1002 Probability and Statistics for Data Science
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17
Carlos Fernandez-Granda
SLIDE 2
Probability and statistics
Probability: Framework for dealing with uncertainty
Statistics:
SLIDE 3
Probability
◮ Probability basics: Probability spaces, conditional probability,
independence
◮ Random variables: continuous/discrete, important distributions,
generating random variables (rejection sampling)
◮ Multivariate random variables: random vectors, continuous/discrete,
independence (conditional independence, graphical models), generating multivariate random variables
SLIDE 4
Probability
◮ Expectation: expectation operator, mean, variance, Markov and
Chebyshev inequalities, covariance, covariance matrices, conditional expectation
◮ Random processes: Definition, mean, autocovariance, important
processes (iid sequences, Gaussian, Poisson, random walk)
◮ Convergence of random sequences: Types of convergence (in
probability/distribution), law of large numbers, central limit theorem, Monte Carlo simulation
◮ Markov chains: Definition, recurrence, periodicity, convergence,
Markov chain Monte Carlo (Metropolis-Hastings)
SLIDE 5
Statistics
◮ Descriptive statistics: Histogram, empirical mean/variance, order
statistics, empirical covariance, empirical covariance matrix (principal component analysis)
◮ Frequentist statistics: iid sampling, mean square error, consistency,
nonparametric model estimation (kernel density estimation), parametric model estimation (method of moments, maximum likelihood)
SLIDE 6
Statistics
◮ Bayesian statistics: Bayesian parametric models, conjugate priors,
Bayesian estimators (minimum MSE estimator, maximum a posteriori)
◮ Hypothesis testing: Hypothesis-testing framework, parametric testing,
nonparametric testing (permutation test), multiple testing
◮ Linear regression: Linear models, least-squares estimation, overfitting
SLIDE 7
Why should I take this course?
SLIDE 8
To understand probabilistic models
SLIDE 9
United States presidential election
◮ Indirect election: citizens of the US cast ballots for electors
in the Electoral College
◮ These electors vote for the President and Vice President
◮ Number of electors per state = number of its members of Congress
(Washington D.C. gets 3)
◮ Except in Maine and Nebraska, all electors in a state go to the
candidate who wins the state
SLIDE 10
538 probabilistic model (from fivethirtyeight.com)
Aim: Predict the election result using poll data
Probabilistic models make it possible to take into account that
◮ Polls have different sample sizes
◮ Some pollsters are unreliable
◮ In some states there may be few polls (especially at the start of the
campaign)
◮ Historic trends in each state are important
◮ Polls from states with similar demographics are correlated
◮ Additional information (approval ratings, contributions, party
identification, ...) can be useful
In addition, probabilistic models quantify the uncertainty of the prediction
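A minimal sketch of two of the points above: polls with different sample sizes, and quantifying the uncertainty of a prediction. This is not the FiveThirtyEight model. The poll numbers are invented, the function names pooled_estimate and win_probability are hypothetical, and the polls are assumed to be independent simple random samples from the same population.

```python
import math
from statistics import NormalDist

# Hypothetical polls for one state: (share for candidate A, sample size).
# These numbers are invented for illustration only.
polls = [(0.52, 400), (0.49, 1200), (0.51, 800)]

def pooled_estimate(polls):
    """Pool polls by weighting each one by its sample size."""
    total_n = sum(n for _, n in polls)
    share = sum(p * n for p, n in polls) / total_n
    # Standard error of a proportion estimated from total_n responses,
    # assuming independent simple random samples (an assumption).
    std_err = math.sqrt(share * (1 - share) / total_n)
    return share, std_err

def win_probability(share, std_err):
    """P(true share > 50%) under a Gaussian approximation."""
    return 1 - NormalDist(mu=share, sigma=std_err).cdf(0.5)

share, std_err = pooled_estimate(polls)
print(f"pooled share: {share:.3f} +/- {std_err:.3f}")
print(f"P(A wins the state): {win_probability(share, std_err):.2f}")
```

Larger polls pull the pooled share toward their own result, and the standard error turns the point estimate into a probability rather than a yes/no call. Handling unreliable pollsters or correlated states would need a richer model than this sketch.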
SLIDE 11
538 probabilistic model (from fivethirtyeight.com)
SLIDE 12
To understand statistical methodology
SLIDE 13
Polio vaccine
◮ Poliomyelitis is an infectious disease, which induces paralysis and can
be lethal
◮ It has almost been eradicated by vaccination (98 cases in 2015, down
from 350 000 in 1988)
◮ The first vaccine was developed in 1952 by Jonas Salk and
collaborators
◮ Two experiments were carried out to evaluate whether the vaccine was
effective
SLIDE 14
Polio vaccine
◮ Experiment 1: Students in 2nd grade with consent of their parents
were vaccinated. Students in 1st and 3rd grade were not.
◮ Experiment 2: A group of children, whose parents consented, was
randomly divided in half to form the treatment and control groups.

Experiment 1
             Size      Rate
Treatment    225 000   25
Control      725 000   54
No consent   125 000   44

Experiment 2
             Size      Rate
Treatment    200 000   28
Control      200 000   71
No consent   350 000   46
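The figures above can be compared with a few lines of arithmetic. The sketch below only divides the reported rates. The dictionary layout and the function name rate_ratio are illustrative choices, and the rates are used exactly as given, without assuming a unit.

```python
# Sizes and rates copied from the two experiments above: (size, rate).
experiments = {
    "Experiment 1": {
        "Treatment": (225_000, 25),
        "Control": (725_000, 54),
        "No consent": (125_000, 44),
    },
    "Experiment 2": {
        "Treatment": (200_000, 28),
        "Control": (200_000, 71),
        "No consent": (350_000, 46),
    },
}

def rate_ratio(groups, a="Treatment", b="Control"):
    """Ratio of the reported rate in group a to the rate in group b."""
    return groups[a][1] / groups[b][1]

for name, groups in experiments.items():
    treated = rate_ratio(groups)
    no_consent = rate_ratio(groups, a="No consent")
    print(f"{name}: treatment/control = {treated:.2f}, "
          f"no consent/control = {no_consent:.2f}")
```

In Experiment 2 the no-consent group has a lower rate than the randomized control group (46 versus 71), which suggests that consent is itself associated with the rate. The control group in Experiment 1 was not selected by consent, so its treatment and control groups are not directly comparable.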
SLIDE 15
To understand machine-learning algorithms
SLIDE 16
Quadratic discriminant analysis
Labeled data
SLIDE 17
Quadratic discriminant analysis
Aim: Classify unlabeled examples
SLIDE 18
Quadratic discriminant analysis
Quadratic discriminant analysis fits a Gaussian distribution to each class
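A minimal sketch of the idea, assuming a single scalar feature per example for brevity (with vector features each class would also get its own covariance matrix). The training data are invented, and the names fit_qda, log_density and classify are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical labeled data: one scalar feature per example.
labeled = {
    "a": [1.0, 1.2, 0.8, 1.1, 0.9],
    "b": [3.0, 2.5, 3.5, 4.0, 2.0],
}

def fit_qda(labeled):
    """Fit one Gaussian per class and record the empirical class prior."""
    total = sum(len(xs) for xs in labeled.values())
    return {
        label: (NormalDist(mean(xs), stdev(xs)), len(xs) / total)
        for label, xs in labeled.items()
    }

def log_density(dist, x):
    """Log of the Gaussian density, written out to avoid log(0) far from the mean."""
    var = dist.variance
    return -0.5 * math.log(2 * math.pi * var) - (x - dist.mean) ** 2 / (2 * var)

def classify(model, x):
    """Return the class maximizing log prior + Gaussian log density."""
    return max(
        model,
        key=lambda label: math.log(model[label][1]) + log_density(model[label][0], x),
    )

model = fit_qda(labeled)
for x in (1.05, 3.2, -1.0):
    print(x, "->", classify(model, x))
```

Because each class has its own variance, the decision rule is quadratic in x. In this toy data a point far below the mean of class "a" is assigned to class "b", whose fitted Gaussian is wider.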
SLIDE 19