Welcome and Syllabus STAT 432 | UIUC | Fall 2019 | Dalpiaz - - PowerPoint PPT Presentation

SLIDE 1

Welcome and Syllabus

STAT 432 | UIUC | Fall 2019 | Dalpiaz

SLIDE 2

Questions?

Comments? Concerns?

SLIDE 3

STAT 432

Basics of Statistical Learning

Also ASRM 451….

SLIDE 4

stat432.org

SLIDE 5

DAVE

SLIDE 6

dalpiaz2@illinois.edu

David Dalpiaz

Room 36, 703 S. Wright

SLIDE 7

  • David Dalpiaz, Instructor
  • Mengchen Wang, Teaching Assistant
  • Zihe Liu, Teaching Assistant

SLIDE 8

Course Logistics

SLIDE 9

Prerequisites?

SLIDE 10

Course Description

Topics in supervised and unsupervised learning are covered, including logistic regression, support vector machines, classification trees and nonparametric regression. Model building and feature selection are discussed for these techniques, with a focus on regularization methods, such as lasso and ridge regression, as well as methods for model selection and assessment using cross validation. Cluster analysis and principal components analysis are introduced as examples of unsupervised learning.

SLIDE 11

Course Description

Machine learning from the perspective of a statistician who uses R.

SLIDE 12

Learning Objectives

After this course, students should be able to …

  • identify supervised (regression and classification) and unsupervised (clustering) learning problems.
  • understand the fundamental theory behind statistical learning methods.
  • implement learning methods using a statistical computing environment.
  • formulate practical, real-world problems as statistical learning problems.
  • evaluate effectiveness of learning methods when used as a tool for data analysis.
SLIDE 13

Basics of Statistical Learning

SLIDE 14

Course Format

  • Three lectures per week. (Unimportant?)
  • Sometimes slides, sometimes board notes, sometimes computing.
  • (Important!) Things you will do:
      • (Practice) Quizzes on PrairieLearn
      • Exams at the CBTF
      • Data Analyses
      • Projects
SLIDE 15

Assessment                 Percentage
PrairieLearn Quizzes       20
CBTF Exam I                10
CBTF Exam II               10
CBTF Exam III              20
Practice Data Analyses     10
Data Analyses              10
Group Final Project        15
Graduate Project           5

SLIDE 16

A+    A     A-    B+    B     B-    C+    C     C-    D+    D     D-
TBD   93%   90%   87%   83%   80%   77%   73%   70%   67%   63%   60%

SLIDE 17

Computing Resources

SLIDE 18

PL and CBTF

SLIDE 19

Additional Class Technology

SLIDE 20
  • Use @illinois.edu email
  • Begin subject with [STAT 432]
  • Get to the point!
  • Probably just use Piazza…
SLIDE 21

Office Hours

Wednesday 4:00 - 7:00

SLIDE 22

“I don't know who you are. I don't know what you want. If you're looking for ransom, I can tell you I don't have money... but what I do have are a very particular set of skills. Skills I have acquired over a very long career. Skills that make me a nightmare for people like you…”

SLIDE 23

Not registered?

SLIDE 24

“I am altering the deal, pray I don’t alter it any further.”

SLIDE 25

Questions?

Comments? Concerns?

SLIDE 26

ML in 5 Minutes

SLIDE 27

Supervised Learning

Classification

SLIDE 28

Let’s train you to be a classifier…

SLIDE 29

This is a Snorlax.

SLIDE 30

This is a Pikachu.

SLIDE 31

This is a Raichu.

SLIDE 32

This is a Snorlax.

SLIDE 33

This is a Raichu.

SLIDE 34

This is a Pikachu.

SLIDE 35

Now that you are a classifier, let’s make some predictions…

SLIDE 36

What Pokémon is this?

SLIDE 37

What Pokémon is this?

SLIDE 38

What Pokémon is this?

SLIDE 39

What might the “data” look like?

Class (y)   Color (x1)   Height (x2)   Weight (x3)   Type (x4)
Pikachu     Yellow       0.4 m         6.0 kg        Electric
Snorlax     Blue         2.1 m         460.0 kg      Normal
Raichu      Orange       0.8 m         30.0 kg       Electric
…           …            …             …             …
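
One way to hold rows like these in code is sketched below. It is written in Python purely for illustration (the course itself uses R), and the class name `Pokemon`, its field names, and the helper `split_xy` are hypothetical, not something taken from the slides. Only the three rows shown in the table are included; the elided rows stay elided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pokemon:
    """One observation: a class label (y) and four features (x1..x4)."""
    label: str        # Class (y)
    color: str        # Color (x1)
    height_m: float   # Height (x2), in meters
    weight_kg: float  # Weight (x3), in kilograms
    type_: str        # Type (x4)


# The three rows shown in the table; the remaining rows are not given there.
DATA = [
    Pokemon("Pikachu", "Yellow", 0.4, 6.0, "Electric"),
    Pokemon("Snorlax", "Blue", 2.1, 460.0, "Normal"),
    Pokemon("Raichu", "Orange", 0.8, 30.0, "Electric"),
]


def split_xy(rows):
    """Separate the response y from the predictors X."""
    y = [r.label for r in rows]
    X = [(r.color, r.height_m, r.weight_kg, r.type_) for r in rows]
    return X, y


X, y = split_xy(DATA)
print(y)
print(X[0])
```

Each row is one observation, the first column is the response, and the remaining columns are predictors, some numeric and some categorical.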
SLIDE 40

A non-exhaustive list of questions…

  • How would you go from an image to a data frame?
  • Which predictors should we use in our model?
  • How do we model the response as a function of the predictors?
  • How do we use our model to make predictions?
  • How do we know if our model is working well?
  • Who cares?
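
As a rough illustration of the modeling, prediction, and evaluation questions above, here is a minimal sketch of a one-nearest-neighbor classifier. The technique is chosen only because it is short; the slides do not name a method. It is written in Python rather than the course's R, it uses only the height and weight columns from the data table, and the "new" observations are made-up numbers for the sake of the example.

```python
import math

# Training data: (height in m, weight in kg) -> class, taken from the data table.
TRAIN = [
    ((0.4, 6.0), "Pikachu"),
    ((2.1, 460.0), "Snorlax"),
    ((0.8, 30.0), "Raichu"),
]


def predict_1nn(x, train=TRAIN):
    """Return the class of the training observation closest to x (Euclidean distance)."""
    _, nearest_label = min(train, key=lambda row: math.dist(row[0], x))
    return nearest_label


def accuracy(predictions, truth):
    """Proportion of predictions that match the true classes."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Hypothetical new observations (made-up measurements, not from the slides).
new_x = [(0.5, 7.0), (1.9, 400.0), (0.7, 28.0)]
new_y = ["Pikachu", "Snorlax", "Raichu"]

preds = [predict_1nn(x) for x in new_x]
print(preds)
print(accuracy(preds, new_y))
```

Because weight is on a much larger scale than height, it dominates the distance here. Whether and how to rescale predictors is part of the "which predictors" question.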
SLIDE 41

Supervised Learning

Regression

SLIDE 42

It’s pretty much the same as classification except you’re predicting a number instead of a category.
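
To make "predicting a number" concrete, the sketch below fits a straight line by ordinary least squares, predicting weight from height for the three Pokémon in the earlier data table. It is an illustration in Python (not the course's R), simple linear regression is an assumed choice of method, and three observations are far too few for a serious model.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


heights = [0.4, 2.1, 0.8]     # meters (Pikachu, Snorlax, Raichu)
weights = [6.0, 460.0, 30.0]  # kilograms

intercept, slope = fit_line(heights, weights)


def predict_weight(height_m):
    """Predicted weight in kg: a number, not a category."""
    return intercept + slope * height_m


print(predict_weight(1.0))
```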

SLIDE 43

Unsupervised Learning

Clustering

SLIDE 44

Can you “group” these Pokémon?

SLIDE 45

Maybe like this?

SLIDE 46

How about like this?

SLIDE 47

Why not like this?

SLIDE 48

A non-exhaustive list of questions…

  • How do you measure the similarity between observations?
  • How many groups should there be?
  • How do you assign observations to groups?
  • Who cares?
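
One possible set of answers to the questions above is sketched here: Euclidean distance for similarity, a number of groups fixed in advance, and assignment to the nearest group center, iterated in the style of k-means. This is an assumed choice for illustration, written in Python rather than the course's R. The first, second, and fourth points come from the earlier data table, the other two are made up, and the starting centers are picked by hand.

```python
import math


def assign(points, centers):
    """Assign each point to the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of its points (unchanged if it has none)."""
    new_centers = []
    for j, center in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
        else:
            new_centers.append(center)
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Alternate the update and assignment steps until the labels stop changing."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# (height in m, weight in kg); the third and fifth points are hypothetical.
points = [(0.4, 6.0), (0.8, 30.0), (0.5, 8.0), (2.1, 460.0), (1.9, 400.0)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)
print(centers)
```

A different distance, a different number of groups, or different starting centers can give a different grouping.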
SLIDE 49

The Extended Syllabus

SLIDE 50

At the end of the course, I hope that students feel they are…

  • A better statistician.
  • A better programmer.
  • A better learner.
SLIDE 51

grade = f(prior knowledge, effort, luck)

SLIDE 52

“You must unlearn what you have learned.”

SLIDE 53

Things I sort of wish you didn’t know about:

  • R-Squared
  • Leverage
  • Cook’s Distance
  • Variance Inflation Factors
  • P-Values???
SLIDE 54

Things I would be happy to never see or talk about in this course.

  • MSE as a model metric. (Hint: use RMSE. MSE is appropriate in theoretical discussions.)
  • Removing outliers based on leverage or Cook’s distance.
  • Removing predictors to reduce variance inflation factors.
  • Calling a standard error a standard deviation or vice versa.
  • Model selection based on p-values of individual coefficients.
  • R-Squared.
  • Causality. (Unless you’re really sure you should. Hint: you shouldn’t.)
  • SAS. (Feel free to bug me about Python though…)
  • Mixing assignment operators. (Or poorly styled code in general.)
  • Using ASRM instead of STAT. (There are eight sections of this course because of this…)
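
On the first item in the list above, a small sketch of MSE and RMSE may help show the difference. RMSE is the square root of MSE, so it is in the same units as the response, which is one common reason to prefer it when reporting results. The code is Python for illustration (not the course's R), and the predicted values are made up.

```python
import math


def mse(actual, predicted):
    """Mean squared error: in squared units of the response."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: in the same units as the response."""
    return math.sqrt(mse(actual, predicted))


actual = [6.0, 460.0, 30.0]      # weights in kg
predicted = [10.0, 450.0, 25.0]  # hypothetical predictions, in kg

print(mse(actual, predicted))   # kg squared
print(rmse(actual, predicted))  # kg
```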
SLIDE 55

Facts versus Opinions

SLIDE 56

  • Data Science
  • Big Data
  • Deep Learning
  • Predictive Analytics
  • Artificial Intelligence
  • Machine Learning

SLIDE 57

“Won’t you be my neighbor?”

SLIDE 58
SLIDE 59

“There are known knowns…”

SLIDE 60

“Show up, don’t quit, ask questions.”

–Dan John

SLIDE 61

Student Health

  • Diet
  • Exercise
  • Sleep
SLIDE 62

Expectations?

SLIDE 63

Feedback?

SLIDE 64

Questions?

Comments? Concerns?

SLIDE 65

Homework

  • Bookmark the course website.
  • Read the full syllabus!!!
  • Read the extended syllabus.
  • Register for course on PrairieLearn.
  • Register for course on Piazza.
  • Register for the CBTF Syllabus Exam.
  • Register for course on RStudio Cloud?
  • We’ll walk through this next time.
SLIDE 66