SLIDE 1

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 16: Dimensionality Reduction Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

SLIDE 2

Midterm

  • A. Review session
  • B. Flipped classroom
  • C. Go over the example midterm
  • D. Clustering!

SLIDE 3

Learning objectives

  • Understand what unsupervised learning is for
  • Learn principal component analysis
  • Learn singular value decomposition

SLIDE 4

Supervised learning

  • Data: X
  • Labels: Y

Unsupervised learning

  • Data: X

SLIDE 5

Supervised learning

  • Data: X
  • Labels: Y

Unsupervised learning

  • Data: X
  • Latent structure: Z

SLIDE 6

When do we need unsupervised learning?

SLIDE 7

When do we need unsupervised learning?

  • Acquiring labels is expensive
  • You may not even know what labels to acquire

SLIDE 8

When do we need unsupervised learning?

  • Exploratory data analysis
  • Learn patterns/representations that can be useful for supervised learning (representation learning)
  • Generate data

SLIDE 9

When do we need unsupervised learning?


https://qz.com/1090267/artificial-intelligence-can-now-show-you-how-those-pants-will-fit/

SLIDE 10

Unsupervised learning


  • Dimensionality reduction
  • Clustering
  • Topic modeling
SLIDE 12

Principal Component Analysis - Motivation

SLIDE 13

Principal Component Analysis - Motivation

The data's features are almost certainly correlated.

SLIDE 14

Principal Component Analysis - Motivation

Correlated features make it hard to see the hidden structure.

SLIDE 15

Principal Component Analysis - Motivation

To make this easier, let's try to reduce the data to one dimension.

SLIDE 16

Principal Component Analysis - Motivation

We need to shift our perspective:

  • Change the definition of up-down-left-right
  • Choose new features as linear combinations of old features
  • This is a change of feature basis

SLIDE 17

Principal Component Analysis - Motivation

We need to shift our perspective:

  • Change the definition of up-down-left-right
  • Choose new features as linear combinations of old features
  • This is a change of feature basis

Important: center and normalize the data before performing PCA. We will assume this has already been done in this lecture.
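As a concrete illustration of this preprocessing step, here is a minimal numpy sketch (the data values are made up) of centering and normalizing before PCA:

```python
import numpy as np

# Toy data matrix: rows are samples, columns are features (made-up values).
X = np.array([[170.0, 65.0],
              [180.0, 80.0],
              [160.0, 55.0],
              [175.0, 72.0]])

# Center each feature at zero mean, then scale to unit variance so that
# no feature dominates the principal components by scale alone.
X_centered = X - X.mean(axis=0)
X_standardized = X_centered / X_centered.std(axis=0)

print(X_standardized.mean(axis=0))  # ~[0, 0]
print(X_standardized.std(axis=0))   # [1, 1]
```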

SLIDE 18

Principal Component Analysis - Motivation

Proceed incrementally:

  • If we could choose only one combination to describe the data, which combination leads to the least loss of information?
  • Once we've found that one, look for another one, perpendicular to the first, that retains the next most information
  • Repeat until done (or good enough)
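The incremental procedure above can be sketched numerically. This hypothetical example uses power iteration plus deflation (one of several ways to find the directions; the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2D data (synthetic), centered so PCA applies directly.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])
X = X - X.mean(axis=0)
C = X.T @ X / len(X)  # covariance matrix of the centered data

def top_direction(C, iters=500):
    """Power iteration: returns the unit vector w maximizing w^T C w."""
    w = np.ones(C.shape[0])
    for _ in range(iters):
        w = C @ w
        w /= np.linalg.norm(w)
    return w

w1 = top_direction(C)                     # first combination: least information loss
lam1 = w1 @ C @ w1                        # variance it captures
C_deflated = C - lam1 * np.outer(w1, w1)  # remove what w1 already explains
w2 = top_direction(C_deflated)            # next direction, perpendicular to w1

print(abs(w1 @ w2))  # ~0: perpendicular, as required
```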
SLIDE 32

Principal Component Analysis - Motivation

The best vector to project onto is called the 1st principal component. What properties should it have?

SLIDE 33

Principal Component Analysis - Motivation

The best vector to project onto is called the 1st principal component. What properties should it have?

  • Should capture largest variance in data
  • Should probably be a unit vector
SLIDE 34

Principal Component Analysis - Motivation

The best vector to project onto is called the 1st principal component. What properties should it have?

  • Should capture largest variance in data
  • Should probably be a unit vector

After we've found the first, look for the second, which:

  • Captures the largest amount of the remaining variance
  • Should probably be a unit vector
  • Should be orthogonal to the one that came before it
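These properties can be checked numerically. A small sketch on synthetic data, taking the principal components to be the eigenvectors of the covariance matrix:

```python
import np  # placeholder comment removed below
```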
SLIDE 37

Principal Component Analysis - Motivation

Main idea: the principal components give a new perpendicular coordinate system for viewing the data, where each principal component describes successively less information.

SLIDE 38

Principal Component Analysis - Motivation

Main idea: the principal components give a new perpendicular coordinate system for viewing the data, where each principal component describes successively less information. So far, all we've done is a change of basis on the feature space. But when do we reduce the dimension?

SLIDE 39

Principal Component Analysis - Motivation

But when do we reduce the dimension? Picture data points in a 3D feature space. What if the points lay mostly along a single vector?

SLIDE 40

Principal Component Analysis - Motivation

The other two principal components are still there, but they do not carry much information.

SLIDE 41

Principal Component Analysis - Motivation

The other two principal components are still there, but they do not carry much information. Throw them away and work with the low-dimensional representation! This reduces 3D data to 1D.
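A minimal numpy sketch of this 3D-to-1D reduction on synthetic data (the direction and noise levels are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 3D points lying mostly along one direction, plus small noise.
direction = np.array([1.0, 2.0, -1.0])
direction /= np.linalg.norm(direction)
t = rng.normal(scale=5.0, size=(300, 1))
X = t * direction + rng.normal(scale=0.1, size=(300, 3))
X = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
w = eigvecs[:, -1]        # 1st principal component (largest eigenvalue is last)

z = X @ w                 # the 1D representation: one number per point
X_hat = np.outer(z, w)    # reconstruction back in 3D

explained = eigvals[-1] / eigvals.sum()
print(explained > 0.99)   # nearly all variance survives the 3D -> 1D reduction
```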

SLIDE 46

Principal Component Analysis – The How


But how do we find w?
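One way to see the answer numerically: a brute-force search over unit vectors for the direction maximizing projected variance agrees with the top eigenvector of the covariance matrix. A hypothetical sketch on synthetic 2D data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2)) @ np.array([[2.0, 0.0], [0.8, 0.6]])
X = X - X.mean(axis=0)
C = np.cov(X, rowvar=False)

# Brute force: scan unit vectors w = (cos t, sin t) for the one
# maximizing the projected variance w^T C w.
thetas = np.linspace(0, np.pi, 10_000)
ws = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
variances = np.einsum('ij,jk,ik->i', ws, C, ws)
w_best = ws[np.argmax(variances)]

# Closed form: the top eigenvector of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)
w_eig = eigvecs[:, -1]

print(abs(w_best @ w_eig))  # ~1: the two directions agree (up to sign)
```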

SLIDE 60

PCA – Dimensionality reduction


Questions:

  • How do we reduce dimensionality?
  • How many components should we keep?
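A common heuristic (one of several) is to keep the smallest number of components whose cumulative explained variance clears a threshold. A sketch on synthetic data whose variance is concentrated in a few directions:

```python
import numpy as np

rng = np.random.default_rng(4)
# 10-dimensional data; per-feature scales are made up for illustration.
scales = np.array([5.0, 3.0, 2.0, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05])
X = rng.normal(size=(1000, 10)) * scales
X = X - X.mean(axis=0)

eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
ratio = eigvals / eigvals.sum()          # explained variance per component
cumulative = np.cumsum(ratio)

# Keep the smallest k whose components explain, say, 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95) + 1)
print(k, round(cumulative[k - 1], 3))    # smallest k reaching the threshold
```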
SLIDE 63

Quiz

SLIDE 64

PCA - applications

SLIDE 70

Connecting PCA and SVD

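The connection can be verified numerically: for a centered data matrix X = U S V^T, the right singular vectors are the principal components, and the squared singular values divided by n - 1 are the covariance eigenvalues. A minimal sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
X = X - X.mean(axis=0)

# Route 1: eigenvectors of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc_eig = eigvecs[:, -1]

# Route 2: SVD of the centered data matrix, X = U S V^T.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pc_svd = Vt[0]          # right singular vector for the largest singular value

# Same direction (up to sign), and S^2/(n-1) reproduces the eigenvalues.
print(abs(pc_eig @ pc_svd))  # ~1
print(np.allclose(S**2 / (len(X) - 1), np.sort(eigvals)[::-1]))
```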

SLIDE 71

SVD Applications

SLIDE 72

Wrap up


Dimensionality reduction can be a useful way to

  • explore data
  • visualize data
  • represent data