Machine Learning (PowerPoint PPT Presentation)



SLIDE 1

Machine Learning

SLIDE 2

Machine Learning in a Nutshell

[Diagram: Data feeds the Machine Learner, which outputs a Model; a Performance Measure evaluates it]

SLIDE 3

Machine Learning in a Nutshell

Data with attributes:

ID  A1   Reflex     RefLow  RefHigh  Label
1   5.6  Normal     3.4     7        No
2   5.5  Normal     2.4     5.7      No
3   5.3  Normal     2.4     5.7      Yes
4   5.3  Elevated   2.4     5.7      No
5   6.3  Normal     3.4     7        No
6   3.3  Normal     2.4     5.7      Yes
7   5.1  Decreased  2.4     5.7      Yes
8   4.2  Normal     2.4     5.7      Yes
…   …    …          …       …        …

Each row is an instance x_i ∈ X with its label y_i ∈ Y.

[Diagram: Data feeds the Machine Learner, which outputs a Model; a Performance Measure evaluates it]

SLIDE 4

Machine Learning in a Nutshell

Model: a function f : X → Y learned from the data. Example model families:

• Logistic regression
• Support vector machines
• Hierarchical Bayesian networks
• Mixture models

[Figure: a graphical model over variables x1 … x5]

(The slide repeats the nutshell diagram and the attribute table of instances x_i ∈ X with labels y_i ∈ Y from Slide 3.)

SLIDE 5

Machine Learning in a Nutshell

Evaluation: measure predicted labels vs. actual labels on test data.

[Figure: a learning curve plotting Performance against # Training Examples]

(The slide repeats the model examples, nutshell diagram, and attribute table from Slides 3 and 4.)

SLIDE 6

A training set

SLIDE 7

ID3-induced decision tree

SLIDE 8

Model spaces

[Figure: three ways of carving up the instance space around the positive (+) examples: nearest neighbor, version space, decision tree]

SLIDE 9

Decision tree-induced partition – example

[Figure: a decision tree splitting on Color (red / green / blue), Shape (round / square), and Size (big / small), alongside the partition of the instance space it induces]

SLIDE 10

The Naïve Bayes Classifier

Some material adapted from slides by Tom Mitchell, CMU.

SLIDE 11

The Naïve Bayes Classifier

• Recall Bayes rule:

  P(Y_i | X_j) = P(X_j | Y_i) P(Y_i) / P(X_j)

• Which is short for:

  P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / P(X = x_j)

• We can re-write this as:

  P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / Σ_k P(X = x_j | Y = y_k) P(Y = y_k)
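As a quick numeric sanity check of the rule, the sketch below plugs in a made-up prior and likelihood (the 0.6/0.7 numbers are purely illustrative, not from the slides) and recovers the posterior via the sum in the denominator:

```python
# Hypothetical numbers for illustration only: a binary label Y and one
# fixed observation X = x.
p_y = {"yes": 0.6, "no": 0.4}          # prior P(Y = y_k)
p_x_given_y = {"yes": 0.7, "no": 0.2}  # likelihood P(X = x | Y = y_k)

# Denominator of Bayes rule: P(X = x) = sum_k P(X = x | Y = y_k) P(Y = y_k)
p_x = sum(p_x_given_y[y] * p_y[y] for y in p_y)

# Posterior P(Y = y | X = x) for each label value
posterior = {y: p_x_given_y[y] * p_y[y] / p_x for y in p_y}
print(posterior["yes"])  # 0.42 / 0.5 = 0.84
```

Note the posteriors sum to 1 by construction, since the denominator normalizes over all label values.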

SLIDE 12

Deriving Naïve Bayes

• Idea: use the training data to directly estimate P(Y) and P(X | Y).

• Then, we can use these values to estimate P(Y | X_new) using Bayes rule.

• Recall that representing the full joint probability P(X_1, X_2, …, X_n | Y) is not practical.

SLIDE 13

Deriving Naïve Bayes

• However, if we make the assumption that the attributes are independent, estimation is easy!

• In other words, we assume all attributes are conditionally independent given Y:

  P(X_1, …, X_n | Y) = Π_i P(X_i | Y)

• Often this assumption is violated in practice, but more on that later…
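One way to see what the assumption buys: for n binary attributes, the full conditional joint needs 2^n − 1 free parameters per class value, while the factored form needs only n. A minimal counting sketch (parameter counts only, no probabilities):

```python
# Free parameters per class value needed to specify P(X_1..X_n | Y)
# for n binary attributes: full joint vs. the naive factorization.
def full_joint_params(n):
    # one probability per joint configuration, minus 1 (they sum to 1)
    return 2 ** n - 1

def naive_params(n):
    # one Bernoulli parameter P(X_i = 1 | Y) per attribute
    return n

for n in (5, 10, 20):
    print(n, full_joint_params(n), naive_params(n))
```

Estimating a million-parameter joint from a handful of rows is hopeless; estimating 20 per-attribute parameters by counting is easy, which is the point of the factorization.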

SLIDE 14

Deriving Naïve Bayes

• Let X = ⟨X_1, …, X_n⟩ and let the label Y be discrete.

• Then, we can estimate P(Y_i) and P(X_i | Y_i) directly from the training data by counting!

Sky    Temp  Humid   Wind    Water  Forecast  Play?
sunny  warm  normal  strong  warm   same      yes
sunny  warm  high    strong  warm   same      yes
rainy  cold  high    strong  warm   change    no
sunny  warm  high    strong  cool   change    yes

P(Sky = sunny | Play = yes) = ?
P(Humid = high | Play = yes) = ?
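The counting can be sketched directly over the table above (the function name and tuple indexing are mine, not from the slides):

```python
# Training set from the slide: (Sky, Temp, Humid, Wind, Water, Forecast, Play?)
data = [
    ("sunny", "warm", "normal", "strong", "warm", "same",   "yes"),
    ("sunny", "warm", "high",   "strong", "warm", "same",   "yes"),
    ("rainy", "cold", "high",   "strong", "warm", "change", "no"),
    ("sunny", "warm", "high",   "strong", "cool", "change", "yes"),
]

def p_attr_given_label(attr_idx, attr_val, label):
    """Estimate P(X_i = attr_val | Play = label) by counting matching rows."""
    rows = [r for r in data if r[-1] == label]
    return sum(r[attr_idx] == attr_val for r in rows) / len(rows)

print(p_attr_given_label(0, "sunny", "yes"))  # P(Sky=sunny | Play=yes) = 3/3 = 1.0
print(p_attr_given_label(2, "high", "yes"))   # P(Humid=high | Play=yes) = 2/3
```

Three of the four rows have Play = yes, and all three of those have Sky = sunny, while two of the three have Humid = high, which is exactly what the counts return.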

SLIDE 15

The Naïve Bayes Classifier

• Now we have:

  P(Y = y_j | X_1, …, X_n) = P(Y = y_j) Π_i P(X_i | Y = y_j) / Σ_k P(Y = y_k) Π_i P(X_i | Y = y_k)

  which is just a one-level Bayesian network:

  [Figure: label (hypothesis) node Y with arrows to attribute (evidence) nodes X_1 … X_i … X_n; its parameters are the prior P(Y_j) and the conditionals P(X_i | Y_j)]

• To classify a new point X_new:

  Y_new ← argmax_{y_k} P(Y = y_k) Π_i P(X_i | Y = y_k)

SLIDE 16

The Naïve Bayes Algorithm

• For each value y_k:
  • Estimate P(Y = y_k) from the data.
  • For each value x_ij of each attribute X_i:
    • Estimate P(X_i = x_ij | Y = y_k).

• Classify a new point via:

  Y_new ← argmax_{y_k} P(Y = y_k) Π_i P(X_i | Y = y_k)

• In practice, the independence assumption doesn't often hold true, but Naïve Bayes performs very well despite it.
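Putting the two estimation steps and the argmax together, here is a minimal sketch of the whole algorithm, reusing the Sky/Temp training table from the earlier slide (all function and variable names are mine):

```python
from collections import defaultdict

# Training set: (Sky, Temp, Humid, Wind, Water, Forecast, Play?)
data = [
    ("sunny", "warm", "normal", "strong", "warm", "same",   "yes"),
    ("sunny", "warm", "high",   "strong", "warm", "same",   "yes"),
    ("rainy", "cold", "high",   "strong", "warm", "change", "no"),
    ("sunny", "warm", "high",   "strong", "cool", "change", "yes"),
]

def train(rows):
    """Estimate P(Y = y_k) and P(X_i = x_ij | Y = y_k) by counting."""
    labels = [r[-1] for r in rows]
    prior = {y: labels.count(y) / len(rows) for y in set(labels)}
    cond = defaultdict(dict)  # cond[y][(i, v)] = P(X_i = v | Y = y)
    for y in prior:
        y_rows = [r for r in rows if r[-1] == y]
        for i in range(len(rows[0]) - 1):
            for v in set(r[i] for r in rows):
                cond[y][(i, v)] = sum(r[i] == v for r in y_rows) / len(y_rows)
    return prior, cond

def classify(x, prior, cond):
    """Return argmax_y P(Y = y) * prod_i P(X_i = x_i | Y = y)."""
    def score(y):
        s = prior[y]
        for i, v in enumerate(x):
            s *= cond[y].get((i, v), 0.0)
        return s
    return max(prior, key=score)

prior, cond = train(data)
print(classify(("sunny", "warm", "high", "strong", "cool", "same"), prior, cond))
```

Note that an attribute value never seen with a given label zeroes out that label's entire product; practical implementations add Laplace (add-one) smoothing to the counts to avoid this.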

SLIDE 17

Naïve Bayes Applications

• Text classification
  • Which e-mails are spam?
  • Which e-mails are meeting notices?
  • Which author wrote a document?

• Classifying mental states

  [Figure: brain-activity images for "People Words" vs. "Animal Words"]

  Learning P(BrainActivity | WordCategory). Pairwise classification accuracy: 85%.