Artificial Neural Networks [Read Ch. 4] [Recommended exercises 4.1, 4.2, 4.5, 4.9, 4.11]
  1. Artificial Neural Networks [Read Ch. 4] [Recommended exercises 4.1, 4.2, 4.5, 4.9, 4.11]
     - Threshold units
     - Gradient descent
     - Multilayer networks
     - Backpropagation
     - Hidden layer representations
     - Example: Face Recognition
     - Advanced topics
     (Lecture slides for the textbook Machine Learning, T. Mitchell, McGraw Hill, 1997.)

  2. Connectionist Models
     Consider humans:
     - Neuron switching time $\sim 0.001$ second
     - Number of neurons $\sim 10^{10}$
     - Connections per neuron $\sim 10^{4\text{--}5}$
     - Scene recognition time $\sim 0.1$ second
     - 100 inference steps doesn't seem like enough -> much parallel computation
     Properties of artificial neural nets (ANNs):
     - Many neuron-like threshold switching units
     - Many weighted interconnections among units
     - Highly parallel, distributed process
     - Emphasis on tuning weights automatically

  3. When to Consider Neural Networks
     - Input is high-dimensional discrete or real-valued (e.g., raw sensor input)
     - Output is discrete or real-valued
     - Output is a vector of values
     - Possibly noisy data
     - Form of target function is unknown
     - Human readability of result is unimportant
     Examples:
     - Speech phoneme recognition [Waibel]
     - Image classification [Kanade, Baluja, Rowley]
     - Financial prediction

  4. ALVINN drives 70 mph on highways
     [Figure: ALVINN's network. A 30x32 sensor input retina feeds 4 hidden units, which feed 30 output units spanning steering directions from sharp left through straight ahead to sharp right.]

  5. Perceptron
     $$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\ -1 & \text{otherwise} \end{cases}$$
     Sometimes we'll use simpler vector notation:
     $$o(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} > 0 \\ -1 & \text{otherwise} \end{cases}$$
     [Figure: a perceptron unit. Inputs $x_1, \ldots, x_n$ with weights $w_1, \ldots, w_n$, plus a fixed input $x_0 = 1$ with weight $w_0$, feed a summation node computing $\sum_{i=0}^{n} w_i x_i$, followed by a threshold that outputs 1 if the sum is positive and -1 otherwise.]
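A minimal sketch of this threshold rule in Python (the function name and the bias-handling convention are illustrative, not from the slides):

```python
import numpy as np

def perceptron_output(w, x):
    """Threshold unit: returns 1 if w . x > 0, else -1.

    w includes the bias weight w0; x carries a leading 1
    so that x[0] plays the role of x0 = 1."""
    return 1 if np.dot(w, x) > 0 else -1

# Example usage with arbitrary weights:
w = np.array([0.1, 0.4, -0.3])
x = np.array([1.0, 1.0, 1.0])   # x0 = 1, x1 = 1, x2 = 1
print(perceptron_output(w, x))  # -> 1, since 0.1 + 0.4 - 0.3 > 0
```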

  6. Decision Surface of a Perceptron
     Represents some useful functions:
     - What weights represent $g(x_1, x_2) = AND(x_1, x_2)$?
     But some functions are not representable:
     - e.g., not linearly separable
     - Therefore, we'll want networks of these...
     [Figure: two scatter plots in the $(x_1, x_2)$ plane. In (a) the + and - examples can be separated by a line; in (b) they cannot.]
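One common answer to the AND question (these particular weights are a conventional choice, not stated on the slide): $w_0 = -0.8$, $w_1 = w_2 = 0.5$, so the weighted sum exceeds 0 only when both inputs are 1. A quick check:

```python
# Verify that w0 = -0.8, w1 = w2 = 0.5 implements AND over {0, 1} inputs.
w0, w1, w2 = -0.8, 0.5, 0.5
for x1 in (0, 1):
    for x2 in (0, 1):
        o = 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1
        print(x1, x2, o)   # o is 1 only for (1, 1)
```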

  7. Perceptron training rule
     $$w_i \leftarrow w_i + \Delta w_i \quad \text{where} \quad \Delta w_i = \eta (t - o) x_i$$
     Where:
     - $t = c(\vec{x})$ is the target value
     - $o$ is the perceptron output
     - $\eta$ is a small constant (e.g., 0.1) called the learning rate
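A sketch of one application of the rule (illustrative helper name; the threshold computation is the perceptron unit from above):

```python
import numpy as np

def perceptron_update(w, x, t, eta=0.1):
    """Apply the perceptron training rule once:
    w_i <- w_i + eta * (t - o) * x_i.
    x must include the leading x0 = 1 for the bias weight w0."""
    o = 1 if np.dot(w, x) > 0 else -1   # current perceptron output
    return w + eta * (t - o) * x        # no change when o == t
```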

  8. Perceptron training rule
     Can prove it will converge
     - if training data is linearly separable
     - and $\eta$ is sufficiently small

  9. Gradient Descent
     To understand, consider a simpler linear unit, where
     $$o = w_0 + w_1 x_1 + \cdots + w_n x_n$$
     Let's learn $w_i$'s that minimize the squared error
     $$E[\vec{w}] \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$
     where $D$ is the set of training examples.
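As a concrete reading of the definition, a sketch of the squared error of a linear unit over a training set (names are illustrative):

```python
import numpy as np

def squared_error(w, X, t):
    """E[w] = 1/2 * sum over d in D of (t_d - o_d)^2,
    where o_d = w . x_d is the linear unit's output.
    X has one training example per row, each with a leading 1
    so that w[0] acts as the bias weight w0."""
    o = X @ w                          # outputs for all examples at once
    return 0.5 * np.sum((t - o) ** 2)
```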

  10. Gradient Descent
      Gradient:
      $$\nabla E[\vec{w}] \equiv \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \cdots, \frac{\partial E}{\partial w_n} \right]$$
      Training rule:
      $$\Delta \vec{w} = -\eta \nabla E[\vec{w}]$$
      i.e.,
      $$\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$$
      [Figure: the error surface $E[\vec{w}]$ plotted over weights $w_0$ and $w_1$, a parabolic bowl; gradient descent moves downhill toward the minimum.]

  11. Gradient Descent
      $$\begin{aligned} \frac{\partial E}{\partial w_i} &= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_d (t_d - o_d)^2 \\ &= \frac{1}{2} \sum_d \frac{\partial}{\partial w_i} (t_d - o_d)^2 \\ &= \frac{1}{2} \sum_d 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d) \\ &= \sum_d (t_d - o_d) \frac{\partial}{\partial w_i} \left( t_d - \vec{w} \cdot \vec{x}_d \right) \\ \frac{\partial E}{\partial w_i} &= \sum_d (t_d - o_d)(-x_{i,d}) \end{aligned}$$

  12. Gradient Descent
      GRADIENT-DESCENT(training_examples, $\eta$)
      Each training example is a pair of the form $\langle \vec{x}, t \rangle$, where $\vec{x}$ is the vector of input values and $t$ is the target output value. $\eta$ is the learning rate (e.g., 0.05).
      - Initialize each $w_i$ to some small random value
      - Until the termination condition is met, Do
        - Initialize each $\Delta w_i$ to zero
        - For each $\langle \vec{x}, t \rangle$ in training_examples, Do
          - Input the instance $\vec{x}$ to the unit and compute the output $o$
          - For each linear unit weight $w_i$, Do: $\Delta w_i \leftarrow \Delta w_i + \eta (t - o) x_i$
        - For each linear unit weight $w_i$, Do: $w_i \leftarrow w_i + \Delta w_i$
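A direct Python transcription of this pseudocode, as a sketch: the slide leaves the termination condition open, so a fixed number of epochs is assumed here.

```python
import numpy as np

def gradient_descent(X, t, eta=0.05, epochs=100, rng=None):
    """Batch gradient descent for a linear unit.
    X: one training example per row, leading column of 1s for the bias.
    t: vector of target outputs. Returns the learned weights."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.uniform(-0.05, 0.05, size=X.shape[1])  # small random init
    for _ in range(epochs):                        # "until satisfied"
        delta_w = np.zeros_like(w)
        for x_d, t_d in zip(X, t):
            o_d = np.dot(w, x_d)                   # linear unit output
            delta_w += eta * (t_d - o_d) * x_d     # accumulate updates
        w += delta_w                               # apply after full pass
    return w
```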

  13. Summary
      Perceptron training rule guaranteed to succeed if
      - Training examples are linearly separable
      - Sufficiently small learning rate $\eta$
      Linear unit training rule uses gradient descent
      - Guaranteed to converge to hypothesis with minimum squared error
      - Given sufficiently small learning rate $\eta$
      - Even when training data contains noise
      - Even when training data not separable by $H$

  14. Incremental (Stochastic) Gradient Descent
      Batch mode Gradient Descent: Do until satisfied
      1. Compute the gradient $\nabla E_D[\vec{w}]$
      2. $\vec{w} \leftarrow \vec{w} - \eta \nabla E_D[\vec{w}]$
      Incremental mode Gradient Descent: Do until satisfied
      - For each training example $d$ in $D$
        1. Compute the gradient $\nabla E_d[\vec{w}]$
        2. $\vec{w} \leftarrow \vec{w} - \eta \nabla E_d[\vec{w}]$
      $$E_D[\vec{w}] \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 \qquad\qquad E_d[\vec{w}] \equiv \frac{1}{2} (t_d - o_d)^2$$
      Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if $\eta$ is made small enough.
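The incremental variant differs from the batch sketch above only in when the weights move: after every example rather than after a full pass. A sketch under the same assumptions (fixed epoch count, illustrative names):

```python
import numpy as np

def incremental_gradient_descent(X, t, eta=0.05, epochs=100, rng=None):
    """Incremental (stochastic) gradient descent for a linear unit:
    weights are updated per example using the gradient of E_d alone."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.uniform(-0.05, 0.05, size=X.shape[1])
    for _ in range(epochs):
        for x_d, t_d in zip(X, t):
            o_d = np.dot(w, x_d)
            w += eta * (t_d - o_d) * x_d   # immediate update per example
    return w
```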

  15. Multilayer Networks of Sigmoid Units
      [Figure: a multilayer network with input features F1 and F2 classifying spoken vowel sounds ("head", "hid", "who'd", "hood", ...); the learned decision regions in the F1-F2 plane are highly nonlinear.]

  16. Sigmoid Unit
      $\sigma(x)$ is the sigmoid function:
      $$\sigma(x) = \frac{1}{1 + e^{-x}}$$
      Nice property: $\dfrac{d\sigma(x)}{dx} = \sigma(x)(1 - \sigma(x))$
      We can derive gradient descent rules to train:
      - One sigmoid unit
      - Multilayer networks of sigmoid units -> Backpropagation
      [Figure: a sigmoid unit. Inputs $x_1, \ldots, x_n$ with weights $w_1, \ldots, w_n$ and a fixed input $x_0 = 1$ with weight $w_0$ feed a summation node computing $net = \sum_{i=0}^{n} w_i x_i$; the output is $o = \sigma(net) = \frac{1}{1 + e^{-net}}$.]
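The sigmoid and its derivative in code, as a small sketch; the derivative implements the "nice property" above:

```python
import numpy as np

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """d sigma / dx = sigma(x) * (1 - sigma(x))"""
    s = sigmoid(x)
    return s * (1.0 - s)
```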

  17. Error Gradient for a Sigmoid Unit
      $$\begin{aligned} \frac{\partial E}{\partial w_i} &= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 \\ &= \frac{1}{2} \sum_d \frac{\partial}{\partial w_i} (t_d - o_d)^2 \\ &= \frac{1}{2} \sum_d 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d) \\ &= \sum_d (t_d - o_d) \left( -\frac{\partial o_d}{\partial w_i} \right) \\ &= -\sum_d (t_d - o_d) \frac{\partial o_d}{\partial net_d} \frac{\partial net_d}{\partial w_i} \end{aligned}$$
      But we know:
      $$\frac{\partial o_d}{\partial net_d} = \frac{\partial \sigma(net_d)}{\partial net_d} = o_d (1 - o_d) \qquad\qquad \frac{\partial net_d}{\partial w_i} = \frac{\partial (\vec{w} \cdot \vec{x}_d)}{\partial w_i} = x_{i,d}$$
      So:
      $$\frac{\partial E}{\partial w_i} = -\sum_{d \in D} (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}$$
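The final formula translates directly into code (a sketch with illustrative names):

```python
import numpy as np

def sigmoid_unit_gradient(w, X, t):
    """dE/dw for a single sigmoid unit:
    dE/dw_i = -sum_d (t_d - o_d) * o_d * (1 - o_d) * x_{i,d}.
    X has one example per row; t is the vector of targets."""
    o = 1.0 / (1.0 + np.exp(-(X @ w)))     # o_d = sigma(w . x_d)
    return -((t - o) * o * (1 - o)) @ X    # one gradient entry per weight
```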

  18. Backpropagation Algorithm
      Initialize all weights to small random numbers.
      Until satisfied, Do
      - For each training example, Do
        1. Input the training example to the network and compute the network outputs
        2. For each output unit $k$: $\delta_k \leftarrow o_k (1 - o_k)(t_k - o_k)$
        3. For each hidden unit $h$: $\delta_h \leftarrow o_h (1 - o_h) \sum_{k \in outputs} w_{h,k}\, \delta_k$
        4. Update each network weight $w_{i,j}$: $w_{i,j} \leftarrow w_{i,j} + \Delta w_{i,j}$, where $\Delta w_{i,j} = \eta\, \delta_j\, x_{i,j}$
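A minimal sketch of one such update for a network with a single hidden layer. The structure (weight matrices, omitted bias terms) is an assumption for brevity; $x_{i,j}$ on the slide is the input unit $i$ sends to unit $j$, which here is the input vector for hidden weights and the hidden activations for output weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(W_hidden, W_output, x, t, eta=0.1):
    """One backpropagation update for one training example.
    W_hidden: (n_hidden, n_in) weights, W_output: (n_out, n_hidden).
    x: input vector, t: target vector. Bias terms omitted for brevity."""
    # 1. Forward pass: compute the network outputs.
    h = sigmoid(W_hidden @ x)                  # hidden unit outputs o_h
    o = sigmoid(W_output @ h)                  # output unit outputs o_k
    # 2. delta_k <- o_k (1 - o_k)(t_k - o_k) for each output unit.
    delta_k = o * (1 - o) * (t - o)
    # 3. delta_h <- o_h (1 - o_h) * sum_k w_{h,k} delta_k for each hidden unit.
    delta_h = h * (1 - h) * (W_output.T @ delta_k)
    # 4. w_{i,j} <- w_{i,j} + eta * delta_j * x_{i,j}.
    W_output += eta * np.outer(delta_k, h)
    W_hidden += eta * np.outer(delta_h, x)
    return W_hidden, W_output
```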
