An artificial neuron / Artificial neural networks

An artificial neuron

A neuron forms a weighted sum of its inputs, subtracts a threshold, and passes the result through a transfer function:

  $S = \sum_{i=1}^{n} w_i x_i - \theta = \sum_{i=0}^{n} w_i x_i$, with $x_0 = +1$ and $w_0 = -\theta$

  $y = f(S)$

$f(S)$ is any non-linear, saturating function, e.g. a step function or a sigmoid:

  $f(S) = \frac{1}{1 + e^{-S}}$

(A small code sketch of this neuron follows below.)

Outline
• Background
• Artificial neurons, what they can and cannot do
• The multilayer perceptron (MLP)
• Three forms of learning
• The back propagation algorithm
• Radial basis function networks
• Competitive learning (and relatives)

A single neuron as a classifier

The neuron can be used as a classifier: y < 0.5 means class 0, y > 0.5 means class 1. The linear discriminant is a hyperplane; in the 2D example it is the line

  $x_2 = -\frac{w_1}{w_2} x_1 + \frac{\theta}{w_2}$

Only linearly separable classification problems can be solved by a single neuron.

The XOR problem

XOR is not linearly separable – we must combine two linear discriminants. Two sigmoids implement fuzzy AND and NOR. (Figure: the four XOR input points with the AND and NOR discriminant lines; a code sketch of such a network follows below.)

The multilayer perceptron

(Figure: inputs feeding a hidden layer, which feeds the output layer.)

The output nodes are linear (for function approximation) or sigmoidal (for classification). An MLP can implement any function, given a sufficiently rich internal structure (number of nodes and layers).

Artificial neural networks ...
• store information in the weights, not in the nodes
• are trained, by adjusting the weights, not programmed
• can generalize to previously unseen data
• are adaptive
• are fast computational devices, well suited for parallel simulation and/or hardware implementation
• are fault tolerant
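As a concrete illustration of the neuron equation from the first slide, here is a minimal Python sketch. The weights, inputs, and threshold are made-up numbers, chosen so the neuron acts as a fuzzy AND:

```python
import math

def neuron(x, w, theta):
    """One artificial neuron: y = f(sum_i w_i * x_i - theta), with sigmoid f."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1.0 / (1.0 + math.exp(-s))

# Made-up example: with these weights the neuron behaves as a fuzzy AND.
print(neuron([1.0, 1.0], w=[5.0, 5.0], theta=7.5))  # ~0.92, i.e. class 1
print(neuron([0.0, 1.0], w=[5.0, 5.0], theta=7.5))  # ~0.08, i.e. class 0
```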

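The XOR slide's idea – combine a fuzzy AND and a fuzzy NOR – can be hand-wired into a 2-2-1 network. This sketch reuses the neuron() function above; the weights are hand-picked for illustration, not trained:

```python
def xor_net(x1, x2):
    """Hand-wired 2-2-1 MLP for XOR: the hidden units are fuzzy AND and NOR."""
    h_and = neuron([x1, x2], w=[5.0, 5.0], theta=7.5)     # fires only for (1,1)
    h_nor = neuron([x1, x2], w=[-5.0, -5.0], theta=-2.5)  # fires only for (0,0)
    # The output fires when neither hidden unit fires,
    # i.e. when exactly one input is 1.
    return neuron([h_and, h_nor], w=[-10.0, -10.0], theta=-5.0)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_net(a, b), 3))  # ~0, ~1, ~1, ~0
```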
Why neural networks?
(statistical methods are always at least as good, right?)
• Neural networks are statistical methods
• Model independence
• Adaptivity/Flexibility
• Concurrency
• Economical reasons (rapid prototyping)

Application areas
• Finance: forecasting, fraud detection
• Industry: adaptive control, signal analysis, data mining
• Medicine: image analysis
• Consumer market: household equipment, character recognition, speech recognition

Three forms of learning
• Unsupervised learning
• Supervised learning (Figure: the input is fed both to the target function, producing the desired output d, and to the learning system, producing the output y; an error function compares the two and feeds the error back to the learning system.)
• Reinforcement learning (Figure: the agent's learning system observes the state of the environment and suggests actions; an action selector picks the action to perform, and the reward from the environment is turned into an error signal for the learning system.)

Back propagation

The contribution to the error E from a particular weight $w_{ji}$ is $\frac{\partial E}{\partial w_{ji}}$. The weight should be moved in proportion to that contribution, but in the other direction:

  $\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}}$

The error function and the transfer function must both be differentiable.

Back propagation update rule

Assume the error is the squared error

  $E = \frac{1}{2} \sum_{j=1}^{n} (d_j - y_j)^2$

and the transfer (activation) function is the sigmoid

  $y_j = f(S_j) = \frac{1}{1 + e^{-S_j}}$

Then, for the weight $w_{ji}$ from node i to node j,

  $\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} = \eta \, \delta_j x_i$

where

  $\delta_j = y_j (1 - y_j)(d_j - y_j)$  if node j is an output node
  (the derivative of the sigmoid times the derivative of the error)

  $\delta_j = y_j (1 - y_j) \sum_k \delta_k w_{kj}$  otherwise
  (the sum runs over all nodes k in the 'next' layer, closer to the outputs)

Training procedure (1)

The network is initialised with small random weights. Split the data in two – a training set and a test set.
• The training set is used for training and is passed through many times.
• The test set is used to test for generalization (to see how well the net does on previously unseen data). This is the result that counts!
The weights are updated either after each presentation (pattern learning), or by accumulating the weight changes Δw until the end of the training set is reached (epoch or batch learning). (Code sketches of both variants follow below.)
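A minimal sketch of the update rule with pattern learning, applied to XOR under the slides' assumptions (2-2-1 network, sigmoid units, squared error). The learning rate, epoch count, and initial weight range are made-up choices:

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_xor(eta=0.5, epochs=20000, seed=1):
    """Pattern-learning back propagation for a 2-2-1 sigmoid MLP.

    w[j][i] connects node i to node j; index 0 is the bias input
    x_0 = +1, so w[j][0] = -theta_j (as on the first slide).
    """
    rng = random.Random(seed)
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]  # hidden
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(3)]                      # output
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

    for _ in range(epochs):
        for x, d in data:
            # Forward pass.
            xb = [1.0] + x
            h = [sigmoid(sum(wi * xi for wi, xi in zip(wj, xb))) for wj in w_h]
            hb = [1.0] + h
            y = sigmoid(sum(wi * hi for wi, hi in zip(w_o, hb)))
            # Output node: delta = y(1-y)(d-y).
            delta_o = y * (1 - y) * (d - y)
            # Hidden node j: delta = h_j(1-h_j) * sum_k delta_k w_kj.
            delta_h = [h[j] * (1 - h[j]) * delta_o * w_o[j + 1] for j in range(2)]
            # Update after each presentation: Delta w_ji = eta * delta_j * x_i.
            for i in range(3):
                w_o[i] += eta * delta_o * hb[i]
            for j in range(2):
                for i in range(3):
                    w_h[j][i] += eta * delta_h[j] * xb[i]
    return w_h, w_o

w_h, w_o = train_xor()
for x, d in [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]:
    hb = [1.0] + [sigmoid(sum(wi * xi for wi, xi in zip(wj, [1.0] + x))) for wj in w_h]
    print(x, d, round(sigmoid(sum(wi * hi for wi, hi in zip(w_o, hb))), 2))
```

Whether training succeeds depends on the random initialisation; XOR with a 2-2-1 net is known to have local minima in which this procedure can get stuck.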

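And a sketch of the corresponding epoch (batch) learning variant, reusing sigmoid() and the 2-2-1 weight layout from the previous sketch: the weight changes are accumulated over the whole training set and applied once per epoch.

```python
def train_epoch_batch(w_h, w_o, data, eta=0.5):
    """One epoch of batch learning: accumulate all weight changes over
    the training set, then apply them once (in place)."""
    acc_o = [0.0] * 3
    acc_h = [[0.0] * 3 for _ in range(2)]
    for x, d in data:
        xb = [1.0] + x
        h = [sigmoid(sum(wi * xi for wi, xi in zip(wj, xb))) for wj in w_h]
        hb = [1.0] + h
        y = sigmoid(sum(wi * hi for wi, hi in zip(w_o, hb)))
        delta_o = y * (1 - y) * (d - y)
        delta_h = [h[j] * (1 - h[j]) * delta_o * w_o[j + 1] for j in range(2)]
        # Accumulate instead of applying immediately.
        for i in range(3):
            acc_o[i] += eta * delta_o * hb[i]
        for j in range(2):
            for i in range(3):
                acc_h[j][i] += eta * delta_h[j] * xb[i]
    for i in range(3):  # apply the accumulated changes once per epoch
        w_o[i] += acc_o[i]
    for j in range(2):
        for i in range(3):
            w_h[j][i] += acc_h[j][i]
```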
Overtraining

(Figure: typical error curves over time (epochs) – the training set error keeps falling, while the test or validation set error reaches a minimum and then rises again; the rising part is overtraining.)

Cross validation: use a third set, a validation set, to decide when to stop (find the minimum of the error on this set, and retrain for that number of epochs). (A code sketch of this stopping rule follows below.)

Network size
• Overtraining is more likely to occur …
  - if we train on too little data
  - if the network has too many hidden nodes
  - if we train for too long
• The network should be slightly larger than the size necessary to represent the target function.
• Unfortunately, the target function is unknown ...
• We need much more training data than the number of weights!

Training procedure (2)
1. Start with a small network, train, increase the size, train again, etc., until the error on the training set can be reduced to acceptable levels.
2. If an acceptable error level was found, increase the size by a few percent and retrain, this time using the cross-validation procedure to decide when to stop. Publish the result on the independent test set.
3. If the network failed to reduce the error on the training set, despite a large number of nodes and attempts, something is likely to be wrong with the data.

Practical considerations
• What happens if the mapping represented by the data is not a function? For example, what if the same input does not always lead to the same output?
• In what order should data be presented? Sequentially? At random?
• How should data be represented? Compact? Distributed?
• What can be done about missing data?
• Trick of the trade: monotonic functions are easier to learn than non-monotonic functions! (at least for the MLP)

Radial basis functions (RBF)
• Layered structure, like the MLP, with one hidden layer
• Output nodes are conventional
• Each hidden node …
  - measures the distance between its weight vector and the input vector (instead of computing a weighted sum)
  - feeds that distance through a Gaussian (instead of a sigmoid)
(A code sketch of an RBF hidden node follows below.)

Geometric interpretation
• The input space is covered with overlapping Gaussians.
• In classification, the discriminants become hyperspheres (circles in 2D).
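A sketch of the cross-validation stopping rule from the Overtraining slide. The callables train_epoch and validation_error are assumptions of this sketch (e.g. one epoch of the batch training above, and the squared error measured on the validation set), not names from the slides:

```python
def early_stopping_epochs(train_epoch, validation_error, max_epochs=1000):
    """Train for max_epochs and return the epoch with minimal validation error.

    train_epoch(): runs one epoch of training on the training set.
    validation_error(): returns the current error on the validation set.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        err = validation_error()
        if err < best_error:
            best_epoch, best_error = epoch, err
    # Per the slide: re-initialise the network, then retrain for best_epoch epochs.
    return best_epoch
```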

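Finally, a minimal sketch of the RBF hidden node described above: it measures the distance between the input vector and its weight (center) vector and feeds that through a Gaussian. The width parameter is an assumption of this sketch; the slides do not specify how it is chosen:

```python
import math

def rbf_hidden_node(x, center, width=1.0):
    """RBF hidden node: a Gaussian of the distance between the input
    vector x and the node's weight (center) vector."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))

# Made-up example: the activation decays with distance from the center.
print(rbf_hidden_node([0.0, 0.0], center=[0.0, 0.0]))  # 1.0 (at the center)
print(rbf_hidden_node([1.0, 1.0], center=[0.0, 0.0]))  # ~0.37
```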
