1

Artificial neural networks

  • Background
  • Artificial neurons, what they can and cannot do
  • The multilayer perceptron (MLP)
  • Three forms of learning
  • The backpropagation algorithm
  • Radial basis function networks
  • Competitive learning (and relatives)

2

An artificial neuron

The neuron forms a weighted sum of its inputs and passes it through a transfer function:

$$y = f(S), \qquad S = \sum_{i=0}^{n} w_i x_i = \sum_{i=1}^{n} w_i x_i - \theta$$

The threshold $\theta$ is folded into the sum through a constant input $x_0 = +1$ with weight $w_0 = -\theta$.

$f(S)$ = any non-linear, saturating function, e.g. a step function or a sigmoid:

$$f(S) = \frac{1}{1 + e^{-S}}$$

[Figure: inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$ (plus $x_0 = +1$ with $w_0 = -\theta$) feeding a summation $\Sigma$ and transfer function $f$, producing the output $y$]
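A minimal sketch of this neuron in Python (the function and variable names are illustrative, not code from the slides):

```python
import numpy as np

def neuron(x, w, theta):
    """One artificial neuron: weighted sum of the inputs minus the
    threshold theta, passed through the sigmoid f(S) = 1/(1 + e^-S)."""
    S = np.dot(w, x) - theta         # S = sum_i w_i x_i - theta
    return 1.0 / (1.0 + np.exp(-S))  # y = f(S)
```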

3

A single neuron as a classifier

The neuron can be used as a classifier:

  • $y < 0.5$ → class 0
  • $y > 0.5$ → class 1

Only linearly separable classification problems can be solved. The linear discriminant is a hyperplane; in the 2D example, a line:

$$x_2 = -\frac{w_1}{w_2}\, x_1 + \frac{\theta}{w_2}$$
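Continuing the sketch above (with arbitrarily chosen weights, purely for illustration), classification is just thresholding the output at 0.5:

```python
x = np.array([0.8, 0.3])
w = np.array([1.0, 1.0])
y = neuron(x, w, theta=1.0)                 # the line x1 + x2 = 1 is the discriminant
print("class 1" if y > 0.5 else "class 0")  # prints "class 1"
```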

4

The XOR problem

Not linearly separable – must combine two linear discriminants. Two sigmoids implement fuzzy AND and NOR.

[Figure: the four XOR corners in the $(x_1, x_2)$ plane, separated by an AND discriminant and a NOR discriminant]
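A sketch of this construction, reusing the neuron() function above with hand-picked (not trained) weights: one steep sigmoid approximates AND, one approximates NOR, and the output fires when neither hidden node does:

```python
def xor_net(x1, x2):
    x = np.array([x1, x2])
    # Hidden layer: two steep sigmoids acting as fuzzy AND and fuzzy NOR.
    h_and = neuron(x, np.array([10.0, 10.0]), theta=15.0)    # ~AND(x1, x2)
    h_nor = neuron(x, np.array([-10.0, -10.0]), theta=-5.0)  # ~NOR(x1, x2)
    # Output: high only when neither AND nor NOR fires, i.e. for (0,1) and (1,0).
    return neuron(np.array([h_and, h_nor]), np.array([-10.0, -10.0]), theta=-5.0)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_net(a, b)))  # 0, 1, 1, 0
```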

5

The multilayer perceptron

Can implement any function, given a sufficiently rich internal structure (number of nodes and layers). The output nodes are linear (function approximation) or sigmoidal (classification).

[Figure: feed-forward network, inputs to outputs]
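A sketch of the forward pass through a one-hidden-layer MLP (illustrative names; the output transfer is chosen per the slide: linear for function approximation, sigmoidal for classification):

```python
def sigmoid(S):
    return 1.0 / (1.0 + np.exp(-S))

def mlp_forward(x, W_hidden, W_out, classify=True):
    """Each weight matrix carries an extra column for the bias (x0 = +1)."""
    xb = np.append(x, 1.0)                # constant bias input
    h = sigmoid(W_hidden @ xb)            # sigmoidal hidden layer
    S = W_out @ np.append(h, 1.0)
    return sigmoid(S) if classify else S  # sigmoidal or linear output nodes
```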

6

Artificial neural networks...

  • store information in the weights, not in the nodes
  • are trained, by adjusting the weights, not programmed
  • can generalize to previously unseen data
  • are adaptive
  • are fast computational devices, well suited for parallel simulation and/or hardware implementation
  • are fault tolerant

7

Application areas

Finance
  • Forecasting
  • Fraud detection

Medicine
  • Image analysis

Consumer market
  • Household equipment
  • Character recognition
  • Speech recognition

Industry
  • Adaptive control
  • Signal analysis
  • Data mining

8

Why neural networks?

(statistical methods are always at least as good, right?)

  • Neural networks are statistical methods
  • Model independence
  • Adaptivity / Flexibility
  • Concurrency
  • Economical reasons (rapid prototyping)

9

Three forms of learning

  • Supervised
  • Unsupervised
  • Reinforcement

[Figure: supervised learning – the input is fed both to the target function and to the learning system, whose outputs are compared to form the error; reinforcement learning – the agent's action selector acts on the environment, which returns a state and a reward to the learning system, which in turn suggests actions]

10

Backpropagation

[Figure: the input passes through the network to the output ($y$), which an error function compares with the desired output ($d$)]

The contribution to the error $E$ from a particular weight $w_{ji}$ is

$$\frac{\partial E}{\partial w_{ji}}$$

The weight should be moved in proportion to that contribution, but in the other direction:

$$\Delta w_{ji} = -\eta\, \frac{\partial E}{\partial w_{ji}}$$

The error function and the transfer function must both be differentiable.

11

Backpropagation update rule

Assumptions

  • The error is the squared error: $E = \frac{1}{2}\sum_{j=1}^{n} (d_j - y_j)^2$
  • The transfer (activation) function is the sigmoid: $y_j = f(S_j) = \dfrac{1}{1 + e^{-S_j}}$

Under these assumptions the update rule becomes

$$\Delta w_{ji} = -\eta\, \frac{\partial E}{\partial w_{ji}} = \eta\, \delta_j x_i$$

$$\delta_j = \begin{cases} (1 - y_j)\, y_j\, (d_j - y_j) & \text{if node } j \text{ is an output node} \\ (1 - y_j)\, y_j \sum_k \delta_k w_{kj} & \text{otherwise} \end{cases}$$

Here $y_j(1 - y_j)$ is the derivative of the sigmoid, $(d_j - y_j)$ is the derivative of the error, and the sum over $k$ runs over all nodes in the 'next' layer (closer to the outputs).

[Figure: weight $w_{ji}$ connects node $i$ to node $j$; the nodes $k$ lie in the layer after $j$]
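A minimal sketch of one backpropagation step for the one-hidden-layer network above, under exactly these assumptions (squared error, sigmoid transfer); it reuses the sigmoid() helper and the bias convention from the earlier sketch:

```python
def backprop_step(x, d, W_hidden, W_out, eta=0.1):
    """One pattern-learning update; modifies the weight matrices in place."""
    # Forward pass (bias handled as a constant input x0 = +1).
    xb = np.append(x, 1.0)
    h = sigmoid(W_hidden @ xb)
    hb = np.append(h, 1.0)
    y = sigmoid(W_out @ hb)

    # Output nodes: delta_j = (1 - y_j) y_j (d_j - y_j)
    delta_out = (1.0 - y) * y * (d - y)
    # Hidden nodes: delta_j = (1 - y_j) y_j sum_k delta_k w_kj
    # (the bias column of W_out is dropped: the bias unit has no delta)
    delta_hid = (1.0 - h) * h * (W_out[:, :-1].T @ delta_out)

    # Weight updates: Delta w_ji = eta * delta_j * x_i
    W_out += eta * np.outer(delta_out, hb)
    W_hidden += eta * np.outer(delta_hid, xb)
    return 0.5 * np.sum((d - y) ** 2)  # squared error, for monitoring
```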

12

Training procedure (1)

  • The network is initialised with small random weights
  • Split the data in two – a training set and a test set
  • The training set is used for training and is passed through many times. Either update the weights after each presentation (pattern learning), or accumulate the weight changes (∆w) until the end of the training set is reached (epoch or batch learning)
  • The test set is used to test for generalization (to see how well the net does on previously unseen data). This is the result that counts!
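The two update schedules might be contrasted like this on a toy problem (a single linear unit trained with a delta-rule gradient; everything here is illustrative):

```python
import numpy as np

# Toy data: learn y = 2x with one linear weight.
training_set = [(np.array([v]), np.array([2.0 * v])) for v in (0.5, 1.0, 1.5)]
eta = 0.1

def grad(W, x, d):
    """Negative error gradient for a linear unit y = W @ x (squared error)."""
    return np.outer(d - W @ x, x)

# Pattern learning: update the weights after each presentation.
W = np.zeros((1, 1))
for x, d in training_set:
    W += eta * grad(W, x, d)

# Epoch (batch) learning: accumulate delta-w, apply once per epoch.
W = np.zeros((1, 1))
dW = np.zeros_like(W)
for x, d in training_set:
    dW += grad(W, x, d)
W += eta * dW
```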


13

Overtraining

[Figure: typical error curves – error $E$ vs. time (epochs); the training set error keeps falling, while the test or validation set error reaches a minimum and then rises again]

Cross-validation: Use a third set, a validation set, to decide when to stop (find the minimum for this set, and retrain for that number of epochs).
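The stopping rule could be sketched as follows; train_one_epoch and validation_error are hypothetical callbacks standing in for a full training loop:

```python
def find_stopping_epoch(train_one_epoch, validation_error, max_epochs=1000):
    """Return the epoch at which the validation-set error was lowest.
    The net is then reinitialised and retrained for that many epochs."""
    best_epoch, best_err = 0, float("inf")
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()         # one pass over the training set
        err = validation_error()  # error on the held-out validation set
        if err < best_err:
            best_epoch, best_err = epoch, err
    return best_epoch
```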

14

Network size

  • Overtraining is more likely to occur…
    – if we train on too little data
    – if the network has too many hidden nodes
    – if we train for too long
  • The network should be slightly larger than the size necessary to represent the target function
  • Unfortunately, the target function is unknown...
  • Need much more training data than the number of weights!

15

Training procedure (2)

  • 1. Start with a small network, train, increase the size, train again, etc., until the error on the training set can be reduced to acceptable levels.
  • 2. If an acceptable error level was found, increase the size by a few percent and retrain, this time using the cross-validation procedure to decide when to stop. Publish the result on the independent test set.
  • 3. If the network failed to reduce the error on the training set, despite a large number of nodes and attempts, something is likely to be wrong with the data.

16

Practical considerations

  • What happens if the mapping represented by the data is not a function? For example, what if the same input does not always lead to the same output?
  • In what order should data be presented? Sequentially? At random?
  • How should data be represented? Compact? Distributed?
  • What can be done about missing data?
  • Trick of the trade: Monotonic functions are easier to learn than non-monotonic functions! (at least for the MLP)

17

Radial basis functions (RBF)

  • Layered structure, like the MLP, with one hidden layer
  • Output nodes are conventional
  • Each hidden node…
    – measures the distance between its weight vector and the input vector (instead of a weighted sum)
    – feeds that through a Gaussian (instead of a sigmoid), as sketched below

[Figure: RBF network, inputs to outputs, with one hidden layer]
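A sketch of a single RBF hidden node under these definitions (names illustrative): the weighted sum is replaced by a distance, and the sigmoid by a Gaussian.

```python
def rbf_node(x, c, width):
    """Hidden RBF node: the distance between the weight (centre) vector c
    and the input x, fed through a Gaussian of the given width."""
    d2 = np.sum((x - c) ** 2)                # squared Euclidean distance
    return np.exp(-d2 / (2.0 * width ** 2))  # Gaussian instead of sigmoid
```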

18

Geometric interpretation

  • The input space is covered with overlapping Gaussians.
  • In classification, the discriminants become hyperspheres (circles in 2D).


19

RBF training

  • Could use backprop (the transfer function is still differentiable)
  • Better: Train the layers separately
    – Hidden layer: Find the position and size of the Gaussians by unsupervised learning (e.g. competitive learning, K-means)
    – Output layer: Supervised, e.g. the Delta rule, LMS, backprop

20

MLP vs. RBF

  • RBF (hidden) nodes work in a local region; MLP nodes are global
  • MLPs do better in high-dimensional spaces
  • MLPs require fewer nodes and generalize better
  • RBFs can learn faster
  • RBFs are less sensitive to the order in which data is presented
  • RBFs make fewer false-yes classification errors
  • MLPs extrapolate better

21

Unsupervised learning

Classifying unlabeled data. Nearest neighbour classifiers:

  • Classify the unknown sample (vector) x to the class of its closest previously classified neighbour
  • Problem 1: The closest neighbour may be an outlier from the wrong class
  • Problem 2: Must store lots of samples, and compute the distance to each one for every new sample

[Figure: the new pattern, x, will be classified as class a]

K-means

K-means, for K = 2:

  • 1. Make a codebook of two vectors, c1 and c2
  • 2. Sample (at random) two vectors from the data as initial values of c1 and c2
  • 3. Split the data in two subsets, D1 and D2, where D1 is the set of all points with c1 as their closest codebook vector, and vice versa
  • 4. Move c1 towards the mean of D1, and c2 towards the mean of D2
  • 5. Repeat from 3 until convergence (until the codebook vectors stop moving)
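A minimal numpy sketch of these steps for K = 2 (illustrative; data is an array of shape (n_samples, n_dims)):

```python
def k_means_2(data, max_iters=100):
    rng = np.random.default_rng()
    # Steps 1-2: initialise the codebook with two random data vectors.
    c = data[rng.choice(len(data), size=2, replace=False)].copy()
    for _ in range(max_iters):
        # Step 3: split the data by closest codebook vector.
        closest = np.argmin(
            [np.sum((data - ci) ** 2, axis=1) for ci in c], axis=0)
        # Step 4: move each codebook vector to the mean of its subset.
        new_c = np.array([data[closest == i].mean(axis=0) for i in range(2)])
        # Step 5: stop when the codebook vectors stop moving.
        if np.allclose(new_c, c):
            break
        c = new_c
    return c
```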

23

Voronoi regions

  • K-means forms so-called Voronoi regions in the input space
  • The Voronoi region around a codebook vector ci is the region in which ci is the closest codebook vector

[Figure: Voronoi regions around 10 codebook vectors]

24

Competitive learning

M linear, thresholdless nodes (computing only weighted sums), N inputs:

  • 1. Present a pattern (sample), x
  • 2. The node with the largest output (node k) is declared the winner
  • 3. The weights of the winner are updated so that it will become even stronger the next time the same pattern is presented. All other weights are left unchanged

With normalised weights, this is equivalent to finding the node with the minimum distance between its weight vector and the input vector. Network node = codebook vector.

The standard competitive learning rule:

$$\Delta w_{ki} = \eta\, (x_i - w_{ki}), \qquad 1 \le i \le N$$

Competitive learning + batch learning = K-means
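A sketch of the rule (illustrative names; W holds one weight vector per node, one per row):

```python
def competitive_step(W, x, eta=0.1):
    """One competitive learning step: the node with the largest output
    (weighted sum) wins, and only its weight vector moves towards x."""
    k = np.argmax(W @ x)      # winner = node with the largest output
    W[k] += eta * (x - W[k])  # Delta w_ki = eta * (x_i - w_ki)
    return k
```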


25

The winner takes it all

Problem with competitive learning: a node may become invincible.

  • Poor initialisation: The weight vectors have been initialised to small random numbers (in W), but these are far from the data (A and B)
  • The first node to win will move from W towards A or B, and will always win henceforth
  • Solution: Use the data to initialise the weights (as in K-means), or include the winning frequency in the distance measure, or move more nodes than only the winner.

[Figure: weight vectors clustered at W, far from the data clusters A and B]

26

Self-organising maps

The cerebral cortex is a two-dimensional structure, yet we can reason in more than two dimensions. Different neurons in the auditory cortex respond to different frequencies, and these neurons are located in frequency order!

  • Dimensional reduction
  • Topological preservation / topographic map

Kohonen's self-organising feature map (SOFM or SOM): non-linear, topologically preserving, dimensional reduction (like pressing a flower).

27

SOM

Competitive learning, extended in two ways:

  • 1. The nodes are organised in a two-dimensional grid (in competitive learning, there is no defined order between the nodes)
  • 2. A neighbourhood function is introduced (not only the winner is updated, but also its closest neighbours in the grid)

[Figure: a 3x3 grid, making a two-dimensional map of the four-dimensional input space]

28

SOM update rule

  • Find the winner, node k, and then update the weights of every node j by:

$$\Delta w_{ji} = \eta\, f(j, k)\, (x_i - w_{ji}), \qquad 1 \le i \le N$$

  • f(j, k) is a neighbourhood function in the range [0, 1], with a maximum for the winner (j = k) and decreasing with distance from the winner, e.g. a Gaussian
  • Gradually decrease the neighbourhood radius (the width of the Gaussians) and the learning rate (η) over time.
  • Result: Vectors that are close in the high-dimensional input space will activate areas that are close on the grid.
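A sketch of one SOM step, assuming a Gaussian neighbourhood function and that each node j has fixed grid coordinates grid[j] (all names illustrative):

```python
def som_step(W, grid, x, eta, radius):
    """One SOM update. W[j] is node j's weight vector; grid[j] is its
    2-D position in the node grid. eta and radius are decreased over
    time by the caller."""
    k = np.argmin(np.sum((W - x) ** 2, axis=1))  # winner: closest node
    # Gaussian neighbourhood f(j, k): 1.0 for the winner, decaying with
    # grid distance from it.
    f = np.exp(-np.sum((grid - grid[k]) ** 2, axis=1) / (2.0 * radius ** 2))
    W += eta * f[:, None] * (x - W)  # Delta w_ji = eta f(j,k) (x_i - w_ji)
    return k
```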

29

  • A 10x10 SOM is trained on a chemical analysis of 178 wines from one region in Italy, where the grapes have grown on three different types of soil. The input is 13-dimensional.
  • After training, wines from different soil types activate different regions of the SOM. For example: [Figure: map regions activated by each soil type]
  • Note that the network is not told that the difference between the wines is the soil type, nor how many such types (how many classes) there are.

SOM online example / SOM offline example: http://websom.hut.fi – a two-dimensional, clickable map of Usenet news articles (from comp.ai.neural-nets)


31

Growing neural gas

  • Growing unsupervised network (starting from two nodes)
  • Dynamic neighbourhood
  • Constant parameters
  • Very good at following moving targets
  • Can also follow jumping targets
  • Current work: Using GNG to define and train the hidden layer of Gaussians in an RBF network

32

Node positions

  • Start with two nodes
  • Each node has a set of neighbours, indicated by edges
  • The edges are created and destroyed dynamically during training
  • For each sample, the closest node, k, and all its current neighbours are moved towards the input

33

Node creation

  • A new node (blue) is created every λ'th time step, unless the maximum number of nodes has been reached
  • The new node is placed halfway between the node with the greatest error and the node among its current neighbours with the greatest error
  • The node with the greatest error is the most unstable one

34

Node creation (contd.)

  • Here, a fourth node has just been created
  • In effect, new nodes are created close to where they are most likely needed
  • The exact position of the new node is not crucial, since nodes move around

35

After a while…

[Figure: 7 nodes and 50 nodes (Voronoi regions in red)]

36

Neighbourhood

Neighbourhood edges are created and destroyed as follows:

  • For each sample, let k denote the winner (the node closest to the sample) and r the runner-up (the second closest)
  • If an edge exists between k and r, reset its age to 0
    – Otherwise, create such an edge and set its age to 0
  • Increment the age of all other edges emanating from node k
  • Edges older than a_max are removed, as are any nodes that in this way lose their last remaining edge (see the sketch below)
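The edge bookkeeping might be sketched like this, storing edges as a dict from node pairs to ages (the data structure is an assumption, not from the slides):

```python
def update_edges(edges, k, r, a_max):
    """edges maps frozenset({a, b}) -> age; k is the winner and r the
    runner-up for the current sample."""
    kr = frozenset((k, r))
    edges[kr] = 0                   # create or refresh the k-r edge
    for pair in edges:
        if k in pair and pair != kr:
            edges[pair] += 1        # age all other edges from node k
    for pair in [p for p, age in edges.items() if age > a_max]:
        del edges[pair]             # remove edges older than a_max
    # Nodes that lose their last remaining edge are removed by the caller.
```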


37

Delaunay triangulation

Connect the codebook vectors in all adjacent Voronoi regions. The graph of GNG edges is a subset of the Delaunay triangulation.

[Figure: Voronoi regions (red) and Delaunay triangulation (yellow)]

38

Dead units

  • There is only one way for an edge to get 'younger' – when the two nodes it interconnects are the two closest to the input
  • If one of the two nodes wins, but the other one is not the runner-up, then, and only then, the edge ages
  • If neither of the two nodes wins, the edge does not age!

[Figure: the input distribution has jumped from the lower left to the upper right corner]

39

The lab

(in room 1515!)

  • Classification of bitmaps, by supervised learning (back propagation), using the SNNS simulator
  • An illustration of some unsupervised learning algorithms, using the GNG demo applet
    – LBG / LBG-U (≈ K-means)
    – HCL (Hard competitive learning)
    – Neural gas
    – CHL (Competitive Hebbian learning)
    – Neural gas with CHL
    – GNG / GNG-U (Growing neural gas)
    – SOM (Self-organising map)
    – Growing grid