What is modeling? NEU 466M

What is modeling? NEU 466M. Instructor: Professor Ila R. Fiete. Spring 2016.

Reference: NEURAL NETWORKS FOR PATTERN RECOGNITION, CHRISTOPHER BISHOP
http://cs.du.edu/~mitchell/mario_books/Neural_Networks_for_Pattern_Recognition_-_Christopher_Bishop.pdf

What does modeling mean?
[Figure: an example of ‘a’ and an example of ‘b’.]
Pixels $x_i$ with values 1 or 0 (black or white).

What does modeling mean?
[Figure: an example of ‘a’ and an example of ‘b’.]
What is ‘a’-ness, versus ‘b’-ness?

Equivalent problem encountered by electrophysiologists
[Figure from Quian Quiroga: spikes → ‘a’, ‘b’.]
Categorize each recorded spike as coming from neuron a or neuron b.

What does modeling mean?
[Figure: an example of ‘a’ and an example of ‘b’.]
What is ‘a’-ness, versus ‘b’-ness?

Model: relationship between data and its category
$\{x_1, x_2, \cdots, x_N\} \rightarrow$ ‘a’
$\{x'_1, x'_2, \cdots, x'_N\} \rightarrow$ ‘b’
256 × 256 pixels: $N = 65536$
Store every image with its letter label?

Model: store every possible image with its corresponding letter label?
256 × 256 pixels: $N = 65536$
Number of 256 × 256 black-and-white images: $2^{65536} \sim 10^{20000}$
Atoms in the universe: $\sim 10^{80}$
Houston, we have a problem.
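The count on this slide can be checked with a few lines of arithmetic. The sketch below works with base-10 exponents rather than the huge integer itself; the variable names are illustrative, and the 10^80 figure is simply the slide's rough estimate.

```python
import math

N_PIXELS = 256 * 256  # number of binary pixels in one image

# The number of distinct images is 2**N_PIXELS. Take its base-10 exponent
# instead of building the integer.
log10_images = N_PIXELS * math.log10(2)

ATOMS_LOG10 = 80  # the slide's rough figure: about 10**80 atoms

print(f"pixels per image: {N_PIXELS}")
print(f"distinct images : about 10^{log10_images:.0f}")
print(f"images per atom : about 10^{log10_images - ATOMS_LOG10:.0f}")
```

The exponent comes out a little under 20000, consistent with the order-of-magnitude estimate on the slide.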

Storing each (data, category) pair
• Need too many examples/data to fill the grid between inputs and categories! “Curse of dimensionality”
• Too much data to store! → Compactness
• Not predictive: what to do with a new example? → Generalizability
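One way to see the "grid" problem is to count cells. The sketch below assumes each input dimension is divided into a fixed number of bins, so a lookup table needs at least one stored example per cell; `grid_cells` is a hypothetical helper name, not something from the lecture.

```python
def grid_cells(bins_per_dim: int, n_dims: int) -> int:
    """Number of cells in a grid with `bins_per_dim` bins along each of `n_dims` axes."""
    return bins_per_dim ** n_dims


# With 10 bins per axis, the table size grows by a factor of 10 per added input.
for d in (1, 2, 3, 10, 20):
    print(f"{d:2d} dimensions -> {grid_cells(10, d)} cells")
```

For binary pixels the same count with 2 bins per dimension gives the number of possible images.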

What we want from a model: compactness and generalizability.

One solution: feature selection
• Look at some much smaller set of characteristic features that define the classes.
• How to choose these?
  - by “hand”
  - some “automatic” technique
(Sounds magical, but this is the goal of much of statistics and machine learning; we will consider how to automatically find features in this class.)

Features
$\tilde{x}_1$: height-to-width ratio of object
$\tilde{x}_2$: some other feature

Features
[Figure legend: one marker for ‘a’, × for ‘b’.]
$\tilde{x}_1$: height-to-width ratio of object
$\tilde{x}_2$: some other feature

Features
[Figure legend: one marker for ‘a’, × for ‘b’.]
More features can be helpful: using $\tilde{x}_1$ alone would lead to poor categorization.

Features
• If adding features improves performance, keep adding independent features?
• Will this continue to improve performance?
At some point, NO! Performance will get worse. WHY?

A more familiar example: regression
• Instead of discrete categories (‘a’, ‘b’), each datapoint (or data vector) maps to some value of a continuous variable ($y$).
$\{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$

$\{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$
$x_1$: independent variable
$y_1$: response or dependent variable

Modeling as regression
$\{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$
What does it mean to model this data?
- Want to write $y$ as some function of $x$
- Want to fit a function through $x$, $y$
- Given $x$, want to predict $y$

Regression: curve-fitting
$\{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$
$$\tilde{y}(x) = w_0 + w_1 x + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$
Free parameters: $(w_0, w_1, \cdots, w_M)$
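As a small illustration of the model on this slide, the polynomial can be evaluated directly from its coefficient list. This is a minimal sketch; the function name `poly_model` and the use of Horner's rule are choices made here, not part of the lecture.

```python
from typing import Sequence


def poly_model(x: float, w: Sequence[float]) -> float:
    """Evaluate w[0] + w[1]*x + ... + w[M]*x**M using Horner's rule."""
    result = 0.0
    # Start from the highest-order coefficient and fold in one power of x per step.
    for coeff in reversed(w):
        result = result * x + coeff
    return result


# A degree-2 example: 1 + 2x + 3x^2 evaluated at x = 2.
print(poly_model(2.0, [1.0, 2.0, 3.0]))
```

The length of `w` fixes the degree: M + 1 coefficients describe an Mth-order polynomial.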

Polynomial regression
• The larger M, the higher the degree of the polynomial → more complex model/more features.
• Expect fit to get better with increasing M. When M = N, the fit to all datapoints is exact (an Mth-order polynomial has M+1 parameters and M roots, so it has at least as many free parameters as there are datapoints).
• So are the more complex models better?
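The exact-fit claim can be written out as a linear system. The block below is a sketch under the assumption that the N inputs are all distinct; it uses the same symbols as the curve-fitting slide.

```latex
% Assumption: the N inputs x_1, ..., x_N are all distinct.
% Requiring zero error at every datapoint gives N linear equations
% in the M+1 unknowns w_0, ..., w_M:
\begin{equation*}
\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^M \\
1 & x_2 & x_2^2 & \cdots & x_2^M \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_N & x_N^2 & \cdots & x_N^M
\end{pmatrix}
\begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
\end{equation*}
% With M + 1 >= N unknowns and distinct x_n, the rows are linearly
% independent, so at least one w satisfies every equation exactly.
```

An exact fit here says nothing about how the polynomial behaves between or beyond the datapoints.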

Parameters chosen to minimize fit error
Common error function: sum-of-squares:
$$E = \frac{1}{2} \sum_{n=1}^{N} \left[\tilde{y}(x_n; w) - y_n\right]^2$$
(Is this the only choice? No. Best choice? Interesting question: we'll get to it.)
$$w^* = \arg\min_{w} \frac{1}{2} \sum_{n=1}^{N} \left[\tilde{y}(x_n; w) - y_n\right]^2$$
(How to implement? Matlab: polyfit. Theory: we'll get to it.)
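The slide points to Matlab's polyfit for the implementation. As a rough stand-in, here is a pure-Python sketch that minimizes the sum-of-squares error by solving the normal equations. All function names are hypothetical, the 1/2 prefactor is an assumption (the minimizing weights do not depend on it), the toy data are made up for illustration, and this approach becomes numerically fragile for large M.

```python
def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    aug = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w


def fit_polynomial(xs, ys, M):
    """Return (w_0, ..., w_M) minimizing the sum-of-squares error."""
    # Normal equations: (X^T X) w = X^T y, with X[n][j] = xs[n]**j.
    A = [[sum(x ** (i + j) for x in xs) for j in range(M + 1)]
         for i in range(M + 1)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(M + 1)]
    return solve_linear(A, b)


def sum_of_squares_error(xs, ys, w):
    """E = 1/2 * sum over n of [model(x_n; w) - y_n]^2 (prefactor assumed)."""
    return 0.5 * sum(
        (sum(wj * x ** j for j, wj in enumerate(w)) - y) ** 2
        for x, y in zip(xs, ys)
    )


# Toy data (illustrative only): exact samples of y = 1 + 2x.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0 + 2.0 * x for x in xs]
w = fit_polynomial(xs, ys, M=1)
print("weights:", w)
print("error  :", sum_of_squares_error(xs, ys, w))
```

Because the model is linear in the weights, the minimization reduces to one linear solve; no iterative search is needed.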

Linear fit (M=1)
[Figure: degree 1 fit, squared error = 0.45126. N = 11 datapoints; dashed = true function; axes: x, y.]

Quadratic (M=2)
[Figure: degree 2 fit, squared error = 0.45126. N = 11 datapoints; dashed = true function; axes: x, y.]

Cubic (M=3)
[Figure: degree 3 fit, squared error = 0.02289. N = 11 datapoints; dashed = true function; axes: x, y.]

M=9
[Figure: degree 9 fit, squared error = 0.0023272. N = 11 datapoints; dashed = true function; axes: x, y.]

M = 11
[Figure: degree 11 fit, squared error = 1.184e−20. N = 11 datapoints; dashed = true function; axes: x, y.]

Sum-of-squares error
[Figure: squared error versus M, for M = 1 to 11: fit error on training data and on new data.]

Predictability
• Error in fitting the specific training data keeps decreasing with model complexity (M).
• Error of fit to previously un-fit/unseen data improves but then worsens with increasing M.
• Model is overfitting to foibles of the training data (noise) after M = 3.
• Model becomes both more complex and less predictive beyond M = 3 features.
• Key technique: cross-validation. Test model on previously unseen data. Hold-out dataset or jack-knife/leave-one-out approaches.
(There are other ways to improve predictability by reducing complexity, e.g. by directly constraining the complexity of the model: “regularization”.)
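Leave-one-out cross-validation can be sketched in a few dozen lines. The example below is not the lecture's code or data: it assumes synthetic noisy samples of a sine wave, fits polynomials with a simple normal-equations solver (fine for small M only), and compares the training error with the leave-one-out error for each degree. All names are hypothetical.

```python
import math
import random


def fit_polynomial(xs, ys, M):
    """Least-squares polynomial weights (w_0..w_M) via the normal equations."""
    n = M + 1
    # Augmented system [X^T X | X^T y], where X[k][j] = xs[k]**j.
    aug = [[sum(x ** (i + j) for x in xs) for j in range(n)]
           + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w


def predict(x, w):
    """Evaluate the fitted polynomial at x."""
    return sum(wj * x ** j for j, wj in enumerate(w))


def training_error(xs, ys, M):
    """Sum of squared residuals when fitting and testing on the same points."""
    w = fit_polynomial(xs, ys, M)
    return sum((predict(x, w) - y) ** 2 for x, y in zip(xs, ys))


def leave_one_out_error(xs, ys, M):
    """Sum of squared errors, each point predicted by a fit that never saw it."""
    total = 0.0
    for k in range(len(xs)):
        train_x = xs[:k] + xs[k + 1:]
        train_y = ys[:k] + ys[k + 1:]
        w = fit_polynomial(train_x, train_y, M)
        total += (predict(xs[k], w) - ys[k]) ** 2
    return total


# Synthetic data (assumed for illustration): 11 noisy samples of a sine wave.
rng = random.Random(0)
xs = [k / 10 for k in range(11)]
ys = [0.5 + 0.4 * math.sin(2 * math.pi * x) + rng.gauss(0, 0.05) for x in xs]

degrees = list(range(0, 6))
train = [training_error(xs, ys, M) for M in degrees]
held_out = [leave_one_out_error(xs, ys, M) for M in degrees]
best_M = min(degrees, key=lambda M: held_out[M])

for M in degrees:
    print(f"M={M}: training={train[M]:.4f}  leave-one-out={held_out[M]:.4f}")
print("degree with lowest leave-one-out error:", best_M)
```

The training error can only go down as M grows, because each larger model contains the smaller ones; the leave-one-out error is the quantity to use when choosing M.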

Back to categorization example
[Figure: three classifiers of increasing complexity: simplest; intermediate; most flexible/complex, which exhibits overfitting.]

Better features: admit simpler model
[Figure: → ‘a’, ‘b’ under a better choice of features and under a poor choice of features.]
(In the regression example, data were generated from a sine wave. Using sines instead of polynomials would have produced an excellent 2-parameter fit.)
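The remark about sines can be illustrated with a two-parameter fit. The slide does not say which two parameters it has in mind, so the sketch below assumes an offset and an amplitude, with the frequency and phase taken as known; the data are synthetic and noise-free, and the names are hypothetical.

```python
import math


def fit_two_parameter(features, ys):
    """Least-squares fit of y ~ w0 + w1 * feature (simple linear regression)."""
    n = len(features)
    mean_f = sum(features) / n
    mean_y = sum(ys) / n
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(features, ys))
    var = sum((f - mean_f) ** 2 for f in features)
    w1 = cov / var
    w0 = mean_y - w1 * mean_f
    return w0, w1


# Synthetic, noise-free data (assumed): one period of a sine wave on [0, 1].
xs = [k / 10 for k in range(11)]
ys = [0.5 + 0.4 * math.sin(2 * math.pi * x) for x in xs]

# Two parameters spent on a well-chosen feature, sin(2*pi*x).
sine_feature = [math.sin(2 * math.pi * x) for x in xs]
w0, w1 = fit_two_parameter(sine_feature, ys)
sine_residual = sum((w0 + w1 * s - y) ** 2 for s, y in zip(sine_feature, ys))

# The same two-parameter budget spent on the raw input x (a straight line).
a0, a1 = fit_two_parameter(xs, ys)
line_residual = sum((a0 + a1 * x - y) ** 2 for x, y in zip(xs, ys))

print(f"sine feature: w0={w0:.3f}, w1={w1:.3f}, squared error={sine_residual:.2e}")
print(f"raw x       : a0={a0:.3f}, a1={a1:.3f}, squared error={line_residual:.2e}")
```

Both fits use two free parameters; only the choice of feature differs.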

Summary
• A good model can describe the data in a relatively simple/low-complexity/compact way (but not too low! Einstein: as simple as possible, but no simpler) and has good prediction performance.
• Extracting “features” of data as a way to model it.
• To determine predictability, it is important to cross-validate models/fits.
