Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data

  1. Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data. Patrick Schratz¹, Jannes Muenchow¹, Jakob Richter², Alexander Brenning¹. GIScience Seminar Series, Jena, 14 Feb 2018. ¹ Department of Geography, GIScience group, University of Jena; ² Department of Statistics, TU Dortmund. https://pat-s.github.io, @pjs_228, @pat-s, patrick.schratz@uni-jena.de

  2. Outline: 1. Introduction, 2. Data and study area, 3. Methods, 4. Results, 5. Discussion

  3. Introduction

  4. Introduction. LIFE Healthy Forest: early detection and advanced management systems to reduce forest decline caused by invasive and pathogenic agents. Main task: spatial (modeling) analysis to support the early detection of various pathogens and the transfer of predictions to other areas. Pathogens: Fusarium circinatum, Diplodia pinea (needle blight), Armillaria root disease, Heterobasidion annosum. Fig. 1: Needle blight caused by Diplodia pinea.

  5. Introduction. Motivation: Find the model with the highest predictive performance for our dataset. The results are assumed to be representative of datasets with similar predictors and different pathogens as the response. Be aware of spatial autocorrelation. Conduct "optimal" hyperparameter tuning for the machine-learning models. Show and analyze differences in performance between spatial and non-spatial cross-validation.

  6. Data & Study Area

  7. Data & Study Area. Skim summary statistics (n obs: 926, n variables: 12)

  Variable type: factor
  variable       missing  n    n_unique  top_counts
  diplo01        0        926  2         0: 703, 1: 223, NA: 0
  lithology      0        926  5         clas: 602, chem: 143, biol: 136, surf: 32
  soil           0        926  7         soil: 672, soil: 151, soil: 35, pron: 22
  year           0        926  4         2009: 401, 2010: 261, 2012: 162, 2011: 102

  Variable type: numeric
  variable       missing  n    mean      p0      p50     p100    hist
  age            0        926  18.94     2       20      40      ▂▃▅▆▇▂▂▁
  elevation      0        926  338.74    0.58    327.22  885.91  ▃▇▇▇▅▅▂▁
  hail_prob      0        926  0.45      0.018   0.55    1       ▇▅▁▂▆▇▃▁
  p_sum          0        926  234.17    124.4   224.55  496.6   ▅▆▇▂▂▁▁▁
  ph             0        926  4.63      3.97    4.6     6.02    ▃▅▇▂▂▁▁▁
  r_sum          0        926  -0.00004  -0.1    0.0086  0.082   ▁▂▅▃▅▇▃▂
  slope_degrees  0        926  19.81     0.17    19.47   55.11   ▃▆▇▆▅▂▁▁
  temp           0        926  15.13     12.59   15.23   16.8    ▁▁▃▃▆▇▅▁

  8. Data & Study Area. Fig. 2: Study area (Basque Country, Spain)

  9. Methods

  10. Methods. Machine-learning models: Boosted Regression Trees (BRT), Random Forest (RF), Support Vector Machine (SVM), Weighted k-Nearest Neighbor (WKNN). Parametric models: Generalized Additive Model (GAM), Generalized Linear Model (GLM). Performance measure: Area under the Receiver Operating Characteristic curve (AUROC).
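
AUROC equals the probability that a randomly chosen positive observation (for this dataset, presumably `diplo01 = 1`) receives a higher predicted score than a randomly chosen negative one, with ties counted as one half. The stdlib-only Python sketch below, which is an illustration and not the authors' code, computes it both from that pairwise definition and from the equivalent Mann-Whitney rank-sum formula; the scores and labels are invented.

```python
def auroc_pairwise(scores, labels):
    """AUROC from its definition: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count 1/2
    return wins / (len(pos) * len(neg))


def auroc_ranks(scores, labels):
    """Same quantity via the Mann-Whitney U statistic, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(auroc_pairwise(scores, labels), auroc_ranks(scores, labels))  # both 5/6
```

The pairwise version is easiest to read but needs O(n_pos * n_neg) comparisons; the rank version only needs a sort, which matters for larger test folds.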

  11. Methods. Nested cross-validation: cross-validation for performance estimation [outer level]; cross-validation for hyperparameter tuning (random search) [inner level]. Different sampling strategies (performance estimation / tuning): Non-Spatial / Non-Spatial; Spatial / Non-Spatial; Spatial / Spatial; Non-Spatial / No Tuning; Spatial / No Tuning.
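
A minimal, stdlib-only Python sketch of the two-level scheme above, assuming a hypothetical toy learner (`knn_scores`, an unweighted k-nearest-neighbour scorer with a single hyperparameter `k`) and synthetic data; it is not the authors' implementation. The outer loop estimates AUROC on held-out folds, while the inner loop runs a random search over `k` using only the outer training data. The partitioning functions are passed in, so a spatial partitioner with the same `(data, n_folds, rng)` signature can be substituted for either level.

```python
import math
import random


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined if a fold contains only one class
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_scores(train, test, k):
    """Hypothetical toy learner: share of positives among the k nearest training points."""
    scores = []
    for coords, _ in test:
        nearest = sorted(train, key=lambda t: math.dist(coords, t[0]))[:k]
        scores.append(sum(label for _, label in nearest) / len(nearest))
    return scores


def random_partition(data, n_folds, rng):
    """Non-spatial partitioning: shuffle indices and deal them into folds."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def split(data, test_idx):
    held_out = set(test_idx)
    return [d for i, d in enumerate(data) if i not in held_out], [data[i] for i in test_idx]


def nanmean(values):
    finite = [v for v in values if not math.isnan(v)]
    return sum(finite) / len(finite) if finite else float("nan")


def nested_cv(data, outer_partition, inner_partition, n_outer=5, n_inner=5, n_iter=8, seed=1):
    rng = random.Random(seed)
    outer_aucs = []
    for outer_test_idx in outer_partition(data, n_outer, rng):
        train, test = split(data, outer_test_idx)
        # Inner level: random search over k, evaluated only on the outer training set.
        best_k, best_auc = 1, -math.inf
        inner_folds = inner_partition(train, n_inner, rng)
        for _ in range(n_iter):
            k = rng.randint(1, 30)  # one random hyperparameter setting per iteration
            aucs = []
            for inner_test_idx in inner_folds:
                inner_train, inner_test = split(train, inner_test_idx)
                aucs.append(auroc(knn_scores(inner_train, inner_test, k),
                                  [y for _, y in inner_test]))
            mean_auc = nanmean(aucs)
            if mean_auc > best_auc:
                best_k, best_auc = k, mean_auc
        # Outer level: refit with the tuned k and score the untouched test fold.
        outer_aucs.append(auroc(knn_scores(train, test, best_k), [y for _, y in test]))
    return outer_aucs


# Synthetic data: ((x, y), label) with a smooth spatial trend plus noise.
gen = random.Random(42)
data = []
for _ in range(150):
    x, y = gen.uniform(0, 10), gen.uniform(0, 10)
    data.append(((x, y), int(x + y + gen.gauss(0, 1.5) > 10)))

results = nested_cv(data, random_partition, random_partition)
print([round(a, 3) for a in results])
```

In this sketch the "No Tuning" settings would correspond to skipping the inner loop and using a fixed default `k`; the key design point is that the inner search never sees the outer test fold.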

  12. Methods. Nested (spatial) cross-validation. Fig. 3: Nested spatial/non-spatial cross-validation

  13. Methods. Nested (spatial) cross-validation. Fig. 4: Comparison of spatial and non-spatial partitioning of the dataset.
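
Fig. 4 is only available as an image. One common way to build spatial folds, assumed here rather than taken from the slides, is k-means clustering of the sample coordinates so that each cluster becomes one test fold. The stdlib-only sketch below compares such folds with random (non-spatial) folds using the mean distance from each held-out point to its nearest training point, a rough, illustrative proxy for how much spatial autocorrelation a model could exploit; all names and data are hypothetical.

```python
import math
import random


def kmeans_folds(coords, n_folds, n_iter=50, seed=0):
    """Spatial partitioning sketch: k-means on coordinates, one cluster per fold."""
    rng = random.Random(seed)
    centers = rng.sample(coords, n_folds)
    assign = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        assign = [min(range(n_folds), key=lambda c: math.dist(p, centers[c])) for p in coords]
        # Update step: move centers to cluster means (keep old center if a cluster is empty).
        new_centers = []
        for c in range(n_folds):
            members = [p for p, a in zip(coords, assign) if a == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return [[i for i, a in enumerate(assign) if a == c] for c in range(n_folds)]


def random_folds(n, n_folds, seed=0):
    """Non-spatial partitioning: random assignment of indices to folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def mean_gap(coords, folds):
    """Average distance from each held-out point to its nearest training point."""
    gaps = []
    for test in folds:
        held = set(test)
        train = [p for i, p in enumerate(coords) if i not in held]
        gaps += [min(math.dist(coords[i], q) for q in train) for i in test]
    return sum(gaps) / len(gaps)


gen = random.Random(1)
coords = [(gen.uniform(0, 10), gen.uniform(0, 10)) for _ in range(200)]
spatial = kmeans_folds(coords, 5)
non_spatial = random_folds(len(coords), 5)
print(round(mean_gap(coords, non_spatial), 2), round(mean_gap(coords, spatial), 2))
```

Random folds leave every test point surrounded by training points, while clustered folds push most test points away from the training data, which is the intuition behind the spatial/non-spatial contrast in the results.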

  14. Methods. Hyperparameter tuning: Compared to grid search, random search has desirable properties in high-dimensional settings and no disadvantages in low-dimensional settings (Bergstra & Bengio, 2012).
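
To make the Bergstra & Bengio argument concrete, the sketch below assumes a hypothetical objective in which only one of two hyperparameters matters much. With the same budget of 9 evaluations, a 3 x 3 grid tries only 3 distinct values of the important hyperparameter, whereas random search tries 9. The objective and its optimum at 0.73 are invented for illustration.

```python
import itertools
import random


def objective(important, unimportant):
    """Hypothetical score surface, driven almost entirely by `important`."""
    return -(important - 0.73) ** 2 - 0.01 * (unimportant - 0.5) ** 2


budget = 9
levels = [0.1, 0.5, 0.9]
grid = list(itertools.product(levels, levels))  # 3 x 3 = 9 configurations
rng = random.Random(7)
random_configs = [(rng.random(), rng.random()) for _ in range(budget)]


def summarize(configs):
    """Distinct values of the important hyperparameter tried, and the best score found."""
    distinct = len({c[0] for c in configs})
    best = max(objective(*c) for c in configs)
    return distinct, best


print("grid:  ", summarize(grid))
print("random:", summarize(random_configs))
```

The grid spends two thirds of its budget re-testing the same values of `important` while varying a hyperparameter that barely matters; random search does not have that blind spot.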

  15. Results

  16. Results. Hyperparameter tuning. Fig. 4: Hyperparameter tuning results of the Spatial/Spatial CV setting for BRT, WKNN, RF and SVM: number of tuning iterations (1 iteration = 1 random hyperparameter setting) vs. predictive performance (AUROC).
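
Assuming each curve in this figure traces the best inner-CV AUROC found so far after a given number of random-search iterations, such a curve is a running maximum. The sketch below computes it and reports a simple saturation point (`saturation_point` and its tolerance `tol` are hypothetical helpers); the AUROC values are made up for illustration and are not results from this study.

```python
from itertools import accumulate


def best_so_far(aucs):
    """Running maximum of inner-CV AUROC over random-search iterations."""
    return list(accumulate(aucs, max))


def saturation_point(curve, tol=0.001):
    """First iteration (1-based) after which the curve improves by at most `tol` in total."""
    final = curve[-1]
    for i, value in enumerate(curve, start=1):
        if final - value <= tol:
            return i
    return len(curve)


# Illustrative AUROC per iteration (one random hyperparameter setting each).
aucs = [0.61, 0.66, 0.64, 0.70, 0.69, 0.705, 0.70]
curve = best_so_far(aucs)
print(curve, saturation_point(curve, tol=0.01))
```

Because the curve is monotone, a flat tail means extra iterations only re-confirm the current best setting, which is how a statement like "saturates at ~50 iterations" can be read.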

  17. Results (Predictive Performance). Fig. 5: (Nested) CV estimates of model performance at the repetition level using 200 random search iterations. The CV setting refers to performance estimation/hyperparameter tuning of the respective (nested) CV, e.g. "Spatial/Non-Spatial" means that spatial partitioning was used for performance estimation and non-spatial partitioning for hyperparameter tuning.

  18. Discussion

  21. Discussion (Performance). Fig. 6: (Nested) CV estimates of model performance at the repetition level using 200 random search iterations. The CV setting refers to performance estimation/hyperparameter tuning of the respective (nested) CV, e.g. "Spatial/Non-Spatial" means that spatial partitioning was used for performance estimation and non-spatial partitioning for hyperparameter tuning.

  22. Discussion. Predictive performance: RF and GAM showed the best predictive performance. Performance estimates are strongly biased when non-spatial CV is used. The parametric models (GLM, GAM) achieve performance estimates as good as those of the best ML algorithm (RF).

  23. Discussion. Iturritxa et al. (2014): GLM: 0.65 AUROC (without predictor hail); GLM: 0.96 AUROC (with predictor hail). This work: GLM: 0.66 AUROC (without predictor hail_prob; with slope, pH, lithology, soil); GLM: 0.694 AUROC (with predictor hail_prob; with slope, pH, lithology, soil).

  27. Discussion (Tuning)

  28. Discussion. Hyperparameter tuning: Tuning saturates at ~50 iterations and has a small effect for SVM and BRT (arbitrary defaults). It has almost no effect for WKNN and RF (reasonable defaults). The default hyperparameters of RF (and of all other learners) are not suitable for spatial data: they possibly lead to biased performance estimates because they cause the fitted models to make use of the remaining spatial autocorrelation in the data. Meaningful default values (RF, WKNN) have been estimated on non-spatial datasets. Always perform spatial hyperparameter tuning for spatial datasets, even if it does not improve accuracy.
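
The argument above hinges on remaining spatial autocorrelation. Moran's I is one standard diagnostic for it; it does not appear in the slides and is added here only as an illustration, e.g. for checking model residuals. A minimal stdlib sketch with binary distance-band weights (`max_dist` is an assumed parameter): values near -1/(n-1) indicate no autocorrelation, positive values indicate that nearby observations are similar.

```python
import math


def morans_i(coords, values, max_dist):
    """Global Moran's I with binary weights w_ij = 1 if i != j and d_ij <= max_dist.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                num += dev[i] * dev[j]
                w_sum += 1.0
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(morans_i(line, [1, 1, 0, 0], max_dist=1))  # clustered values: positive
print(morans_i(line, [1, 0, 1, 0], max_dist=1))  # alternating values: negative
```

The double loop is O(n^2) and fine for a few hundred points; for larger datasets a spatial index or sparse weights would be preferable.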
