How can machine learning help to predict changes in size of Atlantic herring?
Olga Lyashevska, Clementine Harma, Deirdre Brophy, Coilin Minto, Maurice Clarke
* Marine and Freshwater Research Centre, Galway-Mayo Institute of Technology (GMIT)


  1. How can machine learning help to predict changes in size of Atlantic herring?
     Olga Lyashevska*, Clementine Harma, Deirdre Brophy, Coilin Minto, Maurice Clarke
     * Marine and Freshwater Research Centre, Galway-Mayo Institute of Technology (GMIT), Galway, Ireland
     olga.lyashevska@gmit.ie
     July 22, 2016

  2. Background
     Average decline of 4 cm.
     [Figure: fish length (cm) against year, 1960 to 2010.]

  6. Problem
     ◮ Herring are one of the most important pelagic species exploited by fisheries;
     ◮ Reductions in growth have consequences for stock productivity;
     ◮ The cause of the decline remains largely unexplained;
     ◮ Likely to be driven by the interactive effect of various factors:
       ◮ sea surface temperature;
       ◮ zooplankton abundance;
       ◮ fish abundance;
       ◮ fishing pressure;

  7. Data
     ◮ 1959 – 2012;
     ◮ throughout the year;
     ◮ random sampling (n = 50 to 100) from commercial vessels;
     ◮ pelagic trawling;
     ◮ age and weight-at-length;
     ◮ total sample size 50,000;

  8. Study Area
     Celtic Sea

  9. Objective
     To identify important variables underlying changes in growth using Gradient Boosting Regression Trees (GBRT).
     [Diagram labels: residuals]

  11. GBRT
      ◮ Advantages:
        ◮ Detection of (non-linear) feature interactions;
        ◮ Resistance to inclusion of irrelevant features;
        ◮ Heterogeneous data (features measured on different scales);
        ◮ Robustness to outliers;
        ◮ Accuracy;
        ◮ Different loss functions
      ◮ Disadvantages:
        ◮ Requires careful tuning;
        ◮ Slow to train (but fast to predict);

  14. Formal Specification
      $$F_M(x) = \sum_{m=1}^{M} \gamma_m h_m(x) \qquad (1)$$
      where $\gamma_m$ is a weight and $h_m(x)$ are weak learners.
      GBRT builds the additive model in a forward stagewise fashion:
      $$F_m(x) = F_{m-1}(x) + \epsilon \gamma_m h_m(x) \qquad (2)$$
      where $\epsilon$ is the shrinkage (learning rate).
      At each stage the weak learner $h_m(x)$ is chosen to minimize the loss function $L$ given the current model $F_{m-1}$ and its fit $F_{m-1}(x_i)$:
      $$F_m(x) = F_{m-1}(x) + \arg\min_{h} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) + h(x_i)\big) \qquad (3)$$
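The forward stagewise update can be illustrated with a minimal pure-Python sketch. It assumes a least-squares loss, one input variable and regression stumps as weak learners; the data are synthetic and the function names (`fit_stump`, `boost`, `predict`) are hypothetical, not taken from the presentation. For least squares, each new stump is simply fitted to the current residuals, and the weight of each learner is folded into the stump's leaf values.

```python
import math
import random


def fit_stump(xs, residuals):
    """Least-squares regression stump: returns (threshold, left_mean, right_mean)."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]


def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def boost(xs, ys, n_stages=60, shrinkage=0.1):
    """Forward stagewise fit: each stage adds shrinkage * stump(x) to the model."""
    f0 = sum(ys) / len(ys)              # initial constant model
    preds = [f0] * len(xs)
    stumps, mse_path = [], []
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)    # weak learner for this stage
        preds = [p + shrinkage * stump_predict(stump, x) for p, x in zip(preds, xs)]
        stumps.append(stump)
        mse_path.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return f0, stumps, mse_path


def predict(f0, stumps, shrinkage, x):
    return f0 + shrinkage * sum(stump_predict(s, x) for s in stumps)


# Synthetic one-dimensional data (illustration only, not the herring data).
random.seed(0)
xs = [random.uniform(0.0, 10.0) for _ in range(100)]
ys = [math.sin(x) + random.gauss(0.0, 0.1) for x in xs]

f0, stumps, mse_path = boost(xs, ys, n_stages=60, shrinkage=0.1)
print(round(mse_path[0], 3), round(mse_path[-1], 3))
```

With a shrinkage between 0 and 1 and stumps fitted by least squares, the training MSE in this sketch cannot increase from one stage to the next, which is why a smaller shrinkage needs more stages to reach the same fit.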

  15. GBRT hyperparameters
      ◮ number of iterations = 500;
      ◮ shrinkage (learning rate) = 0.05;
      ◮ max tree depth = 6;
      ◮ subsample = 0.75;
      ◮ loss function = Least Squares;
      [Logo: scikit]
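As a sketch of how these settings could be expressed, the block below assumes the model was fitted with scikit-learn's `GradientBoostingRegressor` (the slide shows only a scikit logo, so the library is an assumption). The data are synthetic stand-ins, and the mapping of slide terms to parameter names is the editor's reading. The loss is left at its default, which is least squares; its string name differs between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data (hypothetical): the herring data set is not reproduced here.
rng = np.random.RandomState(0)
X = rng.uniform(size=(300, 4))
y = 25.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=300)

model = GradientBoostingRegressor(
    n_estimators=500,     # number of iterations
    learning_rate=0.05,   # shrinkage
    max_depth=6,          # max tree depth
    subsample=0.75,       # fraction of rows drawn for each tree
    random_state=0,       # fixed seed so the sketch is repeatable
)                         # loss is left at the default (least squares)
model.fit(X, y)

print(len(model.estimators_))
```

A subsample below 1.0 gives stochastic gradient boosting: each tree sees a random 75% of the training rows.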

  16. Model estimation
      ◮ MSE: 1.31
      ◮ R² train: 54.5%
      ◮ R² test: 51.7%
      ◮ R² val: 52.6%
      Low R² due to individual variability.
      [Figure: MSE against boosting iterations (0 to 500) for the train, test and validation sets.]
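Error curves of this kind can be produced along the following lines. This is a sketch on synthetic data, assuming scikit-learn; the 60/20/20 split and the helper name `mse_curve` are assumptions, not details from the slides. `staged_predict` yields the model's predictions after each boosting iteration, so one fitted model gives the whole curve.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical), with deliberate noise in y.
rng = np.random.RandomState(1)
X = rng.uniform(size=(600, 3))
y = 25.0 + 4.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + rng.normal(scale=1.0, size=600)

# Assumed 60/20/20 split into train, test and validation sets.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=1)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

model = GradientBoostingRegressor(
    n_estimators=500, learning_rate=0.05, max_depth=6, subsample=0.75, random_state=1
).fit(X_train, y_train)


def mse_curve(X_part, y_part):
    """MSE after each boosting iteration on one data partition."""
    return [mean_squared_error(y_part, pred) for pred in model.staged_predict(X_part)]


parts = {"train": (X_train, y_train), "test": (X_test, y_test), "val": (X_val, y_val)}
curves = {name: mse_curve(Xp, yp) for name, (Xp, yp) in parts.items()}
scores = {name: r2_score(yp, model.predict(Xp)) for name, (Xp, yp) in parts.items()}

for name in parts:
    print(name, round(curves[name][-1], 3), round(scores[name], 3))
```

Plotting the three lists in `curves` against the iteration index shows where the held-out error flattens, which is one way to judge whether the chosen number of iterations is adequate.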

  17. True vs Predicted
      [Figure: predicted length (cm) against true length (cm), both axes from 18 to 34.]

  18. Variable Importance Plot
      [Figure: importance scores on a 0.0 to 1.0 axis for the variables nao, sal, trend, temperature, abundance and food.]
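Importance scores like these can be read from a fitted model as sketched below. The sketch assumes scikit-learn and synthetic data; the variable names are borrowed from the plot labels, but the simulated effects are invented for illustration. Rescaling by the largest score, so that the top variable equals 1.0, is an assumption about how the slide's axis was produced.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Names taken from the plot labels; the data and effect sizes are hypothetical.
names = ["trend", "temperature", "abundance", "food", "sal", "nao"]
rng = np.random.RandomState(2)
X = rng.uniform(size=(500, len(names)))
y = 28.0 - 4.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500)

model = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.05, max_depth=3, subsample=0.75, random_state=2
).fit(X, y)

importances = model.feature_importances_     # impurity-based scores, sum to 1
relative = importances / importances.max()   # rescaled so the top variable is 1.0

ranked = sorted(zip(names, relative), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name:12s} {score:.2f}")
```

These scores rank variables by how much they reduce the training loss across all trees; they do not indicate the direction of an effect.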

  19. Partial Dependence Plots
      [Figure: one-way partial dependence of fish length on xmonth, sst and chel1, and two-way partial dependence contour panels over pairs of xmonth, sst and chel1.]
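One-way partial dependence can be computed by hand, as in the sketch below: fix one feature at a grid value for every row, average the model's predictions, and repeat over the grid. The sketch assumes scikit-learn for the model only; the data are synthetic, the simulated drop in length above an sst of 14 is built in for illustration, and the helper `partial_dependence_1d` is hypothetical. Recent scikit-learn versions also offer their own tools in `sklearn.inspection`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data; column names follow the plot labels (xmonth, sst, chel1).
rng = np.random.RandomState(3)
n = 500
xmonth = rng.randint(1, 13, size=n).astype(float)
sst = rng.uniform(13.0, 15.0, size=n)
chel1 = rng.uniform(15.0, 75.0, size=n)
X = np.column_stack([xmonth, sst, chel1])
# Hypothetical response: length falls once sst exceeds 14.
y = 26.0 - 2.0 * np.maximum(sst - 14.0, 0.0) + rng.normal(scale=0.5, size=n)

model = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.05, max_depth=3, random_state=3
).fit(X, y)


def partial_dependence_1d(model, X, column, grid):
    """Average prediction with one column fixed at each grid value, centred on zero."""
    averages = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, column] = value      # overwrite the feature for every row
        averages.append(model.predict(X_fixed).mean())
    averages = np.array(averages)
    return averages - averages.mean()


sst_grid = np.linspace(13.0, 15.0, 9)
pd_sst = partial_dependence_1d(model, X, 1, sst_grid)
for value, effect in zip(sst_grid, pd_sst):
    print(f"{value:.2f} {effect:+.2f}")
```

A two-way version follows the same pattern with two columns fixed at once over a grid of value pairs, which gives the contour panels.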

  23. Conclusions
      ◮ trend, sea surface temperature and food availability are the three most important features;
      ◮ sea surface temperature above 14 °C is negatively related to fish length, whereas the response to food availability is invariant;
      ◮ there is a high degree of interaction between all features;
      ◮ the results indicate the relative importance of the variables, not cause-effect relationships;

  24. lyashevska, linked.in/lyashevska
      Acknowledgements: This research was carried out with the support of the Irish Environmental Protection Agency grant (Ecosystem tipping points: learning from the past to manage for the future, project code 2015-NC-MS-3) and the support of the Marine Institute under the Marine Research Sub-programme funded by the Irish Government.
