How can machine learning help to predict changes in size of Atlantic herring?



slide-1
SLIDE 1

How can machine learning help to predict changes in size of Atlantic herring?

Olga Lyashevska∗ Clementine Harma Deirdre Brophy Coilin Minto Maurice Clarke

* Marine and Freshwater Research Centre Galway-Mayo Institute of Technology (GMIT) Galway, Ireland

  • lga.lyashevska@gmit.ie

July 22, 2016

1 / 15

slide-2
SLIDE 2

Background

[Figure: fish length (cm, axis 15 to 35) by year, 1960 to 2010, annotated "Avg decline of 4 cm"]

2 / 15

slide-6
SLIDE 6

Problem

◮ Herring are one of the most important pelagic species exploited by fisheries;

◮ Reductions in growth have consequences for stock productivity;

◮ The cause of the decline remains largely unexplained;

◮ Likely to be driven by the interactive effect of various factors:

  ◮ sea surface temperature;
  ◮ zooplankton abundance;
  ◮ fish abundance;
  ◮ fishing pressure;

3 / 15

slide-7
SLIDE 7

Data

◮ 1959 – 2012;
◮ throughout the year;
◮ random sampling (n = 50 to 100) from commercial vessels;
◮ pelagic trawling;
◮ age and weight-at-length;
◮ total sample size 50,000;

4 / 15

slide-8
SLIDE 8

Study Area

Celtic Sea

5 / 15

slide-9
SLIDE 9

Objective

To identify important variables underlying changes in growth using Gradient Boosting Regression Trees (GBRT)

[Figure: two panels, each labelled "residuals"]

6 / 15

slide-11
SLIDE 11

GBRT

◮ Advantages:

  ◮ Detection of (non-linear) feature interactions;
  ◮ Resistance to inclusion of irrelevant features;
  ◮ Handling of heterogeneous data (features measured on different scales);
  ◮ Robustness to outliers;
  ◮ Accuracy;
  ◮ Different loss functions;

◮ Disadvantages:

  ◮ Requires careful tuning;
  ◮ Slow to train (but fast to predict);

7 / 15

slide-14
SLIDE 14

Formal Specification

$$F_M(x) = \sum_{m=1}^{M} \gamma_m h_m(x) \tag{1}$$

where $\gamma_m$ is a weight and $h_m(x)$ are weak learners.

GBRT builds the additive model in a forward stagewise fashion:

$$F_m(x) = F_{m-1}(x) + \epsilon \gamma_m h_m(x) \tag{2}$$

where $\epsilon$ is the shrinkage (learning rate).

At each stage the weak learner $h_m(x)$ is chosen to minimize the loss function $L$ given the current model $F_{m-1}$ and its fit $F_{m-1}(x_i)$:

$$F_m(x) = F_{m-1}(x) + \arg\min_{h} \sum_{i=1}^{n} L\bigl(y_i, F_{m-1}(x_i) + h(x_i)\bigr) \tag{3}$$
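The forward stagewise update in (2) and (3) can be sketched in plain Python. For the least-squares loss, the weak learner that minimizes (3) is the one fitted to the current residuals, which is what the loop below does. This is an illustration only, not the implementation behind these slides: the depth-1 trees (stumps), the toy data, the 50 stages and the names `fit_stump`, `boost` and `mse` are all assumptions, and the weight γ_m is absorbed into the leaf values of each stump.

```python
import random

def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to residuals r by least squares."""
    best = None
    for t in sorted(set(x))[:-1]:  # every threshold that leaves both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def boost(x, y, n_stages=50, shrinkage=0.05):
    """Forward stagewise boosting with squared-error loss."""
    f0 = sum(y) / len(y)            # F_0: constant model (mean of y)
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_stages):
        # For squared error, the residuals are the target of the next weak learner.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, resid)
        stumps.append(h)
        # F_m = F_{m-1} + shrinkage * h_m, as in equation (2).
        pred = [pi + shrinkage * h(xi) for pi, xi in zip(pred, x)]
    return lambda v: f0 + shrinkage * sum(h(v) for h in stumps)

def mse(predict, x, y):
    return sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)

# Toy one-feature data (made up for illustration, not the herring data).
random.seed(0)
x = [random.uniform(0.0, 10.0) for _ in range(100)]
y = [30.0 - 0.8 * xi + random.gauss(0.0, 0.5) for xi in x]

mean_y = sum(y) / len(y)
mse_const = mse(lambda v: mean_y, x, y)
mse_boost = mse(boost(x, y), x, y)
print(f"constant model MSE: {mse_const:.2f}, boosted MSE: {mse_boost:.2f}")
```

With a small shrinkage each stage corrects only a fraction of the remaining error, which is why a low learning rate is usually paired with many iterations.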

8 / 15

slide-15
SLIDE 15

GBRT hyperparameters

◮ number of iterations = 500;
◮ shrinkage (learning rate) = 0.05;
◮ max tree depth = 6;
◮ subsample = 0.75;
◮ loss function = Least Squares;

scikit
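As a rough guide, the settings above map onto scikit-learn's `GradientBoostingRegressor` as sketched below. The synthetic data, the feature count and the fixed `random_state` are assumptions made for the example; the slides do not show the actual fitting code. The loss is left at its default, which is least squares, because the string naming that loss differs between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data (hypothetical): 300 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 25.0 - 1.5 * X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + rng.normal(scale=1.0, size=300)

model = GradientBoostingRegressor(
    n_estimators=500,    # number of iterations
    learning_rate=0.05,  # shrinkage
    max_depth=6,         # max tree depth
    subsample=0.75,      # fraction of samples drawn for each tree
    random_state=0,      # assumption: fixed seed so the run is repeatable
)                        # loss: default (least squares)
model.fit(X, y)

print("trees fitted:", model.estimators_.shape[0])
```

A `subsample` below 1.0 turns the procedure into stochastic gradient boosting: each tree sees a random 75% of the training rows.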

9 / 15

slide-16
SLIDE 16

Model estimation

[Figure: MSE (axis 1.2 to 2.8) against boosting iterations (100 to 500) for the train, test and validation sets]

◮ MSE: 1.31
◮ R² train: 54.5%
◮ R² test: 51.7%
◮ R² val: 52.6%

Low R² due to individual variability
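For reference, the two metrics reported here can be computed as follows. The sketch assumes R² is the usual coefficient of determination, 1 − SS_res / SS_tot; the four lengths are made-up numbers, not values from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical true and predicted lengths in cm.
y_true = [24.0, 26.0, 28.0, 30.0]
y_pred = [25.0, 26.0, 27.0, 30.0]

example_mse = mse(y_true, y_pred)
example_r2 = r2(y_true, y_pred)
print(example_mse, example_r2)
```

Because R² compares the model's squared error with the variance of the observations, large fish-to-fish variability that the features cannot explain caps the achievable R² even when the average trend is captured well.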

10 / 15

slide-17
SLIDE 17

True vs Predicted

[Figure: predicted length (cm) against true length (cm), both axes from 18 to 34]

11 / 15

slide-18
SLIDE 18

Variable Importance Plot

[Figure: relative importance (axis 0.0 to 1.0) of the features sal, nao, trend, temperature, food and abundance]
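A plot of this kind can be produced from a fitted scikit-learn model roughly as below. This is a sketch under assumptions: the slides do not say how the importances were scaled, so dividing by the largest value (to get a 0 to 1 axis) is a guess, and the data and the feature names `a`, `b`, `c` are invented.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
names = ["a", "b", "c"]  # hypothetical feature names
X = rng.normal(size=(400, 3))
# "a" drives the response strongly, "b" weakly, "c" is pure noise.
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=400)

model = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

raw = model.feature_importances_   # impurity-based importances, summing to 1
relative = raw / raw.max()         # rescale so the top feature scores 1.0
ranking = sorted(zip(names, relative), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

These scores rank features by how much they reduce the training loss across all splits; they say nothing about the direction of an effect.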

12 / 15

slide-19
SLIDE 19

Partial Dependence Plots

[Figure: one-way partial dependence on xmonth, sst and chel1, and two-way partial dependence for the pairs xmonth and sst, xmonth and chel1, sst and chel1]
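The idea behind a one-way partial dependence curve can be sketched in a few lines: fix one feature at each grid value for every row of the data, predict, and average. The function name `partial_dependence`, the toy model `toy_predict` and the four data rows are hypothetical and chosen only to make the mechanics visible; they are not the fitted GBRT or the herring data.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over the data with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value   # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

# Hypothetical model with columns (month, sst): flat below 14, declining above.
def toy_predict(row):
    month, sst = row
    return 25.0 + 0.1 * month - 2.0 * max(0.0, sst - 14.0)

rows = [(1, 13.2), (4, 13.8), (7, 14.5), (10, 14.9)]
grid = [13.0, 13.5, 14.0, 14.5, 15.0]
pd_sst = partial_dependence(toy_predict, rows, feature=1, grid=grid)

for value, pd in zip(grid, pd_sst):
    print(f"sst = {value:.1f}: {pd:.2f}")
```

The two-way plots follow the same recipe with a grid over pairs of feature values, which is how interactions between features become visible.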

13 / 15

slide-23
SLIDE 23

Conclusions

◮ trend, sea surface temperature and food availability are the three most important features;

◮ sea surface temperature above 14 degrees is negatively related to fish length, whereas food availability is invariant;

◮ there is a high degree of interaction between all features;

◮ the results show the relative importance of the variables, not a cause-effect relationship;

14 / 15

slide-24
SLIDE 24

linked.in/lyashevska lyashevska

scikit

Acknowledgements: This research was carried out with the support of the Irish Environmental Protection Agency grant (Ecosystem tipping points: learning from the past to manage for the future, project code 2015-NC-MS-3) and the support of the Marine Institute under the Marine Research Sub-programme funded by the Irish Government.

15 / 15