ACTUARIES & DATA SCIENCE Jerome Tuttle, FCAS, CPCU - - PowerPoint PPT Presentation



slide-1
SLIDE 1

ACTUARIES & DATA SCIENCE

Jerome Tuttle, FCAS, CPCU Retired Actuary


slide-2
SLIDE 2

What is an actuary?

  • The mathematicians of the insurance industry.
  • A business professional who deals with the financial impact of risk and uncertainty.
  • Analyzes, manages, and measures the financial impact of risk and uncertainty.
  • Develops and validates models and communicates results to guide decision-making.
  • Actuaries in movies: Jack Nicholson – About Schmidt (2002); Ben Stiller – Along Came Polly (2004).

slide-3
SLIDE 3

Insurance is a unique business

  • We don’t know our cost (claims) when we sell the policy, and with some claims we don’t know for many years.
  • We are not required to sell to everyone – similar to bank loans and college admissions.
  • We do not charge the same price to everyone. This is REQUIRED by law, e.g., FL Statute 627.062:

Rates may not be unfairly discriminatory. A rate is unfairly discriminatory to a group of risks if the rate does not bear a reasonable relationship to the expected loss experience among the risks.

slide-4
SLIDE 4

What is data science?

The intersection among math/stats, computer sci, & subject matter knowledge to extract meaningful insights from data, translating into tangible business value.

slide-5
SLIDE 5

Examples of data science

  • Internet search engine algorithms.
  • Targeted advertising and recommendations.
  • Target Stores sent diaper coupons to the pregnant teenager before she told her father. (Folklore?)
  • Moneyball and sports analytics.
  • Better singles matching on dating websites.
  • Disease diagnosis, personalized healthcare recs.
  • Data driven crime prediction, facial recognition, terrorist forecasts.
  • Which tweets did Trump write, and which did his staff write?

slide-6
SLIDE 6

Actuaries and data science

  • “Actuaries were among the first data scientists.” (Colin Priest, actuary turned data scientist at DataRobot, Singapore.)
  • Actuaries are strongest at math/stat and domain knowledge (we study insurance, besides math/stat).
  • Data scientists are strongest at computer science, especially coding, data manipulation and joining tables, theory of machine learning (training versus testing, overtraining), and machine learning algorithms.
  • Actuarial exams now include: Generalized linear models, K-nearest neighbors, K-means clustering, Bayes classifier, decision trees, random forest, principal component analysis. Also a predictive analytics specialty.

slide-7
SLIDE 7

Randomly split data into training versus testing data

RMSE on test data = $\sqrt{\sum (\text{Actual} - \text{Predicted})^2 / n}$
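The split-then-score idea on this slide can be sketched in a few lines of Python. This is only an illustration: the data points, the 30% test fraction, the random seed, and the choice of a one-variable least-squares line as the model are all assumptions, not taken from the slides.

```python
import math
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Randomly split rows into (training, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root mean squared error: sqrt(sum((actual - predicted)^2) / n)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical data: (miles driven in thousands, claim cost in dollars).
data = [(5, 210), (8, 300), (10, 390), (12, 410), (15, 520),
        (18, 600), (20, 640), (22, 700), (25, 810), (30, 950)]

train, test = train_test_split(data, test_fraction=0.3, seed=42)

# Fit a one-variable least-squares line on the training data only.
mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

# Score the fitted line on the held-out testing data.
actual = [y for _, y in test]
predicted = [intercept + slope * x for x, _ in test]
print(f"RMSE on test data: {rmse(actual, predicted):.2f}")
```

Because the model never sees the testing rows while it is being fitted, the RMSE on those rows is a fairer estimate of how it would perform on new policies than the RMSE on the training rows.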

slide-8
SLIDE 8

Some actuarial examples of data science techniques

  • If predictive modeling refers to estimating insurance costs, then actuaries have been doing this forever.
  • Today predictive modeling is computationally intensive, often testing all possible permutations of variables, transformations, etc.
  • The 2 broad categories in data science are prediction and classification. Classification is predicting a category.
  • Prediction often involves types of regression. Linear regression is being replaced by more flexible Generalized Linear Models.
  • Classification includes:
      Decision trees: underwriting
      Clustering: territories
      Principal component analysis: detect fraud
  • In the following examples, assume n independent variables and p data values.

slide-9
SLIDE 9

For insurance rating, we group (hopefully) similar customers into classes and charge an average rate for the class. Classification is rarely perfect.

[Figure: customers before classification and after classification]

slide-10
SLIDE 10

Insurance classes may include age, gender, urban / rural territory, marital status, miles driven, claims history, car type, car age, etc.

But within each n-dimensional slice, there is still considerable variability. A company wants to choose the better than average customers within each class to make a profit.

slide-11
SLIDE 11

Generalized Linear Models: pricing

  • Traditionally we used classical linear regression, and we treated our pricing by class as multiplicative:
      Base rate = $100
      Times factor for Age i = 1.50
      Times factor for Gender j = 1.20
      Times factor for Territory k = 1.40, ... , etc.
  • This disregards interactions between classes and makes assumptions on normality and common variances.
  • GLMs consist of a wider range of models, with the response variable assumed to be a member of the exponential family.
  • Results in some factors being reduced, others increased.
  • Other applications of GLM:
      Effect of telematics on claims
      Underwriting score cards
      Predict claims likely to settle far above their initial estimate
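The multiplicative pricing described on this slide can be written out as a small Python sketch. The base rate and the 1.50, 1.20, and 1.40 factors come from the slide; the class labels ("young", "male", "urban") and the 1.00 factors for the other classes are hypothetical and only there to make the lookup tables complete.

```python
BASE_RATE = 100.00  # base rate from the slide, in dollars

# Factors of 1.50, 1.20 and 1.40 are from the slide.
# The class labels and the 1.00 entries are hypothetical.
AGE_FACTOR = {"young": 1.50, "adult": 1.00}
GENDER_FACTOR = {"male": 1.20, "female": 1.00}
TERRITORY_FACTOR = {"urban": 1.40, "rural": 1.00}


def premium(age, gender, territory):
    """Multiplicative rating: base rate times one factor per rating class."""
    return (BASE_RATE * AGE_FACTOR[age] * GENDER_FACTOR[gender]
            * TERRITORY_FACTOR[territory])


rate = premium("young", "male", "urban")
print(f"${rate:.2f}")  # 100 x 1.50 x 1.20 x 1.40
```

For the slide's factors this comes to $252.00. Each factor is applied independently of the others, which is exactly why this structure cannot capture interactions between classes.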

slide-12
SLIDE 12

Decision trees: underwriting

  • Sequentially splits data into categories having similar values for dependent variables.
  • Uses statistic such as Gini Index to do split.
  • Possible variables: no. years renewed, occupation, premium payment history, telematics (speed, braking, time of day, etc.)
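As a rough illustration of how a tree might use the Gini Index to choose one split, the sketch below computes the Gini impurity of a set of labels and searches for the threshold on a single variable that gives the lowest weighted impurity. The policy data and the function names are hypothetical, and a real decision tree would repeat this search over many variables and recurse on each side of the split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best 'value <= threshold' split."""
    best = (None, gini(labels))  # start from the impurity with no split at all
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical policies: number of years renewed, and whether a claim was filed.
years_renewed = [0, 1, 1, 2, 3, 5, 6, 8]
had_claim = ["yes", "yes", "no", "yes", "no", "no", "no", "no"]

threshold, impurity = best_split(years_renewed, had_claim)
print(f"split at years renewed <= {threshold}, weighted Gini = {impurity:.4f}")
```

A lower weighted Gini means the two groups produced by the split are each more homogeneous in the dependent variable, which is the sense in which the tree creates categories with similar values.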

slide-13
SLIDE 13

Clustering: territories

  • Partitions data into classes based on how closely data is grouped. Iteratively updates centers and re-partitions.
  • There is no dependent variable.
  • Another use is clustering similar occupations.
  • Florida has 28 rating territories in auto.

Yao, J. (2008). Clustering in ratemaking; applications in territories clustering. Casualty Actuarial Society Predictive Modeling Seminar
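The "update centers, then re-partition" loop described on this slide matches the usual K-means procedure (Lloyd's algorithm), which can be sketched as follows. The points and starting centers are hypothetical, and this is not necessarily the method used in the Yao paper cited above.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Alternate between assigning points to the nearest center and moving centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Partition step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(center)  # leave an empty cluster's center alone
        if new_centers == centers:  # no center moved, so the partition is stable
            break
        centers = new_centers
    return centers, clusters


# Hypothetical areas described by two standardized features
# (for example, a loss cost index and a traffic density index).
areas = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
         (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centers, clusters = kmeans(areas, centers=[(0.0, 0.0), (10.0, 10.0)])
for center, members in zip(centers, clusters):
    print(center, members)
```

Note that no dependent variable appears anywhere in the code: the grouping is driven only by how close the points are to one another.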

slide-14
SLIDE 14

Principal component analysis: fraud detection

  • Reduces a large no. of variables to a smaller no. of mutually uncorrelated variables that preserves as much variability as possible.
  • Auto fraud (staged accident, inflated bills, collusive medical or body shops) hard to detect by first-level claim examiners.
  • No dependent variable. Data doesn’t say which claim is definitely fraud. Much fraud is undetected.
  • Data often ordinal, e.g. suspicion level = {1, 2, …, 5} for each variable (# chiropractor visits, hi vol med provider).
  • Goal is overall fraud suspicion score, iteratively weighting indiv variables based on their consistency and correlation to overall score.
  • Why do we study Linear Algebra? PCA uses orthogonal transformations, eigenvectors.
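To make the link to eigenvectors concrete, here is a minimal sketch that finds the first principal component of a small table of suspicion levels by power iteration on the covariance matrix, then uses it to score each claim. The data, the three indicator columns, and the function names are hypothetical; this is plain PCA, not necessarily the exact weighting scheme the slide has in mind.

```python
import math


def center(rows):
    """Subtract each column's mean so every variable has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=500):
    """Power iteration: dominant eigenvector and eigenvalue of cov."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # start from equal weights on every variable
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue


# Hypothetical suspicion levels (1 to 5) on three indicators for six claims.
claims = [(1, 1, 2), (2, 1, 1), (2, 2, 3), (4, 3, 4), (5, 4, 4), (5, 5, 5)]

centered = center(claims)
cov = covariance_matrix(centered)
component, eigenvalue = first_principal_component(cov)

# Share of total variability preserved by the single combined variable.
share = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# Overall score per claim: the claim's projection onto the first component.
scores = [sum(c[j] * component[j] for j in range(len(component)))
          for c in centered]
print([round(w, 3) for w in component], round(share, 3))
```

The entries of `component` act as weights on the individual indicators, and `share` shows how much of the original variability the one combined score retains.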

slide-15
SLIDE 15

Credit scoring

  • A numerical score of a person’s creditworthiness. Ideally correlates with claims experience and provides additional predictive ability beyond traditional rating variables.
  • Permitted by FL Statute 626.9741.
  • Criticized as unfair to minorities and low-income people, although analysts dispute the criticism.
  • Variables include debt/asset ratio, late payment history…
  • Data science techniques: clusters, trees, GLM, PCA, …
      ■ Loss ratio = f (variables X1, …, Xn)
  • Often used in the decision on whether or not to offer insurance, but not used to determine the price.
  • Many publicly available credit risk databases for credit cards and loans, e.g., Kaggle competition at Kaggle.com/c/GiveMeSomeCredit/
      ■ Probability of default = f (variables X1, …, Xn)

slide-16
SLIDE 16

Text Analysis

  • Most data is numerical and is neatly captured in fields.
  • Free-form text is a potential gold mine of information, but it requires effort to extract gold nuggets.
  • Misspellings, synonyms, stems like “ing” and “ed”, etc.
  • Look for freq. words, groups of words appearing together.
  • What kinds of claims are occurring? “Water” may be a captured field, but “water & basement” or “water & ceiling” may be more helpful in finding trends.
  • What words signal potential large claim amount?
  • How do people feel about insurance ads? Identify sentiments in customer surveys & tweets.
  • FL Statute 627.4145 requires insurance policies have min 45 score on Flesch readability test.
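The "frequent words and groups of words appearing together" idea can be sketched with Python's standard library. The claim notes and the stop-word list below are hypothetical, and the sketch deliberately ignores misspellings, synonyms, and stemming, which real claim text would need.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical free-form claim notes.
notes = [
    "Water in basement after heavy rain",
    "Water leaking through ceiling",
    "Basement flooded, water damaged furnace",
    "Ceiling stained by water from roof leak",
]

# Hypothetical list of common words to ignore.
STOP_WORDS = {"in", "after", "through", "by", "from", "the", "a"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]


word_counts = Counter()
pair_counts = Counter()
for note in notes:
    words = set(tokenize(note))  # count each word at most once per note
    word_counts.update(words)
    # Every unordered pair of words that appears in the same note.
    pair_counts.update(combinations(sorted(words), 2))

print(word_counts.most_common(3))
print(pair_counts[("basement", "water")], pair_counts[("ceiling", "water")])
```

In this toy data "water" appears in every note, so on its own it says little; the pair counts are what separate basement claims from ceiling claims.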

slide-17
SLIDE 17

References

  • Frees, E. W., et al. (2016). Predictive modeling applications in actuarial science. New York: Cambridge University Press.
  • Healy, K. (2018). Data visualization. Princeton, NJ: Princeton University Press.
  • James, G., et al. (2017). An introduction to statistical learning with applications in R. New York: Springer.
  • Zhao, Y. (2013). R and data mining. San Diego: Academic Press.