ACTUARIES & DATA SCIENCE
Jerome Tuttle, FCAS, CPCU Retired Actuary
What is an actuary?
The mathematicians of the insurance industry.
A business professional who deals with the financial impact of risk and uncertainty.
Analyzes, manages, and measures the financial impact of risk and uncertainty.
Develops and validates models and communicates results to guide decision-making.
Actuaries in movies:
Jack Nicholson – About Schmidt (2002)
Ben Stiller – Along Came Polly (2004)
We don’t know our cost (claims) when we sell the policy, and with some claims we don’t know for many years.
We are not required to sell to everyone – similar to bank loans and college admissions.
We do not charge the same price to everyone. This is REQUIRED by law, e.g., FL Statute 627.062:
■ Rates may not be unfairly discriminatory.
■ A rate is unfairly discriminatory to a group of risks if the rate does not bear a reasonable relationship to the expected loss experience among the risks.
The intersection among math/stats, computer sci, & subject matter knowledge to extract meaningful insights from data, translating into tangible business value.
Internet search engine algorithms.
Targeted advertising and recommendations.
Target Stores sent diaper coupons to the pregnant teenager before she told her father. (Folklore?)
Moneyball and sports analytics.
Better singles matching on dating websites.
Disease diagnosis, personalized healthcare recs.
Data-driven crime prediction, facial recognition, terrorist forecasts.
Which tweets did Trump write, and which did his staff write?
“Actuaries were among the first data scientists.” (Colin Priest, actuary turned data scientist at DataRobot, Singapore.)
Actuaries are strongest at math/stat and domain knowledge (we study insurance, besides math/stat).
Data scientists are strongest at computer science, especially coding, data manipulation and joining tables, theory of machine learning (training versus testing, …ining), and machine learning algorithms.
Actuarial exams now include: generalized linear models, K-nearest neighbors, K-means clustering, Bayes classifier, decision trees, random forest, principal component analysis. Also a predictive analytics specialty.
RMSE on test data = √[∑ (Actual – Predicted)² / n]
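As a small illustration of the RMSE formula, here is a minimal sketch in Python. The claim costs and predictions are hypothetical numbers chosen for the example, not data from the presentation.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error over paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical held-out (test) claim costs and a model's predictions for them.
actual_costs = [100.0, 250.0, 400.0]
predicted_costs = [110.0, 240.0, 420.0]
test_rmse = rmse(actual_costs, predicted_costs)
print(f"RMSE on test data: {test_rmse:.2f}")
```

The point of computing this on test data rather than training data is that a model can fit the data it was trained on very closely and still predict new risks poorly.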
If predictive modeling refers to estimating insurance costs, then actuaries have been doing this forever.
Predictive modeling is computationally intensive, testing all possible permutations of variables, transformations, etc.
Two broad categories in data science are prediction and classification. Classification is predicting a category.
Prediction involves types of regression. Linear regression is being replaced by more flexible Generalized Linear Models.
Classification includes:
■ Decision trees: underwriting
■ Clustering: territories
■ Principal component analysis: detect fraud
In the following examples, assume n independent variables and p data values.
[Figure: two panels, “Before classification” and “After classification”]
But within each n-dimensional slice, there is still considerable variability. A company wants to choose the better-than-average customers within each class to make a profit.
Traditionally we used classical linear regression, and we treated pricing by class as multiplicative:
Base rate = $100
Times factor for Age i = 1.50
Times factor for Gender j = 1.20
Times factor for Territory k = 1.40, ..., etc.
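The multiplicative rating arithmetic can be written out as a short sketch. The base rate and factor values are the ones on the slide; the dictionary keys (`age_i`, `gender_j`, `territory_k`) and the function name are hypothetical labels used only for this illustration.

```python
BASE_RATE = 100.0  # dollars, from the slide

# Factor values from the slide; the level names are hypothetical labels.
AGE_FACTORS = {"age_i": 1.50}
GENDER_FACTORS = {"gender_j": 1.20}
TERRITORY_FACTORS = {"territory_k": 1.40}

def class_premium(age, gender, territory):
    """Base rate times one factor per rating class (no interaction terms)."""
    return (BASE_RATE
            * AGE_FACTORS[age]
            * GENDER_FACTORS[gender]
            * TERRITORY_FACTORS[territory])

premium = class_premium("age_i", "gender_j", "territory_k")
print(f"{premium:.2f}")  # prints 252.00
```

Because each factor is applied independently, the combined effect of age, gender, and territory is always the product of the three, whatever the actual experience of that particular combination.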
This disregards interactions between classes and makes assumptions of normality and common variances.
GLMs consist of a wider range of models, with the response variable assumed to be a member of the exponential family.
Results in some factors being reduced, others increased.
Applications of GLM:
Effect of telematics on claims
Underwriting score cards
Predict claims likely to settle far above their initial estimate
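To give a feel for how a GLM fits all class factors at once, here is a minimal sketch under stated assumptions. For a Poisson-type GLM with a log link and main effects only, the maximum-likelihood fit reproduces the exposure-weighted observed totals for every class level, so the factors can be found by alternately rebalancing rows and columns. The 2x2 exposures and average costs below are hypothetical, and a real analysis would normally use a statistics package rather than this hand-rolled loop.

```python
# Hypothetical 2x2 data, indexed [age level][territory level]:
# exposures and observed average claim cost per cell.
exposure = [[100.0, 50.0],
            [80.0, 120.0]]
observed = [[200.0, 300.0],
            [150.0, 260.0]]

def fit_multiplicative(exposure, observed, iterations=200):
    """Fit cost[i][j] ~ row[i] * col[j] by alternating balance equations."""
    n_rows, n_cols = len(exposure), len(exposure[0])
    row = [1.0] * n_rows
    col = [1.0] * n_cols
    for _ in range(iterations):
        # Rebalance each row so fitted and observed row totals agree.
        for i in range(n_rows):
            total = sum(exposure[i][j] * observed[i][j] for j in range(n_cols))
            row[i] = total / sum(exposure[i][j] * col[j] for j in range(n_cols))
        # Rebalance each column the same way.
        for j in range(n_cols):
            total = sum(exposure[i][j] * observed[i][j] for i in range(n_rows))
            col[j] = total / sum(exposure[i][j] * row[i] for i in range(n_rows))
    return row, col

row_factors, col_factors = fit_multiplicative(exposure, observed)
fitted = [[r * c for c in col_factors] for r in row_factors]
```

Only the products `row[i] * col[j]` are determined; the split between a base rate and relativities is a matter of convention, for example dividing every row factor by the first one.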
Sequentially splits data into categories having similar values for dependent variables.
Uses a statistic such as the Gini Index to do the split.
Variables: years renewed, occupation, premium payment history, telematics (speed, braking, time of day, etc.)
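A single tree split using the Gini index can be sketched as follows. This is a simplified illustration that assumes one numeric variable and a binary outcome; the years-renewed values and claim flags are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    best = (None, gini(labels))
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical data: years renewed, and whether the policy had a claim (1) or not (0).
years_renewed = [0, 1, 1, 2, 5, 6, 8, 9]
had_claim = [1, 1, 1, 0, 0, 0, 0, 0]
threshold, impurity = best_split(years_renewed, had_claim)
```

A full decision tree repeats this search over every candidate variable and then applies it again within each resulting group, which is the "sequential" splitting described on the slide.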
Partitions data into classes based on how closely data is grouped. Iteratively updates centers and re-partitions.
There is no dependent variable.
Another use is clustering similar occupations.
Florida has 28 rating territories in auto.
Yao, J. (2008). Clustering in ratemaking; applications in territories clustering. Casualty Actuarial Society Predictive Modeling Seminar
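The "update centers and re-partition" loop of K-means can be sketched in a few lines. This is a minimal illustration, not the method of the cited seminar paper; the points and starting centers are hypothetical two-dimensional values.

```python
import math

def kmeans(points, centers, iterations=20):
    """Plain k-means: assign each point to its nearest center, move each
    center to the mean of its points, and repeat until nothing changes."""
    centers = [tuple(c) for c in centers]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assignment

# Hypothetical locations (x, y); two starting centers chosen by hand.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, assignment = kmeans(points, centers=[(0.0, 0.0), (6.0, 6.0)])
```

No outcome variable is used anywhere in the loop, which is why clustering is described as having no dependent variable.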
Reduces a large no. of variables to a smaller no. of mutually uncorrelated variables that preserve as much variability as possible.
Fraud (staged accident, inflated bills, collusive medical or body shops) is hard to detect by first-level claim examiners.
No dependent variable: data doesn’t say which claim is definitely fraud. Much fraud is undetected.
E.g., suspicion level = {1, 2, …, 5} for each variable (# chiropractor visits, hi vol med provider).
Result is a fraud suspicion score, from iteratively weighting individual variables based on their consistency and correlation to the score.
Why do we study Linear Algebra? PCA uses orthogonal transformations, eigenvectors.
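To connect the eigenvector remark to something concrete, here is a minimal sketch that finds the first principal component of a small data set by power iteration on its covariance matrix. It is a generic PCA illustration, not the specific fraud-scoring procedure from the slide; the suspicion levels are hypothetical, and power iteration is just one simple way to obtain the leading eigenvector.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def first_principal_component(cov, iterations=500):
    """Power iteration: (largest eigenvalue, unit eigenvector) of `cov`."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return eigenvalue, v

# Hypothetical suspicion levels (1 to 5) on three indicators for five claims.
suspicion_levels = [[1, 1, 2], [2, 2, 1], [3, 3, 3], [4, 5, 4], [5, 4, 5]]
cov = covariance_matrix(suspicion_levels)
eigenvalue, direction = first_principal_component(cov)
share_of_variance = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Projecting each claim's centered indicator values onto `direction` would give one combined score per claim, and `share_of_variance` indicates how much of the total variability that single score retains.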
A numerical score of a person’s creditworthiness. Ideally correlates with claims experience and provides additional predictive ability beyond traditional rating variables.
Permitted by FL Statute 626.9741.
Criticized as unfair to minorities and low-income people, although analysts dispute the criticism.
Variables include debt/asset ratio, late payment history…
Data science techniques: clusters, trees, GLM, PCA, …
■ Loss ratio = f(variables X1, …, Xn)
Often used in the decision of whether or not to offer insurance, but not used to determine the price.
Many publicly available credit risk databases for credit cards and loans, e.g., Kaggle competition at Kaggle.com/c/GiveMeSomeCredit/
■ Prob of default = f(variables X1, …, Xn)
Most data is numerical and is neatly captured in fields.
Free-form text is a potential gold mine of information, but it requires effort to extract gold nuggets.
Misspellings, synonyms, stems like “ing” and “ed”, etc.
Look for frequent words and groups of words appearing together.
What kinds of claims are occurring? “Water” may be a captured field, but “water & basement” or “water & ceiling” may be more helpful in finding trends.
What words signal potential large claim amount?
How do people feel about insurance ads? Identify sentiments in customer surveys & tweets.
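The idea of looking for words that appear together, such as “water” with “basement”, can be sketched by counting word pairs across free-form notes. The claim notes below are hypothetical, and the sketch deliberately skips spelling correction, synonyms, and stemming, which a real text-mining workflow would need.

```python
from collections import Counter
from itertools import combinations
import re

def cooccurring_pairs(notes):
    """Count how many notes contain each unordered pair of distinct words."""
    counts = Counter()
    for note in notes:
        # Lowercase, keep alphabetic words, de-duplicate, and sort so that
        # each pair is always stored in alphabetical order.
        words = sorted(set(re.findall(r"[a-z]+", note.lower())))
        counts.update(combinations(words, 2))
    return counts

# Hypothetical free-form claim notes.
claim_notes = [
    "Water in basement after heavy rain",
    "Basement flooded, water heater failed",
    "Water stain on ceiling below bathroom",
]
pairs = cooccurring_pairs(claim_notes)
print(pairs[("basement", "water")], pairs[("ceiling", "water")])  # prints 2 1
```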
FL Statute 627.4145 requires insurance policies have min 45 score on Flesch readability test.
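A rough sketch of the Flesch reading ease calculation is shown here. The formula, 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), is the standard Flesch Reading Ease score; the syllable counter is only a crude vowel-group heuristic assumed for this example, the sample sentences are hypothetical, and any counting rules the statute itself prescribes are not reproduced.

```python
import re

MIN_SCORE = 45  # minimum score mentioned on the slide

def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Hypothetical policy wording, plain versus dense.
plain = "We pay for loss to your car. You must tell us soon."
dense = ("Notwithstanding the aforementioned indemnification provisions, "
         "the insurer retains discretionary subrogation entitlements.")
plain_ok = flesch_reading_ease(plain) >= MIN_SCORE
dense_ok = flesch_reading_ease(dense) >= MIN_SCORE
```

Short sentences and short words both raise the score, so the plain wording passes the threshold comfortably while the dense wording does not.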