SLIDE 1
The Duck Test: Leveraging Machine Learning to Remediate Fraud in Huge Datasets

Matthew Harper – Director Cyber Crime Prevention

SLIDE 2

Who is Aflac?

  • Supplemental insurance
  • Significant presence in Japan (2/3 of corporate revenue)
  • My Special Aflac Duck – a robotic duck for children with cancer

SLIDE 3

We do what the duck says… “Aflac.” When you get hurt or sick, Aflac usually pays you cash within a day.

SLIDE 4

Spoiler alert -> Machine learning is not black and white, but boy are we trying

SLIDE 5

Steps of Machine Learning

  • Business problem to solve
  • Data collection
  • Data preparation and exploration
  • Model training and evaluation
  • Implement results

SLIDE 6

Business Problem -> Why Cyber Crime Prevention Started

In late 2016, Aflac was the victim of an Account Take Over (ATO) attack against policy holders

  • The financial industry has seen this for over a decade; it emerged against Aflac because of our direct policy holder payment model
  • Aflac US was not built to detect this type of fraud or attack
  • Limited identity validation at claim time or at enrollment for new policy holders (workplace-based enrollment model)

A mature cyber security program was in place

  • Security controls were designed with the presumption of hackers going after core Aflac data (customer master files, etc.)
  • New control infrastructure was needed

[Diagram: Traditional internal security and network controls (access management, firewall, IDS, etc.) block hackers from reaching Aflac core technology through associate access, so policy holders, reached through the various channels, become the easier target.]

SLIDE 7

Data Collection

Policy lifecycle: Sell -> Enroll -> Service -> Claim -> Bill

  • IDV Service – all identity validation calls
  • Agent/Call Center Enrollment (~10%)
  • D2C Enrollment
  • MyAflac – client online activity
  • Client Central – contact center activity, notes, phone numbers
  • IVR – voice response system
  • Claims – claims processing system
  • Online Risk Engine -> device tagging, OTP
  • Client Information File – information on clients: names, addresses, etc.
  • NoCheck – claim payment data, bank accounts
  • Supporting legacy Security/IT feeds in Splunk (US & Japan)

The custom fraud feeds account for 1.5% of daily Splunk volume; the supporting legacy feeds make up the other 98.5%. All items in green were custom integrated into the identity validation service, in partnership with a third party.

SLIDE 8

One cool thing -> We built, and log, our own middleware

[Architecture: login.aflac.com, the Enrollment System, Office 365, the Sales System, Field Force Services, and MyAflac (policy holder services) all route through a central Identity Validation Service. That service fronts an identity validation vendor with a configurable risk engine and scoring:
  • Online device analytics – device data
  • Identity validation
  • KBA/OTP
A separate MFA solution handles policy holder authentication and multi-factor. Agents, remote employees, and policy holders all enter through these front ends, and Splunk logs every IDV and MFA solution call via JSON.]

Key Insight: the ability to provide real-time risk insight to all policy holder and associate authentication.
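A minimal sketch of what one such JSON log event from the middleware might look like, assuming a Python layer in front of the IDV service. Every field name, channel value, and the overall event shape are illustrative assumptions, not Aflac's actual schema; a real deployment would ship the event to Splunk (e.g. via the HTTP Event Collector) rather than return it:

```python
import json
import time
import uuid


def log_idv_event(channel: str, cif_number: str, decision: str,
                  risk_score: int) -> str:
    """Build one identity-validation call as a JSON event.

    All field names are hypothetical stand-ins for the slide's
    "logs all IDV and MFA solution calls via JSON" pattern.
    """
    event = {
        "_time": time.time(),            # Splunk-friendly epoch timestamp
        "guid": str(uuid.uuid4()),       # correlation id across channels
        "channel": channel,              # e.g. "MyAflac", "IVR", "Enrollment"
        "cif_number": cif_number,
        "decision": decision,            # e.g. "allow", "step_up", "deny"
        "risk_score": risk_score,
    }
    return json.dumps(event)


# Example: a risky MyAflac login that was stepped up to OTP
print(log_idv_event("MyAflac", "CIF123", "step_up", 72))
```

Because every front end funnels through the same middleware, one log schema covers agent, employee, and policy holder authentication alike, which is what makes the real-time risk insight on this slide possible.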

SLIDE 9

Data Exploration

So much spaghetti throwing…

  • Starting from zero… nothing to lose
  • Alerting -> known use cases
  • Investigation -> new dashboards
  • Operational monitoring -> who works the alerts?
  • Drive real-time, risk-based controls

Key insight -> Don't jump straight into machine learning. Take time to develop faster returns with the data and to understand it: stacked use cases, hotlist alerts, research dashboards, etc.

SLIDE 10

Maturity Levels

  • Level 1: Real-time control monitoring; call center OTP
  • Level 2: Alerts & hot list tracking
  • Level 3: Investigation & link analysis
  • Level 4: Associate MFA, tuned controls

SLIDE 11

DEFCON 5

SLIDE 12

Data Preparation – Risk Summary Index

Feeds into the risk summary index: IVR/Contact Center, Online Activity, Client Information, Account Updates, Claims

SLIDE 13
Risk rules run over the data indexes:

  • phs
  • client central
  • nocheck
  • idv
  • ivr
  • claim

Risk Index fields: _time, risk_object_type, risk_object, risk_score, risk_source_rule, guid, cif_number, policy_number

Use cases write into the Risk Index as:

  • risk_object_type: alert
  • risk_object: cif_number
  • risk_score: 10 * (number of use cases fired)
  • risk_source_rule: Suspect_Policyholder
  • alert_type: list of use cases
  • guid, cif_number, policy_number

Suspect Policyholder Alert Manager fields: CIF_Customer_ID, Customer_PH_GUID, CIF_Name, risk_source_rules, alert_type, IP, Application_ID, Customer_Email_Address, CIF_State, DeviceID_Print, Account_Number, Hashed_Password, Username, alert_history, FraudNet rules fired, CIF_Phone, Phone

Alerts are assigned, then get a first-pass and second-pass review by Trust – ATO, and are either Cleared (including false positives) or Referred (ATO, SIU, Privacy Abuse). Suspect ATO is handled through the DB: remarks added, password reset. Suspect ATO cases can be created manually or auto-decisioned, and suspects and suspect claims feed the Suspect Monitoring Hotlist. Alert results are written back to the risk index with risk_source_rule="Alert_Results".

Suspect Monitoring Hotlist fields: CIF_Customer_ID, Policy_Number, Customer_PH_Guid, Incident_id, Comment, _time, Alert

Alert History fields: Alert, Title, Date, incident_id, status, Owner, comments, CIF_Customer_ID, Customer_Email_Address, Customer_PH_GUID, IP, Policy_Number, Account_Number, Routing_Number

Shared indicators feed metric DBs/reporting and FraudNet, which generates the FN feedback file.
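The scoring rule on this slide (risk_score = 10 * the number of use cases fired) is simple to sketch. This is a toy reconstruction, not the production search: the real pipeline runs as Splunk correlation searches over the risk index, and the event tuples and use-case names below are hypothetical:

```python
from collections import defaultdict


def score_alerts(events):
    """Aggregate fired use cases per policyholder into alert records.

    Mirrors the slide's scheme: risk_score = 10 * (number of distinct
    use cases fired for a cif_number). `events` is an iterable of
    (cif_number, use_case) pairs; all names here are hypothetical.
    """
    fired = defaultdict(set)
    for cif_number, use_case in events:
        fired[cif_number].add(use_case)
    return {
        cif: {
            "risk_object_type": "alert",
            "risk_object": cif,
            "risk_score": 10 * len(cases),
            "risk_source_rule": "Suspect_Policyholder",
            "alert_type": sorted(cases),
        }
        for cif, cases in fired.items()
    }


events = [
    ("CIF001", "password_reset_then_bank_change"),
    ("CIF001", "new_device_plus_claim"),
    ("CIF001", "password_reset_then_bank_change"),  # repeats don't double-count
    ("CIF002", "ivr_probing"),
]
alerts = score_alerts(events)
print(alerts["CIF001"]["risk_score"])  # → 20
```

Counting distinct use cases rather than raw events is what makes the score stack: two independent suspicious behaviors on one policyholder outrank one behavior firing many times.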

SLIDE 14
[Table: per-policyholder counts of cross-channel features – online logins, claims filed, address updates, calls, and other features.]

Data Preparation – Feature Creation

Features:

  • Developed over 30 key cross-channel activities to analyze
  • Leveraged the online risk scoring platform

Modeling:

  • Challenge: a labeled dataset was not available, so supervised learning was not an option
  • First attempt: assign each feature a weight manually, based on how "risky" the event is
      • Problem: high activity = high score, even if the activity is "normal"
  • Second attempt: use K-Means clustering to find outliers based on the features
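The first-attempt pitfall can be made concrete with a toy example; the feature names, weights, and counts below are all hypothetical, not Aflac's real ones. Under a hand-weighted sum, a merely busy but normal policyholder outscores a quiet account-takeover pattern:

```python
# Sketch of the slide's "first attempt": hand-assigned feature weights.
# Feature names and weights are hypothetical.
WEIGHTS = {"logins": 1, "address_updates": 5, "claims_filed": 3}


def manual_risk_score(features: dict) -> int:
    """Weighted sum of per-policyholder activity counts."""
    return sum(WEIGHTS[name] * count for name, count in features.items())


quiet_fraudster = {"logins": 2, "address_updates": 1, "claims_filed": 1}
busy_normal_user = {"logins": 40, "address_updates": 0, "claims_filed": 2}

# The pitfall from the slide: high activity = high score, even when
# the activity is perfectly "normal".
print(manual_risk_score(quiet_fraudster))    # → 10
print(manual_risk_score(busy_normal_user))   # → 46
```

This is why the deck moves to clustering: it flags accounts whose activity *pattern* is unusual relative to the population, instead of rewarding sheer volume.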

SLIDE 15

Home Grown Cross-Channel User Behavior Analytics

SLIDE 16

Numeric Clustering with MLTK

[Charts: 3D scatterplot of fields vs. cluster; policyholder count by cluster.]
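The clustering MLTK applies here is standard K-Means (Lloyd's algorithm). A dependency-free sketch of the idea, using synthetic (logins, address_updates) counts rather than real policyholder features: most accounts form one dense cluster, and the small leftover cluster is the review queue:

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain K-Means (Lloyd's algorithm): returns (centroids, labels).

    A stdlib-only stand-in for MLTK's KMeans; the data below is
    synthetic, not real policyholder activity.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    return centroids, labels


# A dense "normal" blob of 20 policyholders plus two high-activity outliers.
points = [(n % 5, 0) for n in range(20)] + [(30, 8), (28, 9)]
_, labels = kmeans(points, k=2)

# The small cluster is the suspicious one -> those policyholders get reviewed.
suspect = min(set(labels), key=labels.count)
print([p for p, lab in zip(points, labels) if lab == suspect])
```

The same "smallest cluster = outliers" read is what the policyholder-count-by-cluster chart on this slide supports visually.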

SLIDE 17

Lessons Learned

SLIDE 18

Thank You