The Duck Test: Leveraging Machine Learning to Remediate Fraud in Huge Datasets
Matthew Harper – Director Cyber Crime Prevention
Who is Aflac?
∙ Supplemental insurance
∙ Significant presence in Japan (2/3 of corporate revenue)
Business Problem to Solve -> Data Collection -> Data Preparation and Exploration -> Model Training and Evaluation -> Implement Results
In late 2016, Aflac was the victim of an Account Take Over (ATO) against policy holders
decade, emergence against Aflac due to our direct policy holder payment model
attack
enrollment for new policy holders (workplace-based enrollment model)
Mature Cyber Security program in place
hackers going after core Aflac data (customer master files, etc…)
[Diagram labels: Aflac Core Technology; Traditional internal Security & Network controls (access management, firewall, IDS, etc…); Channels; Policy Holders; Hackers; Associate Access]
[Diagram labels: Sell; Enroll; Service; Claim; Bill]
∙ IDV Service – All Identity Validation Calls
∙ Agent/Call Center Enrollment (~10%)
∙ D2C Enrollment
∙ MyAflac – Client online activity
∙ Client Central – Contact center activity, notes, phone number
∙ IVR – Voice response system
∙ Claims – Claims Processing System
∙ Online Risk Engine -> Device Tagging OTP
∙ Client Information File – Information on clients: names, address, etc…
∙ NoCheck – Claim payment data, bank account
∙ Supporting Legacy Security/IT Feeds in Splunk (US & Japan)
98.5% of daily Splunk volume
All items in green are custom integrated into the identity validation service, in partnership with a third party
[Architecture diagram labels]
∙ Users: Agent & Remote Employee; Policy Holder
∙ Applications: login.aflac.com, Enrollment System, Office 365, Sales System, Field Force Services, MyAflac (Policy Holder Services)
∙ Identity Validation Service
∙ Identity Validation Vendor - Configurable risk engine and scoring
∙ MFA SOLUTION (Authentication & Multi-factor)
∙ Separate policy holder authentication solution
∙ Splunk – Logs all IDV and MFA Solution calls via JSON
Key Insight
Key insight -> Don’t jump straight into machine learning. Take time to understand the data and get faster returns from it first: stacked use cases, hotlist alerts, research dashboards, etc…
Level 1: Real-time Control Monitoring; Call Center OTP …
Level 2: Alerts & Hot List Tracking …
Level 3: Investigation & Link Analysis …
Level 4: Associate MFA, tune controls
Risk Rules
Data Indexes:
Risk Index: _time, risk_object_type, risk_object, risk_score, risk_source_rule, guid, cif_number, policy_number
Use Cases Risk Index:
∙ _time
∙ risk_object_type: alert
∙ risk_object: cif_number
∙ risk_score: 10*(number of use cases)
∙ risk_source_rule: Suspect_Policyholder
∙ alert_type: Use case list
∙ guid
∙ cif_number
∙ policy_number
Suspect Policyholder Alert Manager: CIF_Customer_ID, Customer_PH_GUID, CIF_Name, risk_source_rules, alert_type, IP, Application_ID, Customer_Email_Address, CIF_State, DeviceID_Print, Account_Number, Hashed_Password, Username, alert_history, Fraudnet Rules Fired, CIF_Phone, Phone
[Workflow diagram labels: Cleared; Referred; ATO]
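The scoring rule in the Use Cases Risk Index, risk_score: 10*(number of use cases), can be sketched in a few lines of Python. This is an illustration only, not the Splunk searches behind the slide: the use-case names and sample hits are hypothetical, and counting each use case once per cif_number is an assumption the slide does not confirm.

```python
from collections import defaultdict

# Hypothetical use-case hits as (cif_number, use_case_name) pairs.
# The use-case names are invented; the slide does not list them.
hits = [
    ("CIF001", "new_device_login"),
    ("CIF001", "bank_account_change"),
    ("CIF002", "new_device_login"),
    ("CIF001", "bank_account_change"),  # repeat hit, counted once (assumption)
]

def build_risk_events(use_case_hits):
    """Return one risk event per cif_number, scored 10 * (distinct use cases)."""
    fired = defaultdict(set)
    for cif_number, use_case in use_case_hits:
        fired[cif_number].add(use_case)
    events = []
    for cif_number in sorted(fired):
        use_cases = sorted(fired[cif_number])
        events.append({
            "risk_object_type": "alert",
            "risk_object": cif_number,
            "risk_score": 10 * len(use_cases),
            "risk_source_rule": "Suspect_Policyholder",
            "alert_type": use_cases,  # the "use case list" from the slide
        })
    return events

risk_events = build_risk_events(hits)
for event in risk_events:
    print(event["risk_object"], event["risk_score"], event["alert_type"])
```

Stacking use cases this way lets a policyholder who trips several weak signals outrank one who trips a single signal, without any model training.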
[Workflow diagram labels: Suspect ATO; Through DB; SIU Privacy Abuse; Cleared; F/P; Reviewed by Trust – ATO; First Pass; Second Pass; create manual; Auto-decision; Suspect ATO; Suspect]
Suspect Hotlist
Suspect Monitoring Hotlist: CIF_Customer_ID, Policy_Number, Customer_PH_Guid, Incident_id, Comment, _time, Alert
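One way to read the hotlist above is as a lookup table that new activity is checked against. The sketch below assumes matching on either CIF_Customer_ID or Policy_Number. That matching rule, the sample rows, and the "channel" field are hypothetical and are not taken from the presentation.

```python
# Hypothetical hotlist rows, using field names from the slide.
hotlist = [
    {"CIF_Customer_ID": "CIF001", "Policy_Number": "P-100",
     "Incident_id": "INC-1", "Comment": "confirmed ATO"},
]

# Hypothetical activity events; "channel" is an invented field.
activity = [
    {"CIF_Customer_ID": "CIF001", "Policy_Number": "P-100", "channel": "online_login"},
    {"CIF_Customer_ID": "CIF003", "Policy_Number": "P-300", "channel": "call_center"},
    {"CIF_Customer_ID": "CIF009", "Policy_Number": "P-100", "channel": "claim_filed"},
]

def hotlist_alerts(events, hotlist_rows):
    """Return events whose customer ID or policy number is on the hotlist."""
    by_cif = {row["CIF_Customer_ID"]: row for row in hotlist_rows}
    by_policy = {row["Policy_Number"]: row for row in hotlist_rows}
    alerts = []
    for event in events:
        match = by_cif.get(event["CIF_Customer_ID"]) or by_policy.get(event["Policy_Number"])
        if match is not None:
            # Carry the incident reference forward so an analyst can trace the alert.
            alerts.append({**event, "Incident_id": match["Incident_id"]})
    return alerts

alerts = hotlist_alerts(activity, hotlist)
for alert in alerts:
    print(alert["channel"], alert["Incident_id"])
```

Matching on more than one identifier catches activity against a flagged policy even when it arrives under a different customer record.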
[Workflow diagram labels: Suspect Claim; Alert Results (risk_source_rule=“Alert_Results”); Assign Alert History; Alert history; Assign Shared Indicator; Metric DBs/Reporting; FraudNet; Generate FN Feedback File]
Columns: Policyholder Identifier, Online Login, Claim Filed, Address Update, Calls
1 4 8 2 7 9 1 1 3 5 5 4 5 2 1 1 6 14 1
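A table like the one above, with one row per policyholder and one count per activity, could be built from raw events roughly as follows. This is a minimal sketch: the column names come from the table header, while the event log and the policyholder identifiers are made up for illustration.

```python
from collections import Counter

# Column order taken from the table header above.
FEATURES = ["Online Login", "Claim Filed", "Address Update", "Calls"]

# Hypothetical cross-channel events as (policyholder identifier, activity) pairs.
activity_log = [
    ("PH-1", "Online Login"), ("PH-1", "Online Login"), ("PH-1", "Calls"),
    ("PH-2", "Claim Filed"), ("PH-2", "Address Update"), ("PH-2", "Calls"),
]

def count_activities(log, features=FEATURES):
    """Return {policyholder: [count per feature]} in the order given by features."""
    counts = Counter(log)  # (policyholder, activity) -> number of occurrences
    policyholders = sorted({ph for ph, _ in log})
    return {ph: [counts[(ph, feature)] for feature in features] for ph in policyholders}

matrix = count_activities(activity_log)
for policyholder, row in matrix.items():
    print(policyholder, row)
```

Each row is a fixed-length feature vector, which is the shape that both manual weighting and clustering expect as input.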
∙ Developed over 30 key cross-channel activities to analyze
∙ Leveraged online risk scoring platform
∙ Challenges: Labeled dataset not available, supervised learning not an option
∙ First Attempt: Assign each feature a weight manually based on how “risky” the event is
“normal”
∙ Second Attempt: Use K-Means clustering to find outliers based on features
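The two attempts listed above can be contrasted in a small, self-contained sketch. Everything specific here is an assumption: the feature vectors, the weights, the choice of k=2, and the rule that members of the smallest cluster count as outliers. K-Means is written out in plain Python (Lloyd's algorithm) only to keep the example dependency-free. It is not the tooling used in the presentation.

```python
import math
import random

# Hypothetical feature vectors: [online logins, claims filed, address updates, calls].
vectors = {
    "PH-1": [1, 0, 0, 1],
    "PH-2": [2, 0, 0, 1],
    "PH-3": [1, 1, 0, 0],
    "PH-4": [2, 1, 0, 1],
    "PH-5": [9, 4, 3, 8],  # deliberately unusual
}

# First attempt: hand-picked weights per feature (illustrative values only).
WEIGHTS = [1.0, 3.0, 5.0, 2.0]

def manual_score(vector, weights=WEIGHTS):
    """Weighted sum of activity counts."""
    return sum(w * x for w, x in zip(weights, vector))

# Second attempt: unsupervised clustering, no labels required.
def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

names = sorted(vectors)
points = [vectors[name] for name in names]
centroids, labels = kmeans(points, k=2)

# Assumption: members of the smallest cluster are the outliers to review.
sizes = {c: labels.count(c) for c in set(labels)}
smallest = min(sizes, key=sizes.get)
outliers = [name for name, label in zip(names, labels) if label == smallest]

print("manual scores:", {name: manual_score(vectors[name]) for name in names})
print("outliers:", outliers)
```

The manual score depends entirely on the chosen weights, whereas the clustering step only needs the feature vectors, which suits a setting with no labeled fraud cases.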
[Figure: 3D Scatterplot of Fields vs. Cluster]
[Figure: Policyholder Count by Cluster]