Using NVIDIA cuDF to Simplify and Accelerate Data Prep for Credit Card Algo. Prediction
March 19, 2019 Richard Liu Vice President
Agenda
– Macro economics trends
– Behavioral surplus
– Paradigm shift
– Deep dive into the data
– How
Business case: the credit card business now faces challenges in risk management and, more importantly, in payment and transaction behavior. Conventional balance-sheet data approaches can hardly meet these new requirements.
[Chart: Total outstanding (amounts in $ millions, left axis) and noncurrent rate (% rate, right axis), quarterly, 1984Q1–2018Q1]
Source: FDIC Loan Performance (as of 2018/Q4)
– Pool-level thinking
– Book keeping
– Action to behavior to data to prediction
– Surveillance capitalism
Examples (simulated data for illustration purposes)
– Customer ID: cust_id
– Merchant category code: mcc
– Transaction date: trans_date
– Dollar amount: trans_amt
– Array objects after the pivoting process: [array of mcc], [array of trans_date], [array of trans_amt]
– Neuroscience observation on customer behavior (visualization)
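The pivoting step above can be sketched in the deck's "Python Pandas like" style. This is a minimal illustration with simulated records, using the field names from the slide (cust_id, mcc, trans_date, trans_amt); the data values are invented for illustration only.

```python
import pandas as pd

# Simulated transaction records (illustration only)
df = pd.DataFrame({
    "cust_id":    [101, 101, 101, 202, 202],
    "mcc":        [5411, 5812, 5411, 4111, 5999],
    "trans_date": ["2019-01-02", "2019-01-05", "2019-01-09",
                   "2019-01-03", "2019-01-07"],
    "trans_amt":  [35.20, 12.50, 41.00, 2.75, 60.10],
})

# Pivot each customer's history into array objects:
# [array of mcc], [array of trans_date], [array of trans_amt]
pivoted = (
    df.sort_values("trans_date")
      .groupby("cust_id")
      .agg(list)
      .reset_index()
)

print(pivoted)
```

With RAPIDS the same groupby can run on the GPU, e.g. `cudf.from_pandas(df).groupby("cust_id").agg("collect")` (cuDF's list aggregation).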
Data against computation (Python Pandas like) investment

Functional language:

    increment :: [Int] -> [Int]
    increment = map (1+)

The advantage of modern computation:
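The Haskell one-liner above, written in the deck's own Python style, a map over a collection with no explicit index bookkeeping:

```python
def increment(xs):
    """Apply (+1) element-wise, as `increment = map (1+)` does in Haskell."""
    return [x + 1 for x in xs]

print(increment([1, 2, 3]))  # [2, 3, 4]
```

In pandas/cuDF the same idea is vectorized further: `series + 1` applies the operation column-wide without any explicit loop.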
AlphaGo and AI Progress. Retrieved October 24, 2017, from http://www.milesbrundage.com/blog-posts/alphago-and-ai-progress.
Our expectation: an easier yet efficient way to resolve the chronic "horizon stacking" data problem.
Conventional:

    SELECT COUNT(), SUM(), STD(), PARTITION BY ()  # window function
    FROM … LEFT JOIN … GROUP BY …
    PATTERN = HORIZON();

Distributed over GPU cores (data object, smart distributed computation by RAPIDS):

    (0 until array.length)
      .map(i => PATTERN.addData(attributes(i), array(i)))
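The conventional window-function side of the contrast can be sketched in pandas (cuDF exposes the same groupby/transform API on the GPU). This is a hedged illustration, reusing the hypothetical cust_id and trans_amt columns from the earlier slide, not the author's actual query:

```python
import pandas as pd

df = pd.DataFrame({
    "cust_id":   [101, 101, 101, 202, 202],
    "trans_amt": [35.20, 12.50, 41.00, 2.75, 60.10],
})

# Equivalent of COUNT()/SUM()/STD() OVER (PARTITION BY cust_id):
# transform broadcasts each group aggregate back to every row.
grp = df.groupby("cust_id")["trans_amt"]
df["cnt"] = grp.transform("count")
df["sum"] = grp.transform("sum")
df["std"] = grp.transform("std")

print(df)
```

Swapping `pd` for `cudf` distributes the same partition-wise aggregation over GPU cores.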
Time interval = 1…n
With the conventional table approach, how do we find a departure from the prevailing deep learning zeitgeist that prizes learning from scratch, tabula rasa?
Hebbian-learning-like representation vs. table with system records
SDR (sparse distributed representation)
function feature_map(hierarchical, data[1..T], C)
    levels[1..L] <- hierarchical.levels
    for l <- 1 to L do
        regions <- levels[l].regions
        for all r in regions do
            until spatialPooling converged for r
                for t <- 1 to T do
                    spatialPooling(r, data[t])
                end for
            end until
            for c <- 1 to C do
                for t <- 1 to T do
                    spatialPooling(r, data[t])
                    Sparse_Data_Representation <- pivoted_array
                    Time_Horizon_Pooling(r, Sparse_Data_Representation)
                end for
            end for
        end for
    end for
end function
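The sparsification idea behind the SDR step can be illustrated with a toy winner-take-all sketch. This is not the HTM spatial pooler or the author's feature_map; the function name, the k parameter, and the random input are all assumptions made for illustration:

```python
import numpy as np

def to_sdr(x, k):
    """Toy sparse distributed representation: keep only the k
    strongest activations (winner-take-all) and zero the rest.
    NOT the HTM spatial pooler, just the sparsification idea."""
    out = np.zeros_like(x)
    top = np.argsort(x)[-k:]   # indices of the k largest values
    out[top] = 1               # active bits
    return out

rng = np.random.default_rng(0)
dense = rng.normal(size=16)    # dense activations
sdr = to_sdr(dense, k=3)       # sparse binary code, 3 of 16 bits on
print(sdr)
```

A real spatial pooler also learns which inputs map to which bits; the fixed top-k rule here only shows why the resulting code is sparse and noise-tolerant.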
Inspiration from Recursive Cortical Network, Hierarchical Temporal Memory
* Dileep George et al. Science 2017;358:eaag2612 (published by AAAS)
– Feature engineering (more features, and more accurate)
– Computational significance (less data, yet robust to noise)
Github scripts: https://github.com/vicariousinc/science_rcn