Using NVIDIA cuDF to Simplify and Accelerate Data Prep for Credit Card Algo. Prediction

SLIDE 1

Using NVIDIA cuDF to Simplify and Accelerate Data Prep for Credit Card Algo. Prediction

March 19, 2019

Richard Liu, Vice President

SLIDE 2

Agenda

  • Macroeconomic trends
  • Behavioral surplus
  • Paradigm shift
  • Deep dive into the data
  • How RAPIDS/cuDF helps

SLIDE 3

Perspective on the challenges

Business case: The credit card business now faces challenges in risk management and, more importantly, in understanding payment and transaction behavior. Conventional approaches built on balance sheet data can hardly meet these new requirements.

[Chart: U.S. Credit Cards, 1984Q1 to 2018Q1. Total outstanding (amounts in $ million) and noncurrent rate (%), quarterly.]

Source: FDIC, Loan Performance (as of 2018 Q4)

SLIDE 4

Trade secret on behavioral surplus

  • Traditional: pool-level thinking, bookkeeping
  • Digital age: action to behavior to data to prediction (surveillance capitalism)

SLIDE 5

Paradigm Shift

But … how do we walk the talk?

SLIDE 6

Now We Have a New Way to Look at Data

Examples (simulated data for illustration purposes):

  • Customer ID: cust_id
  • Merchant category code: mcc
  • Transaction date: trans_date
  • Dollar amount: trans_amt
  • Array objects after the pivoting process: [array of mcc], [array of trans_date], [array of trans_amt]

Neuroscience observation on customer behavior (visualization)
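The pivoting step listed above can be sketched in plain Python. This is a minimal illustration assuming long-format rows with the slide's column names (cust_id, mcc, trans_date, trans_amt); the sample values and the helper name pivot_to_arrays are hypothetical, and the sketch does not call cuDF itself. In pandas the same reshaping is roughly `df.sort_values("trans_date").groupby("cust_id").agg(list)`; cuDF follows the pandas API closely, but list aggregation support should be checked against the cuDF version in use.

```python
from collections import defaultdict
from datetime import date

# Simulated transactions (hypothetical values, for illustration only)
transactions = [
    {"cust_id": "C1", "mcc": 5411, "trans_date": date(2019, 1, 3), "trans_amt": 54.20},
    {"cust_id": "C2", "mcc": 5812, "trans_date": date(2019, 1, 2), "trans_amt": 23.10},
    {"cust_id": "C1", "mcc": 5541, "trans_date": date(2019, 1, 1), "trans_amt": 40.00},
    {"cust_id": "C2", "mcc": 4111, "trans_date": date(2019, 1, 5), "trans_amt": 2.75},
    {"cust_id": "C1", "mcc": 5812, "trans_date": date(2019, 1, 7), "trans_amt": 18.60},
]


def pivot_to_arrays(rows):
    """Pivot long-format rows into one record of parallel arrays per customer.

    Each customer's transactions are ordered by trans_date, so position i in
    every array refers to the same transaction.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cust_id"]].append(row)

    pivoted = {}
    for cust_id, cust_rows in grouped.items():
        cust_rows.sort(key=lambda r: r["trans_date"])
        pivoted[cust_id] = {
            "mcc": [r["mcc"] for r in cust_rows],
            "trans_date": [r["trans_date"] for r in cust_rows],
            "trans_amt": [r["trans_amt"] for r in cust_rows],
        }
    return pivoted


pivoted = pivot_to_arrays(transactions)
print(pivoted["C1"]["mcc"])  # [5541, 5411, 5812]
```

Keeping the three arrays parallel (rather than one array of tuples) mirrors the columnar layout that dataframe libraries use.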

SLIDE 7

Why RAPIDS cuDF

  • An efficient way to handle very sparse data computationally
  • Performance with ease of programming (pandas-like Python API)
  • Much better return on GPU investment

Functional language:

    increment :: [Int] -> [Int]
    increment = map (1+)

The advantage of modern computation:

"Progress so far has largely been toward demonstrating general approaches for building narrow systems rather than general approaches for building general systems. Progress toward the former does not entail substantial progress toward the latter."

AlphaGo and AI Progress. Retrieved October 24, 2017, from http://www.milesbrundage.com/blog-posts/alphago-and-ai-progress.
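The point about very sparse data can be made concrete. If each customer transacts in only a few merchant categories, a customer-by-MCC matrix is mostly zeros. The sketch below stores only the observed (cust_id, mcc) pairs; the 1,000-code vocabulary size, the sample rows, and the helper names are hypothetical, and this is a generic illustration of sparse storage, not a description of how cuDF lays out data internally.

```python
from collections import Counter

N_MCC_CODES = 1000  # hypothetical size of the merchant-category vocabulary


def sparse_mcc_counts(rows):
    """Count transactions per (cust_id, mcc) pair, storing only nonzero cells.

    A dense customer x MCC matrix would reserve a cell for every pair, almost
    all of them zero; this dictionary-of-keys layout keeps only observed pairs.
    """
    return Counter((cust_id, mcc) for cust_id, mcc in rows)


def density(counts, n_customers, n_codes=N_MCC_CODES):
    """Fraction of the equivalent dense matrix that is actually nonzero."""
    return len(counts) / (n_customers * n_codes)


rows = [("C1", 5411), ("C1", 5541), ("C1", 5411), ("C2", 4111)]  # simulated
counts = sparse_mcc_counts(rows)
print(counts[("C1", 5411)])            # 2
print(density(counts, n_customers=2))  # 0.0015
```

Three stored entries stand in for a 2 x 1,000 dense matrix; the gap widens as customers and categories grow.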

Our expectation:

SLIDE 8

How RAPIDS Helps with Transactions over a Time Horizon

An easier yet efficient way to resolve the chronic “horizon stacking” data problem

Conventional:

    SELECT COUNT(), SUM(), STD() OVER (PARTITION BY …)  -- window function
    FROM …
    LEFT JOIN …
    GROUP BY …

Smart distributed computation by RAPIDS (data object distributed over GPU cores):

    PATTERN = HORIZON();
    (0 until array.length).map(i => PATTERN.addData(attributes(i), array(i)))

Time interval = 1…n
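One way to read this slide: instead of computing one aggregate table per time window and joining them together, each customer's whole horizon becomes a single array of per-interval aggregates. The plain-Python sketch below assumes every row already carries an interval index in 1…n; the function name horizon_features and the sample rows are hypothetical, and the sketch does not reproduce the slide's HORIZON/addData pseudocode.

```python
from collections import defaultdict
from statistics import pstdev


def horizon_features(rows, n_intervals):
    """Per customer, compute (COUNT, SUM, population STD) of trans_amt per interval.

    rows: iterable of (cust_id, interval, trans_amt), interval in 1..n_intervals.
    Returns {cust_id: [(count, total, std) for intervals 1..n_intervals]}, so a
    customer's whole time horizon is one array rather than n joined tables.
    Empty intervals yield (0, 0, 0.0).
    """
    buckets = defaultdict(lambda: [[] for _ in range(n_intervals)])
    for cust_id, interval, amt in rows:
        buckets[cust_id][interval - 1].append(amt)
    return {
        cust_id: [(len(a), sum(a), pstdev(a) if a else 0.0) for a in per_interval]
        for cust_id, per_interval in buckets.items()
    }


rows = [("C1", 1, 10), ("C1", 1, 30), ("C1", 3, 5), ("C2", 2, 7)]  # simulated
features = horizon_features(rows, n_intervals=3)
print(features["C1"])  # [(2, 40, 10.0), (0, 0, 0.0), (1, 5, 0.0)]
```

In a dataframe library the equivalent is typically a `groupby` on the customer and interval columns followed by `agg(["count", "sum", "std"])`. Note that pandas' `std` is the sample standard deviation (ddof=1) by default, unlike `pstdev` here, and that a plain `groupby` emits no rows for empty intervals.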

SLIDE 9

Challenges from DL computation

With the conventional table-based approach, how can we depart from the prevailing deep learning zeitgeist that prizes learning from scratch (tabula rasa)?

[Figure: a table with system records compared with a Hebbian-learning-like representation (SDR, sparse distributed representation)]
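To make the contrast concrete, here is a toy sparse distributed representation in the spirit of HTM: each category becomes a small set of active bits out of a large vector, and similarity is measured by overlap. The width (2,048 bits), the 40 active bits, and the seeded-random encoder are hypothetical choices for illustration; this is not the encoder used by HTM, the RCN, or the author.

```python
import random

N_BITS = 2048  # hypothetical SDR width
N_ACTIVE = 40  # hypothetical number of active bits (about 2% sparsity)


def encode_sdr(value: int) -> frozenset:
    """Toy encoder: map an integer category (e.g. an MCC code) to active bit indices."""
    rng = random.Random(value)  # seeding with the value makes the encoding repeatable
    return frozenset(rng.sample(range(N_BITS), N_ACTIVE))


def overlap(a: frozenset, b: frozenset) -> int:
    """Similarity between two SDRs: the number of shared active bits."""
    return len(a & b)


# A customer's representation: the union of the SDRs of the categories they used.
customer = encode_sdr(5411) | encode_sdr(5541)
print(overlap(customer, encode_sdr(5411)))  # 40: a used category is fully present
```

A category the customer never used would typically share only a few bits with `customer` by chance, which is what makes overlap usable as a membership test.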

SLIDE 10

cuDF with a Better Format for Deep-Learning-Like Computation

    function feature_map(hierarchical, data[1..T], C)
        levels[1..L] <- hierarchical.levels
        for l <- 1 to L do
            regions <- levels[l].regions
            for all r in regions do
                until spatialPooling converged for r
                    for t <- 1 to T do
                        spatialPooling(r, data[t])
                    end for
                end until
                for c <- 1 to C do
                    for t <- 1 to T do
                        spatialPooling(r, data[t])
                        Sparse_Data_Representation <- pivoted_array
                        Time_Horizon_Pooling(r, Sparse_Data_Representation)
                    end for
                end for
            end for
        end for
    end function

Inspiration from Recursive Cortical Network, Hierarchical Temporal Memory

* Dileep George et al. Science 2017;358:eaag2612 (published by AAAS)
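The control flow of feature_map can be traced with a runnable Python skeleton. Only the loop structure follows the pseudocode; the Region class, spatial_pooling, and time_horizon_pooling are hypothetical toy stand-ins (neither HTM's spatial pooler nor the RCN), levels is simplified to a list of region lists, and each input item stands in for pivoted_array.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical region: remembers every active bit it has been shown."""
    learned: set = field(default_factory=set)
    pooled: list = field(default_factory=list)


def spatial_pooling(region, sdr):
    """Toy stand-in: absorb new bits; return True if the region changed."""
    before = len(region.learned)
    region.learned |= sdr
    return len(region.learned) != before


def time_horizon_pooling(region, sparse_repr):
    """Toy stand-in: record how many of the input's bits the region already knows."""
    region.pooled.append(len(region.learned & sparse_repr))


def feature_map(levels, data, c_passes):
    for regions in levels:                  # for l <- 1 to L
        for r in regions:                   # for all r in regions
            changed = True
            while changed:                  # until spatialPooling converged for r
                changed = False
                for x in data:              # for t <- 1 to T
                    changed |= spatial_pooling(r, x)
            for _ in range(c_passes):       # for c <- 1 to C
                for x in data:              # for t <- 1 to T
                    spatial_pooling(r, x)
                    sparse_repr = x         # stands in for pivoted_array
                    time_horizon_pooling(r, sparse_repr)
    return levels


data = [frozenset({1, 2}), frozenset({2, 3})]  # T = 2 toy inputs
levels = feature_map([[Region()]], data, c_passes=2)
print(levels[0][0].pooled)  # [2, 2, 2, 2]
```

Convergence here simply means a full pass over the data adds no new bits; a real spatial pooler would use its own learning rule and stopping criterion.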

SLIDE 11

How Much RAPIDS Helps

  • Speed, speed, speed! Things you needed to know yesterday.

  • More time to think (smart machine for smart people).

    – Feature engineering (more features, and more accurate ones)
    – Computational significance (less data, yet robust to noise)

Dileep George et al. Science 2017;358:eaag2612 (published by AAAS) Github scripts: https://github.com/vicariousinc/science_rcn

SLIDE 12

Thank you