Maximizing Gain Full Feature Space Representation While Upgrading - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Minimizing Risk While Maximizing Gain

Full Feature Space Representation While Upgrading Minimal Subset of PCs

Tom Drabas Senior Data Scientist

slide-2
SLIDE 2

the

problem

slide-3
SLIDE 3

highly diverse

ecosystem

slide-4
SLIDE 4

circle of

updates

slide-5
SLIDE 5

data is

biased

slide-6
SLIDE 6

selection

bias

confirmation

bias

gender

bias…

slide-7
SLIDE 7

asking for

trouble

slide-8
SLIDE 8

a machine learning model

learns from the data

slide-9
SLIDE 9

“we don’t know what

we don’t know”

slide-10
SLIDE 10

the

solution

slide-11
SLIDE 11

full

view

slide-12
SLIDE 12

minimize

risk

slide-13
SLIDE 13

be

selective

slide-14
SLIDE 14
slide-15
SLIDE 15

this problem is

hard

[Figure labels: Solvable, Not solvable, Number of records, Optimal]
slide-16
SLIDE 16

naïve ~O(n³)

work efficient ~O(n²)

slide-17
SLIDE 17

restate my

assumptions

https://aka.ms/pi_movie
slide-18
SLIDE 18

find a minimal subset of transactions that covers the universe of all values

minimize the cost of covering the universe of all values
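
The restated problem has the shape of a set cover problem. As a point of reference only, here is a minimal sketch of the textbook greedy heuristic for set cover in Python. It is not the method presented in this talk; the record ids and feature values are hypothetical toy data.

```python
def greedy_set_cover(records):
    """Greedy set cover: repeatedly pick the record that covers the most
    feature values not covered yet. `records` maps id -> set of values."""
    uncovered = set().union(*records.values())  # the universe of all values
    chosen = []
    while uncovered:
        # Record with the largest overlap with the still-uncovered values
        best = max(records, key=lambda rid: len(records[rid] & uncovered))
        chosen.append(best)
        uncovered -= records[best]
    return chosen


# Hypothetical toy data: PC id -> set of feature values seen on that PC
records = {
    "pc1": {"a", "b"},
    "pc2": {"b", "c", "d"},
    "pc3": {"d"},
    "pc4": {"a", "e"},
}
cover = greedy_set_cover(records)
print(cover)
```

Each pass rescans every record, which is part of why simple approaches scale poorly as the number of records grows.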
slide-19
SLIDE 19

set parallel ~O(n log n)

slide-20
SLIDE 20

1. Calculate cost
2. Sort in ascending order

slide-21
SLIDE 21
slide-22
SLIDE 22

8 5 3 5 3 6 2 2

$c_i = \frac{1}{n} \sum_{j} \ln f_j$

cost = average of log of frequencies of individual components
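
A small sketch of the cost calculation in plain Python may help make the definition concrete. It assumes that n is the number of components (feature values) in a record and that a frequency is the number of records containing a value. The record ids and values are hypothetical.

```python
import math
from collections import Counter


def record_costs(records):
    """Cost of a record = mean of ln(frequency) over its feature values.
    `records` maps record id -> list of feature values (assumed layout)."""
    # Frequency of every feature value across all records
    freq = Counter(v for values in records.values() for v in values)
    return {
        rid: sum(math.log(freq[v]) for v in values) / len(values)
        for rid, values in records.items()
    }


# Hypothetical toy data
records = {
    "r1": ["a", "b"],
    "r2": ["a", "c"],
    "r3": ["a", "b"],
}
costs = record_costs(records)
for rid, cost in costs.items():
    print(rid, round(cost, 4))
```

Under this definition a record that carries rare values gets a lower cost, so it moves toward the front once records are sorted in ascending order.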

slide-23
SLIDE 23

1.77 1.64 1.35 1.64 1.77 1.23 1.50 1.64

slide-24
SLIDE 24

final

order

Increasing cost
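
The ordering step can be sketched in a few lines of Python. The values below are the eight numbers shown on slide 23; treating them as per-record costs, with records indexed from 0, is an assumption.

```python
# Values from slide 23, assumed to be the cost of records 0..7
costs = [1.77, 1.64, 1.35, 1.64, 1.77, 1.23, 1.50, 1.64]

# Record indices in order of increasing cost; sorted() is stable,
# so records with equal cost keep their original relative order.
order = sorted(range(len(costs)), key=costs.__getitem__)
print(order)
```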
slide-25
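SLIDE 25

import cudf
import pandas as pd
import numpy as np

def calc_log(count_id):
    return np.log(float(count_id))

# One row per (record id, feature value) pair
gdf = cudf.read_csv(
    '../data/exploded.csv'
    , delimiter=','
    , names=['id', 'feature']
    , skiprows=1
)

# Frequency of each feature value and its natural log
# (column names such as count_id appear to follow the cuDF release used in the talk)
freq_items = gdf.groupby('feature').agg('count')
freq_items['ln_freq'] = freq_items['count_id'].applymap(calc_log)

# Attach the log-frequencies to every (id, feature) row
gdf = gdf.set_index('feature')
freq_items = freq_items.set_index('feature')
gdf = gdf.join(freq_items, how='left')

# Cost per record = mean log-frequency; sort in ascending order of cost
gdf = gdf.groupby('id').agg(['mean'])
gdf = gdf.sort_values(by='mean_ln_freq')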

RAPIDS

data framework
slide-26
SLIDE 26

3. Run Set Prefix Scan on GPU

Based on https://aka.ms/mharris_pps
slide-27
SLIDE 27

Prefix Set Scan

up the tree

Set Union
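
Read sequentially, a prefix set scan with set union gives, for each position in the cost-sorted order, the set of all values seen up to and including that record. The Python sketch below shows that idea on the CPU and keeps a record only if it adds a value not seen before. The selection rule and the toy data are assumptions made for illustration, not something stated on the slides.

```python
from itertools import accumulate


def select_records(ordered_records):
    """`ordered_records` is a list of (record id, set of values),
    already sorted by increasing cost (assumed input layout)."""
    sets = [set(values) for _, values in ordered_records]
    # Inclusive prefix scan with set union as the combining operator
    prefix = list(accumulate(sets, lambda acc, s: acc | s))

    selected = []
    seen_before = set()
    for (rid, _), seen_after in zip(ordered_records, prefix):
        # The record contributes if the running union grew at its position
        if len(seen_after) > len(seen_before):
            selected.append(rid)
        seen_before = seen_after
    return selected, prefix


# Hypothetical toy data, already in cost order
ordered = [
    ("r2", {"a", "c"}),
    ("r1", {"a", "b"}),
    ("r3", {"a", "b"}),
    ("r4", {"d"}),
]
selected, prefix = select_records(ordered)
print(selected)
```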
slide-28
SLIDE 28

__global__ void gpu_prefix_set_scan_full_kernel(
      const uint32_t* input
    , uint32_t* output
    , uint32_t curr_val_size
    , uint32_t rec_cnt
) {
    extern __shared__ uint32_t temp[];
    int thid = blockIdx.x * blockDim.x + threadIdx.x;
    int offset = 1;

    // STORE IN TEMP
    ...

    // SCAN UP THE TREE
    int n = rec_cnt;
    for(int d = n >> 1; d > 0; d >>= 1) {
        __syncthreads();
        if(thid < d) {
            int ai = offset * (2 * thid + 1) - 1;
            int bi = offset * (2 * thid + 2) - 1;
            set_union_device(ai, bi, temp, curr_val_size, rec_cnt);
        }
        offset *= 2;
    }

Prefix Set Scan

up the tree

slide-29
SLIDE 29

(1) Set Intersect
(2) Set Difference

Prefix Set Scan

down the tree

slide-30
SLIDE 30

    ...
    for(int d = 1; d < n; d <<= 1) {
        offset >>= 1;
        __syncthreads();
        if(thid < d) {
            int ai = offset * (2 * thid + 1) - 1;
            int bi = offset * (2 * thid + 2) - 1;
            set_intersect_device(bi, ai, temp, curr_val_size, rec_cnt);
            set_difference_device(ai, bi, temp, curr_val_size, rec_cnt);
        }
    }
}

Prefix Set Scan

down the tree
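
The index arithmetic in the two sweeps can be traced on the CPU with a single-threaded Python simulation. The sketch below reuses the ai/bi/offset pattern from the kernel slides, but in the down-sweep it swaps in the standard work-efficient exclusive scan step (move the parent's value to the left child, combine into the right child) with set union as the operator. It does not reproduce the intersect and difference steps of the kernel, whose device functions are not shown. The input length is assumed to be a power of two.

```python
def blelloch_union_scan(sets):
    """Exclusive prefix scan with set union, simulated level by level.
    Each inner loop iteration plays the role of one GPU thread (thid)."""
    n = len(sets)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    temp = [set(s) for s in sets]
    offset = 1

    # Up-sweep: build partial unions up the tree
    d = n >> 1
    while d > 0:
        for thid in range(d):
            ai = offset * (2 * thid + 1) - 1
            bi = offset * (2 * thid + 2) - 1
            temp[bi] = temp[bi] | temp[ai]
        offset *= 2
        d >>= 1

    # Clear the root, then push prefixes down the tree
    temp[n - 1] = set()
    d = 1
    while d < n:
        offset >>= 1
        for thid in range(d):
            ai = offset * (2 * thid + 1) - 1
            bi = offset * (2 * thid + 2) - 1
            left = temp[ai]
            temp[ai] = temp[bi]          # left child gets the parent's prefix
            temp[bi] = temp[bi] | left   # right child adds the left subtree
        d <<= 1
    return temp


# Hypothetical toy data: four records' value sets in cost order
scan = blelloch_union_scan([{"a", "c"}, {"a", "b"}, {"a", "b"}, {"d"}])
print(scan)
```

Position i of the result holds the union of all sets before record i, so a record adds something new exactly when its own set is not contained in that prefix.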

slide-31
SLIDE 31

the

benefits

slide-32
SLIDE 32

                  time (minutes)   speedup (naïve)   speedup (work efficient)
naïve             54.1
work efficient    18.1             2.98x
set parallel      0.43 (~26s)      125.8x            42.1x

1M records, 100k feature values
NVIDIA RTX 2080Ti, i5 2.4GHz, 64GB RAM, NVMe
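
The speedup figures follow from the reported times as simple ratios. The short Python check below uses only the three times from this slide, taken as minutes; because those times are rounded, the ratios match the slide only to within rounding.

```python
# Reported wall-clock times in minutes (from the slide)
times_min = {"naive": 54.1, "work efficient": 18.1, "set parallel": 0.43}


def speedup(baseline, method):
    """How many times faster `method` is than `baseline`."""
    return times_min[baseline] / times_min[method]


pairs = [
    ("naive", "work efficient"),
    ("naive", "set parallel"),
    ("work efficient", "set parallel"),
]
for baseline, method in pairs:
    print(f"{method} vs {baseline}: {speedup(baseline, method):.2f}x")
```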
slide-33
SLIDE 33

keeping

track

slide-34
SLIDE 34

account for

everything

slide-35
SLIDE 35
slide-36
SLIDE 36