Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan
Virginia Polytechnic Institute and State University


SLIDE 1

Yong Cao, Debprakash Patnaik, Sean Ponce, Jeremy Archuleta, Patrick Butler, Wu-chun Feng, and Naren Ramakrishnan Virginia Polytechnic Institute and State University

SLIDE 2

• Reverse-engineer the brain

National Academy of Engineering Top 5 Grand Challenges

Cited from Sciseek.com

Action Potentials (Spikes)

Neuron anatomy: Dendrites (receiver), Axon (wires), Axon Terminal (transmitter)

Question: How are the neurons connected?

SLIDE 3

• Reverse-engineer the brain

National Academy of Engineering Top 5 Grand Challenges

Neurons grown on a Multi-Electrode Array (MEA) chip; each electrode (A, B, C) produces a spike train stream over time.

SLIDE 4

• Reverse-engineer the brain

National Academy of Engineering Top 5 Grand Challenges

Find Repeating Patterns → Infer Network Connectivity

SLIDE 5

• Fast data mining of spike train streams on Graphics Processing Units (GPUs)

Multi-Electrode Array (MEA) chip
NVIDIA GTX 280 graphics card (GPU chip)

SLIDE 6

• Fast data mining of spike train streams on Graphics Processing Units (GPUs)
• Two key algorithmic strategies to address the scalability problem on the GPU:
  • A hybrid mining approach
  • A two-pass elimination approach

SLIDE 7

• Event stream data: a sequence of neuron firings

(E1, t1), (E2, t2), ..., (En, tn)

Raster plot: each row is a neuron (A, B, C, D), each mark a firing time. For example, an event of type A occurred at t = 6, and an event of type D occurred at t = 5.
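As a minimal sketch, the stream on this slide can be held as a time-ordered list of (event type, time) pairs; the labels and times below are illustrative, not taken from the real dataset.

```python
# The slide's event stream (E1,t1), (E2,t2), ..., (En,tn) as a plain,
# time-ordered list of (event_type, timestamp) pairs.
stream = [("D", 5), ("A", 6), ("B", 8)]

def events_of_type(stream, etype):
    """Return the firing times of one neuron (event type)."""
    return [t for e, t in stream if e == etype]
```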

SLIDE 8

• Pattern or Episode
• Occurrences (non-overlapped)
• Inter-event constraint

Raster plot: neurons A-D firing over time; the highlighted episode appears twice in the event stream.
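The non-overlapped occurrence count can be sketched as a short greedy routine. This is a minimal version that ignores the inter-event time constraints shown on the slide; tracking restarts after each completed occurrence, so no event is shared between two counted occurrences.

```python
def count_nonoverlapped(stream, episode):
    """Greedy left-to-right count of non-overlapped occurrences of a
    serial episode (e.g. A -> B -> C) in a time-ordered event stream.
    Inter-event constraints are ignored in this sketch."""
    count, pos = 0, 0          # pos: index of the next symbol we wait for
    for etype, _t in stream:
        if etype == episode[pos]:
            pos += 1
            if pos == len(episode):   # full occurrence completed
                count += 1
                pos = 0
    return count
```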

SLIDE 9

• Data mining problem: find all possible episodes / patterns which occur more than X times in the event sequence.
• Challenge: combinatorial explosion, a large number of episodes to count

Episode size/length:
1: A, B, ...
2: A → B, B → A, A → C, ...
3: A → B → C, A → C → B, B → A → C, B → C → A, ...
4: A → B → C → D, A → C → B → D, A → C → D → B, A → D → B → C, A → D → C → B, ...
SLIDE 10

• Mining Algorithm (a level-wise procedure to control combinatorial explosion)

Generate an initial list of candidate size-1 episodes
Repeat until no more candidate episodes:
  Count: occurrences of size-M candidate episodes  <- computational bottleneck
  Prune: retain only the frequent episodes
  Candidate Generation: size-(M+1) candidate episodes from size-M frequent episodes
Output all the frequent episodes
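The level-wise loop can be sketched in a few lines. Here count_fn is a stand-in for whatever counter performs the bottleneck Count step (the part the talk moves to the GPU); any Python counter works for illustration.

```python
def mine_frequent_episodes(stream, event_types, threshold, count_fn, max_size):
    """Level-wise mining loop: count size-M candidates, prune the
    infrequent ones, then generate size-(M+1) candidates from the
    size-M survivors."""
    frequent = []
    candidates = [(e,) for e in event_types]                # size-1 candidates
    while candidates:
        survivors = [ep for ep in candidates
                     if count_fn(stream, ep) >= threshold]  # count + prune
        frequent.extend(survivors)
        candidates = [ep + (e,)                             # candidate generation
                      for ep in survivors if len(ep) < max_size
                      for e in event_types if e not in ep]
    return frequent
```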

SLIDE 11

• Counting Algorithm (for one episode)

Episode: A → B → C → D, tracked by a chain of state-machine acceptors Accept_A(), Accept_B(), Accept_C(), Accept_D()

Event stream: A1 A2 B4 A5 C10 B12 C13 D17 (subscripts are firing times)

SLIDE 12

• Find an efficient counting algorithm on the GPU to count the occurrences of N size-M episodes in an event stream.
• Address the scalability problem on the GPU's massively parallel execution architecture.

SLIDE 13

• One episode per GPU thread (PTPE)
  • Each thread counts one episode
  • Simple extension of serial counting

N episodes map to N GPU threads across the multiprocessors (SM/SP); the event stream resides in global memory.

• Efficient when the number of episodes is larger than the number of GPU cores.

SLIDE 14

• If there are not enough episodes (one per thread), some GPU cores will be idle.
• Solution: increase the level of parallelism with Multiple Threads per Episode (MTPE).

The event stream is split into M segments: N episodes × M segments map to N × M GPU threads instead of N.
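A minimal sketch of the MTPE data split; the plain Python slicing stands in for the GPU's segment assignment and is illustrative only. As the next slide notes, naively summing the per-segment counts is problematic, since an occurrence can straddle a segment boundary.

```python
def split_into_segments(stream, m):
    """Cut the event stream into m contiguous segments so that each
    (episode, segment) pair can get its own GPU thread, N x M threads
    in total instead of N."""
    k = -(-len(stream) // m)             # ceil(len(stream) / m)
    return [stream[i:i + k] for i in range(0, len(stream), k)]
```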

SLIDE 15

• Problem with a simple count merge: occurrences that cross segment boundaries are miscounted.

SLIDE 16

• Choose the right algorithm with respect to the number of episodes N.
• Define a switching threshold, the crossover point (CP):

If N < CP, use MTPE; otherwise, use PTPE.

CP = MP × B_MP × T_B × f(size)

MP: number of multiprocessors; B_MP: blocks per multiprocessor; T_B: threads per block (together, the GPU computing capacity); f(size): performance penalty factor.
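The dispatch can be sketched directly from the formula. The multiprocessor, block, and thread numbers in the test values below are assumptions for illustration, not measured configuration.

```python
def choose_algorithm(n_episodes, mp, b_mp, t_b, f_size):
    """Hybrid dispatch: the crossover point CP = MP * B_MP * T_B * f(size)
    combines the GPU's thread capacity (MP * B_MP * T_B) with the
    performance penalty factor f(size). Below CP, MTPE's extra
    parallelism wins; above it, PTPE already saturates the GPU."""
    cp = mp * b_mp * t_b * f_size
    return "MTPE" if n_episodes < cp else "PTPE"
```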

SLIDE 17

Counting example: episode A → B → C → D tracked by acceptors Accept_A(), Accept_B(), Accept_C(), Accept_D() over the event stream A1 A2 B4 A5 C10 B12 C13 D17.

• Problem: the original counting algorithm is too complex for a GPU kernel function.

SLIDE 18

• Problem: the original counting algorithm is too complex for a GPU kernel function.
  • Large shared memory usage
  • Large register file usage
  • Large number of branching instructions

SLIDE 19

Episode with inter-event constraints: A →(−,5] B →(−,10] C →(−,5] D, counted over the event stream A1 A2 B4 A5 C10 B12 C13 D17.

• Solution: PreElim algorithm
  • Less constrained counting → a simple kernel function
  • Upper bound only

SLIDE 20

• A simpler kernel function

Per-thread resource usage:

                  Shared Memory       Register   Local Memory
PreElim           4 × episode size    13         -
Normal Counting   44 × episode size   17         80

SLIDE 21

• Solution: two-pass elimination approach

PASS 1 (less constrained counting): all episodes, one thread per episode, over the event stream
PASS 2 (normal counting): only the surviving, far fewer episodes, one thread per episode
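A sketch of the two-pass flow, with the two counters passed in as ordinary functions; pre_elim_fn and exact_fn are hypothetical stand-ins for the PreElim and normal-counting kernels. Because relaxing the constraints can only add occurrences, the PreElim count is an upper bound, so culling below threshold in pass 1 is safe.

```python
def two_pass_count(stream, episodes, threshold, pre_elim_fn, exact_fn):
    """Two-pass elimination: pass 1 culls with the cheap upper-bound
    counter, pass 2 runs the expensive exact counter on the survivors."""
    survivors = [ep for ep in episodes
                 if pre_elim_fn(stream, ep) >= threshold]   # pass 1: cull
    result = {}
    for ep in survivors:                                    # pass 2: exact
        c = exact_fn(stream, ep)
        if c >= threshold:
            result[ep] = c
    return result
```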

SLIDE 22

• A simpler kernel function

Compile-time difference (per-thread resources):

                  Shared Memory       Register   Local Memory
PreElim           4 × episode size    13         -
Normal Counting   44 × episode size   17         80

Run-time difference (profiler counters):

           Local Memory Load and Store   Divergent Branching
Two Pass   24,770,310                    12,258,590
Hybrid     210,773,785                   14,161,399

SLIDE 23

• Hardware
  • Computer (custom-built)
    • Intel Core2 Quad @ 2.33 GHz
    • 4 GB memory
  • Graphics card (NVIDIA GTX 280 GPU)
    • 240 cores (30 MPs × 8 cores) @ 1.3 GHz
    • 1 GB global memory
    • 16 KB shared memory per MP

SLIDE 24

• Datasets
  • Synthetic (Sym26)
    • 60 seconds with 50,000 events
  • Real (culture grown for 5 weeks)
    • Day 33: 2-1-33 (333,478 events)
    • Day 34: 2-1-34 (406,795 events)
    • Day 35: 2-1-35 (526,380 events)

SLIDE 25

• PTPE vs MTPE

Chart: PTPE vs MTPE runtimes, with crossover points marked.

SLIDE 26

• Performance of the Hybrid Approach

Chart: runtime (ms, 200-1200) vs episode size (1-7) for PTPE, MTPE, and Hybrid; episode numbers and crossover points annotated. Sym26 dataset, support = 100.

SLIDE 27

• Crossover Point Estimation
  • A linear function f(size) = a · size + b is a better fit.
  • A least-squares fit is performed.
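The least-squares fit can be reproduced with the closed-form normal equations for a single predictor. This pure-Python sketch assumes measured (size, penalty) pairs are available; the data in the test is synthetic.

```python
def fit_linear(sizes, penalties):
    """Ordinary least-squares fit of f(size) = a * size + b to measured
    performance-penalty factors (closed-form, no numpy)."""
    n = len(sizes)
    sx, sy = sum(sizes), sum(penalties)
    sxx = sum(x * x for x in sizes)
    sxy = sum(x * y for x, y in zip(sizes, penalties))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b
```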

SLIDE 28

• Two-pass approach vs hybrid approach

99.9% fewer episodes

SLIDE 29

• Performance of the two-pass approach

Runtime (ms) by episode size:

Episode size:   1      2        3        4         5
One Pass        93.2   1839.8   16139.7  132752.6  7036.6
Two Pass        160.4  1716.6   12602.6  41581.7   1844.6

Episodes culled by the first pass:

Episode size:     1    2     3      4       5
Total #           64   6210  33623  173408  6288
First Pass Cull   18   2677  21442  169360  6288

2-1-35 dataset, support = 3150

SLIDE 30

• Percentage of episodes eliminated by each pass

Chart: percentage eliminated by the first vs second pass (91%-100%) as support varies from 3000 to 4000. 2-1-35 dataset, episode size = 4.

SLIDE 31

• GPU vs CPU
  • The GPU is always faster than the CPU: 5x - 15x speedup
  • Fair comparison:
    • Two-pass algorithm used
    • Maximum threading for both

SLIDE 32

• Massive parallelism is required for conquering the near-exponential search space
• GPUs are far more accessible than high-performance clusters
• Frequent episode mining is not data-parallel; the algorithm was redesigned
• Framework for real-time and interactive analysis of spike train experimental data

SLIDE 33

• A fast temporal data mining framework on GPUs
  • Commoditized system
  • Massively parallel execution architecture
• Two programming strategies
  • A hybrid approach: increase the level of parallelism (data segmentation + map-reduce)
  • A two-pass elimination approach: decrease algorithm complexity (task decomposition)

SLIDE 34

Questions?

SLIDE 35

• Parallel execution via pthreads
• Optimized for CPU execution
  • Minimize disk access
  • Cache performance
• Implements the two-pass approach
  • PreElim: simpler/quicker state machine
  • Full state machine: slower, but required to eliminate all unsupported episodes

SLIDE 36

• Level-wise candidate generation
  • Size-N frequent episodes => size-(N+1) candidates

Diagram: two size-N episodes are combined (+) into a size-(N+1) candidate.