Not Just a Universal Crutch: Other Useful Things To Do With - PowerPoint PPT Presentation

Not Just a Universal Crutch: Other Useful Things To Do With atomicCAS Mitglied der Helmholtz-Gemeinschaft S6220 - Elmar Westphal - Forschungszentrum Jülich

Overview • atomicCAS and the “Universal Crutch” • Parallel hashing on GPUs using atomicCAS • Example 1: Counting unique keys in a block • Example 2: Group by keys within a warp Mitglied der Helmholtz-Gemeinschaft • Conclusions • Addendum: Sample source codes

      On atomicCAS • From the CUDA C Programming Manual:   “ int atomicCAS(int* address, int compare, int val);   …   reads … old … located … in global or shared memory, computes   (old == compare ? val : old)   Mitglied der Helmholtz-Gemeinschaft and stores the result back to memory at the same address. …   The function returns old (Compare And Swap).”

“The Universal Crutch” __device__ double atomicAdd(double* address, double val) { • According to said guide,   unsigned long long int* address_as_ull = (unsigned long long int*)address; “any atomic operation can   unsigned long long int old = *address_as_ull, assumed; do { be implemented based on   assumed = old; old = atomicCAS(address_as_ull, assumed, atomicCAS()” __double_as_longlong(val + __longlong_as_double(assumed))); } while (assumed != old); return __longlong_as_double(old); • Example: double precision   } atomicAdd Mitglied der Helmholtz-Gemeinschaft

“The Universal Crutch” __device__ double atomicAdd(double* address, double val) { • According to said guide,   unsigned long long int* address_as_ull = (unsigned long long int*)address; “any atomic operation can   unsigned long long int old = *address_as_ull, assumed; do { be implemented based on   assumed = old; old = atomicCAS(address_as_ull, assumed, atomicCAS()” __double_as_longlong(val + __longlong_as_double(assumed))); } while (assumed != old); return __longlong_as_double(old); • Example: double precision   } atomicAdd Mitglied der Helmholtz-Gemeinschaft “any atomic operation”, like that? Great, then we’re done here. Thank you for your time!

Mitglied der Helmholtz-Gemeinschaft There is more! But wait!

Overview • atomicCAS and the “Universal Crutch” • Parallel hashing on GPUs using atomicCAS • Example 1: Counting unique keys in a block • Example 2: Group by keys within a warp Mitglied der Helmholtz-Gemeinschaft • Conclusions • Addendum: Sample source codes

Origin & Motivation • Originally developed as part of building (partial) linked lists in shared memory (see GTC 2012, S2036) • original use became obsolete with Kepler’s faster atomics • general idea became useful again with Maxwell’s native shared Mitglied der Helmholtz-Gemeinschaft atomics

Hashing using atomicCAS • atomicCAS can be used to implement parallel hashing functions • Works very efficiently in shared memory on Maxwell • Building block for several useful counting and grouping operations • Works best at warp- or block-level with N keys << N threads Mitglied der Helmholtz-Gemeinschaft • Hashing function chosen must fit data properties: • Constantly strided keys may lead to repeated collisions

Mitglied der Helmholtz-Gemeinschaft A New Building Block

How Does It Work? • The loop maps the threads’ arbitrary keys to hash indices within the warp or block • The hash index within a scope (warp/block) is then assigned to all threads with the same key • atomicCAS tries to claim the calculated hash index for its thread’s key (“my_key”) • There are three possible outcomes for the return value of atomicCAS: 1. UNCLAIMED: this thread is the first to claim a hash index, success Mitglied der Helmholtz-Gemeinschaft 2. Same key as my_key: hash index claimed by same key from different thread, success 3. Key different from my_key: hash index claimed by different key (hash collision), try again with new hash index

Example, 13 Unique Keys in 16 Threads Slot 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Key U U U U U U U U U U U U U U U U Key 2 3 7 2 8 7 9 7 14 11 15 21 19 20 23 22 Mitglied der Helmholtz-Gemeinschaft Hash 2 3 7 2 8 7 9 7 14 11 15 5 3 4 7 6 Iteration 1 Hash function: (key+5*hash_iteration)%BLOCK_SIZE (simple to show, but ineffective for data with stride of BLOCK_SIZE!) assigned unassigned collision

Example, 13 Unique Keys in 16 Threads Slot Slot 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 Key Key U U U U U 2 19 U 20 U 21 U 22 U U 7 U 8 U 9 U U 11 U U U U U 14 U 15 U Key 2 3 7 2 8 7 9 7 14 11 15 21 19 20 23 22 Mitglied der Helmholtz-Gemeinschaft Hash Hash 2 2 3 3 7 7 2 2 8 8 7 7 9 9 7 7 14 14 11 11 15 15 5 5 3 3 4 4 7 7 6 6 Key from U 19 7 2 U 7 U U U U U U U U 7 U Slot Iteration 1 Hash function: (key+5*hash_iteration)%BLOCK_SIZE (simple to show, but ineffective for data with stride of BLOCK_SIZE!) assigned unassigned collision

Example, 13 Unique Keys in 16 Threads Slot Slot 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 Key Key U U U U U 2 19 U 20 U 21 U 22 U U 7 U 8 U 9 U U 11 U U U U U 14 U 15 U Key 2 3 7 2 8 7 9 7 14 11 15 21 19 20 23 22 Mitglied der Helmholtz-Gemeinschaft Hash 2 3 7 2 8 7 9 7 14 11 15 5 3 4 7 6 Key from U 19 7 2 U 7 U U U U U U U U 7 U Slot Iteration 2 Hash function: (key+5*hash_iteration)%BLOCK_SIZE assigned unassigned collision

Example, 13 Unique Keys in 16 Threads Slot Slot 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 Key Key U U U U U 2 19 U 20 U 21 U 22 U U 7 U 8 U 9 U U 11 U U U U U 14 U 15 U Key 2 3 7 2 8 7 9 7 14 11 15 21 19 20 23 22 Mitglied der Helmholtz-Gemeinschaft Hash Hash 2 2 8 3 7 7 2 2 8 8 7 7 9 9 7 7 14 14 11 11 15 15 5 5 3 3 4 4 12 7 6 6 Key from Key from U - 19 - 7 - 2 - U - 7 - U - U - U - U - U - U - U - U - 7 - U - Slot Slot Iteration 2 Hash function: (key+5*hash_iteration)%BLOCK_SIZE assigned unassigned collision

Example, 13 Unique Keys in 16 Threads Slot Slot Slot 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 Key Key Key U U U U U U U 2 2 19 19 U 20 20 U 21 21 U 22 22 U U 7 7 U 8 8 U 9 9 U U U 11 11 U 23 U U U U U 14 14 U 15 15 U Key 2 3 7 2 8 7 9 7 14 11 15 21 19 20 23 22 Mitglied der Helmholtz-Gemeinschaft Hash Hash Hash 2 2 2 8 3 8 7 7 7 2 2 2 8 8 8 7 7 7 9 9 9 7 7 7 14 14 14 11 11 11 15 15 15 5 5 5 3 3 3 4 4 4 12 12 7 6 6 6 Key from Key from Key from U - - 19 8 - 7 - - 2 - - U - - 7 - - U - - U - - U - - U - - U - - U - - U - - U - - U 7 - U - - Slot Slot Slot Iteration 2 Hash function: (key+5*hash_iteration)%BLOCK_SIZE assigned unassigned collision

Not Just a Universal Crutch: Other Useful Things To Do With - PowerPoint PPT Presentation

Not Just a Universal Crutch: Other Useful Things To Do With atomicCAS Mitglied der Helmholtz-Gemeinschaft S6220 - Elmar Westphal - Forschungszentrum Jlich Overview atomicCAS and the Universal Crutch Parallel hashing on GPUs

mockup review section b Clutch Crutch A better crutch for a better life Presenter Mario Martinez

sketch model review team b Clutch Crutch compact crutch for easy storage Clutch Crutch

A Description: crutch design to improve stability when ascending and descending stairs Customer:

Adding Aerosol Cans to the Universal Waste Regulations Where does Universal Waste fit? HAZARDOUS

UNIVERSAL ROBOTS RUC 2018 Universal Robots - Evolving the future UNIVERSAL ROBOTS SET THE

Tech Day: Universal Acceptance Mark van rek Universal Acceptance Todays Objectives

Universal Credit Universal Credit Universal Credit is for working-age people aged over 18 and

WWW.TOTW.ORG By Kenneth M Hoeck Finally, brethren, whatsoever things are true, whatsoever things

Universal Acceptance Quick Guide What Does Universal Acceptance Mean? ACCEPT Universal

North West Landlords Forum Universal Credit June 2014 Universal Credit Current position

V-PLC9000 Product Series Veesta Universal PLC & Veesta Universal PLC & Universal PLC

Architecture A discussion, not a deck IoT architectures Mesh-of-things--things talk to other

Things you can do Things you can do Things you can do Everything you need to know

HIV- -1 Integrase: 1 Integrase: HIV not just an not just an other HIV enzyme other HIV

KRISTA BOAN WAIT, WHAT JUST HAPPENED? WAIT, WHAT JUST HAPPENED? WAIT, WHAT JUST HAPPENED? WAIT,

Just Culture CAPT JEFF SALVON-HARMAN, MD JUST CULTURE, CERTIFIED QUALITY FOCUS OFFICE OF THE

1 Content 0 Introduction

Oral Communication Skills Kathryn S McKinley, Microsoft Research Why do presenta.on

Before you make a presentation to school leadership Allen Pitts, W1AGP ARRL Media &

1 Steps in Giving Presentation 1. Plan 2. Prepare 3. Practice 4. Present 1. Planning

WHAT WILL WE TALK ABOUT? Why your presentation sucks YOUR Its all about you! PRESENTATION

Persuasive Presentation Skills Top tips for your presenting success! Analysis A presentation is

The Goode Sheep Merinos for the future Henry Goode Property - Barooka Approx 2800 hectares

Why Your Tech Obsessed Conference Presentation Stinksand How to Make It Better | EdSurge News

Not Just a Universal Crutch: Other Useful Things To Do With - PowerPoint PPT Presentation

Not Just a Universal Crutch: Other Useful Things To Do With atomicCAS Mitglied der Helmholtz-Gemeinschaft S6220 - Elmar Westphal - Forschungszentrum Jlich Overview atomicCAS and the Universal Crutch Parallel hashing on GPUs

mockup review section b Clutch Crutch A better crutch for a better life Presenter Mario Martinez

sketch model review team b Clutch Crutch compact crutch for easy storage Clutch Crutch

A Description: crutch design to improve stability when ascending and descending stairs Customer:

Adding Aerosol Cans to the Universal Waste Regulations Where does Universal Waste fit? HAZARDOUS

UNIVERSAL ROBOTS RUC 2018 Universal Robots - Evolving the future UNIVERSAL ROBOTS SET THE

Tech Day: Universal Acceptance Mark van rek Universal Acceptance Todays Objectives

Universal Credit Universal Credit Universal Credit is for working-age people aged over 18 and

WWW.TOTW.ORG By Kenneth M Hoeck Finally, brethren, whatsoever things are true, whatsoever things

Universal Acceptance Quick Guide What Does Universal Acceptance Mean? ACCEPT Universal

North West Landlords Forum Universal Credit June 2014 Universal Credit Current position

V-PLC9000 Product Series Veesta Universal PLC &amp; Veesta Universal PLC &amp; Universal PLC

Architecture A discussion, not a deck IoT architectures Mesh-of-things--things talk to other

Things you can do Things you can do Things you can do Everything you need to know

HIV- -1 Integrase: 1 Integrase: HIV not just an not just an other HIV enzyme other HIV

KRISTA BOAN WAIT, WHAT JUST HAPPENED? WAIT, WHAT JUST HAPPENED? WAIT, WHAT JUST HAPPENED? WAIT,

Just Culture CAPT JEFF SALVON-HARMAN, MD JUST CULTURE, CERTIFIED QUALITY FOCUS OFFICE OF THE

1 Content 0 Introduction

Oral Communication Skills Kathryn S McKinley, Microsoft Research Why do presenta.on

Before you make a presentation to school leadership Allen Pitts, W1AGP ARRL Media &amp;

1 Steps in Giving Presentation 1. Plan 2. Prepare 3. Practice 4. Present 1. Planning

WHAT WILL WE TALK ABOUT? Why your presentation sucks YOUR Its all about you! PRESENTATION

Persuasive Presentation Skills Top tips for your presenting success! Analysis A presentation is

The Goode Sheep Merinos for the future Henry Goode Property - Barooka Approx 2800 hectares

Why Your Tech Obsessed Conference Presentation Stinksand How to Make It Better | EdSurge News

V-PLC9000 Product Series Veesta Universal PLC & Veesta Universal PLC & Universal PLC

Before you make a presentation to school leadership Allen Pitts, W1AGP ARRL Media &