Parallel Programming and Heterogeneous Computing Feedback Assignment

SLIDE 1

Parallel Programming and Heterogeneous Computing Feedback Assignment 2

Max Plauth, Sven Köhler, Felix Eberhardt, Lukas Wenzel, and Andreas Polze Operating Systems and Middleware Group

SLIDE 2

Assignment 1: Covered Topics

General Concepts:
  • Foster’s Method
  • Amdahl’s Law

Shared Memory Parallelism with OpenMP:
  • Task 2.1: Heat Map
  • Task 2.2: IO-bound problem and reentrancy of legacy functions
  • Task 2.3: Task-Parallel workloads
  • Task 2.4: Java Monitors

Hardware Effects:
  • Efficient use of caching

ParProg 2019 Feedback Assignment 2 Sven Köhler Chart 2

SLIDE 3

1  ./heatmap

Parsum

SLIDE 4

Good Idea or Bad Idea?

    #ifdef WITH_OMP
    #pragma omp parallel for default(none) shared(heatmaps)
    #endif
    for (auto row = 1; row < height - 1; ++row) {
        for (auto col = 1; col < width - 1; ++col) {
            /* ... */
        }
    }

Neutral: No need to mask OpenMP pragmas with #ifdef (unless you call OpenMP runtime functions). Just don’t include -fopenmp in CFLAGS and the compiler ignores the pragmas.
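A minimal sketch of that distinction, assuming a hypothetical `smooth_rows` kernel and a hypothetical `available_threads` helper (neither is taken from a submission): the pragma stays unguarded, while the call into the OpenMP runtime is guarded with the standard `_OPENMP` macro so the file still builds without -fopenmp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>   // runtime functions exist only when OpenMP is enabled
#endif

// Hypothetical helper: threads the runtime may use, 1 when built without OpenMP.
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical kernel: averages each interior cell with its left and right neighbour.
std::vector<double> smooth_rows(const std::vector<double>& in, int width, int height) {
    assert(in.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::vector<double> out(in);
    // No #ifdef here: without -fopenmp the pragma is simply ignored.
    #pragma omp parallel for
    for (int row = 0; row < height; ++row) {
        for (int col = 1; col < width - 1; ++col) {
            const std::size_t i = static_cast<std::size_t>(row) * width + col;
            out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
        }
    }
    return out;
}
```

Built with or without -fopenmp, `smooth_rows` returns the same values; only the degree of parallelism changes.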

SLIDE 5

Heat Map: And the winner was (A1) …

./heatmap 1000 1000 1000 random.csv (4 runs)

    Submission         Seconds
    submission16003    0.00*
    submission16005    12.18
    submission16002    12.41
    submission15983    28.27
    submission16006    30.54
    submission16022    3300**

SLIDE 6

Heat Map: And the winner is …

./heatmap 1000 1000 1000 random.csv

    Submission         Seconds
    submission16245    0.484
    submission16417    0.877
    submission16405    2.432
    submission16429    4.245
    submission16427    12.259

SLIDE 7

2  ./decrypt

decrypt

    user266;Osten3
    user906;Bahnhof

SLIDE 8

Your Verification Data

Dictionary: The 42 most common terms from Unit A

    barbera:Gozsjkgq.2N62
    gene:SqJwiPjc8z9OQ
    grace:L3xIP64G5RVk6
    ian:MyIR7zQEkP3Mg
    sheelagh:7CYgbT6A0xsM6
    richard:oGlayhJ1bTXuE
    margarete:qRbG.QWxv9c.6
    elon:UFy0LW2XSNPVo
    satoshi:Hqw9N3HL38lAw

Passwords:

    SubstitionItIs
    speedup0
    NotSoLowLevelNoMore
    partitioning
    FishEyes
    ArrogantHippy
    GuideMeToTheMoon
    FlyVeryHigh
    BurnAllYourPower

SLIDE 9

Good Idea or Bad Idea?

    #pragma omp parallel for shared(result1, result2)
    for (int i = 0; i < tasks.size(); i++) {
        /* ... */
        if (result1.found && result2.found) continue;
        for (int j = 0; j < dictPasswords.size(); j++) {
            auto password = dictPasswords[j];
            struct crypt_data data;
            data.initialized = 0;
            /* ... */
        }
    }

Bad idea: Only two result variables, and no synchronization on them.
Bad idea: Wide jumps through the dictionary data (dict >> tasks). Swap the loops for better locality.
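One way the loop swap could look, as a sketch under stated assumptions: `toy_crypt` is a stand-in for the `crypt_r` call, and `Target` and `crack` are hypothetical names. The parallel outer loop walks the large dictionary in contiguous chunks, the inner loop visits the few targets, and every target gets its own result slot that is written under a critical section.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record for one password-file entry.
struct Target {
    std::string salt;
    std::size_t hash;
};

// Stand-in for crypt_r() (assumption of this sketch, not a real password hash).
std::size_t toy_crypt(const std::string& word, const std::string& salt) {
    return std::hash<std::string>{}(salt + word);
}

// Returns, for every target, the matching dictionary word ("" if none matches).
std::vector<std::string> crack(const std::vector<Target>& targets,
                               const std::vector<std::string>& dict) {
    std::vector<std::string> found(targets.size());
    // Outer loop over the large dictionary: each thread walks a contiguous chunk.
    #pragma omp parallel for schedule(static)
    for (long long w = 0; w < static_cast<long long>(dict.size()); ++w) {
        const std::string& word = dict[static_cast<std::size_t>(w)];
        // Inner loop over the few targets, which easily stay in cache.
        for (std::size_t t = 0; t < targets.size(); ++t) {
            if (toy_crypt(word, targets[t].salt) == targets[t].hash) {
                #pragma omp critical   // matches are rare, so contention is low
                found[t] = word;
            }
        }
    }
    return found;
}
```

Each dictionary word is loaded once and tested against all targets before the thread moves on to the next word.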

SLIDE 10

Good Idea or Bad Idea?

    struct crypt_data data;
    /* ... */
    data.initialized = 0;
    {
        if (strcmp(crypt_r((password + "0").c_str(), salt, &data), hash) == 0) {
            /* ... */
            break;
        }
        if (strcmp(crypt_r((password + "1").c_str(), salt, &data), hash) == 0) {
            /* ... */
            break;
        }
        /* ... */
    }

Bad idea: Loop unrolling only helps with tight loops.
Bad idea: Potential overhead for string buffer allocation and free on every candidate.
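A possible alternative, sketched with hypothetical names: `try_digit_suffixes` loops over the ten digits and reuses one string buffer, and the `matches` callback stands in for the `crypt_r`/`strcmp` check, which is left out so the example stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Tries word+"0" ... word+"9" and returns the matching candidate, or "" if none.
// `matches` is a stand-in for the crypt_r()/strcmp() comparison (assumption).
std::string try_digit_suffixes(const std::string& word,
                               const std::function<bool(const std::string&)>& matches) {
    std::string candidate;
    candidate.reserve(word.size() + 1);  // at most one allocation for all candidates
    candidate = word;
    for (char digit = '0'; digit <= '9'; ++digit) {
        candidate.push_back(digit);      // append the suffix in place
        if (matches(candidate)) {
            return candidate;
        }
        candidate.pop_back();            // restore the bare word
    }
    return {};
}
```

The plain loop leaves any unrolling to the compiler, and the buffer is never reallocated between candidates.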

SLIDE 11

Good Idea or Bad Idea?

    #pragma omp parallel shared(db,dict)
    {
        #pragma omp master
        {
            uint64_t last = 0;
            for (uint64_t i = 0; i < db_size; i++) { /* iterate through entire db character by character */
                if (db[i] == '\n') { /* if we are at a newline */
                    db[i] = '\0'; /* 0-terminate user entry */
                    #pragma omp task
                    crack_user(db+last);
                    last = i+1;
                }
            }
        }
        #pragma omp taskwait
    }

Good idea: Starts tasks while still parsing the input.
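The same pattern as a self-contained sketch, with a few assumptions: `crack_user` here is only a stand-in that sums entry lengths, `process_db` is a hypothetical name, and `single` is used in place of `master` (either lets exactly one thread parse). The implicit barrier at the end of the parallel region ensures all tasks have finished.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for the per-user cracking work (assumption): sums up entry lengths.
std::atomic<std::size_t> total_length{0};
void crack_user(const char* entry) { total_length += std::strlen(entry); }

// Splits `db` at newlines in place and spawns one task per entry while scanning.
std::size_t process_db(std::string& db) {
    std::size_t entries = 0;
    #pragma omp parallel
    #pragma omp single   // one thread parses, the other threads pick up tasks
    {
        std::size_t last = 0;
        for (std::size_t i = 0; i < db.size(); ++i) {
            if (db[i] == '\n') {
                db[i] = '\0';                      // 0-terminate the entry
                const char* entry = &db[last];
                #pragma omp task firstprivate(entry)
                crack_user(entry);
                last = i + 1;
                ++entries;
            }
        }
    }   // implicit barrier: every spawned task has completed here
    return entries;
}
```

Like the slide's version, this sketch only handles entries that end in a newline.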

SLIDE 12

Good Idea or Bad Idea?

    while (dictFile >> word) {
        if (common_8_prefix(word, previousWord)) {
            continue;
        }
        previousWord = word;
        words.emplace_back(word);
    }

Good idea: Smart reduction of problem size, because crypt(3) only operates on the first 8 characters of its input.
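The slide does not show `common_8_prefix` itself; a minimal sketch of how it might be written follows. `dedupe_by_prefix` is a hypothetical wrapper around the loop above. Because only neighbouring words are compared, the sketch assumes the dictionary is sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if both words agree on their first 8 characters
// (words shorter than 8 characters must be identical).
bool common_8_prefix(const std::string& a, const std::string& b) {
    constexpr std::size_t kSignificant = 8;  // traditional crypt(3) ignores the rest
    return a.compare(0, kSignificant, b, 0, kSignificant) == 0;
}

// Drops words whose first 8 characters repeat those of the previously kept word.
std::vector<std::string> dedupe_by_prefix(const std::vector<std::string>& sorted_words) {
    std::vector<std::string> kept;
    for (const auto& word : sorted_words) {
        if (!kept.empty() && common_8_prefix(word, kept.back())) {
            continue;
        }
        kept.push_back(word);
    }
    return kept;
}
```

For example, "Bahnhofsplatz" and "Bahnhofstrasse" share the prefix "Bahnhofs", so only the first of the two is kept.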

SLIDE 13

decrypt: And the winner is …

./decrypt taskCryptPw.txt taskCryptDict.txt

    Submission                        Seconds
    submission1641 (ID incomplete)    51.305
    submission1648 (ID incomplete)    62.151
    submission16413                   274.29
    submission16388                   305.585
    submission16428                   662.597
    submission1649 (ID incomplete)    1570.16
    submission1644 (ID incomplete)    3210.346

SLIDE 14

3  ./hoi

Hash Ordered Index

SLIDE 15

How to MD5?

  • Provide own implementation
  • Use OpenSSL
  • Use Glibc

SLIDE 16

Good Idea or Bad Idea?

    #pragma omp parallel shared(hashes) firstprivate(seed)
    {
        const uint_fast32_t thread = omp_get_thread_num();
        const uint_fast32_t threads = omp_get_num_threads();
        const uint_fast32_t from = thread*(blocks/threads);
        const uint_fast32_t to = (thread != threads-1) ? (thread+1)*(blocks/threads) : blocks;
        add(seed, from);
        for (unsigned int i = from; i < to; i++) {
            __uint128_t v = md5(seed);
            #pragma omp critical
            hashes.push_back(v);
            inc(seed);
        }
    }

Neutral: Manual scheduling goes against the OpenMP paradigm. Use a schedule clause if it is really needed.
Bad idea: Use std::vector::reserve and index operations to get rid of the need for synchronization.
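A sketch of the suggested fix, with assumptions spelled out: `mix64` is only a stand-in for `md5(seed)`, and `hash_blocks` is a hypothetical name. Since `operator[]` is valid only for elements that already exist, the sketch constructs the vector at its final size instead of calling `reserve` alone. Each iteration then writes its own slot, so no critical section is needed, and the schedule clause replaces the manual from/to arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for md5(seed) (assumption): a cheap 64-bit mixing function.
std::uint64_t mix64(std::uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

std::vector<std::uint64_t> hash_blocks(std::uint64_t seed, std::size_t blocks) {
    std::vector<std::uint64_t> hashes(blocks);  // sized up front, so hashes[i] is valid
    // Every iteration owns slot i: no synchronization required.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < static_cast<long long>(blocks); ++i) {
        hashes[static_cast<std::size_t>(i)] = mix64(seed + static_cast<std::uint64_t>(i));
    }
    return hashes;
}
```

Unlike the `push_back` version, the result order is deterministic: slot i always holds the hash for seed + i.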

SLIDE 17

Good Idea or Bad Idea?

    std::sort(hashes.begin(),hashes.end(),less);

    int max_query = *std::max_element(queries.begin(), queries.end());
    if (max_query > 0.8f * n)
        std::sort(hashes.begin(), hashes.end());
    else
        std::partial_sort(hashes.begin(), hashes.begin() + max_query + 1, hashes.end());

Bad idea: Serial by default. Better use task parallelism with OpenMP.
Since C++17: std::execution::parallel_policy

SLIDE 18

Good Idea or Bad Idea?

    void qsort(unsigned char data[][MD5_DIGEST_LENGTH], unsigned int left, unsigned int right) {
        if (left < right) {
            auto pivot = qpartition(data, left, right);
            #pragma omp parallel sections
            {
                #pragma omp section
                if (pivot > 0) {
                    qsort(data, left, pivot - 1);
                }
                #pragma omp section
                if (pivot < right - 1) {
                    qsort(data, pivot + 1, right);
                }
            }
        }
    }

Neutral: Good, but it can be faster with #pragma omp task (better task distribution, potentially higher data locality).
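A task-based variant could look roughly like the following sketch. It sorts plain `int` values rather than MD5 digests, and the names `qsort_tasks` and `parallel_sort` as well as the cutoff of 10000 elements are assumptions made for illustration, not tuned values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts data[left, right) with quicksort; large partitions are spawned as tasks.
void qsort_tasks(std::vector<int>& data, std::ptrdiff_t left, std::ptrdiff_t right) {
    if (right - left < 2) return;
    const int pivot = data[static_cast<std::size_t>(left + (right - left) / 2)];
    // Three-way split: [left, m1) < pivot, [m1, m2) == pivot, [m2, right) > pivot.
    auto first = data.begin() + left;
    auto last = data.begin() + right;
    auto mid1 = std::partition(first, last, [pivot](int v) { return v < pivot; });
    auto mid2 = std::partition(mid1, last, [pivot](int v) { return v == pivot; });
    const std::ptrdiff_t m1 = mid1 - data.begin();
    const std::ptrdiff_t m2 = mid2 - data.begin();
    // Assumed cutoff: below it, task overhead likely outweighs the parallel gain.
    const bool spawn = (right - left) > 10000;
    #pragma omp task shared(data) if(spawn)
    qsort_tasks(data, left, m1);
    #pragma omp task shared(data) if(spawn)
    qsort_tasks(data, m2, right);
    #pragma omp taskwait   // keep this call frame alive until both halves are done
}

void parallel_sort(std::vector<int>& data) {
    #pragma omp parallel
    #pragma omp single   // one thread starts the recursion, all threads run tasks
    qsort_tasks(data, 0, static_cast<std::ptrdiff_t>(data.size()));
}
```

Only one parallel region is opened here, whereas nested `parallel sections` open a new region at every recursion level. Idle threads pick up whichever partition is ready.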

SLIDE 19

Hint: Use Pipelining

Use pipelining to reduce allocated memory size and reduce possible paging.

SLIDE 20

HOI: And the winner is …

./hoi deadc0deba5e 268435456 0 32768 268435453

    Submission         Seconds
    submission16430    17.566
    submission16421    18.620
    submission16398    62.887
    submission16419    69.869
    submission16424    381.494
    submission16410    459.178

SLIDE 21

end

^D