Parallel Programming and Heterogeneous Computing Feedback Assignment 2
Max Plauth, Sven Köhler, Felix Eberhardt, Lukas Wenzel, and Andreas Polze Operating Systems and Middleware Group
Assignment 1: Covered Topics
■ General Concepts:
  □ Foster’s Method
  □ Amdahl’s Law
■ Shared Memory Parallelism with OpenMP:
  □ Task 2.1: Heat Map
  □ Task 2.2: IO-bound problem and reentrancy of legacy functions
  □ Task 2.3: Task-Parallel workloads
  □ Task 2.4: Java Monitors
■ Hardware Effects:
  □ Efficient use of caching
ParProg 2019 Feedback Assignment 2 Sven Köhler Chart 2
Parsum
Good Idea or Bad Idea?
#ifdef WITH_OMP
#pragma omp parallel for default(none) shared(heatmaps)
#endif
for (auto row = 1; row < height - 1; ++row) {
    for (auto col = 1; col < width - 1; ++col) {
        /* ... */
    }
}
😐 No need to mask OpenMP pragmas (unless you call OpenMP runtime functions). Just don’t include -fopenmp in CFLAGS.
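The remark above can be illustrated with a minimal sketch. It assumes a compiler that silently ignores unknown pragmas and defines the standard `_OPENMP` macro only when OpenMP is enabled. The functions `available_threads` and `smooth` are hypothetical names chosen for this example; `smooth` is only a stand-in for a heat-map step, not a submitted solution.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>   // runtime functions exist only in an OpenMP build
#endif

// Calls to OpenMP runtime functions are the one place that needs a guard.
int available_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;      // serial build: exactly one thread
#endif
}

// Hypothetical 1D smoothing step: each inner cell becomes the mean of itself
// and its two neighbours. The pragma is not wrapped in #ifdef; without
// -fopenmp the compiler skips it and the loop runs serially.
std::vector<double> smooth(const std::vector<double>& in) {
    std::vector<double> out(in);
    const long n = static_cast<long>(in.size());
    #pragma omp parallel for
    for (long i = 1; i < n - 1; ++i) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    }
    return out;
}
```

The same source file then builds as a serial program without `-fopenmp` and as a parallel one with it.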
Heat Map: And the winner was (A1) …
Runtime in seconds:
submission16003     0.00*
submission16005    12.18
submission16002    12.41
submission15983    28.27
submission16006    30.54
submission16022  3300**
./heatmap 1000 1000 1000 random.csv (4 runs)
Heat Map: And the winner is …
Runtime in seconds:
submission16245   0.484
submission16417   0.877
submission16405   2.432
submission16429   4.245
submission16427  12.259
./heatmap 1000 1000 1000 random.csv
decrypt
user266;Osten3
user906;Bahnhof
Your Verification Data
barbera:Gozsjkgq.2N62
gene:SqJwiPjc8z9OQ
grace:L3xIP64G5RVk6
ian:MyIR7zQEkP3Mg
sheelagh:7CYgbT6A0xsM6
richard:oGlayhJ1bTXuE
margarete:qRbG.QWxv9c.6
elon:UFy0LW2XSNPVo
satoshi:Hqw9N3HL38lAw

SubstitionItIs
speedup0
NotSoLowLevelNoMore
partitioning
FishEyes
ArrogantHippy
GuideMeToTheMoon
FlyVeryHigh
BurnAllYourPower

Dictionary: The 42 most common terms from Unit A
Passwords:
Good Idea or Bad Idea?
#pragma omp parallel for shared(result1, result2)
for (int i = 0; i < tasks.size(); i++) {
    /* ... */
    if (result1.found && result2.found) continue;
    for (int j = 0; j < dictPasswords.size(); j++) {
        auto password = dictPasswords[j];
        struct crypt_data data;
        data.initialized = 0;
        /* ... */
    }
}
☹ Only two results, and no synchronization on the result variables
☹ Wide jumps through the dictionary data (dict >> tasks); swap the loops for better locality
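Both points can be sketched together: the dictionary becomes the outer, parallel loop, and writes to the shared results go through a named critical section. This is only an illustration under assumptions. `toy_hash` is a hypothetical, unsalted stand-in for `crypt_r`, and `Task` and `crack` are names invented here. With a salted hash, the hash call would have to move into the inner loop.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for crypt_r(): deterministic and unsalted.
std::size_t toy_hash(const std::string& s) {
    return std::hash<std::string>{}(s);
}

struct Task {
    std::size_t hash;       // hash we want to invert
    bool found;             // set once a matching word was seen
    std::string password;   // the matching dictionary word
    explicit Task(std::size_t h) : hash(h), found(false) {}
};

void crack(std::vector<Task>& tasks, const std::vector<std::string>& dict) {
    const long n = static_cast<long>(dict.size());
    // Outer loop over the large dictionary: each thread walks a contiguous
    // chunk of words instead of jumping through the whole dictionary per task.
    #pragma omp parallel for schedule(static)
    for (long j = 0; j < n; ++j) {
        const std::size_t h = toy_hash(dict[j]);
        // Inner loop over the few tasks: small enough to stay in cache.
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            if (h == tasks[i].hash) {
                // Result fields are shared, so the write is serialized.
                #pragma omp critical(store_result)
                {
                    tasks[i].found = true;
                    tasks[i].password = dict[j];
                }
            }
        }
    }
}
```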
Good Idea or Bad Idea?
struct crypt_data data;
/* ... */
data.initialized = 0;
{
    if (strcmp(crypt_r((password + "0").c_str(), salt, &data), hash) == 0) {
        /* ... */
        break;
    }
    if (strcmp(crypt_r((password + "1").c_str(), salt, &data), hash) == 0) {
        /* ... */
        break;
    }
    /* ... */
}
☹ Loop unrolling only helps with tight loops
☹ Potential overhead for string buffer allocation and free
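One possible way to address both remarks is a plain loop over the ten digits that reuses a single candidate string and only overwrites its last character. The sketch below is an assumption-laden illustration: `toy_hash` is a hypothetical replacement for `crypt_r`, and `match_digit_suffix` is a helper name made up for this example.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for crypt_r(): deterministic and unsalted.
std::size_t toy_hash(const std::string& s) {
    return std::hash<std::string>{}(s);
}

// Tries word + '0' ... word + '9' against the target hash.
// Returns the matching candidate, or an empty string if none matches.
std::string match_digit_suffix(const std::string& word, std::size_t target) {
    std::string candidate = word;
    candidate.push_back('0');          // buffer is built once ...
    for (char d = '0'; d <= '9'; ++d) {
        candidate.back() = d;          // ... and only the last char changes
        if (toy_hash(candidate) == target) {
            return candidate;
        }
    }
    return std::string();
}
```

The loop body is dominated by the hash call, so unrolling it by hand is unlikely to gain anything, while the reused buffer avoids ten temporary strings per word.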
Good Idea or Bad Idea?
#pragma omp parallel shared(db,dict)
{
    #pragma omp master
    {
        uint64_t last = 0;
        for (uint64_t i = 0; i < db_size; i++) { /* iterate through entire db character by character */
            if (db[i] == '\n') { /* if we are at a newline */
                db[i] = '\0'; /* 0-terminate user entry */
                #pragma omp task
                crack_user(db+last);
                last = i+1;
            }
        }
    }
    #pragma omp taskwait
}
☺ Start tasks while parsing input
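The same pattern can be written as a self-contained sketch in which one thread scans the input and spawns a task per line while the other threads already work on earlier lines. `process_entry` and `process_lines` are hypothetical names; `process_entry` merely returns the line length as a placeholder for `crack_user`. As in the snippet above, a trailing line without a newline is ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-entry work; a placeholder for crack_user().
std::size_t process_entry(const std::string& entry) {
    return entry.size();
}

std::vector<std::size_t> process_lines(const std::string& db) {
    // Count lines first so that every task owns exactly one result slot.
    const std::size_t lines =
        static_cast<std::size_t>(std::count(db.begin(), db.end(), '\n'));
    std::vector<std::size_t> results(lines);

    #pragma omp parallel
    #pragma omp single
    {
        std::size_t start = 0;
        std::size_t index = 0;
        for (std::size_t i = 0; i < db.size(); ++i) {
            if (db[i] != '\n') continue;
            const std::size_t len = i - start;
            // The task captures its own copy of start, len and index;
            // db and results stay shared.
            #pragma omp task firstprivate(start, len, index)
            results[index] = process_entry(db.substr(start, len));
            start = i + 1;
            ++index;
        }
    }   // implicit barrier of the parallel region: all tasks are done here
    return results;
}
```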
Good Idea or Bad Idea?
while (dictFile >> word) {
    if (common_8_prefix(word, previousWord)) {
        continue;
    }
    previousWord = word;
    words.emplace_back(word);
}
☺ Smart reduction of problem size: crypt(3) only operates on the first 8 characters of its input
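The slide does not show `common_8_prefix` itself. A possible implementation, given purely as an assumption, compares the first eight characters of both words; `read_reduced_dictionary` is a hypothetical wrapper around the loop shown above. Note that this only removes duplicates that are adjacent in the input, so it works best on a sorted dictionary.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// True if both words agree in their first 8 characters. Shorter words are
// compared in full, so "pass" and "password" count as different.
bool common_8_prefix(const std::string& a, const std::string& b) {
    return a.compare(0, 8, b, 0, 8) == 0;
}

// Reads whitespace-separated words and drops every word whose 8-character
// prefix equals that of the previously kept word.
std::vector<std::string> read_reduced_dictionary(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    std::string previous;   // empty at first, never matches a real word
    while (in >> word) {
        if (common_8_prefix(word, previous)) {
            continue;
        }
        previous = word;
        words.push_back(word);
    }
    return words;
}
```

For example, feeding `password1 password2 passwordX pass other` through a `std::istringstream` keeps `password1`, `pass` and `other`.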
decrypt: And the winner is …
Runtime in seconds, seven submissions (submission labels are not legible):
51.305, 62.151, 274.29, 305.585, 662.597, 1570.16, 3210.346
./decrypt taskCryptPw.txt taskCryptDict.txt
Hash Ordered Index
How to MD5?
Good Idea or Bad Idea?
#pragma omp parallel shared(hashes) firstprivate(seed)
{
    const uint_fast32_t thread = omp_get_thread_num();
    const uint_fast32_t threads = omp_get_num_threads();
    const uint_fast32_t from = thread*(blocks/threads);
    const uint_fast32_t to = (thread != threads-1) ? (thread+1)*(blocks/threads) : blocks;
    add(seed, from);
    for (unsigned int i = from; i < to; i++) {
        __uint128_t v = md5(seed);
        #pragma omp critical
        hashes.push_back(v);
        inc(seed);
    }
}
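😐 Manual scheduling goes against the paradigm; use a schedule clause, if really needed
☹ Preallocate the vector and write by index instead of push_back in a critical section (std::vector::reserve only allocates capacity; resize or construct with a size before indexing)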
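Both suggestions combine naturally: a `parallel for` with a `schedule` clause replaces the hand-computed ranges, and a vector sized up front gives every iteration its own slot, so no critical section is needed. The following is a minimal sketch under assumptions. `toy_digest` is a hypothetical, cheap splitmix64-style mixer used in place of `md5`, and `hash_blocks` is a name chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for md5(): a splitmix64-style 64-bit mixer.
std::uint64_t toy_digest(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

std::vector<std::uint64_t> hash_blocks(std::uint64_t seed, std::size_t blocks) {
    // Constructed with its final size, so hashes[i] is valid for all i.
    std::vector<std::uint64_t> hashes(blocks);
    const long n = static_cast<long>(blocks);
    // The runtime splits the iteration space; no manual from/to arithmetic.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        // Each iteration derives its input from i and writes a distinct slot.
        hashes[i] = toy_digest(seed + static_cast<std::uint64_t>(i));
    }
    return hashes;
}
```

Because each value is computed from `seed + i` rather than from a running counter, iterations are independent and the result order is deterministic regardless of thread count.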
Good Idea or Bad Idea?
std::sort(hashes.begin(), hashes.end(), less);

int max_query = *std::max_element(queries.begin(), queries.end());
if (max_query > 0.8f * n)
    std::sort(hashes.begin(), hashes.end());
else
    std::partial_sort(hashes.begin(), hashes.begin() + max_query + 1, hashes.end());
☹ Serial by default; better use task parallelism with OpenMP
Since C++17: std::execution::parallel_policy
Good Idea or Bad Idea?
void qsort(unsigned char data[][MD5_DIGEST_LENGTH], unsigned int left, unsigned int right) {
    if (left < right) {
        auto pivot = qpartition(data, left, right);
        #pragma omp parallel sections
        {
            #pragma omp section
            if (pivot > 0) {
                qsort(data, left, pivot - 1);
            }
            #pragma omp section
            if (pivot < right - 1) {
                qsort(data, pivot + 1, right);
            }
        }
    }
}
😐 Good, but can be faster with #pragma omp task (better task distribution, potentially higher data locality)
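A task-based variant might look like the sketch below. It is not one of the submissions: it sorts plain `int` values instead of MD5 digests, uses a three-way split around a middle pivot, and falls back to `std::sort` below an arbitrarily chosen cutoff of 1024 elements to limit task overhead. The names `quicksort_tasks` and `parallel_sort` are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sorts data[lo, hi); the two outer partitions are handed to OpenMP tasks.
void quicksort_tasks(std::vector<int>& data, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (hi - lo < 2) return;
    if (hi - lo < 1024) {   // small range: serial sort, no task overhead
        std::sort(data.begin() + lo, data.begin() + hi);
        return;
    }
    const int pivot = data[lo + (hi - lo) / 2];
    // Three-way split: [lo, mid1) < pivot, [mid1, mid2) == pivot, [mid2, hi) > pivot.
    auto first = data.begin() + lo;
    auto last = data.begin() + hi;
    auto m1 = std::partition(first, last, [pivot](int v) { return v < pivot; });
    auto m2 = std::partition(m1, last, [pivot](int v) { return v == pivot; });
    const std::ptrdiff_t mid1 = m1 - data.begin();
    const std::ptrdiff_t mid2 = m2 - data.begin();

    #pragma omp task shared(data) firstprivate(lo, mid1)
    quicksort_tasks(data, lo, mid1);
    #pragma omp task shared(data) firstprivate(mid2, hi)
    quicksort_tasks(data, mid2, hi);
    #pragma omp taskwait   // both halves are sorted before returning
}

// One parallel region for the whole sort; a single thread seeds the task tree.
void parallel_sort(std::vector<int>& data) {
    #pragma omp parallel
    #pragma omp single
    quicksort_tasks(data, 0, static_cast<std::ptrdiff_t>(data.size()));
}
```

Unlike nested `parallel sections`, this opens only one parallel region, and idle threads pick up whatever subrange is available, which is where the better distribution comes from.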
Hint: Use Pipelining
Use pipelining to reduce allocated memory size and reduce possible paging.
HOI: And the winner is …
Runtime in seconds:
submission16430   17.566
submission16421   18.620
submission16398   62.887
submission16419   69.869
submission16424  381.494
submission16410  459.178
./hoi deadc0deba5e 268435456 0 32768 268435453