Acceleration of Genetic Algorithms for Sudoku Solution on Many-core Processors
Yuji Sato*1, Naohiro Hasegawa*1, Mikiko Sato*2
*1: Hosei University, *2: Tokyo University of A&T
Outline
Background
Sudoku Solution Accuracy by GA
Accelerating Genetic Computation with
Evaluation Tests
Conclusion
As the first step towards that objective,
Genetic computation is suitable for
Therefore, increasing the number of
GPUs are designed for the processing
But research on General-Purpose
number, the other position should be solved. A Sudoku puzzle is completed by filling in all of the empty cells with numerals 1 to 9.
crossover operation. The chromosome is defined as a one-dimensional array
points can only appear between sub-blocks.
This design generates chromosomes
[Equation: sums over i = 1 to 9 and j = 1 to 9]
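As an illustration of how sums over i = 1 to 9 and j = 1 to 9 can act as a fitness function, the sketch below assumes (the extracted text does not confirm this) that i indexes rows, j indexes columns, and each term is the number of distinct digits in that row or column. Under that assumption a solved grid scores 9 x 9 + 9 x 9 = 162. The function name `fitness` and the row-major grid layout are hypothetical.

```python
def fitness(grid):
    """Assumed scoring: distinct digits per row plus distinct digits per column."""
    row_score = sum(len(set(row)) for row in grid)
    col_score = sum(len({grid[r][c] for r in range(9)}) for c in range(9))
    return row_score + col_score


# A valid solved grid built from a standard shifting pattern (illustrative data).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

# Duplicate one digit: row 0 and column 0 each lose one distinct value.
broken = [row[:] for row in solved]
broken[0][0] = broken[0][1]

print(fitness(solved), fitness(broken))
```

Each duplicated digit lowers the score of the row and the column it sits in, so the score measures how far a candidate is from a complete solution.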
[Figure: scores of the rows that constitute the sub-blocks; example values shown: 27, 27, 27, 26, 27, 27, 25]
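A crossover whose cut points fall only between sub-blocks can be sketched as follows. This is a minimal illustration under several assumptions that are not in the slides: the chromosome stores the nine sub-blocks one after another (nine cells each), a "row of sub-blocks" is three consecutive sub-blocks, its score is the number of distinct digits in each of its three grid rows (maximum 27), and the child takes each row of sub-blocks from the parent with the higher score. The names `band_score`, `crossover_by_band` and `grid_to_chrom` are hypothetical.

```python
def grid_to_chrom(grid):
    """Flatten a 9x9 grid into 81 values, sub-block by sub-block."""
    return [grid[3 * (b // 3) + r][3 * (b % 3) + c]
            for b in range(9) for r in range(3) for c in range(3)]


def band_score(chrom, k):
    """Score of the k-th row of sub-blocks: distinct digits in its 3 grid rows."""
    band = chrom[27 * k: 27 * (k + 1)]
    score = 0
    for r in range(3):
        # Cell (r, c) of sub-block b inside the band sits at 9*b + 3*r + c.
        row = [band[9 * b + 3 * r + c] for b in range(3) for c in range(3)]
        score += len(set(row))
    return score


def crossover_by_band(p1, p2):
    """Copy each row of sub-blocks whole from the better-scoring parent."""
    child = []
    for k in range(3):
        src = p1 if band_score(p1, k) >= band_score(p2, k) else p2
        child.extend(src[27 * k: 27 * (k + 1)])
    return child


# Illustrative parents: a solved chromosome and a poor one (every sub-block 1..9 in order).
solved = grid_to_chrom(
    [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)])
poor = list(range(1, 10)) * 9
p1 = solved[:27] + poor[27:]
p2 = poor[:27] + solved[27:]
child = crossover_by_band(p1, p2)
```

Because whole sub-blocks are copied, a sub-block that already holds the digits 1 to 9 exactly once stays intact in the child.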
[Population size]
[Number of child candidates/Parents] 2
[Crossover rate]
[Mutation rate]
[Tournament size]
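The parameters listed above could be grouped and used for tournament selection roughly as in the sketch below. Only the population size of 150 (quoted for the evaluation runs) and the value 2 for child candidates per parents come from the slides. The crossover rate, mutation rate and tournament size shown here are placeholders, and the names `GAParams` and `tournament_select` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class GAParams:
    population_size: int = 150     # quoted for the evaluation runs
    children_per_parents: int = 2  # "Number of child candidates/Parents"
    crossover_rate: float = 0.3    # placeholder, value not given in the slides
    mutation_rate: float = 0.3     # placeholder, value not given in the slides
    tournament_size: int = 3       # placeholder, value not given in the slides


def tournament_select(fitnesses, size, rng):
    """Pick `size` distinct individuals at random; return the index of the fittest."""
    contenders = rng.sample(range(len(fitnesses)), size)
    return max(contenders, key=lambda i: fitnesses[i])


params = GAParams()
rng = random.Random(0)
example_fitnesses = [150, 162, 140, 158]  # illustrative scores only
winner = tournament_select(example_fitnesses, len(example_fitnesses), rng)
```

A larger tournament size increases selection pressure: when the tournament covers the whole population, the fittest individual always wins.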
We selected two puzzles from each
For comparison with the conventional
Sudoku puzzle   Our proposed GA    Mantere-2008 [7]
                100,000 trials     100,000 trials
AI Escargot     83/100             5/100
[Population size] *1: 150, *2: 11
               Improve efficiency         Speed up
Mantere etc.   Cultural Algorithm (CA)    Small population size
Our GA         Proper GA design + LS      Parallel processing on GPU
The results show the proposed genetic
On the other hand, the processing time
Board: ELSA GLADIA GTX460
#Core: 336 (7 SM x 48 Core / SM)
Clock: 675 MHz
Memory: 1 GB
Shared memory / SM: 48 KB
#Register / SM: 32768
#Thread / SM: 1024
The parallelization of genetic computing must be implemented with full consideration given to these hardware features.
The genetic computing programs
Single-core:
Parallel processing for individuals:
Parallel processing for manipulation:
(Intel Core i7)
7 blocks / grid, 3 x N threads / block
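The launch shape of 7 blocks per grid and 3 x N threads per block can be checked against the thread limit in the GPU specification with a little arithmetic. The sketch assumes that each block runs on one SM and handles N individuals, which the slides do not state explicitly. The function names are hypothetical.

```python
THREADS_PER_SM = 1024       # "#Thread / SM" from the GTX460 specification
THREADS_PER_INDIVIDUAL = 3  # "3 x N threads / block"
BLOCKS_PER_GRID = 7         # one block per SM is an assumption


def max_individuals_per_block(thread_limit=THREADS_PER_SM):
    """Largest N such that 3 * N threads fit within the per-SM thread limit."""
    return thread_limit // THREADS_PER_INDIVIDUAL


def threads_per_grid(n):
    """Total threads launched when every block handles n individuals."""
    return BLOCKS_PER_GRID * THREADS_PER_INDIVIDUAL * n


print(max_individuals_per_block())  # thread-limited bound on N
print(threads_per_grid(150))        # total threads for N = 150
```

Under these assumptions the thread limit alone caps N at 341, so a population of 150 per block fits within the limit.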
CPU: MCP: Intel Core i7 920 (2.67 GHz, 4 cores); GPU: Phenom II X4 945 (3 GHz, 4 cores)
OS: Ubuntu 10.04
C Compiler: gcc 4.4.3 (optimization "-O3")
CUDA Toolkit: 3.2 RC
The evaluation results for problems
Table 6. The acceleration effect of using the
                     Count [%]   Average Gen.   Execution time
Java                 83          45,468         7m 50s 678
C                    86          44,250         1m 26s 320
Core i7 #Thread: 8   100         5,992          12s 12
GTX460 #SM: 7        97          22,142         6s 391
Speed-up annotations: x 74, x 14
Cutoff set: 100,000 generations, Population size: 150
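The "x 74" and "x 14" annotations in Table 6 are consistent with the ratios of the Java and C execution times to the GTX460 execution time. The sketch below reproduces that arithmetic. It assumes the last field of each time is milliseconds (so "6s 391" is 6.391 s). The helper name `to_seconds` is hypothetical.

```python
import re


def to_seconds(text):
    """Convert a time such as '7m 50s 678' or '12s 12' to seconds (last field in ms)."""
    m = re.fullmatch(r"(?:(\d+)m\s+)?(\d+)s\s+(\d+)", text)
    minutes = int(m.group(1) or 0)
    return minutes * 60 + int(m.group(2)) + int(m.group(3)) / 1000


times = {
    "Java": "7m 50s 678",
    "C": "1m 26s 320",
    "Core i7 #Thread: 8": "12s 12",
    "GTX460 #SM: 7": "6s 391",
}

gpu_seconds = to_seconds(times["GTX460 #SM: 7"])
speedup = {name: to_seconds(t) / gpu_seconds for name, t in times.items()}

for name, factor in speedup.items():
    print(f"{name}: x {factor:.1f}")
```

Rounded to whole numbers, the Java and C ratios come out at 74 and 14, matching the annotations.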
Table 13. The minimum numbers of generations and
Sudoku   Minimum Gen.   Execution time
SD1      83             25 ms
SD2      158            47 ms
SD3      198            76 ms
Table 7. The number of generations until the correct
         Count [%]   Average Gen.   Execution time
#Th: 1   82          42,276         28s 41
#Th: 2   98          25,580         22s 48
#Th: 4   100         13,261         21s 47
#Th: 8   100         5,992          12s 12
Table 10. The number of generations until the correct
         Count [%]   Average Gen.   Execution time
#SM: 1   50          70,067         20s 199
#SM: 2   69          58,786         16s 958
#SM: 3   82          41,757         12s 630
#SM: 4   93          31,254         9s 260
#SM: 5   95          28,709         8s 287
#SM: 6   97          22,065         6s 368
#SM: 7   97          22,142         6s 391
Area for individual data:
Area for selection: 4 bytes (int) x N
Area for crossover: 4 bytes (float) x N/2
Area for mutation: 1 byte (char) x 81N
Maximum number of N which can be stored
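The memory areas listed above bound how many individuals N fit in the 48 KB of shared memory per SM. The sketch below adds up the three areas whose sizes are given (selection, crossover, mutation) and treats the size of the individual-data area as a parameter, because its value is not recoverable from the extracted text. The parameter `individual_bytes_per_n` and the example value of 81 bytes are hypothetical.

```python
SHARED_BYTES_PER_SM = 48 * 1024  # "Shared memory / SM 48 KB"


def bytes_needed(n, individual_bytes_per_n):
    """Shared memory used by n individuals under the listed area sizes."""
    selection = 4 * n         # 4 bytes (int) x N
    crossover = 4 * (n // 2)  # 4 bytes (float) x N/2
    mutation = 81 * n         # 1 byte (char) x 81N
    return individual_bytes_per_n * n + selection + crossover + mutation


def max_n(individual_bytes_per_n, budget=SHARED_BYTES_PER_SM):
    """Largest n whose areas fit in the shared-memory budget."""
    n = 0
    while bytes_needed(n + 1, individual_bytes_per_n) <= budget:
        n += 1
    return n


# Upper bound when the individual-data area is ignored entirely.
print(max_n(0))
# Hypothetical case: 81 bytes of individual data per individual.
print(max_n(81))
```

Whatever the true size of the individual-data area, the three listed areas alone already cap N at 564, and any per-individual data lowers that bound further.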
Table 14. The execution time and the correct solution
Sudoku   Count [%]   Average Gen.   Execution time
SD1      100         9,072          2s 751
SD2      100         13,481         4s 530
SD3      100         22,799         6s 862
29%
       Count [%]   Average Gen.   Execution time   Best Gen.
100    100         8,641          11s 63           644
150    100         5,992          12s 12           243
200    100         7,115          19s 20           229
300    100         9,441          38s 29           123
400    98          15,441         84s 76           86
At the same time, it is more difficult to
We have used the problem of solving
Specifically, we implemented parallel genetic
We want to try another parallel GA
We need to investigate another
We want to show that EC (+ GPU) can