Master Project Presentation Given by Yandong Wang To committee of - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Master Project Presentation

Given by Yandong Wang
To the committee of Rochester Institute of Technology

Project Committee:

  • Chair: Prof. Alan Kaminsky
  • Reader: Prof. Stanislaw Radziszowski
  • Observer: Prof. James Heliotis

slide-2
SLIDE 2

NVIDIA CUDA Architecture-based Parallel Incomplete SAT Solver

Agenda:

  • Introduction to the project
  • Introduction to the satisfiability problem and SAT solvers
  • Introduction to CUDA GPU programming
  • CUDA-based Parallel Incomplete SAT Solver design
  • Measurements and observations of the new CUDA-based SAT solver
  • Related research and future work
slide-3
SLIDE 3

Introduction to the Project

  • Stochastic local search and genetic algorithm
  • Massively parallel computing capability of the CUDA GPU

Combined into: CUDA SAT Solver

slide-4
SLIDE 4

Satisfiability (SAT) Problem

Problem description

  • “Given a Boolean expression, determine whether there exists an assignment of true or false to all Boolean variables that makes the entire expression true.”

slide-5
SLIDE 5

Satisfiability (SAT) Problem

Terminology

  • NP-complete problem
  • Literal
  • Clauses
  • Conjunctive Normal Form (CNF)
  • k-SAT (2-SAT, 3-SAT, max-SAT)
  • Phase Transition Phenomenon
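
To make the terminology concrete, here is a minimal sketch of a CNF formula and an exhaustive satisfiability check. It assumes a DIMACS-style encoding (integer k stands for variable x_k, and -k for its negation); the names `FORMULA`, `satisfies`, and `brute_force_sat` are hypothetical and do not come from the project.

```python
from itertools import product

# A CNF formula is a list of clauses; a clause is a list of literals.
# Literal k > 0 means variable x_k, and k < 0 means NOT x_k (assumed encoding).
FORMULA = [[1, -2, 3], [-1, 2, 3], [-3, 2, -1]]  # a small 3-SAT instance


def literal_is_true(literal, assignment):
    """assignment maps a variable number to a bool."""
    value = assignment[abs(literal)]
    return value if literal > 0 else not value


def satisfies(formula, assignment):
    """A CNF formula is true when every clause has at least one true literal."""
    return all(any(literal_is_true(lit, assignment) for lit in clause)
               for clause in formula)


def brute_force_sat(formula, num_vars):
    """Try all 2^n assignments; return a satisfying one, or None."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if satisfies(formula, assignment):
            return assignment
    return None
```

The exhaustive loop doubles in cost with every added variable, which is why practical solvers rely on the complete and incomplete techniques on the following slides.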

slide-6
SLIDE 6

Satisfiability (SAT) Problem

Phase Transition Phenomenon
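
For uniform random 3-SAT, the hardest instances are empirically found near a clause-to-variable ratio of about 4.26. The sketch below generates such instances and computes the ratio; the generator and its function names are illustrative assumptions, not the benchmark generator used in the project.

```python
import random


def random_3sat(num_vars, num_clauses, seed=0):
    """Uniform random 3-SAT: each clause picks 3 distinct variables,
    and each one is negated with probability 1/2."""
    rng = random.Random(seed)
    formula = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula


def clause_ratio(num_vars, num_clauses):
    """Clause-to-variable ratio, the control parameter of the phase transition."""
    return num_clauses / num_vars
```

For the instance sizes listed on the testbed slide, `clause_ratio(20, 91)` gives 4.55 and `clause_ratio(250, 1065)` gives 4.26, so the test set sits in or near the hard region.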

slide-7
SLIDE 7

SAT Problem Solver

  • Complete SAT solver
  • Incomplete SAT solver
slide-8
SLIDE 8

SAT Problem Solver

Complete SAT solver

  • Based on the DPLL algorithm, whose principle is backtracking and divide-and-conquer

  • Unit Propagation
  • Pure Literal Elimination
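
The two simplification rules named above can be sketched as follows. This is a minimal illustration under the assumption that clauses are lists of signed integers (k for x_k, -k for its negation); it is not the DPLL implementation of any particular solver.

```python
def unit_propagate(formula):
    """Repeatedly assign the literals forced by unit (single-literal) clauses.

    Returns (simplified_formula, forced_literals), or (None, forced_literals)
    when an empty clause (a conflict) is derived.
    """
    formula = [list(clause) for clause in formula]
    forced = []
    while True:
        unit = next((c[0] for c in formula if len(c) == 1), None)
        if unit is None:
            return formula, forced
        forced.append(unit)
        simplified = []
        for clause in formula:
            if unit in clause:
                continue  # clause is satisfied, drop it
            reduced = [lit for lit in clause if lit != -unit]
            if not reduced:
                return None, forced  # empty clause: conflict
            simplified.append(reduced)
        formula = simplified


def pure_literals(formula):
    """Literals whose negation never appears; setting them true is always safe."""
    literals = {lit for clause in formula for lit in clause}
    return sorted(lit for lit in literals if -lit not in literals)
```

A full DPLL solver would apply both rules, then branch on an unassigned variable and backtrack on conflict.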
slide-9
SLIDE 9

SAT Problem Solver

Incomplete SAT solver

  • Stochastic local search

– Random walk strategy

  • Genetic algorithm

– Cellular genetic algorithm

slide-10
SLIDE 10

Random Walk Strategy

  • Processes involved:

– Pure (unbiased) random walk selection strategy
– Biased heuristic search strategy
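
A minimal sketch of how the two strategies can be mixed, in the style of WalkSAT: pick an unsatisfied clause, then either flip a random variable from it (unbiased step) or flip the variable that leaves the most clauses satisfied (biased step). The noise parameter `p_noise`, the flip limit, and all function names are assumptions made for illustration; the project's actual selection rule may differ.

```python
import random


def is_satisfied(clause, assign):
    # A literal is true when its sign matches the variable's value.
    return any((lit > 0) == assign[abs(lit)] for lit in clause)


def num_satisfied(formula, assign):
    return sum(is_satisfied(clause, assign) for clause in formula)


def score_after_flip(formula, assign, var):
    """Number of satisfied clauses if var were flipped (assign is restored)."""
    assign[var] = not assign[var]
    score = num_satisfied(formula, assign)
    assign[var] = not assign[var]
    return score


def random_walk(formula, num_vars, p_noise=0.5, max_flips=10000, seed=1):
    rng = random.Random(seed)
    # Index 0 is unused so that variable k lives at assign[k].
    assign = [False] + [rng.random() < 0.5 for _ in range(num_vars)]
    for _ in range(max_flips):
        unsat = [c for c in formula if not is_satisfied(c, assign)]
        if not unsat:
            return assign  # every clause is satisfied
        clause = rng.choice(unsat)
        if rng.random() < p_noise:
            var = abs(rng.choice(clause))  # pure (unbiased) random walk step
        else:
            var = max((abs(lit) for lit in clause),
                      key=lambda v: score_after_flip(formula, assign, v))  # biased step
        assign[var] = not assign[var]
    return None  # gave up: an incomplete solver cannot prove unsatisfiability
```

Returning `None` only means that no solution was found within the flip budget, which is the defining limitation of an incomplete solver.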

slide-11
SLIDE 11

Cellular Genetic Algorithm

  • Inherits the properties of a regular genetic algorithm

  • Diffusion model
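
In a diffusion-model (cellular) genetic algorithm, each individual sits on a grid cell and mates only with nearby cells. The sketch below assumes a von Neumann neighborhood on a toroidal grid stored in row-major order, and a "fittest neighbor" selection rule; both are illustrative assumptions, and the function names are hypothetical.

```python
def von_neumann_neighbors(index, rows, cols):
    """Indices of the north, south, west and east cells on a toroidal grid."""
    r, c = divmod(index, cols)
    return [
        ((r - 1) % rows) * cols + c,  # north
        ((r + 1) % rows) * cols + c,  # south
        r * cols + (c - 1) % cols,    # west
        r * cols + (c + 1) % cols,    # east
    ]


def select_mate(index, fitness, rows, cols):
    """Pick the fittest neighbor as the mating partner (one possible rule)."""
    return max(von_neumann_neighbors(index, rows, cols),
               key=lambda i: fitness[i])
```

Because good solutions spread only one cell per generation, distant regions of the grid can keep exploring different parts of the search space.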
slide-12
SLIDE 12

Open Issues

  • Keeping a steady level of diversity in the search space
  • Population homogeneity
  • Premature convergence
slide-13
SLIDE 13

CUDA GPU Programming

  • Designed specifically for highly parallel, compute-intensive workloads.
  • Devotes hardware to processing similar data rather than to data caching and flow control.
  • Programmed with NVIDIA's CUDA SDK in the high-level language C.
slide-14
SLIDE 14

CUDA GPU Programming Model

  • Thread block
  • Grid of blocks
  • A unit of k threads is called a warp
  • Maximum number of blocks: 65535 * 65535
  • Number of threads per block is limited
  • Transparent scalability
  • Single instruction, multiple thread (SIMT)
  • __syncthreads() & cudaThreadSynchronize()
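
The index arithmetic behind the grid/block/warp hierarchy can be sketched in plain Python. The functions below only emulate the numbers a CUDA kernel would compute (the device expression is `blockIdx.x * blockDim.x + threadIdx.x`); the function names are hypothetical, and the warp size of 32 is the value used by NVIDIA GPUs.

```python
def blocks_needed(num_items, threads_per_block):
    """Ceiling division: enough blocks so that every item gets a thread."""
    return (num_items + threads_per_block - 1) // threads_per_block


def global_thread_id(block_idx, block_dim, thread_idx):
    """Mirrors the CUDA expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def warp_of(thread_idx, warp_size=32):
    """Threads of a block are grouped into warps of consecutive indices."""
    return thread_idx // warp_size
```

For example, 1000 individuals with 256 threads per block need 4 blocks, and the last block has idle threads that a kernel must guard against with a bounds check.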
slide-15
SLIDE 15

CUDA GPU Memory Model

  • Device memory and host memory
  • Global memory
  • Register
  • Shared memory
  • Local memory
  • Constant memory
slide-16
SLIDE 16

CUDA-based Parallel Incomplete SAT Solver Design

Setup:

  • Initialize variables
  • Optimize clauses
  • Device configuration
  • Update random number generator
  • Population initialization
  • Generate random masks

Main loop:

Do {
    1: initialize necessary variables
       _synchronization()
    2: neighbor selection
       _synchronization()
    3: crossover and mutation
       evaluation() and _synchronization()
    4: random walk strategy
} while (evaluation fails)

Exit: print the result, or report "No solution found" on time out.

slide-17
SLIDE 17

Data Allocation in Device Memory

  • Data needs to be transferred to device memory.
  • Put as much data as possible into shared memory or constant memory.
  • Truth assignment matrix in global memory.
  • Randomly generated masks in global memory.
slide-18
SLIDE 18
  • Truth assignment matrix
slide-19
SLIDE 19

Data in Constant memory and shared memory

  • Put clause information into constant memory.

– Limited by the size of the problem.

  • Using shared memory as truth assignment cache.

– Limited by the number of threads in each block.
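
A back-of-the-envelope check of both limits, assuming the compute capability 1.x figures of 16 KB shared memory per multiprocessor and 64 KB of constant memory, one byte per variable in a cached truth assignment, and two bytes per stored literal. The storage layout and the function names are assumptions for illustration; the project's actual layout is not given on the slide.

```python
SHARED_MEM_BYTES = 16 * 1024    # per multiprocessor, compute capability 1.x
CONSTANT_MEM_BYTES = 64 * 1024  # total constant memory, compute capability 1.x


def max_threads_per_block(num_vars, bytes_per_var=1):
    """Threads whose full truth assignment fits in shared memory at once
    (ignores the small amount of shared memory the runtime reserves)."""
    return SHARED_MEM_BYTES // (num_vars * bytes_per_var)


def clauses_fit_in_constant(num_clauses, literals_per_clause=3, bytes_per_literal=2):
    """True when the clause table fits in constant memory (assumed layout)."""
    return num_clauses * literals_per_clause * bytes_per_literal <= CONSTANT_MEM_BYTES
```

Under these assumptions, a 250-variable instance allows at most 65 cached assignments per block, while its 1065 clauses take about 6 KB and fit easily in constant memory.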

slide-20
SLIDE 20

Random Number Generator

  • Keep diversity of the search space.
  • Hash function
  • Parallelize random number generator.

– Multiple random number generators?
– Using one random number generator?

slide-21
SLIDE 21

Sequence Splitting

  • Approach to parallelize a sequential random number generator.
  • A tradeoff must be made between speed and a perfect random number sequence.
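
Sequence splitting gives each thread its own contiguous block of one global sequence, which requires jumping ahead in the generator cheaply. The sketch below assumes a linear congruential generator with the well-known Numerical Recipes constants and uses repeated squaring of the affine map x -> a*x + c; the choice of generator and all function names are assumptions, not the project's generator.

```python
A, C, M = 1664525, 1013904223, 2**32  # LCG constants (assumed generator)


def lcg_next(x):
    """One step of the sequential generator."""
    return (A * x + C) % M


def skip_ahead(x, k):
    """Jump k steps in O(log k) by squaring the affine map x -> a*x + c."""
    a_acc, c_acc = 1, 0    # identity map (zero steps taken so far)
    a_pow, c_pow = A, C    # map for 2^i steps
    while k > 0:
        if k & 1:
            a_acc, c_acc = (a_pow * a_acc) % M, (a_pow * c_acc + c_pow) % M
        a_pow, c_pow = (a_pow * a_pow) % M, (a_pow * c_pow + c_pow) % M
        k >>= 1
    return (a_acc * x + c_acc) % M


def split_streams(seed, num_threads, block_len):
    """Thread t starts at position t * block_len of the single global sequence."""
    return [skip_ahead(seed, t * block_len) for t in range(num_threads)]
```

The streams never overlap as long as each thread draws at most `block_len` numbers; the price is the extra jump-ahead work and the need to fix `block_len` in advance.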
slide-22
SLIDE 22

Generate initial random population

  • Minimize the probability of different threads generating the same truth assignment.

  • Each truth assignment is a char array.
slide-23
SLIDE 23

Generate crossover and mutation masks

  • Minimize the probability of different threads generating the same truth assignment.
  • The probability of a 1 in the mask is equal to P.
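
A minimal sketch of mask generation and use, assuming that truth assignments are packed into integers, that crossover is uniform (the mask selects which parent supplies each bit), and that mutation flips the bits selected by a second mask. These operators and the function names are illustrative assumptions; the slides do not spell out the exact operators.

```python
import random


def random_mask(num_bits, p, rng):
    """Bit mask in which each bit is 1 independently with probability p."""
    mask = 0
    for i in range(num_bits):
        if rng.random() < p:
            mask |= 1 << i
    return mask


def crossover(parent_a, parent_b, mask):
    """Uniform crossover: take parent_b's bit where the mask is 1, else parent_a's."""
    return (parent_a & ~mask) | (parent_b & mask)


def mutate(individual, mask):
    """Flip every bit selected by the mutation mask."""
    return individual ^ mask
```

With a crossover mask drawn at P = 0.5 the child takes about half of its bits from each parent, while a mutation mask is typically drawn with a much smaller P so that only a few bits flip.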
slide-24
SLIDE 24

Evaluation

Char array = word + Bit array

slide-25
SLIDE 25

Neighbor Selection

slide-26
SLIDE 26

Crossover and Mutation

Original design
slide-27
SLIDE 27

Crossover and Mutation

Modified design 1

slide-28
SLIDE 28

Crossover and Mutation

Modified design 2

slide-29
SLIDE 29

Random walk strategy and Evolution

  • Random walk strategy consumes most of the running time.
  • Greedy strategy.
  • Back and forth M times.
  • Always replace the old generation.
slide-30
SLIDE 30

Testing result and Observation

Testbed

  • Sun Microsystems Ultra 40 workstation with a 1 GHz dual-core AMD Opteron 2218 CPU and 8 GB main memory.
  • NVIDIA Tesla C870 with 16 multiprocessors of 8 cores each, 500 MHz clock.
  • Uniform Random 3-SAT problem set (sizes from 20 variables / 91 clauses to 250 variables / 1065 clauses).

slide-31
SLIDE 31

Testing result and Observation

Running time measurement

  • Used all 16 multiprocessors.
  • Running time depends to a great extent on the initial seed value.
  • The hardness of different instances of the same size varies greatly.

slide-32
SLIDE 32

Testing result and Observation

Running time measurement

slide-33
SLIDE 33

Testing result and Observation

Running time measurement

slide-34
SLIDE 34

Testing result and Observation

Running time measurement

slide-35
SLIDE 35

Testing result and Observation

Scalability measurement

slide-36
SLIDE 36

Testing result and Observation

Scalability measurement

slide-37
SLIDE 37

Testing result and Observation

Scalability measurement

slide-38
SLIDE 38

Testing result and Observation

Scalability measurement

slide-39
SLIDE 39

Future work

  • Decide the right seed.
  • Conditional statement in neighbor selection.
  • Use of constant memory and shared memory.
  • Unit Propagation and Pure Literal Elimination.
  • Test on structured SAT problems.
slide-40
SLIDE 40

Related work

  • “Parallel resolution of the satisfiability problem: a survey”
  • “Implementing Survey Propagation on Graphics Processing Units”
  • “Using Modern Graphics Architectures for General Purpose Computing: A Framework and Analysis”
  • “NVIDIA CUDA for research”

slide-41
SLIDE 41

References

[1] M. W. Moskewicz, C. F. Madigan, Y. Zhao, L. Zhang, S. Malik. "Chaff: Engineering an Efficient SAT Solver." In Proc. of the Design Automation Conference, pages 530-535, 2001.
[2] Mate Soos, Karsten Nohl, and Claude Castelluccia. "Extending SAT Solvers to Cryptographic Problems." In Theory and Applications of Satisfiability Testing - SAT 2009, pages 244-257, 2009.
[3] Youssef Hamadi, Lakhdar Sais. "ManySAT: a parallel SAT solver." Journal on Satisfiability, Boolean Modeling and Computation (JSAT), 2009.
[4] W. Chrabakh and R. Wolski. "GrADSAT: A parallel SAT solver for the grid." Technical report, UCSB CS TR N. 2003-05, 2003.
[5] Stephen Cook. "The complexity of theorem proving procedures." Proceedings of the Third Annual ACM Symposium on Theory of Computing, pages 151-158, 1971.

slide-42
SLIDE 42

References

[6] C. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
[7] D. Singer. "Parallel resolution of the satisfiability problem: a survey." In E. Talbi, editor, Parallel Combinatorial Optimization. John Wiley and Sons, pages 123-147, 2006.
[8] Martin Davis, Hilary Putnam. "A Computing Procedure for Quantification Theory." Journal of the ACM 7, pages 201-215, 1960.
[9] D. Singer and A. Monnet. "JaCk-SAT: A New Parallel Scheme to Solve the Satisfiability Problem (SAT) based on Join-and-Check." In Proceedings of the 6th Int. Conf. on Parallel Processing and Applied Mathematics, PPAM 2007, Gdansk, Poland, Springer Verlag LNCS 4967, pages 249-258, 2008.
[10] Gianluigi Folino, Clara Pizzuti, and Giandomenico Spezzano. "Parallel Hybrid Method for SAT That Couples Genetic Algorithms and Local Search." IEEE Transactions on Evolutionary Computation, vol. 5, no. 4, 2001.

slide-43
SLIDE 43

References

[11] Wei Wei and Bart Selman. "Accelerating Random Walks." In Principles and Practice of Constraint Programming, pages 61-67, 2002.
[12] Eric W. Weisstein. "von Neumann Neighborhood." From MathWorld.
[13] Lance Chambers. "Practical Handbook of Genetic Algorithms: Complex Coding Systems," pages 415-421, 2001.
[14] A. Schoneveld, J. F. de Ronde, P. M. A. Sloot, and J. A. Kaandorp. "A parallel cellular genetic algorithm used in finite element simulation." In Parallel Problem Solving from Nature - PPSN IV, pages 533-542, 2006.
[15] NVIDIA CUDA Programming Guide, Version 3. http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_ProgrammingGuide (last accessed May 10, 2010).

slide-44
SLIDE 44

References

[16] Alan Kaminsky. 4005-736 Parallel Computing II, GPU Computing: Introduction to GPGPU and CUDA. http://www.cs.rit.edu/~ark/spring2009/736/, 2009.
[17] W. Press et al. Numerical Recipes: The Art of Scientific Computing, Third Edition. Cambridge University Press, 2007, page 352.
[18] Alan Kaminsky. Parallel Java Library Documentation. http://www.cs.rit.edu/~ark/pj/doc/index.html
[19] Alan Kaminsky. "Building Parallel Programs: SMPs, Clusters, and Java." Cengage Course Technology, 2010, ISBN 1-4239-0198-3.
[20] P. Manolios and Y. Zhang. "Implementing Survey Propagation on Graphics Processing Units." In International Conference on Theory and Applications of Satisfiability Testing (SAT 2006), Seattle, WA. Springer, pages 311-324, Aug 2006.
[21] Chris J. Thompson, Sahngyun Hahn, and Mark Oskin. "Using Modern Graphics Architectures for General Purpose Computing: A Framework and Analysis." In Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture, ISSN 1072-4451, ISBN 0-7695-1859-1, pages 306-317, 2002.

slide-45
SLIDE 45

Questions ?