Optimisation
Zhiwei Cao, Renhao Huang, Junning Fan
❏Background Knowledge
❏What is this optimisation about?
❏NP-complete Problem
❏NP-complete problem in the real world
❏MetaHeuristic Algorithm
❏Optimisation of software and hardware
❏Find the balance between power consumption and speed
❏NP stands for Non-deterministic Polynomial time: the class of decision problems whose proposed solutions can be verified in polynomial time.
❏A problem p in NP is NP-complete if every other problem in NP can be transformed (or reduced) into p in polynomial time.
❏Time Complexity
❏ Deterministic polynomial time: O(n^a)
❏ Non-deterministic polynomial time (no polynomial-time deterministic algorithm known): O(a^n)
❏ O(n!), O(nW)
❏A Hamiltonian path is a path that visits each vertex of the graph exactly once.
❏Time complexity
❏ Find the solution: O(2^n · n^2)
❏ Verification: O(n), since every vertex of the claimed path has to be checked
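Verifying a claimed Hamiltonian path is cheap compared with finding one. The sketch below is illustrative only: the adjacency-set dictionary and the function name `is_hamiltonian_path` are assumptions, not part of the original slides.

```python
def is_hamiltonian_path(graph, path):
    """Check a claimed Hamiltonian path with one pass over its vertices.

    graph: dict mapping each vertex to the set of its neighbours (assumed format).
    path:  list of vertices in visiting order.
    """
    # Every vertex must appear exactly once.
    if len(path) != len(graph) or set(path) != set(graph):
        return False
    # Every consecutive pair must be joined by an edge.
    return all(b in graph[a] for a, b in zip(path, path[1:]))


# Hypothetical 4-vertex graph.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
print(is_hamiltonian_path(graph, ["A", "C", "B", "D"]))  # True
print(is_hamiltonian_path(graph, ["A", "B", "C", "D"]))  # False: no edge C-D
```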
Hardware
Processor: Hi3559A dual-core 1.8 CPU
Memory: 1GB
Software
System: Embedded Linux
Functionalities:
Structural recognition in 100 ms
Broader crossing detection
Renhao Huang
A salesman wants to:
1. Start from a city A
2. Travel through cities {B,C,D,E}
3. Go back to city A
Aim: Find the shortest path
Possible solutions:
1. Brute force: Enumeration (O(n!))
2. Dynamic Programming (O(n^2 · 2^n))
3. Cutting-plane method
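As an illustration of the dynamic programming option, here is a minimal sketch of the Held-Karp recurrence, which is the usual O(n^2 · 2^n) exact method for TSP. The function name `held_karp` and the 4-city distance matrix are assumptions made for the example.

```python
from itertools import combinations


def held_karp(dist):
    """Length of the shortest tour that starts and ends at city 0.

    dist: square matrix, dist[i][j] = distance from city i to city j.
    """
    n = len(dist)
    # best[(mask, j)] = shortest path from city 0 through the cities in mask, ending at j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)  # same subset without city j
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


# Hypothetical symmetric distances between 4 cities.
dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(held_karp(dist))  # 80, the tour 0 -> 1 -> 3 -> 2 -> 0
```

The table `best` holds one entry per (subset, end city) pair, which is where the 2^n factor comes from; this is why the method stops being practical beyond a few dozen cities.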
However, these exact algorithms cannot handle large-scale problems.
Solution: Simulated Annealing Algorithm
Simulated Annealing
process of annealing
Gelatt et al. in 1983
completes cases
VLSI, Deep Learning, Image processing, TSP
What is Annealing
science.
Heating: As the temperature increases, the motion of the molecules increases.
Maintain: Under constant temperature, the free energy goes to a minimum and the system reaches equilibrium.
Cooling: As the temperature decreases, the thermal motion of the molecules decreases and the energy level goes to a minimum.
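Start
1. Initialize T, final temperature Tend and cooling rate 0 < δ < 1
2. Get initial solution A
3. Generate a new solution B
4. Find the cost ΔE = E(A) - E(B)
5. If ΔE > 0: accept the new solution
6. Otherwise, find the probability P(ΔE): if P(ΔE) > 0.5, accept the new solution; else preserve the previous solution
7. If the number of iterations has not been reached, go back to step 3
8. Cooling: T = δ * T
9. If T has not reached Tend, go back to step 3; otherwise END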
Algorithm
probability
Cooling rate δ: the rate of learning (or deceleration of the temperature)
○ If δ is too small: less accuracy
○ If δ is too large: time consuming
Stopping condition: T = Tend
ΔE: The change of the internal energy
k: Boltzmann constant
probability goes to zero
a big change of internal energy
The internal energy of a system corresponds to the cost of a solution. In most cases, we want to minimize the energy.
Energy in TSP = the total distance
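For reference, the acceptance rule usually paired with simulated annealing is the Metropolis criterion. Assuming that is the rule intended here, and using this deck's sign convention (ΔE = E(A) - E(B), so ΔE < 0 means the new solution is worse), it can be written as below. In optimisation code k is commonly set to 1; the numbers in the comments are illustrative only.

```latex
P(\Delta E) = \exp\!\left(\frac{\Delta E}{kT}\right), \qquad \Delta E < 0
% Illustrative values with k = 1 and \Delta E = -5:
%   T = 50:  P = e^{-0.1} \approx 0.905
%   T = 1:   P = e^{-5}   \approx 0.0067
```

The same worsening move is therefore almost always accepted while the system is hot and almost never once it has cooled.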
Parameters: T = 50, Tend = 0.01, δ = 0.95
Step 1: Initialize the first path: Path A = A -> B -> C -> D -> E -> A
Step 2: Generate a new path (by swapping): Path B = A -> E -> C -> D -> B -> A
Step 3: Evaluate the two paths: Δdistance = distance(Path A) - distance(Path B)
Step 4.1: If Δdistance > 0, B wins. Accept Path B
Step 4.2: If Δdistance < 0, A wins. Calculate P(Δdistance) to decide if Path B stays
Step 5: After several loops of steps 2 to 4, update T ← T × δ and repeat.
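The steps above can be sketched in code as follows. This is a minimal illustration, not the presenters' implementation: the city coordinates, the function names and the `loops` parameter are assumptions, and a worse path is accepted by comparing exp(Δdistance / T) with a uniform random draw (the usual Metropolis rule) rather than with a fixed threshold.

```python
import math
import random


def tour_length(path, dist):
    """Total distance of a closed tour given as a list of city indices."""
    return sum(dist[path[i]][path[(i + 1) % len(path)]] for i in range(len(path)))


def anneal_tsp(dist, t=50.0, t_end=0.01, delta=0.95, loops=100, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    current = list(range(n))  # Step 1: initial path 0 -> 1 -> ... -> 0
    current_len = tour_length(current, dist)
    best, best_len = current[:], current_len
    while t > t_end:
        for _ in range(loops):
            # Step 2: swap two cities, keeping the start city fixed.
            i, j = rng.sample(range(1, n), 2)
            candidate = current[:]
            candidate[i], candidate[j] = candidate[j], candidate[i]
            # Step 3: delta_e > 0 means the candidate path is shorter.
            cand_len = tour_length(candidate, dist)
            delta_e = current_len - cand_len
            # Step 4: accept improvements, or worse paths with probability exp(delta_e / t).
            if delta_e > 0 or rng.random() < math.exp(delta_e / t):
                current, current_len = candidate, cand_len
                if current_len < best_len:
                    best, best_len = current[:], current_len
        t *= delta  # Step 5: cool down
    return best, best_len


# Hypothetical coordinates for cities A..E (indices 0..4).
cities = [(0, 0), (2, 4), (5, 1), (6, 5), (1, 6)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
best, best_len = anneal_tsp(dist)
print(best, round(best_len, 2))
```

Keeping the best path seen so far is a small addition to the steps above: the current path may drift to a worse one late in the run, so the sketch returns the best one it visited.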
Simulated Annealing vs Vanilla Greedy
minimum
jump out of the local minimum
Simulated Annealing vs Dynamic Programming
is close to the optimal
[Figure: cost curve showing local minima and the global minimum]
A tradeoff between greedy and dynamic programming
problems
○ Find a solution with minimum cost or maximum benefit
○ But too heavy to use dynamic programming
○ It is an NP-complete problem
○ Perfect solution is not required
○ In TSP: Total distance = ∑ distance between a pair of cities
○ In Knapsack problem: Total values of all items
○ Have to avoid solutions outside the limitations
○ Current solution
○ Another possible solution
○ Compare and decide which one to be selected
Ex: Problem Definition on FPGA Floorplanning using SA
○ An NP-complete problem
○ An FPGA consists of CLBs connected with each other by routing.
○ Long wires lead to delay.
○ Good floorplanning will increase the efficiency of the FPGA
○ Wire length cost IO between I/O connections
○ Internal wire length cost IC
○ Extra cost Ex, such as the power cost of the logic gates or the area demand.
○ Total Cost = sum of {Ex, IO, IC}
○ Blocks cannot overlap, etc.
○ Find the floor plan with minimum total cost
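A cost function of this shape could be sketched as below. Everything concrete here is an assumption made for illustration: blocks are rectangles `(x, y, w, h)`, wire length is the Manhattan distance between block centres (or between a centre and an I/O pad), the extra cost Ex is passed in as a number, and an overlapping placement gets infinite cost so that the search rejects it.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, w, h), share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def centre(rect):
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def total_cost(blocks, internal_nets, io_nets, extra=0.0):
    """Total cost = Ex + IO + IC, or infinity if any two blocks overlap."""
    names = list(blocks)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if overlaps(blocks[a], blocks[b]):
                return float("inf")  # constraint violated
    # IC: wires between pairs of blocks.
    ic = sum(manhattan(centre(blocks[a]), centre(blocks[b])) for a, b in internal_nets)
    # IO: wires from a block to a fixed I/O pad position.
    io = sum(manhattan(centre(blocks[a]), pad) for a, pad in io_nets)
    return extra + io + ic


# Hypothetical floor plan with three blocks.
blocks = {"alu": (0, 0, 2, 2), "reg": (2, 0, 2, 2), "ctl": (0, 2, 4, 1)}
internal_nets = [("alu", "reg"), ("alu", "ctl")]
io_nets = [("ctl", (2, 4))]
print(total_cost(blocks, internal_nets, io_nets))  # 6.0
```

In a simulated annealing loop, a "new solution" would be the same plan with one block moved or two blocks swapped, and this function would play the role of the energy.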
[Figure: delay on wires]
○ High initial temperature -> high accuracy, low efficiency
○ Low initial temperature -> low accuracy, high efficiency
○ High δ -> high accuracy, low efficiency
○ Low δ -> low accuracy, high efficiency
○ Searching at the same temperature is important
○ But too many iterations will be time consuming
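The efficiency side of this tradeoff can be made concrete by counting how many cooling updates a schedule needs. The sketch below assumes the geometric schedule T ← δ · T and the illustrative values T = 50 and Tend = 0.01; the function name `cooling_steps` is hypothetical.

```python
import math


def cooling_steps(t_start, t_end, delta):
    """Number of updates T <- delta * T needed before T drops to t_end or below."""
    return math.ceil(math.log(t_end / t_start) / math.log(delta))


# 0.80 -> 39, 0.95 -> 167, 0.99 -> 848
for delta in (0.80, 0.95, 0.99):
    print(delta, cooling_steps(50, 0.01, delta))
```

The total work is this count multiplied by the number of searches done at each temperature, so raising δ from 0.95 to 0.99 makes the run roughly five times longer.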
Edward Fan
Challenge: Hardware-software partitioning
In an embedded system, there are various computation resources to be selected for tasks.
CPUs can execute software code
FPGAs can be reconfigured to accelerate various tasks
ASIC components (DSPs) can accelerate certain computations like multiplication
timing for executing a task.
How should tasks be assigned to these resources to meet the limitations?
Difference between resources
Software implementations are the slowest, but consume the least power.
FPGAs are fast and can be reconfigured to execute other tasks, but they are power hungry.
ASICs are the fastest, but they are power hungry and lack the flexibility to execute other tasks.
multiple objectives of power, timing and cost?
A genetic algorithm is proposed to achieve these multiple objectives.
Genetic Algorithm
(expect some randomness)
combinatorial problem (like assignment of a discrete, finite set of
conditions)
solution space
genetic inheritance, exchange and mutation of organisms.
A chromosome in a genetic algorithm represents a certain solution to the
software/hardware/FPGA)
Fitness is a function that evaluates how well a specific chromosome meets the requirements of the problem.
H H F S: a chromosome that schedules the first task to hardware, the second to hardware, the third to FPGA and the fourth to software.
F(H H F S) = 2.2
N pairs of parents. These parents will have the chance to pass their genes to the next generation.
[Figure: population H F F S, H H H S, H S F S, S H F S, S S F S, S S S S, F F F F, H H F S; the fittest 2 pairs are selected]
creating offspring, the parents may exchange part of their chromosomes.
A crossover point is randomly selected
Offspring have chromosomes from both parents
be subjected to random mutation, resulting in minor alteration of their genes.
Mutation introduces diversity and randomness to the population
generation (G+1)’s population will be formed by the selected parents, altered by the effects of crossover and mutation.
[Figure: generation G (H F F S, H H H S, H S F S, H H S S) and generation G+1 (S F S S, H H F S, H S H S, H H F S)]
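Selection, crossover and mutation on H/F/S chromosomes can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_fitness` and its speed and power numbers are invented for the example and are not the fitness function used in the talk, and the parents are carried over unchanged into the next generation.

```python
import random

GENES = "HFS"  # H = hardware, F = FPGA, S = software


def toy_fitness(chromosome):
    """Hypothetical fitness: rewards faster resources, penalises power use."""
    speed = {"H": 3.0, "F": 2.0, "S": 1.0}
    power = {"H": 2.0, "F": 2.5, "S": 0.5}
    return sum(speed[g] for g in chromosome) / (1.0 + sum(power[g] for g in chromosome))


def crossover(a, b, rng):
    """Single-point crossover: the child takes a prefix of a and a suffix of b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate=0.1):
    """Replace each gene with a random one with probability `rate`."""
    return "".join(rng.choice(GENES) if rng.random() < rate else g for g in chromosome)


def evolve(n_tasks=4, pop_size=8, n_parents=4, generations=20, seed=0):
    rng = random.Random(seed)
    # Initial population: randomly generated chromosomes.
    population = ["".join(rng.choice(GENES) for _ in range(n_tasks)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest chromosomes as parents.
        parents = sorted(population, key=toy_fitness, reverse=True)[:n_parents]
        # Next generation: parents plus mutated offspring of random parent pairs.
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(pop_size - n_parents)
        ]
        population = parents + children
    return max(population, key=toy_fitness)


best = evolve()
print(best, round(toy_fitness(best), 3))
```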
Randomly generated
Discussed in next slide
For a pre-set number of generations
satisfying time requirement and power requirement
For a task graph of N nodes, each node represents a task. Every node will be assigned to H, F or S (hardware, FPGA or software).
The chromosome will be a string of length N consisting of H, F and S that represents a certain allocation of resources.
Time: $F_{time} = \frac{1}{1 + T_{Total}/T_{max}} \cdot \frac{1}{1 + C/C_{max}}$, where $T_{max}$ is the slowest possible configuration experienced and $C_{max}$ is the max communication delay.
Power: $F_{power} = \frac{1}{1 + P_{Total}/P_{max}} \cdot \frac{1}{1 + S/S_{max}}$, where $S$ is the switching power of interprocessor communication (switching to another resource).
Overall: $F_{overall} = F_{time} \cdot F_{power}$
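that satisfy the timing and power requirements.
Timing penalty: $p_T = \begin{cases} 0, & T < D \\ a \cdot (T - D), & T \ge D \end{cases}$, where $D$ is the deadline.
Power penalty: $p_P = \begin{cases} 0, & P < P_{max} \\ b \cdot (P - P_{max}), & P \ge P_{max} \end{cases}$
[Two further equations from this slide, a third piecewise term and the final combined fitness, are not recoverable from the extracted text.]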
Capable of generating resource allocation
It is hard to assign estimated execution times for this GA
Classifying resources into ASIC, Software and FPGA can be too generic for actual applications.