Optimisation: Zhiwei Cao, Renhao Huang, Junning Fan

SLIDE 1

Optimisation

Zhiwei Cao, Renhao Huang, Junning Fan

SLIDE 2

Introduction

❏Background Knowledge

❏What is this optimisation about?

❏NP-complete Problem

❏NP-complete problem in the real world

❏MetaHeuristic Algorithm

SLIDE 3

What is this optimisation about?

❏Optimisation of software and hardware

❏Finding the balance between power consumption and speed

SLIDE 4

NP-complete Problem

❏NP (Non-deterministic Polynomial time) is the class of problems that can be solved in polynomial time by a non-deterministic machine; equivalently, a proposed solution can be verified in polynomial time.

❏A problem p in NP is NP-complete if every other problem in NP can be transformed (or reduced) into p in polynomial time.

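❏Time Complexity

❏ Deterministic polynomial time: O(n^a)

❏ Non-deterministic polynomial time (best known exact algorithms for NP-complete problems): O(a^n), O(n!), or pseudo-polynomial O(nW) (e.g. knapsack with capacity W)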

SLIDE 5

Hamiltonian Path Problem

❏A Hamiltonian path is a path that visits each vertex of the graph exactly once.

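❏Time complexity

❏ Find the solution (dynamic programming): O(2^n n^2)

❏ Verification: O(n), one pass over the proposed path checking that each vertex appears once and each consecutive pair is an edge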
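A sketch of both sides of this gap, assuming vertices are numbered 0..n-1 and the graph is given as an undirected edge list (the function names are hypothetical). The dynamic programme (Held-Karp style) fills a table over all 2^n vertex subsets with up to n^2 work per subset, while checking a proposed path is a single linear pass.

```python
def is_hamiltonian_path(n, edges, path):
    """Verify a proposed path in time linear in the input size."""
    adjacent = set()
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))            # edges are undirected
    if len(path) != n or set(path) != set(range(n)):
        return False                    # must visit every vertex exactly once
    return all((path[i], path[i + 1]) in adjacent for i in range(n - 1))


def has_hamiltonian_path(n, edges):
    """Held-Karp style DP in O(2^n * n^2): reach[mask][v] is True when some
    path visits exactly the vertices in `mask` and ends at vertex v."""
    adj = [[False] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = True
    reach = [[False] * n for _ in range(1 << n)]
    for v in range(n):
        reach[1 << v][v] = True         # single-vertex paths
    for mask in range(1 << n):
        for v in range(n):
            if not reach[mask][v]:
                continue
            for w in range(n):          # extend the path by an unvisited neighbour
                if not mask & (1 << w) and adj[v][w]:
                    reach[mask | (1 << w)][w] = True
    return any(reach[(1 << n) - 1])


path_graph = [(0, 1), (1, 2), (2, 3)]
star_graph = [(0, 1), (0, 2), (0, 3)]
print(has_hamiltonian_path(4, path_graph), has_hamiltonian_path(4, star_graph))
```

The chain 0-1-2-3 has a Hamiltonian path; the star with three leaves does not, because any path through the centre can reach at most two leaves.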

SLIDE 6

NP-complete problem in the real world

Hardware

Processor: Hi3559A dual-core 1.8 CPU

Memory: 1GB

Software

System: Embedded Linux

Functionalities:

Structural recognition in 100 ms

Border crossing detection

SLIDE 7

MetaHeuristic Algorithm

SLIDE 8

Simulated Annealing

Renhao Huang

SLIDE 9

Ex: Travelling Salesman Problem (TSP)

A salesman wants to:
1. Start from city A
2. Travel through cities {B, C, D, E}
3. Return to city A

[Figure: graph of cities A, B, C, D, E]

Aim: find the shortest route.

Possible solutions:
1. Brute force: enumeration, O(n!)
2. Dynamic programming, O(n^2 2^n)
3. Cutting-plane method

Assume:

  • Edges are bidirectional
  • Each edge has a weight (distance)
  • Must go through all cities

However, these exact algorithms cannot handle large problem instances. Solution: the Simulated Annealing algorithm.
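To make the O(n!) cost of brute force concrete, here is a minimal sketch for the five cities A to E. The distance matrix is hypothetical (the slides give no distances). Fixing A as the start leaves (n-1)! = 24 orders to try; at 20 cities the same loop would face about 1.2 × 10^17 orders.

```python
from itertools import permutations

# Hypothetical symmetric distances between cities A..E (indices 0..4).
DIST = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]


def tour_length(tour, dist):
    """Total distance of a closed tour, including the return to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def brute_force_tsp(dist):
    """Fix city 0 (A) as the start and try all (n-1)! orders of the rest."""
    n = len(dist)
    best = None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        length = tour_length(tour, dist)
        if best is None or length < best[0]:
            best = (length, tour)
    return best


length, tour = brute_force_tsp(DIST)
print(length, "->".join("ABCDE"[c] for c in tour + (0,)))
```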

SLIDE 10

Simulated Annealing

  • A heuristic algorithm imitating the physical process of annealing
  • Introduced by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983
  • Used to find near-optimal solutions to NP-complete problems
  • Popular in applications such as VLSI design, deep learning, image processing and the TSP

SLIDE 11

What is Annealing?

  • A physical process used in metallurgy and materials science
  • Slow cooling lets the material reach its lowest energy state (crystallization)
  • Steps: heating, maintaining, cooling

Heating: as the temperature increases, the motion of the molecules increases. Maintaining: at constant temperature, the free energy goes to a minimum and the system reaches equilibrium. Cooling: as the temperature decreases, the thermal motion of the molecules decreases and the energy level goes to a minimum.

SLIDE 12

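Algorithm (flowchart)

1. Start: initialize the temperature T, the final temperature Tend and the cooling rate 0 < δ < 1
2. Get an initial solution A
3. Generate a new solution B
4. Find the cost difference ΔE = E(A) - E(B)
5. If ΔE > 0, accept the new solution B. Otherwise, find the probability P(ΔE) and accept B only if P(ΔE) is greater than a random number r drawn uniformly from [0, 1); if not, preserve the previous solution A
6. Repeat steps 3 to 5 until the number of iterations for the current temperature is reached
7. Cooling: T = δ * T
8. If T ≤ Tend, END; otherwise return to step 3

  • Temperature T is just a value used to control the acceptance probability
  • δ is similar to the learning rate in deep learning (it sets how fast the temperature decreases)

If δ is too small: less accuracy

If δ is too large: more time-consuming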
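The flowchart can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the names `simulated_annealing`, `energy`, `neighbour` and `iters_per_temp` are hypothetical, Boltzmann's constant k is folded into T (k = 1), and a worse solution is accepted by comparing P(ΔE) with a uniform random number. The sketch also remembers the best solution seen, which the flowchart does not require.

```python
import math
import random


def simulated_annealing(initial, energy, neighbour, t=50.0, t_end=0.01,
                        delta=0.95, iters_per_temp=100, rng=None):
    """Follows the flowchart: ΔE = E(A) - E(B); accept B if ΔE > 0, otherwise
    accept with probability exp(ΔE / T) (Boltzmann's k folded into T)."""
    rng = rng or random.Random()
    current, e_current = initial, energy(initial)       # solution A
    best, e_best = current, e_current
    while t > t_end:                                     # stop once T <= Tend
        for _ in range(iters_per_temp):                  # iterations per temperature
            candidate = neighbour(current, rng)          # new solution B
            e_candidate = energy(candidate)
            d_e = e_current - e_candidate
            if d_e > 0 or rng.random() < math.exp(d_e / t):
                current, e_current = candidate, e_candidate
                if e_current < e_best:
                    best, e_best = current, e_current
        t *= delta                                       # cooling: T = δ * T
    return best, e_best


# Usage: minimise (x - 3)^2 over the integers, moving one step left or right.
x, e = simulated_annealing(-20, lambda x: (x - 3) ** 2,
                           lambda x, rng: x + rng.choice((-1, 1)),
                           rng=random.Random(42))
print(x, e)
```

Passing a seeded `random.Random` makes runs reproducible; the energy and neighbour functions are the only problem-specific parts.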

SLIDE 13

Probability Function

$P(\Delta E) = \exp\left(\frac{\Delta E}{kT}\right)$, applied when $\Delta E = E(A) - E(B) < 0$

ΔE: the change of the internal energy; k: Boltzmann constant; T: temperature

  • As the temperature goes down, the probability goes to zero
  • The probability becomes low if there is a big change in internal energy

The internal energy of a system corresponds to the cost of a solution. In most cases, we want to minimise the energy. Energy in TSP = the total distance.
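A minimal sketch of this acceptance probability, with k defaulting to 1 (an assumption; in optimisation k is normally absorbed into T). The function name is hypothetical.

```python
import math


def acceptance_probability(delta_e, t, k=1.0):
    """P(ΔE) = exp(ΔE / (k*T)) with ΔE = E(A) - E(B).

    A better candidate (ΔE > 0) is always accepted; for a worse one the result
    lies in (0, 1] and shrinks as T falls or as |ΔE| grows."""
    if delta_e > 0:
        return 1.0
    return math.exp(delta_e / (k * t))


for t in (50, 5, 0.5):
    print(f"T = {t}: P(-10) = {acceptance_probability(-10, t):.3g}")
```

For ΔE = -10 the probability is about 0.82 at T = 50, about 0.14 at T = 5 and about 2 × 10^-9 at T = 0.5, which is why the search becomes effectively greedy near the end of cooling.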

SLIDE 14

Travelling Salesman Problem: SA

[Figure: graph of cities A, B, C, D, E]

Parameters: T = 50, Tend = 0.01, δ = 0.95

Step 1: Initialize the first path: Path A = A -> B -> C -> D -> E -> A
Step 2: Generate a new path (by swapping two cities): Path B = A -> E -> C -> D -> B -> A
Step 3: Evaluate the two paths: Δdistance = distance(Path A) - distance(Path B)
Step 4.1: If Δdistance > 0, B wins: accept Path B
Step 4.2: If Δdistance < 0, A wins: calculate P(Δdistance) to decide whether Path B is accepted
Step 5: After several loops of steps 2 to 4, update T ← T × δ and repeat until T reaches Tend
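A minimal sketch of steps 1 to 5 with the slide's parameters (T = 50, Tend = 0.01, δ = 0.95). The distance matrix is hypothetical, as is the choice of 20 loops per temperature (the slide only says "several"). City A is index 0 and stays fixed as the start; the sketch also keeps the shortest path seen so far.

```python
import math
import random

# Hypothetical symmetric distances between cities A..E (indices 0..4).
DIST = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]


def distance(path):
    """Length of the closed path, including the return to the start city."""
    return sum(DIST[path[i]][path[(i + 1) % len(path)]] for i in range(len(path)))


def swap_two(path, rng):
    """Step 2: swap two cities, keeping city A (index 0) as the start."""
    i, j = rng.sample(range(1, len(path)), 2)
    new_path = list(path)
    new_path[i], new_path[j] = new_path[j], new_path[i]
    return new_path


def sa_tsp(t=50.0, t_end=0.01, delta=0.95, loops=20, seed=0):
    rng = random.Random(seed)
    path_a = [0, 1, 2, 3, 4]                           # Step 1: A->B->C->D->E->A
    best = path_a
    while t > t_end:
        for _ in range(loops):
            path_b = swap_two(path_a, rng)                 # Step 2
            d = distance(path_a) - distance(path_b)        # Step 3
            if d > 0 or rng.random() < math.exp(d / t):    # Steps 4.1 and 4.2
                path_a = path_b
                if distance(path_a) < distance(best):
                    best = path_a
        t *= delta                                         # Step 5: T <- T x delta
    return best, distance(best)


best_path, length = sa_tsp()
print("->".join("ABCDE"[c] for c in best_path + [0]), length)
```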

SLIDE 15

Travelling Salesman Problem: SA

SLIDE 16

Why Simulated Annealing is Good

Simulated Annealing vs Vanilla Greedy

  • A greedy algorithm may get stuck at a local minimum
  • Simulated Annealing has a chance to jump out of the local minimum

Simulated Annealing vs Dynamic Programming

  • DP reaches the global minimum but is slow
  • Simulated Annealing can find a solution that is close to the optimum

[Figure: local minima and global minimum]

SA is a trade-off between greedy search and dynamic programming

SLIDE 17

How to use Simulated Annealing Algorithms?

  • The Simulated Annealing algorithm can be used for most optimization problems where:

○ We want a solution with minimum cost or maximum benefit
○ Dynamic programming would be too heavy
○ The problem is NP-complete
○ A perfect solution is not required

  • First define the energy function, e.g.:

○ In TSP: total distance = ∑ distances between consecutive cities on the route
○ In the knapsack problem: total value of the selected items (to be maximised, so its negative can serve as the energy)

  • List all constraints:

○ Solutions that violate the constraints must be avoided

  • Finally, follow the steps of the simulated annealing algorithm:

○ Current solution
○ Another possible solution
○ Compare them and decide which one to keep
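As an illustration of this recipe (energy function, constraints, then the SA steps), here is a minimal sketch for the knapsack problem on a hypothetical five-item instance. The choices of energy = negative total value, a neighbour that flips one item in or out, and outright rejection of overweight candidates are assumptions of the sketch; penalising constraint violations in the energy is another common option.

```python
import math
import random

# Hypothetical instance: (value, weight) per item and the weight limit.
ITEMS = [(60, 10), (100, 20), (120, 30), (30, 5), (40, 15)]
CAPACITY = 50


def energy(selected):
    """Energy = -(total value), so minimising energy maximises value."""
    return -sum(v for (v, _), s in zip(ITEMS, selected) if s)


def feasible(selected):
    """Constraint: the total weight must not exceed CAPACITY."""
    return sum(w for (_, w), s in zip(ITEMS, selected) if s) <= CAPACITY


def knapsack_sa(t=50.0, t_end=0.01, delta=0.95, loops=50, seed=1):
    rng = random.Random(seed)
    current = [False] * len(ITEMS)        # the empty knapsack is feasible
    best = current
    while t > t_end:
        for _ in range(loops):
            candidate = list(current)
            i = rng.randrange(len(ITEMS))
            candidate[i] = not candidate[i]              # flip one item in or out
            if not feasible(candidate):
                continue                                 # avoid infeasible solutions
            d_e = energy(current) - energy(candidate)
            if d_e > 0 or rng.random() < math.exp(d_e / t):
                current = candidate
                if energy(current) < energy(best):
                    best = current
        t *= delta
    return best, -energy(best)


chosen, value = knapsack_sa()
print(chosen, value)
```

For this instance the best selection is items 0, 1, 3 and 4 (value 230, weight exactly 50). Note that {1, 2} (value 220) is a local optimum for single flips, which is where a purely greedy search could stop.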

SLIDE 18

Ex: Problem Definition on FPGA Floorplanning using SA

  • Description:

An NP-complete problem

An FPGA consists of CLBs (configurable logic blocks) connected to each other by routing.

Long wires lead to delay.

Good floorplanning increases the efficiency of the FPGA.

  • Define:

Wire-length cost IO of the I/O connections

Internal wire-length cost IC

Extra cost Ex, such as the power cost of the logic gates or the area demand

Total Cost = Ex + IO + IC

  • Constraints:

Block placements cannot overlap, etc.

  • Objective:

Find the floorplan with the minimum total cost
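A minimal sketch of the total-cost function under deliberately simplified assumptions: each block occupies one grid cell, wire length is the Manhattan distance between connected cells, and Ex is a fixed per-block cost. All pads, nets and costs below are hypothetical. An SA search would then minimise `total_cost` by moving or swapping blocks.

```python
import math

# Hypothetical layout data: each block occupies one grid cell (x, y).
IO_PADS = {"in0": (0, 0), "out0": (5, 0)}
IO_NETS = [("in0", "b0"), ("out0", "b2")]      # (pad, block) I/O connections
INTERNAL_NETS = [("b0", "b1"), ("b1", "b2")]   # block-to-block connections
EXTRA = {"b0": 1.0, "b1": 2.0, "b2": 1.0}      # Ex: e.g. gate power or area cost


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def total_cost(placement):
    """Total Cost = Ex + IO + IC for a dict mapping block -> (x, y)."""
    if len(set(placement.values())) < len(placement):
        return math.inf                    # constraint: blocks cannot overlap
    io = sum(manhattan(IO_PADS[pad], placement[blk]) for pad, blk in IO_NETS)
    ic = sum(manhattan(placement[a], placement[b]) for a, b in INTERNAL_NETS)
    ex = sum(EXTRA.values())
    return ex + io + ic


compact = {"b0": (1, 0), "b1": (2, 0), "b2": (3, 0)}
spread = {"b0": (4, 4), "b1": (0, 4), "b2": (1, 1)}
print(total_cost(compact), total_cost(spread))
```

The compact placement costs 9.0 and the spread one 25.0; the difference comes entirely from wire length, matching the slide's point that long wires are what good floorplanning avoids.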

SLIDE 19

Ex: Problem Definition on FPGA Floorplanning using SA

[Figure: delay on wires (x10, x20)]

SLIDE 20

Some Disadvantages of SA

  • Selection of the initial temperature is essential.

High initial temperature -> high accuracy, low efficiency

Low initial temperature -> low accuracy, high efficiency

  • Need to set a suitable cooling rate δ

High δ -> high accuracy, low efficiency

Low δ -> low accuracy, high efficiency

  • Need to control the number of iterations between successive cooling steps

Searching at the same temperature is important

But too many iterations are time-consuming

  • Can still get stuck at a local minimum
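The cost side of these trade-offs can be quantified. With T ← δT, the number of cooling steps until T ≤ Tend is the smallest k with T0·δ^k ≤ Tend, i.e. k = ⌈ln(Tend/T0) / ln δ⌉, so both a higher initial temperature and a δ closer to 1 mean more steps. The sketch below checks this for the TSP parameters (T0 = 50, Tend = 0.01); δ = 0.99 is a hypothetical comparison value.

```python
import math


def cooling_steps(t0, t_end, delta):
    """Count cooling steps T <- delta * T until T <= t_end."""
    steps, t = 0, t0
    while t > t_end:
        t *= delta
        steps += 1
    return steps


def cooling_steps_closed_form(t0, t_end, delta):
    # Smallest k with t0 * delta**k <= t_end.
    return math.ceil(math.log(t_end / t0) / math.log(delta))


for delta in (0.95, 0.99):
    print(delta, cooling_steps(50, 0.01, delta), cooling_steps_closed_form(50, 0.01, delta))
```

With δ = 0.95 the schedule has 167 temperature levels; with δ = 0.99 it has 848, roughly five times as many, and each level runs the full set of iterations per temperature.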
SLIDE 21

Genetic Algorithm in HW/SW partitioning

Edward Fan

SLIDE 22

Challenge: Hardware-software partitioning

  • For an SoC like Zynq, or a more complicated embedded system, there are various computation resources that tasks can be assigned to:

– CPUs can execute software code
– FPGAs can be reconfigured to accelerate various tasks
– ASIC components (DSPs) can accelerate certain computations such as multiplication

  • There are limits on cost, power and timing for executing a task.

  • Question: how do we plan and assign tasks to these resources to meet these limits?

SLIDE 23

Differences between resources

  • Experiments show:

– Software implementations are the slowest, but consume the least power.
– FPGAs are fast and can be reconfigured to execute other tasks, but they are power hungry.
– ASICs are the fastest, but they lack the flexibility to execute other tasks and are power hungry.

  • How do we plan these resources to meet multiple objectives of power, timing and cost?

– A genetic algorithm is proposed to achieve these multiple objectives.

SLIDE 24

Genetic Algorithm

  • A stochastic search-based algorithm (expect some randomness).

  • It performs well on combinatorial problems (such as assigning a discrete, finite set of objects so that given conditions are satisfied).

  • Good at avoiding local optima in the solution space.

  • An algorithm inspired by the genetic inheritance, exchange and mutation of organisms.

SLIDE 25

Elements of Genetic Algorithm

  • Chromosome

– A chromosome in a genetic algorithm represents a certain solution to the problem (e.g. which task is executed by software/hardware/FPGA).

  • Fitness

– Fitness is a function that evaluates how well a specific chromosome meets the requirements of the problem.

[Figure: chromosome H H F S, scheduling the first task to hardware, the second to hardware, the third to FPGA and the fourth to software; F(H H F S) = 2.2]

SLIDE 26

Operations of Genetic Algorithm

  • Selection
  • Crossover
  • Mutation
  • Reproduction
SLIDE 27

Selection

  • For a certain generation, select the fittest N pairs of parents. These parents will have the chance to pass their genes to the next generation.

[Figure: a population of eight chromosomes (HFFS, HHHS, HSFS, SHFS, SSFS, SSSS, FFFF, HHFS), from which the fittest 2 pairs are selected]
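A minimal sketch of this selection step (keep the fittest N pairs), using the eight chromosomes from the figure. The fitness used here, the number of H genes, is a hypothetical stand-in for the real fitness function; pairing parents in fitness order is also an assumption.

```python
def select_parents(population, fitness, n_pairs):
    """Keep the 2 * n_pairs fittest chromosomes and pair them in fitness order."""
    ranked = sorted(population, key=fitness, reverse=True)[: 2 * n_pairs]
    return [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked) - 1, 2)]


def toy_fitness(chromosome):
    """Hypothetical stand-in fitness: count of tasks assigned to ASIC (H)."""
    return chromosome.count("H")


population = ["HFFS", "HHHS", "HSFS", "SHFS", "SSFS", "SSSS", "FFFF", "HHFS"]
pairs = select_parents(population, toy_fitness, n_pairs=2)
print(pairs)
```

Three chromosomes tie with one H; because Python's sort is stable (also with `reverse=True`), ties keep their original order, so HFFS and HSFS form the second pair.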

SLIDE 28

Crossover

  • When a pair of parents is selected to create offspring, the parents may exchange part of their chromosomes.

– A crossover point is randomly selected
– Offspring have chromosome segments from both parents
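Single-point crossover as described can be sketched as follows (the function name is hypothetical). The cut point is drawn strictly inside the string so that each child receives genes from both parents.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: cut both parents at a random point and swap tails."""
    point = rng.randrange(1, len(parent_a))      # cut strictly inside the string
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


child_1, child_2 = crossover("HHHS", "SSFS", random.Random(0))
print(child_1, child_2)
```

Whatever the cut point, at every position the two children together carry exactly the two parental genes; for example, a cut after the second gene turns HHHS and SSFS into HHFS and SSHS.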

SLIDE 29

Mutation

  • When an offspring is formed, its genes can be subjected to random mutation, resulting in a minor alteration of its genes.

– Mutation introduces diversity and randomness into the population
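A minimal sketch of mutation for the H/F/S encoding: each gene is independently replaced, with a small probability `rate`, by a different resource. The per-gene rate is a hypothetical parameter; the slides do not specify how mutation is applied.

```python
import random

GENES = "HFS"  # H = ASIC, F = FPGA, S = software


def mutate(chromosome, rate, rng):
    """Replace each gene, with probability `rate`, by a different random gene."""
    out = []
    for gene in chromosome:
        if rng.random() < rate:
            gene = rng.choice([g for g in GENES if g != gene])
        out.append(gene)
    return "".join(out)


rng = random.Random(0)
print(mutate("HFFS", 0.25, rng))
```

A rate around one gene per chromosome keeps the alteration "minor", as the slide describes; a rate of 1.0 would change every gene and destroy what the parents passed on.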

SLIDE 30

Reproduction

  • For generation G, the population of the next generation (G+1) is formed by the selected parents, altered by crossover and mutation.

[Figure: generation G (HFFS, HHHS, HSFS, HHSS) and generation G+1 (SFSS, HHFS, HSHS, HHFS)]

SLIDE 31

Procedure of the algorithm

  • Initial population:

– Randomly generated

  • Fitness evaluation:

– Discussed in the following slides

  • Termination:

– After a pre-set number of generations
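Putting the operations together, a minimal sketch of the whole procedure: random initial population, fitness evaluation, selection of the fittest half, crossover, mutation and reproduction for a pre-set number of generations. The fitness (matching a hypothetical target allocation), population size, mutation rate and the choice to carry parents into the next generation unchanged are all assumptions for illustration, not the method from the slides.

```python
import random

GENES = "HFS"        # H = ASIC, F = FPGA, S = software
TARGET = "HHFSFS"    # hypothetical allocation rewarded by the toy fitness


def fitness(chromosome):
    """Toy fitness (hypothetical): number of tasks placed as in TARGET."""
    return sum(a == b for a, b in zip(chromosome, TARGET))


def crossover(a, b, rng):
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, rate, rng):
    """Each gene is replaced by a different random gene with probability `rate`."""
    return "".join(
        rng.choice([g for g in GENES if g != c]) if rng.random() < rate else c
        for c in chromosome
    )


def run_ga(pop_size=20, generations=50, rate=0.1, seed=0):
    rng = random.Random(seed)
    # Initial population: randomly generated
    population = ["".join(rng.choice(GENES) for _ in TARGET) for _ in range(pop_size)]
    history = []
    for _ in range(generations):          # Termination: pre-set number of generations
        # Fitness evaluation and selection: the fittest half become parents
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        history.append(fitness(parents[0]))
        children = []
        for i in range(0, len(parents) - 1, 2):       # pair parents in fitness order
            for child in crossover(parents[i], parents[i + 1], rng):
                children.append(mutate(child, rate, rng))   # mutation
        population = parents + children               # reproduction: generation G+1
    return max(population, key=fitness), history


best, history = run_ga()
print(best, fitness(best), history[:5])
```

Because the selected parents survive unchanged, the best fitness in each generation can never decrease; mutation only acts on the children.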

SLIDE 32

Applying the algorithm to the partitioning

  • Goal: assign resources to the task graph, satisfying the time and power requirements

  • Chromosome definition:

– For a task graph of N nodes, each node represents a task.
– Every node is assigned to H, F or S, representing ASIC, FPGA or software.
– The chromosome is a string of length N consisting of H, F and S that represents a certain allocation of resources.
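A minimal sketch of decoding such a chromosome against a task graph, assuming hypothetical per-task time and power figures for each resource, a hypothetical fixed cost whenever two connected tasks sit on different resources, and tasks that run one after another. It only illustrates the encoding; it is not the fitness function from the slides.

```python
# Hypothetical per-task costs: resource -> (time in ms, power in mW).
COSTS = [
    {"H": (1, 50), "F": (2, 40), "S": (8, 10)},   # task 0
    {"H": (2, 60), "F": (3, 45), "S": (9, 12)},   # task 1
    {"H": (1, 55), "F": (2, 35), "S": (5, 8)},    # task 2
    {"H": (3, 70), "F": (4, 50), "S": (12, 15)},  # task 3
]
EDGES = [(0, 1), (1, 2), (1, 3)]   # hypothetical task-graph edges
SWITCH_TIME, SWITCH_POWER = 1, 5   # hypothetical cost of crossing resources


def evaluate(chromosome):
    """Decode a chromosome (task i runs on resource chromosome[i]) into
    (total time, total power), assuming the tasks run one after another."""
    if len(chromosome) != len(COSTS) or not set(chromosome) <= {"H", "F", "S"}:
        raise ValueError("expected one of H/F/S per task")
    total_time = sum(COSTS[i][r][0] for i, r in enumerate(chromosome))
    total_power = sum(COSTS[i][r][1] for i, r in enumerate(chromosome))
    switches = sum(chromosome[u] != chromosome[v] for u, v in EDGES)
    return total_time + switches * SWITCH_TIME, total_power + switches * SWITCH_POWER


for chromosome in ("HHHH", "HHFS", "SSSS"):
    print(chromosome, evaluate(chromosome))
```

With these made-up numbers, all-ASIC is fastest but most power hungry (7 ms, 235 mW), all-software is slowest but most frugal (34 ms, 45 mW), and the mixed HHFS sits in between (19 ms, 170 mW), mirroring the trade-off the GA has to balance.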

SLIDE 33

Applying the algorithm to the partitioning

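  • Fitness definition:

– Time: the time fitness F_time is computed from the task execution time normalised by Tmax and the communication delay normalised by Cmax, where Tmax is the slowest possible configuration experienced and Cmax is the maximum communication delay
– Power: the power fitness F_power is computed in the same way from the power consumption and the switching power S, each normalised by its maximum, where S is the switching power of interprocessor communication (switching to another resource)
– Overall: $F_{\text{overall}} = F_{\text{time}} \cdot F_{\text{power}}$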

SLIDE 34

Penalty

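  • Coefficients that force the GA to find solutions that satisfy the timing and power requirements.

  • For time: $P_T = \begin{cases} 0, & T < D \\ \alpha (T - D), & T \ge D \end{cases}$, where D is the deadline and α is a weighting coefficient

  • For power: $P_P = \begin{cases} 0, & P < P_{\max} \\ \beta (P - P_{\max}), & P \ge P_{\max} \end{cases}$, where β is a weighting coefficient

  • For utilization: $P_U = 0$ if no resources are left unused; otherwise $P_U$ equals the amount of unused resources

  • Final fitness: $F_{\text{final}}$ is $F_{\text{total}}$ multiplied by the time and power penalty factors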
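A minimal sketch of the time and power penalty terms as piecewise functions. The default weights α = β = 1.0 are hypothetical; the slides do not give values, nor exactly how the penalties are folded into the final fitness factors.

```python
def time_penalty(t, deadline, alpha=1.0):
    """P_T = 0 if T < D, otherwise alpha * (T - D)."""
    return 0.0 if t < deadline else alpha * (t - deadline)


def power_penalty(p, p_max, beta=1.0):
    """P_P = 0 if P < P_max, otherwise beta * (P - P_max)."""
    return 0.0 if p < p_max else beta * (p - p_max)


print(time_penalty(8, 10), time_penalty(12, 10, alpha=2.0))
print(power_penalty(90, 100), power_penalty(150, 100, beta=0.5))
```

Both penalties are zero while the requirement is met and grow linearly with the size of the violation, so a solution that misses the deadline by more is penalised more.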

SLIDE 35

The result

  • With timing + power + utilization:

– Capable of generating a resource allocation optimizing for the three goals simultaneously.

  • Potential problems:

– It is hard to assign estimated execution times for this GA
– Classifying resources into ASIC, software and FPGA can be too generic for actual applications.