
CALTECH CS137 Winter2006 -- DeHon


CS137: Electronic Design Automation

Day 13: February 8, 2006

NC


Things we’ve seen

  • Add two N-bit numbers in O(log(N)) time on O(N) processors (gates)
  • Sort N elements in O(log(N)) time on O(N) processors
  • Evaluate an FSM on N inputs in O(log(N)) time on O(N) processors
  • Find the I’th element in a collection of N items in O(log^2(N)) time on O(N) processors
  • Compute issuable instructions in O(log(N)) time with O(N) hardware


Complexity Class

  • What are the complexity classes for parallelism?
  • Suggested earlier: not all tasks have perfect area-time tradeoffs
  • How well can we parallelize problems?
    – Differentiate things that parallelize well
    – …from things that don’t parallelize so well


If we use enough space…

  • Exponential space: P=NP
    – NTM runs in time f(N)
    – Use 2^f(N) PEs
    – Each evaluates with a different choice sequence
    – Prefix on completion
    – Solve problem in f(N) time
  • Of course, ignores 3-space wire delays
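The exponential-hardware idea can be illustrated with a minimal sketch. It assumes subset-sum as a stand-in NP problem (not from the slides) and simulates the PEs sequentially; the names `accepts_with_choices` and `exponential_pe_search` are hypothetical. Each simulated PE follows one fixed choice sequence, and a final OR-reduce checks whether any of them accepted.

```python
from itertools import product


def accepts_with_choices(weights, target, choices):
    """One hypothetical PE: follow a fixed choice sequence (take item i or not)."""
    return sum(w for w, c in zip(weights, choices) if c) == target


def exponential_pe_search(weights, target):
    """Simulate 2^N PEs, one per choice sequence, then OR-reduce their results."""
    results = [
        accepts_with_choices(weights, target, choices)
        for choices in product((0, 1), repeat=len(weights))
    ]
    # Return the number of PEs used and whether any choice sequence accepted.
    return len(results), any(results)


pes, found = exponential_pe_search([3, 5, 7, 11], 15)
```

Every PE does only polynomial work, but the number of PEs doubles with each extra item, which is why this is not a reasonable amount of hardware.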


  • So, we really want to know how fast something can be run with a “reasonable” number of processors (amount of hardware)


NC

  • Class of problems that can be:
    – Computed in polylogarithmic time
        • Polynomial in log(N): O(log^k(N)) for some constant k
        • E.g. 3log^2(N)+2log(N)+234
    – Using polynomial hardware
  • NC for Nick’s Class
    – Named after Nick Pippenger


All in NC

  • Can do
    – Add two N-bit numbers in O(log(N)) time on O(N) processors (gates)
    – Sort N elements in O(log(N)) time on O(N) processors
    – Evaluate an FSM on N inputs in O(log(N)) time on O(N) processors
    – Find the I’th element in a collection of N items in O(log^2(N)) time on O(N) processors
    – Compute issuable instructions in O(log(N)) time with O(N) hardware


Open Question

  • NC ?= P
  • Are all polynomial-time problems computable in parallel?
    – Polylog time
    – Polynomial processors
  • Suspected they are not
    – More at end


Transitive Closure

  • Given a Graph: G=(V,E)
  • Compute G*=(V,E*)
  • E* contains an edge e=(Vi,Vj)
    – Iff there is a path from Vi to Vj in G

  • Transitive Closure ∈ NC


Basic Sequential Algorithm

  • N=|V|
  • Think of M=N×N connectivity matrix for G
  • M^2=G^2
  • M^2[i,j]=OR(all k)(M[i,k] & M[k,j])
  • M^(2n)[i,j]=OR(all k)(M^n[i,k] & M^n[k,j])
  • M^N represents G^N=G*
  • Compute in log steps
    – O(N^3 log(N))
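The repeated-squaring steps above can be sketched as follows. This is a minimal sequential version, not the lecture's own code. It assumes self-loops are added to M first, so that the n-th power covers every path of length at most n; as a result the closure it returns is reflexive. The names `bool_square` and `transitive_closure` are hypothetical.

```python
def bool_square(m):
    """One squaring: result[i][j] = OR over all k of (m[i][k] AND m[k][j])."""
    n = len(m)
    return [
        [any(m[i][k] and m[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(adj):
    """Return (closure matrix, number of squarings) for a 0/1 adjacency matrix."""
    n = len(adj)
    # Assumption of this sketch: add self-loops so powers accumulate shorter paths.
    m = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    steps = 0
    length = 1  # m currently covers paths of at most this many edges
    while length < n:
        m = bool_square(m)  # O(N^3) work per squaring
        length *= 2
        steps += 1
    return m, steps


# Chain 0 -> 1 -> 2 -> 3
chain = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
closure, squarings = transitive_closure(chain)
```

Each squaring costs O(N^3) operations and about log2(N) squarings are needed, matching the O(N^3 log(N)) bound.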


Parallel Algorithm

  • Use N^3 processors
  • N processors per element M^n[i,j]
  • N^2 groups of N processors to compute all elements of M^n
  • Group of N processors for M^n[i,j] performs an associative reduce in O(log(N)) time
  • Still takes log(N) steps to compute M^N
  • O(log^2(N)) time with N^3 processors, so in NC

– [this construct may be weak?]
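To make the depth count concrete, here is a sketch that simulates the parallel algorithm sequentially and counts parallel time steps. It assumes one step for the N simultaneous ANDs per element plus one step per level of a pairwise OR tree, and it adds self-loops to M so that repeated squaring covers all path lengths. The names `tree_or_reduce` and `parallel_closure_depth` are hypothetical, and the graph is assumed non-empty.

```python
def tree_or_reduce(bits):
    """Pairwise OR-reduce; return (result, number of parallel tree levels)."""
    level = list(bits)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] or level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
        depth += 1
    return level[0], depth


def parallel_closure_depth(adj):
    """Return (closure matrix, total parallel depth) using N processors per element."""
    n = len(adj)
    m = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    total_depth = 0
    length = 1
    while length < n:
        step_depth = 0
        nxt = []
        for i in range(n):
            row = []
            for j in range(n):
                # The N processors of group (i, j) each compute one AND at once.
                products = [m[i][k] and m[k][j] for k in range(n)]
                value, depth = tree_or_reduce(products)
                step_depth = max(step_depth, 1 + depth)
                row.append(value)
            nxt.append(row)
        m = nxt
        total_depth += step_depth  # all N^2 groups run in the same time slot
        length *= 2
    return m, total_depth
```

For a 4-node graph this gives 2 squarings of depth 3 each (one AND level plus a 2-level OR tree), in line with log(N) steps of O(log(N)) depth.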


All Pairs Shortest Paths

  • Given a Graph: G=(V,E)
    – Edge weight on each edge e∈E
  • Compute G’=(V,E’)
  • E’ contains an edge e’=(Vi,Vj)
    – Iff there is a path from Vi to Vj in G
    – Edge weight is shortest path from Vi to Vj in G
  • All Pairs Shortest Path ∈ NC
    – Slight modification on transitive closure


Basic Sequential Algorithm

  • As before
    – N=|V|
    – Think of M=N×N connectivity matrix for G
    – M^2=G^2
  • Change
    – OR to MIN
    – & to +
  • So
    – M^2[i,j]=OR(all k)(M[i,k] & M[k,j])
    – Becomes: M^2[i,j]=MIN(all k)(M[i,k] + M[k,j])
  • M^N represents G^N=G’
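The OR-to-MIN and &-to-+ substitution can be sketched as a min-plus matrix squaring. This is a minimal sequential version under stated assumptions: missing edges are represented by `math.inf`, the diagonal is set to 0 (staying put costs nothing), and weights are nonnegative. The names `min_plus_square` and `all_pairs_shortest_paths` are hypothetical.

```python
import math


def min_plus_square(m):
    """result[i][j] = MIN over all k of (m[i][k] + m[k][j])."""
    n = len(m)
    return [
        [min(m[i][k] + m[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def all_pairs_shortest_paths(weights):
    """weights[i][j] is the edge weight, or math.inf when there is no edge."""
    n = len(weights)
    # Assumption of this sketch: zero-cost self-loops keep shorter paths around.
    m = [[0 if i == j else weights[i][j] for j in range(n)] for i in range(n)]
    length = 1  # m currently holds shortest paths of at most this many edges
    while length < n:
        m = min_plus_square(m)
        length *= 2
    return m


INF = math.inf
graph = [
    [INF, 4, 7, INF],
    [INF, INF, 1, INF],
    [INF, INF, INF, 2],
    [INF, INF, INF, INF],
]
dist = all_pairs_shortest_paths(graph)
```

An entry that is still `math.inf` after the last squaring means there is no path, which corresponds to the edge being absent from E’.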


(Same) Parallel Algorithm

  • Use N^3 processors
  • N processors per element M^n[i,j]
  • N^2 groups of N processors to compute all elements of M^n
  • Group of N processors for M^n[i,j] performs an associative reduce in O(log(N)) time
  • Still takes log(N) steps to compute M^N
  • O(log^2(N)) time with N^3 processors, so in NC

– [this construct may be weak?]


NL

  • Complexity class
    – Problems that can be decided using logarithmic space on a Non-Deterministic Turing Machine
  • Similarly L
    – logspace on Deterministic TM
    – Addition ∈ L
  • Certainly: L⊆NL


NL ⊆ NC

  • Theorem from Borodin:
    – If A is accepted by a NDTM using space S(n)≥log_2(n),
    – then there is a d>0 such that: DEPTH_A(n)≤d×S(n)^2.
    – [Depth here = circuit depth = time]
  • For NL
    – S(n)=log_2(n), so Depth(n)≤d×(log_2(n))^2


Borodin Construction (Idea)

  • State is bounded
  • Can construct the graph of all states
    – This will only take polynomial hardware
  • Compute transitive closure on graph
    – O(log^2(N))
  • Use associative reduce to extract solution
    – O(log(N))


Borodin States

  • What states can the NDTM be in?
    – At most s^S(N) values on tape
        • s=size of symbol set
    – Head of TM at most S(N) positions
    – q states for FSM
    – N locations for input tape head
  • Total: states=N×q×S(N)×s^S(N)
    – For S(N)=log(N)
        • N×q×log(N)×s^log(N)=q×N^(log(s)+1)×log(N)
    – Number of states polynomial in N
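The state count can be checked numerically with a small sketch. It assumes logs are base 2 and that N is a power of two, so that S(N)=log2(N) is an integer number of work-tape cells; the names `configuration_count` and `polynomial_bound` are hypothetical. The bound uses log(N) ≤ N to absorb the log(N) factor into one extra power of N.

```python
import math


def configuration_count(n, q, s):
    """N x q x S(N) x s^S(N) with S(N) = log2(N) work-tape cells."""
    space = max(1, math.ceil(math.log2(n)))
    return n * q * space * s ** space


def polynomial_bound(n, q, s):
    """q x N^(log2(s) + 2): a polynomial in N that dominates the count."""
    return q * n ** (math.log2(s) + 2)


# Example: 16 input bits, 5 control states, 3 tape symbols.
count = configuration_count(16, 5, 3)
bound = polynomial_bound(16, 5, 3)
```

With these numbers the work tape has 4 cells, so there are 16×5×4×3^4 configurations, well under the bound 5×16^(log2(3)+2).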


Build Graph

  • Construct graph |V|=# states
  • M[i,j]=1 iff move from configuration i to j
  • If Vi is a state that corresponds to the input head being on square k
    – If the move from i to j occurs only when the kth input is 1, M[i,j] is “enabled” iff the kth input is 1.
    – If the move from i to j occurs only when the kth input is 0, M[i,j] is “enabled” iff the kth input is 0.
    – Can just be a set of ANDs setting up the initial connectivity matrix M


Transitive Closure

  • Transitive Closure with O(|V|^3) PEs
    – Still polynomial in N
    – |V|=N×q×log(N)×s^log(N)=q×N^(log(s)+1)×log(N)
    – O(|V|^3) ⊆ O(N^(3(log(s)+2)))
  • In log^2(N) time
    – O(log^2(|V|)) ⊆ O([log(N^(log(s)+2))]^2)
    – O([log(s)+2]^2×log^2(N))=O(log^2(N))


Extract Result

  • OR reduce on Reachable states

– Can reach an accepting state for TM?

  • Therefore: NL ⊆ NC
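The whole construction (build the configuration graph, close it, OR-reduce over accepting configurations) can be sketched on a toy machine. This is a simplified illustration, not Borodin's construction itself: the hypothetical machine has a read-only input head and finite control but no work tape, so a configuration is just (state, head position). The transition table `DELTA`, which guesses where two consecutive 1s occur, and all function names are assumptions of this sketch. The input is assumed non-empty.

```python
def build_matrix(delta, num_states, bits):
    """Connectivity matrix over configurations (state, head position)."""
    n = len(bits)
    size = num_states * n
    # Self-loops (an assumption of this sketch) let squaring cover all path lengths.
    m = [[i == j for j in range(size)] for i in range(size)]
    for (state, bit), moves in delta.items():
        for pos in range(n):
            enabled = bits[pos] == bit  # edge gated by the input under the head
            for nxt, step in moves:
                new_pos = pos + step
                if enabled and 0 <= new_pos < n:
                    m[state * n + pos][nxt * n + new_pos] = True
    return m


def closure(m):
    """Transitive closure by repeated boolean squaring."""
    size = len(m)
    length = 1
    while length < size:
        m = [
            [any(m[i][k] and m[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)
        ]
        length *= 2
    return m


def accepts(delta, num_states, start, accepting, bits):
    """OR-reduce over accepting configurations reachable from (start, position 0)."""
    n = len(bits)
    reach = closure(build_matrix(delta, num_states, bits))
    row = reach[start * n]
    return any(row[state * n + pos] for state in accepting for pos in range(n))


# (state, bit read) -> list of (next state, head movement)
DELTA = {
    (0, 0): [(0, 1)],          # keep scanning right
    (0, 1): [(0, 1), (1, 1)],  # keep scanning, or guess this 1 starts "11"
    (1, 1): [(2, 0)],          # second 1 confirmed: enter accepting state 2
}
result = accepts(DELTA, 3, 0, {2}, [0, 1, 1, 0])
```

The nondeterministic guess shows up only as extra edges in the graph; reachability then explores every guess at once.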


Converse Holds

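  • Borodin
    – If A is in DEPTH(S(n)) for S(n)≥log(n)
    – Then A is in DSPACE(S(n))
    – Recursive evaluation of gate value
        • w/ compact stack representation
  • Specialized for S(n)=log(n)
    – If A has O(log(n))-depth circuits (call this NC^1), then A is in L
    – NC^1 ⊆ L
  • Know L⊆NL … just showed NL⊆NC (depth O(log^2(n)), call this NC^2)
  • So NC^1 ⊆ L ⊆ NL ⊆ NC^2 ⊆ NC
    – This does not give NL = NC: NC also allows depth log^k(n) for k>2, where the converse only gives DSPACE(log^k(n))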
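The recursive evaluation of gate values can be sketched as a depth-first walk over the circuit. This is only an illustration of why space tracks depth rather than gate count: the recursion stack never holds more than one root-to-input path. It does not implement the compact stack representation itself, and the circuit encoding and the name `evaluate` are hypothetical.

```python
def evaluate(circuit, gate, depth=1):
    """Return (value, deepest recursion level reached) for `gate`."""
    kind, *args = circuit[gate]
    if kind == "IN":
        return bool(args[0]), depth
    # Visit the inputs one at a time; only the current path is live on the stack.
    results = [evaluate(circuit, a, depth + 1) for a in args]
    values = [v for v, _ in results]
    deepest = max(d for _, d in results)
    if kind == "AND":
        return all(values), deepest
    if kind == "OR":
        return any(values), deepest
    if kind == "NOT":
        return not values[0], deepest
    raise ValueError(f"unknown gate kind: {kind}")


circuit = {
    "a": ("IN", True),
    "b": ("IN", False),
    "c": ("IN", True),
    "g1": ("AND", "a", "b"),
    "g2": ("OR", "g1", "c"),
    "out": ("NOT", "g2"),
}
value, stack_depth = evaluate(circuit, "out")
```

With bounded fan-in, each stack frame needs only a constant amount of information (which input is being explored and the partial result), so a depth-S(n) circuit needs O(S(n)) frames.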


Context Free Languages

  • Can recognize all context free languages in NC

  • PDA ⊆ NC


P-Complete

  • There are languages that are P-Complete
    – i.e. if we could show any of these were in NC
    – Then we would show NC=P

  • E.g. TM simulation

Complexity Roundup

  • In NC
    – FA
    – PDA
    – L
    – NL
  • Unknown:
    – P=NC (P=NL)


Physical Realism

  • All rely on reductions in log(N) time
  • With 3D space, speed of light
    – …there are no log(N) time reductions
  • Maybe notion of 3-space parallelizable?
    – Run in O(N^(1/3)) time
    – O(N) processors
  • Cannot talk to more than O(N) processors in O(N^(1/3)) time
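The N^(1/3) scaling can be seen with a rough sketch. It assumes processors sit on a unit-spaced cubic grid and a signal covers one grid hop per time unit; both are modeling assumptions, and the names `cube_side` and `grid_diameter` are hypothetical. The corner-to-corner hop count stands in for the time a reduction over all N values must take.

```python
import math


def cube_side(n):
    """Smallest integer side length whose cube holds at least n processors."""
    side = round(n ** (1 / 3))
    while side ** 3 < n:
        side += 1
    while side > 1 and (side - 1) ** 3 >= n:
        side -= 1
    return side


def grid_diameter(n):
    """Corner-to-corner distance in hops on the cubic grid holding n processors."""
    return 3 * (cube_side(n) - 1)


for n in (1_000, 1_000_000):
    print(n, grid_diameter(n), round(math.log2(n), 1))
```

The hop count grows like N^(1/3) while log2(N) grows far more slowly, so for large N the wire delay, not the tree depth, dominates a reduction.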


Admin

  • Friday/Monday:??
  • Q: requests – what’s missing?
  • Project: two things due end of next week
    – Sequential implementation
    – Proposed plan of attack