Speculative Plan Execution for Information Agents Greg Barish - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

Speculative Plan Execution for Information Agents

Greg Barish University of Southern California

Information Sciences Institute Advisor: Professor Craig A. Knoblock

slide-2
SLIDE 2

2

Outline

  • 1. Review and motivating example
  • 2. Speculative plan execution
  • 3. Value prediction for speculative execution
  • 4. Related work
  • 5. Summary
slide-3
SLIDE 3

3

Streaming dataflow model

  • Dataflow

– Operations scheduled by data availability

  • Independent operations execute in parallel
  • Maximizes horizontal parallelism

– Dataflow computers [Dennis 1974] [Arvind 1978]
– Example: computing (a*b) + (c*d)

[Figure: dataflow graph in which two MUL nodes, with inputs (a, b) and (c, d), feed one ADD node]

  • Streaming

– Operations emit data as soon as possible

  • Independent data processed in parallel
  • Maximizes vertical parallelism

– Network query engines [Ives et al. 1999] [Naughton et al. 2000] [Hellerstein et al. 2001]

[Figure: Producer and Consumer operators]
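The dataflow example above, (a*b) + (c*d), can be sketched in a few lines of Python. This is only an illustration of scheduling by data availability: the function name `dataflow_eval` and the use of a two-thread pool are assumptions made for the sketch, not part of the system described in the talk.

```python
from concurrent.futures import ThreadPoolExecutor
import operator


def dataflow_eval(a, b, c, d):
    """Evaluate (a*b) + (c*d); the two MULs do not depend on each other."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(operator.mul, a, b)   # MUL can fire once a and b exist
        right = pool.submit(operator.mul, c, d)  # MUL can fire once c and d exist
        # ADD fires only after both of its inputs have arrived.
        return operator.add(left.result(), right.result())


print(dataflow_eval(2, 3, 4, 5))  # (2*3) + (4*5)
```

The two multiplications are the "horizontal" parallelism the slide refers to; the addition is the only operation that has to wait.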

slide-4
SLIDE 4

4

The CarInfo agent

  • 1. Locate cars that meet criteria
  • Edmunds.com
  • 2. Filter out Oldsmobiles

slide-5
SLIDE 5

5

The CarInfo agent

  • 1. Locate cars that meet criteria
  • Edmunds.com
  • 2. Filter out Oldsmobiles
  • 3. Gather safety reviews for each
  • NHTSA.gov
slide-6
SLIDE 6

6

The CarInfo agent

  • 1. Locate cars that meet criteria
  • Edmunds.com
  • 2. Filter out Oldsmobiles
  • 3. Gather safety reviews for each
  • NHTSA.gov
  • 4. Gather detailed reviews of each
  • ConsumerGuide.com
slide-7
SLIDE 7

7

ConsumerGuide navigation

  • ConsumerGuide requires navigation from original search results to desired answer
slide-8
SLIDE 8

8

CarInfo Agent Plan

1. Get list of cars from Edmunds.com that meet specified criteria.
2. Remove any Oldsmobiles from that list.
3. Get the search results for each of those cars from NHTSA.gov, extracting the safety ratings.
4. Get the search results for each car at CG.com, extracting the link to the summary page.
5. Get the summary page for each car, extracting the link to the full review.
6. Get the full review page for each car, extracting the review itself.

slide-9
SLIDE 9

9

Agent Execution Performance

  • Standard von Neumann model

– Execute one operation at a time
– Each operation processes all of its input before output is used for next operation
– Assume: 1000ms per I/O op, 100ms per CPU op

  • Execution time = 13.4 sec

[Figure: execution timeline, 0 to 15 seconds, showing Select, Edmunds, NHTSA, CG Search, CG Summary, and CG Full; legend: CPU-bound operation, I/O-bound operation]

slide-10
SLIDE 10

10

Dataflow-style CarInfo agent plan

[Figure: dataflow graph of the CarInfo plan, with the data labels below]

search criteria: (Midsize coupe/hatchback, $4000 to $12000, 2002)

WRAPPER Edmunds Search: ((Oldsmobile Alero), (Dodge Stratus), (Pontiac Grand Am), (Mercury Cougar))

SELECT maker != "Oldsmobile": ((Dodge Stratus), (Pontiac Grand Am), (Mercury Cougar))

WRAPPER NHTSA Search: (safety reports)

WRAPPER ConsumerGuide Search: ((http://cg.com/summ/20812.htm), other summary review URLs)

WRAPPER ConsumerGuide Summary: ((http://cg.com/full/20812.htm), other full review URLs)

WRAPPER ConsumerGuide Full Review: (car reviews)

JOIN

slide-11
SLIDE 11

11

Expressing the CarInfo agent plan

PLAN car-info
{
  INPUT: criteria
  OUTPUT: reviews-and-ratings
  BODY
  {
    Wrapper ("Edmunds", criteria : cars)
    Select (cars, "maker != 'Oldsmobile'" : filtered-cars)
    Wrapper ("NHTSA", filtered-cars : safety-ratings)
    Wrapper ("CG Search", filtered-cars : summary-urls)
    Wrapper ("CG Summary", summary-urls : full-urls)
    Wrapper ("CG Full", full-urls : car-reviews)
    Join (safety-ratings, car-reviews, "l.make=r.make and l.model=r.model" : reviews-and-ratings)
  }
}
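To make the data dependencies in the car-info plan concrete, here is a rough Python rendering of the same steps. It is a sketch only: the wrapper functions are hypothetical stubs that return canned values instead of querying Edmunds, NHTSA, or ConsumerGuide, and it runs the operators one after another rather than as a dataflow.

```python
# Hypothetical stand-ins for the remote sources; a real wrapper would fetch pages.
def edmunds(criteria):
    return [("Oldsmobile", "Alero"), ("Dodge", "Stratus"),
            ("Pontiac", "Grand Am"), ("Mercury", "Cougar")]

def nhtsa(car):
    return car, "safety rating for %s %s" % car

def cg_search(car):
    return car, "summary-url:" + car[1]

def cg_summary(item):
    car, summary_url = item
    return car, summary_url.replace("summary-url", "full-url")

def cg_full(item):
    car, full_url = item
    return car, "review fetched from " + full_url

def car_info(criteria):
    cars = edmunds(criteria)                                  # Wrapper ("Edmunds")
    filtered = [c for c in cars if c[0] != "Oldsmobile"]      # Select
    safety = dict(nhtsa(c) for c in filtered)                 # Wrapper ("NHTSA")
    reviews = dict(cg_full(cg_summary(cg_search(c)))          # three CG wrappers
                   for c in filtered)
    # Join on (make, model), which is the dictionary key here.
    return [(car, safety[car], reviews[car]) for car in filtered if car in reviews]

for row in car_info("Midsize coupe/hatchback, $4000 to $12000, 2002"):
    print(row)
```

The chain `cg_search`, `cg_summary`, `cg_full` is where the binding constraints show up: each call needs the output of the previous one.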

slide-12
SLIDE 12

12

Streaming dataflow executor

  • Thread pool architecture

– Enables bounded, dynamic parallelism

[Figure: Plan Input, a Thread Pool (labeled 3, 2, 1) running plan operators (e.g., Wrapper, Select, etc.), and Plan Output]

Example: the input (Midsize cpe/hatchbk, $4000 to $12000, 2002) goes to WRAPPER Edmunds Search, which emits ((Oldsmobile Alero), (Dodge Stratus), (Pontiac Grand Am), (Mercury Cougar)) to SELECT maker != "Oldsmobile"
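A bounded thread pool that processes tuples as they stream out of an upstream operator might look like the following sketch. The names `edmunds_stream`, `fetch_review`, and `run` are hypothetical, the sleep stands in for remote latency, and the pool size of 3 is an arbitrary choice; none of this is taken from the actual executor.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def edmunds_stream(criteria):
    """Hypothetical wrapper that emits cars one at a time, as they are extracted."""
    for car in [("Oldsmobile", "Alero"), ("Dodge", "Stratus"),
                ("Pontiac", "Grand Am"), ("Mercury", "Cougar")]:
        yield car


def fetch_review(car):
    time.sleep(0.01)  # stands in for the latency of a remote source
    return car, "review of %s %s" % car


def run(criteria, max_threads=3):
    # The pool bounds parallelism: at most max_threads fetches run at once.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        pending = [pool.submit(fetch_review, car)      # downstream work starts per tuple
                   for car in edmunds_stream(criteria)
                   if car[0] != "Oldsmobile"]          # Select applied to each tuple
        return [future.result() for future in pending]


print(run("Midsize cpe/hatchbk, $4000 to $12000, 2002"))
```

Each surviving tuple is handed to the pool as soon as it is produced, so downstream fetches overlap instead of waiting for the whole car list.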

slide-13
SLIDE 13

13

Streaming dataflow performance

  • Improved, but plan remains I/O-bound (76%)
  • Main problem: remote source latencies

– Meanwhile, local resources are wasted

  • Complicating factor: binding constraints

– Remote queries dependent on other remote queries

  • Question: How can execution be more efficient?

[Figure: execution timeline, 0 to 15 seconds, showing Select, Edmunds, CG Search, CG Summary, CG Full, and Join]

slide-14
SLIDE 14

14

Speculative plan execution

  • Execute operators ahead of schedule

– Predict data based on past execution

  • Allows greater degree of parallelism

– Solves the problem caused by binding constraints

  • Can lead to speedups > streaming dataflow

[Figure: execution timeline, 0 to 15 seconds, showing Select, Edmunds, CG Search, CG Summary, CG Full, and Join; labeled GOAL]

slide-15
SLIDE 15

15

Focus of this talk

  • An approach to speculative plan execution

– Safe & fair
– Yields arbitrary speedups
– Algorithm for the automatic transformation of agent plans

  • An approach to value prediction

– Combines caching, classification, and transduction
– Better accuracy and space efficiency than strictly caching

slide-16
SLIDE 16

16

Outline

  • 1. Review and motivating example
  • 2. Speculative plan execution
  • 3. Value prediction for speculative execution
  • 4. Related work
  • 5. Summary
slide-17
SLIDE 17

17

How to speculate?

  • General problem

– Means for issuing and confirming predictions

  • Two new operators

– Speculate: Makes predictions based on "hints"
– Confirm: Prevents errant results from exiting plan

[Figure: Speculate operator, with inputs hints and answers, and outputs predictions/additions and confirmations]

[Figure: Confirm operator, with inputs confirmations and probable results, and output actual results]
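The roles of the two operators can be sketched as follows. This is a minimal illustration under assumptions: predictions come from a plain dictionary of past answers, and the class `Speculate`, its methods `predict` and `reconcile`, and the function `confirm` are names invented for this sketch rather than the talk's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Speculate:
    """Issues predictions from hints, then reconciles them with the real answers."""
    history: dict = field(default_factory=dict)  # hint -> answers seen last time

    def predict(self, hint):
        # Predictions are issued immediately, before the real answers arrive.
        return list(self.history.get(hint, []))

    def reconcile(self, hint, predictions, answers):
        # Correct predictions are confirmed; answers that were missed become additions.
        confirmations = [p for p in predictions if p in answers]
        additions = [a for a in answers if a not in predictions]
        self.history[hint] = list(answers)  # remember for the next execution
        return confirmations, additions


def confirm(probable_results, confirmations, key=lambda result: result[0]):
    """Let a probable result exit the plan only if its prediction was confirmed."""
    confirmed = set(confirmations)
    return [r for r in probable_results if key(r) in confirmed]


spec = Speculate({"2002 midsize": ["Dodge Stratus", "Dodge Neon"]})
predictions = spec.predict("2002 midsize")
confirmations, additions = spec.reconcile(
    "2002 midsize", predictions, ["Dodge Stratus", "Mercury Cougar"])
probable = [("Dodge Stratus", "review"), ("Dodge Neon", "review")]
print(confirm(probable, confirmations), additions)
```

In the usage lines, the wrong prediction (Dodge Neon) produces a probable result that never leaves the plan, while the missed answer (Mercury Cougar) is returned as an addition to be processed normally.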

slide-18
SLIDE 18

18

How to speculate?

  • Example: CarInfo

– Make predictions about cars based on search criteria
– Makes practical sense:

  • Same criteria will typically yield same cars

[Figure, BEFORE: the original CarInfo plan with five Wrapper (W) operators, one Select (S), and one Join (J)]
slide-19
SLIDE 19

19

How to speculate?

  • Example: CarInfo

– Make predictions about cars based on search criteria
– Makes practical sense:

  • Same criteria will typically yield same cars

[Figure, AFTER: the CarInfo plan with a Speculate operator (labels: hints, answers, predictions/additions, confirmations) and a Confirm operator added]

slide-20
SLIDE 20

20

Detailed example

[Figure: modified CarInfo plan with Speculate and Confirm operators]

Input: 2002 Midsize coupe $4000-$12000

Time = 0.0 sec

slide-21
SLIDE 21

21

Issuing predictions

[Figure: modified CarInfo plan with Speculate and Confirm operators]

Predictions: Oldsmobile Alero (T1), Dodge Stratus (T2), Pontiac Grand Am (T3), Mercury Cougar (T4)

Time = 0.1 sec

slide-22
SLIDE 22

22

Speculative parallelism

[Figure: modified CarInfo plan with Speculate and Confirm operators]

Remaining tuples: Dodge Stratus (T2), Pontiac Grand Am (T3), Mercury Cougar (T4)

Time = 0.2 sec

slide-23
SLIDE 23

23

Answers to hints

[Figure: modified CarInfo plan with Speculate and Confirm operators]

Answers: Oldsmobile Alero, Dodge Stratus, Pontiac Grand Am, Mercury Cougar

Time = 1.0 sec

slide-24
SLIDE 24

24

Continued processing

[Figure: modified CarInfo plan with Speculate and Confirm operators]

T1, T2, T3, T4

Additions (corrections), if any

Time = 1.1 sec

slide-25
SLIDE 25

25

Generation of final results

[Figure: modified CarInfo plan with Speculate and Confirm operators]

Results: Dodge Stratus (safety) (review) (T2), Pontiac Grand Am (safety) (review) (T3), Mercury Cougar (safety) (review) (T4)

Time = 4.2 sec

slide-26
SLIDE 26

26

Confirmation of results

[Figure: modified CarInfo plan with Speculate and Confirm operators]

Results: Dodge Stratus (safety) (review), Pontiac Grand Am (safety) (review), Mercury Cougar (safety) (review)

Time = 4.3 sec

slide-27
SLIDE 27

27

In practice: how it works

  • Speculate generates speculative tuples
  • These tuples are run by a separate pool of “speculative threads”

– These threads only execute operator methods on speculative tuples

  • Thus, the Speculate operator elicits more agent run-time parallelism

– Greater thread-level parallelism (TLP)
– Beyond the dataflow limit

slide-28
SLIDE 28

28

Safety and fairness

  • Safety

– Confirm operator

  • Fairness

– CPU

  • Speculative operations executed by "speculative threads"

– Lower priority threads

– Memory and bandwidth

  • Speculative operations allocate "speculative resources"

– Drawn from "speculative pool" of memory
– Other solutions exist, such as RSVP (Zhang et al. 1994)

slide-29
SLIDE 29

29

Getting better speedups

  • Cascading speculation

– Single speculation allows a max speedup of 2

  • Time spent either speculating or confirming

– Cascading speculation allows arbitrary speedups

  • Up to the length of the longest plan flow

[Figure: a flow of ten Wrapper (W) operators, a through j, and the same flow transformed with nine Speculate (S) operators and one Confirm (C) operator]
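The two speedup bounds on this slide can be checked with a small idealized model. The assumptions are mine, made only to illustrate the claim: every prediction is correct, speculation has no overhead, and each operator in the flow takes the same time.

```python
def sequential_time(costs):
    # Without speculation, a flow's operators run one after another.
    return sum(costs)


def single_speculation_time(costs, cut):
    # One Speculate at position `cut`: the two halves of the flow overlap.
    return max(sum(costs[:cut]), sum(costs[cut:]))


def cascading_time(costs):
    # A Speculate in front of every operator but the first: all operators overlap.
    return max(costs)


flow = [1.0] * 10  # ten operators, one second each (hypothetical)
best_single = min(single_speculation_time(flow, k) for k in range(1, len(flow)))

print(sequential_time(flow) / best_single)           # best case for one speculation
print(sequential_time(flow) / cascading_time(flow))  # best case for cascading
```

Under these assumptions one speculation can at best halve the flow, while cascading reduces the ten-operator flow to the cost of a single operator, a speedup equal to the flow length.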

slide-30
SLIDE 30

30

Automatic plan transformation

  • One important step is determining the set of candidate transformations
  • However:

– Determining this set is an expensive proposition
– Assuming:

  • A candidate transformation can include one or more speculations
  • A given speculation is consumed by one and only one operator

– The # of possible transformations:

ST(n) = (n-1) + n*ST(n-1), ST(1) = 0

– A single flow of 10 consecutive operators has over 3 million possible speculative schedules!
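The recurrence is easy to evaluate directly, which confirms the "over 3 million" figure. The function name `st` is just a label for this sketch; the closed form n! - 1 follows from the recurrence, since ST(n) + 1 = n * (ST(n-1) + 1).

```python
from math import factorial


def st(n):
    """Number of speculative schedules: ST(n) = (n-1) + n*ST(n-1), ST(1) = 0."""
    return 0 if n == 1 else (n - 1) + n * st(n - 1)


print(st(10))                        # 3628799
print(st(10) == factorial(10) - 1)   # True
```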

slide-31
SLIDE 31

31

Automatic plan transformation

  • An alternative: leverage Amdahl's Law

– Focus on most expensive path (MEP)

  • Basic algorithm (anytime property)
  • 1. Find MEP
  • 2. Find best candidate speculative plan transformation
  • 3. IF no candidate found, THEN exit
  • 4. Transform plan accordingly
  • 5. REPEAT

  • The "best" candidate

– The one with the highest potential speedup

  • Algorithm assumes some additional speculative overhead

– Function of the amount of data speculated about
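The loop above can be sketched on a deliberately simplified plan model. Everything about the model is an assumption for illustration: a plan is a set of independent paths, each path is a list of segments that run in parallel because a Speculate separates them, a candidate transformation is a single split of one segment, and `OVERHEAD` is a made-up fixed cost per Speculate (the slide says overhead depends on the amount of data speculated about).

```python
OVERHEAD = 0.05  # hypothetical cost in seconds charged per Speculate operator


def path_time(segments):
    # Segments separated by Speculate overlap, so the slowest one dominates.
    return max(sum(seg) for seg in segments) + OVERHEAD * (len(segments) - 1)


def most_expensive_path(plan):
    return max(plan, key=lambda name: path_time(plan[name]))


def best_candidate(segments):
    """Try every single split of every segment; keep the fastest one that helps."""
    best, best_time = None, path_time(segments)
    for i, seg in enumerate(segments):
        for cut in range(1, len(seg)):
            candidate = segments[:i] + [seg[:cut], seg[cut:]] + segments[i + 1:]
            if path_time(candidate) < best_time:
                best, best_time = candidate, path_time(candidate)
    return best


def transform(plan):
    plan = {name: [list(seg) for seg in segs] for name, segs in plan.items()}
    while True:
        mep = most_expensive_path(plan)        # 1. find MEP
        candidate = best_candidate(plan[mep])  # 2. find best candidate
        if candidate is None:                  # 3. no candidate: exit
            return plan
        plan[mep] = candidate                  # 4. transform, 5. repeat


# Operator costs loosely modeled on the 1000ms I/O and 100ms CPU assumption.
plan = {"cg": [[1.0, 0.1, 1.0, 1.0, 1.0, 0.1]], "nhtsa": [[1.0, 0.1, 1.0, 0.1]]}
result = transform(plan)
for name, segments in result.items():
    print(name, segments, path_time(segments))
```

Because each iteration leaves a valid plan, the loop can be stopped at any point, which is one reading of the slide's "anytime property". The greedy choice also shows its limits here: once the "cg" path has two equally long segments, no single further split helps, so the loop exits.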

slide-32
SLIDE 32

32

CarInfo revisited

  • Modified for speculative execution

– Leverage potential of cascading speculation

  • Optimistic performance

– Execution time: max {1.2, 1.4, 1.5, 1.6} = 1.6 sec
– Speedup over streaming dataflow: (4.2/1.6) = 2.63

[Figure: CarInfo plan (five W, one S, one J) modified with three SPEC operators and one CONF operator]
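The optimistic estimate is simple arithmetic over the figures on the slide. The variable names below are illustrative; the numbers are the slide's.

```python
path_times = [1.2, 1.4, 1.5, 1.6]  # seconds, per the slide
streaming_time = 4.2               # streaming dataflow time, per the slide

speculative_time = max(path_times)  # the slowest path determines completion
speedup = streaming_time / speculative_time

print(speculative_time)  # 1.6
print(speedup)           # about 2.625, which the slide reports as 2.63
```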

slide-33
SLIDE 33

33

Example: TheaterLoc

  • INPUT

– City & state

  • OUTPUT

– Map of region annotated w/ theaters & restaurants

slide-34
SLIDE 34

34

Example: TheaterLoc

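  • Original plan:

[Figure: plan from input city to output map, with operators WRAPPER Yahoo Movies, WRAPPER Dine.com, UNION, WRAPPER Geocoder, and WRAPPER U.S. Census Tiger Map]

  • Modified for speculative execution:

[Figure: the same plan from city to map (four W, one U) with two SPEC operators and one CONFIRM operator added]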

slide-35
SLIDE 35

35

Example: The RepInfo Agent

  • INPUT

– Any street address (e.g., 4676 Admiralty Way, Marina del Rey, CA, 90292)

  • OUTPUT

– Federal reps

  • 2 senators,
  • 1 house member

– For each rep:

  • Recent news
  • Real-time funding information

slide-36
SLIDE 36

36

RepInfo agent plan

[Figure: RepInfo plan, with the operators and data labels below]

Operators: Wrapper Vote-Smart; Select (senators, house reps); Wrapper Yahoo News; Wrapper OpenSecrets (names page); Wrapper OpenSecrets (member page); Wrapper OpenSecrets (funding page); Join (name)

Data labels: address, all officials, senators & house reps, recent news, member URL, funding URL, graph URL, combined results

Example address: 4676 Admiralty Way, Marina del Rey, CA

All officials: George Bush, Dick Cheney, Barbara Boxer, Dianne Feinstein, Jane Harman, James Hahn

Senators & house reps: Barbara Boxer, Dianne Feinstein, Jane Harman

Recent news:

  • Boxer: Anthrax investigation continues…
  • Boxer: Bay area politicians meet…
  • Feinstein: Bay area politicians meet…
  • Harman: Life in LA is just too sunny…

slide-37
SLIDE 37

37

Example: RepInfo

  • Original

[Figure: RepInfo plan with operators WRAPPER Congress.org Search, SELECT title = 'Rep' or 'Sen', WRAPPER Congress.org Info, WRAPPER Open Secrets Search, WRAPPER Open Secrets Info, WRAPPER Open Secrets Funding, WRAPPER Yahoo News, and JOIN; labels: nine-digit zip code, Rep info]

  • Modified for speculative execution

[Figure: the same plan (six W, one S, one J) with four SPEC operators and one CONFIRM operator added]

slide-38
SLIDE 38

38

Example: StockInfo

  • INPUT

– Company name

  • OUTPUT

– Chart comparing company stock vs competitor stock

slide-39
SLIDE 39

39

Example: StockInfo

  • Original plan

[Figure: flow of seven operators starting from the input company name: WRAPPER Symbol Lookup, WRAPPER Stock Info, WRAPPER Profile, WRAPPER Industry Info, WRAPPER Sorted Industry, WRAPPER Competitor Chart, WRAPPER Compare Chart]

  • Modified for speculative execution

[Figure: the same seven W operators with seven SPEC operators and one CONFIRM operator added]

slide-40
SLIDE 40

40

Web agent experiments

[Figure: two bar charts, "Time to first tuple (ms)" and "Time to last tuple (ms)", each on a 1000 to 8000 ms axis, per plan (CarInfo, RepInfo, TheaterLoc, FlightStatus, StockInfo); series: No speculation, 100% correct, 50% correct, 0% correct]

slide-41
SLIDE 41

41

Web agent experiments

[Figure: two bar charts of speedup (0.00 to 4.50), one for time to first tuple and one for time to last tuple, per plan (CarInfo, RepInfo, TheaterLoc, FlightStatus, StockInfo); series: 100% correct, 50% correct]
slide-42
SLIDE 42

42

Distributed database experiments

  • Basic idea

– Measure the utility of speculative execution for distributed database queries
– Recall: basic query processing

  • Most commercial relational databases:

– parse SQL query, build dataflow plan, execute plan

  • TPC-H benchmark

– Transaction Processing Performance Council (TPC):

  • Defines database benchmarking queries

– TPC-H

  • Ad hoc business queries for an order-entry schema
slide-43
SLIDE 43

43

Distributed database experiments

  • Modeling

– TPC-H schema as a distributed database

[Figure: Entity A, Entity B, and Entity C, each with attributes (Attr1, Attr2, ...), shown first as one schema and then placed on Host A, Host B, and Host C connected by a network]

slide-44
SLIDE 44

44

Distributed database experiments

select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part
where p_partkey = l_partkey
  and p_brand = 'Brand#45'
  and p_container = 'WRAP CAN'
  and l_quantity < (
    select 0.2 * avg(l_quantity)
    from lineitem
    where l_partkey = p_partkey
  );

SELECT STATEMENT ()                             1
  SORT (AGGREGATE)                              2
    FILTER ()                                   3
      NESTED LOOPS ()                           4
        TABLE ACCESS (FULL) LINEITEM            5
        TABLE ACCESS (BY INDEX ROWID) PART      5
          INDEX (UNIQUE SCAN) PART_PK           6
      SORT (AGGREGATE)                          4
        TABLE ACCESS (FULL) LINEITEM            5

slide-45
SLIDE 45

45

Distributed database experiments

  • Experiment

– Ran 75% of TPC-H queries

  • Queries not run relied on operations that would be time-consuming to support in Theseus

– Varied latency and database scale
– Tested on recurring queries

[Figure: speedup (1 to 7) per TPC-H query (Q2, Q3, Q4, Q5, Q7, Q8, Q9, Q10, Q12, Q16, Q17, Q19, Q20) at latencies of 2000ms, 4000ms, 6000ms, 8000ms, and 10000ms, with the theoretical max]

slide-46
SLIDE 46

46

Outline

  • 1. Review and motivating example
  • 2. Speculative plan execution
  • 3. Value prediction for speculative execution
  • 4. Related work
  • 5. Summary
slide-47
SLIDE 47

47

Value prediction

  • Better value prediction = better speedups
  • Prediction capability

Category   Hint              Prediction
A          Previously seen   Previously seen
B          Never seen        Previously seen
C          Never seen        Never seen

  • Examples (H = hint, P = prediction):

Edmunds car list from search criteria

H: 2002 Midsize coupe 4K-12K
P: Olds Alero, Dodge Stratus, Pontiac Grand Am, Mercury Cougar

H: 5K-12K
P: ?

ConsumerGuide full review URL from summary URL

H: http://cg.com/summary/20812.htm
P: http://cg.com/full/20812.htm

H: http://cg.com/summary/12345.htm
P: ?

slide-48
SLIDE 48

48

Value prediction techniques

  • Caching

– Associate a hint with a predicted value

  • Classification

– Use features of a hint to predict value
– EXAMPLE: Predicting car list from Edmunds

Cache

Year   Type      Min     Max     Car list
2002   Midsize   8000    15000   (Oldsmobile Alero, Dodge Stratus)
2002   Midsize   7500    14500   (Oldsmobile Alero, Dodge Stratus)
2002   SUV       14000   20000   (Nissan Pathfinder, Ford Explorer)
2001   Midsize   11000   18000   (Honda Accord, Toyota Camry)
2002   SUV       18000   22000   (Nissan Pathfinder, Ford Explorer)

Decision list

type = SUV: (Nissan Pathfinder, Ford Explorer)
type = Midsize :...min <= 10000: (Olds Alero, Dodge Stratus)
                   min > 10000: (Honda Accord, Toyota Camry)
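The difference between the two techniques shows up on a hint that has never been seen. The sketch below hard-codes the slide's cache rows and decision list; the function names and the new hint (2002, Midsize, 9000, 16000) are made up for the illustration.

```python
# Cache: exact hint -> predicted car list (rows taken from the slide).
CACHE = {
    (2002, "Midsize", 8000, 15000): ("Oldsmobile Alero", "Dodge Stratus"),
    (2002, "Midsize", 7500, 14500): ("Oldsmobile Alero", "Dodge Stratus"),
    (2002, "SUV", 14000, 20000): ("Nissan Pathfinder", "Ford Explorer"),
    (2001, "Midsize", 11000, 18000): ("Honda Accord", "Toyota Camry"),
    (2002, "SUV", 18000, 22000): ("Nissan Pathfinder", "Ford Explorer"),
}


def predict_by_cache(hint):
    return CACHE.get(hint)  # None for any hint not seen before


def predict_by_classification(hint):
    # The slide's decision list, written out as code.
    year, car_type, min_price, max_price = hint
    if car_type == "SUV":
        return ("Nissan Pathfinder", "Ford Explorer")
    if car_type == "Midsize":
        if min_price <= 10000:
            return ("Oldsmobile Alero", "Dodge Stratus")
        return ("Honda Accord", "Toyota Camry")
    return None


new_hint = (2002, "Midsize", 9000, 16000)  # hypothetical hint, never seen before
print(predict_by_cache(new_hint))
print(predict_by_classification(new_hint))
```

The decision list uses only two of the four features, which is also why it takes less space than five cache rows and still covers them all.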

slide-49
SLIDE 49

49

Value prediction techniques (cont'd)

  • Transduction

– Transducers are FSA that translate hint into prediction

Hint: http://cg.com/summary/20812.htm
Prediction: http://cg.com/full/20812.htm

Part of the prediction is based on the hint: How do we extract & insert the dynamic part of the summary URL (e.g., 20812)?

[Figure: transducer diagram with numbered states (1 to 5); recoverable arc labels include "http://cg.com/summary/" : ε, "." :, ε : COPY, and COPY]

slide-50
SLIDE 50

50

Value transducers

  • Synthesize predictions from hints
  • Identify predicted value "templates"

– Alternating seq of STATIC/DYNAMIC elements

  • Value transducers built from templates

– State transitions (arcs) = high-level operations:

  • INSERT, CACHE, CLASSIFY, TRANSDUCE

[Figure: hint http://cg.com/summary/20812.htm and prediction http://cg.com/full/20812.htm, with the prediction split into elements 1 STATIC, 2 DYNAMIC, 3 STATIC; label: TRANSDUCE]

[Figure: hint Dodge Stratus and prediction http://cg.com/summary/20812.htm, with the prediction split into elements 1 STATIC, 2 DYNAMIC, 3 STATIC; label: CACHE or CLASSIFY]

slide-51
SLIDE 51

51

Learning value transducers

  • Identify STATIC/DYNAMIC template

– Find LCS for the set of predicted values, using technique based on (Hirschberg 1975)

  • For each STATIC element,

– Construct INSERT arc to next automaton state

  • For each DYNAMIC element,

– Construct TRANSDUCE, CLASSIFY, or CACHE arc to next automaton state

  • Prefer TRANSDUCE and CLASSIFY because

– Better predictive capability on average
– Better space efficiency on average
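A much-reduced version of this learning procedure can be sketched in Python. It swaps in a simpler technique than the slide's: instead of an LCS over the predicted values, it takes the common prefix and common suffix, so it only handles STATIC, DYNAMIC, STATIC templates. The function names and the second training URL (19944) are hypothetical.

```python
import os


def common_suffix(strings):
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def learn_template(values):
    """Return the (STATIC prefix, STATIC suffix) shared by all values."""
    prefix = os.path.commonprefix(values)
    # Look for the suffix only in what follows the prefix, so the two never overlap.
    suffix = common_suffix([v[len(prefix):] for v in values])
    return prefix, suffix


def dynamic_part(value, template):
    prefix, suffix = template
    return value[len(prefix):len(value) - len(suffix)]


def learn_transducer(hints, answers):
    hint_template = learn_template(hints)
    answer_template = learn_template(answers)
    # TRANSDUCE applies only if each answer's DYNAMIC part is copied from its hint.
    for hint, answer in zip(hints, answers):
        if dynamic_part(hint, hint_template) != dynamic_part(answer, answer_template):
            return None  # a fuller learner would fall back to CLASSIFY or CACHE

    def predict(hint):
        # INSERT static prefix, TRANSDUCE dynamic part from the hint, INSERT suffix.
        prefix, suffix = answer_template
        return prefix + dynamic_part(hint, hint_template) + suffix

    return predict


predict = learn_transducer(
    ["http://cg.com/summary/20812.htm", "http://cg.com/summary/19944.htm"],
    ["http://cg.com/full/20812.htm", "http://cg.com/full/19944.htm"],
)
print(predict("http://cg.com/summary/12345.htm"))
```

One weakness of the prefix and suffix shortcut is that training values sharing leading digits would fold those digits into the STATIC part, which gives wrong predictions for other IDs; more examples reduce that risk.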

slide-52
SLIDE 52

52

Detailed example: CarInfo URLs

HINTS: http://cg.com/summary/20812.htm

ANSWERS: http://cg.com/full/20812.htm

TEMPLATE: http://cg.com/full/[DYNAMIC].htm, with the DYNAMIC element filled by TRANSDUCE

New hint: http://cg.com/summary/12345.htm

Prediction (TRANSDUCE): http://cg.com/full/12345.htm

slide-53
SLIDE 53

53

Experimental results

  • Better accuracy than strictly caching

[Figure: panels labeled "Hint classification" and "Hint transduction"; accuracy (0% to 100%) against number of new examples (200 to 1000) for Car-summary accuracy, Rep-list accuracy, and Phone-state accuracy]

Predictor      Average number of examples required
Car-Full       3
Rep-Graph      8
Phone-Detail   3

slide-54
SLIDE 54

54

Experimental results

  • More space efficient than strictly caching

[Figure: space savings (over caching) against number of examples, in two panels: "Hint classification (CarInfo summary review URL)" and "Hint transduction (CarInfo full review URL)"]

slide-55
SLIDE 55

55

Effect on spec exec performance

  • CarInfo

[Figure: average agent execution time (ms, 1000 to 7000) for first tuple, average tuple, and last tuple, against number of tuples seen: No spec (0), Spec (1-25), Spec (26-50), Spec (51-75), Spec (76-100), Spec (101-125)]

slide-56
SLIDE 56

56

Effect on spec exec performance

  • RepInfo

[Figure: average agent execution time (ms, 1000 to 10000) for first tuple, average tuple, and last tuple, against number of tuples seen: No spec (0), Spec (1-20), Spec (21-40), Spec (41-60), Spec (61-80)]

slide-57
SLIDE 57

57

Value prediction summary

  • Value prediction

– Important part of speculative plan execution
– Better value prediction = better average speedups

  • Our approach: learn value transducers

– Construct predicted value based on past experience
– Learn STATIC/DYNAMIC prediction template using LCS

  • Build value transducer based on template

– INSERT arc(s) correspond to STATIC parts
– TRANSDUCE, CLASSIFY, CACHE arc(s) correspond to DYNAMIC parts

slide-58
SLIDE 58

58

Outline

  • 1. Review and motivating example
  • 2. Speculative plan execution
  • 3. Value prediction for speculative execution
  • 4. Related work
  • 5. Summary
slide-59
SLIDE 59

59

Related Work

  • Speculative execution

– Approximate & partial query results

  • [Hellerstein et al. 1997] [Shanmugasundaram et al. 2000] [Raman and Hellerstein 2001]

– Executing anticipated actions in advance

  • Continual computation [Horvitz 2001], time-critical decision making [Greenwald and Dean 1994]

– Other types of speculative execution

  • File system prefetching [Chang and Gibson 1999], control speculation in workflow processing [Hull et al. 2000]

– Network prefetching

slide-60
SLIDE 60

60

Related Work

  • Learning value predictors

– Predicting commands

  • Command line prediction [Davison and Hirsh 1998, 2001]
  • Assisted browsing [Lieberman 1995] [Joachims et al. 1997]

– Value prediction as speedup learning

  • [Fikes et al. 1972], [Mitchell 1983], [Minton 1988]

– Transducer learning

  • Provably correct transducers [Oncina et al. 1993]

– Issues: Requires many examples, generalization capability differs

  • Transducers for data extraction [Hsu and Chang 1999]

– URL prediction

  • [Zukerman et al. 1999], [Su et al. 2000]
slide-61
SLIDE 61

61

Outline

  • 1. Introduction and motivating example
  • 2. Speculative plan execution
  • 3. Value prediction for speculative execution
  • 4. Related work
  • 5. Summary
slide-62
SLIDE 62

62

Summary

  • An approach to speculative execution of information agent plans

– Can yield arbitrary speedups
– Safe, fair

  • Value prediction approach that combines caching, classification, and transduction

– More accurate & space efficient than strictly caching

slide-63
SLIDE 63

63

Future work

  • Placement of the Confirm operator
  • Learning to compute speculative overhead
  • Exploring more value prediction strategies

– Example: Stride value prediction

  • Learning loop increments (e.g., [1,2,3], [2,4,6])
  • Similar to learning ["...page=1", "...page=2"] for URLs
  • Predictor compression

– Probabilistic classifiers

  • Speculative execution of other types of agents

– Example: Robot soccer agents
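The stride value prediction idea listed above under future work can be illustrated with a short sketch. The function names and the example.com URLs are hypothetical, and the predictor only handles a single constant stride.

```python
import re


def predict_next(values):
    """Predict the next value if the sequence has one constant stride, else None."""
    if len(values) < 2:
        return None
    strides = {b - a for a, b in zip(values, values[1:])}
    if len(strides) != 1:
        return None  # no single stride explains the sequence
    return values[-1] + strides.pop()


def predict_next_url(urls):
    """Apply the same idea to a trailing page=N parameter in a list of URLs."""
    pages = [int(re.search(r"page=(\d+)$", url).group(1)) for url in urls]
    next_page = predict_next(pages)
    if next_page is None:
        return None
    return re.sub(r"page=\d+$", "page=%d" % next_page, urls[-1])


print(predict_next([1, 2, 3]))
print(predict_next([2, 4, 6]))
print(predict_next_url(["http://example.com/list?page=1",
                        "http://example.com/list?page=2"]))
```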

slide-64
SLIDE 64

64

A final aside… CPU evolution

  • For many wonderful years

– We have been happily writing von Neumann style programs
– Compilers have been optimizing these programs

  • To extract as much dataflow parallelism as possible

– We run them on ever-more-powerful CPUs
– They run fast

  • Speculative execution (branch prediction) yields greatest profit, by far (Wall 1991)

  • But now…

– We’re maxing out
– Deeper pipelines aren’t much help

slide-65
SLIDE 65

65

Changes in processor architecture

  • Limits of ILP

In-order scheduling with perfect memory (Intel Corporation Research Labs -- ~1998)

slide-66
SLIDE 66

66

Simultaneous Multithreading (SMT)

  • Reorganizing chip architecture so that:

– Multiple functional units can be used per cycle (horiz waste)
– AND multiple threads can exist (vert waste)
– AND multiple threads can execute per cycle (>1 PCs & maps)

[Figure: issue slots per cycle for Superscalar, Multithreaded, and SMT processors; legend: Unutilized, Thread 1, Thread 2, Thread 3, Thread 4, Thread 5]

slide-67
SLIDE 67

67

What does this all mean?

  • CPUs running multithreaded code faster
  • Theseus

– Streaming dataflow via multiple threads
– One problem: I/O delays on some of these threads

  • Speculative execution

– Allows us to increase the degree of thread level parallelism
– We can better utilize available resources

  • Greater TLP with SMT processors

– Even better efficiency with fewer processors

slide-68
SLIDE 68

68

Thank you

slide-69
SLIDE 69

69

Summary of results

  • Increased accuracy (recall)

– Classification-based predictors

  • Can make correct predictions more often than strictly caching (some errors)

– Transduction-based predictors

  • Quickly up to 100%!

  • Space savings

– Classification-based predictors

  • Up to 40% over strictly caching, increasing with number of examples

– Transduction-based predictors

  • Quickly up to 100%!