load value approximation



  1. Load Value Approximation. Joshua San Miguel, Mario Badr, Natalie Enright Jerger

  2. Accessing Memory [diagram: processor core → L1 cache → shared caches, directory, network-on-chip → main memory]

  3. Accessing Memory [same diagram, with a miss at the L1 cache]

  4. Accessing Memory [same diagram, miss at the L1 cache] Accessing memory costs 10x to 100x more latency and energy than accessing the L1 cache!

  5. Accessing Memory [same diagram, miss at the L1 cache] Accessing memory costs 10x to 100x more latency and energy than accessing the L1 cache! Higher efficiency via Approximate Computing…

  6. Approximate Computing: Not all computations need to be precise. Example domains: data mining, computer vision, audio and video processing, gaming, machine learning, dynamical simulation. [Image credits: http://www.zentut.com/ http://www.cc.gatech.edu/~cnieto6/ http://themusicparlour.blogspot.ca/ http://www.businessweek.com/ http://www.analyticbridge.com/ http://www.scientific-computing.com/]

  7. Approximate Computing [chart: execution time and energy]

  8. Approximate Computing [chart: execution time, energy, and error]

  9. Approximate Computing [chart: execution time, energy, and error]

  10. Approximate Computing: Many applications can tolerate approximate data.
      • 40% to nearly 100% of data footprint is approximate [Sampson, MICRO 2013].

  11. Approximate Computing: Many applications can tolerate approximate data.
      • 40% to nearly 100% of data footprint is approximate [Sampson, MICRO 2013].
      Approximate value locality:
      • Many data values are similar to or can be approximated from previously seen values.

  12. Outline
      • Load Value Approximation
      • Non-Speculative Operation
      • Approximator Design
      • Relaxed Confidence Windows
      • Approximation Degree
      • Methodology
      • Evaluation

  13. Load Value Approximation [diagram: processor core → L1 cache → shared caches, directory, network-on-chip → main memory]

  14. Load Value Approximation [same diagram, with an approximator added at the L1 cache level]

  15. Load Value Approximation [same diagram] The load of A misses in the L1 cache; the approximator is queried (A?).

  16. Load Value Approximation [same diagram] The approximator generates A_approx.

  17. Load Value Approximation [same diagram] A_approx is passed to the processor core.

  18. Load Value Approximation [same diagram] No speculation, no rollbacks.

  19. Load Value Approximation [same diagram, with A_approx at the processor core]

  20. Load Value Approximation [same diagram] A_actual is fetched from main memory.

  21. Load Value Approximation [same diagram] The approximator is trained with A_actual.

  22. Load Value Approximation [same diagram] Learns past values. Estimates future values. Improves performance and saves energy.

  23. Approximator Design [diagram: the instruction address and the global history buffer (GHB) feed a hash h that indexes the approximator table; each table entry holds tag, conf, degree, and a local history buffer (LHB); a function g over the LHB produces the approximation]

  24. Approximator Design [same diagram, with a timeline] Example state: GHB = 1.0, 2.2, 3.1; LHB = 4.1, 3.9, 4.0.

  25. Approximator Design [same example] Timeline: load miss A.

  26. Approximator Design [same example] h: PC ⊕ 1.0 ⊕ 2.2 ⊕ 3.1 selects the table entry.

  27. Approximator Design [same example] g: A_approx = (4.1 + 3.9 + 4.0) / 3 = 4.0.

  28. Approximator Design [same example] Timeline: load miss A, then do_work(A_approx).

  29. Approximator Design [same example] Timeline: load miss A, do_work(A_approx), request(A_actual).

  30. Approximator Design [same example] Timeline: load miss A, do_work(A_approx), request(A_actual); A_actual = 4.2 arrives.

  31. Approximator Design [same example] After training with A_actual = 4.2: GHB = 2.2, 3.1, 4.2; LHB = 3.9, 4.0, 4.2.
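The walk-through on slides 23 to 31 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' hardware design: the class name `Approximator`, the helper `float_bits`, the table size, and the choice to XOR the raw 32-bit float patterns are all hypothetical, and the tag, conf, and degree fields are left out. The slides only show that h combines the PC with the GHB values by XOR and that g averages the LHB.

```python
import struct
from collections import deque
from functools import reduce


def float_bits(x: float) -> int:
    """Raw 32-bit pattern of x, so floats can be XORed (assumed encoding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


class Approximator:
    """Toy approximator: GHB + PC -> hash h -> table entry (LHB) -> average g."""

    def __init__(self, table_bits=9, ghb_size=3, lhb_size=3):
        self.mask = (1 << table_bits) - 1    # assumed table size
        self.ghb = deque(maxlen=ghb_size)    # global history buffer
        self.lhb_size = lhb_size
        self.table = {}                      # index -> local history buffer

    def index(self, pc: int) -> int:
        # h: PC xor each GHB value, truncated to the table size
        return reduce(lambda acc, v: acc ^ float_bits(v), self.ghb, pc) & self.mask

    def approximate(self, pc: int):
        lhb = self.table.get(self.index(pc))
        if not lhb:
            return None                      # no history yet: no approximation
        return sum(lhb) / len(lhb)           # g: average of the LHB

    def train(self, pc: int, actual: float) -> None:
        idx = self.index(pc)                 # same entry that was read, GHB unchanged so far
        self.table.setdefault(idx, deque(maxlen=self.lhb_size)).append(actual)
        self.ghb.append(actual)              # oldest values fall out of both buffers


# Reproduce the example state from the slides (the PC value is made up).
ap = Approximator()
ap.ghb.extend([1.0, 2.2, 3.1])
pc = 0x400123
idx = ap.index(pc)
ap.table[idx] = deque([4.1, 3.9, 4.0], maxlen=3)

a_approx = ap.approximate(pc)   # about 4.0, as on slide 27
ap.train(pc, 4.2)               # A_actual = 4.2 arrives, as on slide 30
```

After `train`, the GHB holds 2.2, 3.1, 4.2 and the entry's LHB holds 3.9, 4.0, 4.2, matching slide 31.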

  32. Approximator Design – Other Considerations
      • Floating-point precision
      • History buffer sizes
      • Stale values
      More details in paper.

  33. Approximator Design
      Relaxed Confidence Windows
      • How do we avoid making bad approximations?
      • Trade off performance and error.
      Approximation Degree
      • Do we need to fetch the actual value from memory every time?
      • Trade off energy and error.

  34. Relaxed Confidence Windows [example: GHB = 1.0, 2.2, 3.1; LHB = 4.1, 3.9, 4.0; A_approx = 4.0] Timeline: load miss A, do_work(A_approx), request(A_actual).

  35. Relaxed Confidence Windows [same example, A_approx = 4.0] A_actual = 9.0!

  36. Relaxed Confidence Windows [table entry: tag, conf, degree, LHB]
      When approximating:
        if conf >= 0: use A_approx
        else: don't use A_approx
      When updating:
        if A_approx and A_actual differ by <= CONF_WINDOW%: conf++
        else: conf--
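The confidence rule on slide 36 might look like the following in Python. It is a minimal sketch under assumptions the slide does not state: the difference is measured relative to A_actual, the counter saturates at the hypothetical bounds `CONF_MIN` and `CONF_MAX`, and the names `ConfEntry`, `within_window`, `should_use`, and `update_conf` are invented for illustration.

```python
import math
from dataclasses import dataclass

CONF_WINDOW = 0.10            # 10% window; the slides vary this from 0% to infinite
CONF_MIN, CONF_MAX = -8, 7    # assumed saturation bounds, not given on the slides


@dataclass
class ConfEntry:
    conf: int = 0             # signed confidence counter of one table entry


def within_window(approx: float, actual: float, window: float) -> bool:
    """True if approx is within window (a fraction) of actual. Assumed relative to actual."""
    if math.isinf(window):
        return True           # "infinite" window: every approximation counts as good
    return abs(approx - actual) <= window * abs(actual)


def should_use(entry: ConfEntry) -> bool:
    # When approximating: use A_approx only if conf >= 0
    return entry.conf >= 0


def update_conf(entry: ConfEntry, approx: float, actual: float,
                window: float = CONF_WINDOW) -> None:
    # When updating: conf++ inside the window, conf-- outside it
    if within_window(approx, actual, window):
        entry.conf = min(entry.conf + 1, CONF_MAX)
    else:
        entry.conf = max(entry.conf - 1, CONF_MIN)


good = ConfEntry()
update_conf(good, approx=4.0, actual=4.2)   # 0.2 <= 0.42, so conf rises

bad = ConfEntry()
update_conf(bad, approx=4.0, actual=9.0)    # 5.0 > 0.9, so conf falls below 0
```

With these assumptions, the slide 35 case (A_approx = 4.0, A_actual = 9.0) drives conf negative, so the next miss on that entry would wait for the real value instead of using the approximation.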

  37. Relaxed Confidence Windows – Output Error [chart: output error (y-axis, 0% to 100%) for CONF_WINDOW% = 0%, 5%, 10%, 20%, infinite]

  38. Relaxed Confidence Windows – L1-D MPKI [chart: normalized L1-D MPKI (y-axis, 0.0 to 1.0) for CONF_WINDOW% = 0%, 5%, 10%, 20%, infinite]

  39. Approximator Design
      Relaxed Confidence Windows
      • How do we avoid making bad approximations?
      • Trade off performance and error.
      Approximation Degree
      • Do we need to fetch the actual value from memory every time?
      • Trade off energy and error.

  40. Approximation Degree [example: GHB = 1.0, 2.2, 3.1; LHB = 4.1, 3.9, 4.0; A_approx = 4.0] Timeline: load miss A, do_work(A_approx), request(A_actual).

  41. Approximation Degree [same example, A_approx = 4.0] A_actual = 4.0.

  42. Approximation Degree [same example, A_approx = 4.0] A_actual = 4.0.

  43. Approximation Degree [table entry: tag, conf, degree, LHB]
      When approximating:
        if degree == APPROX_DEGREE: fetch A_actual
        else: don't fetch A_actual
      When updating:
        if degree == APPROX_DEGREE: degree = 0
        else: degree++
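The degree counter on slide 43 can be sketched as below. This is an illustration only: folding the "when approximating" and "when updating" steps into one function, starting the counter at 0, and the names `DegreeEntry` and `on_miss` are all assumptions rather than details from the slides.

```python
from dataclasses import dataclass

APPROX_DEGREE = 4   # one of the values swept on the slides (0, 1, 2, 4, 8, 16)


@dataclass
class DegreeEntry:
    degree: int = 0   # assumed initial value of the per-entry degree counter


def on_miss(entry: DegreeEntry, approx_degree: int = APPROX_DEGREE) -> bool:
    """Handle one approximated miss; return True if A_actual should be fetched."""
    if entry.degree == approx_degree:
        entry.degree = 0      # fetch this time and restart the count
        return True
    entry.degree += 1         # skip the fetch and reuse the approximation
    return False


entry = DegreeEntry()
fetches = [on_miss(entry) for _ in range(10)]

# With APPROX_DEGREE = 0 the counter always matches, so every miss fetches.
always = [on_miss(DegreeEntry(), approx_degree=0) for _ in range(3)]
```

Under these assumptions, APPROX_DEGREE = 4 fetches A_actual on every fifth miss to an entry, which is where the energy saving comes from; the skipped fetches are also the ones that cannot train the approximator.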

  44. Approximation Degree – Output Error [chart: output error (y-axis, 0% to 100%) for APPROX_DEGREE = 0, 1, 2, 4, 8, 16]

  45. Approximation Degree – L1-D Fetches [chart: normalized L1-D fetches (y-axis, 0 to 1) for APPROX_DEGREE = 0, 1, 2, 4, 8, 16]

  46. Methodology
      • Multi-threaded approximate applications: PARSEC benchmark suite [Bienia, Princeton 2011]; programmer annotations and ISA extensions [Esmaeilzadeh, ASPLOS 2012]
      • Approximator design space exploration: Pin dynamic binary instrumentation tool [Luk, PLDI 2005]
      • Full-system simulation: FeS2 cycle-level x86 simulator [Neelakantam, ASPLOS 2008]
      • Approximator, cache and memory energy consumption: CACTI modeling tool [Thoziyoor, HP 2008]

  47. Evaluation [chart: application speedup and energy savings (y-axis, 0% to 16%) for APPROX_DEGREE = 0, 4, 16]

  48. Evaluation [same chart] Up to 28% speedup.
