Liquid Architecture: Microarchitecture Optimization for Embedded Systems




SLIDE 1

Liquid Architecture

Microarchitecture Optimization for Embedded Systems

D. Schuehler, B. Brodie, R. Chamberlain, R. Cytron, S. Friedman, J. Fritts, P. Jones, P. Krishnamurthy, J. Lockwood, S. Padmanabhan, and H. Zhang

Dept. of Computer Science and Engineering
Washington University in St. Louis
Supported by NSF ITR-0313203

SLIDE 2

Liquid Architecture

  • Configurable architecture that can adapt to the needs of a particular application
  • E.g., within an FPGA
    – Soft-core processors
  • E.g., as an embedded processor
    – Tensilica supports configuration at fab time
    – Stretch supports configuration at run time
  • Today’s discussion is on performance analysis and configuration choice

SLIDE 3

Block Diagram

[Figure: block diagram. Labels: FPX, FPGA, Layered Internet Protocol Wrappers, Control Packet Processor, Network Interface, Statistics Module, Event Bus, Memory Controller, External Memory, LEON SPARC-compatible processor, I-Cache, D-Cache, AHB, APB, UART, LED, Adapter, Boot ROM]

SLIDE 4

Microarchitecture Configurability

  • Instruction set
  • Memory subsystem
    – Cache size (I and D)
    – Associativity
    – Cache line size
  • Co-processor(s)
  • Instruction pipeline
  • Full HDL source is available
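
The three memory-subsystem parameters listed above are tied together by simple arithmetic: capacity = sets × ways × line size. The Python sketch below illustrates that relationship. The class name `CacheConfig`, its fields, and the example sizes are assumptions made for illustration; they are not taken from the LEON sources or from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical description of one cache (I or D); not a LEON API."""
    size_bytes: int   # total capacity
    ways: int         # associativity (1 = direct-mapped)
    line_bytes: int   # cache line size

    def num_sets(self) -> int:
        # capacity = sets * ways * line size
        sets, rem = divmod(self.size_bytes, self.ways * self.line_bytes)
        if rem or sets == 0:
            raise ValueError("size must be a positive multiple of ways * line size")
        return sets

    def split_address(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, set index, byte offset) for an address."""
        offset = addr % self.line_bytes
        line = addr // self.line_bytes
        return line // self.num_sets(), line % self.num_sets(), offset


# Example sizes are illustrative only.
direct = CacheConfig(size_bytes=4096, ways=1, line_bytes=32)
two_way = CacheConfig(size_bytes=4096, ways=2, line_bytes=32)
print(direct.num_sets(), two_way.num_sets())  # 128 64
```

Doubling the associativity at fixed capacity halves the number of sets, which is why cache size, associativity, and line size have to be chosen together rather than one at a time.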
SLIDE 5

Design Flow

  • Write and compile embedded SPARC application with GCC
  • Identify configuration for candidate architecture
  • Reconfigure FPX hardware via Internet and upload system software
  • Execute program on FPX platform and measure run-time performance

SLIDE 6

Cycle-accurate profiling

[Figure: profiler user interface with columns "Method" and "Time / Cycles", listing .text, main, addQuery, findMatch, computeKey, computeBase, computeStep, fillQuery, Rnd]

  • Choose methods to profile from the user interface

SLIDE 7

Method Address Range

[Figure: a method chosen from the list (.text, main, addQuery, findMatch, computeKey, computeBase, computeStep, fillQuery, Rnd) maps to an address range: Lo = 0x4000027C, Hi = 0x400003EF]

SLIDE 8

Method

[Figure: the method's address range (Lo = 0x4000027C, Hi = 0x400003EF) is shown inside the Statistics Module alongside a PC value of 0x4000035A. Labels: PC, CLK, Event Bus]

SLIDE 9

Function

[Figure: inside the Statistics Module, two comparators check Lo ≤ PC ≤ Hi (0x4000027C ≤ 0x4000035A ≤ 0x400003EF) and drive the INCR input of a Counter. Labels: PC, CLK, Event Bus]

SLIDE 10

Function

[Figure: the Statistics Module with two range checkers, each with its own Counter and INCR signal. Range 1: Lo = 0x4000027C, Hi = 0x400003EF. Range 2: Lo = 0x400005D8, Hi = 0x4000061F. Both are compared against the same PC value (0x4000035A). Labels: PC, CLK, Event Bus]

SLIDE 11

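[Figure: the same two range checkers (Range 1: Lo = 0x4000027C, Hi = 0x400003EF; Range 2: Lo = 0x400005D8, Hi = 0x4000061F; PC = 0x4000035A), each with its own Counter and INCR signal, now with an output labeled "To User". Labels: Statistics Module, PC, CLK, Event Bus]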
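
The range-check-and-count behavior shown on the preceding slides can be modeled in software. The Python sketch below is a minimal illustration of the idea, not the FPGA implementation. The class names are hypothetical, the PC trace is invented, and it assumes one PC sample per clock cycle. The two address ranges are the ones shown on the slides.

```python
from dataclasses import dataclass


@dataclass
class RangeCounter:
    """One comparator pair plus counter: counts cycles with lo <= PC <= hi."""
    lo: int
    hi: int
    count: int = 0

    def clock(self, pc: int) -> None:
        # INCR is asserted only when the PC falls inside the range.
        if self.lo <= pc <= self.hi:
            self.count += 1


class StatisticsModule:
    """Hypothetical software model; the real module is FPGA logic."""

    def __init__(self, ranges):
        self.counters = [RangeCounter(lo, hi) for lo, hi in ranges]

    def clock(self, pc: int) -> None:
        # Every counter sees the same PC on every clock cycle.
        for counter in self.counters:
            counter.clock(pc)

    def report(self):
        # Counter values handed back to the user.
        return [c.count for c in self.counters]


stats = StatisticsModule([(0x4000027C, 0x400003EF), (0x400005D8, 0x4000061F)])
# Invented PC trace, one sample per clock cycle.
for pc in (0x4000035A, 0x4000035E, 0x400005DC, 0x40000700):
    stats.clock(pc)
print(stats.report())  # [2, 1]
```

Because each counter only watches the PC and the clock, profiling in this style adds no instructions to the program being measured.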

SLIDE 12

Where is time spent?

BLASTN biosequence search application

[Figure: stacked bar chart. X-axis: size of hash table (bytes), 128K and 32K. Y-axis: % of total runtime, 0% to 100%. Series: Rest, coreLoop, findMatch]

SLIDE 13

[Figure: profiler user interface listing functions (.text, main, addQuery, findMatch, computeKey, computeBase, computeStep, fillQuery, Rnd) with columns "Time / Cycles" and "Cache Hits / Misses" (Read, Write)]

Expand to measure cache hits/misses

SLIDE 14

Measure Several Configurations

SLIDE 15

Impact of D-cache Configuration

BLASTN biosequence search application

[Figure: chart of D-cache hit rate. X-axis: size of hash table and D-cache configuration (128K, 1Kx1; 128K, 32Kx1; 128K, 16Kx2; 32K, 1Kx1; 32K, 32Kx1; 32K, 16Kx2). Y-axis: hit rate (%), 86 to 100. Series: Total, findMatch, coreLoop]
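
To illustrate why configurations of equal capacity can differ in hit rate, the Python sketch below models a set-associative cache and reports the hit rate for a synthetic address trace. It makes several assumptions that do not come from the talk: the labels "32Kx1" and "16Kx2" are read as way size × number of ways, the line size is 32 bytes, replacement is LRU, and the trace is invented. It is not a model of the LEON D-cache and does not reproduce the measured BLASTN results.

```python
from collections import OrderedDict


def hit_rate(addresses, way_bytes, ways, line_bytes=32):
    """Percent of accesses that hit in a set-associative LRU cache model."""
    num_sets = way_bytes // line_bytes
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set tags in LRU order
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes
        index, tag = line % num_sets, line // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict least recently used
            lines[tag] = True
    return 100.0 * hits / total if total else 0.0


# Synthetic trace: two addresses 32 KB apart, touched alternately.
trace = [0x40000000, 0x40008000] * 100
print(hit_rate(trace, way_bytes=32 * 1024, ways=1))  # 0.0
print(hit_rate(trace, way_bytes=16 * 1024, ways=2))  # 99.0
```

In this contrived trace the two addresses collide in the direct-mapped cache and evict each other on every access, while the two-way cache of the same total capacity holds both.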

SLIDE 16

Impact of I-cache Configuration

BLASTN biosequence search application

[Figure: bar chart. X-axis: BLASTN hash table sizes (bytes), 128K and 32K. Y-axis: run time (secs), 5 to 35. Series: 1KB I-Cache, 4KB I-Cache]

SLIDE 17

[Figure: profiler user interface listing functions (.text, main, addQuery, findMatch, computeKey, computeBase, computeStep, fillQuery, Rnd) with columns "Time / Cycles", "Cache Hits / Misses" (Read, Write), "Pipeline Stalls", and "Branch Predict"]

SLIDE 18

Time for Single Run

Almost 2 orders of magnitude faster than simulation

[Figure: bar chart on a logarithmic axis, Time (sec), 1 to 100000. SimpleScalar 3.0: 80000; LEON: 1800]
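
The speedup behind this comparison can be worked out directly. The Python sketch below takes the two chart values at face value and assumes that 80000 s belongs to SimpleScalar 3.0 and 1800 s to LEON, which is the order in which they appear on the chart.

```python
import math

simulation_sec = 80_000  # SimpleScalar 3.0, as read from the chart (assumed mapping)
direct_sec = 1_800       # LEON on the FPX, as read from the chart (assumed mapping)

speedup = simulation_sec / direct_sec
orders = math.log10(speedup)
print(f"speedup = {speedup:.1f}x, about {orders:.2f} orders of magnitude")
# speedup = 44.4x, about 1.65 orders of magnitude
```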

SLIDE 19

Implications of Slow Simulation

  • Focus has historically been on measuring the performance of a single thread of a single application
  • Real apps are often executed in a multitasking environment
    – Impacts cache behavior
    – Ignores OS (system call) performance
  • Liquid architecture system enables direct measurement, including OS

SLIDE 20

OS Boot Sequence

SLIDE 21

Summary

  • Run-time reconfigurable processors will be available sooner rather than later
  • Determining desired configuration is a difficult design task
    – Large search space
    – Depends on accurate performance data
  • Liquid architecture system enables direct measurement of performance properties
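
The size of the search space grows multiplicatively with each configurable parameter. The Python sketch below enumerates a small, made-up space and picks the configuration with the lowest cost by exhaustive search. The parameter values and the `toy_measure` cost function are placeholders invented for illustration; in the system described here the cost would come from a measured run on the hardware.

```python
from itertools import product

# Illustrative parameter choices; not the set explored in the talk.
SPACE = {
    "icache_kb": [1, 2, 4, 8],
    "dcache_kb": [1, 2, 4, 8, 16, 32],
    "dcache_ways": [1, 2],
    "line_bytes": [16, 32],
}


def configurations(space):
    """Yield every combination of parameter values as a dict."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def best_configuration(space, measure):
    """Exhaustive search: return the configuration with the lowest cost."""
    return min(configurations(space), key=measure)


def toy_measure(cfg):
    # Stand-in for a real run on hardware: made-up cost, for illustration only.
    return (100 / cfg["icache_kb"]
            + 200 / (cfg["dcache_kb"] * cfg["dcache_ways"])
            + cfg["line_bytes"] / 16)


print(sum(1 for _ in configurations(SPACE)))  # 96
print(best_configuration(SPACE, toy_measure))
```

Even four parameters with a handful of values each yield 96 candidates, and every candidate costs one full run to evaluate, so the time per run largely determines how much of the space can be explored.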

SLIDE 22

Current and Future Work

  • Evaluation of several architecture design ideas
  • Automated search of the design space
  • Characterizing performance analysis methods
    – Analytic models
    – Simulation models
    – Direct execution models
  • Usable as is for evaluating soft-core processors
  • Would like to extend to higher-speed processors