SLIDE 1

A Framework for Automatic Generation of Configuration Files for a Custom Hardware/Software RTOS

Jaehwan Lee*, Kyeong Keol Ryu*, Vincent J. Mooney III+

{jaehwan, kkryu, mooney}@ece.gatech.edu
http://codesign.ece.gatech.edu

+Assistant Professor, *School of Electrical and Computer Engineering
+Adjunct Assistant Professor, College of Computing
Georgia Institute of Technology

26 June 2002 at ERSA
HW/SW RTOS Project of the HW/SW Codesign Group at GT

SLIDE 2

Outline

Introduction
Goals
Motivation
Methodology
Experimental Results
Conclusion

SLIDE 3

Introduction

[Figure labels: User input, GUI tool, Hardware RTOS library, Software RTOS library, Base Architecture library, Makefile, User.h, Verilog File(s)]

Specify a custom HW/SW RTOS in a graphical user interface (GUI)
Generate configuration files used to make a custom RTOS
  • A custom RTOS may contain HW (as well as SW) components
Compile both hardware and software with an application
Simulate the system to evaluate the result

SLIDE 4

Goals

To help the user examine which configuration is most suitable for the user’s specific applications
To help the user explore the RTOS design space after chip fabrication as well as before chip fabrication
To help the user examine different system-on-a-chip (SoC) architectures subject to a custom RTOS

SLIDE 5

Motivation (1/5)

HW/SW RTOS partitioning approach
Three previous innovations in HW/SW RTOS components
  • SoCLC: System-on-a-Chip Lock Cache
  • SoCDMMU: System-on-a-Chip Dynamic Memory Management Unit
  • SoCDDU: System-on-a-Chip Deadlock Detection Unit

SLIDE 6

Motivation (2/5)

SoCLC: System-on-a-Chip Lock Cache
  • A hardware mechanism that resolves the critical section (CS) interactions among processing elements (PEs)
  • Lock variables are moved into a separate “lock cache” outside of the memory
  • Improves performance in terms of lock latency, lock delay and bandwidth consumption

SLIDE 7

Motivation (3/5)

SoCDMMU: System-on-a-Chip Dynamic Memory Management Unit
  • Provides fast, deterministic and yet dynamic memory management of a global on-chip memory
  • Achieves flexible, efficient memory utilization
  • Provides APIs for applications

SLIDE 8

Motivation (4/5)

SoCDDU: System-on-a-Chip Deadlock Detection Unit
  • Performs a novel parallel hardware deadlock detection based on implementing deadlock searches on the resource allocation matrix in hardware
  • Provides very fast deadlock detection at run-time with dedicated hardware performing simple bit-wise Boolean operations
  • Reduces deadlock detection time by 99% as compared to software
  • Requires at most O(2*min(m,n)) iterations as opposed to O(m*n) required by all previously reported (sequential) software algorithms
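
The slide does not spell out the search itself, so the following is only a sequential software sketch of one common way to search a resource allocation matrix: repeatedly discard rows and columns that cannot be part of a cycle, and report deadlock if something is left over. The matrix layout (rows are resources, columns are processes, cells are 'r' for request and 'g' for grant), the function name, and the notion of a "terminal" line are assumptions made for illustration. It is not the SoCDDU hardware, which the slides describe as performing its bit-wise operations in parallel.

```python
from typing import Iterable, List, Optional

Cell = Optional[str]  # 'r' = request, 'g' = grant, None = no relation (assumed encoding)


def has_deadlock(matrix: List[List[Cell]]) -> bool:
    """Return True if the matrix cannot be reduced to nothing.

    matrix[i][j] relates resource i to process j (an assumed layout).
    """
    rows = set(range(len(matrix)))
    cols = set(range(len(matrix[0]))) if matrix else set()

    def is_terminal(cells: Iterable[Cell]) -> bool:
        # A row or column is terminal unless it holds both a request and a grant.
        present = set(cells)
        return not ("r" in present and "g" in present)

    while rows and cols:
        term_rows = {i for i in rows if is_terminal(matrix[i][j] for j in cols)}
        term_cols = {j for j in cols if is_terminal(matrix[i][j] for i in rows)}
        if not term_rows and not term_cols:
            return True  # nothing more can be removed, so a cycle remains
        rows -= term_rows
        cols -= term_cols
    return False  # every edge was removed, so no cycle exists


# Process 0 holds resource 0 and requests resource 1; process 1 does the reverse.
cycle = [["g", "r"],
         ["r", "g"]]
# Process 0 holds resource 0; process 1 merely waits for it.
waiting = [["g", "r"],
           [None, None]]
print(has_deadlock(cycle), has_deadlock(waiting))
```

Each pass of the loop removes every terminal row and column at once, which is the part a hardware unit could evaluate for all rows and columns simultaneously.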

SLIDE 9

Motivation (5/5)

Constraints on using the three previous innovations
  • Perhaps not enough chip space for all three of them
  • Not all of them may be necessary
⇒ Our framework
  • Enables automatic generation of different mixes of the three previous innovations for different versions of a HW/SW RTOS
  • Can be generalized to instantiate additional HW or SW RTOS components

SLIDE 10

Methodology (1/2)

[Figure labels: User input, GUI tool, Hardware RTOS library, Software RTOS library, Base Architecture library, Makefile, User.h, Verilog File]

Translates the user choices into a custom RTOS
  • Given the IP library of processors and HW/SW RTOS components
Generates configuration files to glue together a custom RTOS executable in the Seamless Co-Verification Environment from Mentor Graphics
  • Makefile and User.h for SW link
  • Verilog header file for HW glue

SLIDE 11

Methodology (2/2)

Easily explores the HW/SW RTOS design space defined by the available HW/SW RTOS components

[Figure labels: User input, GUI tool, Hardware RTOS library, Software RTOS library, Base Architecture library, Makefile, User.h, Verilog File, Application, SW Compile and Link, HW Compile, HW/SW Co-Simulation, Result and Feedback]

SLIDE 12

Our RTOS and Possible Target SoC

A multiprocessor System-on-a-Chip (Base architecture)
A multiprocessor RTOS
Application(s) running on the SoC using the RTOS APIs

[Figure labels: H/W, S/W, RTOS]

SLIDE 13

Our RTOS in Detail

Atalanta software RTOS
  • A multiprocessor SoC RTOS
The RTOS and device drivers are loaded into the L2 cache memory
  • All Processing Elements (PEs) share the kernel code and data structures
Hardware RTOS components are downloaded into the reconfigurable logic

[Figure labels: H/W, S/W, RTOS]

SLIDE 14

Selectable RTOS IP components

Software (Atalanta RTOS)
  • Inter-Process Communication (IPC) components (semaphore, queue, event, mailbox, etc.)
  • CPU schedulers (priority, round-robin)
  • Memory management module (gmm)
  • Deadlock detection module (ddm)
Hardware
  • SoC Lock Cache for fast IPC (SoCLC)
  • Dynamic Memory Management Unit (SoCDMMU)
  • Deadlock Detection Unit (SoCDDU)

SLIDE 15

Implementation (1/8)

[Figure callouts: SW module linking method; IPC module linking method; HW integration method; SW may override task size]

SLIDE 16

Implementation (2/8)

The user
  • Selects the deadlock detection SW module, semaphores for synchronization, and SoCLC for critical sections
  • Clicks the Generate button
The tool
  • Generates Makefile & User.h and a Verilog header file

[Figure: Example Use of GUI Tool]

SLIDE 17

Implementation (3/8)

1) Software module linking method
  • Command-line object inclusion method (well-known)
  • Used when modules provide the same function with different implementations
  • Implemented in Makefile
  • Used for linking the deadlock detection SW module in the example

SLIDE 18

Implementation (4/8)

1) Software module linking method (cont’d)

[Figure labels: GUI Tool, Software component selection, Makefile, Making Stage, Linking Stage, ddm.o, gmm.o, X, ddutest.x]

Makefile variables:
    OTHER_OBJS = ddutest.o …
    OPT_OBJ1 = ddm.o
    OPT_OBJ2 = (blank)

Link rule:
    $(LD) -o $@ $(OTHER_OBJS) $(OPT_OBJ1) $(OPT_OBJ2) $(LIBRARY)
which expands to:
    gcc -o ddutest.x [all other objects including ddutest.o] ddm.o atalanta.a
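
As a rough illustration of the command-line object inclusion method, the sketch below turns a set of selected software modules into the `OPT_OBJ` assignments and shows how an empty variable simply drops out of the link command. The function names and the fixed slot order (ddm in `OPT_OBJ1`, gmm in `OPT_OBJ2`) are assumptions made for this example; the slides do not show how the GUI tool is implemented.

```python
def makefile_object_vars(selected_modules, optional_modules=("ddm", "gmm")):
    """Return one OPT_OBJn assignment per optional module (slot order is assumed)."""
    lines = []
    for index, name in enumerate(optional_modules, start=1):
        value = f"{name}.o" if name in selected_modules else ""
        # An unselected module leaves its variable blank, as OPT_OBJ2 is on the slide.
        lines.append(f"OPT_OBJ{index} = {value}".rstrip())
    return lines


def link_command(target, other_objs, opt_objs, library, ld="gcc"):
    """Mimic how make expands the link rule: blank variables contribute nothing."""
    parts = [ld, "-o", target, *other_objs, *[obj for obj in opt_objs if obj], library]
    return " ".join(parts)


print("\n".join(makefile_object_vars({"ddm"})))
print(link_command("ddutest.x", ["ddutest.o"], ["ddm.o", ""], "atalanta.a"))
```

Because the choice is made purely by which object files appear on the link line, swapping one implementation for another needs no change to the application source.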

SLIDE 19

Implementation (5/8)

2) Inter-process communication module linking method
  • Library function linking method (common)
  • Implemented in User.h
  • Used for the semaphore function in the example

SLIDE 20

Implementation (6/8)

2) IPC module linking method (cont’d)

[Figure: Selection flow of IPC methods. Labels: GUI Tool; Generated Configuration (#define semaphores TRUE); Making Stage (user.c, user.h included, pre-processing, user.i, user.o; application ddutest.c, ddutest.o); Linking Stage with Library (Atalanta.a: Semaphore functions, Queue functions, Mailbox functions, Event functions, Other functions, …); ddutest.x]
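
To make the User.h step concrete, here is a minimal sketch that renders one `#define` per IPC component from a set of selections. Only `#define semaphores TRUE` appears on the slide; the other macro names, the include guard, and the `FALSE` value for unselected components are assumptions made for illustration.

```python
# Only "semaphores" is taken from the slide; the remaining names are hypothetical.
IPC_COMPONENTS = ("semaphores", "queues", "mailboxes", "events")


def render_user_h(selected):
    """Return the text of a user.h that switches IPC components on or off."""
    lines = ["#ifndef USER_H", "#define USER_H", ""]
    for name in IPC_COMPONENTS:
        flag = "TRUE" if name in selected else "FALSE"
        lines.append(f"#define {name} {flag}")
    lines += ["", "#endif /* USER_H */"]
    return "\n".join(lines) + "\n"


print(render_user_h({"semaphores"}))
```

The pre-processor then sees these flags when user.c is compiled, so only the library functions for the selected components are referenced at link time.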

SLIDE 21

Implementation (7/8)

3) HW RTOS component integration method

Novel HW integration method
Construct a Verilog header file
Integrate user-selected HW RTOS components into the Base architecture
Start with an SoCLC architecture description (an example with SoCLC)

[Figure labels: MPC750-1, MPC750-2, MPC750-3, MPC750-4 (each with L1); SoCLC; Reconfig. Logic; Memory controller and memory; Arbiter, Intr. controller, Clock]

SLIDE 22

Implementation (8/8)

Verilog header file generation example

Start with SoCLC description
(i) Extract modules from the IP Library: module PE ~~~ endmodule, module clock ~~~ endmodule, module soclc ~~~ endmodule
(ii) Add wires: wires and signals are added after the extracted modules
(iii) Insert the instantiation code for each module: PEs 1,2,3,…; Memory 1,2,…; SoCLC; Arbiter; Clock
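
The three steps above can be sketched as a small text-assembly routine. Everything concrete here is hypothetical: the regular expression used to find modules, the port names, and the `soc_top` wrapper module that holds the wires and instances (the slide does not say where they are placed). It is meant only to show the extract, wire, instantiate order.

```python
import re

# Matches one module definition, from "module <name>" at the start of a line
# up to the next "endmodule" at the start of a line.
MODULE_RE = re.compile(r"^module\s+(\w+)\b.*?^endmodule\b", re.DOTALL | re.MULTILINE)


def extract_modules(library_text, wanted):
    """Step (i): pull the wanted module definitions out of the IP library text."""
    found = {m.group(1): m.group(0) for m in MODULE_RE.finditer(library_text)}
    return [found[name] for name in wanted]


def build_header(library_text, instances, wires, top_name="soc_top"):
    """instances is a list of (module_name, instance_name, {port: signal})."""
    wanted = list(dict.fromkeys(module for module, _, _ in instances))
    parts = extract_modules(library_text, wanted)                      # (i)
    body = [f"  wire {name};" for name in wires]                       # (ii)
    for module, inst, ports in instances:                              # (iii)
        conns = ", ".join(f".{port}({signal})" for port, signal in ports.items())
        body.append(f"  {module} {inst} ({conns});")
    # Assumption: wires and instances live inside a wrapper module.
    parts.append("\n".join([f"module {top_name};", *body, "endmodule"]))
    return "\n\n".join(parts) + "\n"


ip_library = """module PE(input clk);
endmodule

module clock(output clk);
endmodule

module soclc(input clk);
endmodule

module socddu(input clk);
endmodule
"""

print(build_header(
    ip_library,
    [("PE", "pe1", {"clk": "clk"}), ("PE", "pe2", {"clk": "clk"}),
     ("clock", "clk0", {"clk": "clk"}), ("soclc", "lc0", {"clk": "clk"})],
    wires=["clk"],
))
```

Modules that were not selected (here the placeholder `socddu`) are never copied into the output, which is how an unselected hardware component stays out of the design.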

SLIDE 23

Experimental Setup

Five custom RTOSes
  • RTOS1: With semaphores and spin-locks, no HW components in the RTOS
  • RTOS2: With SoCLC, no SW IPCs
  • RTOS3: With deadlock detection software, no HW RTOS components
  • RTOS4: With SoCDDU, no SW IPCs
  • RTOS5: With SoCLC and SoCDDU
Each with the Base architecture
  • 4 MPC750 processors
  • Reconfigurable logic
  • Single bus
Each with application(s)
Each executable in Seamless CVE

[Figure labels: User Input; GUI tool; Hardware RTOS library; Software RTOS library; Base Architecture library; RTOS1 (SW RTOS w/ sem), RTOS2 (SW RTOS + SoCLC), RTOS3 (SW RTOS w/ ddm), RTOS4 (SW RTOS + SoCDDU), RTOS5 (SW RTOS + SoCLC, SoCDDU); Application; Compile Stage for each system; Executable HW file for each; Executable SW file for each; Simulation in Seamless CVE; VCS; XRAY]

SLIDE 24

Experimental Results (1/4)

Example 1: Database transaction application [1]

[Figure labels: Client; Server; transaction1, transaction2, transaction3, transaction4; long_Req1, long_Req3, short_Req3, short_Req4; objects O2, O3, O4; "Access of Object O2 by transaction1"; "Access of Object O4 by transaction3"]

[1] M. A. Olson, “Selecting and implementing an embedded database system,” IEEE Computer, pp. 27-34, September 2000.

SLIDE 25

Experimental Results (2/4)

Comparison with database application example [2]
  • RTOS1 with semaphores and spin-locks
  • RTOS2 with SoCLC, no SW semaphores or spin-locks

(clock cycles)    Without SoCLC *    With SoCLC    Speedup
Lock Latency      1200               908           1.32x
Lock Delay        47264              23590         2.00x
Execution Time    36.9M              29M           1.27x

* Semaphores for long critical sections (CSes) and spin-locks for short CSes are used instead of SoCLC.

[2] B. S. Akgul, J. Lee and V. Mooney, “System-on-a-chip processor synchronization hardware unit with task preemption support,” CASES ’01, pp. 149-157, November 2001.
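
The speedup column is consistent with the ratio of the "Without SoCLC" figure to the "With SoCLC" figure. A short check, using only the numbers on the slide (the helper name is an illustration, not part of the original work):

```python
def speedup(baseline_cycles, improved_cycles):
    """Speedup expressed as baseline divided by improved."""
    return baseline_cycles / improved_cycles


measurements = {
    "Lock Latency": (1200, 908),
    "Lock Delay": (47264, 23590),
    "Execution Time": (36.9e6, 29e6),
}

for name, (without_soclc, with_soclc) in measurements.items():
    print(f"{name}: {speedup(without_soclc, with_soclc):.2f}x")
```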

SLIDE 26

Experimental Results (3/4)

Example 2: Interactions between multiple processors and resources [3]

[Figure labels: MPC750-1, MPC750-2, MPC750-3, MPC750-4; FFT (R1), MPEG (R2), PCI (R3), WI (R4); DDU; Reconfig. Logic; Memory controller and memory; Arbiter, Intr. controller, Clock]

[3] S. Morgan, “Jini to the rescue,” IEEE Spectrum, 37(4), pp. 44-49, April 2000.

SLIDE 27

Experimental Results (4/4)

Comparison with deadlock detection example [4]
  • RTOS3 with a software deadlock detection module, no HW RTOS
  • RTOS4 with SoCDDU

Method of Deadlock Detection    Detection Time ∆ (clock cycles)    Execution time up to deadlock detection
Software Algorithm              16928                              61131
SoCDDU                          2                                  44205
Speedup                         8463x                              1.38x

[4] P. H. Shiu, Y. Tan and V. Mooney, “A novel parallel deadlock detection algorithm and architecture,” CODES ’01, pp. 30-36, April 2001.
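
A quick arithmetic check of these figures, using only the numbers on the slide. The direct ratio of the two detection times comes out at 8464, one more than the 8463x shown; the slide's value equals (16928 − 2) / 2, so it may have been computed from the cycles saved rather than as a plain ratio, but that reading is a guess. The variable names are illustrative.

```python
software_detect, socddu_detect = 16928, 2   # detection time in clock cycles
software_exec, socddu_exec = 61131, 44205   # execution time up to deadlock detection

detect_ratio = software_detect / socddu_detect                  # plain ratio
cycles_saved_ratio = (software_detect - socddu_detect) / socddu_detect
reduction_percent = 100 * (1 - socddu_detect / software_detect)
exec_ratio = software_exec / socddu_exec

print(f"detection ratio: {detect_ratio:.0f}x")
print(f"(saved cycles) / (SoCDDU cycles): {cycles_saved_ratio:.0f}x")
print(f"detection time reduction: {reduction_percent:.2f}%")
print(f"execution time ratio: {exec_ratio:.2f}x")
```

The percentage reduction is in line with the "99%" claim made earlier in the deck for detection time.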

SLIDE 28

Hardware Area

Total area in                             SoCLC (for 64 short CS locks + 64 long CS locks)      SoCDDU (for 5 processors x 5 resources)
Semi-custom VLSI                          7435 gates using TSMC 0.25µm standard cell library    364 gates using AMI 0.3µm standard cell library
Xilinx XC4000E 4003EPC84, # Seq. logic    532                                                   10
Xilinx XC4000E 4003EPC84, # Other gates   9036                                                  559

SLIDE 29

Conclusion

A framework for automatic generation of configuration files for a custom HW/SW RTOS
A novel HW header file generation methodology
Experimental results showing
  • the configured systems are correct
A framework used to explore the RTOS design space
Future work
  • support for heterogeneous processors
  • support for multiple bus systems/structures