JOP: A Java Optimized Processor for Embedded Real-Time Systems



SLIDE 1

JOP: A Java Optimized Processor for Embedded Real-Time Systems

Martin Schöberl University of Technology Vienna, Austria

SLIDE 2

Embedded Java Systems Java Optimized Processor 2

Overview

  • Motivation
  • Related work
  • JOP architecture
  • WCET analysis
  • Results
  • Conclusions, future work
  • Demo

SLIDE 3

Embedded Systems

An embedded system is a computer system that is part of a larger system.

Examples

  • Washing machine
  • Car engine control
  • Mobile phone

SLIDE 4

Real-Time Systems

A definition by John A. Stankovic:

In real-time computing the correctness of the system depends not only on the logical result of the computation but also on the time at which the result is produced.

SLIDE 5

Real-Time Systems

  • Imagine a car accident
      • What happens when the airbag is fired too late?
      • Even one ms too late is too late!
  • Timing is an important property
  • Conservative programming styles

SLIDE 6

RT System Properties

  • Often safety critical
  • Execution time has to be known
      • Analyzable system
          • Application software
          • Scheduling
          • Hardware properties
      • Worst case execution time (WCET)

SLIDE 7

Issues with COTS

  • COTS (commercial off-the-shelf) processors are optimized for average-case performance
      • Make the common case fast
      • Very complex to analyze WCET
          • Pipeline
          • Cache
          • Multiple execution units

SLIDE 8

The Idea

  • Build a processor for RT systems
      • Optimize for the worst case
  • Design philosophy
      • Only WCET analyzable features
          • No unbounded pipeline effects
          • New cache structure
      • Shall not be slow

SLIDE 9

Related Work

  • picoJava
      • Sun, never released
  • aJile JEMCore
      • Available, RTSJ, two versions
  • Komodo
      • Multithreaded Java processor
  • FemtoJava
      • Application specific processor

SLIDE 10

JOP Architecture

  • Overview
  • Microcode
  • Processor pipeline
  • An efficient stack machine
  • Instruction cache

SLIDE 11

JOP Block Diagram

SLIDE 12

JVM Bytecode Issue

  • Simple and complex instruction mix
  • No bytecodes for native functions
  • Common solution (e.g. in picoJava):
      • Implement a subset of the bytecodes
      • SW trap on complex instructions
      • Overhead for the trap: 16 to 926 cycles
      • Additional instructions (115!)

SLIDE 13

JOP Solution

  • Translation to microcode in hardware
  • Additional pipeline stage
  • No overhead for complex bytecodes
      • 1 to 1 mapping results in single cycle execution
      • Microcode sequence for more complex bytecodes
  • Bytecodes can be implemented in Java
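
The translation step can be pictured as a table lookup from a bytecode to the start address of its microcode. The following is a minimal sketch of that idea, not JOP's actual hardware table: the opcode values are the standard JVM ones, but every microcode address, the `ADDR_JAVA_HANDLER` entry, and the class name `JumpTableSketch` are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

/** Sketch of a bytecode-to-microcode jump table (all addresses are hypothetical). */
final class JumpTableSketch {
    // Standard JVM opcode values.
    static final int OP_DUP = 0x59;
    static final int OP_DUP_X1 = 0x5a;
    static final int OP_IADD = 0x60;
    static final int OP_NEW = 0xbb;

    // Hypothetical microcode ROM addresses, invented for this example.
    static final int ADDR_DUP = 0x010;
    static final int ADDR_DUP_X1 = 0x011;
    static final int ADDR_IADD = 0x016;
    // Assumed entry point of a microcode routine that calls a Java method
    // for bytecodes that are implemented in Java.
    static final int ADDR_JAVA_HANDLER = 0x100;

    private static final int[] TABLE = new int[256];

    static {
        // Every bytecode without its own microcode falls back to the Java handler.
        Arrays.fill(TABLE, ADDR_JAVA_HANDLER);
        TABLE[OP_DUP] = ADDR_DUP;
        TABLE[OP_DUP_X1] = ADDR_DUP_X1;
        TABLE[OP_IADD] = ADDR_IADD;
    }

    /** Returns the microcode start address for the given bytecode. */
    static int microcodeAddress(int opcode) {
        return TABLE[opcode & 0xff];
    }

    public static void main(String[] args) {
        System.out.printf("dup -> 0x%03x%n", microcodeAddress(OP_DUP));
        System.out.printf("new -> 0x%03x%n", microcodeAddress(OP_NEW));
    }
}
```

In this sketch a simple bytecode and a complex one cost the same lookup, which is the point of doing the translation in a dedicated pipeline stage rather than through a software trap.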

SLIDE 14

Microcode

  • Stack-oriented
  • Compact
  • Constant length
  • Single cycle
  • Low-level HW access

An example

    dup:     dup nxt      // 1 to 1 mapping

    // a and b are scratch variables
    // for the JVM code.
    dup_x1:  stm a        // save TOS
             stm b        // and TOS−1
             ldm a        // duplicate TOS
             ldm b        // restore TOS−1
             ldm a nxt    // restore TOS
                          // and fetch next bytecode
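
To check what the `dup_x1` sequence does to the stack, the five microinstructions can be replayed in software. This is a minimal sketch, assuming `stm` pops the top of stack into a scratch variable and `ldm` pushes a scratch variable back; the class `DupX1Sketch` and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Replays the dup_x1 microcode sequence on a software stack (illustrative only). */
final class DupX1Sketch {
    private final Deque<Integer> stack = new ArrayDeque<>();
    private int a; // scratch variable a
    private int b; // scratch variable b

    void push(int value) { stack.push(value); }

    // Assumed semantics: stm pops TOS into a scratch variable, ldm pushes it back.
    private void stmA() { a = stack.pop(); }
    private void stmB() { b = stack.pop(); }
    private void ldmA() { stack.push(a); }
    private void ldmB() { stack.push(b); }

    /** The five microinstructions of dup_x1, in the order given on the slide. */
    void dupX1() {
        stmA(); // save TOS
        stmB(); // and TOS-1
        ldmA(); // duplicate TOS
        ldmB(); // restore TOS-1
        ldmA(); // restore TOS
    }

    /** Stack contents from bottom to top. */
    int[] contents() {
        int[] out = new int[stack.size()];
        int i = out.length;
        for (int value : stack) { // ArrayDeque iterates from top to bottom
            out[--i] = value;
        }
        return out;
    }

    /** Pushes tosMinus1 and then tos, runs dup_x1, and returns the resulting stack. */
    static int[] run(int tosMinus1, int tos) {
        DupX1Sketch s = new DupX1Sketch();
        s.push(tosMinus1);
        s.push(tos);
        s.dupX1();
        return s.contents();
    }
}
```

Calling `DupX1Sketch.run(1, 2)` leaves 2, 1, 2 on the stack (bottom to top), which matches the JVM definition of `dup_x1`: the top value is copied beneath the value below it.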

SLIDE 15

Processor Pipeline

SLIDE 16

An Efficient Stack Machine

  • JVM stack is a logical stack
      • Frame for return information
      • Local variable area
      • Operand stack
  • Argument-passing regulates the layout
  • Operand stack and local variables need caching

SLIDE 17

Stack Access

  • Stack operation
      • Read TOS and TOS-1
      • Execute
      • Write back TOS
  • Variable load
      • Read from deeper stack location
      • Write into TOS
  • Variable store
      • Read TOS
      • Write into deeper stack location

SLIDE 18

Two-Level Stack Cache

  • Dual read only from TOS and TOS-1
  • Two registers (A/B)
  • Dual-port memory
  • Simpler Pipeline
  • No forwarding logic
  • Instruction fetch
  • Instruction decode
  • Execute, load or store
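
The register and memory traffic of the two-level stack cache can be sketched in software. This is a minimal model under stated assumptions, not the hardware design: register `a` holds TOS, register `b` holds TOS-1, the rest of the stack and the local variables live in one array standing in for the dual-port memory, and each operation performs at most one memory read and one memory write. The class `TwoLevelStackSketch` and its method names are hypothetical.

```java
/** Software model of a two-level stack cache: registers A/B plus stack RAM (illustrative). */
final class TwoLevelStackSketch {
    private final int[] ram = new int[64]; // stands in for the dual-port stack memory
    private final int vp = 0;              // base address of the local variables
    private int sp;                        // address of TOS-2 in ram
    private int a;                         // register A = TOS
    private int b;                         // register B = TOS-1

    TwoLevelStackSketch(int numLocals) {
        sp = vp + numLocals - 1; // operand stack starts right above the locals
    }

    /** Push a constant: B spills to RAM (one write), A moves to B. */
    void push(int value) {
        ram[++sp] = b;
        b = a;
        a = value;
    }

    /** Variable load: read from a deeper stack location, write into TOS. */
    void load(int n) {
        ram[++sp] = b;   // one memory write
        b = a;
        a = ram[vp + n]; // one memory read
    }

    /** Variable store: read TOS, write into a deeper stack location. */
    void store(int n) {
        ram[vp + n] = a; // one memory write
        a = b;
        b = ram[sp--];   // one memory read refills B
    }

    /** Stack operation: read TOS and TOS-1 from the registers, write back TOS. */
    void add() {
        a = a + b;
        b = ram[sp--];   // one memory read refills B
    }

    int tos() { return a; }

    /** Computes x + y with x and y held in local variables 0 and 1. */
    static int addLocals(int x, int y) {
        TwoLevelStackSketch s = new TwoLevelStackSketch(2);
        s.push(x); s.store(0);
        s.push(y); s.store(1);
        s.load(0); s.load(1);
        s.add();
        return s.tos();
    }
}
```

Because both ALU operands always come from `a` and `b`, the model never needs two memory reads in one step, which is why the slide can list a dual-port memory and no forwarding logic.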

SLIDE 19

JVM Properties

  • Short methods
  • Maximum method size is restricted
  • No branches out of or into a method
  • Only relative branches

SLIDE 20

Proposed Cache Solution

  • Full method cached
  • Cache fill on call and return
      • Cache misses only at these bytecodes
  • Relative addressing
      • No address translation necessary
  • No fast tag memory
  • Simpler WCET analysis
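
The behaviour described above can be illustrated with a small simulation. This is a simplified sketch, assuming each cache block holds exactly one complete method and blocks are replaced in FIFO order; the real replacement policy and block sizes are not given on the slide, and the class `MethodCacheSketch` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified method cache: one full method per block, FIFO replacement (assumed policy). */
final class MethodCacheSketch {
    private final int blocks;
    private final Deque<String> cached = new ArrayDeque<>();
    private int misses;

    MethodCacheSketch(int blocks) {
        this.blocks = blocks;
    }

    /**
     * Called only on invoke and on return, with the method that becomes current.
     * Returns true on a hit. On a miss the whole method is loaded.
     */
    boolean enter(String method) {
        if (cached.contains(method)) {
            return true;
        }
        if (cached.size() == blocks) {
            cached.removeFirst(); // evict the oldest method
        }
        cached.addLast(method);
        misses++;
        return false;
    }

    int misses() {
        return misses;
    }

    /** Counts misses for a trace of methods entered by invoke or return. */
    static int countMisses(int blocks, String... trace) {
        MethodCacheSketch cache = new MethodCacheSketch(blocks);
        for (String method : trace) {
            cache.enter(method);
        }
        return cache.misses();
    }
}
```

For the trace A, B, A, B, A (method A calling B twice), `countMisses(2, ...)` gives 2 and `countMisses(1, ...)` gives 5. Every other instruction fetch is a guaranteed hit, so an analysis only has to classify invoke and return bytecodes.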

SLIDE 21

Architecture Summary

  • Microcode
  • 1+3 stage pipeline
  • Two-level stack cache
  • Method cache

The JVM is a CISC stack architecture, whereas JOP is a RISC stack architecture.

SLIDE 22

WCET Analysis

  • WCET has to be known
      • Needed for schedulability analysis
      • Measurement usually not possible
          • Would require test of all possible cases
  • Static analysis
      • Theory is mature
      • Low-level analysis is the issue

SLIDE 23

WCET Analysis

  • Path analysis
  • Low-level analysis (bytecodes)
  • Global low-level analysis
  • WCET calculation

SLIDE 24

WCET Analysis for JOP

  • Simple low-level analysis
  • Bytecodes are independent
      • No shared state
      • No timing anomalies
  • Bytecode timing is known and documented
  • Simpler caches
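
Because bytecodes are independent, the execution time of a straight-line block is simply the sum of its bytecode times. The sketch below shows that calculation. The two cycle counts are an assumption derived from the microcode slide (one single-cycle microinstruction for `dup`, five for `dup_x1`), not values taken from JOP's timing documentation, and the class `BlockTimeSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sums per-bytecode cycle counts for a basic block (cycle values are assumptions). */
final class BlockTimeSketch {
    private static final Map<String, Integer> CYCLES = new HashMap<>();

    static {
        // Assumed: one cycle per microinstruction in the sequences shown on the microcode slide.
        CYCLES.put("dup", 1);
        CYCLES.put("dup_x1", 5);
    }

    /** Returns the cycle count of a block; no state is carried between bytecodes. */
    static int blockCycles(String... bytecodes) {
        int total = 0;
        for (String bytecode : bytecodes) {
            Integer cycles = CYCLES.get(bytecode);
            if (cycles == null) {
                throw new IllegalArgumentException("no timing known for " + bytecode);
            }
            total += cycles;
        }
        return total;
    }
}
```

With these assumed values, `blockCycles("dup", "dup_x1")` is 6 regardless of what executed before the block.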

SLIDE 25

WCET Tool

  • Execution time of basic blocks
  • Annotated loop bounds
  • ILP problem solved
  • Simple cache analysis included
      • Only two block cache in loops
      • Will be extended
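
How basic block times and loop bounds combine into a WCET bound can be shown with a small calculation. Note that this sketch uses a tree-based timing schema (sequence, alternative, loop), not the ILP formulation the tool solves; for well-structured code both give a bound built from the same inputs. All cycle counts and the class `WcetSchemaSketch` are hypothetical.

```java
/** Tree-based WCET bound from block times and loop bounds (not the tool's ILP approach). */
final class WcetSchemaSketch {

    /** Sequence: the times of the parts add up. */
    static long seq(long... parts) {
        long total = 0;
        for (long part : parts) {
            total += part;
        }
        return total;
    }

    /** Alternative: the condition plus the more expensive branch. */
    static long alt(long condition, long thenBranch, long elseBranch) {
        return condition + Math.max(thenBranch, elseBranch);
    }

    /** Loop: the condition runs bound + 1 times, the body at most bound times. */
    static long loop(long bound, long condition, long body) {
        return (bound + 1) * condition + bound * body;
    }

    /** Hypothetical program: init; loop (bound 10) { if ... else ... }; exit. */
    static long example() {
        long init = 5;
        long body = alt(2, 7, 12);        // 2 + max(7, 12) = 14
        long loopTime = loop(10, 3, body); // 11 * 3 + 10 * 14 = 173
        long exit = 2;
        return seq(init, loopTime, exit);  // 5 + 173 + 2 = 180
    }
}
```

The annotated loop bound (10 here) is the only input that cannot be derived from the block times alone, which is why the tool asks for it.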

SLIDE 26

Results

  • Size
      • Compared to soft-core processors
  • General performance
      • Application benchmark (KFL & UDP/IP)
      • Various Java systems

SLIDE 27

Size of FPGA processors

Processor    Resources (LC)   Memory (KB)   fmax (MHz)
JOP min.     1077             3.25          98
JOP typ.     1831             3.25          101
Lightfoot    3400             1             40
Komodo       2600             ?             33/4
FemtoJava    2000             ?             4
NIOS         2923             5.5           119

SLIDE 28

Application Benchmark

[Bar chart: performance in iterations/s on a logarithmic axis from 1 to 1,000,000 for JOP, leJOS, TINI, Komodo, JStamp, SaJe, EJC, Sun jvm, gcj, and Xint]

SLIDE 29

Applications

  • Kippfahrleitung (tilting overhead contact line)
      • Distributed motor control
  • ÖBB
      • Simplified train control system (Vereinfachtes Zugleitsystem)
      • GPS, GPRS, supervision
  • TeleAlarm
      • Remote tele-control
      • Data logging
      • Automation

SLIDE 30

JOP in Research

  • University of Lund, SE
      • Application specific hardware (Java -> VHDL)
      • Hardware garbage collector
  • Technical University Graz, AT
      • HW accelerator for encryption
  • University of York, GB
      • Javamen: HW for real-time systems
  • Institute of Informatics at CBS, DK
      • Real-time GC
      • Embedded RT Machine Learning

SLIDE 31

JOP for Teaching

  • Easy access: open source
      • Computer architecture
      • Embedded systems
  • UT Vienna
      • JVM in hardware course
      • Digital signal processing lab
  • CBS
      • Distributed data mining (WS 2005)
      • Very small information systems (SS 2006)
  • Wikiversity

SLIDE 32

Conclusions

  • Real-time Java processor
      • Exactly known execution time of the bytecodes
      • Time-predictable method cache
      • Simple real-time profile
  • Resource-constrained processor
      • RISC stack architecture
      • Efficient stack cache
      • Flexible architecture

SLIDE 33

Future Work

  • Real-time garbage collector
  • Instruction cache WC analysis
  • Hardware accelerator
  • Multiprocessor JVM
  • Java computer

SLIDE 34

More Information

  • Two-page short paper
  • JOP thesis and source
      • http://www.jopdesign.com/thesis/index.jsp
      • http://www.jopdesign.com/download.jsp
  • Various papers
      • http://www.jopdesign.com/docu.jsp