SLIDE 1 An Approach to Programming Configurable Computers for Numeric Applications* A simple Language for Algorithms on the Reals
- F. Mayer-Lindenberg, TUHH
* computer components with applications of their own:
- 1. compute units (ALUs) 2. memory 3. control automata
SLIDE 2
von Neumann architecture vs. FPGA architecture / digital circuits / embedded computing. 'Programming':
formal definition of computations composed of many* arithmetic operations ('algorithms'), to be automatically transformed to control the processing on a machine, including the data flow
* a finite number, maybe unbounded if depending on the data
[Diagram, von Neumann architecture: memory holding the instruction list (machine program), a control automaton (CA), ALUs for arithmetic on number codes, and data memory; arbitrary memory contents → (limited) universality]
[Diagram, FPGA architecture: chips integrating sets of micro modules (ALU primitives and memory, configurable, e.g. full adders) with a configurable interconnection network for the micro modules; configurability → universality; permits configurations of ALU circuits for various number codes and CAs]
SLIDE 3 Levels of programming
machine resources: compute units of maybe different types; memory, I/O (input/output)
application programming: algorithms
execution control: control of parallelism/timing, distributing operations to ALUs and data to memory
vN computer:
  a few ALUs for a few number codes, fixed number types (not programmable)
  large memories (hidden hierarchy), automatic caching
  HW oriented types, SIMD/MIMD with control by the OS
  using libraries, OS; PL/PS control for threads, comm.
FPGA:
  can implement many types of ALUs for diverse number codes (mandatory for eff. use of resources)
  control automata/soft processors, int./ext. interfaces, network functions
  configuration could change dynamically; using multiple number codes
  may need simulation (number codes, I/O) and error handling
  available ALU circuits work in parallel at high rates executing threads
  single FPGA needs processor and memory allocation, communications
SLIDE 4 Approach to the application programming of networks of standard and FPGA based procs:
( → requirements on a programming language for numeric applications .. emb'd. and HPC)
1) The implementation of new number codes and of ALU circuits/soft processors, interfaces and IO automata based on FPGA shall not be supported. Number codes are simply selected; ALUs, IO automata, and networking functions are separately developed and configured system components. FPGA applications build on libraries of predefined configurations.
2) Compiling an application requires the specification of the available programmable processors and automata. Target systems are heterogeneous networks of processors.
3) The multi-threading and the distribution of data and operations to the ALUs and memory are specified, allowing for automatic optimizations. Timing conditions must be explicit.
4) Apart from the specification of the target networks and the usage of their resources, 'only' the formal definition of numerical algorithms is needed, preferably given abstractly and in some notation close to the mathematical one. For embedded applications any PL will be compared to PLs like 'C' regarding simplicity and compilation.
4a) The diverse number codes are not treated as individual types with operations and conversions of their own. Instead, a single abstract type of real number is used with the error-free arithmetic operations. The diverse number codes are only represented by the corresponding rounding operations on the reals. There is no need for pointers, bit fields or Boolean data.
4b) As algorithms encompassing many numeric operations have to be supported, the tuple sets IRn are available with extra tuple operations and optional roundings. Tuple ops are useful to eliminate loops, and can be implemented efficiently.
SLIDE 5 A large group of projects – efficiently usable computer systems
  efficiency in the usage of the HW .. eng/sci
  efficient application programming .. e:mult.sol.
  .. also for the usage of FPGA components .. std/acc.
→ modular processor based on FPGA (HW architecture)
  standardized control automaton
  ALU modules for various number codes
  composite operations, vector data
→ programming language π-Nets (small/simple) (F&PL&Compiler)
  for numeric applications on processor networks
  parallelism supported by processes, realtime functions
  implementation of a compiler and a prog/sim environment
→ FPGA based heterogeneous processor networks (S-Archit., OS)
  network/system architecture, infrastructure includes ser.ctl.
  exper. platforms for parallel and distributed computing
  and for evaluating the system architecture and the PL
SLIDE 6 Soft controllers / modular soft processors
Required to:
- support non-standard ALUs, wide data codes
- provide maximum ALU efficiency, parallel CF and mem. acc.
- support multiple threads
- be a low complexity circuit to allow for large MIMD sub nets
- have a simple memory architecture
[Block diagram, soft controller: control circuit with IMEM (on-chip BRAM), address registers, registers & interface, arithmetic pipelines, DMEM (on-chip BRAM); the ALU I/O data path attaches to the soft controller; host I/O via DMA port]
- controller performs instruction sequencing, memory control and I/O, 4 threads
- ALU data word size independent from controller address/index word sizes
- VLIW type instructions for ALU ops. executed in parallel with controller ops.
- no memory bus, no cache/MMU, DMA supported I/O to ext.mem.(SW caching)
- controller design adapted to FPGA resources
SLIDE 7
Example: Floating point ALU / data path attaching to soft controller (Spartan-6)
- 45-bit number codes: 34-bit mantissa+sign, 9-bit exponent, no non-normals, round → 0
- supporting parallel chained +/* operations and dual memory accesses (effic. dot product)
- registers and data RAM are 45-bit wide, data RAM with one 'rw', one 'r' port, 4 threads
[Data path diagram: registers D0..D15, a 45-bit '+' pipeline and a '*' pipeline, DMEM read and write ports with select logic, conversion (cvt) units, and flag data to the controller]
SLIDE 8
V144-ALU: 144-bit data size (4-vectors), fixed/BFP, 16 regs/ctx, separate exponents
Arithmetic operations: 18-bit instruction codes for data path (SP-6)
  110 010 0rrr tttt ssss   dr=(dt,ds)2      .. 2-f.SIMD dbl. .. no par.transfer, n.f.
  110 011 0rrr tttt ssss   dr+=(dt,ds)2     .. 2-f.SIMD dbl. .. no par.transfer, n.f.
  110 000 0rrr tttt ssss   dr=½(dt,ds)      .. dbl. .. no par.transfer, n.f.
  *** 000 *rrr tttt 1*01   dr=bfly(dt)      .. uses extra pars from ctrl word
  *** 000 *rrr tttt 1*10   drl=½(dtl+dth)
  *** 000 *rrr tttt 1*11   drl=dsum(dt)     .. double prec. add
  *** 001 *rrr tttt ssss   dr=ds*dt         .. SIMD
  *** 010 *rrr tttt ssss   dr=ds+dt         .. SIMD
  *** 011 *rrr tttt ssss   dr=dt–ds         .. SIMD
  *** 100 *rrr tttt ssss   drl=dtl*dsl      .. cmpl. mpy .. not fused on SP-6
  *** 101 *rrr tttt ssss   drh=dtl*dsh      .. cmpl. mpy .. not fused on SP-6
  *** 110 *rrr tttt ssss   drh+/2=dth*dsl   .. quat. mpy .. not fused on SP-6
  *** 111 *rrr tttt ssss   drl+/2=dth*dsh   .. quat. mpy .. not fused on SP-6
Parallel transfer operations, using reg codes from controller instruction:
  001 *** 1 *** **** ****  shift            .. normalize
  010 *** 1 *** **** ****  copy (dr=dt)
  011 *** 1 *** **** ****  conj (dr=conj(dt))
  100 *** 1 *** **** ****  rsh              .. shift direction from controller instr
  110 *** 1 *** **** ****  wshc (r/w sh cnt) .. access associated 9-bit count regs
.. ALU w/o BF uses 12 multipliers, 5500 LUTs incl. controller, fits into XC6SLX9
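The field layout read off the table above (3+3+4+4+4 = 18 bits) can be sketched as a small decoder. This is an illustration only; the field names (`op_hi`, `op_lo`, `r`, `t`, `s`) are ours, not part of the actual tool chain.

```python
# Minimal sketch: unpack an 18-bit V144 instruction word into the field
# groups shown in the table above (3+3+4+4+4 bits). Field names are
# illustrative, not taken from the real assembler.
def decode_v144(word):
    assert 0 <= word < (1 << 18)
    return {
        "op_hi": (word >> 15) & 0b111,   # first 3-bit opcode group
        "op_lo": (word >> 12) & 0b111,   # second 3-bit opcode group
        "r":     (word >> 8)  & 0b1111,  # flag bit + rrr destination reg
        "t":     (word >> 4)  & 0b1111,  # tttt source reg
        "s":     word         & 0b1111,  # ssss source reg / sub-opcode
    }

# e.g. '110 010 0rrr tttt ssss' (dr=(dt,ds)2) with r=3, t=5, s=7:
w = (0b110 << 15) | (0b010 << 12) | (0b0011 << 8) | (0b0101 << 4) | 0b0111
fields = decode_v144(w)
```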
SLIDE 9
π-Nets supports the various number codes by a unique type of real number
Coding data and operations on codes to be evaluated by processors:
Numbers need to be encoded by bit strings before they can be digitally computed with.
  enc: IR → B*   encoding function, partially defined
  dec: B* → IR   decoding function, partially defined
such that rnd = dec∘enc: IR → IR, the 'rounding' function, fulfils enc(r) = enc(rnd(r)), hence rnd∘rnd = rnd
Operations op: IR → IR are substituted by op' = enc∘op∘dec: B* → B* on the machine:
  for r = rnd(r):  dec(op'(enc(r))) = rnd(op(r))   (substitution inserts rounding)
Operations op: IR×IR → IR etc. are handled similarly.
Algorithms (compositions of operations) are executed on the machine by substituting every operation op on the reals by the corresponding op' on number codes.
Tuple codes can be different from tuples of codes. Selected op's can add extra approximation errors. Certain composite operations can be implemented as 'fused' operations w/o intermediate roundings.
π-Nets supports several standard and non-standard encodings including I32, X16, X35, V144, F32, F64, G45 .. to be expanded (rnd'd from int'l num) by their (unique) rounding operations only, and as attributes to the abstract computations performed by its processes, telling the compiler how to substitute operations.
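The enc/dec/rnd scheme can be illustrated outside π-Nets, e.g. in Python with the IEEE float32 code standing in for B* (the choice of code is ours, for illustration only):

```python
import struct

# Sketch of the enc/dec/rnd scheme above, using the IEEE float32 code
# as the bit-string set B* (illustrative; π-Nets targets other codes too).
def enc(r):                      # enc: IR -> B* (4 bytes)
    return struct.pack('<f', r)

def dec(b):                      # dec: B* -> IR
    return struct.unpack('<f', b)[0]

def rnd(r):                      # rnd = dec o enc, the rounding function
    return dec(enc(r))

# rnd is idempotent: rnd o rnd = rnd
assert rnd(rnd(0.1)) == rnd(0.1)
assert rnd(0.1) != 0.1           # 0.1 is not a float32 code point

# substituting an operation: op' = enc o op o dec acts on codes, and on
# already-rounded inputs r = rnd(r): dec(op'(enc(r))) = rnd(op(r))
def op(r):  return r * 3.0       # an operation on the reals
def op_(b): return enc(op(dec(b)))

r = rnd(0.1)
assert dec(op_(enc(r))) == rnd(op(r))
```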
SLIDE 10
Target architecture (example):
- red blocks: application specific ALUs
- fixed networking infrastructure/RTS in the FPGA
- disjoint memory partitions
(includes CAs, memory control, data commun./protocol impl., HIF)
[System diagram: PC connected via LAN to two FPGA SoCs; each SoC contains a NoC linking an ARM node, two memory controllers (MC) to DRAM, four controller + program RAM nodes (C+PR), and four application specific ALUs with data RAM (A1+DR .. A4+DR); the SoCs are also coupled by data networks]
SLIDE 11
Modeling an FPGA as a composite type of component (ntype/node statements of π-Nets)
hierarchic definition of the target network by the equivalent of a netlist; the FPGA program can be derived
.. defining the FPGA SoC with the standard infrastructure and its ARM nodes
ntype CSC (E) 8: E,              .. 8 E nodes (external links)
  (B) DB, NoC ↔ E DB             .. NoC connects all E nodes
  (M) M0 ↔ NoC, M1 ↔ NoC         .. 2 memory controllers
  (A) VGA ↔ DB                   .. fixed interface automaton
  (CA9) A0 ↔ M0 M1 DB, A1 ↔ A0   .. ARM type processor nodes
.. defining a single FPGA SoC target without soft processors, only its ARM nodes:
node (CSC) Q ↔ 0 LAN             .. predefined LAN bus links to Q A0 and predefined 'host'
.. extending the FPGA type by soft procs identified by the number codes they implement
.. focus on ALU .. serial ctrl is standard
ntype (CSC) CSC8 (g45) 4:P ↔ np P DB M0 ,   .. 4 soft procs implement g45
  (v144) 4:R ↔ nr R DB M1 npr P             .. 4 soft processors for v144
                                            .. np, nr, npr are link lists (tuples)
.. defining the dual FPGA system: .. can optionally use the 'host'
node (CSC8) Q0 ↔ 0 LAN, Q1 ↔ 0 0 Q0         .. Q1 linked to Q0 by E nodes
SLIDE 12 The π-Nets Language for numeric processing on the reals: Paradigms
close to everyday's mathematical syntax and semantics if applicable
- π-Nets distinguishes between 'data' and 'algorithms'
  data: finite function tables (tuples of numbers), sizes statically defined
  algorithms: compositions of elementary operations on numbers/tuples, statically defined
  constant function tables can be defined through algorithms
  both function tables and functions defined by algorithms can be partially defined
  tuple entries are numbers or 'invalid'; numeric ops deliver 'inv' on 'inv' input
  functions/algorithms always return a value
- π-Nets distinguishes between computational and non-computational operations
  computational: elementary operations on table entries (numbers) and tuples
  non-computational: evaluation and composition of function tables (index operations)
  interpolation from a function table is computational
- 'programs' define networks of automata to evaluate computational operations
  variables are special sub automata for storing tuples, restricted access
- no attempt to automate verification or to transform programs beyond const. folding
SLIDE 13 The built-in π-Nets data type(s) .. including the choice of built-in number/tuple operations
.. bias on modeling, robotics and DSP applications
Data set: IR – the real numbers
Literals: -1 = -1e0 = -1.00   .. plus symbolic/comp. cst.
Operations:
  + – * / % //              arithmetic w/o rounding errors/overflows
  >                         comparisons (no result, just select branches)
  sqrt ld exp sin cos atan  special functions, no approximation errors
  i32 x16 g45 etc.          rounding operations .. and a few more
.. scalar ops extend to tuples!
Data sets: IRn – n-tuples of numbers
Literals: (1,2,4,7,9) = (1.0, 2.00e0, 4, 7, 9)
Note: Tuples are not devices storing numbers but are abstract data
Operations:
  x.i x:m.i x.y x,y        component access, concatenation (nc) .. x.y:m.i
                           tuples x 'are' fcts on {0,..,n-1} applied by the x.i operation
  + – *, sum prod min max = vector operations and comparisons, roundings
  x y, A y                 dot product, matrix by vector, tensor product
  x /\ y                   exterior product by a vector, 3D vector prod.
  x.P y                    apply tuple as polynomial function, etc.
  x.fc                     interpolate from tuple (c) .. convert c ↔ nc
  some set operations (sets enum. as tuples)
- Comp. functions can be defined to add composite operations (pure fcts of multiple tuples)
Data type definitions can redefine '+', '-', '*', which carries over to the vector/polynomial ops .. substitution of operations again! .. to support e.g. complex arithmetic, finite fields, bit vectors
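How redefined '+' and '*' carry over to the vector operations can be sketched in Python (not π-Nets): a generic dot product written against '+' and '*' works unchanged for a user-defined complex type. The class and helper names are ours.

```python
# Sketch (Python, not π-Nets): redefining '+' and '*' for a type carries
# over to generic vector operations such as the dot product.
class Cpx:
    def __init__(self, a, b): self.a, self.b = a, b
    def __add__(self, o): return Cpx(self.a + o.a, self.b + o.b)
    def __mul__(self, o): return Cpx(self.a*o.a - self.b*o.b,
                                     self.a*o.b + self.b*o.a)
    def __eq__(self, o):  return (self.a, self.b) == (o.a, o.b)

def dot(x, y):
    """Generic dot product: uses only '+' and '*', so it works for any
    type that redefines them (reals, Cpx, finite fields, ...)."""
    acc = x[0] * y[0]
    for xi, yi in zip(x[1:], y[1:]):
        acc = acc + xi * yi
    return acc

assert dot([1, 2, 3], [4, 5, 6]) == 32        # plain reals
i = Cpx(0, 1)
assert i * i == Cpx(-1, 0)                    # i*i = -1
assert dot([i, Cpx(1, 0)], [i, i]) == Cpx(-1, 1)   # i*i + 1*i
```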
SLIDE 14
Mathematical notation (operations/expressions), indexing
  f x        function call
  f g x      nested call ( = f (g x) )
  f(x,y,z)   same for several args
  x.i        apply tuple as fct. (= xi)
  x.y.i      compose tuple fcts, f x.y = f(x.y)
  x:m.i      apply x as tuple valued fct, f(x:m,y:m).i = f(x.i,y.i)
  .. distinction of c. functions and nc. data is by an attribute to the name only
  x y        apply tuple as lin. fct, f x y = f(x y)
  A x        matrix*vector
  x A        Aᵀ * vector, f A x = f(A x)
  c x        multiply tuple by scalar
  x+y, x*y   sum/product by components
  x=y        is neither an assignment nor a store operation but a comparison
  a<b<c      equivalent to a<b, b<c (chained comparison)
  2x         equivalent to 2 x, 2*x for numbers and vectors
  x²+y²      equivalent to x^2 + y^2
  |x|        equivalent to abs x
  √x         equivalent to sqrt x
  Σx         equivalent to sum x for vector valued tuples x
  f*g.G      convolution for a group product table G
Expressions bound by prefix/infix operators, terminated by ')', '}', '→', '>>', eol, ','.
+,* are associative, + (optionally *) commutative (need to be if redefined), [,] bilinear
SLIDE 16 Options for further predefined tuple operations (while keeping the PL small):
0) A hierarchy of composite operations / functions extends the basic operations. The existing tuple operations suffice to define data/op types s.a. 'complex', 'quaternion'. The available tuple operations are well suited to implement e.g. the operations of a discrete exterior calculus (DEC) for the numeric treatment of field equations etc. Differential forms are not sampled on the same discrete set of points as functions are, but by their integrals over simplexes defined by points of that set.
1) Automatic differentiation of tuples as functions sampled on a discrete set {u0,u1,…,un-1}: a tuple f.u: i → f(ui) is replaced by F.u: i → (f(ui), f'(ui)) (discrete differential form) → algorithms with arguments u, f(u) automatically extend to u, F.u
  (by applying the rules of differentiation for +,*,∘ and special functions)
  - no extra syntax required, transparent integration into a tuple language
  - the f'(ui) are sometimes available as signal samples, are estimated otherwise
  - extended samples can be used to support interpolation
2) Automatic application of standard methods to deduce a complex computation from a simpler input specification, probably using 1), e.g. the derivation of the Hamiltonian vector field from a Hamilton function before solving the differential equation.
3) Geometric structures with multiple alternative (indexed) systems of coordinates (vector spaces, affine spaces, manifolds, bundles etc.) can be treated independently of a special choice of coordinates by extending coordinate tuples by a coordinate index and using change-of-coordinate functions and some function selecting a new index.
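Point 1) is forward-mode automatic differentiation; a minimal Python sketch (not π-Nets) carries (f(u), f'(u)) pairs through '+', '*' and a special function by the usual differentiation rules. All names here are illustrative.

```python
import math

# Sketch of point 1): forward-mode automatic differentiation, carrying
# (value, derivative) pairs through +, * and special functions.
class Dual:
    def __init__(self, val, dot): self.val, self.dot = val, dot
    def __add__(self, o): return Dual(self.val + o.val, self.dot + o.dot)
    def __mul__(self, o): return Dual(self.val * o.val,
                                      self.dot * o.val + self.val * o.dot)  # product rule

def sin(x):  # chain rule for a special function
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# an algorithm written for plain arguments extends automatically:
def f(x): return x * x + sin(x)       # f(u) = u² + sin u

F = f(Dual(0.0, 1.0))                 # seed du/du = 1
assert F.val == 0.0                   # f(0)  = 0
assert F.dot == 1.0                   # f'(0) = 2·0 + cos 0 = 1
```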
SLIDE 17 Dealing with failures
1) Conditions like 'a=b' are tested to either hold or fail, or are asserted.
… IF a=b THEN … ELSE …
… $$ a=b …
2) Operations/functions like 'x/y' or 'I32 x' deliver a result tuple or fail.
… IF I32 x, x/y → h THEN … ELSE …
… $$ x/y → h …
- execution breaks in case of failure if the operation is de-asserted not to fail
- operations are asserted not to fail by default, producing an invalid result otherwise
.. functions always return a result
Invalid data: - added to the data set IR and allowed as tuple components
- component access 'x.i' can deliver invalid result
- operations and functions produce invalid output from invalid input
- invalid data can be sent to another process
- strings are literals for invalid data and can be output to a terminal
- numbers (valid data) output to a terminal process are displayed
- can be used to define multi-valued functions and set operations
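The 'invalid' element can be modeled outside π-Nets, e.g. in Python with `None` standing in for 'invalid' (a sketch under that assumption; names are ours):

```python
# Sketch (Python, not π-Nets): 'invalid' added to the data set, propagated
# through numeric operations and delivered by failing component access.
INV = None

def add(x, y):
    # numeric ops produce invalid output from invalid input
    return INV if x is INV or y is INV else x + y

def get(t, i):
    # component access x.i may deliver an invalid result
    return t[i] if 0 <= i < len(t) else INV

t = (1.0, INV, 3.0)
assert add(get(t, 0), get(t, 2)) == 4.0
assert add(get(t, 0), get(t, 1)) is INV   # invalid input -> invalid output
assert get(t, 7) is INV                   # failed access -> invalid
```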
SLIDE 18 Algorithms: notation for expressions, references, branches, block structure
- data can be named and then be referenced:
expression → name
  .. single Greek characters allowed as names
  .. subsequent expressions reference them .. no forward refs
  .. cannot redefine valid names
  .. data names do not represent storage .. functional style
  no extra termination character to end a statement, no 'return' statement
  can define/use computed symbolic constants (evaluated at compile time)
- algorithms list the data expressions to be evaluated in a block {→ args ..expr.. res }
  can use an open if .. then .. else for branching and for checking for errors
  blocks can have arguments, be indexed, and be named and called as functions
  functions can be recursive up to a limited depth (recursions always stop, verify!)
- variables are only supported within automata type definitions and for processes
the write operation to a variable x is '>>': expression >> x
- applications presented as sets of processes all active from start
  processes use a control block calling to variables/automata/IO, may be cyclic
  processes can have private sub automata/processes, send/receive/sample
  processes can be subdivided into threads sharing control
  threads can be given a prescribed timing through delay statements
  each variable of a process is written by a unique thread
SLIDE 19 Some details
Defining a function:  fct name { → args … … ress } .. options
Computed constants:   const twid 256:{→ k (cos(πk/512), sin(πk/512))} .. special fct
Data type definition:
  fct cpx * { → a,b,c,d  a·c – b·d, a·d + b·c } .. etc.
Automata type definition:
  atype atn, 2 x, 10: y                   .. variable y is idxd, opt. init. vals. 0
  fct atn f { → args … can r/w vars. … ress }   .. ress opt.
  .. oo style access rules for variables, inheritance
Process definition:
  apc pnm, r, s, (atn) q { … control block … }  .. vars, sub aut.
  .. access rules for var's .. can be given for simulation only
Communicating threads:
  { … f q.x → y … # … g y >> p … }        .. read q.x .. send to apc p
Threads in a group share the top level control flow:
  apc pn {#T1 … if cond then … #T2 … else #T1 … … #T2 … … }        .. th. labels
  apc pn {#T1 … {… IF cond THEN … #T2 … ELSE … #T3 … }° … #T4 …}   .. °T1 cont.
  .. 100:{#T … … }   .. parallel SIMD 'loop' defining indexed threads T.0,…,T.99
Threads using diff. codes:   { # f64 … → y ... # i32 … y … }   .. aut. conversion
Threads on different P's:    apc pnm #1 on P1, #2 on P2 {#1 … … #2 … … }
SLIDE 20 Timing within a thread and resulting performance requirements for realtime applications
{ ..S0.. $$+d1 ..S1.. $$+d2 ..S2.. { if-then .. $$+d21 .. else .. $$+d22 .. } .. $$+d3 ..S3.. }
.. section start times: t0 | t1=t0+d1 | t2=t1+d2 | then-branch: t2+d21, else-branch: t2+d22 | t3=t2+d3
- outputs from every section Si occur at a specific time ti from the starting time t0
(assuming no waiting to occur apart from the $$+ operators)
- branches within a thread may exhibit different time patterns for their outputs
.. must make sure that d3 > max(d21,d22)
- output data from Si can be prepared in Si-1; then di is the time that is available to do this. From the number of operations in Si-1 one can estimate the required performance in terms of operations/time for the thread.
- if two threads in a group are unified into one, the $$+ operators must be interleaved such that the original output times are reproduced. The combined thread will require a higher performance than the original ones. The timing of a thread must take into account data dependencies from other threads.
- if a block is shared by several threads, they synchronize on entry and every re-entry
- $$+ also serves to define timeout condition, '$$' for more general time conditions
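The output times of the sections above can be computed mechanically from the '$$+d' delays; a small Python sketch (function name ours) also checks the constraint that d3 must exceed max(d21, d22):

```python
# Sketch: computing the section output times t_i of a thread from its
# '$$+d' delay operators, as in the timeline on this slide.
def section_times(t0, d1, d2, d21, d22, d3):
    t1 = t0 + d1
    t2 = t1 + d2
    t_then = t2 + d21        # output time in the 'then' branch
    t_else = t2 + d22        # output time in the 'else' branch
    t3 = t2 + d3             # S3 starts d3 after t2 in either branch
    # both branch delays must fit into d3, otherwise S3 would start
    # before one branch has produced its output:
    assert d3 > max(d21, d22), "need d3 > max(d21, d22)"
    return t1, t2, t_then, t_else, t3

times = section_times(t0=0.0, d1=1.0, d2=2.0, d21=0.5, d22=0.75, d3=1.0)
```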
SLIDE 21 Summary of techniques applied to keep π-Nets simple (in spite of the complex target HW):
- no basic types to be distinguished, no Boolean type or pointer type
- tuple types are predefined, no data set constructions in type definitions
- small set of predefined operations, to be extended by functions
- type definitions through sets of operations only, sufficient to support overloading
- automatic overloading by mapping functions/operations to tuples
- same syntax for process control blocks, function bodies and sub blocks
- common branching structure for algorithms and for treating run time errors
- replacement of program loops by tuple operations, autoindexed operations
- memory operations packed into definitions of automata types and processes
- processes optionally break up into threads and time sections
- delay statements for IO also shared for constraining execution times
- I/O through communications with predefined or external processes
- assignment of process/thread → coding/ALU/processor (unique coding per thread)
- two types of statements only for the structural definition of the target networks
- complete simulation or partial execution of applications on a PC
- static allocation of memory, processes, processors
- interactive environment, textual libs, selective compilation, SW caching/reconfiguration
SLIDE 22
const dt 1/256
fct 2 2 chg {→ gh,k  if gh>3 then (gh-π,1-k) else if gh< -3 then (gh+π,1-k) else (gh,k) }
fct 2 2 pos {→ s,k  if k=0 then (cos s, sin s) else –(cos s, sin s) }
fct 2 1 ang {→ g,h  {→ d  if d>π then d-2π ← else if d< -π then d+2π ← else d }(g-h) → e  e/2 }
apc part on host, t 0, g 0, g' 0.5, gk 0, h 0, h' 1.5, hk 1 , (gc) gd   .. gc: pd. type of display automata
{ $$+ dt                                        .. real time! .. unit is 'sec'
  pos(g,gk) → gx,gy  pos(h,hk) → hx,hy          .. transform angles into IR² coordinates
  ang((g+π*gk),(h+π*hk)) → a
  cos(a)/((hx-gx)²+(hy-gy)²) → f                .. repelling force in IR²
  { if a<0 then -f else f } → gh''
  g + g' dt + gh'' dt²/2 >> g                   .. differential equation
  g' + gh'' dt >> g'
  h + h' dt – gh'' dt²/2 >> h
  h' – gh'' dt >> h'                            .. same for both coordinates
  chg(g,gk) >> g,gk  chg(h,hk) >> h,hk          .. change of coordinates
  0,0,0 >> gd.r                                 .. re-display g,h:
  100(gx,gy)+(110,110) >> gd.p  0,0,0 >> gd.b   .. set paint position, paint g as block
  100(hx,hy)+(110,110) >> gd.p  0,0,0 >> gd.b   .. same for h
←}
Two charged particles moving on the unit circle
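A rough Python re-implementation of the example can serve as a cross-check (an illustration, not generated π-Nets output): the π-Nets version keeps angles small via two coordinate charts (the k index and fct chg); the sketch below uses plain angles instead, with the second particle starting at its effective angle π.

```python
import math

# Rough sketch of the example above: two like charges on the unit circle,
# starting at angles 0 and π, with the tangential part of a 1/r² repulsion.
dt = 1/256
g, gv = 0.0, 0.5              # angle / angular velocity of particle 1
h, hv = math.pi, 1.5          # angle / angular velocity of particle 2 (hk=1 -> +π)

def wrap(d):                  # wrap an angle difference into (-π, π]
    if d > math.pi:  return d - 2*math.pi
    if d < -math.pi: return d + 2*math.pi
    return d

for _ in range(256):          # one simulated second
    gx, gy = math.cos(g), math.sin(g)
    hx, hy = math.cos(h), math.sin(h)
    a = wrap(g - h) / 2
    f = math.cos(a) / ((hx - gx)**2 + (hy - gy)**2)   # repelling force
    if a < 0: f = -f
    g, gv = g + gv*dt + f*dt*dt/2, gv + f*dt          # integrate the ODE
    h, hv = h + hv*dt - f*dt*dt/2, hv - f*dt

# the symmetric ±f updates conserve the total angular momentum
assert abs((gv + hv) - (0.5 + 1.5)) < 1e-9
```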