Reconfigurable and Adaptive Systems (RAS)


  1. Reconfigurable and Adaptive Systems (RAS)
     Lecture in the summer semester (SS) 2014
     Lars Bauer, Jörg Henkel
     Institut für Technische Informatik, Chair for Embedded Systems - Prof. Dr. J. Henkel
     L. Bauer, CES, KIT, 2014

     Organisation
     • Lecture time: Wed., 15:45 - 17:15, Bld. 50.34, HS -102
     • Homepage: http://ces.itec.kit.edu/teaching/
       ◦ You can also find the slides from previous years there
     • Slides login: Login: “student”, Passwd: “CES-Student”
     • Contact: lars.bauer@kit.edu, Haid-und-Neu-Str. 7, Bld. 07.21, Rm. 316.2 (2nd floor!!)

     CES @ Technologiefabrik (TFI)
     [Figure: campus map showing Info-Bau, TFI, and Mensa]

     Questions during the lecture
     • Simply let me know / interrupt me

  2. Teaching @ CES, SS 2014
     • Lectures
       ◦ RAS
       ◦ Low Power Design
       ◦ Embedded Systems for Multimedia and Image Processing
     • Seminars
       ◦ Rekonfigurierbare Eingebettete Systeme
       ◦ Dependability in Embedded Systems
       ◦ Distributed Decision Making
       ◦ Stereo Video Processing
       ◦ Multicore for Multimedia Processors
       ◦ Sensor Networks
     • Labs
       ◦ Entwurf eingebetteter Systeme
       ◦ Entwurf von eingebetteten applikationsspezifischen Prozessoren
       ◦ Low Power Design and Embedded Systems
     More info: ces.itec.kit.edu/teaching

     RAS Examination
     • CS Diploma:
       ◦ Vertiefungsfach 8: Entwurf eingebetteter Systeme und Rechnerarchitekturen
     • CS Master:
       ◦ Module: Rekonfigurierbare und Adaptive Systeme [IN4INRAS] (3 ECTS)
       ◦ Module: Eingebettete Systeme: Weiterführende Themen [IN4INESWTN] (10 ECTS)
       ◦ Module: Advanced Computer Architecture [IN4INACA] (10 ECTS)
     • Other study courses (e.g. EE): ask individually

     Theses @ CES
     • Note: the info on the homepage is typically not up to date
       ◦ If you are interested in a particular topic, better ask individually
     • There are nearly always SA/DA/BA/MA theses or student assistant (Hiwi) jobs available in the scope of reconfigurable systems
     • Main projects:
       ◦ i-Core: invasive Core
       ◦ OTERA: Online Test Strategies for Reliable Reconfigurable Architectures
       ◦ Compilers for reconfigurable architectures
     • Topics:
       ◦ Algorithms for Runtime System, Operating System, …
       ◦ Toolchain, Compiler, Synthesis, …
       ◦ Architecture, Hardware Prototype, Simulation Environment, …

     Beneficial Previous Knowledge
     • Rechnerstrukturen
       ◦ Prerequisites
     • Eingebettete Systeme
       ◦ ES1: Optimierung und Synthese Eingebetteter Systeme
       ◦ ES2: Entwurf und Architekturen für Eingebettete Systeme
       ◦ The core topics (e.g. details about FPGA architectures) will be recapitulated in the scope of this lecture
       ◦ Thus, the contents of ES1 and ES2 are beneficial but not required in full detail

  3. General Literature
     • “Fine- and Coarse-Grain Reconfigurable Computing”, S. Vassiliadis and D. Soudris, Springer 2007.
     • “Runtime adaptive extensible embedded processors – a survey”, H. P. Huynh and T. Mitra, SAMOS, pp. 215–225, 2009.
     • “Reconfigurable computing: architectures and design methods”, T.J. Todman et al., IEE Proceedings Computers & Digital Techniques, vol. 152, no. 2, pp. 193-207, 2005.
     • “Reconfigurable Instruction Set Processors from a Hardware/Software Perspective”, F. Barat et al., IEEE Transactions on Software Engineering, vol. 28, no. 9, pp. 847-862, 2002.

     Reconfigurable and Adaptive Systems (RAS)
     1. Introduction and Motivation: The Demand for Adaptivity

     Designing Embedded Systems
     • Typical approach:
       ◦ Static analysis of system requirements (e.g. computational hot spots)
       ◦ Build optimized system
     • Today’s requirements:
       ◦ Increasing complexity
       ◦ More functionality
     • Problem:
       ◦ Statically chosen design point has to match all requirements
       ◦ Typically inefficient for individual components (e.g. tasks or hot spots)

     Definition ‘Computational Hot Spot’
     • A rather small part of the application that corresponds to a rather large part of the execution time
       ◦ Also called ‘Computational Kernel’
       ◦ Typically: inner loop
       ◦ 80/20 rule (90/10 rule etc.)
     [Figure: two bars, Code Size and Execution Time; 20% of the code size corresponds to 80% of the execution time]
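As a rough illustration of the 80/20 rule from the ‘Computational Hot Spot’ definition, the sketch below picks the smallest set of functions whose combined share of execution time reaches a target fraction. The function name `find_hot_spots`, the profile entries, and all timing numbers are hypothetical and chosen only for illustration; they are not measurements from the lecture.

```python
def find_hot_spots(profile, target_share=0.8):
    """Return the fewest functions (largest first) whose combined share
    of the total execution time reaches target_share."""
    total = sum(profile.values())
    if total <= 0:
        return []
    hot, covered = [], 0.0
    # Walk the functions from most to least expensive.
    for name, t in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if covered / total >= target_share:
            break
        hot.append(name)
        covered += t
    return hot


# Hypothetical profile: execution time in ms per function (made-up numbers).
profile = {
    "motion_estimation": 520.0,
    "transform_quant": 300.0,
    "entropy_coding": 90.0,
    "loop_filter": 50.0,
    "bitstream_io": 25.0,
    "control": 15.0,
}

hot = find_hot_spots(profile)
print(hot, f"{len(hot)} of {len(profile)} functions")
```

With these made-up numbers, two of the six functions already account for more than 80% of the time, which is the kind of skew the rule describes. A real analysis would take the profile from a profiler run rather than from a hand-written table.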

  4. Example Application: H.324 Video Conferencing
     • Video En-/Decoding
     • Audio En-/Decoding
     • Data (De-)Multiplexing
     • Control protocol
     [Figure: H.324 block diagram. Inputs: Remote Control (IR), Mic, Phone, CVBS, CVHS, Digital Video Input. Blocks: AUDIO INPUT, VIDEO INPUT, AUDIO ENCODER (G.723), VIDEO ENCODER (H.263 / H.264), H.245 CONTROL, MULTIPLEXER (H.223), DE-MULTIPLEXER (H.223), AUDIO DECODER (G.723), VIDEO DECODER (H.263 / H.264), MODEM, PSTN INTERFACE, AUDIO OUTPUT, VIDEO OUTPUT. Outputs: Display Screen, Line Phone, Speakers. src: cityrockz.com]

     Typical Implementation Alternatives
     [Figure: efficiency (Mips/$, MHz/mW, Mips/area, …) plotted against flexibility (1/time-to-market, …). src: Henkel, ESII]
     • “Hardware solution”: ASIC
       ◦ Non-programmable
       ◦ Highly specialized
     • ASIP: Application specific instruction set processor
       ◦ Instruction set extension
       ◦ Parameterization
       ◦ Inclusion/exclusion of functional blocks
     • “Software solution”: GPP: General purpose processor

     Hotspots in H.324 Video Conferencing
     [Figure: bar chart of Processing Time [%] (0 to 12) per processing function. Processing Functions: I_ME, S_ME, PMV, TQ_PL, TQ_IL, TQ_C, LF, MC_L, MC_C, IP_L16, MD_I4, CABAC, CAVLC, Dec_MB, get_pos, IDQ_PL, CABAC_d, CAVLC_d, FM, Q, UP, Enc, Qt, Pred_0, Reconst, ED, BC, TC, BA, CS, NF, LPF, HPF, EE, DRF, Dt, FGA, H245_C, H223_M, H223_DM, V34Mod, USB, MAC]

     ASIP Implementation
     • Design accelerators for the hot spots
     • Connect them as Execution Units, Register Files, and Interfaces
     src: Tensilica, Inc.: “Xtensa LC Product Brief”

  5. ASIP Implementation (cont’d)
     • Provides noticeably improved performance after targeting the major hot spots
     • However, performance is still not sufficient to achieve real-time requirements
       ◦ More hot spots need to be accelerated
     [Figure: processor with accelerators for I_ME, TQ_PL, MC_L. src: Tensilica, Inc.: “Xtensa LC Product Brief”]

     ASIP Implementation (cont’d)
     • Scalability problem when rather many hot spots exist
       ◦ Note: still not all relevant hot spots are covered
     [Figure: processor with accelerators for CAVLC, CABAC, I_ME, TQ_PL, Dec_MB, MC_L, H245_C, MAC, FM, V34mod, S_ME. src: Tensilica, Inc.: “Xtensa LC Product Brief”]

     Example Application: H.264 Video Encoder
     [Figure: MB Encoding Loop (loop over MB). Stages: MB-Type Decision (I or P) and Mode Decision (for I or P); if MB_Type = P_MB then ME: SA(T)D, MC, DCT / Q, IDCT / IQ; else IPRED, DCT / HT / Q, IDCT / IHT / IQ; In-Loop De-Blocking Filter; Encoding Engine (RD, CAVLC)]
     • Iterates on MacroBlocks (MBs, i.e. 16x16 pixels)
     • 2 different MB-types, i.e. different computational paths with different computational requirements
       ◦ I-MB (spatial prediction)
       ◦ P-MB (temporal prediction)

     Summary of ASIP Implementation
     • ASIPs perform well when
       1. rather few hot spots need to be accelerated and
       2. those hot spots are well known in advance
     • ASIPs are less efficient when targeting rather many hot spots
       ◦ All accelerators are provided statically (i.e. they require area and consume power) even though typically just a few of them are needed at a certain time
     • ASIPs are less efficient when targeting unknown hot spots
       ◦ Even for a given application it is not necessarily clear which parts of it are ‘hot’ during execution, as this may depend on input data (as demonstrated in the following)
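To make the point about input-dependent hot spots more concrete, the sketch below models the two computational paths of the MB encoding loop: P-MBs pass through motion estimation and motion compensation, I-MBs through intra prediction. The per-macroblock cycle costs, the function names, and the frame size of 396 MBs (a 352x288 frame split into 16x16 MBs) are assumptions for illustration only; the stage names are simplified from the slide.

```python
# Hypothetical per-macroblock costs in cycles (illustrative placeholders,
# not measured values).
P_PATH = {"ME": 6000, "MC": 1500, "DCT_Q": 1000}   # temporal prediction path
I_PATH = {"IPRED": 3500, "DCT_HT_Q": 1500}         # spatial prediction path


def time_shares(num_i_mbs, num_p_mbs):
    """Return each function's share of the total cycles for one frame."""
    cycles = {}
    for path, count in ((I_PATH, num_i_mbs), (P_PATH, num_p_mbs)):
        for fn, cost in path.items():
            cycles[fn] = cycles.get(fn, 0) + cost * count
    total = sum(cycles.values())
    if total == 0:
        return {}
    return {fn: c / total for fn, c in cycles.items()}


def hottest(num_i_mbs, num_p_mbs):
    """Name of the function with the largest time share, or None."""
    shares = time_shares(num_i_mbs, num_p_mbs)
    return max(shares, key=shares.get) if shares else None


# Two frames of 396 MBs each with a different I-MB/P-MB mix.
print(hottest(num_i_mbs=40, num_p_mbs=356))   # mostly P-MBs
print(hottest(num_i_mbs=356, num_p_mbs=40))   # mostly I-MBs
```

Under these assumed costs the dominant function switches from motion estimation to intra prediction when the share of I-MBs rises, so an accelerator chosen at design time for one mix may sit idle for the other.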

  6. Example: Football Video
     [Figure: video frame with I-MBs and P-MBs marked]
     Note: 16x16 MBs can be partitioned into sub-MBs, e.g. 16x8, 8x8, down to 4x4

     Example: Distribution of I-MBs in Medium-to-Very-High Motions
     [Figure: INTRA MB in a Frame [%] (0% to 100%) over Frame Number (1 to 301) for the sequences Rafting, Rugby, and Football; annotated regions: Scene with Very High Motion, Scene with High-to-Medium Motion, Scene with Medium-to-Slow Motion]

     Potentials of RAS
     [Figure: efficiency (MIPS/$, MHz/mW, MIPS/area, …) plotted against flexibility (1/time-to-market, …), showing Reconfigurable and Adaptive Systems together with:]
     • “Hardware solution”: ASIC
       ◦ Non-programmable
       ◦ Highly specialized
     • ASIP: Application specific instruction set processor
     • “Software solution”: GPP: General purpose processor

     Conclusion: Demand for Adaptivity
     • Even for a well known application it is not always clear which parts will be ‘hot’ (e.g. according to computational complexity) and thus benefit from accelerators
       ◦ This depends on changing input data and control flow
     • Even more complex: multi-tasking scenarios
       ◦ Not clear which applications will execute at the same time
       ◦ Not clear which applications will execute at all (user can download new applications)
       ◦ This significantly increases the number of potential hot spots, so it is hardly possible to address this with an ASIP
     • Systems that fulfill the demand for adaptivity may lead to
       ◦ Better performance (absolute criteria)
       ◦ Higher efficiency (relative criteria, e.g. performance per area etc.)
       ◦ Lower cost (no redesign if specifications change, no overdesign to cover all scenarios)
