
Philips Parallel Programming Models for Heterogeneous MPSoCs



  1. Philips Parallel Programming Models for Heterogeneous MPSoCs
     Pieter van der Wolf, Philips Research
     MPSoC’05, July 11-15, 2005

  2. Outline
     • Introduction
     • Task Transaction Level interface: TTL
       – Abstract interface for streaming in MPSoCs
     • Programming TTL multiprocessors
       – Constraint-driven code transformations
     • Design cases
       – Sea-of-DSP
       – Smart Camera
       – Cake / Wasabi
     • Conclusion
     MPSoC’05 2 Philips Confidential

  3. MPSoC Design
     • Need for MPSoCs:
       – Implement advanced functionalities
       – Low cost
       – Power efficient
       – Flexible
     • Increasing complexity of MPSoCs:
       – Increasing design efforts
       – SW effort overtaking HW effort
       – Increasing time-to-market
     • Productivity increase through:
       – Raise level of abstraction
       – Structured design
       – IP reuse
       – EDA support
     [Figure: log-scale plot over process nodes 0.35µ, 0.25µ, 0.18µ, 0.15µ, 0.12µ, 0.1µ. Curves: Gates/cm² (Moore’s Law, 59% CAGR), Design Productivity (20-25% CAGR), Software Productivity (8-10% CAGR)]

  4. Example TV application
     [Figure: task graph of a TV application. Audio labels: Audio in 1, Audio in 2, AC-3 audio decoding, Audio out 1, Audio out 2. Video labels: MPEG bit stream, MPEG video decoding, analog processing, NR, picture rate up-conversion (ME, MC, DEINT, UPC), spatial scaling (VS, HS), sharpness improvement (LTI, CTI, PEAK, PCOMP, DA), Video out, VCR]
     Many task graphs like this have to be supported

  5. Example MPSoC Hardware
     • Philips's advanced set-top box and digital TV SoC (Viper2)
     • 0.13 µm
     • 50 M transistors
     • 100 clock domains
     • > 60 IP blocks
     [Figure: SoC with block labels MBS, VMPG, TM3260 (twice), TDCS, VIP, MIPS3960, QVCP5L, MSP, MDCS, QVCP2L]

  6. Example MPSoC Software Stack
     [Figure: layered stack, top to bottom]
     • Applications
     • Middleware: JavaTV, TVPAK, OpenTV, MHP/Java, proprietary ...
     • Kernel: pSOS, WinCE, JavaOS
     • Streaming Components
     • Streaming Infrastructure
     • Nexperia Hardware

  7. MPSoC Integration
     • Current practice
       – Ad hoc approaches
       – Low-level interfaces
     • Examples
       – Synchronization via low-level primitives
         • Interrupts, MMIO, semaphores
       – Data access services partly in IP
         • Buffering, DMA control, address generation
     • Consequence
       – Part of IP is specific for underlying communication infrastructure
         • IP just wants the next pixel or block or …
         • But also knows about burst transfers, interrupts, semaphores, …
     [Figure: IP module with Computation and Communication parts, attached to DTL, AXI, …]

  8. MPSoC Integration
     • Low-level interfaces
       – Hardware / software IP designer must deal with low-level issues
         • Increases design effort
         • Same problems solved again and again: error prone
       – IP becomes specific for particular use
         • Hampers reusability
       – IP integrator must deal with low-level issues
         • Increases design effort
       – Infrastructures cannot evolve
         • Changes in infrastructure affect hardware / software IP

  9. Interface Centric Design: TTL
     • Aim: Improve MPSoC integration
     • Means: Raise level of abstraction
     • TTL Task Transaction Level interface:
       – Parallel application models
         • Executable specifications
       – Platform interface
         • Integration of HW and SW tasks
     • Mapping technology
       – Structured design & programming
       – Based on TTL
     [Figure: tasks on top of the TTL interface, mapped via TTL onto the platform infrastructure]

  10. TTL Requirements
      • Well-defined semantics for application modeling
        – Focus: stream processing applications
        – Make concurrency and communication explicit
      • High-level interface
        – Make high-level services available
          • Inter-task communication
          • Multi-tasking
        – Easy to use for IP development
        – Facilitate reuse and integration of IP
        – Provide implementation freedom
          • Allow efficient and cheap implementations
            – E.g. supporting fine grain synchronization for on-chip memory
      • Support integration of hardware and software tasks
      [Figure: IP module split into a Computation part and a Communication Shell, separated by the TTL interface]

  11. TTL in Example Architecture
      • Platform interface for integration of HW and SW tasks
        – Enable communication in heterogeneous MPSoCs
      [Figure: Task 1, Task 2 and Task 3; TTL SW-API with SW Shell on a CPU; TTL HW-interface with HW Shell on an ASP; Interconnect (DTL, AHB, AXI, OCP)]

  12. TTL Inter-Task Communication
      Logical model and terminology
      [Figure labels: task, port, channel, TTL interface, private variable, full token with value, empty token]
      • Communicating tasks are organized as task graph
      • Tasks communicate by invoking TTL interface functions on their ports
      • Uni-directional channels with reliable ordered communication
      • Arbitrary data types, but single type per channel
      • Support for multi-cast

  13. Example: Message Passing Interface
      Producer side
      • write(port, data, …)
        – Write data into channel connected to port
      Consumer side
      • data = read(port, …)
        – Read data from channel connected to port
      • Abstract interface for tasks
      • Right interface?
        – Appropriate for modeling application?
        – Appropriate for implementation on architecture?

  14. TTL Interface Types
      • Different needs for communication arising from:
        – Different applications
          • In-order vs. out-of-order
        – Different implementation styles
          • Hardware vs. software
          • Shared memory vs. message passing
      • Support set of interface types
        – Each interface type offers narrow interface
          • Easy to use
          • Simple to implement
        – Each interface type supports particular communication style
        – Offer multiple interface types in one framework
        – Based on single model for interoperability

  15. TTL Interface Types
      • TTL offers a number of different interface types
      • Allow selection of interface type per port of task
      • Enable interoperability by allowing mix & match
      [Figure: tasks T1 to T7]

  16. TTL Interface Types

      Acronym   Full name
      CB        Combined Blocking
      RB        Relative Blocking
      RN        Relative Non-blocking
      DBI       Direct Blocking In-order
      DNI       Direct Non-blocking In-order
      DBO       Direct Blocking Out-of-order
      DNO       Direct Non-blocking Out-of-order

  17. Interface Type CB
      Producer side
      • write(port, vector, size)
        – Write vector of size values into channel
      Consumer side
      • read(port, vector, size)
        – Read vector of size values from channel
      • Most abstract TTL interface type
      • Blocking semantics
      • Combined synchronization and data transfer
      • Vector operations
      • Based on earlier work on YAPI for KPN style modeling
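The combined style of interface type CB can be illustrated with a minimal single-threaded sketch. It is not the TTL implementation: the `Channel` struct, the `CAPACITY` constant, the `cb_write` / `cb_read` names and the `int` token type are all assumptions made for illustration, and the channel is passed directly where TTL would pass a port. Because the sketch has no scheduler, it returns -1 at the point where a real blocking call would suspend the task until room or data becomes available.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 8  /* hypothetical channel depth, in tokens */

/* Hypothetical channel: a bounded FIFO of int tokens. */
typedef struct {
    int    tokens[CAPACITY];
    size_t head;   /* index of the oldest full token */
    size_t count;  /* number of full tokens */
} Channel;

/* CB-style write: synchronization and data transfer in one call.
 * Copies `size` values from the caller's private vector into the channel.
 * Returns 0 on success, or -1 where a blocking write would wait for room. */
int cb_write(Channel *ch, const int *vector, size_t size)
{
    assert(ch != NULL && vector != NULL);
    if (size > CAPACITY - ch->count)
        return -1;
    for (size_t i = 0; i < size; i++)
        ch->tokens[(ch->head + ch->count + i) % CAPACITY] = vector[i];
    ch->count += size;
    return 0;
}

/* CB-style read: copies the `size` oldest values into the caller's vector.
 * Returns 0 on success, or -1 where a blocking read would wait for data. */
int cb_read(Channel *ch, int *vector, size_t size)
{
    assert(ch != NULL && vector != NULL);
    if (size > ch->count)
        return -1;
    for (size_t i = 0; i < size; i++)
        vector[i] = ch->tokens[(ch->head + i) % CAPACITY];
    ch->head = (ch->head + size) % CAPACITY;
    ch->count -= size;
    return 0;
}
```

Each call both checks availability and copies between the channel and the task's private vector.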

  18. Pros / Cons Interface Type CB
      + Easy to use
      + Reusable tasks
      – Copying overhead if private variables not in local buffers
        – Smart compiler may help in some cases
      – If local buffers:
        – Large tokens / vectors → large local buffers
        – Small tokens / vectors → large synchronization overhead
      [Figure: example architecture with Task 1, Task 2, TTL, SW Shell, CPU, ASP, HW Shell, Mem and Interconnect; numbered markers 1 to 4]

  19. Separate Synchronization and Data Transfer

      Producer             Consumer
      acquireRoom (2)      acquireData (2)
      store/dereference    load/dereference
      releaseData (2)      releaseRoom (2)

  20. Interface Types RB and RN
      Producer side
      • reAcquireRoom(port, count) (RB)
      • tryReAcquireRoom(port, count) (RN)
        – Acquire count empty tokens, blocking (RB) / non-blocking (RN)
      • store(port, offset, vector, size)
        – Store vector of size values into the acquired tokens at positions offset..offset+size-1, counted from the oldest acquired token
      • releaseData(port, count)
        – Release count oldest acquired tokens as full tokens
      • Separate synchronization and data transfer
      • Vector operations
      • Re-acquire operations do not change state of the channel
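A minimal single-threaded sketch of the non-blocking relative style (RN) follows. The producer-side names come from the slide, but the signatures, the `Channel` struct, `CAPACITY` and the `int` token type are assumptions, and the channel again stands in for the port. The consumer-side names `tryReAcquireData`, `load` and `releaseRoom` are assumed by symmetry with the producer side; the slide lists only the producer. The sketch does not check that `store` and `load` stay inside the acquired range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 8  /* hypothetical channel depth, in tokens */

/* Hypothetical channel: a ring of int tokens; the full tokens start at `head`. */
typedef struct {
    int    tokens[CAPACITY];
    size_t head;  /* index of the oldest full token */
    size_t full;  /* number of full tokens */
} Channel;

/* True if `count` empty tokens are available. Changes no channel state,
 * so repeating the call is harmless. */
bool tryReAcquireRoom(const Channel *ch, size_t count)
{
    return count <= CAPACITY - ch->full;
}

/* Store `size` values at positions offset..offset+size-1, counted from
 * the oldest acquired empty token. Any order of stores is allowed. */
void store(Channel *ch, size_t offset, const int *vector, size_t size)
{
    size_t first_empty = ch->head + ch->full;
    for (size_t i = 0; i < size; i++)
        ch->tokens[(first_empty + offset + i) % CAPACITY] = vector[i];
}

/* Hand the `count` oldest acquired tokens to the consumer as full tokens. */
void releaseData(Channel *ch, size_t count)
{
    assert(count <= CAPACITY - ch->full);
    ch->full += count;
}

/* Consumer side (names assumed by symmetry). */
bool tryReAcquireData(const Channel *ch, size_t count)
{
    return count <= ch->full;
}

void load(const Channel *ch, size_t offset, int *vector, size_t size)
{
    for (size_t i = 0; i < size; i++)
        vector[i] = ch->tokens[(ch->head + offset + i) % CAPACITY];
}

void releaseRoom(Channel *ch, size_t count)
{
    assert(count <= ch->full);
    ch->head = (ch->head + count) % CAPACITY;
    ch->full -= count;
}
```

One acquire and one release can bracket many small `store` or `load` calls, which is the coarse-grain synchronization with fine-grain data transfer that this style aims for.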

  21. Pros / Cons Interface Types RB / RN
      + Coarse grain synchronization with fine grain data transfer
        – Low synchronization overhead with small local buffers
      + Out-of-order data accesses
        – Reduce cost of private variables
      + Load only subset of tokens from channel
        – Reduce cost of data transfers
      – Less abstract than CB
        – Increases programming effort
        – Makes tasks less reusable
      – Inefficiencies upon data transfers
        – Function call, access to channel admin, address calculations
        – Copying may still occur

  22. Interface Types DBI and DNI
      Producer side
      • acquireRoom(port, &token) (DBI)
      • tryAcquireRoom(port, &token) (DNI)
        – Acquire empty token, blocking (DBI) / non-blocking (DNI)
      • token->field = value;
        – Assign value to (part of) token
      • releaseData(port)
        – Release oldest acquired token as full token
      • Separate synchronization and data transfer
      • Direct access to data via token references (pointers)
      • Scalar operations only
      • Tokens are released in same order as they are acquired
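The direct, in-order, non-blocking style (DNI) can be sketched in the same way. The producer-side names follow the slide; everything else is an assumption for illustration: the `Token` fields, the `Channel` layout, `CAPACITY`, the use of the channel in place of a port, and the consumer-side names `tryAcquireData` and `releaseRoom`, which are assumed by symmetry. The sketch is single-threaded and leaves out the locking or memory ordering that a real shared-memory implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAPACITY 4  /* hypothetical channel depth, in tokens */

typedef struct { int x; int y; } Token;  /* hypothetical token type */

/* Hypothetical channel: tokens live in the channel buffer and tasks
 * access them in place through pointers, so no copy is needed. */
typedef struct {
    Token  tokens[CAPACITY];
    size_t head;           /* index of the oldest full token */
    size_t full;           /* number of full tokens */
    size_t prod_acquired;  /* empty tokens currently held by the producer */
    size_t cons_acquired;  /* full tokens currently held by the consumer */
} Channel;

/* Producer: get a pointer to the next empty token, or fail immediately. */
bool tryAcquireRoom(Channel *ch, Token **token)
{
    if (ch->full + ch->prod_acquired == CAPACITY)
        return false;
    *token = &ch->tokens[(ch->head + ch->full + ch->prod_acquired) % CAPACITY];
    ch->prod_acquired++;
    return true;
}

/* Producer: release the oldest acquired token as a full token. */
void releaseData(Channel *ch)
{
    assert(ch->prod_acquired > 0);
    ch->prod_acquired--;
    ch->full++;
}

/* Consumer (names assumed): get a pointer to the next full token. */
bool tryAcquireData(Channel *ch, Token **token)
{
    if (ch->cons_acquired == ch->full)
        return false;
    *token = &ch->tokens[(ch->head + ch->cons_acquired) % CAPACITY];
    ch->cons_acquired++;
    return true;
}

/* Consumer: release the oldest acquired token as an empty token. */
void releaseRoom(Channel *ch)
{
    assert(ch->cons_acquired > 0);
    ch->cons_acquired--;
    ch->head = (ch->head + 1) % CAPACITY;
    ch->full--;
}
```

A producer would call `tryAcquireRoom(&ch, &t)`, assign fields with `t->x = ...;`, then call `releaseData(&ch)`. Releases take no token argument because tokens are released in the order in which they were acquired.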
