Low Contention Mapping of Real-Time Tasks onto a TilePro 64 Core - PDF document

Low Contention Mapping of Real-Time Tasks onto a TilePro 64 Core Processor

Christopher Zimmer and Frank Mueller
North Carolina State University, Raleigh, NC 27695-8206, mueller@cs.ncsu.edu

This work was supported in part by NSF grants CNS-0720496 and CNS-0905181.

Abstract

Predictability of task execution is paramount for real-time systems, because it allows static timing analysis to determine upper bounds on execution times. On network-on-chip (NoC) processors, static timing analysis may underestimate these bounds unsafely if it ignores the underlying communication paths. The cause is contention on the underlying network when data from multiple sources share parts of a routing path in the NoC. Contention analysis must therefore be performed to provide safe and reliable bounds. In addition, the overhead caused by contention from inter-process communication (IPC) can be reduced by mapping tasks to cores so that contention is minimized.

This paper makes several contributions that increase the predictability of real-time tasks on NoC architectures. First, we contribute a constraint solver that exhaustively maps real-time tasks onto cores to minimize contention and improve predictability. Second, we develop a novel TDMA-like approach that maps communication traces into time frames, so that temporally disjoint communication can be analyzed separately. Third, we contribute a novel multi-heuristic approximation, HSolver, for rapid discovery of low-contention solutions. HSolver reduces contention by up to 70% compared with naïve and constrained exhaustive solutions. We evaluate these techniques with a micro-benchmark of task-system IPC on the TilePro64, a real, physical NoC processor with 64 cores. To the best of our knowledge, this is the first work to consider IPC for worst-case time frames to simplify analysis and to measure the impact on actual hardware for NoC-based real-time multicore systems.

1. Introduction

Distributed software models on network-on-chip (NoC) processor architectures bring significant advancements but also challenges for real-time systems. The advancements come from three sources. Simpler processor cores make static timing analysis more accurate. An abundance of cores simplifies scheduling algorithms. Explicit inter-process communication (IPC) in the form of messages enables synchronization-free data resource models. Because of these advancements, this processor architecture is seeing increased use in hard real-time systems. For example, [24] explores real-time hazard detection in satellites using the Opera Maestro processor [10], a radiation-hardened TilePro with 49 cores developed by Boeing. A drawback of these processors is NoC contention among multiple tasks. Such contention arises for shared-memory accesses, for off-chip memory references, and for message passing when distributed software models are used instead of shared memory. Our work focuses on message passing over the NoC and assumes separate NoC interconnects for memory, coherence, I/O, and messaging [3]. Other work on increasing predictability and coping with non-uniform memory latencies is orthogonal [4].

Message-based communication over the NoC has been shown to scale better than shared-memory programming [7]. We conjecture that it can also increase predictability by decreasing contention, because messages are easier to analyze statically than shared-memory references [21]. Even under message passing, poor task-to-core mappings can reduce predictability because of latencies incurred through NoC contention. Consider a mesh NoC with full-duplex links, where two messages traveling in opposite directions over the same link do not contend. The NoC uses static dimension-ordered wormhole routing that routes horizontally before vertically [3]. Consider the example "Config 1" in Figure 1, which shows nine cores connected by a mesh NoC. Two messages are sent, one from core 4 → 2 and the other from core 3 → 8, as depicted by the lines with arrows. When both are sent at the same time, they contend for the link 4 → 5 (depicted as a thick link in the mesh). Arbitration within the NoC hardware routers then delays one of the two messages. (Packets are not interleaved, because an open virtual channel monopolizes the links between its endpoints.) As a result, sending tasks experience highly variable latencies. Laying out tasks intelligently can reduce this variability by lowering contention, or eliminate it by avoiding contention altogether. The effect grows as NoC meshes become larger, since paths through the network get longer and communication becomes more frequent.

Figure 1. NoC Contention (Config 1)
