MaPU: A Novel Mathematical Computing Architecture
Shashank Kedia & Robert Macy III
1
Why MaPU?
High-performance CPUs and GPUs have good theoretical peak performance but low power efficiency relative to that performance. Superscalar and
2
Three main components:
A scalar pipeline that communicates with the system on chip and controls the microcode pipeline.
A microcode pipeline of functional units (FUs) defining the data flow.
A multi-granularity parallel (MGP) memory system that allows efficient custom data access patterns.
3
The MGP memory system allows efficient data access patterns. Given parameters W, the number of bytes that can be accessed in parallel, N, the total capacity in bytes, and G, the granularity, i.e. the number of bytes read or written per logic bank, the memory system can be partitioned to define memory accesses. Physical banks combine to form logic banks; each logic bank consists of G physical banks.
4
Matrices can be accessed in row or column order.
Matrix access in MGP requires storing the i-th row in the (i mod W)-th logic bank. Rows can then be accessed by setting G=W and columns by setting G=1.
5
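The row/column access rule on the previous slides can be sketched in a few lines. This is an illustrative model only, not the paper's implementation; the value of W and the helper names are assumptions made for the sketch.

```python
# Toy model of the MGP access pattern from the slides: row i is stored in
# logic bank i mod W. With granularity G = W a whole row comes from one
# wide logic bank; with G = 1 each of W logic banks supplies one element,
# so a column of W consecutive rows is read in parallel.
# W and the function names below are illustrative assumptions.

W = 4  # bytes (elements) accessible per cycle

def banks_for_row_access(i, W=W):
    # One wide access (G = W): the whole i-th row lives in logic bank i mod W.
    return {i % W}

def banks_for_col_access(i0, W=W):
    # G = 1: one element from each of rows i0 .. i0+W-1, each row in its
    # own logic bank, so the accesses do not conflict.
    return {(i0 + k) % W for k in range(W)}

# A row access touches exactly one logic bank:
assert len(banks_for_row_access(5)) == 1
# A column access touches all W logic banks, one element each, so it is
# conflict-free and can complete in one cycle:
assert banks_for_col_access(0) == set(range(W))
```

The point of the mapping is visible in the two assertions: both row-order and column-order accesses avoid bank conflicts, just with different granularity settings.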
The dataflow can be changed to fit the desired algorithm. This is facilitated by customizing which FUs are used and how they interact via microcode. Each FU can be described by a state machine, which allows easier FU organization: the user specifies a state machine for each FU, plus a final state machine specifying delays that ensure appropriate execution.
6
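The state-machine program model above can be illustrated with a toy simulation. The class and FU names here are invented for illustration; the while-loops play the role of the final delay-specifying state machine, holding a consumer FU back until its producer's result is ready.

```python
# Illustrative sketch (names invented) of the state-machine program model:
# each FU is a small state machine with a fixed latency, and a top-level
# controller inserts delays so a consumer only fires once its producer is done.

class FU:
    def __init__(self, name, latency, op):
        self.name = name
        self.latency = latency   # cycles from issue to result
        self.op = op
        self.busy_until = -1
        self.result = None

    def issue(self, cycle, *args):
        self.busy_until = cycle + self.latency
        self.result = self.op(*args)

    def done(self, cycle):
        return cycle >= self.busy_until

# Producer multiplies, consumer accumulates (a dot-product-style dataflow).
mul = FU("MUL", latency=3, op=lambda a, b: a * b)
acc = FU("ACC", latency=1, op=lambda a, b: a + b)

total, cycle = 0, 0
for a, b in [(1, 2), (3, 4)]:
    mul.issue(cycle, a, b)
    while not mul.done(cycle):   # delay state: wait for MUL's result
        cycle += 1
    acc.issue(cycle, total, mul.result)
    while not acc.done(cycle):   # delay state: wait for ACC's result
        cycle += 1
    total = acc.result

assert total == 14  # 1*2 + 3*4
```

In MaPU these state machines and delays are expressed in microcode rather than Python, but the division of labor is the same: per-FU behavior plus a final schedule of delays.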
Overview of the tape-out design implemented by the authors. APE (Algebraic Processing Engine) refers to the MaPU cores; the CSU is a DMA controller.
7
All comparisons shown here are from simulations. The APE runs at 1 GHz and the TI C66x DSP at 1.25 GHz.
8
9
Figure 15 in the paper appears to be incorrect: it is a copy of Figure 14.
10
Source: M. H. Ionica and D. Gregg, "The Movidius Myriad architecture's potential for scientific computing," IEEE Micro, vol. 35, no. 1, pp. 6–14, 2015.
11
12
Introduces a new architecture for fast and efficient matrix-related computations.
Defines a process for molding the architecture to specific uses by defining state machines in the microcode pipeline.
Demonstrates an improvement in power efficiency over CPUs/GPUs.
Offers few points of comparison against competing architectures.
13
1. Does the amount of overhead (defining state machines) and compiler
2. Is this as generic an architecture as claimed?
3. Are simulation results as useful, given that a physical chip tape-out exists?
14
15