MaPU: A Novel Mathematical Computing Architecture
Shashank Kedia & Robert Macy III
1
Why MaPU?
High-performance CPUs and GPUs have good theoretical peak performance but low power efficiency relative to that performance. Superscalar and
2
Three main components:
A scalar pipeline that communicates with the system on chip and controls the microcode pipeline.
A microcode pipeline of functional units (FUs) defining the data flow.
A multi-granularity parallel (MGP) memory system that allows efficient custom data access patterns.
3
The MGP memory system allows efficient data access patterns. Given parameters W, the number of bytes that can be accessed in parallel, N, the total capacity in bytes, and G, the granularity, i.e. the number of bytes read or written per logic bank, the memory system can be partitioned to define memory accesses. Physical banks combine to form logic banks; each logic bank consists of G physical banks.
4
Matrices can be accessed in row or column order.
Matrix access in MGP requires storing the i-th row in the (i mod W)-th logic bank. Rows can then be accessed by setting G=W and columns by setting G=1.
5
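The row/column access rule on the previous slides can be sketched in a few lines. This is an illustrative model only, not the paper's implementation; the value of W and the helper names are assumptions made for the sketch.

```python
# Toy model of the MGP access pattern from the slides: row i is stored in
# logic bank i mod W. With granularity G = W a whole row comes from one
# wide logic bank; with G = 1 each of W logic banks supplies one element,
# so a column of W consecutive rows is read in parallel.
# W and the function names below are illustrative assumptions.

W = 4  # bytes (elements) accessible per cycle

def banks_for_row_access(i, W=W):
    # One wide access (G = W): the whole i-th row lives in logic bank i mod W.
    return {i % W}

def banks_for_col_access(i0, W=W):
    # G = 1: one element from each of rows i0 .. i0+W-1, each row in its
    # own logic bank, so the accesses do not conflict.
    return {(i0 + k) % W for k in range(W)}

# A row access touches exactly one logic bank:
assert len(banks_for_row_access(5)) == 1
# A column access touches all W logic banks, one element each, so it is
# conflict-free and can complete in one cycle:
assert banks_for_col_access(0) == set(range(W))
```

The point of the mapping is visible in the two assertions: both row-order and column-order accesses avoid bank conflicts, just with different granularity settings.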
The dataflow can be changed to fit the desired algorithm. This is facilitated by customizing which FUs are used and how they interact via microcode. Each FU can be described by a state machine, which allows easier FU organization: the user specifies a state machine for each FU, plus a final state machine specifying delays that ensure appropriate execution.
6
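The state-machine program model above can be illustrated with a toy simulation. The class and FU names here are invented for illustration; the while-loops play the role of the final delay-specifying state machine, holding a consumer FU back until its producer's result is ready.

```python
# Illustrative sketch (names invented) of the state-machine program model:
# each FU is a small state machine with a fixed latency, and a top-level
# controller inserts delays so a consumer only fires once its producer is done.

class FU:
    def __init__(self, name, latency, op):
        self.name = name
        self.latency = latency   # cycles from issue to result
        self.op = op
        self.busy_until = -1
        self.result = None

    def issue(self, cycle, *args):
        self.busy_until = cycle + self.latency
        self.result = self.op(*args)

    def done(self, cycle):
        return cycle >= self.busy_until

# Producer multiplies, consumer accumulates (a dot-product-style dataflow).
mul = FU("MUL", latency=3, op=lambda a, b: a * b)
acc = FU("ACC", latency=1, op=lambda a, b: a + b)

total, cycle = 0, 0
for a, b in [(1, 2), (3, 4)]:
    mul.issue(cycle, a, b)
    while not mul.done(cycle):   # delay state: wait for MUL's result
        cycle += 1
    acc.issue(cycle, total, mul.result)
    while not acc.done(cycle):   # delay state: wait for ACC's result
        cycle += 1
    total = acc.result

assert total == 14  # 1*2 + 3*4
```

In MaPU these state machines and delays are expressed in microcode rather than Python, but the division of labor is the same: per-FU behavior plus a final schedule of delays.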
Overview of the tape-out design implemented by the authors. APE (Algebraic Processing Engine) refers to the MaPU cores; the CSU is a DMA controller.
7
All comparisons shown here are from simulations. The APE runs at 1 GHz and the TI C66x DSP at 1.25 GHz.
8
9
Figure 15 in the paper appears to be incorrect: it is a copy of Figure 14.
10
Source: M. H. Ionica and D. Gregg, "The Movidius Myriad architecture's potential for scientific computing," IEEE Micro, vol. 35, no. 1, pp. 6–14, 2015.
11
12
Introduces a new architecture for fast and efficient matrix-related computations.
Defines a process for molding the architecture to specific uses by defining state machines in the microcode pipeline.
Demonstrates an improvement in power efficiency over CPUs/GPUs.
Offers few points of comparison against competing architectures.
13
1. Does the amount of overhead (defining state machines) and compiler
2. Is this as generic an architecture as claimed?
3. Are simulation results as useful, given that a physical chip tape-out exists?
14
15