CUstom Built hEterogeneous Multi-core ArCHitecture design paradigm

SLIDE 1

CUstom Built hEterogeneous Multi-core ArCHitecture design paradigm based simulator: Towards integrated design automation of supercomputing clusters

WAran Research FoundaTion

SLIDE 2

Introducing the User Guide

Part I

Custom Built Heterogeneous Multi-core architecture design paradigm

SLIDE 3

CUBEMACH Design Paradigm

  • Simultaneous execution of multiple applications without space-time sharing, for increased resource utilization

  • Inter-core heterogeneity and intra-core heterogeneity

SLIDE 4

CUBEMACH Design Space

Architecture Space

  • Algorithm Level Functional Units

  • Algorithm Level ISA

  • Pseudo Compiler On Silicon

  • On Core Network

  • Memory Architecture

SLIDE 5

Algorithm Level Functional Units (ALFU) / Algorithm Level Instruction Set Architecture (ALISA)

  • ALFUs

– Hardwired functional units that execute algorithms (KL graph partitioning, Crout’s algorithm, etc.) on small problem sizes

  • ALISA

– Algorithm-level instructions, each of which triggers the execution of an ALFU
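
The relationship between an ALISA instruction and an ALFU can be sketched in software. The sketch below is only an illustration: the opcode name `CROUT`, the `ALFU_TABLE` dictionary and the `execute` function are hypothetical and do not come from the slides. It models one ALFU as a plain function implementing Crout's LU decomposition (without pivoting) and one algorithm-level instruction that triggers it.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def crout_lu(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Crout's LU decomposition: A = L * U, with ones on the diagonal of U."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        # Column j of L, from the diagonal downwards.
        for i in range(j, n):
            lower[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(j))
        if lower[j][j] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Row j of U, to the right of the diagonal.
        for i in range(j + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(j))
            upper[j][i] = (a[j][i] - partial) / lower[j][j]
    return lower, upper


# Hypothetical ALISA opcode table: one opcode per ALFU.
ALFU_TABLE: Dict[str, Callable] = {"CROUT": crout_lu}


def execute(opcode: str, *operands):
    """Issue one algorithm-level instruction to the matching ALFU."""
    return ALFU_TABLE[opcode](*operands)


lower, upper = execute("CROUT", [[4.0, 3.0], [6.0, 3.0]])
```

A single `execute("CROUT", ...)` call stands in for the many scalar instructions a conventional ISA would fetch to perform the same decomposition.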

SLIDE 6

Algorithm Level Functional Units (ALFU) / Algorithm Level Instruction Set Architecture (ALISA)

  • Advantages:

– Reduced instruction fetches
– Reduced control signal generation
– Reduced cache misses
– Reduced compilation time
– Increased performance

SLIDE 7

Pseudo Compiler On Silicon:

Dynamic code generator-cum-scheduler for simultaneous multiple application execution

  • Hardware code generator and scheduler, to cope with the high instruction generation and issue rate

  • Table-based code generator, customizable with respect to the architecture

SLIDE 8

Pseudo Compiler On Silicon:

Dynamic code generator-cum-scheduler for simultaneous multiple application execution

Figure (block diagram): Application (Libraries) -> PCOS -> Sub-Libraries -> SCOS units -> Instructions

Hierarchical Compiler

  • Primary COS (PCOS): converts the application, expressed as libraries, into sub-libraries

  • Secondary COS (SCOS): converts sub-libraries into instructions
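
The two-level translation can be pictured as two table lookups in sequence. The following is a minimal software sketch, not the hardware design: the table contents, library names and opcode names (`solve_linear_system`, `CROUT`, `FSUB`, `BSUB`) are all invented for illustration.

```python
from typing import Dict, List

# Hypothetical tables; a different architecture would load different entries.
PRIMARY_TABLE: Dict[str, List[str]] = {
    "solve_linear_system": ["lu_decompose", "forward_substitute", "back_substitute"],
}
SECONDARY_TABLE: Dict[str, List[str]] = {
    "lu_decompose": ["CROUT"],
    "forward_substitute": ["FSUB"],
    "back_substitute": ["BSUB"],
}


def primary_cos(libraries: List[str]) -> List[str]:
    """First level: expand each library call into its sub-libraries."""
    return [sub for lib in libraries for sub in PRIMARY_TABLE[lib]]


def secondary_cos(sub_libraries: List[str]) -> List[str]:
    """Second level: expand each sub-library into instructions."""
    return [ins for sub in sub_libraries for ins in SECONDARY_TABLE[sub]]


def compile_application(libraries: List[str]) -> List[str]:
    """Run both levels in order, as in the hierarchical compiler."""
    return secondary_cos(primary_cos(libraries))


instructions = compile_application(["solve_linear_system"])
```

Because the translation is driven entirely by the tables, retargeting the generator to another architecture means replacing table entries rather than rewriting the lookup logic.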

SLIDE 9

On Core Network: High Bandwidth NoC architecture

  • Designed for the varying and high bandwidth requirements of Algorithm Level Functional Units

  • Uses a hierarchical and scalable Multistage Interconnect Network (MIN) for reduced hardware complexity and power consumption

  • Self-routing techniques are employed to reduce power consumption
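
Self-routing in a MIN means each switch picks its output from the destination address alone, with no central routing controller. The slides do not say which MIN topology is used, so the sketch below assumes a classic omega network with destination-tag routing purely as an example; the function name `omega_route` is hypothetical.

```python
from typing import List


def omega_route(source: int, dest: int, n_bits: int) -> List[int]:
    """Return the line occupied after each stage of a 2**n_bits omega network.

    Each stage is a perfect shuffle followed by a 2x2 switch. The switch reads
    one bit of the destination address (most significant bit first): 0 selects
    the upper output, 1 selects the lower output.
    """
    size = 1 << n_bits
    if n_bits < 1 or not (0 <= source < size and 0 <= dest < size):
        raise ValueError("source and dest must fit in n_bits")
    mask = size - 1
    position = source
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n-bit line address left by one.
        position = ((position << 1) | (position >> (n_bits - 1))) & mask
        # Switch setting comes straight from the destination tag.
        bit = (dest >> (n_bits - 1 - stage)) & 1
        position = (position & ~1) | bit
        path.append(position)
    return path


path = omega_route(source=2, dest=5, n_bits=3)
```

After the last stage the occupied line always equals the destination, since every stage shifts one more destination bit into the address.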

SLIDE 10

Cache organization, mapping and replacement for simultaneous multiple applications

  • Increased data mapping is required for simultaneous execution of multiple applications without space-time sharing

  • A scheduler-memory integrated mapping scheme is adopted

  • An advanced replacement strategy for the CUBEMACH design paradigm is adopted

SLIDE 11

Sample CUBEMACH Architecture

SLIDE 12

Optimization of heterogeneous multi-core architecture parameters

  • Exhaustive design space exploration is not possible because the design space is very large

  • The optimizer employs

– Simulated Annealing to find architectures matching the input specification
– Game Theory to choose the parameters to be perturbed
– KL graph partitioning to group highly communicating functional units
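
The simulated annealing part of the optimizer can be sketched generically. The code below is an assumption-laden illustration, not the CUBEMACH optimizer: the parameter names (`cores`, `alfus_per_core`), the cost function and the cooling schedule are invented, and the parameter to perturb is picked uniformly at random as a stand-in for the game-theoretic selection, which the slides do not describe in enough detail to reproduce.

```python
import math
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, int]


def anneal(cost: Callable[[Params], float], initial: Params,
           ranges: Dict[str, Tuple[int, int]], steps: int = 2000,
           t_start: float = 10.0, t_end: float = 0.01, seed: int = 0):
    """Minimise cost(params) by perturbing one parameter per step."""
    rng = random.Random(seed)
    current, current_cost = dict(initial), cost(initial)
    best, best_cost = dict(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Placeholder for the game-theoretic choice: pick a parameter at random.
        name = rng.choice(sorted(ranges))
        low, high = ranges[name]
        candidate = dict(current)
        candidate[name] = rng.randint(low, high)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
    return best, best_cost


def spec_cost(p: Params) -> float:
    """Hypothetical specification: 64 ALFUs in total, preferring fewer cores."""
    return abs(p["cores"] * p["alfus_per_core"] - 64) + 0.1 * p["cores"]


initial = {"cores": 2, "alfus_per_core": 2}
ranges = {"cores": (1, 16), "alfus_per_core": (1, 16)}
best, best_cost = anneal(spec_cost, initial, ranges)
```

Accepting occasional worse moves at high temperature is what lets the search escape local minima in a design space too large to enumerate.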

SLIDE 13

BENSIM (BENchmark SIMulator): Application Cloning and benchmarking CUBEMACH based architecture

  • The application is modeled as a graph

– Algorithms form the nodes of the graph
– Edges represent the communication between algorithms

  • By choosing suitable algorithms for the nodes, any application can be cloned based on its communication and computation pattern

SLIDE 14

BENSIM (BENchmark SIMulator): Application Cloning and benchmarking CUBEMACH based architecture

SLIDE 15

Input to CUBEMACH simulator

Figure (flowchart), labels recovered:

  • Inputs: Application, ALISA based workload, User heuristics

  • Application Clone

  • Pseudo CUBEMACH Language and Pseudo Language Compiler (to be added in future)

  • Application in terms of ALISA

  • CUBEMACH Simulator

  • SAGT based optimization

  • Decision: Input specification met? (No / Yes)

  • Output: Optimized architecture

SAGT: Simulated Annealing + Game Theory

SLIDE 16

Input to CUBEMACH simulator

  • ALISA based workloads (algorithms) are provided to the user

  • The user models the application as a BENSIM graph using the given workloads

  • In future, users will be able to code their applications in the pseudo CUBEMACH language

SLIDE 17

Comparison of CUBEMACH simulator with other simulators

SLIDE 18

Part II

Custom Built Heterogeneous Multi-core Architecture design paradigm based simulator

SLIDE 19

CUBEMACH Simulator Architecture

Figure (block diagram), labels recovered:

  • Input: Parameter values of the heterogeneous architecture

  • Architectural Structure Generation

  • Clock Generator

  • Event Handler

  • Sub-simulators: COS, ONNET, ALFU, MEMORY

  • Log

  • Simulation Results Dump
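
One common way to combine a clock, an event handler and several sub-simulators is a discrete-event loop. The sketch below shows that general pattern only; nothing on the slide confirms that the CUBEMACH simulator is built this way, and the class name, callbacks and latencies (2 and 5 cycles) are hypothetical.

```python
import heapq
from typing import Callable, List, Tuple


class EventHandler:
    """Keeps a cycle counter and runs scheduled callbacks in time order."""

    def __init__(self) -> None:
        self.clock = 0
        self.log: List[Tuple[int, str]] = []
        self._queue: list = []
        self._seq = 0  # tie-breaker so equal-cycle events keep insertion order

    def schedule(self, delay: int, callback: Callable[["EventHandler"], None]) -> None:
        heapq.heappush(self._queue, (self.clock + delay, self._seq, callback))
        self._seq += 1

    def run(self) -> None:
        while self._queue:
            cycle, _, callback = heapq.heappop(self._queue)
            self.clock = cycle  # advance the clock to the event's cycle
            callback(self)


# Stub sub-simulators; each logs its action and schedules the next one.
def cos_issue(sim: EventHandler) -> None:
    sim.log.append((sim.clock, "COS issue"))
    sim.schedule(2, onnet_transfer)  # assumed network latency


def onnet_transfer(sim: EventHandler) -> None:
    sim.log.append((sim.clock, "ONNET transfer"))
    sim.schedule(5, alfu_done)  # assumed ALFU latency


def alfu_done(sim: EventHandler) -> None:
    sim.log.append((sim.clock, "ALFU done"))


sim = EventHandler()
sim.schedule(0, cos_issue)
sim.run()
```

In this arrangement the log doubles as the results dump: each entry records the cycle at which a sub-simulator acted.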

SLIDE 20

Parameter Value Selection

  • A GUI is provided for the user to enter the architecture parameter values, which are grouped into tabs

To understand what these parameters mean, read the user guide.

SLIDE 21

Application Input Selection – User generated workload

  • The application clone (BENSIM graph) is supplied as an adjacency matrix

  • In the row and column labels, enter the IDs corresponding to the algorithms

  • Enter the adjacency matrix in the text area
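
To make the input format concrete, here is a small sketch of how labelled adjacency-matrix text could be turned into a list of graph edges. The exact format the GUI expects is not given on the slide, so the details are assumptions: whitespace-separated integers, one matrix row per line, a nonzero entry meaning an edge whose value is the communication weight, and integer algorithm IDs. The function name `parse_bensim_graph` is hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int, int]  # (source algorithm ID, destination algorithm ID, weight)


def parse_bensim_graph(labels: List[int], matrix_text: str) -> List[Edge]:
    """Convert labelled adjacency-matrix text into a list of weighted edges."""
    rows = [[int(tok) for tok in line.split()]
            for line in matrix_text.strip().splitlines()]
    n = len(labels)
    if len(rows) != n or any(len(row) != n for row in rows):
        raise ValueError("matrix must be square and match the number of labels")
    edges = []
    for i, row in enumerate(rows):
        for j, weight in enumerate(row):
            if weight != 0:
                # Row label is the sender, column label is the receiver.
                edges.append((labels[i], labels[j], weight))
    return edges


# Three nodes; two of them run the same (made-up) algorithm ID 7.
edges = parse_bensim_graph([3, 7, 7], "0 1 0\n0 0 2\n0 0 0")
```

Returning a list rather than a dictionary keeps edges distinct when the same algorithm ID labels more than one node.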

SLIDE 22

Running the CUBEMACH simulator

  • The simulator lets the user check the validity of the architecture parameter values

  • After the checks are complete, close the UI to start the simulation
