SLIDE 1

Automated Task Distribution in Multicore Network Processors using Statistical Analysis

Arindam Mallik, Yu Zhang, Gokhan Memik
Electrical Engineering and Computer Science Dept.
Northwestern University

SLIDE 2

2008-1-9

Network Demand Gap

ANCS 2007

The gap increases over time [Intel]

SLIDE 3

The Path to ASIPs

Application-specific IC design
  Costly
  Unpredictable

Fuels the rise of programmable devices, or ASIPs (Application-Specific Instruction-set Processors)
  Networking
  Multimedia
  Graphics

ASIPs
  Architectures have been explored in great depth
  Modest progress on programming environments
  But the success of users depends on their ability to program effectively

SLIDE 4

Why Network Processors?

Traditional processors in networks
  General-purpose CPU
    Not fast enough to handle new link speeds
  ASIC
    Good performance, but lacks flexibility
    New applications or protocols make the old processor obsolete

Frequent new applications

Solution: Network Processors
  Programmable processors optimized for networking applications
  Reusability of the same processor core for different network applications

SLIDE 5

Overview

Chip Multiprocessors
  Most current processor architectures
  Ideal for networking applications
    Data-level parallelism
    Task-level parallelism
  Dominating from the start: Intel IXP

Low scalability of interconnect networks
  Importance of local communication
  Uniform task distribution

SLIDE 6

Outline

Introduction
Click Router Architecture
Statistical Task Allocation
Results
Conclusion

SLIDE 7

Modularity in Networking Apps

Presence of well-defined data segments (packets)
Independent packet processing
Overlooked modularity
  Set of independent tasks performed on each packet: a module
  Majority of networking applications: a collection of standard modules (TTL, checksum calculation)

SLIDE 8

Click Architecture

Unit of processing: the ‘element’ (From/ToDevice, GetHeader, Discard, Count…)
  An element encapsulates processing actions and state
  Elements have input and output ports
  Language-level composition of elements

Router configuration
  Directed graph of elements (cycles OK), connected by ‘connections’ (at ports)
  Each packet follows connections

Configuration string
  Parameters and initial state to instantiate an element

SLIDE 9

Click Configuration Example

Configuration checking the TTL value of a packet

Figure: element graph with FromDevice(eth0), DecIPTTL, ToDevice(eth1), and Discard
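The element-and-port model can be made concrete with a small Python sketch. This is not Click code: the `Element`, `DecTTL`, and `Packet` classes are hypothetical stand-ins, and the wiring (live packets leave the TTL element on port 0 toward ToDevice, expired packets leave on port 1 toward Discard) is an assumption about how the four elements in the figure are connected.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    ttl: int


@dataclass
class Element:
    """Hypothetical stand-in for a Click element: a name, output ports, and state."""
    name: str
    outputs: dict = field(default_factory=dict)  # port number -> downstream element
    seen: int = 0                                # per-element state: packets handled

    def connect(self, port, downstream):
        self.outputs[port] = downstream

    def choose_port(self, packet):
        # Default behaviour: forward every packet on port 0.
        return 0

    def push(self, packet):
        self.seen += 1
        downstream = self.outputs.get(self.choose_port(packet))
        if downstream is not None:
            downstream.push(packet)


class DecTTL(Element):
    """Decrement the TTL; expired packets go to port 1 (assumed wiring)."""

    def choose_port(self, packet):
        if packet.ttl <= 1:
            return 1
        packet.ttl -= 1
        return 0


# Build the directed graph: elements are nodes, connections are edges at ports.
source = Element("FromDevice(eth0)")
dec = DecTTL("DecIPTTL")
sink = Element("ToDevice(eth1)")
drop = Element("Discard")
source.connect(0, dec)
dec.connect(0, sink)
dec.connect(1, drop)

for ttl in (64, 1, 2, 0):
    source.push(Packet(ttl))

print(sink.seen, drop.seen)
```

Each packet simply follows the connections from port to port, which is what makes per-element profiling and later redistribution of elements across cores possible.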

SLIDE 10

IPv4 Router Example

Figure: Click element graph of an IPv4 router. Elements shown: Packet Source, Strip(14), CheckIPHeader, StaticIPLookup, DropBroadcast0, DropBroadcast1, DecIPTTL, Discard. Labels: "8 different sources", "Different destinations".

SLIDE 11

Statistical Task Allocation

Systolic Array Architecture
  Execution cores arranged in pipelined fashion
  Global communication using shared bus

Goal: Uniform Task Allocation
  Automated
  Each core sends the partially processed packet to the next one
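Why uniform allocation matters in a pipeline can be seen with a little arithmetic. The sketch below assumes, purely for illustration, fixed per-packet stage times, one stage per core, and no cost for handing a packet to the next core. The stage times are made-up numbers, not measurements from the talk.

```python
def pipeline_metrics(stage_times):
    """Steady-state throughput and average core utilization of a linear pipeline.

    Illustrative assumptions: constant stage times, one stage per core,
    free hand-off between neighbouring cores.
    """
    bottleneck = max(stage_times)          # the slowest stage paces the pipeline
    throughput = 1.0 / bottleneck          # packets per time unit
    utilization = sum(stage_times) / (len(stage_times) * bottleneck)
    return throughput, utilization


balanced = pipeline_metrics([250, 250, 250, 250])
skewed = pipeline_metrics([100, 400, 300, 200])

print("balanced:", balanced)
print("skewed:  ", skewed)
```

Both pipelines do the same total work per packet (1000 time units), but the skewed one is paced by its 400-unit stage, so the other cores sit idle part of the time.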

SLIDE 12

Module Distribution Algorithm

Profiling
  Statistical analysis of packet processing time

Streamlining
  Find total execution time of a packet
  Use DFS on the element tree

Task Distribution
  Assign elements to different stages/modules

Local optimization
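The streamlining and distribution steps can be sketched as follows. The slides do not spell out the algorithm, so this is only one plausible reading: a depth-first walk linearizes the element tree and sums the profiled times, and a greedy pass closes a stage once its load reaches total / number of stages. The tree shape (a single chain) and the helper names are assumptions; the mean times are the element means from the IPv4 element table in these slides.

```python
# Mean per-element processing times from the IPv4 element table in these slides.
MEAN_TIME = {"strip0": 241.28, "chkip0": 713.01, "RtLkUp": 336.56,
             "DBC0": 212.30, "DcTTL0": 317.78}

# Hypothetical element tree (one path only): element -> downstream children.
TREE = {"strip0": ["chkip0"], "chkip0": ["RtLkUp"], "RtLkUp": ["DBC0"],
        "DBC0": ["DcTTL0"], "DcTTL0": []}


def dfs_order(tree, root):
    """Return the elements in depth-first (pre-order) visiting order."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree[node]))
    return order


def assign_stages(order, cost, n_stages):
    """Greedy contiguous split; assumes len(order) >= n_stages."""
    target = sum(cost[e] for e in order) / n_stages
    stages, current, load = [], [], 0.0
    for i, elem in enumerate(order):
        current.append(elem)
        load += cost[elem]
        remaining_elems = len(order) - i - 1
        remaining_stages = n_stages - len(stages) - 1
        # Close the stage when it is full, or when every remaining
        # element is needed to keep the remaining stages non-empty.
        if remaining_stages > 0 and (load >= target
                                     or remaining_elems == remaining_stages):
            stages.append(current)
            current, load = [], 0.0
    stages.append(current)
    return stages


order = dfs_order(TREE, "strip0")
total = sum(MEAN_TIME[e] for e in order)
two_stages = assign_stages(order, MEAN_TIME, 2)
four_stages = assign_stages(order, MEAN_TIME, 4)

print("total time per packet:", round(total, 2))
print("2 stages:", two_stages)
print("4 stages:", four_stages)
```

A greedy split like this is cheap but not optimal, which is one reason a local optimization pass after the initial assignment is useful.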

SLIDE 13

Statistical Analysis of Packet Processing

Individual Elements
  Executed for 5000 packets
  Execution time recorded for each packet
  Mean (μ) and standard deviation (σ) calculated from the statistics
  The expression (μ+kσ) estimates variation of utilization
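The profiling arithmetic is simple enough to show directly. In the sketch below the 5000 execution times are synthetic (drawn from a normal distribution with a fixed seed), standing in for real measurements. The helper names `budget` and `share_above` are hypothetical. The idea that μ+kσ serves as a per-element time budget, with the share of packets exceeding it as the quantity of interest, is an interpretation of the slide rather than something it states.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Synthetic stand-in for the recorded execution times of one element.
samples = [random.gauss(300.0, 20.0) for _ in range(5000)]

mu = statistics.fmean(samples)
sigma = statistics.pstdev(samples)


def budget(mu, sigma, k):
    """Time budget with k standard deviations of slack on top of the mean."""
    return mu + k * sigma


def share_above(samples, threshold):
    """Percentage of samples whose time exceeds the threshold."""
    return 100.0 * sum(t > threshold for t in samples) / len(samples)


shares = [share_above(samples, budget(mu, sigma, k)) for k in range(5)]

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
for k, share in enumerate(shares):
    print(f"k = {k}: budget = {budget(mu, sigma, k):.2f}, above budget = {share:.2f}%")
```

Raising k trades a larger reserved time budget for a smaller share of packets that overrun it.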

SLIDE 14

Probability Distribution of IPv4 Elements

                                   Processing time threshold
Element   Mean (μ)   SD (σ)       μ     μ+σ    μ+2σ    μ+3σ    μ+4σ
strip0      241.28    29.31      50    0.64    0.64    0.64
chkip0      713.01    59.77      50    0.64    0.64    0.64    0.64
RtLkUp      336.56   266.88   20.03   20.03   10.01    0.03    0.03
DBC0        212.30    21.18   34.32   28.57    1.29    0.18    0.18
DcTTL0      317.78    20.34   26.45   12.98    2.09

SLIDE 15

Probability Distribution of IPv4 Router Stages

                                   Processing time threshold
Stage     Mean (μ)   SD (σ)       μ     μ+σ    μ+2σ    μ+3σ    μ+4σ
Stage0      227.38    24.14   35.06   20.00    3.64    0.00    0.00
Stage1      691.18    30.48   23.19   14.29    1.86    0.08    0.00
Stage2      500.43    29.52   27.18   24.31    5.66    0.11    0.11
Stage3      314.72    20.33   27.78   23.14    7.14    0.28    0.00
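One way to read the two tables side by side is to compare relative spread, σ/μ, per entry. The sketch below copies the mean and standard deviation columns from the element and stage tables in these slides. The choice of the coefficient of variation as the comparison metric is an illustration, not a measure the slides themselves report.

```python
# (mean, standard deviation) pairs copied from the two tables in these slides.
ELEMENTS = {"strip0": (241.28, 29.31), "chkip0": (713.01, 59.77),
            "RtLkUp": (336.56, 266.88), "DBC0": (212.30, 21.18),
            "DcTTL0": (317.78, 20.34)}
STAGES = {"Stage0": (227.38, 24.14), "Stage1": (691.18, 30.48),
          "Stage2": (500.43, 29.52), "Stage3": (314.72, 20.33)}


def relative_spread(stats):
    """Coefficient of variation (sigma / mu) for each entry."""
    return {name: sigma / mu for name, (mu, sigma) in stats.items()}


element_cv = relative_spread(ELEMENTS)
stage_cv = relative_spread(STAGES)

for name, cv in {**element_cv, **stage_cv}.items():
    print(f"{name:8s} sigma/mu = {cv:.3f}")
```

On these numbers the route lookup element stands out, with a standard deviation close to 80% of its mean, while every stage stays below about 11%.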

SLIDE 16

Optimized Strategies

Base Task Distribution (BTD)
  Uniform task allocation depending on the mean execution time

Extended Task Distribution (ETD)
  Slack kσ added to estimated processing time

Selective Replication (SR)
  Replicate modules to parallelize packet processing

Extended Selective Replication (ESR)
  Select elements with longer execution time
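The difference between BTD and ETD can be illustrated by balancing the same elements under two cost estimates: the mean alone (k = 0) and the mean plus kσ of slack. The sketch below uses the element statistics from the IPv4 element table in these slides. The exhaustive search over contiguous splits, the three-stage target, the assumed pipeline order, and k = 1 are all choices made for the example, not details taken from the talk.

```python
from itertools import combinations

# (mean, standard deviation) per element, from the IPv4 element table.
STATS = {"strip0": (241.28, 29.31), "chkip0": (713.01, 59.77),
         "RtLkUp": (336.56, 266.88), "DBC0": (212.30, 21.18),
         "DcTTL0": (317.78, 20.34)}
ORDER = ["strip0", "chkip0", "RtLkUp", "DBC0", "DcTTL0"]  # assumed pipeline order


def cost(elem, k):
    """Estimated time: the mean (k = 0, BTD) plus k sigma of slack (ETD)."""
    mu, sigma = STATS[elem]
    return mu + k * sigma


def stage_loads(stages, k):
    return [sum(cost(e, k) for e in stage) for stage in stages]


def best_split(order, n_stages, k):
    """Try every contiguous split and keep the one with the smallest bottleneck."""
    candidates = []
    for cuts in combinations(range(1, len(order)), n_stages - 1):
        bounds = (0, *cuts, len(order))
        candidates.append([order[a:b] for a, b in zip(bounds, bounds[1:])])
    return min(candidates, key=lambda stages: max(stage_loads(stages, k)))


btd = best_split(ORDER, 3, k=0)
etd = best_split(ORDER, 3, k=1)

print("BTD:", btd)
print("ETD:", etd)
```

With the mean alone, the high-variance route lookup shares a stage with two other elements. Once one σ of slack is added, the balanced split gives it a stage of its own.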

SLIDE 17

Module Distribution Illustration

Figure: the IPv4 router element graph (Strip(14), CheckIPHeader, StaticIPLookup, DropBroadcast0, DropBroadcast1, DecIPTTL, Discard; labels "8 different sources" and "Different destinations") partitioned into 2 stages and into 4 stages.

SLIDE 18

Relative Throughput Analysis

Figure: Processor throughput for DRR application. Relative processor throughput (axis from 1 to 8) for BTD, ETD, SR, and ESR at configurations labeled 2, 4, and 8.

SLIDE 19

Resource Utilization Analysis

Figure: Resource utilization in DRR application. Processor utilization (axis from 70 to 100) for BTD, ETD, SR, and ESR at configurations labeled 2, 4, and 8.

SLIDE 20

Contributions

Analyzed modularity in networking applications using statistical methods
Proposed intelligent task allocation based on variation in processing time
Generic nature of the task allocation method: applicable to CMP task distribution

SLIDE 21

Acknowledgements

Click Development Group
Anonymous reviewers

THANK YOU
yzh702@eecs.northwestern.edu