

slide-1
SLIDE 1

aks249

slide-2
SLIDE 2

Parallel Metropolis-Hastings-Walker Sampling for LDA

Xanda Schofield

  • Topic: a probability distribution across words (P(“how”) = 0.05, P(“cow”) = 0.001).
  • Document: a list of tokens (“how now brown cow”).
  • Topic model: a way of describing how a set of topics could generate documents (e.g. Latent Dirichlet Allocation [Blei et al., 2003]).

Inferring topic models is HARD, SLOW, and DIFFICULT TO PARALLELIZE.

Goal: to parallelize an optimized inference algorithm (MHW for LDA) efficiently for one consumer-grade computer.
slide-3
SLIDE 3


Gibbs Sampling [Griffiths et al., 2004]: O(iterations × tokens × topics)
Metropolis-Hastings-Walker Sampling [Li et al., 2014]: O(iterations × tokens × topics in a token's document)

Needed for computations:

  • Nkd: tokens in document d assigned to topic k
  • Nwk: tokens of word w assigned to topic k
  • qw: cached sampled topics for word w
  • A few user-set parameters
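The qw cache above comes from Walker's alias method, which lets the sampler draw a topic for word w in O(1) time after an O(K) table build; stale draws are then corrected with a Metropolis-Hastings acceptance step (Li et al., 2014). A minimal Python sketch of the alias table itself, illustrative rather than the project's actual code:

```python
import random

def build_alias_table(probs):
    """Walker's alias method: O(K) setup, then O(1) draws from a discrete distribution."""
    K = len(probs)
    scaled = [p * K for p in probs]
    prob, alias = [0.0] * K, [0] * K
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l       # fill bucket s, borrowing mass from l
        scaled[l] -= (1.0 - scaled[s])
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                     # leftovers are (numerically) full buckets
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    """Draw one index: pick a bucket uniformly, then flip its biased coin."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```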

slide-4
SLIDE 4


How we do it:

  • Split documents across processors (Nkd)
  • Keep Nwk updated
    • Share Nwk
    • Synchronize Nwk each iteration
    • Gossip Nwk to a random processor each iteration
  • Keep qw valid
    • Share qw
    • Make qw per-processor

Measuring comparative performance and held-out likelihood as the number of processors varies.
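A minimal sketch of the gossip step described above, assuming each processor accumulates local word-topic count deltas and that `send`/`recv` stand in for whatever message layer is used (names are illustrative, not from the slides):

```python
import random

def gossip_step(rank, num_procs, Nwk_local, delta, send, recv):
    """One gossip round: push accumulated Nwk deltas to one random peer
    and merge whatever deltas have arrived from other processors."""
    peer = random.choice([p for p in range(num_procs) if p != rank])
    send(peer, dict(delta))                 # ship only the changes since the last round
    for (w, k), d in recv().items():        # merge incoming deltas into the local counts
        Nwk_local[(w, k)] = Nwk_local.get((w, k), 0) + d
    delta.clear()
```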

slide-5
SLIDE 5

ers273

slide-6
SLIDE 6

An Analysis of the CPU/GPU Unified Memory Model

Eston Schweickart

slide-7
SLIDE 7

Unified Memory

slide-8
SLIDE 8

3 Contexts

  • Multi-stream
    • Basic, intended use case for UM
    • Big integer addition
  • Cross-device mapping
    • Linked lists: hard to transfer
  • Nonlinear Exponential Time Integration (Michels et al. 2014)
    • Both GPU- and CPU-bound computation, nontrivial implementation

slide-9
SLIDE 9

Analysis

  • Ease of implementation
  • Lines of code
  • Required concepts
  • Performance
  • Memory Transfer Optimizations?
slide-10
SLIDE 10

Results

  • UM is best as an introductory concept
  • Removes the burden of explicit memory transfer
  • UM is hard to optimize
  • No control over data location
  • Recommend: compiler hints, better profiling tools

slide-11
SLIDE 11

fno2

slide-12
SLIDE 12
slide-13
SLIDE 13

Provide insight + maximize privacy

slide-14
SLIDE 14

Lifestreams (format) Bolt (chunk) CryptDB (encrypt)

slide-15
SLIDE 15

™ Built Lifestreams DPU ™ Encrypted database & queries ™ Demoed visualizations ™ Developed 3rd party API

slide-16
SLIDE 16

fz84

slide-17
SLIDE 17

Timing Channel Mitigation in Scheduler

A Case Study of GHC

Fan Zhang
Dept. of Computer Science, Cornell University

December 4, 2014
slide-18
SLIDE 18

Timing Channel in Scheduler

A timing channel is a covert channel for passing unauthorized information, encoded in timing behavior.

E.g. cache timing: the response time of a memory access reveals whether a page is in the cache; by observing the running time of an AES encryption thread, one can guess the AES key.

In an OS, the scheduler is a major source of secret information leakage.
slide-19
SLIDE 19

Timing Channel in Scheduler

Consider round-robin scheduling with epoch T. Ideally, a context switch happens at nT. In practice, for many reasons, the context switch happens at nT + δ (where δ > 0 is a random variable), because at nT:

  • the thread is performing an atomic operation (uninterruptible)
  • interrupts are disabled (so the timer interrupt is ignored)

δ is exploitable to pass secret information (even if we don't yet know how). Channel capacity: H = −Σ_i p_i log(p_i)
slide-20
SLIDE 20

Problem

How to measure δ in a real-world scheduler (GHC)?

GHC, the Glasgow Haskell Compiler, is a state-of-the-art, open-source compiler and interactive environment for the functional language Haskell.

How to mitigate this timing channel, i.e. eliminate δ?
slide-21
SLIDE 21

Problem I

How to measure δ in a real-world scheduler (GHC)? Probe GHC → break the GHC scheduler down → read GHC code. Is it workload dependent? Three sample workloads:

  • calc: an encryption thread (computation intensive)
  • get: a networking thread retrieving files via HTTP
  • file: a thread reading and writing files on disk
slide-22
SLIDE 22

Results

[Figure (a): CDF of δ (time in ms) for the file, get, and calc workloads]

[Figure (b): Power-law fit to the CDF of δ, p = a·x^(−b) + c, original data vs. fit]

            calc    file     get
Delay(δ)    6.05    14.52    49.82

Table: Timing channel capacity H = −Σ_i p_i log(p_i) (bit/s)
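One way to read the capacity figures: bin the observed δ values into an empirical distribution and compute H = −Σ_i p_i log p_i. A hedged sketch, where the bin width and epoch length are illustrative assumptions rather than the author's measurement setup:

```python
import math
from collections import Counter

def channel_capacity(delays_ms, bin_ms=1.0, epoch_ms=10.0):
    """Estimate H = -sum_i p_i log2 p_i (bits per scheduling epoch) from observed
    delays δ, then scale to bits per second."""
    bins = Counter(int(d // bin_ms) for d in delays_ms)   # histogram of δ
    n = len(delays_ms)
    H = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return H * (1000.0 / epoch_ms)   # epochs per second × bits per epoch
```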

slide-23
SLIDE 23

Problem

How to mitigate this timing channel, i.e. eliminate δ?

Algorithm 1: Incremental round-robin schedule
Data: initial time slice T0, increment b, timer t
  tc = t.expiration
  if current thread can be switched ∨ tc ≥ T0/2 then
      set context_switch = 1
      reset t.expiration = T0
  else
      reset t.expiration = tc + b
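A direct transcription of Algorithm 1 into Python, assuming a hypothetical scheduler/timer API (thread.can_be_switched(), timer.expiration); this is a sketch of the idea, not GHC's code:

```python
def on_timer_expire(thread, timer, T0, b):
    """Incremental round-robin: if the thread cannot be switched yet, extend the
    timer in small steps of b so the eventual switch lands on a predictable grid."""
    tc = timer.expiration
    if thread.can_be_switched() or tc >= T0 / 2:
        timer.expiration = T0          # switch now and start a fresh full slice
        return True                    # context switch happens
    timer.expiration = tc + b          # not safe to switch: nudge the deadline by b
    return False
```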

slide-24
SLIDE 24

Results

[Figure: capacity decrease (%) vs. increment b (ms) for the file, get, and calc workloads; reference line y = x]
slide-25
SLIDE 25

Conclusion

  • Timing channels exist in the scheduler; δ is one example.
  • δ can be approximated by a power-law distribution.
  • I/O-bound threads tend to leak more information through their scheduling footprint.
  • Though simulation shows that incremental round-robin scheduling can effectively erase the leaked entropy, timing channel mitigation remains a difficult problem.
slide-26
SLIDE 26

gjm97

slide-27
SLIDE 27

PROOF OF STAKE IN THE ETHEREUM DECENTRALIZED STATE MACHINE

Galen Marchetti

slide-28
SLIDE 28

Decentralized State Machine

  • Ethereum
    • Second-generation cryptocurrency; coins called “ether”
    • Bitcoin blockchain + Ethereum Virtual Machine
  • Stakeholders in the network purchase state transitions
    • Mining fee includes a cost per opcode in the EVM
    • Miners provide Proof-of-Work to register transitions in the blockchain

slide-29
SLIDE 29

Establishing Consensus

  • Proof-of-Work
    • Consensus group is all CPU power in existence
    • Miners solve crypto-puzzles
    • Employed by Bitcoin, Namecoin, Ethereum
    • Subject to 51% outsider attack
  • Proof-of-Stake
    • Consensus group is all crypto-coins in the network
    • Miners provide evidence of coin possession

slide-30
SLIDE 30

Mining Procedure

  • Select parent block and “uncles” in the blockchain
  • Generate nonce and broadcast the block header
  • Nodes receiving the empty header deterministically select N pseudo-random stakeholders
  • Each stakeholder signs the block header and broadcasts it to the network
  • The last stakeholder adds state transitions, signs the total block with its own signature, and broadcasts it to the network
  • Mining profit is evenly distributed among the stakeholders and the original node
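A sketch of the "deterministically select N pseudo-random stakeholders" step: seed a PRNG with the broadcast header hash so every node derives the same signer set. The stake table and weighting below are illustrative assumptions, not Ethereum's actual protocol code:

```python
import hashlib
import random

def select_stakeholders(header_bytes, stakes, n):
    """Deterministically pick n stakeholder addresses, weighted by stake.
    `stakes` is an {address: coin_balance} mapping (illustrative)."""
    seed = int.from_bytes(hashlib.sha256(header_bytes).digest(), "big")
    rng = random.Random(seed)
    addresses = sorted(stakes)                    # fixed order so every node agrees
    weights = [stakes[a] for a in addresses]
    n = min(n, len(addresses))
    chosen = []
    while len(chosen) < n:
        pick = rng.choices(addresses, weights=weights, k=1)[0]
        if pick not in chosen:                    # avoid selecting the same signer twice
            chosen.append(pick)
    return chosen
```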

slide-31
SLIDE 31

Evaluation

  • Mining now requires several broadcast steps
  • Use Amazon's EC2 with geographically separate nodes
  • Measure time for a pure Ethereum cluster to propagate state transitions in the blockchain
  • Measure time for a Proof-of-Stake Ethereum cluster to propagate state transitions

slide-32
SLIDE 32

ica23

slide-33
SLIDE 33

Multicast Channelization for RDMA

Isaac Ackerman

slide-34
SLIDE 34

Channelization

  • Routing multicast traffic is difficult
  • Share resources for highest performance

RDMA

  • Verbs interface
  • Pre-allocate buffers
  • Sender needs buffer descriptor

slide-35
SLIDE 35

Model

  • Cost for each send/receive
  • Cost for client to hold buffers open
  • Cost to coordinate senders
slide-36
SLIDE 36

Existing Solutions

  • Clustering
  • MCMD

Doesn't consider memory consumption

[Figure: relative cost to unlimited channels vs. number of channels, comparing Naive and MCMD on the NYSE simple and IBM simple workloads]

slide-37
SLIDE 37

Fixes

  • Consider different MTUs
    – Pseudopolynomial
    – Still slow
  • Using channels incurs a memory cost
    – Cautiously introduce new channels

slide-38
SLIDE 38

Results

[Figure: Channelization Extensions — relative cost vs. number of channels for NYSE Simple, NYSE MTU, NYSE Passive, IBM Simple, IBM MTU, and IBM Passive]

slide-39
SLIDE 39

Adaptivity

[Figure: Adaptability of Channelization Solutions — cost vs. solution staleness (ms) for Passive 10 Channel, Passive 20 Channel, Simple 10 Channels, and Simple 20 Channels]

slide-40
SLIDE 40

Future Work

  • Making use of unused channels
  • Incorporating UDP for low rate flows
  • Reliability, Congestion Control
slide-41
SLIDE 41

km676

slide-42
SLIDE 42

Kai Mast

slide-43
SLIDE 43

The Brave New World of NoSQL

  • Key-Value Store – Is this Big Data?
  • Document Store – The solution?
  • Eventual Consistency – Who wants this?
slide-44
SLIDE 44

HyperDex

  • Hyperspace Hashing
  • Chain-Replication
  • Fast & Reliable
  • Imperative API
  • But...Strict Datatypes
slide-45
SLIDE 45
slide-46
SLIDE 46

ktc34

slide-47
SLIDE 47

Transaction Rollback in Bitcoin

  • Forking makes rollback unavoidable, but can we minimize the loss of valid transactions?

Source: Bitcoin Developer Guide

slide-48
SLIDE 48

Motivation

  • Extended Forks
    – August 2010 overflow bug (>50 blocks)
    – March 2013 fork (>20 blocks)

  • Partitioned Networks
  • Record double spends
slide-49
SLIDE 49

Merge Protocol

  • Create a new block combining the hashes of both previous headers
  • Add a second Merkle tree containing invalidated transactions
    – Any input used twice
    – Any output of an invalid transaction used as input
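A sketch of how the invalidated set could be computed when merging two forks, under a simplified transaction model (the dict format below is an assumption for illustration, not Bitcoin's actual data structures):

```python
def find_invalidated(kept_branch_txs, merged_branch_txs):
    """A tx from the merged-in branch is invalid if any of its inputs was already
    spent on the kept branch, or if it spends an output of a tx that is itself
    invalid (cascading invalidation). tx = {"id": str, "inputs": [(txid, index), ...]}."""
    spent = {inp for tx in kept_branch_txs for inp in tx["inputs"]}
    invalid_ids, invalidated = set(), []
    for tx in merged_branch_txs:                  # assume parents appear before children
        double_spend = any(inp in spent for inp in tx["inputs"])
        bad_parent = any(txid in invalid_ids for txid, _ in tx["inputs"])
        if double_spend or bad_parent:
            invalid_ids.add(tx["id"])
            invalidated.append(tx)                # goes into the second Merkle tree
        else:
            spent.update(tx["inputs"])
    return invalidated
```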

slide-50
SLIDE 50

Practicality (or: Why this is a terrible idea)

  • Rewards miners who deliberately fork the blockchain
  • Cascading invalidations
  • Useful for preserving transactions when the community deliberately forks the chain
    – Usually means something else bad has happened

  • Useful for detecting double spends
slide-51
SLIDE 51

ml2255

slide-52
SLIDE 52


Topology Prediction for Distributed Systems

Moontae Lee Department of Computer Science Cornell University December 4th, 2014


slide-53
SLIDE 53
  • People use various cloud services
  • Amazon / VMWare / Rackspace
  • Essential for big-data mining and learning

Introduction


slide-54
SLIDE 54
  • People use various cloud services
  • Amazon / VMWare / Rackspace
  • Essential for big-data mining and learning

without knowing how computer nodes are interconnected!

Introduction


??

slide-55
SLIDE 55
  • What if we can predict underlying topology?

Motivation


slide-56
SLIDE 56
  • What if we can predict underlying topology?
  • For computer system (e.g., rack-awareness for Map Reduce)

Motivation


slide-57
SLIDE 57
  • What if we can predict underlying topology?
  • For computer system (e.g., rack-awareness for Map Reduce)
  • For machine learning (e.g., dual-decomposition)

Motivation


slide-58
SLIDE 58
  • Let's combine ML techniques with computer systems!

latency info → [Black Box] → topology

  • Assumptions
  • Topology structure is tree (even simpler than DAG)
  • Ping can provide useful pairwise latencies between nodes
  • Hypothesis
  • Approximately knowing the topology is beneficial!

How?


slide-59
SLIDE 59
  • Unsupervised hierarchical agglomerative clustering → merge the two closest nodes each time!

Method

Pairwise latency matrix:

      a    b    c    d    e    f
  a   0    7    8    9    8   11
  b   7    0    1    4    4    6
  c   8    1    0    4    4    5
  d   9    5    5    0    1    3
  e   8    5    5    1    0    3
  f  11    6    5    3    3    0
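A minimal sketch of agglomerative clustering over a pairwise latency matrix like the one above, using single linkage as one possible choice among the design decisions listed later; this is illustrative, not the project's implementation:

```python
def agglomerative_cluster(dist):
    """Single-linkage agglomerative clustering over pairwise distances dist[(x, y)].
    Repeatedly merges the two closest clusters; returns a nested tuple that can be
    read as the inferred topology tree."""
    clusters = {frozenset([n]): n for n in {x for pair in dist for x in pair}}

    def d(c1, c2):  # single linkage: distance between the closest pair of members
        return min(dist.get((a, b), dist.get((b, a))) for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = min(((x, y) for x in clusters for y in clusters if x != y),
                     key=lambda pair: d(*pair))
        merged_tree = (clusters.pop(c1), clusters.pop(c2))
        clusters[frozenset(c1 | c2)] = merged_tree
    return next(iter(clusters.values()))

# Example with the matrix from the slide (only one direction of each pair given):
latency = {("a","b"):7, ("a","c"):8, ("a","d"):9, ("a","e"):8, ("a","f"):11,
           ("b","c"):1, ("b","d"):4, ("b","e"):4, ("b","f"):6,
           ("c","d"):4, ("c","e"):4, ("c","f"):5,
           ("d","e"):1, ("d","f"):3, ("e","f"):3}
print(agglomerative_cluster(latency))
```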

slide-60
SLIDE 60

Sample Results (1/2)


slide-61
SLIDE 61

Sample Results (2/2)


slide-62
SLIDE 62
  • How to evaluate distance? (Euclidean vs other)
  • What is the linkage type? (single vs complete)
  • How to determine cutoff points? (most crucial)
  • How to measure the closeness of two trees?
  • Average hops to the lowest common ancestor
  • What other baselines?
  • K-means clustering / DP-means clustering
  • Greedy partitioning

Design Decisions


slide-63
SLIDE 63
  • Intrinsically (within simulator setting)
  • Compute the similarity with the ground-truth trees
  • Extrinsically (within real applications)
  • Short-lived (e.g., Map Reduce)
  • Underlying topology does not change drastically while running
  • Better performance by configuring with the initial prediction
  • Long-lived (e.g., streaming from sensors to monitor the power grid)
  • Topology could change drastically when failures occur
  • Repeat prediction and configuration periodically
  • Stable performance even if the topology changes frequently

Evaluation


slide-64
SLIDE 64

nja39

slide-65
SLIDE 65

Noah Apthorpe

slide-66
SLIDE 66

  • Commodity Ethernet
    • Spanning tree topologies
    • No link redundancy

slide-67
SLIDE 67

  • Commodity Ethernet
    • Spanning tree topologies
    • No link redundancy

slide-68
SLIDE 68

  • IronStack spreads packet flows over disjoint paths
    • Improved bandwidth
    • Stronger security
    • Increased robustness
    • Combinations of the three

slide-69
SLIDE 69

  • IronStack controllers must learn and monitor the network topology to determine disjoint paths
  • One controller per OpenFlow switch
  • No centralized authority
  • Must adapt to switch joins and failures
  • Learned topology must reflect actual physical links
    • No hidden non-IronStack bridges
slide-70
SLIDE 70

  • Protocol reminiscent of IP link-state routing
  • Each controller broadcasts adjacent links and port statuses to all other controllers
    • Provides enough information to reconstruct the network topology
    • Edmonds-Karp max-flow algorithm for disjoint path detection
  • A “heartbeat” of broadcasts allows failure detection
  • Uses OpenFlow controller packet handling to differentiate bridged links from individual wires
  • Additional details to ensure logical update ordering and graph convergence
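A sketch of how edge-disjoint paths between two switches can be counted with Edmonds-Karp: give every link unit capacity and run BFS augmenting paths until none remain. Illustrative only; IronStack's implementation may differ:

```python
from collections import defaultdict, deque

def edge_disjoint_paths(edges, src, dst):
    """Max-flow on a unit-capacity graph = number of edge-disjoint src→dst paths.
    `edges` is an iterable of undirected (u, v) links."""
    cap, adj = defaultdict(int), defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)

    flow = 0
    while True:
        parent, queue = {src: None}, deque([src])
        while queue and dst not in parent:        # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return flow
        v = dst
        while parent[v] is not None:              # augment along the path found
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
```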

slide-71
SLIDE 71

Traffic at equilibrium

slide-72
SLIDE 72

Traffic and time to topology graph convergence

slide-73
SLIDE 73

Node failure and partition response rates

slide-74
SLIDE 74

Questions?

slide-75
SLIDE 75

pj97

slide-76
SLIDE 76

Soroush Alamdari Pooya Jalaly

slide-77
SLIDE 77
slide-78
SLIDE 78
  • Distributed schedulers
slide-79
SLIDE 79
  • Distributed schedulers
  • E.g. 10,000 16-core machines, 100ms average processing times
  • A million decisions per second
slide-80
SLIDE 80
  • Distributed schedulers
  • E.g. 10,000 16-core machines, 100ms average processing times
  • A million decisions per second
  • No time to waste
  • Assign the next job to a random machine.
slide-81
SLIDE 81
  • Distributed schedulers
  • E.g. 10,000 16-core machines, 100ms average processing times
  • A million decisions per second
  • No time to waste
  • Assign the next job to a random machine.
  • Two choice method
  • Choose two random machines
  • Assign the job to the machine with smaller load.
slide-82
SLIDE 82
  • Distributed schedulers
  • E.g. 10,000 16-core machines, 100ms average processing times
  • A million decisions per second
  • No time to waste
  • Assign the next job to a random machine.
  • Two choice method
  • Choose two random machines
  • Assign the job to the machine with smaller load.
  • Two choice method works exponentially better than random

assignment.
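A small simulation, purely illustrative, that compares the two policies described above (random assignment vs. the power of two choices):

```python
import random

def simulate(num_machines=10_000, num_jobs=1_000_000, two_choice=True):
    """Return the maximum machine load after assigning all jobs, either to a
    single random machine or to the less-loaded of two random probes."""
    load = [0] * num_machines
    for _ in range(num_jobs):
        i = random.randrange(num_machines)
        if two_choice:
            j = random.randrange(num_machines)
            i = i if load[i] <= load[j] else j    # keep the less-loaded machine
        load[i] += 1
    return max(load)

# The two-choice max load should sit much closer to the average (100 jobs/machine)
# than the purely random max load does.
print(simulate(two_choice=False), simulate(two_choice=True))
```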

slide-83
SLIDE 83
slide-84
SLIDE 84
  • Partitioning the machines among the schedulers
slide-85
SLIDE 85
  • Partitioning the machines among the schedulers
  • Reduces expected maximum latency
  • Assuming known rates of incoming tasks
slide-86
SLIDE 86
  • Partitioning the machines among the schedulers
  • Reduces expected maximum latency
  • Assuming known rates of incoming tasks
  • Allows for locality respecting assignment
  • Smaller communication time, faster decision making.
slide-87
SLIDE 87
  • Partitioning the machines among the schedulers
  • Reduces expected maximum latency
  • Assuming known rates of incoming tasks
  • Allows for locality respecting assignment
  • Smaller communication time, faster decision making.
  • Irregular patterns of incoming jobs
  • Soft partitioning
slide-88
SLIDE 88
  • Partitioning the machines among the schedulers
  • Reduces expected maximum latency
  • Assuming known rates of incoming tasks
  • Allows for locality respecting assignment
  • Smaller communication time, faster decision making.
  • Irregular patterns of incoming jobs
  • Soft partitioning
  • Modified two choice model
  • Probe a machine from within, one from outside
slide-89
SLIDE 89
slide-90
SLIDE 90
  • Simulated timeline
slide-91
SLIDE 91
  • Simulated timeline
  • Burst of tasks

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p]

slide-92
SLIDE 92
  • Simulated timeline
  • Burst of tasks
  • Metric response times

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p; per-state durations T1, T2 and task counts N1, N2]

slide-93
SLIDE 93
  • Simulated timeline
  • Burst of tasks
  • Metric response times

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p; per-state durations T1, T2 and task counts N1, N2]

slide-94
SLIDE 94
  • Simulated timeline
  • Burst of tasks
  • Metric response times

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p; per-state durations T1, T2 and task counts N1, N2]

slide-95
SLIDE 95
  • Simulated timeline
  • Burst of tasks
  • Metric response times

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p; per-state durations T1, T2 and task counts N1, N2]

slide-96
SLIDE 96

pk467

slide-97
SLIDE 97

Software-Defined Routing for Inter- Datacenter Wide Area Networks

Praveen Kumar

slide-98
SLIDE 98

Problems

1. Inter-DC WANs are critical and highly expensive
2. Poor efficiency: average utilization over time of busy links is only 30-50%
3. Poor sharing: little support for flexible resource sharing

MPLS example: flow arrival order A, B, C; each link can carry at most one flow.
* Make smarter routing decisions, considering link capacities and flow demands

Source: Achieving High Utilization with Software-Driven WAN, SIGCOMM 2013

slide-99
SLIDE 99

Merlin: Software-Defined Routing

  • Merlin Controller
    ○ MCF solver
    ○ RRT generation
  • Merlin Virtual Switch (MVS): a modular software switch
    ○ Merlin
      ■ Path: ordered list of pathlets (VLANs)
      ■ Randomized source routing
      ■ Push stack of VLANs
    ○ Flow tracking
    ○ Network function modules (pluggable)
    ○ Compose complex network functions from primitives

slide-100
SLIDE 100

Some results

  No VLAN stack: 941 Mbps
  Open vSwitch: 0 (N/A)
  CPqD ofsoftswitch13: 98 Mbps
  MVS: 925 Mbps

SWAN topology *

  • Source: Achieving High Utilization with Software-Driven WAN, SIGCOMM 2013
slide-101
SLIDE 101

rmo26

slide-102
SLIDE 102

AMNESIA-FREEDOM AND EPHEMERAL DATA

CS6410 December 4, 2014


slide-103
SLIDE 103

Ephemeral Data

  • “Overcoming CAP” describes using soft-state replication to keep application state in the first tier of the cloud.
  • Beyond potential performance advantages, this architecture may be the basis for “ephemerality,” wherein data is intended to disappear.
  • “Subpoena-freedom”
  • No need to wipe disks, just restart your instances.

slide-104
SLIDE 104

Cost and Architecture

  • “Overcoming CAP” does not address questions of cost.
  • Using reliable storage to preserve state has significant cost consequences.
  • The first goal of this project is to produce a model of how cost varies with cloud architecture choices.
  • Key cost drivers: compute hours, data movement, storage.

slide-105
SLIDE 105

Performance Numbers

  • “Overcoming CAP” claims, but does not demonstrate, superior performance with the amnesia-free approach.
  • The second goal of this project is to compare performance in live systems.
  • A cost-determined amnesia-free architecture is compared against architectures that rely on reliable storage.

slide-106
SLIDE 106

sh954

slide-107
SLIDE 107

Soroush Alamdari Pooya Jalaly

slide-108
SLIDE 108
slide-109
SLIDE 109
  • Distributed schedulers
slide-110
SLIDE 110
  • Distributed schedulers
  • E.g. 10,000 16-core machines, 100ms average processing times
  • A million decisions per second
slide-111
SLIDE 111
  • Distributed schedulers
  • E.g. 10,000 16-core machines, 100ms average processing times
  • A million decisions per second
  • No time to waste
  • Assign the next job to a random machine.
slide-112
SLIDE 112
  • Distributed schedulers
  • E.g. 10,000 16-core machines, 100ms average processing times
  • A million decisions per second
  • No time to waste
  • Assign the next job to a random machine.
  • Two choice method
  • Choose two random machines
  • Assign the job to the machine with smaller load.
slide-113
SLIDE 113
  • Distributed schedulers
  • E.g. 10,000 16-core machines, 100ms average processing times
  • A million decisions per second
  • No time to waste
  • Assign the next job to a random machine.
  • Two choice method
  • Choose two random machines
  • Assign the job to the machine with smaller load.
  • Two choice method works exponentially better than random

assignment.

slide-114
SLIDE 114
slide-115
SLIDE 115
  • Partitioning the machines among the schedulers
slide-116
SLIDE 116
  • Partitioning the machines among the schedulers
  • Reduces expected maximum latency
  • Assuming known rates of incoming tasks
slide-117
SLIDE 117
  • Partitioning the machines among the schedulers
  • Reduces expected maximum latency
  • Assuming known rates of incoming tasks
  • Allows for locality respecting assignment
  • Smaller communication time, faster decision making.
slide-118
SLIDE 118
  • Partitioning the machines among the schedulers
  • Reduces expected maximum latency
  • Assuming known rates of incoming tasks
  • Allows for locality respecting assignment
  • Smaller communication time, faster decision making.
  • Irregular patterns of incoming jobs
  • Soft partitioning
slide-119
SLIDE 119
  • Partitioning the machines among the schedulers
  • Reduces expected maximum latency
  • Assuming known rates of incoming tasks
  • Allows for locality respecting assignment
  • Smaller communication time, faster decision making.
  • Irregular patterns of incoming jobs
  • Soft partitioning
  • Modified two choice model
  • Probe a machine from within, one from outside
slide-120
SLIDE 120
slide-121
SLIDE 121
  • Simulated timeline
slide-122
SLIDE 122
  • Simulated timeline
  • Burst of tasks

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p]

slide-123
SLIDE 123
  • Simulated timeline
  • Burst of tasks
  • Metric response times

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p; per-state durations T1, T2 and task counts N1, N2]

slide-124
SLIDE 124
  • Simulated timeline
  • Burst of tasks
  • Metric response times

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p; per-state durations T1, T2 and task counts N1, N2]

slide-125
SLIDE 125
  • Simulated timeline
  • Burst of tasks
  • Metric response times

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p; per-state durations T1, T2 and task counts N1, N2]

slide-126
SLIDE 126
  • Simulated timeline
  • Burst of tasks
  • Metric response times

[Diagram: two-state Markov chain, No Burst ↔ Burst, switching with probability p and staying with probability 1−p; per-state durations T1, T2 and task counts N1, N2]

slide-127
SLIDE 127

vdk23

slide-128
SLIDE 128

FPGA Packet Processor

For IronStack

Vasily Kuksenkov

slide-129
SLIDE 129

Problem

  • Power grid operators use an intricate feedback system for stability
  • Run using microwave relays and power cable signal multiplexing
  • Data network issues

    ○ Vulnerable to attacks
    ○ Vulnerable to disruptions
    ○ Low-capacity links

  • Solution: switch to simple Ethernet
slide-130
SLIDE 130

Problem

  • Ethernet employs a loop-free topology
    ○ Hard to use link redundancies
    ○ Failure recovery takes too long
  • Solution: IronStack SDN
    ○ Uses redundant network paths to improve
      ■ Bandwidth/Latency
      ■ Failure Recovery
      ■ Security

slide-131
SLIDE 131

Problem

  • Packet processing
    ○ Cannot be done on the switch
    ○ Cannot be done at line rate (1-10 Gbps) on the controller
  • Solution: NetFPGA as a middle-man
    ○ Controller sets up routing rules and signals the NetFPGA
    ○ Programmed once, continues to work
    ○ Scalable, efficient, cost-effective

slide-132
SLIDE 132

Implementation/Analysis

  • Improvements in
    ○ Bandwidth (RAIL 0)
    ○ Latency (RAIL 1)
    ○ Tradeoffs (RAIL 6)
  • Future
    ○ Security
    ○ Automatic tuning

slide-133
SLIDE 133

Questions?

slide-134
SLIDE 134

vs442

slide-135
SLIDE 135

Studying the effect of traffic pacing on TCP throughput

Vishal Shrivastav Dec 4, 2014

slide-136
SLIDE 136

Motivation

Burstiness: clustering of packets on the wire
Pacing: making the inter-packet gaps uniform

slide-137
SLIDE 137

Motivation

Burstiness: clustering of packets on the wire
Pacing: making the inter-packet gaps uniform
TCP traffic tends to be inherently bursty

slide-138
SLIDE 138

Motivation

Burstiness: clustering of packets on the wire
Pacing: making the inter-packet gaps uniform
TCP traffic tends to be inherently bursty

slide-139
SLIDE 139

Motivation

Burstiness: clustering of packets on the wire
Pacing: making the inter-packet gaps uniform
TCP traffic tends to be inherently bursty

Other potential benefits of pacing:
  • Better short-term fairness among flows of similar RTTs
  • May allow a much larger initial congestion window to be used safely

slide-140
SLIDE 140

Previous works

Previous work focused on implementing pacing at the transport layer. Some major limitations of that approach:
  • Less precision: no fine-grained control of the flow
  • NIC features like TCP segmentation offload lead to batching and short-term packet bursts

slide-141
SLIDE 141

Key Insight

Implement pacing at the PHY layer

slide-142
SLIDE 142

Key Insight

Implement pacing at the PHY layer Problem: commodity NICs do not provide software access to PHY layer

slide-143
SLIDE 143

Key Insight

Implement pacing at the PHY layer Problem: commodity NICs do not provide software access to PHY layer Solution: SoNIC [NSDI 2013]

slide-144
SLIDE 144

Key Insight

Implement pacing at the PHY layer Problem: commodity NICs do not provide software access to PHY layer Solution: SoNIC [NSDI 2013]

slide-145
SLIDE 145

Key Insight

Implement pacing at the PHY layer Problem: commodity NICs do not provide software access to PHY layer Solution: SoNIC [NSDI 2013]

slide-146
SLIDE 146

Implementation Challenges

slide-147
SLIDE 147

Implementation Challenges

Online Algorithm - No batching, one packet at a time

slide-148
SLIDE 148

Implementation Challenges

  • Online algorithm: no batching, one packet at a time
  • Very small packet processing time: simple algorithm, extremely fast implementation
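A minimal illustration of what such an online pacer must do per packet: given a target rate, compute the next transmit time and the idle gap to insert, deciding one packet at a time. The class and names below are illustrative; in this project the gap would actually be realized in the PHY layer via SoNIC rather than in Python:

```python
class Pacer:
    """Toy per-packet pacer: schedule each packet so the flow never exceeds rate_bps."""
    def __init__(self, rate_bps):
        self.rate_bps = rate_bps
        self.next_tx_time = 0.0          # earliest time the next packet may start (seconds)

    def schedule(self, arrival_time, packet_bytes):
        tx_time = max(arrival_time, self.next_tx_time)
        gap = tx_time - arrival_time                  # idle time inserted before this packet
        self.next_tx_time = tx_time + packet_bytes * 8 / self.rate_bps
        return tx_time, gap
```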

slide-149
SLIDE 149

Implementation Challenges

  • Online algorithm: no batching, one packet at a time
  • Very small packet processing time: simple algorithm, extremely fast implementation
  • Where to place pacing middleboxes in the network

Given a maximum of k pacing middleboxes, where should we place them in the network to achieve optimal throughput?

slide-150
SLIDE 150

Network topology for experiments

slide-151
SLIDE 151

Testing the behavior of pacing algorithm

slide-152
SLIDE 152

Experimental results

n: a value in [0,1], a parameter for the number of packet bursts in a flow.
p: a value in [0,1], a parameter for the geometric distribution used to generate the number of packets within a burst.

slide-153
SLIDE 153

Experimental results

n: a value in [0,1], a parameter for the number of packet bursts in a flow.
p: a value in [0,1], a parameter for the geometric distribution used to generate the number of packets within a burst.
slide-154
SLIDE 154