
SLIDE 1

Using TCP/IP Traffic shaping to achieve iSCSI service predictability

Paper presentation: Jarle Bjørgeengen

University of Oslo / USIT

November 11, 2010

SLIDE 2

Outline

About resource sharing in storage devices
Lab setup / job setup
Experiment illustrating the problem
One half of the solution: the throttle
Live demo of the throttle
Part two of the solution: the controller
How the controller works
Conclusion and future work

SLIDE 3

General problem of sharing resources

[Diagram: consumers connect through per-consumer QoS bridges and a SAN to virtual disks carved from a centralized pool of shared physical resources.]

Free competition causes unpredictable I/O performance for any given consumer.

SLIDE 4

Lab setup

[Diagram: HP SC10 enclosure with 10 x 36 GB 10k disks in volume group vg_perc; logical volumes lv_b2-lv_b5 are exported by the iSCSI target (iet) as iqn.iscsilab:perc_b2-perc_b5 over TCP connections to blade initiators b2-b5, where each LUN appears as /dev/iscsi_0; host bm drives the benchmark and b1 runs Argus.]

• Striped logical volumes, 64 KB stripe size across 10 disks

SLIDE 5

Is read response time affected by write activity?

[Diagram: one initiator issues random reads at a fixed 256 kB/s against its logical volume while the three other initiators run sequential writes at full speed against theirs (volumes lv_b2-lv_b5, initiators b2-b5, observer bm).]

SLIDE 6

The answer is yes

Long response times adversely affect application service availability.

[Plot: read wait time (ms) vs. time (s): no interference, 1 thread (1 machine), 3 threads (3 machines), 12 threads (3 machines).]

SLIDE 7

Throttling method

[Diagram: two initiator-target TCP timelines. Without delay, the SYN / SYN+ACK / ACK handshake is followed by write PDUs and ACKs alternating back-to-back. With a throttling delay inserted before each ACK, the same write/ACK exchange is stretched out in time, reducing the achievable write rate.]
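The delay itself can be imposed with the kernel's netem qdisc. A minimal sketch of a single fixed delay, assuming the target's iSCSI traffic leaves through eth0 (the real setup, shown on slide 26, keeps a whole ladder of htb classes, each with its own netem delay):

  # Delay every packet leaving eth0 by a fixed 4.6 ms.
  tc qdisc add dev eth0 root netem delay 4.6ms
  # Remove the delay again when done.
  tc qdisc del dev eth0 root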

SLIDE 8

Relation between packet delay and average rate

[Plots: time to read 200 MB of data (s) and time to write 200 MB of data (s) vs. introduced delay (0.6-9.6 ms).]

Write rate falls from 15 MB/s to 2.5 MB/s; read rate falls from 22 MB/s to 5 MB/s.
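The underlying measurement can be as simple as timing a fixed-size transfer. A sketch, assuming the LUN shows up as /dev/iscsi_0 on the initiator and that direct I/O is used to bypass the page cache (block size and count are chosen here to total 200 MiB):

  # Time a 200 MiB read (3200 x 64 KiB) straight from the iSCSI device.
  time dd if=/dev/iscsi_0 of=/dev/null bs=64k count=3200 iflag=direct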

slide-9
SLIDE 9

Managing consumers

Need to operate on sets of consumers, e.g. throttlable = {10.0.0.243, 10.0.0.244}.

ipset: one rule to match them all

  # Create an ipmap set covering the consumer subnet, then add the throttled initiators.
  ipset -N $throttlable ipmap --network 10.0.0.0/24
  ipset -A $throttlable 10.0.0.243
  ipset -A $throttlable 10.0.0.244
  # Mark packets destined for any member of the set (mangle table assumed here).
  iptables -t mangle -A POSTROUTING -m set --match-set $throttlable dst -j MARK --set-mark $mark

The mark is a step in the range of available packet delays
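On egress the mark is matched by a tc fw filter that steers the packet into the class whose netem child holds the corresponding delay. A sketch for one mark value, assuming eth0 and the class/handle numbering of the listing on slide 26:

  # Send packets carrying fwmark 10 into htb class 1:10 (the 4.1 ms netem queue).
  tc filter add dev eth0 parent 1: protocol ip handle 10 fw flowid 1:10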

SLIDE 10

Live demonstration

Manual throttling and QoS specification
An automatic QoS policy and automated throttling

SLIDE 11

Dynamic throttling decision

Figure: Block diagram of a PID controller. Created by SilverStar(at)en.wikipedia. Licensed under the terms of Creative Commons Attribution 2.5 Generic.

SLIDE 12

Modified PID function

[Flowchart: one iteration of the modified PID function, Start to Stop.]

Calculate U_p, U_i, U_d:

U_p = K_p e_k
U_i = U_{i,k-1} + \frac{T_s K_p}{T_i} e_k
U_d = K_p T_d \frac{e_k - e_{k-1}}{T_s}

Anti-windup on the integral term: if U_i < 0, set U_i = 0; if U_i > U_{kmax}, keep the previous value U_{i,k-1}.

U_k = U_p + U_i + U_d, clamped to 0 \le U_k \le U_{kmax}.

mark = int(ceil(U_k))
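One pass through this flowchart can be sketched in shell, with awk doing the floating-point work. This is a sketch only: the actual controller is the Perl script pid_reg.pl from the next slide, and the tuning variables KP, TI, TD, TS and UKMAX are assumed to be set in the environment.

  # One controller step; arguments: e_k, e_{k-1}, U_i(k-1). Prints "Uk Ui".
  pid_step() {
    awk -v ek="$1" -v ekp="$2" -v uip="$3" \
        -v kp="$KP" -v ti="$TI" -v td="$TD" -v ts="$TS" -v ukmax="$UKMAX" '
    BEGIN {
      up = kp * ek                    # proportional term
      ui = uip + ts * kp / ti * ek    # integral term
      ud = kp * td * (ek - ekp) / ts  # derivative term
      if (ui < 0) ui = 0              # anti-windup: clamp at zero ...
      else if (ui > ukmax) ui = uip   # ... or keep the previous value
      uk = up + ui + ud
      if (uk < 0) uk = 0              # clamp the output to [0, Ukmax]
      else if (uk > ukmax) uk = ukmax
      printf "%.4f %.4f\n", uk, ui
    }'
  }
  # The iptables mark is then ceil(Uk), e.g.:
  # read -r uk ui < <(pid_step "$ek" "$ekp" "$uip")
  # mark=$(awk -v u="$uk" 'BEGIN { print (u == int(u)) ? u : int(u) + 1 }')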

SLIDE 13

The completely automated approach

[Diagram: the automated control loop. set_maintaner.pl reads the iSCSI map (ISCSIMAP) from /proc/net/iet/sessions and /proc/net/iet/volumes and creates and maintains the IP-set members. perf_maintainer.pl reads saturation indicators from /proc/diskstats and lvs output into the PDATA shared memory. pid_reg.pl reads PDATA and spawns one pid_thread per resource, and the threads write the throttles. perf_server.pl reads throttle values from the CMEM shared memory and feeds gnuplot. The legend distinguishes files, shared memory, processes, commands, and read/run dependencies.]
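The saturation indicators come from /proc/diskstats. A hedged sketch of such a probe, assuming the kernel's standard field layout and a device name of sdb; the field choice in the paper's perf_maintainer.pl may differ:

  # Average wait per I/O (ms) over a 1 s window. Fields 7 and 11 are ms spent
  # reading and writing; fields 4 and 8 are completed reads and writes.
  dev=sdb
  read -r _ _ _ r1 _ _ rt1 w1 _ _ wt1 _ < <(grep " $dev " /proc/diskstats)
  sleep 1
  read -r _ _ _ r2 _ _ rt2 w2 _ _ wt2 _ < <(grep " $dev " /proc/diskstats)
  ios=$(( (r2 - r1) + (w2 - w1) ))
  [ "$ios" -gt 0 ] && echo "await: $(( ((rt2 - rt1) + (wt2 - wt1)) / ios )) ms"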

SLIDE 14

Impact

The packet delay throttle is very efficient: it solves the throttling need completely for iSCSI, and likely for other TCP-based storage networks too.

The modified PID controller consistently keeps response time low in spite of rapidly changing load interference. The concept is widely applicable.


SLIDE 17

Future work

[Diagram: consumers reach an iSCSI disk array through an Ethernet switch, each behind a QoS bridge; the bridge obtains resource/consumer maps and virtual disk latencies through an array-specific plugin using SNMPGET.]

Packet delay throttle with other algorithms
PID controller with other throttles

SLIDE 18

Thanks for your attention!

SLIDE 19

Overhead

Negligible overhead is introduced by the tc filters. Differences were measured 20 times; a t-test at 99% confidence shows at worst 0.4% overhead for reads and 1.7% for writes.

SLIDE 20

Is response time improved by throttling?

[Plot: small-job average wait time in ms (left axis) and aggregated interference throughput in kB/s (right axis) vs. time (s), with throttling periods at 4.6 ms and 9.6 ms delay.]

SLIDE 21

Automatically controlled wait time

[Plot: average wait time (ms) vs. time (s): no regulation, 20 ms threshold, 15 ms threshold, 10 ms threshold.]

SLIDE 22

The throttled rates

[Plot: aggregate write rate (kB/s) vs. time (s): no regulation, and smoothed curves for the 20 ms, 15 ms, and 10 ms thresholds.]

SLIDE 23

Exposing the throttling value

[Plot: vg_aic read wait time (ms) under automatic regulation with a 15 ms threshold, the packet delay introduced to writers (ms), and the aggregated write rate (kB/s), all vs. time (s).]

SLIDE 24

Effect of the packet delay throttle: Reads

[Plot: read rate (kB/s) vs. time (s) for volumes b2-b5.]

SLIDE 25

Effect of the packet delay throttle: Writes

[Plot: write rate (kB/s) vs. time (s) for volumes b2-b5.]

SLIDE 26

The tc delay queues

netem delay qdiscs, one per htb class (delays step from 99 us up to 9.6 ms):

  12:  netem parent 1:2  limit 1000 delay 99us
  13:  netem parent 1:3  limit 1000 delay 598us
  14:  netem parent 1:4  limit 1000 delay 1.1ms
  15:  netem parent 1:5  limit 1000 delay 1.6ms
  16:  netem parent 1:6  limit 1000 delay 2.1ms
  17:  netem parent 1:7  limit 1000 delay 2.6ms
  18:  netem parent 1:8  limit 1000 delay 3.1ms
  19:  netem parent 1:9  limit 1000 delay 3.6ms
  110: netem parent 1:10 limit 1000 delay 4.1ms
  111: netem parent 1:11 limit 1000 delay 4.6ms
  112: netem parent 1:12 limit 1000 delay 5.1ms
  113: netem parent 1:13 limit 1000 delay 5.6ms
  114: netem parent 1:14 limit 1000 delay 6.1ms
  115: netem parent 1:15 limit 1000 delay 6.6ms
  116: netem parent 1:16 limit 1000 delay 7.1ms
  117: netem parent 1:17 limit 1000 delay 7.6ms
  118: netem parent 1:18 limit 1000 delay 8.1ms
  119: netem parent 1:19 limit 1000 delay 8.6ms
  120: netem parent 1:20 limit 1000 delay 9.1ms
  121: netem parent 1:21 limit 1000 delay 9.6ms

Root qdisc:

  1: htb root r2q 10 default 1 direct_packets_stat 4399042 ver 3.17

Classes 1:2 through 1:21 are identical:

  1:N htb prio 0 quantum 200000 rate 8000Mbit ceil 8000Mbit burst 0b/8 mpu 0b overhead 0b cburst 0b/8 mpu 0b overhead 0b level 0
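The deck does not show the script that builds this ladder, but a sketch of how it could be generated follows, assuming eth0 and reproducing the roughly 0.5 ms spacing above. Note there is no class 1:1, so unmarked (default) traffic bypasses the ladder entirely, which matches the large direct_packets_stat counter.

  # htb root; classes 1:2..1:21 each get a netem child with a growing delay.
  tc qdisc add dev eth0 root handle 1: htb default 1
  for i in $(seq 2 21); do
    delay=$(awk -v i="$i" 'BEGIN { printf "%.1fms", 0.1 + (i - 2) * 0.5 }')
    tc class add dev eth0 parent 1: classid 1:$i htb rate 8000mbit
    tc qdisc add dev eth0 parent 1:$i handle "1$i:" netem delay "$delay"
  done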
SLIDE 27

The tc bandwidth queues

Root qdisc and parent class:

  1:  htb root r2q 10 default 1 direct_packets_stat 4665509 ver 3.17
  1:1 htb rate 1000Mbit ceil 1000Mbit burst 130875b/8 cburst 130875b/8 mpu 0b overhead 0b level 7

Leaf classes (all htb prio 0, level 0, mpu 0b, overhead 0b; cburst equals burst):

  class  quantum  rate = ceil   burst
  1:2    200000   950000Kbit    478325b
  1:3    200000   900000Kbit    453262b
  1:4    200000   850000Kbit    428187b
  1:5    200000   800000Kbit    403100b
  1:6    200000   750000Kbit    378000b
  1:7    200000   700000Kbit    352887b
  1:8    200000   650000Kbit    327762b
  1:9    200000   600000Kbit    302700b
  1:10   200000   550000Kbit    277612b
  1:11   200000   500000Kbit    252500b
  1:12   200000   450000Kbit    227418b
  1:13   200000   400000Kbit    202350b
  1:14   200000   350000Kbit    177231b
  1:15   200000   300000Kbit    152137b
  1:16   200000   250000Kbit    127062b
  1:17   200000   200000Kbit    101975b
  1:18   200000   150000Kbit    76875b
  1:19   200000   100000Kbit    51787b
  1:20   200000   50000Kbit     26693b
  1:21   200000   45000Kbit     24181b
  1:22   200000   35000Kbit     19162b
  1:23   200000   25000Kbit     14146b
  1:24   187500   15000Kbit     9127b
  1:25   62500    5000Kbit      4Kb
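The regular part of this ladder (classes 1:2 through 1:20, stepping down in 50 Mbit/s increments) could be generated the same way; a sketch under the same eth0 assumption, with the leaves parented under 1:1 as the level-7 line suggests:

  # Bandwidth ladder: 950 Mbit/s down to 50 Mbit/s in 50 Mbit/s steps.
  tc qdisc add dev eth0 root handle 1: htb default 1
  tc class add dev eth0 parent 1: classid 1:1 htb rate 1000mbit ceil 1000mbit
  for i in $(seq 2 20); do
    rate=$(( (21 - i) * 50000 ))  # kbit: 950000, 900000, ..., 50000
    tc class add dev eth0 parent 1:1 classid 1:$i htb rate ${rate}kbit ceil ${rate}kbit
  done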
SLIDE 28

Input signal

[Plot: wait time (ms) vs. time (s). Red: exponentially weighted moving average (EWMA). Green: moving median.]

L(t) = \alpha\, l(t) + (1 - \alpha)\, L(t-1)

The EWMA is also called a low-pass filter.
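The EWMA is cheap to compute over a stream of samples. A one-line sketch, with alpha = 0.2 and a file of wait-time samples, both assumptions:

  # L(t) = alpha * l(t) + (1 - alpha) * L(t-1), seeded with the first sample.
  awk -v a=0.2 'NR == 1 { L = $1 } { L = a * $1 + (1 - a) * L; print L }' waits.txt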

SLIDE 29

Continuous form (proportional + integral + derivative):

u(t) = K_p e(t) + \frac{K_p}{T_i} \int_0^t e(\tau)\, d\tau + K_p T_d\, e'(t)

Incremental form (previous + delta):

u_k = u_{k-1} + K_p \left(1 + \frac{T}{T_i}\right) e_k - K_p e_{k-1} + \frac{K_p T_d}{T} (e_k - 2 e_{k-1} + e_{k-2})

Absolute form (proportional + integral + derivative):

u_k = K_p e_k + u_{i,k-1} + \frac{K_p T}{T_i} e_k + \frac{K_p T_d}{T} (e_k - e_{k-1})