Buffer sizing and Video QoE Measurements at Netflix




slide-1
SLIDE 1

Buffer sizing and Video QoE Measurements at Netflix

Bruce Spang, Brady Walsh, Te-Yuan Huang, Tom Rusnock, Joe Lawrence, Nick McKeown February 10, 2020
slide-2
SLIDE 2

What are we talking about?

slide-3
SLIDE 3
[Diagram: Server 1, Server 2, … → Buffer → ISP]

What are we talking about?

slide-4
SLIDE 4

How big should a buffer be?

Too big: packets wait for too long Too small: too many packets thrown away
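To put a number on "packets wait for too long", the worst-case queueing delay of a full buffer is its size divided by the rate at which it drains. The sketch below is only illustrative: the helper name, the buffer sizes, and the assumption that the whole buffer drains at a 100 Gbps link rate are all hypothetical choices, not measurements from the talk.

```python
def max_queueing_delay_ms(buffer_bytes: float, link_bps: float) -> float:
    """Time for a completely full buffer to drain at the link rate, in ms."""
    return buffer_bytes * 8 / link_bps * 1000


LINK_BPS = 100e9  # assumed drain rate: 100 Gbps

for size_mb in (500, 50, 25):  # illustrative buffer sizes in megabytes
    delay = max_queueing_delay_ms(size_mb * 1e6, LINK_BPS)
    print(f"{size_mb} MB buffer: up to {delay:.0f} ms of queueing delay")
```

Under these assumptions a buffer ten times larger can add ten times more delay when it is full, which is the "too big" side of the trade-off.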
slide-5
SLIDE 5
“A buffer should be at least one BDP” [Villamizar, Song 1994]
slide-6
SLIDE 6
“A buffer should be at least one BDP” [Villamizar, Song 1994]
BDP = Bandwidth × Delay: # of packets in a link for full utilization
slide-7
SLIDE 7
“A buffer should be at least one BDP” [Villamizar, Song 1994]
BDP = Bandwidth × Delay: # of packets in a link for full utilization
[Plot: Congestion Window vs. Time]
slide-8
SLIDE 8
“A buffer should be at least one BDP” [Villamizar, Song 1994]
BDP = Bandwidth × Delay: # of packets in a link for full utilization
[Plot: Congestion Window vs. Time]
  • BDP + B: Loss happens when link and buffer are full
slide-9
SLIDE 9
“A buffer should be at least one BDP” [Villamizar, Song 1994]
BDP = Bandwidth × Delay: # of packets in a link for full utilization
[Plot: Congestion Window vs. Time]
  • BDP + B: Loss happens when link and buffer are full
  • ½(BDP + B): TCP stops sending until ½(BDP + B) packets received
slide-10
SLIDE 10
“A buffer should be at least one BDP” [Villamizar, Song 1994]
BDP = Bandwidth × Delay: # of packets in a link for full utilization
[Plot: Congestion Window vs. Time]
  • BDP + B: Loss happens when link and buffer are full
  • ½(BDP + B): TCP stops sending until ½(BDP + B) packets received
  • Brace annotation: Buffer needs to hold this many packets
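The reasoning on these slides can be checked numerically. Loss occurs at a window of BDP + B, after which the window drops to ½(BDP + B); the link stays fully utilized only if that reduced window is still at least one BDP, which rearranges to B ≥ BDP. The following is a minimal sketch for a single TCP flow; the function names, packet size, and example numbers are illustrative assumptions rather than anything from the talk.

```python
def bdp_packets(bandwidth_bps: float, rtt_s: float, packet_bytes: int = 1500) -> float:
    """Bandwidth-delay product expressed in packets (assumed 1500-byte packets)."""
    return bandwidth_bps * rtt_s / (8 * packet_bytes)


def window_after_loss(bdp: float, buffer_pkts: float) -> float:
    """Window right after a loss: half of the peak window BDP + B."""
    return (bdp + buffer_pkts) / 2


def link_stays_full(bdp: float, buffer_pkts: float) -> bool:
    """The link stays busy only if the halved window still covers one BDP."""
    return window_after_loss(bdp, buffer_pkts) >= bdp


bdp = bdp_packets(1e9, 0.1)  # hypothetical 1 Gbps link with 100 ms RTT
for b in (0.5 * bdp, bdp, 2 * bdp):
    print(f"B = {b / bdp:.1f} x BDP -> link stays full: {link_stays_full(bdp, b)}")
```

With B below one BDP the window after a loss is smaller than the BDP, so the link briefly runs below capacity; at B = BDP the halved window is exactly one BDP.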
slide-11
SLIDE 11

How big should a buffer be?

  • BDP: Villamizar and Song 1994
  • BDP/√n: Appenzeller, McKeown, Keslassy 2004
  • O(n): Dhamdhere, Jiang, Dovrolis 2005
  • O(1): Enachescu, Ganjali, Goel, McKeown, Roughgarden 2006
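To get a feel for how far apart the first two rules are, the sketch below compares a buffer of one BDP with BDP/√n for a few flow counts n. The BDP value and flow counts are hypothetical, and the O(n) and O(1) rules are left out because the slide gives no constants for them.

```python
import math


def buffer_bdp(bdp_pkts: float) -> float:
    """Rule of thumb: buffer equals one bandwidth-delay product."""
    return bdp_pkts


def buffer_bdp_over_sqrt_n(bdp_pkts: float, n_flows: int) -> float:
    """Small-buffer rule: BDP divided by the square root of the flow count."""
    return bdp_pkts / math.sqrt(n_flows)


BDP_PKTS = 100_000  # hypothetical BDP in packets

for n in (1, 100, 10_000):
    small = buffer_bdp_over_sqrt_n(BDP_PKTS, n)
    print(f"n = {n:>6}: BDP rule = {buffer_bdp(BDP_PKTS):.0f} pkts, "
          f"BDP/sqrt(n) rule = {small:.0f} pkts")
```

With many flows the two recommendations differ by orders of magnitude, which is part of why "which is correct?" is a real question.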
slide-12
SLIDE 12

Which is correct?

slide-13
SLIDE 13

It’s complicated

slide-14
SLIDE 14
  • 1. TCP New Reno (mostly) behaves as expected

  • 2. Video performance varies
  • 3. Real routers complicate this story
slide-15
SLIDE 15

Our Experiment

slide-16
SLIDE 16 Catalog servers: use spinning disks, cheaply store entire catalog
slide-17
SLIDE 17 Offload servers: use SSDs to serve top ~30% of content faster
slide-18
SLIDE 18 These three racks are called a stack
slide-19
SLIDE 19
slide-20
SLIDE 20 Make this buffer small… …and this one large
slide-21
SLIDE 21
  • 1. TCP New Reno (mostly) behaves as expected

  • 2. Video performance varies
  • 3. Real routers complicate this story
slide-22
SLIDE 22 Large buffer has higher latency during congested hour
slide-23
SLIDE 23 Sometimes the large buffer has much higher latency
slide-24
SLIDE 24 Large buffer has lower loss during congested hour
slide-25
SLIDE 25
  • 1. TCP New Reno (mostly) behaves as expected

  • 2. Video performance varies
  • 3. Real routers complicate this story
slide-26
SLIDE 26
Good buffer size:
  + Fewer rebuffers
  + Better video quality
  + Videos start faster
Bad buffer size:
  • More rebuffers
  • Worse video quality
  • Videos start slower
slide-27
SLIDE 27
Good buffer size:
  + Fewer rebuffers
  + Better video quality
  + Videos start faster
Bad buffer size:
  • More rebuffers
  • Worse video quality
  • Videos start slower
This happens when the buffer is too large or too small.
slide-28
SLIDE 28 Site #2: A smaller buffer is better
Reducing the buffer from 500MB to 25MB:
  • 15.6% decrease in sessions with a rebuffer
  • 5.3% decrease in low quality video
  • 13.5% decrease in play delay
slide-29
SLIDE 29 Site #3: A smaller buffer is better
Reducing the buffer from 500MB to 50MB:
  • 22.1% decrease in sessions with a rebuffer
  • 7.0% decrease in low quality video
  • 14.8% decrease in play delay
slide-30
SLIDE 30 Site #1: A smaller buffer is worse
Reducing the buffer from 500MB to 50MB:
  • 46.3% increase in sessions with a rebuffer
  • 5.7% increase in low quality video
  • 5.9% decrease in play delay
slide-31
SLIDE 31
  • 1. TCP New Reno (mostly) behaves as expected

  • 2. Video performance varies
  • 3. Real routers complicate this story
slide-32
SLIDE 32 Large buffer has higher latency during congested hour
slide-33
SLIDE 33 Remember how the large buffer has much higher latency…
slide-34
SLIDE 34 Servers have very different latency distributions
[Plot: Min RTT (ms)]
slide-35
SLIDE 35
slide-36
SLIDE 36
[Diagram: Server 1, Server 2, … → Buffer → ISP]

What I imagined

slide-37
SLIDE 37
[Diagram: Server 1, Server 2, … → Buffer → ISP]

What I imagined

LIES!

slide-38
SLIDE 38 Line card #1 Line card #2 Line card #3 Line card #4
slide-39
SLIDE 39 VOQ #1 VOQ #2 VOQ #3 VOQ #4 VOQ #5 VOQ #6 VOQ #7 VOQ #8
slide-40
SLIDE 40 Buffer architecture
[Diagram: Server #1, Server #2, and Server #3 feed the “Offload” VOQ and the “Catalog” VOQ, which share the 100Gbps ISP link with weights 2/3 and 1/3]
slide-41
SLIDE 41 Traffic is fairly split when load is equal
[Diagram: three servers offer 40 Gbps each; on the 100Gbps ISP link the “Offload” VOQ is served at 67 Gbps and the “Catalog” VOQ at 33 Gbps]
slide-42
SLIDE 42 When one VOQ offers less than its fair share, it sees no congestion
[Diagram: two servers offer 50 Gbps each and one offers 10 Gbps; on the 100Gbps ISP link the “Offload” VOQ is served at 90 Gbps and the “Catalog” VOQ at 10 Gbps. No delay!]
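The numbers on the last two slides are what a work-conserving weighted fair scheduler would produce. The sketch below is a generic weighted max-min (water-filling) allocation, not the switch's actual scheduler: the function name is hypothetical, and it assumes the two equal-rate servers share the “Offload” VOQ (weight 2/3) while the third feeds the “Catalog” VOQ (weight 1/3).

```python
def weighted_share(offered, weights, capacity):
    """Weighted max-min fair split of `capacity` among queues.

    A queue offering less than its weighted share gets all it asks for,
    and the leftover capacity is redistributed to the remaining queues.
    """
    alloc = [0.0] * len(offered)
    active = set(range(len(offered)))
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weights[i] for i in active)
        # Queues whose whole demand fits inside their current weighted share.
        satisfied = [i for i in active
                     if offered[i] <= remaining * weights[i] / total_w]
        if not satisfied:
            # Everyone left is backlogged: split what remains by weight.
            for i in active:
                alloc[i] = remaining * weights[i] / total_w
            break
        for i in satisfied:
            alloc[i] = float(offered[i])
            remaining -= offered[i]
            active.discard(i)
    return alloc


WEIGHTS = [2, 1]  # assumed: "Offload" VOQ 2/3, "Catalog" VOQ 1/3

# Equal load: 40 + 40 Gbps into Offload, 40 Gbps into Catalog.
print(weighted_share([80, 40], WEIGHTS, 100))
# Catalog offers only 10 Gbps, below its share, so it is never backlogged.
print(weighted_share([100, 10], WEIGHTS, 100))
```

In the second case the lightly loaded queue receives everything it offers, which matches the "no delay" observation: a queue that is never backlogged does not build up a standing queue.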
slide-43
SLIDE 43 VOQs explain the RTT differences
[Plot: Min RTT (ms)]
  • This VOQ is served faster
  • This VOQ is served slower
  • This VOQ is all over the place
slide-44
SLIDE 44 Switches prioritize long-tail content
slide-45
SLIDE 45 Switches prioritize long-tail content
  • Same latency during uncongested hours
slide-46
SLIDE 46 Switches prioritize long-tail content
  • Same latency during uncongested hours
  • Popular content is congested
  • Long-tail content not congested
slide-47
SLIDE 47 New scheduling algorithm!
[Diagram: Server #1, Server #2, and Server #3 feed the “Offload” VOQ and the “Catalog” VOQ, which share the 100Gbps ISP link; both VOQ shares are now labeled “Load-dependent”]
slide-48
SLIDE 48 New scheduling algorithm is more consistent
[Plot label: Default scheduling algorithm]
slide-49
SLIDE 49
  • 1. TCP New Reno (mostly) behaves as expected

  • 2. Video performance varies
  • 3. Real routers complicate this story
slide-50
SLIDE 50

How big should a buffer be?

slide-51
SLIDE 51

Thanks!

For more details, please see: https://brucespang.com/papers/netflix-buffer-sizing.pdf