slide-1
SLIDE 1

Salsify: Low-Latency Network Video Through Tighter Integration Between a Video Codec and a Transport Protocol

Sadjad Fouladi, John Emmons, Emre Orbay, Catherine Wu, Riad S. Wahby, Keith Winstein

https://snr.stanford.edu/salsify

slide-2
SLIDE 2

Internet video delivery

Sender Receiver network

slide-3
SLIDE 3

Internet video delivery

  • Different classes of Internet video delivery applications:
  • Video streaming (previous class)
  • Live video
  • Real-time video (today’s class)

slide-4
SLIDE 4

Each of these classes has a different latency target

  • Latency target: the amount of time between when something happens and you see it

slide-5
SLIDE 5

Class 1: Video streaming

  • Latency target: tens of minutes
  • BBA [T.-Y. Huang et al., SIGCOMM ’14]
  • MPC [X. Yin et al., SIGCOMM ’15]
  • Pensieve [H. Mao et al., SIGCOMM ’17]
  • Oboe [Z. Akhtar et al., SIGCOMM ’18]

slide-6
SLIDE 6

Class 2: Live video

  • Latency target: 5 to 30 seconds
  • Vic [S. McCanne et al., MULTIMEDIA ’95]
  • VDN [M. K. Mukerjee et al., SIGCOMM ’15]

slide-7
SLIDE 7

Class 3: Real-time Video

  • Latency target: tens of milliseconds
  • Why would an application need that?

slide-8
SLIDE 8

Real-time video systems transmit video with low latency…

Sender Receiver network

slide-9
SLIDE 9

…to maintain the interactivity of the application.

Sender Receiver network

Interaction

slide-10
SLIDE 10

Cloud Video Gaming

slide-11
SLIDE 11

Remote Surgery

slide-12
SLIDE 12

Teleoperation of Robots and Vehicles

slide-13
SLIDE 13

Video Conferencing

slide-14
SLIDE 14

Video Conferencing

slide-15
SLIDE 15

Video Conferencing (reality)

slide-16
SLIDE 16

WebRTC (Chrome 65)

Watch the video at: https://snr.stanford.edu/salsify

slide-17
SLIDE 17

Current systems do not react fast enough to network variations, end up congesting the network, causing stalls and glitches.

slide-18
SLIDE 18

Researchers already knew about this…

  • K. Winstein, A. Sivaraman, H. Balakrishnan, “Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks,” NSDI ’13

[Figure: Skype; throughput (kbps) and delay (ms) over time (s)]

slide-19
SLIDE 19

…and designed better throughput prediction algorithms

  • K. Winstein, A. Sivaraman, H. Balakrishnan, “Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks,” NSDI ’13

[Figure: Sprout; throughput (kbps) and delay (ms) over time (s)]

slide-20
SLIDE 20

But better congestion control alone couldn’t save the day.

slide-21
SLIDE 21

video codec transport protocol

Conventional design: two control loops at arm’s length

slide-22
SLIDE 22

video codec transport protocol

Conventional design: two control loops at arm’s length

slide-23
SLIDE 23

target bit rate video codec transport protocol

Transport estimates the network throughput and communicates that to the codec

slide-24
SLIDE 24

compressed frames video codec transport protocol

Codec produces compressed frames, targeting that bit rate

slide-25
SLIDE 25

new target bit rate compressed frames video codec transport protocol

Transport occasionally updates that estimate

slide-26
SLIDE 26

compressed frames video codec transport protocol

Codec updates frame rate and quality accordingly

slide-27
SLIDE 27

The problem: codec and transport are too decoupled

  • The codec can only respond to changes in target bit rate over coarse time intervals.
  • Individual frames may cause packet loss/queueing.
  • The transport has little control over what the codec produces.

⇒ The resulting system is slow to react to network variations.

slide-28
SLIDE 28

Decades of research and development on these components…

  • Video codecs: MPEG-1, MPEG-2, H.263, H.264, H.265, VP8, VP9, AV1, VC-1
  • Transport protocols: Sprout, BBR, NADA, LEDBAT, CDG, GCC, RemyCC, FBRA

slide-29
SLIDE 29

Salsify explores a more tightly-integrated design

transport protocol & video codec

slide-30
SLIDE 30

Salsify, a new heart from old parts

  • The individual components of Salsify are not exactly new:
  • The transport protocol is inspired by “packet pair” and “Sprout-EWMA”.
  • The video format, VP8, was finalized in 2008.
  • The functional video codec was introduced in [Fouladi et al., NSDI ’17].
  • Salsify is a new architecture for real-time video that integrates these components in a way that responds quickly to network variations.

slide-31
SLIDE 31

Salsify’s architecture:

Video-aware transport protocol

transport protocol & video codec

slide-32
SLIDE 32

Video-aware transport protocol

“What should be the size of the next frame?” (without causing excessive delay)

  • There’s no notion of bit rate, only the next frame size!
  • Inspired by packet pair and Sprout-EWMA, the transport uses packet inter-arrival times, reported by the receiver.

slide-33
SLIDE 33

Receiver keeps a moving average of packet inter-arrival times

[Diagram: Sender transmits frame i and frame i+1; Receiver timeline marked t₁ t₂ t₃ t₄ t₅]

T ← α ⋅ tᵢ + (1 − α) ⋅ T

T : average packet inter-arrival time

OLD IDEA!
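
The update rule on this slide is an exponentially weighted moving average (EWMA). A minimal sketch in Python, assuming a smoothing factor `alpha` of 0.1 and samples in milliseconds; the slides give neither, and the function name and seeding choice are illustrative only:

```python
def update_average(avg, sample, alpha=0.1):
    """One EWMA step: T <- alpha * t_i + (1 - alpha) * T.

    alpha is an assumed smoothing factor; the slides do not give a value.
    """
    return alpha * sample + (1 - alpha) * avg


# Hypothetical inter-arrival times in milliseconds, one per received packet.
samples_ms = [10.0, 12.0, 30.0, 11.0]

T = samples_ms[0]  # seed the average with the first sample (an assumption)
for t_i in samples_ms[1:]:
    T = update_average(T, t_i)
```

A small `alpha` makes T react slowly to a single outlier such as the 30 ms sample, while a large `alpha` tracks the most recent packets more closely.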

slide-34
SLIDE 34

Sender does not transmit continuously

[Diagram: Sender transmits frame i and frame i+1 with a grace period marked; Receiver timeline marked t₁ t₂ t₃ t₄ t₅]

T ← α ⋅ (tᵢ − gᵢ) + (1 − α) ⋅ T
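
One way to read the adjusted rule: when the sender deliberately pauses between frames, the gap seen by the receiver includes idle time that says nothing about the network, so the grace period gᵢ is subtracted before averaging. The sketch below follows that reading. How gᵢ reaches the receiver is not shown on the slide, and `alpha` and all numbers are assumptions:

```python
def update_average_with_grace(avg, inter_arrival, grace, alpha=0.1):
    """EWMA step with the sender's idle time removed:
    T <- alpha * (t_i - g_i) + (1 - alpha) * T
    """
    return alpha * (inter_arrival - grace) + (1 - alpha) * avg


T = 2.0  # current average inter-arrival time, ms (hypothetical)

# First packet of a new frame: it arrives 35 ms after the previous packet,
# but the sender had been idle for 33 ms of that gap.
naive = update_average_with_grace(T, inter_arrival=35.0, grace=0.0)
adjusted = update_average_with_grace(T, inter_arrival=35.0, grace=33.0)
```

Without the correction, the pause inflates T and the link looks slower than it is; with it, the estimate in this example stays at 2 ms.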

slide-35
SLIDE 35

What should be the size of the next frame?

  • The receiver sends back an acknowledgement for every received packet, containing the latest inter-arrival time.
  • The sender calculates the next frame size as follows:

frame size = (d / T − N) packets

N : number of packets in flight
d : target delay
T : average inter-arrival time
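
As a worked example of the formula, here is a short Python sketch. Rounding down to whole packets and clamping at zero are assumptions of the sketch, not something the slide states, and the numbers are made up:

```python
def next_frame_size_packets(target_delay, avg_inter_arrival, packets_in_flight):
    """frame size = d / T - N, in packets.

    target_delay and avg_inter_arrival must use the same time unit.
    Rounding down and clamping at zero are assumptions of this sketch.
    """
    budget = target_delay / avg_inter_arrival - packets_in_flight
    return max(0, int(budget))


# Hypothetical numbers: d = 100 ms, T = 2 ms per packet, N = 10 packets in flight.
size = next_frame_size_packets(100.0, 2.0, 10)
```

With these inputs the network can drain 100 / 2 = 50 packets within the target delay; 10 are already in flight, so the next frame may occupy 40 packets.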

slide-36
SLIDE 36

Salsify’s architecture:

Functional video codec

transport protocol & video codec

slide-37
SLIDE 37

The encoder can only know the output size after the fact.

It’s challenging for any codec to choose the appropriate quality settings upfront to meet a target size; they tend to over-/undershoot the target.

slide-38
SLIDE 38

Video codec

  • A piece of software or hardware that compresses and decompresses digital video.

[Diagram: Encoder → compressed bitstream → Decoder]

slide-39
SLIDE 39

Video codecs provide a simple interface

encode([🌇,🏚,...]) → [frames...]
 decode([frames...]) → [🌇,🏚,...]

slide-40
SLIDE 40

First frame in the sequence is stored in its entirety

keyframe

slide-41
SLIDE 41

The encoder exploits the similarities between frames to compress the video

keyframe

slide-42
SLIDE 42

The encoder exploits the similarities between frames to compress the video

keyframe

slide-43
SLIDE 43

The encoder exploits the similarities between frames to compress the video

[Diagram: a block in the interframe is described by a reference into the keyframe: “previous frame, offset (a, b)”]

slide-44
SLIDE 44

Predictions are not always perfect

current block

slide-45
SLIDE 45

Predictions are not always perfect

current block closest block in previous frame

slide-46
SLIDE 46

The encoder also needs to store the residues

current block − closest block in previous frame = residue

slide-47
SLIDE 47

Decode process

actual block = prediction (frame, offset (a, b)) + residue
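
The encode and decode relationships from the last two slides can be sketched in a few lines of Python. This is a lossless toy: real codecs such as VP8 transform and quantize the residue, so their reconstruction is only approximate. Treating the offset (a, b) as (row, column), and all helper names and pixel values, are assumptions for illustration:

```python
def block_at(frame, a, b, size):
    """Return the size x size block whose top-left corner is row a, column b."""
    return [row[b:b + size] for row in frame[a:a + size]]


def residue(current, prediction):
    """Encoder side: residue = current block - prediction."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]


def reconstruct(prediction, res):
    """Decoder side: actual block = prediction + residue."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, res)]


# Hypothetical 4x4 previous frame and a 2x2 block of the current frame.
previous_frame = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
current_block = [[21, 20], [20, 22]]

prediction = block_at(previous_frame, 0, 2, 2)   # offset (a, b) = (0, 2)
res = residue(current_block, prediction)
decoded = reconstruct(prediction, res)
```

The residue here is mostly zeros and small values, which is what makes it cheaper to store than the block itself.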

slide-48
SLIDE 48

What frames can be referenced?

  • The codec keeps a list of references that can be used by frames.
  • Each frame can replace one of the reference slots with its output.
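
A minimal sketch of this bookkeeping in Python. The slot names follow VP8's three reference buffers (last, golden, alternate); the class and method names are hypothetical, and plain strings stand in for decoded images:

```python
class ReferenceSlots:
    """A fixed set of named slots holding previously decoded frames."""

    def __init__(self, names=("last", "golden", "altref")):
        self.slots = {name: None for name in names}

    def get(self, name):
        """Frame that a new frame may predict from."""
        return self.slots[name]

    def replace(self, name, decoded_frame):
        """Overwrite one slot with the output of the frame just decoded."""
        if name not in self.slots:
            raise KeyError(name)
        self.slots[name] = decoded_frame


refs = ReferenceSlots()
refs.replace("last", "decoded frame 0")    # placeholder strings stand in for images
refs.replace("golden", "decoded frame 0")
refs.replace("last", "decoded frame 1")
```

Because later frames predict from whatever these slots hold, encoder and decoder must apply exactly the same replacements in the same order.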

slide-49
SLIDE 49

Decode process

actual block = prediction (slot, offset (a, b)) + residue

slide-50
SLIDE 50

The encoder can only achieve the bit rate on average

[Figure: frame size (KB) vs. frame number, with the target bit rate (2 Mbps) marked]

slide-51
SLIDE 51

Why real-time video is a different problem

  • 1. Video streaming: latency target of tens of minutes
  • 2. Live video: latency target of 5 to 30 seconds
  • 3. Real-time video: latency target of tens of milliseconds

Encoder timescale to achieve target bit rate: 1-2 seconds
How often a keyframe can be inserted: 2 seconds

slide-52
SLIDE 52

The challenge: Getting an accurate frame out of an inaccurate codec

  • Trial and error: encode with different quality settings, pick the one that fits.

SOUNDS GOOD, DOESN’T WORK!

slide-53
SLIDE 53

The video codec is stateful

  • Frames depend on references that live in the mind of the decoder.
  • Every frame can change this set of references, and future frames are counting on that.

slide-54
SLIDE 54

Video codec

source state

prob tables

slide-55
SLIDE 55

Video codec

source state

frame output

prob tables

slide-56
SLIDE 56

Video codec

source state target state

frame output

prob tables prob tables’

slide-57
SLIDE 57

Video codec is an automaton

keyframe interframe interframe interframe

slide-58
SLIDE 58

There’s no way to undo an encoded frame in current codecs

encode([🌇,🏚,...]) → [frames...]
decode([frames...]) → [🌇,🏚,...]

The state is internal to the encoder; there is no way to save or restore it.

slide-59
SLIDE 59

Functional video codec to the rescue

encode(state, 🏚) → state′, frame
 decode(state, frame) → state′, 🏚


Salsify’s functional video codec exposes its state, which can be saved and restored.

Described in “Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads,” NSDI ’17
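
To illustrate the explicit-state interface, here is a toy codec in Python. It is a sketch, not Salsify's codec: the state is just the last reconstructed picture, a "frame" is a quantized residue, and `step` is a hypothetical quality knob added for the example. What it does show is that `encode` and `decode` return a new state instead of mutating a hidden one:

```python
from typing import NamedTuple, Tuple


class State(NamedTuple):
    """Toy codec state: just the last reconstructed picture."""
    reference: Tuple[int, ...]


def decode(state, frame, step=1):
    """decode(state, frame) -> (state', picture); `state` is not modified."""
    picture = tuple(r + q * step for r, q in zip(state.reference, frame))
    return State(reference=picture), picture


def encode(state, picture, step=1):
    """encode(state, picture) -> (state', frame); `state` is not modified.

    `step` is a hypothetical quality knob: a larger step quantizes the
    residue more coarsely.
    """
    frame = tuple(round((p - r) / step) for p, r in zip(picture, state.reference))
    new_state, _ = decode(state, frame, step)  # mirror what the decoder will hold
    return new_state, frame


s0 = State(reference=(0, 0, 0))
picture = (10, 20, 30)

# Two execution paths from the same saved state; neither one is committed yet.
s_fine, f_fine = encode(s0, picture, step=1)
s_coarse, f_coarse = encode(s0, picture, step=8)
```

Since `s0` is untouched, the caller can keep either resulting state, or neither, and continue from there. The toy does not model compressed size; in a real codec the coarser version would be the smaller one.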

slide-60
SLIDE 60

Order two, pick the one that fits!

  • Salsify’s functional video codec can explore different execution paths without committing to them.
  • For each frame, the codec presents the transport with three options: a slightly-higher-quality version, a slightly-lower-quality version, or ❌ discarding the frame.

better worse

5 K B 1 K B

slide-61
SLIDE 61

Salsify’s architecture:

Unified control loop

transport protocol & video codec

slide-62
SLIDE 62

Codec → Transport
 “Here’s two versions of the current frame.”

better worse

5 K B 2 5 K B

30 KB

target frame size

slide-63
SLIDE 63

Transport → Codec
 “I picked option 2. Base the next frame on its exiting state.”

2 5 K B

30 KB

target frame size

slide-64
SLIDE 64

Codec → Transport
 “Here’s two versions of the latest frame.”

better worse

5 K B 2 5 K B

55 KB

target frame size

slide-65
SLIDE 65

Transport → Codec
 “I picked option 1. Base the next frame on its exiting state.”

5 K B

55 KB

target frame size

slide-66
SLIDE 66

Codec → Transport
 “Here’s two versions of the latest frame.”

better worse

7 K B 2 5 K B 5 K B

5 KB

target frame size

slide-67
SLIDE 67

Transport → Codec
 “I cannot send any frames right now. Sorry, but discard them.”

5 KB

target frame size

slide-68
SLIDE 68

Codec → Transport
 “Fine. Here’s two versions of the latest frame.”

better worse

45 KB 20 KB

50 KB

target frame size

slide-69
SLIDE 69

Transport → Codec
 “I picked option 1. Base the next frame on its exiting state.”

50 KB

45 KB

target frame size

slide-70
SLIDE 70

There’s no notion of frame rate or bit rate in the system.
 Frames are sent when the network can accommodate them.
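
The transport's side of this exchange can be sketched as a small selection function. The rule used here (send the largest version that fits the target, otherwise discard) is an assumption drawn from the walk-through, and it presumes the larger version is the higher-quality one; the function name is hypothetical:

```python
def choose_version(options, target_size):
    """Pick the largest encoded version that fits the target, or None to discard.

    options: list of (label, size) pairs; sizes and target share one unit.
    Assumes the larger version is the higher-quality one.
    """
    fitting = [opt for opt in options if opt[1] <= target_size]
    if not fitting:
        return None  # nothing fits: skip this frame
    return max(fitting, key=lambda opt: opt[1])


# Sizes in KB, taken from the last exchange in the walk-through above.
picked = choose_version([("better", 45), ("worse", 20)], target_size=50)

# A hypothetical case where neither version fits.
skipped = choose_version([("better", 45), ("worse", 20)], target_size=5)
```

In the first call the 45 KB version fits under the 50 KB target and is chosen; in the second nothing fits, so the frame is skipped and the codec keeps its previous state.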

slide-71
SLIDE 71

Loss recovery

  • Option 1: Ignore dropped packets.
  • Option 2: Retransmit dropped packets.
  • Option 3: Restart the video stream with a keyframe or a “key-slice.”

slide-72
SLIDE 72

Loss corrupts the current frame...

frame corrupted frame

slide-73
SLIDE 73

Loss corrupts the current frame... and the rest!

frame corrupted frame frame frame

slide-74
SLIDE 74

Loss recovery

  • Option 1: Ignore dropped packets.
  • Option 2: Retransmit dropped packets.
  • Option 3: Restart the video stream with a keyframe or a “key-slice.”

slide-75
SLIDE 75

Option 5, Salsify’s way: Jump back to the last correct state
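
Because the functional codec exposes its state, the sender can keep earlier states around and encode the next frame against one the receiver is known to hold. The slide does not describe the bookkeeping, so the sketch below is one hypothetical way to do it; the class, its methods, and the acknowledgement scheme are all assumptions:

```python
class SavedStates:
    """Sender-side record of codec states, keyed by frame number."""

    def __init__(self, initial_state):
        self.states = {0: initial_state}
        self.last_acked = 0

    def remember(self, frame_no, state):
        """Save the state produced by encoding frame `frame_no`."""
        self.states[frame_no] = state

    def on_ack(self, frame_no):
        """The receiver fully decoded `frame_no`; older states are no longer needed."""
        self.last_acked = max(self.last_acked, frame_no)
        for n in [n for n in self.states if n < self.last_acked]:
            del self.states[n]

    def state_after_loss(self):
        """State to encode the next frame against once a loss is detected."""
        return self.states[self.last_acked]


saved = SavedStates("state 0")      # strings stand in for real codec states
saved.remember(1, "state 1")
saved.remember(2, "state 2")
saved.on_ack(1)                     # frame 1 arrived intact; frame 2 was hit by loss
base = saved.state_after_loss()
```

Here the next frame would be encoded against "state 1", so the stream recovers without retransmitting frame 2 and without sending a keyframe.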

slide-76
SLIDE 76

Measurement Testbed

slide-77
SLIDE 77

Goals for the measurement testbed

  • A system with reproducible input video and reproducible network traces that runs an unmodified version of the system-under-test.
  • Target QoE metrics: image quality and video delay.

slide-80
SLIDE 80

Video delay

[Diagram: sender and receiver timelines over time]

slide-81
SLIDE 81

Video System AV.io

Measurement System Sender Receiver

Network Simulator

slide-82
SLIDE 82

slide-83
SLIDE 83

barcoded video video in/out (HDMI) HDMI to USB camera emulated network receiver HDMI output

slide-90
SLIDE 90

Sent Image Timestamp: T+0.000s
Received Image Timestamp: T+0.765s
Quality: 9.76 dB SSIM
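
The two QoE metrics can be derived from such a sent/received pair as follows. The sketch assumes the common convention that SSIM in decibels is −10 · log₁₀(1 − SSIM); the slide itself does not spell out the conversion, and the function names are illustrative:

```python
import math


def ssim_to_db(ssim):
    """Convert an SSIM index in [0, 1) to decibels: -10 * log10(1 - SSIM)."""
    return -10.0 * math.log10(1.0 - ssim)


def db_to_ssim(db):
    """Inverse of ssim_to_db."""
    return 1.0 - 10.0 ** (-db / 10.0)


sent_at = 0.000       # seconds after T, from the example above
received_at = 0.765
delay_ms = (received_at - sent_at) * 1000.0

ssim_index = db_to_ssim(9.76)
```

Under that convention the frame in this example arrived 765 ms after it was sent, and 9.76 dB corresponds to an SSIM index of roughly 0.89.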

slide-91
SLIDE 91

Evaluation of Salsify

slide-92
SLIDE 92

Evaluation results: Verizon LTE Trace

[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms). Systems shown: WebRTC (VP9-SVC), Skype, FaceTime, Hangouts, WebRTC. Annotation: “Better”.]

slide-93
SLIDE 93

Evaluation results: Verizon LTE Trace

[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms). Systems shown: WebRTC (VP9-SVC), Skype, FaceTime, Hangouts, WebRTC, Status Quo (conventional transport and codec).]

slide-94
SLIDE 94

Evaluation results: Verizon LTE Trace

[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms). Systems shown: WebRTC (VP9-SVC), Skype, FaceTime, Hangouts, WebRTC, Status Quo (conventional transport and codec), Salsify (conventional codec).]

slide-95
SLIDE 95

Evaluation results: Verizon LTE Trace

[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms). Systems shown: Salsify, WebRTC (VP9-SVC), Skype, FaceTime, Hangouts, WebRTC, Status Quo (conventional transport and codec), Salsify (conventional codec).]

slide-96
SLIDE 96

Evaluation results: Grace Period

[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms). Systems shown: Salsify, WebRTC (VP9-SVC), Skype, FaceTime, Hangouts, WebRTC, Salsify (no grace period), Status Quo (conventional transport and codec), Salsify (conventional codec).]

slide-97
SLIDE 97

Evaluation results: AT&T LTE Trace

[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms). Systems shown: WebRTC (VP9-SVC), Skype, FaceTime, Hangouts, Salsify, WebRTC. Annotation: “Better”.]

slide-98
SLIDE 98

Evaluation results: T-Mobile UMTS Trace

[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms). Systems shown: WebRTC (VP9-SVC), Skype, FaceTime, Hangouts, Salsify, WebRTC. Annotation: “Better”.]

slide-99
SLIDE 99

WebRTC (Google Chrome 65.0 dev) Salsify

Network Variations

Watch the video at: https://snr.stanford.edu/salsify

slide-100
SLIDE 100

WebRTC (Google Chrome 65.0 dev) Salsify

Network Outages

Watch the video at: https://snr.stanford.edu/salsify

slide-101
SLIDE 101

Final Remarks

slide-102
SLIDE 102

Where Salsify is not a good fit

  • Long latency budgets
  • Non-variable links
  • Low-power devices
slide-103
SLIDE 103

Where Salsify is not a good fit

  • Long latency budgets
  • Non-variable links
  • Low-power devices
slide-104
SLIDE 104

Where Salsify is not a good fit: non-variable links

[Figure: video quality (SSIM dB) vs. video delay (95th percentile, ms). Systems shown: WebRTC (VP9-SVC), Skype, Hangouts, Salsify, FaceTime, WebRTC. Annotation: “Better”.]

slide-105
SLIDE 105

Where Salsify is not a good fit

  • Long latency budgets
  • Non-variable links
  • Low-power devices
slide-106
SLIDE 106

Codecs have been treated as black boxes in video systems for a long time.

slide-107
SLIDE 107

Takeaways

  • Salsify is a new architecture for real-time Internet video.
  • Salsify tightly integrates a video-aware transport protocol with a functional video codec, allowing it to respond quickly to changing network conditions.
  • Changes to the architecture of systems can sometimes yield significant benefits.

  • More info at: snr.stanford.edu/salsify

Thank you: NSF, DARPA, Google, Dropbox, VMware, Huawei, Facebook, Stanford Platform Lab, and James.
