Salsify: Low-Latency Network Video Through Tighter Integration Between a Video Codec and a Transport Protocol
Sadjad Fouladi, John Emmons, Emre Orbay, Catherine Wu, Riad S. Wahby, Keith Winstein
https://snr.stanford.edu/salsify
Internet video delivery
[Diagram: Sender and Receiver connected by a network]
Each of these classes has a different latency target
[Diagram: latency is the time between "something happens" and "you see it"]
Class 1: Video streaming
MPC [X. Yin et al., SIGCOMM ’15], Pensieve [H. Mao et al., SIGCOMM ’17], Oboe [Z. Akhtar et al., SIGCOMM ’18]
Class 2: Live video
VDN [M. K. Mukerjee et al., SIGCOMM ’15]
Class 3: Real-time Video
Real-time video systems transmit video with low latency to maintain the interactivity of the application.
WebRTC (Chrome 65)
Watch the video at: https://snr.stanford.edu/salsify
Researchers already knew about this…
Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks, NSDI ’13
[Figure: Skype throughput (kbps) and delay (ms) over time (s)]
…and designed better throughput prediction algorithms
Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks, NSDI ’13
[Figure: Sprout throughput (kbps) and delay (ms) over time (s)]
Conventional design: two control loops at arm’s length
[Diagram: a video codec and a transport protocol as separate components, exchanging a target bit rate and compressed frames]
Transport estimates the network throughput and communicates that to the codec as a target bit rate
Codec produces compressed frames, targeting that bit rate
Transport occasionally updates that estimate and sends a new target bit rate
Codec updates frame rate and quality accordingly
The problem: codec and transport are too decoupled
coarse time intervals.
⇒ The resulting system is slow to react to network variations.
Decades of research and development on these components…
Video codecs: MPEG-1, MPEG-2, H.263, H.264, H.265, VP8, VP9, AV1, VC-1
Transport protocols: Sprout, BBR, NADA, LEDBAT, CDG, GCC, RemyCC, FBRA
Salsify explores a more tightly-integrated design
[Diagram: transport protocol & video codec as a single integrated component]
Salsify, a new heart from old parts
components in a way that responds quickly to network variations.
Salsify’s architecture:
Video-aware transport protocol
inter-arrival time, reported by the receiver.
* without causing excessive delay
Receiver keeps a moving average of packet inter-arrival times
[Diagram: packets of frame i and frame i+1 travel from Sender to Receiver and arrive at times t₁ … t₅]
T: average packet inter-arrival time
Sender does not transmit continuously
[Diagram: packet arrival timeline t₁ … t₅ for frame i and frame i+1, with a grace period marked]
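A minimal sketch of how a receiver might keep this moving average. It assumes an exponentially weighted moving average with a hypothetical smoothing factor `alpha`, and assumes each packet carries a `grace_period` value saying how long the sender deliberately stayed idle before sending it. The class and parameter names are illustrative and are not taken from Salsify's code.

```python
class InterArrivalEstimator:
    """Receiver-side moving average of packet inter-arrival times (seconds)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha          # assumed smoothing factor
        self.last_arrival = None    # arrival time of the previous packet
        self.average = None         # T: current moving average

    def on_packet(self, arrival_time, grace_period=0.0):
        """Fold one packet arrival into T and return the current average."""
        if self.last_arrival is not None:
            # Time the sender spent idle between frames says nothing about
            # the network, so it is subtracted from the raw gap.
            gap = arrival_time - self.last_arrival - grace_period
            sample = max(gap, 0.0)
            if self.average is None:
                self.average = sample
            else:
                self.average = (self.alpha * sample
                                + (1 - self.alpha) * self.average)
        self.last_arrival = arrival_time
        return self.average


est = InterArrivalEstimator(alpha=0.5)
# Three packets of frame i arrive 10 ms apart.
for t in (0.000, 0.010, 0.020):
    est.on_packet(t)
# The first packet of frame i+1 arrives 40 ms later, but the sender
# reports that it paused for 30 ms before sending it.
avg = est.on_packet(0.060, grace_period=0.030)
```

With the grace period subtracted, the pause between frames leaves the average at about 10 ms. Without it, the same pause would look like a slower network.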
What should be the size of the next frame?
containing the latest inter-arrival time.
N: number of packets in flight
d: target delay
T: average inter-arrival time
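The three quantities suggest a simple budget. If the path drains roughly one packet every T, then about d / T packets can be outstanding without exceeding the target delay d, and N of those are already in flight. The sketch below computes a frame size under that assumption. The function name, the millisecond units, and the `packet_size` default are hypothetical.

```python
def target_frame_size(n_in_flight, target_delay_ms, avg_interarrival_ms,
                      packet_size=1400):
    """Return a byte budget for the next frame (0 means: send nothing now)."""
    if avg_interarrival_ms <= 0:
        raise ValueError("avg_interarrival_ms must be positive")
    # Packets that fit within the delay target, minus those already queued.
    budget_packets = target_delay_ms / avg_interarrival_ms - n_in_flight
    return max(int(budget_packets), 0) * packet_size


# d = 100 ms, T = 2 ms, N = 10: room for 40 more packets.
size = target_frame_size(n_in_flight=10, target_delay_ms=100,
                         avg_interarrival_ms=2)

# Same network, but 60 packets already in flight: no room for a frame.
blocked = target_frame_size(n_in_flight=60, target_delay_ms=100,
                            avg_interarrival_ms=2)
```

A budget of zero corresponds to the case where the transport tells the codec it cannot send any frame right now.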
Salsify’s architecture:
Functional video codec
The encoder can only know the output size after the fact.
Video codec: compresses and decompresses digital video.
[Diagram: Encoder turns frames into a compressed bitstream; Decoder turns the bitstream back into frames]
Video codecs provide a simple interface
First frame in the sequence is stored in its entirety
[Diagram: the first frame is labeled keyframe]
The encoder exploits the similarities between frames to compress the video
[Diagram: a keyframe followed by an interframe, which is described relative to the previous frame]
Predictions are not always perfect
[Diagram: the current block next to the closest block in the previous frame]
The encoder also needs to store the residues
[Diagram: current block, closest block in previous frame, and the residue between them]
Decode process
[Diagram: prediction and residue combine into the actual block of the frame]
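The relationship between prediction, residue, and actual block can be sketched with plain lists of pixel values. This is a lossless toy. Real codecs transform and quantize the residue, so their reconstruction is only approximate. The function names and sample values are illustrative.

```python
def compute_residue(current_block, predicted_block):
    """Encoder side: what the prediction got wrong, pixel by pixel."""
    return [c - p for c, p in zip(current_block, predicted_block)]


def reconstruct(predicted_block, residue):
    """Decoder side: actual block = prediction + residue."""
    return [p + r for p, r in zip(predicted_block, residue)]


current = [52, 55, 61, 66]               # block being encoded
closest_in_previous = [50, 55, 60, 70]   # best match in the previous frame

residue = compute_residue(current, closest_in_previous)
decoded = reconstruct(closest_in_previous, residue)
```

The better the prediction, the closer the residue is to all zeros and the fewer bits it costs to store.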
What frames can be referenced?
Decode process
[Diagram: prediction and residue combine into the actual block, which is stored in a reference slot]
The encoder can only achieve the bit rate on average
[Figure: frame size (KB) versus frame number, with the target bit rate (2 Mbps) marked]
Why real-time video is a different problem
Latency target: tens of minutes
Latency target: 5 to 30 seconds
Latency target: tens of milliseconds
Encoder timescale to achieve target bit rate: 1-2 seconds
How often a keyframe can be inserted: 2 seconds
The challenge: Getting an accurate frame out of an inaccurate codec
Encode with different quality settings, pick the one that fits.
The video codec is stateful
decoder.
are counting on that.
Video codec
[Diagram: a frame takes the codec from a source state to a target state; the probability tables are part of the state and are updated (prob tables → prob tables’)]
Video codec is an automaton
[Diagram: a chain of states advanced by a keyframe, then interframe, interframe, interframe]
There’s no way to undo an encoded frame in current codecs
The state is internal to the encoder—no way to save/restore the state.
Functional video codec to the rescue
Salsify’s functional video codec exposes the state that can be saved/restored.
Described in Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads, NSDI ’17
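To illustrate what an explicit, saveable state buys, here is a toy codec written as pure functions. `State`, `Frame`, `encode`, and `decode` are hypothetical names for this sketch and do not reproduce Salsify's actual interface. The "picture" is just a row of pixel values, and the quantizer step stands in for the quality setting.

```python
from typing import NamedTuple, Tuple


class State(NamedTuple):
    reference: Tuple[int, ...]   # last decoded picture (toy: a row of pixels)


class Frame(NamedTuple):
    step: int                    # quantizer step; larger means coarser
    residue: Tuple[int, ...]     # quantized difference from the reference


def encode(state: State, image: Tuple[int, ...], step: int) -> Frame:
    """Pure function: reads `state` but never modifies it."""
    residue = tuple(round((px - ref) / step)
                    for px, ref in zip(image, state.reference))
    return Frame(step, residue)


def decode(state: State, frame: Frame) -> Tuple[State, Tuple[int, ...]]:
    """Return the new state together with the decoded picture."""
    image = tuple(ref + q * frame.step
                  for ref, q in zip(state.reference, frame.residue))
    return State(image), image


start = State(reference=(100, 100, 100, 100))
image = (103, 97, 110, 100)

# Two encodings of the same image from the same saved state.
fine = encode(start, image, step=1)
coarse = encode(start, image, step=8)

# Each choice leads to its own successor state; `start` is untouched,
# so either path (or neither) can still be taken.
state_after_fine, decoded_fine = decode(start, fine)
state_after_coarse, decoded_coarse = decode(start, coarse)
```

Because the state is a value passed in and returned, trying an encoding does not commit to it. A conventional encoder with hidden internal state cannot do this.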
Order two, pick the one that fits!
without committing to them.
A slightly-higher-quality version,
A slightly-lower-quality version,
❌ Discarding the frame.
[Diagram: a better and a worse encoding of the same frame, each labeled with its size in KB]
Salsify’s architecture:
Unified control loop
Codec → Transport “Here’s two versions of the current frame.”
[Diagram: a better and a worse version (worse: 25 KB); target frame size: 30 KB]
Transport → Codec “I picked option 2. Base the next frame on its exiting state.”
[Diagram: the 25 KB version is sent; target frame size: 30 KB]
Codec → Transport “Here’s two versions of the latest frame.”
[Diagram: a better and a worse version (worse: 25 KB); target frame size: 55 KB]
Transport → Codec “I picked option 1. Base the next frame on its exiting state.”
[Diagram: the better version is sent; target frame size: 55 KB]
Codec → Transport “Here’s two versions of the latest frame.”
[Diagram: a better and a worse version; target frame size: 5 KB]
Transport → Codec “I cannot send any frames right now. Sorry, but discard them.”
[Diagram: nothing is sent; target frame size: 5 KB]
Codec → Transport “Fine. Here’s two versions of the latest frame.”
[Diagram: better version: 45 KB; worse version: 20 KB; target frame size: 50 KB]
Transport → Codec “I picked option 1. Base the next frame on its exiting state.”
[Diagram: the 45 KB version is sent; target frame size: 50 KB]
There’s no notion of frame rate or bit rate in the system. Frames are sent when the network can accommodate them.
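The choice the transport makes in the exchange above can be sketched as a small function. The rule assumed here is "send the better version if it fits the target, otherwise the worse one, otherwise nothing", which matches the picks shown on the slides. The function name is hypothetical, and apart from the 45 KB / 20 KB / 50 KB case the sizes below are made up for illustration.

```python
def choose_version(better_size, worse_size, target_size):
    """Return 'better', 'worse', or None when neither version fits."""
    if better_size <= target_size:
        return "better"
    if worse_size <= target_size:
        return "worse"
    return None   # discard both; the codec stays in its previous state


# Sizes in KB.
pick_a = choose_version(better_size=45, worse_size=20, target_size=50)
# Hypothetical: the better version overshoots a 30 KB target.
pick_b = choose_version(better_size=40, worse_size=25, target_size=30)
# Hypothetical: the network has almost no room, so nothing is sent.
pick_c = choose_version(better_size=45, worse_size=20, target_size=5)
```

When the result is `None`, the next frame is encoded from the same saved state as before, so skipping a frame does not corrupt the stream.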
Loss recovery
Loss corrupts the current frame... and the rest!
[Diagram: a frame, then a corrupted frame, then the frames that follow it]
Option 5, Salsify’s way: Jump back to the last correct state
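A rough sketch of the "jump back" idea, assuming the sender keeps saved codec states under integer ids and the receiver reports the last state it reached correctly. The `Sender` class, its methods, and the id scheme are hypothetical. Only the bookkeeping is modeled, and no video is encoded.

```python
class Sender:
    """Toy sender that remembers states the receiver might still hold."""

    def __init__(self):
        self.next_id = 1
        self.current = 0      # id of the state the next frame builds on
        self.saved = {0}      # ids of states kept for a possible rollback

    def send_frame(self):
        """Encode one frame; return (source_state_id, target_state_id)."""
        source, target = self.current, self.next_id
        self.next_id += 1
        self.saved.add(target)
        self.current = target
        return source, target

    def on_ack(self, state_id):
        """Receiver reached `state_id`; older states can be dropped."""
        self.saved = {s for s in self.saved if s >= state_id}

    def on_loss(self, last_good_state_id):
        """Jump back: base the next frame on a state the receiver has."""
        if last_good_state_id not in self.saved:
            raise KeyError("state was not kept")
        self.current = last_good_state_id


sender = Sender()
f1 = sender.send_frame()   # (0, 1)
sender.on_ack(1)           # receiver decoded f1 and now holds state 1
f2 = sender.send_frame()   # (1, 2), lost in the network
f3 = sender.send_frame()   # (2, 3), useless: receiver never had state 2
sender.on_loss(1)          # receiver reports state 1 as its last good one
f4 = sender.send_frame()   # (1, 4), builds on a state the receiver holds
```

The frame sent after the rollback depends only on a state the receiver already has, so decoding can resume from there.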
Goals for the measurement testbed
A testbed with reproducible input video and reproducible network traces that runs an unmodified version of the system under test.
Video delay
[Diagram: measurement testbed. Components: measurement system with video in/out (HDMI) playing barcoded video, AV.io HDMI-to-USB camera, sender running the video system, network simulator (emulated network), receiver with HDMI output]
Sent Image Timestamp: T+0.000s Received Image Timestamp: T+0.765s Quality: 9.76 dB SSIM
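The per-frame numbers in this readout can be related with a few helper functions. The sketch assumes the common convention that SSIM in decibels is -10 · log10(1 − SSIM), and it uses a nearest-rank 95th percentile. Both are assumptions about how the figures were produced, and the function names are illustrative.

```python
import math


def ssim_to_db(ssim):
    """Convert an SSIM score in [0, 1) to decibels (assumed convention)."""
    return -10.0 * math.log10(1.0 - ssim)


def db_to_ssim(db):
    """Inverse of ssim_to_db."""
    return 1.0 - 10.0 ** (-db / 10.0)


def video_delay(sent_timestamp, received_timestamp):
    """Delay of one frame: time from sending it to displaying it."""
    return received_timestamp - sent_timestamp


def percentile_95(samples):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integers
    return ordered[rank - 1]


delay = video_delay(0.000, 0.765)   # the frame shown above, in seconds
ssim = db_to_ssim(9.76)             # raw SSIM behind "9.76 dB"
p95 = percentile_95(list(range(1, 21)))
```

The decibel scale spreads out scores near 1.0, which is where most video frames land.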
Evaluation results: Verizon LTE Trace
[Figure: Video Quality (SSIM dB) versus Video Delay (95th percentile ms) for Salsify, WebRTC, WebRTC (VP9-SVC), Skype, FaceTime, Hangouts, Status Quo (conventional transport and codec), and Salsify (conventional codec)]
Evaluation results: Grace Period
[Figure: Video Quality (SSIM dB) versus Video Delay (95th percentile ms) for the same systems, with Salsify (no grace period) added]
Evaluation results: AT&T LTE Trace
[Figure: Video Quality (SSIM dB) versus Video Delay (95th percentile ms) for Salsify, WebRTC, WebRTC (VP9-SVC), Skype, FaceTime, and Hangouts]
Evaluation results: T-Mobile UMTS Trace
[Figure: Video Quality (SSIM dB) versus Video Delay (95th percentile ms) for Salsify, WebRTC, WebRTC (VP9-SVC), Skype, FaceTime, and Hangouts]
WebRTC (Google Chrome 65.0 dev) Salsify
Watch the video at: https://snr.stanford.edu/salsify
Where Salsify is not a good fit
Where Salsify is not a good fit: non-variable links
[Figure: Video Quality (SSIM dB) versus Video Delay (95th percentile ms) for Salsify, WebRTC, WebRTC (VP9-SVC), Skype, FaceTime, and Hangouts]
Codecs have been treated as black boxes in video systems for a long time.
Takeaways
functional video codec, allowing it to respond quickly to changing network conditions.
benefits.
Thank you: NSF, DARPA, Google, Dropbox, VMware, Huawei, Facebook, Stanford Platform Lab, and James.