Pitfalls of data-driven networking: A case study of latent causal confounders in video streaming



slide-1
SLIDE 1

Pitfalls of data-driven networking: A case study of latent causal confounders in video streaming

  • P. C. Sruthi, Sanjay Rao, Bruno Ribeiro
slide-2
SLIDE 2

Say you want to design a video streaming system...

slide-3
SLIDE 3

Say you want to design a video streaming system...

Video Streaming Algorithm

slide-4
SLIDE 4

What If...

  • A different algorithm had been used?
  • Viewers started playing 4K videos? Would they experience buffering?
slide-5
SLIDE 5

What If...

  • A different algorithm had been used?
  • Viewers started playing 4K videos? Would they experience buffering?

Counterfactual questions

slide-6
SLIDE 6

What this talk is about

  • What are the challenges involved in answering counterfactual questions for networked systems?

slide-7
SLIDE 7

What this talk is about

  • What are the challenges involved in answering counterfactual questions for networked systems?

  • A study of these challenges in the context of video streaming algorithms
slide-8
SLIDE 8

What this talk is about

  • What are the challenges involved in answering counterfactual questions for networked systems?
  • A study of these challenges in the context of video streaming algorithms
  • Limitations of current methods, and a preliminary approach to overcome these challenges

slide-9
SLIDE 9

Background: Video Streaming (ABR)

slide-10
SLIDE 10

Background: Video Streaming (ABR)

A video is encoded into multiple qualities (bitrates)

slide-11
SLIDE 11

Background: Video Streaming (ABR)

Each bitrate is split into chunks

slide-12
SLIDE 12

Background: Video Streaming (ABR)

slide-13
SLIDE 13

Counterfactuals for video streaming

  • What if ABR 2 had been used instead of ABR 1?
slide-14
SLIDE 14

Counterfactuals for video streaming

  • What if ABR 2 had been used instead of ABR 1?
  • Alternatively, what if a different sequence of bitrates had been downloaded?

slide-15
SLIDE 15

Counterfactuals for video streaming

[Diagram: ABR 1, Deployment, Traces, Performance Evaluation]

slide-16
SLIDE 16

Counterfactuals for video streaming

[Diagram: ABR 1, Deployment, Traces, Performance Evaluation; ABR 2]

slide-17
SLIDE 17

Counterfactuals for video streaming

[Diagram: ABR 1, Deployment, Traces, Performance Evaluation; ABR 2, Performance Evaluation, offline trace-based execution]

slide-18
SLIDE 18

Evaluating video streaming systems using traces

[Diagram: ABR 1, Deployment, Traces, Performance Evaluation]

Trace format (t, s, d):
(0, 1Mb, 1s)
(1, 2Mb, 1s)
(2, 1Mb, 1s)
. . .

t: download start time of chunk
s: size of chunk
d: download time

slide-19
SLIDE 19

Evaluating video streaming systems using traces

[Diagram: ABR 1, Deployment, Traces, Performance Evaluation; ABR 2]

Trace format (t, s, d):
(0, 1Mb, 1s)
(1, 2Mb, 1s)
(2, 1Mb, 1s)
. . .

t: download start time of chunk
s: size of chunk
d: download time

slide-20
SLIDE 20

What can go wrong with using traces?

slide-21
SLIDE 21

What can go wrong with using traces?

  • Traces generated by adaptive algorithms can affect trace driven evaluation!
slide-22
SLIDE 22

What can go wrong with using traces?

slide-23
SLIDE 23

What can go wrong with using traces?

  • ABR-Probe probes bandwidth before downloading a chunk
  • Chooses bitrate to match the probed bandwidth
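
A minimal sketch of the bitrate choice ABR-Probe makes, for illustration only. The slide says just that the bitrate is chosen "to match" the probed bandwidth; reading that as "the highest bitrate that does not exceed the probe result" is an assumption, and the function name and the bitrate ladder are hypothetical.

```python
from typing import List


def choose_bitrate(probed_bandwidth_mbps: float, ladder_mbps: List[float]) -> float:
    """Pick the highest bitrate not exceeding the probed bandwidth.

    Assumption (not stated on the slide): if every bitrate exceeds the
    probed bandwidth, fall back to the lowest available bitrate.
    """
    feasible = [b for b in ladder_mbps if b <= probed_bandwidth_mbps]
    return max(feasible) if feasible else min(ladder_mbps)


# Hypothetical three-level bitrate ladder, in Mb/s.
ladder = [1.0, 2.0, 4.0]
print(choose_bitrate(2.5, ladder))
print(choose_bitrate(0.5, ladder))
```

The point to notice is that the chunk size recorded in the trace becomes a function of the bandwidth at that moment, which is what later makes the trace hard to reuse.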

slide-24
SLIDE 24

What can go wrong with using traces?

(t, s, d)
(0, 1Mb, 1s)
(3.2, 2Mb, 1s)
(4.6, 1Mb, 1s)
. . .

slide-25
SLIDE 25

What can go wrong with using traces?

(t, s, d)
(0, 1Mb, 1s)
(3.2, 2Mb, 1s)
(4.6, 1Mb, 1s)
. . .

slide-26
SLIDE 26

The issue of confounders

Confounders induce dependencies in the data that are often unaccounted for.

slide-27
SLIDE 27

The issue of confounders

Confounders induce dependencies in the data that are often unaccounted for. This can affect the accuracy of trace based execution.

slide-28
SLIDE 28

The issue of confounders

Confounders induce dependencies in the data that are often unaccounted for. This can affect the accuracy of trace based execution.

Causal Graph
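
A toy illustration of the dependency a confounder induces, under made-up numbers that are not from the talk. The hidden bandwidth drives both the chunk size the adaptive algorithm picks and the download time, so the trace suggests that large chunks download quickly. The two bandwidth levels, the bandwidth sequence, and the one-second chunk duration are all assumptions.

```python
from statistics import mean

# Hypothetical two-level bandwidth, in Mb/s. The per-chunk bandwidth is the
# latent confounder: it is never written to the trace.
B_HIGH, B_LOW = 2.0, 1.0
hidden_bandwidth = [B_LOW, B_HIGH, B_LOW, B_HIGH, B_LOW, B_LOW]

# Trace production with an adaptive algorithm: each 1-second chunk is sized
# to match the current bandwidth, so every download takes exactly 1 second.
trace = []
for bw in hidden_bandwidth:
    size = bw * 1.0                  # chunk size in Mb, chosen from the bandwidth
    trace.append((size, size / bw))  # (size, download time)

# Naive reading of the trace: "2 Mb chunks take this long to download".
naive = mean(d for s, d in trace if s == 2.0)

# Ground truth for the counterfactual "always download 2 Mb chunks".
truth = mean(2.0 / bw for bw in hidden_bandwidth)

print(naive, truth)
```

In this toy setup the naive estimate is 1 second per chunk while the true average is about 1.67 seconds, because 2 Mb chunks were only ever observed when the bandwidth happened to be high.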

slide-29
SLIDE 29

Existing approaches to deal with confounders

  • Randomized Controlled Trials (RCTs)

○ Choose the bitrates at random so that the bandwidth doesn’t affect them
○ RCTs don’t work here: trace collection is impractical, and there are other data dependencies

slide-30
SLIDE 30

Existing approaches to deal with confounders

  • Randomized Controlled Trials (RCTs)

○ Choose the bitrates at random so that the bandwidth doesn’t affect them
○ RCTs don’t work here: trace collection is impractical, and there are other data dependencies

  • Observational Studies (Matching on confounders)

○ Find data in the original trace that matches what you’d like to estimate in your new system, and use that as a measurement
○ Do not account for latent confounders [1][2]

[1] S. Shunmuga Krishnan and Ramesh K. Sitaraman. 2012. Video stream quality impacts viewer behavior: inferring causality using quasi-experimental designs. In Proceedings of the 2012 Internet Measurement Conference (IMC ’12).
[2] Detecting network neutrality violations with causal inference. In Proceedings of the 5th International Conference on Emerging Networking Experiments and Technologies.

slide-31
SLIDE 31

What if you could account for latent confounders?

  • We conducted a case study on the simplest scenario that illustrated the problem

slide-32
SLIDE 32

Illustrative Case Study

  • Create trace by downloading a video using ABR-Probe
  • Use trace to evaluate performance of second bitrate sequence

slide-33
SLIDE 33

Illustrative Case Study

  • Create trace by downloading a video using ABR-Probe
  • Use trace to evaluate performance of second bitrate sequence
  • Assumptions:

○ 𝜾: session phase, hidden
○ 𝜔: chunk start phase, hidden
○ Bh, Bl, T are known

slide-34
SLIDE 34

Our Approach

  • Construct causal graph for trace production process
  • Infer hidden confounders from the data

slide-35
SLIDE 35

Our Approach

  • Construct causal graph for trace production process
  • Infer hidden confounders from the data
  • Use trace with inferred confounders to evaluate performance of second sequence

slide-36
SLIDE 36

Our Approach

  • Key Idea:

○ Infer the chunk phase explicitly from the data

  • Use Maximum A Posteriori estimation

○ All of the details are in the paper

[Causal graph nodes: Chunk size, Download Time, Chunk Phase (𝜔)]
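
A rough sketch of what MAP estimation of a chunk phase could look like. The slides give no model details and defer to the paper, so everything here is an assumption: bandwidth is taken to be a square wave that delivers Bh for the first half of each period T and Bl for the second half, observation noise is Gaussian, and the phase is searched on a grid. The names `download_time` and `map_chunk_phase` are hypothetical and this is not the authors' estimator.

```python
import math
from typing import Callable, Optional


def download_time(size_mb: float, omega: float,
                  b_high: float, b_low: float, period: float) -> float:
    """Time to download size_mb when starting at phase omega of the
    assumed square-wave bandwidth (b_high, then b_low, each for period/2)."""
    half = period / 2.0
    remaining, elapsed, phase = size_mb, 0.0, omega % period
    while True:
        in_high = phase < half
        rate = b_high if in_high else b_low
        segment = (half - phase) if in_high else (period - phase)
        if rate * segment >= remaining:
            return elapsed + remaining / rate
        remaining -= rate * segment
        elapsed += segment
        phase = half if in_high else 0.0  # jump to the start of the next half


def map_chunk_phase(size_mb: float, observed_d: float,
                    b_high: float, b_low: float, period: float,
                    sigma: float = 0.05, grid: int = 200,
                    log_prior: Optional[Callable[[float], float]] = None) -> float:
    """Grid-search MAP estimate of the chunk start phase.

    Score = Gaussian log-likelihood of the observed download time plus the
    log-prior. With no prior this reduces to maximum likelihood.
    """
    best_omega, best_score = 0.0, -math.inf
    for k in range(grid):
        omega = period * k / grid
        predicted = download_time(size_mb, omega, b_high, b_low, period)
        score = -0.5 * ((observed_d - predicted) / sigma) ** 2
        if log_prior is not None:
            score += log_prior(omega)
        if score > best_score:
            best_omega, best_score = omega, score
    return best_omega


# Hypothetical numbers: Bh = 2 Mb/s, Bl = 1 Mb/s, T = 4 s, a 2 Mb chunk that
# took 1.5 s to download.
omega_hat = map_chunk_phase(2.0, 1.5, 2.0, 1.0, 4.0)
print(omega_hat, download_time(2.0, omega_hat, 2.0, 1.0, 4.0))
```

Under this assumed model a single download time can be consistent with more than one phase, so the prior does real work. Passing a `log_prior` that encodes what is already believed about the phase decides which candidate the search returns.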

slide-37
SLIDE 37

Evaluation

  • Trace Production: ABR, Randomized bitrates
slide-38
SLIDE 38

Evaluation

  • Trace Production: ABR, Randomized bitrates
  • Trace based evaluation

○ Calculate download times of new sequence of bitrates using only the trace as input, with different methods

slide-39
SLIDE 39

Evaluation

  • Trace Production: ABR, Randomized bitrates
  • Trace based evaluation

○ Calculate download times of new sequence of bitrates using only the trace as input, with different methods
○ Evaluation metric: Error in download time calculation from trace vs ground truth deployment
○ How accurate was it in answering the counterfactual compared with ground truth?
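
One way the evaluation metric could be computed, as a sketch. The slides do not define the error formula. Treating it as a per-chunk relative error summarized by the median is an assumption, loosely suggested by the median percentage reported in the takeaways. The function names are hypothetical.

```python
from statistics import median
from typing import List


def relative_errors(estimated: List[float], ground_truth: List[float]) -> List[float]:
    """Per-chunk relative error |estimate - truth| / truth.

    Assumes both lists are aligned chunk by chunk and that every
    ground-truth download time is positive.
    """
    return [abs(e - g) / g for e, g in zip(estimated, ground_truth)]


def median_relative_error(estimated: List[float], ground_truth: List[float]) -> float:
    """Median of the per-chunk relative errors."""
    return median(relative_errors(estimated, ground_truth))


# Made-up download times in seconds, for illustration only.
print(median_relative_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```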

slide-40
SLIDE 40

Evaluation

  • Trace based evaluation methods

○ Direct Emulation: use observed throughput from trace as bandwidth model
○ Match - No Latent: match on measured features only (bitrate)
○ Match - Latent: our method, match on bitrate and inferred chunk phase
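
The first two baselines can be sketched as follows. This is a simplified reading of the one-line descriptions above, not the evaluated implementation. Replaying chunk by chunk with the throughput of the same-index trace record, and matching on exact chunk size as a stand-in for bitrate, are both assumptions. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Record = Tuple[float, float, float]  # (t, s, d): start time, size in Mb, download time in s


def direct_emulation(trace: List[Record], new_sizes: List[float]) -> List[float]:
    """Treat the observed throughput s/d of the i-th trace chunk as the
    bandwidth available to the i-th chunk of the new sequence."""
    return [new_s / (s / d) for (_, s, d), new_s in zip(trace, new_sizes)]


def match_no_latent(trace: List[Record], new_sizes: List[float]) -> List[Optional[float]]:
    """For each new chunk, average the download times of trace chunks with
    the same size. Returns None where the trace has no matching chunk."""
    by_size: Dict[float, List[float]] = {}
    for _, s, d in trace:
        by_size.setdefault(s, []).append(d)
    return [mean(by_size[s]) if s in by_size else None for s in new_sizes]


# The example trace from the earlier slide, replayed with 2 Mb chunks throughout.
trace = [(0.0, 1.0, 1.0), (1.0, 2.0, 1.0), (2.0, 1.0, 1.0)]
new_sizes = [2.0, 2.0, 2.0]
print(direct_emulation(trace, new_sizes))
print(match_no_latent(trace, new_sizes))
```

On this trace the two methods disagree: direct emulation stretches the download time wherever the observed throughput was low, while matching reuses the single observed 2 Mb download for every chunk. Neither uses the hidden chunk phase, which is the extra feature the Match - Latent method adds.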

slide-41
SLIDE 41

Takeaways

Trace Production: ABR-Probe

slide-42
SLIDE 42

Takeaways

  • Direct Emulation based on the observed throughputs is not accurate for evaluation (median error ~18%)

Trace Production: ABR-Probe

slide-43
SLIDE 43

Takeaways

  • Direct Emulation based on the observed throughputs is not accurate for evaluation (median error ~18%)
  • Performing matching without accounting for confounders can be even worse

Trace Production: ABR-Probe

slide-44
SLIDE 44

Takeaways

  • Direct Emulation based on the observed throughputs is not accurate for evaluation (median error ~18%)
  • Performing matching without accounting for confounders can be even worse
  • Matching on latent confounders is the most accurate

Trace Production: ABR-Probe

slide-45
SLIDE 45

Using RCTs

  • Similar results
  • Match-Latent is optimal

Trace Production: Randomized bitrates

slide-46
SLIDE 46

Conclusions and Future Directions

  • First step towards answering counterfactual questions with video streaming systems

○ Key challenge: True bandwidth process is not available - latent confounders

slide-47
SLIDE 47

Conclusions and Future Directions

  • First step towards answering counterfactual questions with video streaming systems

○ Key challenge: True bandwidth process is not available - latent confounders

  • Preliminary approach to deal with latent confounders

○ RCTs and matching techniques insufficient without considering latent confounders

slide-48
SLIDE 48

Conclusions and Future Directions

  • First step towards answering counterfactual questions with video streaming systems

○ Key challenge: True bandwidth process is not available - latent confounders

  • Preliminary approach to deal with latent confounders

○ RCTs and matching techniques insufficient without considering latent confounders

  • Challenges and Future Directions:

○ Generalization towards richer bandwidth processes, what this means for more complex scenarios