Steering the Edgecast CDN, Marcel Flores, 13 June 2018


slide-1
SLIDE 1

Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.

Steering the Edgecast CDN

Marcel Flores 13 June 2018

slide-2
SLIDE 2

The Edgecast CDN

slide-3
SLIDE 3

What does it do?

  • The CDN moves content closer to end users.
  • Reduces latency, increases capacity.

[Diagram: Clients, two PoPs, Customer Origin]

slide-4
SLIDE 4

How does it work?

  • Uses anycast for PoP selection.
  • Relies on BGP to get to the right PoP.
  • We get lots for free:
    ○ Network spreads load more-or-less automatically.
    ○ We can achieve failover by retracting announcements.

[Diagram: Client, The Internet ("BGP MAGIC"), two PoPs]

slide-5
SLIDE 5

Anycast Challenges

  • Overall, it can be unpredictable:
    ○ No information on latency or load.
  • What’s going to happen if we change announcements?
  • Can make many traditional Traffic Engineering/Management problems hard.

[Diagram: Client, The Internet ("BGP MAGIC?"), three PoPs]

slide-6
SLIDE 6

Pulling blocks can be destructive

  • Breaks TCP connections
    ○ Unpredictable behavior for a period of time
  • Long-running downloads may get ruined
  • But it can be even worse...

slide-7
SLIDE 7

Pulling blocks can be destructive

[Figure: Chicago, Dallas, New York]

slide-8
SLIDE 8

A smoother way...

[Diagram: CH1, CH2, CH1; "DNS Magic"]

slide-9
SLIDE 9

DNS To Different Anycast Announcements

  • Edgecast has a few networks.
  • Each has a (potentially) overlapping set of servers that it addresses.
    ○ Kind of like Microsoft’s FastRoute
  • Using DNS, we can steer clients to a particular network.
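
One way to picture DNS-based steering is as a weighted choice between anycast networks made at DNS answer time. The sketch below is an illustration only, not Edgecast's implementation: the network names "red" and "blue" are borrowed from later slides, the addresses come from the reserved documentation ranges, and `pick_network`, `answer`, and the idea of hashing the resolver address are all assumptions.

```python
import hashlib

# Hypothetical anycast addresses, one per network (reserved documentation ranges).
NETWORKS = {"red": "192.0.2.1", "blue": "198.51.100.1"}


def pick_network(resolver_ip: str, red_fraction: float) -> str:
    """Map a resolver to a network so roughly red_fraction of resolvers get 'red'.

    Hashing keeps the choice stable: the same resolver always lands in the
    same bucket, so raising red_fraction only moves resolvers one way.
    """
    digest = hashlib.sha256(resolver_ip.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "red" if bucket < red_fraction else "blue"


def answer(resolver_ip: str, red_fraction: float) -> str:
    """Return the address this resolver would be handed."""
    return NETWORKS[pick_network(resolver_ip, red_fraction)]
```

Within each network, anycast and BGP still choose the PoP; the DNS answer only decides which announcement the client follows.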

slide-10
SLIDE 10

Working with Humans

slide-11
SLIDE 11

In the old days...

  • An experienced human would look at signals:
    ○ Current load at PoPs
    ○ Available capacity at PoPs
  • The human would write and deploy a DNS rule to effect the desired traffic change.

slide-12
SLIDE 12

In the old days...

[Diagram: Try it, Measure, Deploy]

slide-13
SLIDE 13

Challenges with humans...

  • Humans make mistakes:
    ○ It’s hard to look at a lot of numbers at once.
    ○ As the CDN grows, this gets more serious.
  • Humans have to sleep sometimes!

slide-14
SLIDE 14

How do we make a robot do it?

slide-15
SLIDE 15

Heteractis

[Architecture diagram. Labels: DATA (Capacity, Network Information, Usage, ???), Decision Making, Action Manager, DNS, Voice, Chat-Ops]

slide-16
SLIDE 16

Wow that’s complicated.

slide-17
SLIDE 17

Heteractis

[Architecture diagram. Labels: DATA (Capacity, Network Information, Usage, ???), Decision Making, Action Manager, DNS, Voice, Chat-Ops]

slide-18
SLIDE 18

Data Models

  • Collect many sources of data.
    ○ Combine into meaningful representations.
    ○ Capacity, usage, the current state.
  • Keep those up to date.

slide-19
SLIDE 19

Decision Making Models

  • Considers a set of possible actions (i.e. DNS rules).
    ○ “For traffic A, send Z% to Red Network”
  • Asks: According to each model, what would happen if I did this?
    ○ Each model generates a score.

[Diagram: DATA, with two example model outputs]

Capacity Model: 1. Option A .96  2. Option B .85  3. Option C .74 ...
Complexity Model: 1. Option A .99  2. Option B .89  3. Option C .74 ...

slide-20
SLIDE 20

Decision Making

  • For each action...
    ○ Compute a weighted linear combination.
  • Rank all the actions by combined score.
  • Pick the action with the highest score.

[Diagram: Capacity Model (90%) and Complexity Model (10%) combine into the Final Score]
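
The weighted combination can be sketched in a few lines. The per-option scores below are the example values from the Decision Making Models slide. Reading the diagram as 90% for the Capacity Model and 10% for the Complexity Model is an assumption, and `combined_scores` and `best_action` are hypothetical names.

```python
# Example scores from the slides; the weight assignment is an assumption.
MODEL_SCORES = {
    "capacity": {"Option A": 0.96, "Option B": 0.85, "Option C": 0.74},
    "complexity": {"Option A": 0.99, "Option B": 0.89, "Option C": 0.74},
}
WEIGHTS = {"capacity": 0.9, "complexity": 0.1}


def combined_scores(model_scores, weights):
    """Weighted linear combination of every model's score, per action."""
    actions = next(iter(model_scores.values())).keys()
    return {
        action: sum(weights[m] * scores[action] for m, scores in model_scores.items())
        for action in actions
    }


def best_action(model_scores, weights):
    """Rank actions by combined score and return the top one with all scores."""
    combined = combined_scores(model_scores, weights)
    ranked = sorted(combined, key=combined.get, reverse=True)
    return ranked[0], combined


winner, scores = best_action(MODEL_SCORES, WEIGHTS)
```

With these inputs Option A wins, since 0.9 × 0.96 + 0.1 × 0.99 = 0.963, ahead of 0.854 for Option B and 0.74 for Option C.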

slide-21
SLIDE 21

Action Manager

  • Applies some of the delicate touch that a human would:
    ○ Smooths actions out over several minutes.
    ○ Prevents overlapping changes from firing at once.

[Diagram: Try it, Measure, Deploy]
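
A minimal sketch of those two behaviors, under stated assumptions: `ramp` splits one move into equal increments (for example, one per minute), and a hypothetical `ActionManager` class refuses to start a second change for traffic that already has one in flight. The talk does not describe the actual smoothing schedule or locking scheme, so both are illustrative.

```python
from dataclasses import dataclass, field


def ramp(current_pct: float, target_pct: float, steps: int) -> list:
    """Split one move into `steps` equal increments, ending at the target."""
    delta = (target_pct - current_pct) / steps
    return [round(current_pct + delta * i, 6) for i in range(1, steps + 1)]


@dataclass
class ActionManager:
    """Tracks which traffic classes currently have a change in progress."""

    in_flight: set = field(default_factory=set)

    def start(self, traffic: str) -> bool:
        """Return False instead of starting an overlapping change."""
        if traffic in self.in_flight:
            return False
        self.in_flight.add(traffic)
        return True

    def finish(self, traffic: str) -> None:
        """Mark the change as complete so new changes may start."""
        self.in_flight.discard(traffic)
```

For example, `ramp(0, 20, 4)` yields the intermediate targets 5, 10, 15, and 20 percent, each of which could be measured before the next step is taken.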

slide-22
SLIDE 22

Voice

  • Interacts with humans:
    ○ Human gating mechanism: as a deployment strategy, a human says what is OK.
  • Integrates with other systems:
    ○ Slack, chatops, etc.

slide-23
SLIDE 23

Making sure humans can understand.

  • Why did it do that?
    ○ Can a human validate that it was a good idea?
  • Can ask it questions:
    ○ What does Heteractis think about X?
    ○ Why did it just recommend Y?

slide-24
SLIDE 24

Why didn’t we get really fancy?

  • Humans need to feel good about why it’s making decisions.
  • Each decision:
    ○ Can be made based on current data alone.
      ■ Easy debugging
    ○ Can be recreated based on a snapshot.
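
The property described here, that a decision depends only on current data and can be recreated from a snapshot, amounts to making the decision a pure function of its input. The sketch below assumes a hypothetical snapshot layout (a dictionary with a `scores` field) and a hypothetical `decide` function; it shows the replay idea, not the system's real data format.

```python
import json


def decide(snapshot: dict) -> str:
    """Pure function: the chosen action depends only on the snapshot contents."""
    scores = snapshot["scores"]
    # Sorting first makes tie-breaking deterministic across replays.
    return max(sorted(scores), key=scores.get)


# Hypothetical snapshot; the field names are assumptions for illustration.
snapshot = {
    "taken_at": "hypothetical-timestamp",
    "scores": {"Option A": 0.963, "Option B": 0.854, "Option C": 0.74},
}

original = decide(snapshot)
saved = json.dumps(snapshot, sort_keys=True)  # what would be archived
replayed = decide(json.loads(saved))  # debugging later, from the archive
```

Because no hidden state or history is involved, replaying the archived snapshot reproduces the original decision exactly.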

slide-25
SLIDE 25

Score Models: Capacity

slide-26
SLIDE 26

Basic Idea:

  • For some proposed action, what would happen?
  • Low Utilization: High Score
  • High Utilization: Low Score

[Figure: two example outcomes, marked ✔ and X]

slide-27
SLIDE 27

Predicting Load

  • Because we are using anycast, it is not totally clear what will happen.
  • So how do we know what score to give it?

[Diagram: Client, The Internet ("BGP MAGIC?"), three PoPs]

slide-28
SLIDE 28

Predicting Load

  • Think about it like fluid:
    ○ Sum the total amount to move.
    ○ Distribute it evenly over destinations.
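
The fluid analogy can be written down directly. This is a sketch under assumptions: the PoP names, the load units, and the `spread_evenly` function are hypothetical, and the "even" split is taken literally as an equal share per destination PoP.

```python
def spread_evenly(load, sources, destinations, fraction):
    """Predict per-PoP load after moving `fraction` of the load at each source.

    The moved amounts are summed, then split equally across the destinations.
    """
    predicted = dict(load)
    moved = 0.0
    for pop in sources:
        amount = load[pop] * fraction
        predicted[pop] -= amount
        moved += amount
    share = moved / len(destinations)
    for pop in destinations:
        predicted[pop] = predicted.get(pop, 0.0) + share
    return predicted


# Hypothetical loads, in any consistent unit such as Gbps.
load = {"red-1": 40.0, "red-2": 20.0, "blue-1": 10.0, "blue-2": 30.0}
after = spread_evenly(load, ["red-1", "red-2"], ["blue-1", "blue-2"], 0.5)
```

Here half of the Red load (30 units in total) is moved, so each Blue PoP is predicted to gain 15 units, and the total load is unchanged.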

slide-29
SLIDE 29

Predicting Load

On the other hand:

  • If we know we already have X% at Blue, just scale it!

At the end: we have an estimated % utilization at each PoP.
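
One reading of "just scale it" is that when some share of the traffic is already on a network, its observed per-PoP distribution can be scaled to the proposed share. The sketch below follows that interpretation, which is an assumption; `scale_known_split`, `utilization`, the PoP names, and the capacities are all hypothetical.

```python
def scale_known_split(observed, current_pct, new_pct):
    """Scale the observed per-PoP load from the current share to the new share."""
    if current_pct <= 0:
        raise ValueError("scaling needs traffic already present on the network")
    factor = new_pct / current_pct
    return {pop: value * factor for pop, value in observed.items()}


def utilization(load, capacity):
    """Convert predicted load into percent utilization per PoP."""
    return {pop: 100.0 * load[pop] / capacity[pop] for pop in load}


# Hypothetical: 20% of the traffic is on Blue today and the proposal is 40%.
observed_blue = {"blue-1": 10.0, "blue-2": 30.0}
predicted = scale_known_split(observed_blue, current_pct=20.0, new_pct=40.0)
percent = utilization(predicted, {"blue-1": 100.0, "blue-2": 120.0})
```

Doubling the share doubles each PoP's load for this traffic, giving 20% and 50% utilization in this example.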

slide-30
SLIDE 30

Computing Scores

  • Use a logistic curve to smooth out the edges.
  • Take a harmonic mean of all PoPs to test for outliers below.
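
A sketch of how the two steps might fit together: each PoP's predicted utilization goes through a decreasing logistic curve, and the per-PoP scores are combined with a harmonic mean, which is pulled strongly toward its smallest inputs. The midpoint of 80% and the steepness of 0.2 are made-up parameters, and `logistic_score` and `capacity_score` are hypothetical names.

```python
import math
from statistics import harmonic_mean


def logistic_score(utilization_pct, midpoint=80.0, steepness=0.2):
    """Map utilization to (0, 1): near 1 when idle, 0.5 at the midpoint."""
    return 1.0 / (1.0 + math.exp(steepness * (utilization_pct - midpoint)))


def capacity_score(utilization_by_pop):
    """Combine per-PoP scores; one overloaded PoP drags the result down."""
    return harmonic_mean([logistic_score(u) for u in utilization_by_pop.values()])


healthy = capacity_score({"pop-1": 20.0, "pop-2": 25.0, "pop-3": 30.0})
one_hot = capacity_score({"pop-1": 20.0, "pop-2": 25.0, "pop-3": 99.0})
```

With an arithmetic mean, the single PoP at 99% would cost only about a third of the score; with the harmonic mean, it dominates the result.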

slide-31
SLIDE 31

Heteractis In Action

slide-32
SLIDE 32

[Plot: Customer X. Axes: Throughput vs. Hour]

slide-33
SLIDE 33

[Plot: Customer X + Daily Traffic. Axes: Throughput vs. Hour]

slide-34
SLIDE 34

[Plot: Heteractis’s Changes. Axes: % Moved vs. Hour]

slide-35
SLIDE 35

[Plot: Red Network PoPs. Axes: Throughput vs. Hour]

slide-36
SLIDE 36

[Plot: Blue Network PoPs. Axes: Throughput vs. Hour]

slide-37
SLIDE 37

[Plot: PoP Capacity (%). Axes: PoP Capacity (%) vs. Hour]

slide-38
SLIDE 38

Heteractis in Production

  • Heteractis has been live for almost 2 years.
    ○ Making automatic traffic moves nearly daily.
  • Moved from human-gated to fully automatic.
    ○ Built the humans’ confidence.
    ○ Widely used as a view into CDN health.
  • Significantly reduced manual human interactions.

slide-39
SLIDE 39

Heteractis

  • We built an automated system for managing traffic.
  • Implemented in a way that:
    ○ Provides visibility into the decision-making process.
    ○ Builds trust with humans.

slide-40
SLIDE 40

Thank you.
