Bimodal Multicast
And Cache Invalidation
Who/What/Where
Bruce Spang
Software Engineer/Student
Fastly
Powderhorn
Outline
Frame the problem
The papers we looked at
Bimodal Multicast
What we built
Content Delivery
We would like to be able to update a piece of content globally
Notify all servers to remove a piece of content
Central Service.
Message
Ack
Message
Ack
Works
Cache servers can send purges themselves
Message
Ack
Every server sends an ack to the sender
This is hard.
Send a message to a set of servers
“Try very hard” to deliver a message
Each server periodically sends other servers a digest of the messages it knows about
If a server sees that it is missing messages, it requests that they be retransmitted
Message
Ack
Message
Resend
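The digest-and-resend exchange above can be sketched in a few lines of Python. This is a minimal illustration, not the system described in the talk: the `Server` class, its method names, `gossip_round`, and the purge payload strings are all hypothetical, and real gossip would pick peers at random and run over the network.

```python
class Server:
    """Hypothetical cache server holding purge messages keyed by ID."""

    def __init__(self, name):
        self.name = name
        self.messages = {}  # message ID -> payload

    def receive(self, msg_id, payload):
        # Deliver a message; receiving a duplicate is harmless.
        self.messages.setdefault(msg_id, payload)

    def digest(self):
        # Summary of every message ID this server knows about.
        return set(self.messages)

    def missing_from(self, peer_digest):
        # IDs the peer advertised that this server has not seen.
        return sorted(peer_digest - set(self.messages))

    def retransmit(self, ids):
        # Answer a resend request with the payloads still held.
        return {i: self.messages[i] for i in ids if i in self.messages}


def gossip_round(sender, receiver):
    """Sender advertises its digest; receiver requests what it lacks."""
    wanted = receiver.missing_from(sender.digest())
    for msg_id, payload in sender.retransmit(wanted).items():
        receiver.receive(msg_id, payload)
    return wanted


a, b = Server("a"), Server("b")
for i in (1, 2, 3):
    a.receive(i, f"purge /page/{i}")
b.receive(1, "purge /page/1")  # b lost messages 2 and 3
recovered = gossip_round(a, b)
```

After the round, `b` holds the same three messages as `a`, and a second round between them requests nothing.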
A server that’s behind will recover from many servers in the cluster.
Don’t DDoS servers that are trying to recover.
Ignore servers that are running behind.
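One way to read the two rules above is as limits on how a server answers resend requests: cap the number of retransmissions per round, and drop requests for messages that are too old to be worth recovering. The sketch below assumes that reading. The function name, the `max_age` and `budget` parameters, their values, and the newest-first ordering are illustrative assumptions, not details taken from the talk.

```python
def select_retransmissions(requested, current_round, sent_round,
                           max_age=10, budget=3):
    """Pick which requested message IDs to resend this round.

    requested     -- message IDs a peer asked for
    current_round -- number of the current gossip round
    sent_round    -- message ID -> round in which it was first sent
    max_age       -- assumed cutoff: older messages are ignored
    budget        -- assumed cap on retransmissions per round
    """
    fresh = [
        m for m in requested
        if m in sent_round and current_round - sent_round[m] <= max_age
    ]
    # Serve the newest messages first so a lagging peer cannot
    # consume the whole budget with old requests.
    fresh.sort(key=lambda m: sent_round[m], reverse=True)
    return fresh[:budget]


sent_round = {1: 0, 2: 5, 3: 9, 4: 10, 5: 11}
chosen = select_retransmissions([1, 2, 3, 4, 5, 6], 12, sent_round)
```

Here message 1 is too old, message 6 is unknown, and the budget of three leaves message 2 for a later round.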
Bimodal Multicast in the Wild
All the logic is in failure handling.
“I have messages {1,2,3,4,5,6,7,8,9,…}”
We see high packet loss and network partitions all the time.
List of Message IDs
Doesn’t Have to be a List
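As an illustration of a digest that is not a plain list, contiguous message IDs can be collapsed into ranges. The talk does not say which representation was used, so this is only one possible encoding; `to_ranges` and `contains` are hypothetical helper names.

```python
def to_ranges(ids):
    """Collapse message IDs into sorted, inclusive (start, end) ranges."""
    ranges = []
    for i in sorted(set(ids)):
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1][1] = i       # extend the current run
        else:
            ranges.append([i, i])   # start a new run
    return [tuple(r) for r in ranges]


def contains(ranges, msg_id):
    """Check whether a range digest covers a given message ID."""
    return any(lo <= msg_id <= hi for lo, hi in ranges)


digest = to_ranges([1, 2, 3, 4, 5, 6, 7, 8, 9, 12])
```

With few gaps, the digest size depends on the number of gaps rather than the number of messages: the ten IDs above become two ranges.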
[Figure: Density plot and 95th percentile of purge latency by server location. Panels: New York, London, San Jose, Tokyo. x-axis: latency (ms); y-axis: density.]
[Figure: Purge performance under network partition. Panels: throughput (messages/s) and 95th percentile latency (s) over time, 02:30 to 04:00. Series: cache servers A, B, C, D.]
[Figure: Purge performance. Panels: throughput (messages/s) and recovered purges (messages/s) over time, 06:00 to 06:30. Series: cache servers NYC, London.]
[Figure: Purge performance. Panels: throughput (messages/s) and recovered purges (messages/s) over time, 16:30 to 18:00. Series: affected and unaffected cache servers.]
[Figure: Purge performance under denial-of-service attack. Panels: throughput (messages/s) and 95th percentile latency (s) over time, 23:40 to 00:20. Series: victim and unaffected cache servers.]
We generally don’t have to worry about purging failing, even when the network does.
brucespang.com/bimodal
www.fastly.com/about/jobs
[Figure: Purge performance with linear probing hash-table. x-axis: date, May 07 to May 13; y-axis: 95th percentile latency (ms).]