How we scaled push messaging for millions of Netflix devices
Susheel Aroskar Cloud Gateway
Why do we need push?
How I spend my time in Netflix application...
What is push?
How you can build it
Susheel Aroskar
Senior Software Engineer, Cloud Gateway
saroskar@netflix.com
github.com/raksoras
@susheelaroskar
Zuul Push architecture (diagram, built up over several slides)
Components: Zuul Push Servers, Push Registry, Push Library, Push Message Queue, Message Processor
○ Clients hold connections to the Zuul Push Servers over WebSockets / SSE
○ Zuul Push Servers record each connected user in the Push Registry (Register User)
○ Senders use the Push Library to put messages on the Push Message Queue
○ Message Processor reads the queue and finds the user's server in the Push Registry (Lookup server)
○ Message Processor hands the message to that Zuul Push Server (Deliver message)
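The register / lookup / deliver steps in the diagram can be sketched in a few lines. This is a minimal in-memory illustration, not the Zuul Push implementation: `PushRegistrySketch`, `MessageProcessorSketch`, and the `sendToServer` callback are hypothetical names, and a `ConcurrentHashMap` stands in for the real registry store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the Push Registry: user id -> push server address.
final class PushRegistrySketch {
    private final Map<String, String> serverByUser = new ConcurrentHashMap<>();

    // "Register User": called by a push server when a client connects.
    void register(String userId, String serverAddress) {
        serverByUser.put(userId, serverAddress);
    }

    // Remove the entry only if it still points at the given server, so a stale
    // disconnect from an old server cannot erase a newer registration.
    void unregister(String userId, String serverAddress) {
        serverByUser.remove(userId, serverAddress);
    }

    // "Lookup server": which push server holds this user's connection?
    Optional<String> lookup(String userId) {
        return Optional.ofNullable(serverByUser.get(userId));
    }
}

// Hypothetical stand-in for the Message Processor.
final class MessageProcessorSketch {
    private final PushRegistrySketch registry;
    private final BiConsumer<String, String> sendToServer; // (serverAddress, payload)

    MessageProcessorSketch(PushRegistrySketch registry,
                           BiConsumer<String, String> sendToServer) {
        this.registry = registry;
        this.sendToServer = sendToServer;
    }

    // "Deliver message": returns false when the user has no registered connection.
    boolean process(String userId, String payload) {
        Optional<String> server = registry.lookup(userId);
        if (!server.isPresent()) {
            return false;
        }
        sendToServer.accept(server.get(), payload);
        return true;
    }
}
```

In a real deployment the `sendToServer` callback would be a network call to the push server that owns the connection; here it is left to the caller so the routing logic stays visible.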
Handling millions of persistent connections
C10K challenge
Thread per Connection: each socket has its own thread (Thread-1, Thread-2) doing blocking Read and Write
Async I/O: a Single Thread serves many sockets through read callback and write callback
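To make the single-threaded async I/O model concrete, here is a small sketch using the JDK's `java.nio` `Selector`. It is only an illustration of the idea, not how Zuul Push or Netty is written: the class and method names are hypothetical, and in-process `Pipe` channels stand in for client sockets so the example runs without a network.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SingleThreadEventLoopSketch {

    // One thread watches every channel and reacts only when data is ready.
    static List<String> readAll(List<Pipe.SourceChannel> sources, int expected)
            throws IOException {
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < sources.size(); i++) {
                Pipe.SourceChannel ch = sources.get(i);
                ch.configureBlocking(false);
                // The attachment plays the role of per-connection state.
                ch.register(selector, SelectionKey.OP_READ, "conn-" + i);
            }
            int open = sources.size();
            ByteBuffer buf = ByteBuffer.allocate(256);
            while (open > 0 && received.size() < expected) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    buf.clear();
                    int n = ((Pipe.SourceChannel) key.channel()).read(buf);
                    if (n < 0) {          // peer closed: stop watching this channel
                        key.cancel();
                        open--;
                        continue;
                    }
                    buf.flip();
                    if (n > 0) {          // the "read callback" for this connection
                        received.add(key.attachment() + ":"
                                + StandardCharsets.UTF_8.decode(buf));
                    }
                }
            }
        }
        return received;
    }

    // Creates `connections` pipes, writes one message to each, and reads them
    // all back on the calling thread.
    static List<String> demo(int connections) {
        try {
            List<Pipe> pipes = new ArrayList<>();
            List<Pipe.SourceChannel> sources = new ArrayList<>();
            for (int i = 0; i < connections; i++) {
                Pipe p = Pipe.open();
                p.sink().write(ByteBuffer.wrap(
                        ("msg" + i).getBytes(StandardCharsets.UTF_8)));
                pipes.add(p);
                sources.add(p.source());
            }
            List<String> got = readAll(sources, connections);
            for (Pipe p : pipes) {
                p.sink().close();
                p.source().close();
            }
            return got;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `SingleThreadEventLoopSketch.demo(3)` reads one message from each of three channels without creating any extra threads. The order in which connections are served is not guaranteed.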
Channel Pipeline (diagram): Socket, then Head, then a chain of Channel Inbound Handlers and Channel Outbound Handlers, then Tail
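protected void addPushHandlers(ChannelPipeline pl) {
    // Constructor arguments are omitted here, as on the slide. In Netty,
    // HttpObjectAggregator needs a max content length and
    // WebSocketServerProtocolHandler needs a WebSocket path.
    pl.addLast(new HttpServerCodec());
    pl.addLast(new HttpObjectAggregator());
    pl.addLast(getPushAuthHandler());              // custom authentication
    pl.addLast(new WebSocketServerCompressionHandler());
    pl.addLast(new WebSocketServerProtocolHandler());
    pl.addLast(getPushRegistrationHandler());      // custom registration
}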
Authenticate by Cookies, JWT
Tracking clients' connection metadata in real time
public class MyRegistration extends PushRegistrationHandler {
    @Override
    protected void registerClient(
            ChannelHandlerContext ctx,
            PushUserAuth auth,
            PushConnection conn,
            PushConnectionRegistry registry) {
        // Let the base handler register the connection first.
        super.registerClient(ctx, auth, conn, registry);
        // Queue the external registry write as a task on the channel's executor.
        ctx.executor().submit(() -> storeInRedis(auth));
    }
}
https://github.com/Netflix/dynomite
Dynomite = Redis + Auto-sharding + Read/Write quorum + Cross-region replication
Queue, Route, Deliver
We use Kafka message queues to decouple message senders from receivers
Different than REST of them
Long lived stable connections
○ Great for client efficiency
○ Terrible for quick deploy/rollback
If you love your clients set them free...
Tear down connections periodically
Randomize each connection’s lifetime
Chart: effect of randomizing connection lifetime on reconnect peaks (x axis: Time, y axis: # reconnects)
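Randomizing each connection's lifetime can be as simple as adding uniform jitter to a base lifetime, so that connections opened at the same moment do not all expire together. The sketch below assumes that approach; `ConnectionLifetimeSketch`, `randomizedLifetime`, and the durations shown in the usage note are hypothetical and not taken from the talk.

```java
import java.time.Duration;
import java.util.Random;

final class ConnectionLifetimeSketch {

    // Returns base plus a uniformly random jitter in [0, maxJitter] milliseconds.
    static Duration randomizedLifetime(Duration base, Duration maxJitter, Random rng) {
        if (maxJitter.isNegative()) {
            throw new IllegalArgumentException("maxJitter must not be negative");
        }
        // nextDouble() is in [0, 1), so the product floors to 0..maxJitter inclusive.
        long jitterMs = (long) (rng.nextDouble() * (maxJitter.toMillis() + 1));
        return base.plusMillis(jitterMs);
    }
}
```

For example, `randomizedLifetime(Duration.ofMinutes(25), Duration.ofMinutes(5), new Random())` yields a lifetime somewhere between 25 and 30 minutes. In a Netty handler, the teardown could then be scheduled for that delay on the channel's executor.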
Ask client to close its connection.
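One common reason for asking the client to close is that, in TCP, the side that closes first is the one left holding the socket in TIME_WAIT. A sketch of the pattern follows, assuming the server sends an application-level close request and force-closes only if the client has not complied within a grace period. The `ClientConnectionSketch` interface, the `CLOSE_REQUESTED` message, and the grace period are all hypothetical.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal view of a client connection.
interface ClientConnectionSketch {
    void send(String message);
    boolean isOpen();
    void close();
}

final class GracefulCloseSketch {

    // Asks the client to close, then force-closes from the server side only
    // if the connection is still open once the grace period has elapsed.
    static ScheduledFuture<?> askClientToClose(ClientConnectionSketch conn,
                                               ScheduledExecutorService scheduler,
                                               long graceMillis) {
        conn.send("CLOSE_REQUESTED"); // hypothetical application-level message
        return scheduler.schedule(() -> {
            if (conn.isOpen()) {
                conn.close();         // fallback for clients that did not comply
            }
        }, graceMillis, TimeUnit.MILLISECONDS);
    }
}
```

A well-behaved client closes its side on receiving the request and reconnects, so the fallback close should rarely fire.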
Most connections are idle!
ulimit -n 262144
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 87380 16777216"
Optimize for cost, NOT instance count
RPS? CPU?? Open Connections
Solution - Run ELB as a TCP load balancer
OSI 7 network layers (conceptual)    HTTP over TCP/IP
7 Application                        HTTP
6 Presentation
5 Session
4 Transport                          TCP
3 Network                            IP
2 Data link                          Ethernet
1 Physical
Layer 7: HTTP (WebSocket Upgrade Request)
Layer 4: TCP
USE IT FOR?
Rich, exciting Apps More efficient systems Easy to customize Easy to