How we scaled push messaging for millions of Netflix devices


slide-1
SLIDE 1

How we scaled push messaging for millions of Netflix devices

Susheel Aroskar, Cloud Gateway

slide-2
SLIDE 2

Why do we need push?

slide-3
SLIDE 3
slide-4
SLIDE 4

How I spend my time in Netflix application...

slide-5
SLIDE 5
  • What is push?
slide-6
SLIDE 6
  • What is push?
  • How you can build it
slide-7
SLIDE 7
  • What is push?
  • How you can build it
  • How you can operate it
slide-8
SLIDE 8
  • What is push?
  • How you can build it
  • How you can operate it
  • What can you do with it
slide-9
SLIDE 9

Susheel Aroskar

Senior Software Engineer, Cloud Gateway
saroskar@netflix.com
github.com/raksoras
@susheelaroskar

slide-10
SLIDE 10

PERSIST UNTIL SOMETHING HAPPENS

slide-11
SLIDE 11

PERSIST UNTIL SOMETHING HAPPENS

slide-12
SLIDE 12

Zuul Push Architecture

slide-13
SLIDE 13

Zuul Push Servers

slide-14
SLIDE 14

Zuul Push Servers WebSockets / SSE

slide-15
SLIDE 15

Push Registry Zuul Push Servers Register User WebSockets / SSE

slide-16
SLIDE 16

Push Registry Zuul Push Servers Register User WebSockets / SSE

slide-17
SLIDE 17

Push Library Push Registry Zuul Push Servers Register User WebSockets / SSE

slide-18
SLIDE 18

Push Library Push Message Queue Push Registry Zuul Push Servers Register User WebSockets / SSE

slide-19
SLIDE 19

Message Processor Push Library Push Message Queue Push Registry Zuul Push Servers Register User WebSockets / SSE

slide-20
SLIDE 20

Message Processor Push Library Push Message Queue Push Registry Zuul Push Servers Register User WebSockets / SSE

slide-21
SLIDE 21

Message Processor Push Library Push Message Queue Push Registry Zuul Push Servers Register User Lookup server WebSockets / SSE

slide-22
SLIDE 22

Message Processor Push Library Push Message Queue Push Registry Zuul Push Servers Register User Lookup server Deliver message WebSockets / SSE
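The register / lookup / deliver flow in the diagram can be sketched in a few lines. This is a minimal in-memory model, not Zuul Push code: the class `PushFlowSketch`, its methods, and the `transport` callback are hypothetical names chosen for illustration, and a plain map stands in for the Push Registry.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical in-memory model of the register / lookup / deliver flow. */
final class PushFlowSketch {
    // Stand-in for the Push Registry: user id -> id of the push server holding the connection.
    private final Map<String, String> registry = new ConcurrentHashMap<>();
    // Stand-in for the hop from message processor to push server: (serverId, message).
    private final BiConsumer<String, String> transport;

    PushFlowSketch(BiConsumer<String, String> transport) {
        this.transport = transport;
    }

    /** "Register User": called by a push server when a client connects. */
    void registerUser(String userId, String serverId) {
        registry.put(userId, serverId);
    }

    /**
     * "Lookup server" followed by "Deliver message": called by a message processor
     * for each queued message. Returns false when the user has no registered connection.
     */
    boolean process(String userId, String message) {
        Optional<String> server = Optional.ofNullable(registry.get(userId));
        server.ifPresent(serverId -> transport.accept(serverId, message));
        return server.isPresent();
    }
}
```

A message for a user who is not in the registry is simply dropped in this sketch; what a real deployment does in that case is a policy decision the slide does not show.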

slide-23
SLIDE 23

Handling millions of persistent connections

Zuul Push server

slide-24
SLIDE 24

C10K challenge

slide-25
SLIDE 25

Thread per Connection: Socket ↔ Thread-1 (Read, Write); Socket ↔ Thread-2 (Read, Write)

slide-26
SLIDE 26

Thread per Connection: Socket ↔ Thread-1 (Read, Write); Socket ↔ Thread-2 (Read, Write)
Async I/O: Single Thread ↔ Socket (read callback, write callback); Socket (read callback, write callback)
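To make the async I/O side of the comparison concrete, here is a minimal sketch of one thread serving many sockets with the JDK's `java.nio` `Selector`, the kind of non-blocking primitive that Netty builds on. The class name `SingleThreadEchoServer` is hypothetical, and the echo behaviour is only a placeholder for real read and write callbacks.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/** Hypothetical single-threaded echo server: one selector loop serves every socket. */
final class SingleThreadEchoServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SingleThreadEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0 = any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) { // a new connection is waiting
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) { // the "read callback"
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        if (client.read(buf) < 0) { // peer closed the connection
                            client.close();
                            continue;
                        }
                        buf.flip();
                        // Simplification: spins if the send buffer is full; fine for small echoes.
                        while (buf.hasRemaining()) {
                            client.write(buf);
                        }
                    }
                }
            }
        } catch (IOException | IllegalStateException e) {
            // Selector closed or an I/O error occurred: the loop ends (sketch-level handling).
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        selector.close();
    }
}
```

No thread is parked per connection here; an idle socket costs only its registration with the selector.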

slide-27
SLIDE 27

Netty

Channel Pipeline: Socket ↔ Head ↔ Channel Inbound Handlers / Channel Outbound Handlers ↔ Tail

slide-28
SLIDE 28

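protected void addPushHandlers(ChannelPipeline pl) {
    // Constructor arguments are omitted, as on the slide. In Netty,
    // HttpObjectAggregator takes a maximum content length and
    // WebSocketServerProtocolHandler takes the WebSocket path.
    pl.addLast(new HttpServerCodec());                    // decode HTTP requests, encode responses
    pl.addLast(new HttpObjectAggregator());               // assemble chunked HTTP messages
    pl.addLast(getPushAuthHandler());                     // custom authentication
    pl.addLast(new WebSocketServerCompressionHandler());  // WebSocket compression extension
    pl.addLast(new WebSocketServerProtocolHandler());     // WebSocket handshake and control frames
    pl.addLast(getPushRegistrationHandler());             // register the new connection
}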

slide-29
SLIDE 29

Authenticate by Cookies, JWT or any other custom scheme

Plug in your custom authentication policy

slide-30
SLIDE 30

Tracking clients’ connection metadata in real time

Push Registry

slide-31
SLIDE 31

public class MyRegistration extends PushRegistrationHandler {
    @Override
    protected void registerClient(
            ChannelHandlerContext ctx,
            PushUserAuth auth,
            PushConnection conn,
            PushConnectionRegistry registry) {
        // Register the connection with the server's own connection registry first.
        super.registerClient(ctx, auth, conn, registry);
        // Then store the registration in the external push registry (Redis here).
        ctx.executor().submit(() -> storeInRedis(auth));
    }
}

slide-32
SLIDE 32

Push registry features checklist

slide-33
SLIDE 33
  • Low read latency

Push registry features checklist

slide-34
SLIDE 34
  • Low read latency
  • Record expiry

Push registry features checklist

slide-35
SLIDE 35
  • Low read latency
  • Record expiry
  • Sharding

Push registry features checklist

slide-36
SLIDE 36
  • Low read latency
  • Record expiry
  • Sharding
  • Replication

Push registry features checklist
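Two of the checklist items, record expiry and sharding, can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ShardedExpiringRegistry` and its methods are hypothetical names, shards are plain maps in one process, time is passed in explicitly to keep the example deterministic, and replication is not modelled at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory registry illustrating record expiry (TTL) and sharding. */
final class ShardedExpiringRegistry {
    private static final class Entry {
        final String serverId;
        final long expiresAtMillis;

        Entry(String serverId, long expiresAtMillis) {
            this.serverId = serverId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final List<Map<String, Entry>> shards = new ArrayList<>();
    private final long ttlMillis;

    ShardedExpiringRegistry(int shardCount, long ttlMillis) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ConcurrentHashMap<>());
        }
        this.ttlMillis = ttlMillis;
    }

    // Sharding: pick a shard from the key's hash; floorMod keeps the index non-negative.
    private Map<String, Entry> shardFor(String userId) {
        return shards.get(Math.floorMod(userId.hashCode(), shards.size()));
    }

    void register(String userId, String serverId, long nowMillis) {
        shardFor(userId).put(userId, new Entry(serverId, nowMillis + ttlMillis));
    }

    // Record expiry: a stale record is treated as absent and removed on read.
    Optional<String> lookup(String userId, long nowMillis) {
        Map<String, Entry> shard = shardFor(userId);
        Entry entry = shard.get(userId);
        if (entry == null) {
            return Optional.empty();
        }
        if (entry.expiresAtMillis <= nowMillis) {
            shard.remove(userId, entry);
            return Optional.empty();
        }
        return Optional.of(entry.serverId);
    }
}
```

Expiry matters because a client or server can disappear without deregistering cleanly; with a TTL, such stale records eventually stop being returned by lookups.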

slide-37
SLIDE 37
slide-38
SLIDE 38

What we use

https://github.com/Netflix/dynomite

Dynomite = Redis + Auto-sharding + Read/Write quorum + Cross-region replication

slide-39
SLIDE 39

Message Processing

Queue, Route, Deliver

slide-40
SLIDE 40

We use Kafka message queues to decouple message senders from receivers

slide-41
SLIDE 41

Fire and Forget

slide-42
SLIDE 42

Cross-region Replication

slide-43
SLIDE 43

Different queues for different priorities

slide-44
SLIDE 44

We run multiple message processor instances in parallel to scale our message processing throughput.
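The queueing ideas from the last few slides (fire and forget, separate queues per priority, several processors in parallel) can be sketched with in-memory queues. The talk uses Kafka for this; the class below is a hypothetical stand-in that uses only the JDK, and the names `PriorityQueuesSketch`, `send`, `poll`, and `drain` are illustrative.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for per-priority message queues. */
final class PriorityQueuesSketch {
    private final Queue<String> high = new ConcurrentLinkedQueue<>();
    private final Queue<String> low = new ConcurrentLinkedQueue<>();

    /** Fire and forget: the sender enqueues and returns without waiting for delivery. */
    void send(String message, boolean highPriority) {
        (highPriority ? high : low).add(message);
    }

    /** Next message to process: the high-priority queue is always served first. */
    String poll() {
        String message = high.poll();
        return message != null ? message : low.poll();
    }

    /** Runs `workers` processors in parallel until both queues are empty. */
    void drain(int workers, Consumer<String> deliver) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                for (String m = poll(); m != null; m = poll()) {
                    deliver.accept(m);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Keeping priorities in separate queues means a backlog of low-priority messages cannot delay a high-priority one, which a single FIFO queue would not guarantee.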

slide-45
SLIDE 45

Operating Zuul Push

Different than REST of them

slide-46
SLIDE 46

Persistent connections make Zuul Push server stateful

Long-lived stable connections

slide-47
SLIDE 47

Persistent connections make Zuul Push server stateful

Long-lived stable connections
  ○ Great for client efficiency

slide-48
SLIDE 48

Persistent connections make Zuul Push server stateful

Long-lived stable connections
  ○ Great for client efficiency
  ○ Terrible for quick deploy/rollback

slide-49
SLIDE 49

If you love your clients set them free...

Tear down connections periodically

slide-50
SLIDE 50

Randomize each connection’s lifetime

slide-51
SLIDE 51

Effect of randomizing connection lifetime on reconnect peaks (plot: # reconnects over time)
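Randomizing each connection's lifetime can be as simple as adding jitter around a base value, so that connections opened at the same moment do not all reconnect at the same moment. The sketch below is illustrative only: `ConnectionLifetime` is a hypothetical helper, and the base and jitter values in the usage note are assumptions, not figures from the talk.

```java
import java.util.Random;

/** Hypothetical helper that picks a randomized lifetime for each new connection. */
final class ConnectionLifetime {
    /**
     * Returns a lifetime in whole seconds, uniformly distributed over
     * [baseSeconds - jitterSeconds, baseSeconds + jitterSeconds].
     */
    static long randomLifetimeSeconds(long baseSeconds, long jitterSeconds, Random rng) {
        // nextDouble() is in [0, 1), so the scaled value truncates to 0..2*jitter inclusive.
        long offset = (long) (rng.nextDouble() * (2 * jitterSeconds + 1)) - jitterSeconds;
        return baseSeconds + offset;
    }
}
```

For example, `randomLifetimeSeconds(1800, 300, new Random())` yields a lifetime between 25 and 35 minutes, which spreads reconnects from a burst of simultaneous connects over a ten-minute window.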

slide-52
SLIDE 52

Ask client to close its connection.

slide-53
SLIDE 53

Most connections are idle!

How to optimize push server

slide-54
SLIDE 54

BIG Server, tons of connections

ulimit -n 262144
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 87380 16777216"

slide-55
SLIDE 55
slide-56
SLIDE 56
slide-57
SLIDE 57

Goldilocks strategy

slide-58
SLIDE 58

Optimize for cost, NOT instance count

slide-59
SLIDE 59

How to auto-scale?

slide-60
SLIDE 60

How to auto-scale?

RPS? CPU??

slide-61
SLIDE 61

How to auto-scale?

RPS? CPU?? Open Connections
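Scaling on open connections reduces to a ceiling division: the fleet needs enough instances that none exceeds a target number of connections. The sketch below only shows that arithmetic; `ConnectionAutoScaler`, its parameters, and the numbers in the usage note are hypothetical, and a real auto-scaling policy would be configured in the cloud provider rather than computed like this.

```java
/** Hypothetical sizing rule: scale the fleet on open connections, not RPS or CPU. */
final class ConnectionAutoScaler {
    /**
     * Smallest instance count that keeps every instance at or below
     * targetPerInstance connections, never dropping below minInstances.
     */
    static int desiredInstances(long openConnections, long targetPerInstance, int minInstances) {
        if (targetPerInstance <= 0) {
            throw new IllegalArgumentException("targetPerInstance must be positive");
        }
        // Ceiling division without floating point.
        long needed = (openConnections + targetPerInstance - 1) / targetPerInstance;
        return (int) Math.max(minInstances, needed);
    }
}
```

With an assumed target of 10,000 connections per instance, 100,000 open connections need 10 instances and 100,001 need 11. The connection count is a useful signal here because most connections are idle, so RPS and CPU stay low even on a server that is full.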

slide-62
SLIDE 62

Amazon Elastic Load Balancers (Classic ELB with an HTTP listener) cannot proxy WebSockets.

slide-63
SLIDE 63

Solution - Run ELB as a TCP load balancer

OSI 7 network layers (conceptual)    HTTP over TCP/IP
7 Application                        HTTP
6 Presentation
5 Session
4 Transport                          TCP
3 Network                            IP
2 Data link                          Ethernet
1 Physical                           Ethernet

Layer 7: HTTP (WebSocket Upgrade Request)
Layer 4: TCP

slide-64
SLIDE 64

Managing push cluster - a quick recap

  • Recycle connections after tens of minutes
slide-65
SLIDE 65

Managing push cluster - a quick recap

  • Recycle connections after tens of minutes
  • Randomize each connection’s lifetime
slide-66
SLIDE 66

Managing push cluster - a quick recap

  • Recycle connections after tens of minutes
  • Randomize each connection’s lifetime
  • Many smaller servers >> a few BIG servers
slide-67
SLIDE 67

Managing push cluster - a quick recap

  • Recycle connections after tens of minutes
  • Randomize each connection’s lifetime
  • Many smaller servers >> a few BIG servers
  • Auto-scale on number of open connections per box
slide-68
SLIDE 68

Managing push cluster - a quick recap

  • Recycle connections after tens of minutes
  • Randomize each connection’s lifetime
  • Many smaller servers >> a few BIG servers
  • Auto-scale on number of open connections per box
  • WebSocket aware vs TCP load balancer
slide-69
SLIDE 69

If you build it, they will push

slide-70
SLIDE 70

On-demand diagnostics

slide-71
SLIDE 71

Remote recovery

slide-72
SLIDE 72

User messaging

slide-73
SLIDE 73

WHAT WILL YOU USE IT FOR?

slide-74
SLIDE 74

Call to action

slide-75
SLIDE 75

PULL!

slide-76
SLIDE 76

PULL!

https://github.com/Netflix/zuul

slide-77
SLIDE 77

In conclusion, push can make you

slide-78
SLIDE 78

In conclusion, push can make you rich (in functionality),

slide-79
SLIDE 79

In conclusion, push can make you rich (in functionality), thin (by getting rid of polling)

slide-80
SLIDE 80

In conclusion, push can make you rich (in functionality), thin (by getting rid of polling) and happy!

slide-81
SLIDE 81

Thank you.

slide-82
SLIDE 82

Questions?

Susheel Aroskar

Senior Software Engineer, Cloud Gateway
saroskar@netflix.com
github.com/raksoras
@susheelaroskar

slide-83
SLIDE 83

  • Rich, exciting Apps
  • More efficient systems
  • Easy to customize
  • Easy to operate

Zuul Push Battle tested