Exception-Less System Calls for Event-Driven Servers, Livio Soares


slide-1
SLIDE 1

Exception-Less System Calls for Event-Driven Servers

Livio Soares and Michael Stumm University of Toronto

slide-2
SLIDE 2

Livio Soares | Exception-Less System Calls for Event-Driven Servers 2

Talk overview

➔ At OSDI'10: exception-less system calls

➔ Technique targeted at highly threaded servers
➔ Doubled performance of Apache

➔ Event-driven servers are popular

➔ Faster than threaded servers

We show that exception-less system calls make event-driven servers faster

➔ memcached speeds up by 25-35%

➔ nginx speeds up by 70-120%

slide-3
SLIDE 3

Event-driven server architectures

➔ Supports I/O concurrency with a single execution context

➔ Alternative to thread based architectures

➔ At a high-level:

    ➔ Divide program flow into non-blocking stages
    ➔ After each stage, register interest in event(s)
    ➔ Notification of event is asynchronous, driving the next stage in the program flow

➔ To avoid idle time, applications multiplex execution of multiple independent stages

slide-4
SLIDE 4

Example: simple network server

void server() {
    ...
    fd = accept();
    ...
    read(fd);
    ...
    write(fd);
    ...
    close(fd);
    ...
}

slide-5
SLIDE 5

Example: simple network server

void server() {
    ...
    fd = accept();
    ...
    read(fd);
    ...
    write(fd);
    ...
    close(fd);
    ...
}

Stages: S1, S2, S3, S4, S5

UNIX options: non-blocking I/O, poll(), select(), epoll(), async I/O
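The stage decomposition above can be sketched with poll(), one of the UNIX options listed. This is a minimal illustration, not code from the talk: the names `struct conn`, `run_stage`, and `event_loop` are hypothetical, only the read and write stages are modeled, and any pair of descriptors (for example pipes) stands in for a client socket.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_CONNS 16

/* Stages of one request (hypothetical names), mirroring read -> write -> close. */
enum stage { ST_READ, ST_WRITE, ST_CLOSED };

struct conn {
    int in_fd;          /* descriptor the request is read from */
    int out_fd;         /* descriptor the reply is written to */
    enum stage stage;   /* next stage to run for this connection */
    char buf[64];
    ssize_t len;
};

/* Run exactly one stage. The dispatcher only calls this after poll()
 * reported the descriptor ready, so the call is expected not to block. */
void run_stage(struct conn *c)
{
    switch (c->stage) {
    case ST_READ:
        c->len = read(c->in_fd, c->buf, sizeof c->buf);
        c->stage = (c->len > 0) ? ST_WRITE : ST_CLOSED;
        break;
    case ST_WRITE:
        if (write(c->out_fd, c->buf, (size_t)c->len) != c->len)
            c->len = -1;                /* short or failed write */
        c->stage = ST_CLOSED;
        break;
    case ST_CLOSED:
        break;
    }
}

/* Dispatcher: register interest for every live connection, wait for
 * events, then advance each ready connection by one stage.
 * Returns the number of stages executed, or -1 if poll() fails. */
int event_loop(struct conn *conns, int n)
{
    int executed = 0;
    assert(n <= MAX_CONNS);
    for (;;) {
        struct pollfd pfds[MAX_CONNS];
        int active = 0;
        for (int i = 0; i < n; i++) {
            pfds[i].fd = -1;            /* negative fds are ignored by poll() */
            pfds[i].events = 0;
            pfds[i].revents = 0;
            if (conns[i].stage == ST_READ) {
                pfds[i].fd = conns[i].in_fd;
                pfds[i].events = POLLIN;
                active++;
            } else if (conns[i].stage == ST_WRITE) {
                pfds[i].fd = conns[i].out_fd;
                pfds[i].events = POLLOUT;
                active++;
            }
        }
        if (active == 0)
            return executed;
        if (poll(pfds, (nfds_t)n, -1) < 0)
            return -1;
        for (int i = 0; i < n; i++) {
            if (pfds[i].revents) {
                run_stage(&conns[i]);
                executed++;
            }
        }
    }
}
```

Calling `event_loop` on an array of connections interleaves their stages in a single execution context. Every stage transition still costs a poll() call plus the read or write itself, which is the application/kernel communication the talk goes on to criticize.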

slide-6
SLIDE 6

Performance: events vs. threads

[Figure: ApacheBench results, requests/sec. versus concurrency, for nginx (events) and Apache (threads)]

nginx delivers 1.7x the throughput of Apache; gracefully copes with high loads

slide-7
SLIDE 7

Issues with UNIX event primitives

➔ Do not cover all system calls

➔ Mostly work with file-descriptors (files and sockets)

➔ Overhead

➔ Tracking progress of I/O involves both application and kernel code

➔ Application and kernel communicate frequently

Previous work shows that fine-grain mode switching can halve processor efficiency

slide-8
SLIDE 8

FlexSC component overview

FlexSC and FlexSC-Threads presented at OSDI 2010

This work: libflexsc for event-driven servers
1) memcached throughput increase of up to 35%
2) nginx throughput increase of up to 120%

slide-9
SLIDE 9

Benefits for event-driven applications

1) General purpose

➔ Any/all system calls can be asynchronous

2) Non-intrusive kernel implementation

➔ Does not require per syscall code

3) Facilitates multi-processor execution

➔ OS work is automatically distributed

4) Improved processor efficiency

➔ Reduces frequent user/kernel mode switches

slide-10
SLIDE 10

Summary of exception-less syscalls

slide-11
SLIDE 11

Exception-less interface: syscall page

write(fd, buf, 4096);

entry = free_syscall_entry();
/* write syscall */
entry->syscall = 1;
entry->num_args = 3;
entry->args[0] = fd;
entry->args[1] = buf;
entry->args[2] = 4096;
entry->status = SUBMIT;
while (entry->status != DONE)
    do_something_else();
return entry->return_code;

slide-12
SLIDE 12

Exception-less interface: syscall page

Same code as on slide 11. Entry status shown in the syscall page: SUBMIT

slide-13
SLIDE 13

Exception-less interface: syscall page

Same code as on slide 11. Entry status shown in the syscall page: DONE
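To make the submit-then-poll protocol concrete, here is a user-space simulation of a syscall page. It is a sketch under stated assumptions, not the FlexSC implementation: the entry layout, the BUSY state, the page size, and the helper `run_syscall_thread_once` are hypothetical, and the "kernel side" is an ordinary function in the same process. The value 1 is the write syscall number on x86-64 Linux, matching the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* FREE must be 0 so that a zero-initialized page starts out empty. */
enum entry_status { FREE, SUBMIT, BUSY, DONE };

/* Hypothetical entry layout; field names follow the slide. */
struct syscall_entry {
    int syscall;                /* syscall number (1 = write here) */
    int num_args;
    intptr_t args[6];
    enum entry_status status;
    long return_code;
};

#define PAGE_ENTRIES 8
struct syscall_entry syscall_page[PAGE_ENTRIES];

/* Linear scan for an unused entry; NULL if the page is full. */
struct syscall_entry *free_syscall_entry(void)
{
    for (int i = 0; i < PAGE_ENTRIES; i++)
        if (syscall_page[i].status == FREE)
            return &syscall_page[i];
    return NULL;
}

/* Stand-in for a kernel syscall thread: executes every submitted entry
 * and marks it DONE. Only write is modeled. Returns entries served. */
int run_syscall_thread_once(void)
{
    int served = 0;
    for (int i = 0; i < PAGE_ENTRIES; i++) {
        struct syscall_entry *e = &syscall_page[i];
        if (e->status != SUBMIT)
            continue;
        e->status = BUSY;
        if (e->syscall == 1 && e->num_args == 3)
            e->return_code = write((int)e->args[0],
                                   (const void *)e->args[1],
                                   (size_t)e->args[2]);
        else
            e->return_code = -1;
        e->status = DONE;
        served++;
    }
    return served;
}

/* The slide's sequence: fill an entry, mark it SUBMIT, and do other
 * work until the status becomes DONE. In this single-threaded
 * simulation the "other work" is running the simulated syscall thread. */
long exceptionless_write(int fd, const void *buf, size_t count)
{
    struct syscall_entry *entry = free_syscall_entry();
    if (entry == NULL)
        return -1;
    entry->syscall = 1;
    entry->num_args = 3;
    entry->args[0] = fd;
    entry->args[1] = (intptr_t)buf;
    entry->args[2] = (intptr_t)count;
    entry->status = SUBMIT;
    while (entry->status != DONE)
        run_syscall_thread_once();
    long ret = entry->return_code;
    entry->status = FREE;       /* recycle the entry */
    return ret;
}
```

In a real shared-memory design the status field would be read and written concurrently by the application and by kernel threads, so it would need atomic or volatile accesses and memory ordering. The plain fields here are only adequate because everything runs in one thread.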

slide-14
SLIDE 14

Syscall threads

➔ Kernel-only threads

➔ Part of application process

➔ Execute requests from syscall page
➔ Schedulable on a per-core basis

slide-15
SLIDE 15

Dynamic multicore specialization

1) FlexSC makes specializing cores simple
2) Dynamically adapts to workload needs

[Figure: Core 0, Core 1, Core 2, Core 3]

slide-16
SLIDE 16

libflexsc: async syscall library

➔ Async syscall and notification library
➔ Similar to libevent
    ➔ But operates on syscalls instead of file-descriptors

➔ Three main components:

1) Provides main loop (dispatcher)
2) Supports asynchronous syscalls with an associated callback to notify completion
3) Cancellation support

slide-17
SLIDE 17

Main API: async system call

struct flexsc_cb {
    void (*callback)(struct flexsc_cb *); /* event handler */
    void *arg;                            /* auxiliary var */
    int64_t ret;                          /* syscall return */
};

/* One wrapper per system call, of the form: */
int flexsc_##SYSCALL(struct flexsc_cb *, ... /*syscall args*/);

/* Example: asynchronous accept */
struct flexsc_cb cb;
cb.callback = handle_accept;
flexsc_accept(&cb, master_sock, NULL, 0);

void handle_accept(struct flexsc_cb *cb) {
    int fd = cb->ret;
    if (fd != -1) {
        struct flexsc_cb read_cb;
        read_cb.callback = handle_read;
        flexsc_read(&read_cb, fd, read_buf, read_count);
    }
}
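The callback style of this API, together with the dispatcher and cancellation components named earlier, can be imitated by a small stand-alone mock. Everything prefixed `mock_` is hypothetical and is not the libflexsc API: the mock queues only reads and performs them synchronously inside the dispatcher, whereas the real library would wait on syscall page completions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Control block shaped like the slide's struct flexsc_cb. */
struct mock_cb {
    void (*callback)(struct mock_cb *);  /* event handler */
    void *arg;                           /* auxiliary var */
    int64_t ret;                         /* syscall return */
};

/* One queued operation; only read() is modeled in this mock. */
struct pending_read {
    struct mock_cb *cb;
    int fd;
    void *buf;
    size_t count;
};

#define MAX_PENDING 16
struct pending_read pending[MAX_PENDING];
int num_pending;

/* Queue an asynchronous read. Returns 0, or -1 if the queue is full.
 * The control block and buffer must stay valid until the callback runs. */
int mock_async_read(struct mock_cb *cb, int fd, void *buf, size_t count)
{
    if (num_pending == MAX_PENDING)
        return -1;
    pending[num_pending++] = (struct pending_read){ cb, fd, buf, count };
    return 0;
}

/* Cancel a queued operation before it is dispatched.
 * Returns 0 if it was removed, -1 if it was not found. */
int mock_cancel(struct mock_cb *cb)
{
    for (int i = 0; i < num_pending; i++) {
        if (pending[i].cb == cb) {
            pending[i] = pending[--num_pending];
            return 0;
        }
    }
    return -1;
}

/* Main loop (dispatcher): complete each queued operation, store its
 * result in cb->ret, and invoke the callback. Callbacks may queue
 * further operations. Returns the number of callbacks invoked. */
int mock_main_loop(void)
{
    int dispatched = 0;
    while (num_pending > 0) {
        struct pending_read op = pending[--num_pending];
        op.cb->ret = read(op.fd, op.buf, op.count);
        op.cb->callback(op.cb);
        dispatched++;
    }
    return dispatched;
}

/* Example handler: copies the result to the location cb->arg points to. */
void record_result(struct mock_cb *cb)
{
    *(int64_t *)cb->arg = cb->ret;
}
```

Because the dispatcher dereferences the control block after the submitting function has returned, this mock requires the caller to keep it alive until the callback fires, for instance in static or heap storage. Whether libflexsc imposes the same lifetime rule is not stated in the slides.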

slide-18
SLIDE 18

memcached port to libflexsc

➔ memcached: in-memory key/value store

➔ Simple code-base: 8K LOC
➔ Uses libevent

➔ Modified 293 LOC
➔ Transformed libevent calls to libflexsc
➔ Mostly in one file: memcached.c
➔ Most memcached syscalls are socket based

slide-19
SLIDE 19

nginx port to libflexsc

➔ Most popular event-driven webserver

➔ Code base: 82K LOC
➔ Natively uses both non-blocking (epoll) I/O and asynchronous I/O

➔ Modified 255 LOC
➔ Socket based code already asynchronous
➔ Not all file-system calls were asynchronous
    ➔ e.g., open, fstat, getdents
➔ Special handling of stack allocated syscall args

slide-20
SLIDE 20

Evaluation

➔ Linux 2.6.33
➔ Nehalem (Core i7) server, 2.3GHz
    ➔ 4 cores
➔ Client connected through 1Gbps network
➔ Workloads
    ➔ memslap on memcached (30% user, 70% kernel)
    ➔ httperf on nginx (25% user, 75% kernel)
➔ Default Linux (“epoll”) vs. libflexsc (“flexsc”)

slide-21
SLIDE 21

memcached on 4 cores

[Figure: memcached throughput (requests/sec.) versus request concurrency, flexsc vs. epoll]

30% improvement

slide-22
SLIDE 22

memcached processor metrics

[Figure: relative performance (flexsc/epoll) for CPI, L2, d-cache, and i-cache, shown separately for user and kernel mode]

slide-23
SLIDE 23

httperf on nginx (1 core)

[Figure: nginx throughput (Mbps) versus requests/s, flexsc vs. epoll]

100% improvement

slide-24
SLIDE 24

nginx processor metrics

[Figure: relative performance (flexsc/epoll) for CPI, L2, d-cache, i-cache, and branch, shown separately for user and kernel mode]

slide-25
SLIDE 25

Concluding remarks

➔ Current event-based primitives add overhead

➔ I/O operations require frequent communication between OS and application

➔ libflexsc: exception-less syscall library

1) General purpose
2) Non-intrusive kernel implementation
3) Facilitates multi-processor execution
4) Improved processor efficiency

➔ Ported memcached and nginx to libflexsc

➔ Performance improvements of 30 - 120%

slide-26
SLIDE 26

Exception-Less System Calls for Event-Driven Servers

Livio Soares and Michael Stumm University of Toronto

slide-27
SLIDE 27

Backup Slides

slide-28
SLIDE 28

Difference in improvements

Why does nginx improve more than memcached?

1) Frequency of mode switches:

Server       Frequency of syscalls (in instructions)
memcached    3,750
nginx        1,460

2) nginx uses greater diversity of system calls

➔ More interference in processor structures (caches)

3) Instruction count reduction

➔ nginx with epoll() has connection timeouts
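The syscall-frequency figures can be compared directly. Reading them as the average number of instructions executed per system call (an interpretation, since the slide gives only the column title), the ratio is:

```latex
\frac{3750\ \text{instructions per syscall (memcached)}}
     {1460\ \text{instructions per syscall (nginx)}} \approx 2.57
```

Under that reading, nginx enters the kernel roughly 2.6 times as often per instruction executed as memcached, which is consistent with mode switching accounting for more of its overhead.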

slide-29
SLIDE 29

Limitations

➔ Scalability (number of outstanding syscalls)

➔ Interface: operations perform linear scan
➔ Implementation: overheads of syscall threads non-negligible

➔ Solutions

➔ Throttle syscalls at application or OS
➔ Switch interface to scalable message passing
➔ Provide exception-less versions of async I/O
➔ Make kernel fully non-blocking
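As an illustration of the first proposed solution, throttling at the application can be as simple as capping the number of outstanding requests. The sketch below is an assumption about how such a cap might look; `struct throttle` and its functions are hypothetical names, not part of libflexsc.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts in-flight asynchronous syscalls against a fixed cap. */
struct throttle {
    int outstanding;    /* submitted but not yet completed */
    int limit;          /* maximum allowed in flight */
};

/* Reserve a slot before submitting. Returns false when the cap is
 * reached, in which case the caller should defer the request. */
bool throttle_try_submit(struct throttle *t)
{
    if (t->outstanding >= t->limit)
        return false;
    t->outstanding++;
    return true;
}

/* Release a slot; meant to be called from the completion callback. */
void throttle_complete(struct throttle *t)
{
    assert(t->outstanding > 0);
    t->outstanding--;
}
```

Keeping the cap small bounds the number of entries that any linear scan has to visit, at the cost of queueing requests in the application when load is high.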

slide-30
SLIDE 30

Latency (ApacheBench)

[Figure: latency (ms) for epoll and flexsc on 1 core and 2 cores]

50% latency reduction