Optimizing Lua Applications for LuaJIT and OpenResty

SLIDE 1

Optimizing Lua Applications for LuaJIT and OpenResty ☺agentzh@openresty.org☺

Yichun Zhang (@agentzh)

2016.9
SLIDE 2
SLIDE 3

♡ NGINX + LuaJIT

SLIDE 4
SLIDE 5
SLIDE 6
SLIDE 7

☺ Flame Graphs

SLIDE 8
SLIDE 9
SLIDE 10
SLIDE 11

☺ I/O

SLIDE 12

♡ Off-CPU Flame Graphs

SLIDE 13
SLIDE 14

# assuming the nginx worker process to be analyzed is 10901.
./sample-bt-off-cpu -p 10901 -t 5 > a.bt

SLIDE 15

# using Brendan Gregg's flame graph tools:
$ stackcollapse-stap.pl a.bt > a.cbt
$ flamegraph.pl a.cbt > a.svg

SLIDE 16
SLIDE 17

♡ Synchronously nonblocking I/O

SLIDE 18

♡ Light threads & semaphores

SLIDE 19

local thread_A, err = ngx.thread.spawn(func1)
-- thread_A keeps running asynchronously
-- in the background of the current
-- "light thread".

SLIDE 20

local ok, res1, res2 = ngx.thread.wait(thread_A, thread_B)

SLIDE 21
SLIDE 22

local ok, err = ngx.thread.kill(thread_A)
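The three calls on slides 19 to 22 combine naturally into a "first result wins" pattern. A minimal sketch, assuming it runs inside an OpenResty request handler (for example content_by_lua_block); `task`, `first_of_two`, the names "A"/"B" and the delays are hypothetical stand-ins for real backend queries.

```lua
-- Hypothetical worker: stands in for a real cosocket query.
local function task(name, delay)
    ngx.sleep(delay)      -- yields to the nginx event loop, does not block it
    return name
end

-- Spawn two light threads and return the result of whichever ends first.
local function first_of_two()
    local thread_A, err = ngx.thread.spawn(task, "A", 0.2)
    if not thread_A then
        return nil, err
    end

    local thread_B, err2 = ngx.thread.spawn(task, "B", 0.1)
    if not thread_B then
        ngx.thread.kill(thread_A)
        return nil, err2
    end

    -- returns as soon as one of the two threads terminates
    local ok, res = ngx.thread.wait(thread_A, thread_B)

    -- the other thread is no longer needed; killing a thread that has
    -- already finished just returns nil plus an error string
    ngx.thread.kill(thread_A)
    ngx.thread.kill(thread_B)

    if not ok then
        return nil, res
    end
    return res
end

-- Only run when the ngx API is present, i.e. under OpenResty.
if ngx then
    ngx.say("first finished: ", first_of_two())
end
```

Outside OpenResty the file loads but does nothing, since the global `ngx` is absent.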

SLIDE 23

♡ Full-Duplex Cosockets

SLIDE 24

local sock = ngx.socket.tcp()
local ok, err = sock:connect("www.cloudflare.com", 443)
ok, err = sock:sslhandshake(
    false,                  -- disable SSL session
    "www.cloudflare.com",   -- SNI name
    true                    -- verify everything
)

SLIDE 25

♡ Timers and Sleeps

SLIDE 26

-- create a timer triggered after 1 sec (delays are given in seconds)
ngx.timer.at(1, function (premature) do_something() end)
-- sleep for 1 sec, then continue
ngx.sleep(1)

SLIDE 27

☺ CPU

SLIDE 28

♡ on-CPU Flame Graphs

SLIDE 29
SLIDE 30
SLIDE 31
SLIDE 32

♡ Lua-land Flame Graphs

SLIDE 33
SLIDE 34

http://agentzh.org/misc/flamegraph/lua-on-cpu-local-waf-jitted-only.svg

SLIDE 35

lj-lua-stacks.sxx --arg time=5 \
    --skip-badvars \
    -x 6949 \
    > a.bt

SLIDE 36

♡ LuaJIT Built-in Profiler vs SystemTap Sampling

SLIDE 37

♡ Dynamic Allocations & Garbage Collection

SLIDE 38

Lua tables

SLIDE 39

lj_tab_new
lj_tab_resize
lj_tab_len

SLIDE 40

table.new(10, 20)
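table.new(narr, nrec) is a LuaJIT 2.1 extension that preallocates array and hash slots, so a table filled to a known size does not have to grow step by step. A minimal sketch with a fallback for runtimes that lack the extension; the `squares` helper is hypothetical.

```lua
-- table.new must be required explicitly; fall back to a plain
-- constructor on runtimes without it.
local ok, new_tab = pcall(require, "table.new")
if not ok then
    new_tab = function (narr, nrec) return {} end
end

-- Hypothetical helper: build an array whose final size is known up front.
local function squares(n)
    local out = new_tab(n, 0)    -- n array slots, 0 hash slots
    for i = 1, n do
        out[i] = i * i
    end
    return out
end

local t = squares(1000)
print(#t, t[1000])
```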

SLIDE 41

table.clear(tb)
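table.clear empties a table while keeping its allocated slots, which makes it possible to recycle one scratch table instead of creating a new one per call. A minimal sketch with a plain-Lua fallback; `buf` and `join_words` are hypothetical names.

```lua
-- table.clear is a LuaJIT 2.1 extension; emulate it elsewhere.
local ok, clear_tab = pcall(require, "table.clear")
if not ok then
    clear_tab = function (t)
        for k in pairs(t) do
            t[k] = nil
        end
    end
end

local buf = {}    -- one scratch table, reused across calls

local function join_words(words)
    clear_tab(buf)               -- drop leftovers from the previous call
    for i = 1, #words do
        buf[i] = words[i]
    end
    return table.concat(buf, " ")
end

print(join_words({ "a", "b", "c" }))
print(join_words({ "x" }))
```

Sharing the scratch table is safe here only because nothing yields between the clear and the concat.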

SLIDE 42

tb[key1] = val1
tb[key1] = nil
tb[key2] = val2

SLIDE 43

Lua strings

SLIDE 44

? s = s .. r

SLIDE 45

-- tb[#tb + 1] is slow!
idx = idx + 1
tb[idx] = r
s = table.concat(tb)
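Put into a loop, the pattern above collects the pieces in a table with an explicit index and joins them once at the end, instead of creating a new intermediate string on every `s = s .. r`. A minimal sketch; `repeat_join` is a hypothetical helper.

```lua
-- Build a string out of n copies of r with a single final concatenation.
local function repeat_join(r, n)
    local tb = {}
    local idx = 0                -- tracked by hand instead of using #tb
    for _ = 1, n do
        idx = idx + 1
        tb[idx] = r
    end
    return table.concat(tb)
end

local s = repeat_join("ab", 3)
print(s)
```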

SLIDE 46

? string.sub(s, i, i)

SLIDE 47

string.byte(s, i, i)
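string.sub(s, i, i) returns a one-character string object, whereas string.byte returns a plain number, so scanning a string by byte code creates no string objects. A minimal sketch; `count_spaces` is a hypothetical helper.

```lua
local byte = string.byte
local SPACE = byte(" ")          -- numeric code of the space character

-- Count spaces by comparing byte codes rather than one-character substrings.
local function count_spaces(s)
    local n = 0
    for i = 1, #s do
        if byte(s, i) == SPACE then
            n = n + 1
        end
    end
    return n
end

print(count_spaces("a b  c"))
```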

SLIDE 48

Lua functions

SLIDE 49

foo = function (...) ... end
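The slide shows only a function expression. In Lua 5.1 and LuaJIT each evaluation of such an expression creates a new closure object, which is a GC allocation like any other. The sketch below reads the slide as advice to keep function expressions out of hot loops; that reading is an assumption, and both helper names are hypothetical.

```lua
-- Creates a fresh closure on every loop iteration.
local function sum_with_closure(items)
    local total = 0
    for i = 1, #items do
        local add = function (x) total = total + x end
        add(items[i])
    end
    return total
end

-- Same result with the helper created once, outside the loop.
local function add(total, x)
    return total + x
end

local function sum_hoisted(items)
    local total = 0
    for i = 1, #items do
        total = add(total, items[i])
    end
    return total
end

print(sum_with_closure({ 1, 2, 3 }), sum_hoisted({ 1, 2, 3 }))
```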

SLIDE 50
SLIDE 51

♡ JITting vs Interpreting

SLIDE 52

lua-resty-core

SLIDE 53
SLIDE 54
SLIDE 55
SLIDE 56

jit.v jit.dump
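jit.v and jit.dump are the modules bundled with LuaJIT for logging which traces get compiled or aborted. A minimal sketch of switching jit.v on around a hot loop, assuming the jit.* modules are on package.path; the loop and the temporary log file are illustrative only, and on runtimes without jit.v the logging is skipped.

```lua
local has_v, v = pcall(require, "jit.v")
local log_path = os.tmpname()

if has_v then
    v.on(log_path)               -- write one line per trace event to the file
end

-- A loop hot enough to be picked up by the trace compiler.
local function sum_to(n)
    local s = 0
    for i = 1, n do
        s = s + i
    end
    return s
end

local total = sum_to(100000)

if has_v then
    v.off()
    local f = io.open(log_path, "r")
    if f then
        io.write(f:read("*a"))   -- show the collected trace log
        f:close()
    end
end
os.remove(log_path)
```

jit.dump is switched on the same way and additionally prints bytecode, IR and machine code for each trace.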

SLIDE 57

lj-lua-stacks.sxx --arg nojit=1 ...
lj-lua-stacks.sxx --arg nointerp=1 ...

SLIDE 58

♡ Biased vs Unbiased Branching

SLIDE 59

♡ Lua code generation atop LuaJIT: JIT over a JIT!

SLIDE 60

♡ Regexes

SLIDE 61

/ \d+ \. \d+ | \. \d+ | \d+ /x

SLIDE 62
SLIDE 63

sregex

SLIDE 64
SLIDE 65
SLIDE 66
SLIDE 67
SLIDE 68
SLIDE 69
SLIDE 70

☺ Memory

SLIDE 71

♡ Memory-Leak Flame Graphs

SLIDE 72
SLIDE 73

♡ GC Object Analysis

SLIDE 74

$ lj-gc-objs.sxx -x 14378 -D MAXACTION=200000
Start tracing 14378 (/opt/nginx/sbin/nginx)

main machine code area size: 65536 bytes
C callback machine code size: 4096 bytes
GC total size: 9683407 bytes
GC state: pause

27948 table objects: max=131112, avg=106, min=32, sum=2983944 (in bytes)
22343 string objects: max=1421562, avg=198, min=18, sum=4432482 (in bytes)
12168 userdata objects: max=8916, avg=50, min=27, sum=619223 (in bytes)
2837 function objects: max=148, avg=27, min=20, sum=78264 (in bytes)
1200 upvalue objects: max=24, avg=24, min=24, sum=28800 (in bytes)
650 proto objects: max=3860, avg=313, min=74, sum=203902 (in bytes)
349 thread objects: max=1648, avg=774, min=424, sum=270464 (in bytes)
202 trace objects: max=1560, avg=375, min=160, sum=75832 (in bytes)
9 cdata objects: max=36, avg=17, min=12, sum=156 (in bytes)

JIT state size: 7696 bytes
global state tmpbuf size: 710772 bytes
C type state size: 4568 bytes

My GC walker detected for total 9683407 bytes.
45008 microseconds elapsed in the probe handler.

SLIDE 75

(gdb) lgcstat
15172 str objects: max=2956, avg = 51, min=18, sum=779126
987 upval objects: max=24, avg = 24, min=24, sum=23688
104 thread objects: max=1648, avg = 1622, min=528, sum=168784
431 proto objects: max=226274, avg = 2234, min=78, sum=963196
952 func objects: max=144, avg = 30, min=20, sum=28900
446 trace objects: max=23400, avg = 1857, min=160, sum=828604
2965 cdata objects: max=4112, avg = 17, min=12, sum=51576
18961 tab objects: max=24608, avg = 207, min=32, sum=3943256
9 udata objects: max=176095, avg = 39313, min=32, sum=353822

SLIDE 76

♡ Streaming Processing

SLIDE 77
SLIDE 78
SLIDE 79

♡ Streaming Regex (sregex)

SLIDE 80

♡ The cost of abstractions

SLIDE 81

♡ The opportunities of new abstractions

SLIDE 82

♡ Business-Level Domain Specific Languages

SLIDE 83

ModSecurity's syntax sucks.

SLIDE 84

☺ Any questions? ☺

SLIDE 85