A Static Type Analysis for Lua
Jan Midtgaard
Dagstuhl Seminar 16131
State of affairs, PL-wise
Good news: Many of today’s popular programming languages (JavaScript, Python, Lua, ...) are dynamically typed: they enable rapid [...]
Bad news: In terms of software guarantees, that means
More good news: Static analysis to the rescue!
Today: Developing a static analysis (lattice) for inferring types of Lua programs
Result: A tool
[Picture slide: games using Lua. Courtesy of Ierusalimschy, de Figueiredo, and Celes. The company names and logos shown are trademarked.]
– Imperative
– Dynamically typed
– Builtin tables (associative arrays)
– First-class functions
– ...
(..., OO, ...)
Set = {}

function Set.new (l)
  local set = {}
  for _, v in ipairs(l) do set[v] = true end
  return set
end

function Set.union (a, b)
  local res = Set.new {}
  for k in pairs(a) do res[k] = true end
  for k in pairs(b) do res[k] = true end
  return res
end

s1 = Set.new {10, 20, 30, 50}
s2 = Set.new {30, 1}
s3 = Set.union(s1, s2)
The same program, with the function definitions written as plain assignments:

Set = {}

Set.new = function (l)
  local set = {}
  for _, v in ipairs(l) do set[v] = true end
  return set
end

Set.union = function (a, b)
  local res = Set.new {}
  for k in pairs(a) do res[k] = true end
  for k in pairs(b) do res[k] = true end
  return res
end

s1 = Set.new {10, 20, 30, 50}
s2 = Set.new {30, 1}
s3 = Set.union(s1, s2)

Features illustrated: builtin tables, tables as modules, lexical scope and block structure.
To add PL spice, Lua also includes metatables:
– These are used to model both OO-like inheritance and overriding
– Python has similar constructions
Set = {}
local mt = {}  -- metatable for sets (declare a metatable)

function Set.new (l)
  local set = {}
  setmetatable(set, mt)  -- install metatable
  for _, v in ipairs(l) do set[v] = true end
  return set
end

function Set.union (a, b)
  local res = Set.new {}
  for k in pairs(a) do res[k] = true end
  for k in pairs(b) do res[k] = true end
  return res
end

mt.__add = Set.union  -- register metamethod for addition event

s1 = Set.new {10, 20, 30, 50}
s2 = Set.new {30, 1}
s3 = s1 + s2  -- utilize overridden addition
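The remark that metatables are used to model OO-like inheritance and overriding can be illustrated with the `__index` metamethod. The following is a minimal sketch that is not taken from the talk; the names `Animal` and `Dog` are hypothetical.

```lua
-- Hypothetical "classes" for illustration only; not from the talk.
local Animal = {}
Animal.__index = Animal            -- failed lookups on instances fall back to Animal

function Animal.new (name)
  return setmetatable({ name = name }, Animal)
end

function Animal.speak (self)
  return self.name .. " makes a sound"
end

function Animal.describe (self)
  return "I am " .. self.name
end

-- Dog inherits from Animal: lookups that fail in Dog continue in Animal.
local Dog = setmetatable({}, { __index = Animal })
Dog.__index = Dog

function Dog.new (name)
  return setmetatable({ name = name }, Dog)
end

function Dog.speak (self)          -- overrides Animal.speak
  return self.name .. " barks"
end

local a, d = Animal.new("Generic"), Dog.new("Rex")
print(a:speak())      -- Generic makes a sound
print(d:speak())      -- Rex barks
print(d:describe())   -- I am Rex (found in Animal via the __index chain)
```

A type analysis has to follow the same lookup chain through the installed metatables to find out which function a call such as `d:describe()` may reach.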
approximating the run-time values of Lua programs
Our challenge is to design a lattice capable of expressing types for this kind of example.
Related work: ... lots
Lua has a range of basic values:
nil, true, false, 42, 3.14, "hello", 'world', ...

analysislattice = pplabel → statelattice    (pplabel: program point)
statelattice    = Var → valuelattice
valuelattice    = P(tag)
tag             = {nil, bool, number, string, userdata}
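To make valuelattice = P(tag) concrete, here is a small sketch in Lua. It assumes the usual powerset reading: an abstract value is a set of tags, the join is set union, and the ordering is set inclusion. The helper names (`alpha`, `join`, `leq`, `show`) are hypothetical and not taken from the tool.

```lua
-- Sketch only: abstract values are sets of tags, encoded as tables tag -> true.
local TAGS = { "nil", "bool", "number", "string", "userdata" }

-- Abstraction: map one concrete basic value to a singleton tag set.
local function alpha (v)
  local t = type(v)
  if t == "boolean" then t = "bool" end   -- the slides call this tag 'bool'
  return { [t] = true }
end

-- Join (least upper bound) is set union.
local function join (a, b)
  local res = {}
  for tag in pairs(a) do res[tag] = true end
  for tag in pairs(b) do res[tag] = true end
  return res
end

-- The ordering is set inclusion.
local function leq (a, b)
  for tag in pairs(a) do
    if not b[tag] then return false end
  end
  return true
end

-- Render a tag set in a fixed order, for printing.
local function show (a)
  local out = {}
  for _, tag in ipairs(TAGS) do
    if a[tag] then out[#out + 1] = tag end
  end
  return "{" .. table.concat(out, ", ") .. "}"
end

-- A variable that holds 42 on one branch and "hello" on the other:
local x = join(alpha(42), alpha("hello"))
print(show(x))               -- {number, string}
print(leq(alpha(3.14), x))   -- true
print(leq(alpha(nil), x))    -- false
```

Tables and functions are deliberately left out here, since this first lattice only covers basic values.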
In Lua even environments are hashtables:

for name, val in pairs(_G) do
  print(name, type(val))
end

displays _G’s content (the global environment):

string      table
pairs       function
_G          table
type        function
arg         table
rawget      function
loadstring  function
...
Tables double as modules (e.g., string.len), objects, and classes
As a first step we will label all allocation sites:

Set = ℓ1{}
Set.new = function (l)
  local set = ℓ2{}
  for _, v in ipairs(l) do set[v] = true end
  return set
end
...

An allocation site (a label) represents all table values allocated at that site.
A table’s contents are available in an approximate store.
analysislattice = pplabel → statelattice
statelattice    = storelattice × envlattice
envlattice      = tablelabel+
storelattice    = tablelabel → proplattice
proplattice     = (string → valuelattice) × valuelattice
valuelattice    = P(tag) × P(tablelabel)
tag             = {nil, bool, number, string, userdata}

We label each scope (scope chain is a label string)
... unknown entries.
(we know variable names statically)
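The shape storelattice = tablelabel → proplattice can be sketched as follows. This is one possible reading, not the tool's implementation: it assumes that the second component of proplattice is a default value covering unknown entries, and that writes are weak (joined with the old value) because one label stands for many run-time tables. All names, including the hypothetical field `count`, are illustrative.

```lua
-- Sketch under stated assumptions; all names here are hypothetical.
-- Value sets are tables tag -> true, as in P(tag).
local function join (a, b)
  local res = {}
  for tag in pairs(a) do res[tag] = true end
  for tag in pairs(b) do res[tag] = true end
  return res
end

local function show (a)
  local out = {}
  for tag in pairs(a) do out[#out + 1] = tag end
  table.sort(out)
  return "{" .. table.concat(out, ", ") .. "}"
end

-- store : tablelabel -> { known = (string -> value set), default = value set }
-- A freshly allocated table holds nil everywhere.
local store = {
  l1 = { known = {}, default = { ["nil"] = true } },   -- Set = l1{}
  l2 = { known = {}, default = { ["nil"] = true } },   -- local set = l2{}
}

-- Read with a statically known string key; key == nil means "unknown key".
local function read (label, key)
  local props = store[label]
  if key ~= nil then
    return props.known[key] or props.default
  end
  local res = props.default
  for _, v in pairs(props.known) do res = join(res, v) end
  return res
end

-- Weak update: join with the previous abstract value, never overwrite.
local function write (label, key, value)
  local props = store[label]
  if key ~= nil then
    props.known[key] = join(props.known[key] or props.default, value)
  else
    props.default = join(props.default, value)
    for k, v in pairs(props.known) do props.known[k] = join(v, value) end
  end
end

write("l2", nil, { bool = true })         -- set[v] = true, with v unknown
print(show(read("l2", "foo")))            -- {bool, nil}
write("l2", "count", { number = true })   -- hypothetical: set.count = 0
print(show(read("l2", "count")))          -- {bool, nil, number}
```

The second result shows the cost of weak updates: after the write, `count` may still hold anything the default allowed before.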
analysislattice = pplabel → statelattice
statelattice    = storelattice × envlattice
envlattice      = tablelabel+
storelattice    = tablelabel → proplattice
proplattice     = (string → valuelattice) × valuelattice
valuelattice    = P(tag) × stringlattice × P(tablelabel)
stringlattice   = {⊥, ⊤} ∪ string
tag             = {nil, bool, number, userdata}

Solution:
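As an illustration of stringlattice = {⊥, ⊤} ∪ string, the sketch below assumes the usual flat (constant propagation) reading: ⊥ means no string, a concrete string means exactly that constant, and ⊤ means any string. The sentinels `BOT` and `TOP` and the helper names are hypothetical.

```lua
-- Sketch only: a flat lattice of string constants.
local BOT, TOP = { name = "bot" }, { name = "top" }   -- unique sentinel values

local function join (a, b)
  if a == BOT then return b end
  if b == BOT then return a end
  if a == b then return a end     -- the same constant, or TOP joined with TOP
  return TOP                      -- two different constants, or TOP with a constant
end

local function show (a)
  if a == BOT then return "bot" end
  if a == TOP then return "top" end
  return string.format("%q", a)
end

print(show(join(BOT, "__add")))       -- "__add"
print(show(join("__add", "__add")))   -- "__add"
print(show(join("__add", "__sub")))   -- top
```

With such a component, a table access whose key is a known constant string can be resolved to a single entry instead of falling back to the approximation for unknown keys.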
Challenge:

analysislattice = pplabel → statelattice
statelattice    = storelattice × envlattice
envlattice      = tablelabel+
storelattice    = tablelabel → proplattice
proplattice     = (string → valuelattice) × valuelattice × valuelattice
valuelattice    = P(tag) × stringlattice × P(tablelabel)
stringlattice   = {⊥, ⊤} ∪ string
tag             = {nil, bool, number, userdata}

(proplattice gains a third component: valuelattice)
analysislattice = pplabel → statelattice
statelattice    = storelattice × envlattice
envlattice      = P(tablelabel+)    (tablelabel+: environment at run time; P(...): approximation)
storelattice    = tablelabel → proplattice
proplattice     = (string → valuelattice) × valuelattice × valuelattice
valuelattice    = P(tag) × stringlattice × P(tablelabel)
stringlattice   = {⊥, ⊤} ∪ string
tag             = {nil, bool, number, userdata}
Which means we can pass around functions as values:

function apply (f, x)
  return f(x)
end
local tmp = apply(functionℓ (x) return x end, "foo")
print(tmp, apply(type, tmp))

Again we use labels:
from a function literal
analysislattice = pplabel → statelattice
statelattice    = storelattice × envlattice
envlattice      = P(tablelabel+)
storelattice    = tablelabel → proplattice
proplattice     = (string → valuelattice) × valuelattice × valuelattice
valuelattice    = P(tag) × stringlattice × P(funlabel) × P(tablelabel)
stringlattice   = {⊥, ⊤} ∪ string
tag             = {nil, bool, number, userdata}

Problem:
analysislattice = pplabel → statelattice
statelattice    = storelattice × envlattice
envlattice      = P(tablelabel+)
storelattice    = tablelabel → proplattice
proplattice     = (string → absencelattice × valuelattice) × valuelattice × valuelattice
absencelattice  = {⊥, ⊤}
valuelattice    = P(tag) × stringlattice × P(funlabel) × P(tablelabel)
stringlattice   = {⊥, ⊤} ∪ string
tag             = {nil, bool, number, userdata}
analysislattice = pplabel → statelattice
statelattice    = storelattice × envlattice
envlattice      = P(tablelabel+)
storelattice    = tablelabel → proplattice
proplattice     = ((string ∪ {metatable}) → absencelattice × valuelattice) × valuelattice × valuelattice
absencelattice  = {⊥, ⊤}
valuelattice    = P(tag) × stringlattice × numberlattice × P(funlabel) × P(tablelabel)
stringlattice   = {⊥, ⊤} ∪ string
numberlattice   = {⊥, ⊤}
tag             = {nil, bool, userdata}

A special table entry models installed metatables
Transfer functions over this lattice now model Lua’s semantics. The manual specifies, e.g., events with interpreters:

function add_event (op1, op2)
  local o1, o2 = tonumber(op1), tonumber(op2)
  if o1 and o2 then
    return o1 + o2  -- numeric add
  else
    local h = getbinhandler(op1, op2, "__add")
    if h then
      return (h(op1, op2))
    else
      error(···)
    end
  end
end
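The interpreter above relies on a helper `getbinhandler`, which the Lua 5.1 reference manual defines as a lookup of the event in the metatable of the first operand and then of the second. The sketch below makes the example runnable under that definition; the helper name `rawmeta` and the error message are assumptions made for illustration.

```lua
-- The manual writes metatable(obj)[event] as shorthand for this raw lookup.
local function rawmeta (obj, event)
  return rawget(getmetatable(obj) or {}, event)
end

-- Try the first operand's handler, then the second operand's.
local function getbinhandler (op1, op2, event)
  return rawmeta(op1, event) or rawmeta(op2, event)
end

local function add_event (op1, op2)
  local o1, o2 = tonumber(op1), tonumber(op2)
  if o1 and o2 then
    return o1 + o2                          -- numeric add
  else
    local h = getbinhandler(op1, op2, "__add")
    if h then
      return (h(op1, op2))                  -- call the handler with both operands
    else
      error("no __add handler available")   -- hypothetical message
    end
  end
end

local mt = { __add = function (a, b) return "handler called" end }
local t = setmetatable({}, mt)

print(add_event(1, "2"))            -- 3
print(add_event(t, 1))              -- handler called
print((pcall(add_event, {}, 1)))    -- false
```

The three calls exercise the three paths a transfer function has to cover: numeric addition, a metamethod call, and the error case.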
We try to stick as close to the manual as possible:

and transfer_arith_event clab info op event op1 op2 =
  let o1,o2 = VL.coerce_tonum op1, VL.coerce_tonum op2 in
  mvl_join
    (if VL.may_be_number (VL.meet o1 o2) (* -- both operands are numeric? *)
     then red_return (VL.binop op o1 o2) (* -- '+' here is the primitive 'add' *)
     else merror)
    (getbinhandler op1 op2 event >>= fun h ->
     if VL.may_be_proc h
     then (* -- call the handler with both operands *)
       transfer_calls clab h [op1;op2] info >>= adjust_to_single
     else (* -- no handler available: default behavior *)
       merror)
Things outside the over-approximation are impossible ("this cannot happen"), e.g., unreachable code (which lets us emit a warning)
We’ve built a prototype implementation:
http://jmid.github.io/luata-quickcheck/
– to track array indices separately
– for more precise type string tracking
– to handle the error idiom of Lua:

local function idiv(d1, d2)
  if d2 == 0 then
    return nil, "division by zero"
  else
    local r = d1 % d2
    local q = (d1 - r)/d2
    return q, r
  end
end
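For context, the error idiom is consumed by testing the first result before using the second. The sketch below repeats `idiv` so that it runs on its own; the caller code is illustrative and not from the talk.

```lua
local function idiv (d1, d2)
  if d2 == 0 then
    return nil, "division by zero"   -- failure: nil plus an error message
  else
    local r = d1 % d2
    local q = (d1 - r) / d2
    return q, r                      -- success: quotient and remainder
  end
end

-- Success: both results are numbers.
local q, r = idiv(7, 2)
if q then
  print("quotient and remainder:", q, r)
end

-- Failure: the first result is nil and the second is the message.
local q2, err = idiv(7, 0)
if not q2 then
  print("error: " .. err)            -- error: division by zero
end
```

Seen through a type lattice, the first result may be nil or a number and the second may be a number or a string. Presumably the difficulty is that the two are correlated: once the first result is known to be non-nil, the second is a number.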
examples, but leaves room for improvement
work (Aarhus U, IBM TJ Watson, KAIST, ...)
In retrospect:
(reconsider or do symbolic/modular analysis?)
sensitivity
Thanks