In Search Of Shotgun Parsers
Katie Underwood
University of Calgary
Michael Locasto
SRI International
May 25, 2016
Overview
Context
Defining The Shotgun Parser
Tainted Path Length In Android Applications
Our Definition In The Wild
Future Work
Input use and recognition intermixed throughout!
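To make the intermixing concrete, here is a minimal sketch that contrasts a shotgun-style handler with one that recognizes the whole input first. The wire format (1-byte type, 2-byte big-endian length, payload) and every function name are assumptions made up for illustration; none of them come from the talk.

```python
import struct
from typing import Dict, Tuple

# Hypothetical wire format, assumed for illustration only:
# 1-byte message type, 2-byte big-endian payload length, then the payload.
KNOWN_TYPES = (1, 2)


def shotgun_handle(msg: bytes, state: Dict[str, object]) -> None:
    """Shotgun style: checks are scattered between uses of the input."""
    state["type"] = msg[0]                    # input used before any check
    (length,) = struct.unpack(">H", msg[1:3])
    state["length"] = length                  # program state already changed
    if length != len(msg) - 3:                # validation arrives late
        raise ValueError("length field does not match payload")
    state["payload"] = msg[3:]


def recognize(msg: bytes) -> Tuple[int, bytes]:
    """Accept or reject the whole message without touching program state."""
    if len(msg) < 3:
        raise ValueError("truncated header")
    msg_type = msg[0]
    (length,) = struct.unpack(">H", msg[1:3])
    if msg_type not in KNOWN_TYPES:
        raise ValueError("unknown message type")
    if length != len(msg) - 3:
        raise ValueError("length field does not match payload")
    return msg_type, msg[3:]


def principled_handle(msg: bytes, state: Dict[str, object]) -> None:
    """Recognize first; only a fully accepted message reaches the state."""
    msg_type, payload = recognize(msg)
    state.update(type=msg_type, length=len(payload), payload=payload)
```

With a malformed message, the shotgun-style handler raises only after it has already written attacker-controlled values into `state`, while the recognize-first handler leaves `state` untouched.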
we need to know what we’re looking for!
we see one?
static taint analysis of control flow graphs
1. How far does untrusted data propagate through the code?
2. Is input data fully validated before being used?
3. How much program state is affected by properties 1 and 2?
Let G be the static control-flow graph which describes A
Let Pn be the connected subgraph induced by the vertices of G tainted by n ∈ N, where d(Pn) ≤ d(G)
Let S = {Pi : i ∈ N} be the set of all taint-induced subgraphs on G
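A small sketch of how these objects could be computed on a toy graph. It rests on several assumptions: the control-flow graph is a plain adjacency dictionary, the statement labels and taint sources are invented, and d is read as the number of edges on the longest simple path (the slides do not spell out which measure d denotes). The sketch also does not enforce that each Pn is connected.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # vertex -> set of successor vertices


def induced_subgraph(cfg: Graph, keep: Set[str]) -> Graph:
    """Keep only the chosen vertices and the edges running between them."""
    return {v: {w for w in cfg.get(v, set()) if w in keep} for v in keep}


def taint_subgraphs(cfg: Graph, taints: Dict[str, Set[str]]) -> Dict[str, Graph]:
    """Build S: one induced subgraph P_n per taint source n."""
    sources = set().union(*taints.values())
    return {
        n: induced_subgraph(cfg, {v for v, labels in taints.items() if n in labels})
        for n in sources
    }


def longest_path(graph: Graph) -> int:
    """Edges on the longest simple path (exhaustive search, small graphs only)."""
    def walk(v: str, seen: frozenset) -> int:
        best = 0
        for w in graph.get(v, set()):
            if w not in seen:
                best = max(best, 1 + walk(w, seen | {w}))
        return best
    return max((walk(v, frozenset({v})) for v in graph), default=0)


# Toy CFG with hypothetical statement labels and taint sources "a" and "b".
cfg: Graph = {
    "s0": {"s1"},
    "s1": {"s2", "s3"},
    "s2": {"s4"},
    "s3": {"s4"},
    "s4": set(),
}
# taints[v] is the set of sources whose data reaches statement v.
taints = {"s0": {"a"}, "s1": {"a"}, "s2": {"a", "b"}, "s4": {"b"}}
S = taint_subgraphs(cfg, taints)
```

Under this reading of d, the bound d(Pn) ≤ d(G) holds by construction, because every simple path in an induced subgraph is also a simple path in G.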
Shotgun parser indicators:
→ Indicates input n not handled in principled manner
Large |S| → Evidence for presence of multiple shotgun parsers in A
an arbitrary piece of code is “fully recognized”
handling of specific data types
For example:
O, you must do 5 reads of 4 bytes each, then write 20 bytes in a specific
memory events which take place after input is received
Let Pn now be a weighted graph, where the weight of each edge E(x, y) is the number of variables tainted by n after node x
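The weighting can be sketched as follows, assuming a taint analysis has already produced, for each node, the set of variables tainted by n after that node. The graph representation, node labels, and variable names are all hypothetical.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # vertex -> set of successor vertices
Edge = Tuple[str, str]


def weight_edges(p_n: Graph, tainted_after: Dict[str, Set[str]]) -> Dict[Edge, int]:
    """Weight E(x, y) with the number of variables tainted after node x."""
    return {
        (x, y): len(tainted_after.get(x, set()))
        for x, successors in p_n.items()
        for y in successors
    }


# Hypothetical taint-induced subgraph for one source, plus the variables
# tainted by that source after each statement.
p_a: Graph = {"s0": {"s1"}, "s1": {"s2"}, "s2": set()}
tainted_after = {
    "s0": {"buf"},
    "s1": {"buf", "len", "tag"},
    "s2": {"buf", "len", "tag"},
}
weights = weight_edges(p_a, tainted_after)
```

Here the edge leaving `s0` carries weight 1 and the edge leaving `s1` carries weight 3, since two more variables become tainted at `s1`.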
Shotgun parser indicators:
Large edge weights compared to total number of variables → Indicates untrusted input affects significant proportion of program state
Areas of Pn where edge weight increases may merit further study → Allows us to triage program statements / methods for further analysis
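One way such a triage pass might look, given edge weights that count tainted variables. Both helper names and the sample weights are assumptions for illustration, and the ranking rule (largest jump in weight first) is just one plausible choice.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def peak_taint_ratio(weights: Dict[Edge, int], total_variables: int) -> float:
    """Largest edge weight as a fraction of all program variables."""
    if not weights or total_variables <= 0:
        return 0.0
    return max(weights.values()) / total_variables


def rising_nodes(weights: Dict[Edge, int]) -> List[Tuple[str, int]]:
    """Nodes where an outgoing edge outweighs an incoming one, biggest jump first."""
    jumps: Dict[str, int] = {}
    for (_, y), w_in in weights.items():
        for (x2, _), w_out in weights.items():
            if x2 == y and w_out > w_in:
                jumps[y] = max(jumps.get(y, 0), w_out - w_in)
    return sorted(jumps.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical weights along one tainted path.
sample_weights = {
    ("s0", "s1"): 1,
    ("s1", "s2"): 3,
    ("s2", "s3"): 3,
    ("s3", "s4"): 4,
}
candidates = rising_nodes(sample_weights)
ratio = peak_taint_ratio(sample_weights, total_variables=5)
```

The statements at the head of `candidates` are where taint spreads to the most additional variables, so they would be the first to inspect by hand.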
The “worst case” shotgun parser exhibits all three properties in abundance!
applications
through the LangSec lens
Jimple CFG for one module of the classic game “Snake”
statement-level control flow graphs
path corresponding to each source
intermediate representation
framework for Android
Software Engineering Group at Paderborn University / TU Darmstadt
https://blogs.uni-paderborn.de/sse/tools/flowdroid/
We Add:
paths, not only those terminating in a sink
each taint source
each taint
handler functions to measure input path length
Each time a taint is propagated, our custom handler is invoked:
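As a language-neutral illustration of that callback pattern, the sketch below counts propagation steps per taint source. It is not FlowDroid code and does not use FlowDroid's API: the class, the `run_analysis` driver, and the sample flows are all hypothetical stand-ins.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple


class PathLengthHandler:
    """Hypothetical handler: records how far each source's taint travels."""

    def __init__(self) -> None:
        self.steps: DefaultDict[str, int] = defaultdict(int)
        self.trace: DefaultDict[str, List[str]] = defaultdict(list)

    def on_propagate(self, source: str, statement: str) -> None:
        # Called once per propagation event for the given taint source.
        self.steps[source] += 1
        self.trace[source].append(statement)


def run_analysis(flows: Iterable[Tuple[str, str]], handler: PathLengthHandler) -> None:
    """Stand-in for the analysis engine: invoke the handler on every propagation."""
    for source, statement in flows:
        handler.on_propagate(source, statement)


# Hypothetical propagation events as (taint source, statement reached).
flows = [
    ("src_a", "r1 = read()"),
    ("src_a", "r2 = r1"),
    ("src_b", "r5 = recv()"),
    ("src_a", "sink(r2)"),
]
handler = PathLengthHandler()
run_analysis(flows, handler)
```

After the run, `handler.steps` holds the number of propagation events seen per source, and `handler.trace` keeps the statements in the order they were reached.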
Observations:
(attempted) validation, but input is also passed elsewhere
different functions before being read into a native data structure
Observations:
before processing, but not used along the way
intermixing, however...
payload!
Parsing Done Right!
the Ragel compiler)
based on our definition for other software ecosystems
for common types (characterize “recognition”)
vulnerabilities
We gratefully acknowledge Steven Arzt from the Secure Software Engineering Group at TU Darmstadt for his ongoing assistance with technical questions about FlowDroid via the Soot mailing list.
parsers...and not all shotgun parsers are necessarily vulnerable
not just an issue of attack surface, but being error-prone
to do the parsing - why aren’t you validating as soon as data enters your software?
→ FlowDroid dummy main method - necessary due to Android Lifecycle
→ Jimple is an intermediate representation
memory intensive!
→ And we had time constraints...