SLIDE 1

| 1 Sébastien Bardin -- ISSISP 2017

BINARY-LEVEL SECURITY: SEMANTIC ANALYSIS TO THE RESCUE

Sébastien Bardin (CEA LIST) Joint work with Richard Bonichon, Robin David, Adel Djoudi & many other people

SLIDE 2

ABOUT MY LAB @CEA

SLIDE 3

IN A NUTSHELL

  • Binary-level security analysis: many applications, many challenges
  • Standard techniques (dynamic, syntactic) are not enough
  • Formal methods can help … but must be strongly adapted
  • [Complement existing methods]
  • Need robustness, precision and scalability!
  • Acceptable to lose both correctness & completeness, in a controlled way
  • New challenges and variations, many things to do!
  • A tour of how formal methods can help
  • Explore and discover -- with Josselin Feist
  • Prove infeasibility or validity -- with Robin David
  • Simplify (not covered here) -- with Jonathan Salwan

SLIDE 4

OUTLINE

  • Why binary-level analysis?
  • Some background on source-level formal methods
  • The hard journey from source to binary
  • A few case-studies
  • Conclusion

Focus mostly on symbolic execution; give hints for abstract interpretation.

Cover both:

  • vulnerability detection
  • deobfuscation

SLIDE 5

OUTLINE

  • Why binary-level analysis?
  • Some background on source-level formal methods
  • The hard journey from source to binary
  • A few case-studies
  • Conclusion

SLIDE 6

BENEFITS

  • No source code
  • More precise analysis
  • Malware

What for: vulnerabilities, reverse engineering (malware, legacy code), protection evaluation, etc.

SLIDE 7

EXAMPLE: COMPILER BUG

Our goal here:

  • Check the code after compilation

SLIDE 8

EXAMPLE: MALWARE COMPREHENSION

The day after: malware comprehension

  • understand what has been going on
  • mitigate, fix and clean
  • improve defense

Highly challenging [obfuscation]

APT: highly sophisticated attacks

  • Targeted malware
  • Written by experts
  • Attack: 0-days
  • Defense: stealth, obfuscation
  • Sponsored by states or mafia

USA elections: DNC Hack

SLIDE 9

CHALLENGE: CORRECT DISASSEMBLY

Basic reverse problem

  • aka model recovery
  • aka CFG recovery

SLIDE 10

CAN BE TRICKY!

  • code vs. data
  • dynamic jumps (jmp eax)

SLIDE 11

STATE-OF-THE-ART TOOLS ARE NOT ENOUGH

  • Static (syntactic): too fragile
  • Dynamic: too incomplete

Just adding the junk pair mov %eax,%ecx; mov %ecx,%eax is enough to break the results

SLIDE 12

[See later] CAN BECOME A NIGHTMARE WHEN OBFUSCATED

SLIDE 13

EXAMPLE: VULNERABILITY DETECTION

Find vulnerabilities before the bad guys

  • On the whole program
  • At binary-level
  • Know only the entry point and program input format

SLIDE 14

EXAMPLE: VULNERABILITY DETECTION

SLIDE 15

CHALLENGE: In-depth exploration (example: use after free)

Dynamic: not enough

  • Too incomplete

SLIDE 16

BONUS: (MULTI-)ARCHITECTURE SUPPORT

SLIDE 17

THE SITUATION

  • Binary-level security analysis is necessary
  • Binary-level security analysis is highly challenging (*)
  • Standard tools are not enough; experts need better help!
  • Static (syntactic): too fragile
  • Dynamic: too incomplete

(*) i.e., more challenging than source code analysis

SLIDE 18

SOLUTION? BINARY-LEVEL SEMANTIC ANALYSIS

Semantics is preserved by compilation or obfuscation

Can reason about sets of executions

SLIDE 19

OUTLINE

  • Why binary-level analysis?
  • Some background on source-level formal methods
  • The hard journey from source to binary
  • A few case-studies
  • Conclusion

SLIDE 20

BACK IN TIME: THE SOFTWARE CRISIS (1969)

SLIDE 21

ABOUT FORMAL METHODS

Success in safety-critical domains

SLIDE 22

A DREAM COME TRUE … IN CERTAIN DOMAINS

SLIDE 23

A DREAM COME TRUE … IN CERTAIN DOMAINS (2)

SLIDE 24

OVERVIEW OF FORMAL METHODS

Semantics

  • Precise meaning for the domain of evaluation and the effect of instructions
  • Operational semantics = « interpreter »

Properties

  • From invariants / reachability to safety / liveness / hyper-properties / …
  • On software: mostly invariants and reachability

Algorithms:

  • Historically: weakest precondition, abstract interpretation, model checking
  • Correctness: the analysis explores only behaviors of interest
  • Completeness: the analysis explores at least all behaviors of interest
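
The idea that an operational semantics is an « interpreter » can be made concrete in a few lines of Python. This is a minimal sketch for a hypothetical toy language (assignment, sequence, if, while) encoded as nested tuples; the encoding and function names are illustrative assumptions, not taken from the talk.

```python
# Toy language, encoded as tuples (hypothetical encoding):
#   expressions: ("const", n) | ("var", name) | ("add", e1, e2) | ("lt", e1, e2)
#   statements:  ("assign", name, e) | ("seq", s1, s2) | ("if", e, s1, s2) | ("while", e, s)

def eval_expr(e, env):
    """Value of expression e in environment env."""
    tag = e[0]
    if tag == "const":
        return e[1]
    if tag == "var":
        return env[e[1]]
    if tag == "add":
        return eval_expr(e[1], env) + eval_expr(e[2], env)
    if tag == "lt":
        return eval_expr(e[1], env) < eval_expr(e[2], env)
    raise ValueError(f"unknown expression {tag!r}")

def exec_stmt(s, env):
    """Environment reached after executing statement s from env."""
    tag = s[0]
    if tag == "assign":
        return {**env, s[1]: eval_expr(s[2], env)}
    if tag == "seq":
        return exec_stmt(s[2], exec_stmt(s[1], env))
    if tag == "if":
        return exec_stmt(s[2] if eval_expr(s[1], env) else s[3], env)
    if tag == "while":
        while eval_expr(s[1], env):
            env = exec_stmt(s[2], env)
        return env
    raise ValueError(f"unknown statement {tag!r}")

# i := 0; while i < 10: i := i + 1
prog = ("seq",
        ("assign", "i", ("const", 0)),
        ("while", ("lt", ("var", "i"), ("const", 10)),
         ("assign", "i", ("add", ("var", "i"), ("const", 1)))))
print(exec_stmt(prog, {}))  # {'i': 10}
```

Since exec_stmt returns a fresh environment, it is literally the state-transformer meaning of a statement; the analyses below can be seen as variations of this interpreter over sets of states.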

SLIDE 25

OVERVIEW OF FORMAL METHODS

Trends:

  • Frontiers between techniques are disappearing
  • Master abstraction (correct xor complete)
  • Reduction to logic
  • Sweet spots

Next:

  • AI: complete (can prove invariants) -- 1977
  • DSE: correct (can find bugs) -- 2005

Both are representative, with industrial successes at source level; their adaptation to binary code leads to very different situations.

SLIDE 26

ABSTRACT INTERPRETATION

SLIDE 27

ABSTRACT INTERPRETATION IN PRACTICE

skip

SLIDE 28

ABSTRACT INTERPRETATION IN PRACTICE

Key points:

  • Infinite data: abstract domain
  • Path explosion: merge
  • Loops: widening

In practice:

  • Tradeoff between cost and precision
  • Tradeoff between generic & dedicated domains

It is sometimes simple and useful:

  • taint, pointer nullness, typing

Big successes: Astrée, Frama-C, Clousot
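
The three key points (abstract domain, merge, widening) can be illustrated with the classic interval domain. The following is a minimal Python sketch, not the algorithm of any specific tool; the analyzed loop and all function names are hypothetical. It also performs one narrowing step, a standard companion of widening that the slide does not mention.

```python
import math

INF = math.inf

# Abstract values are intervals (lo, hi); None is bottom (no reachable value).

def join(a, b):
    """Merge point: smallest interval containing both arguments."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    """Widening: any bound that is still growing jumps to infinity."""
    if old is None:
        return new
    if new is None:
        return old
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)

def narrow(old, new):
    """Narrowing: refine only the infinite bounds introduced by widening."""
    lo = new[0] if old[0] == -INF else old[0]
    hi = new[1] if old[1] == INF else old[1]
    return (lo, hi)

def below(a, c):
    """Abstract filter for the loop guard i < c."""
    if a is None or a[0] >= c:
        return None
    return (a[0], min(a[1], c - 1))

def at_least(a, c):
    """Abstract filter for the exit condition i >= c."""
    if a is None or a[1] < c:
        return None
    return (max(a[0], c), a[1])

def shift(a, k):
    """Abstract transfer function for i = i + k."""
    return None if a is None else (a[0] + k, a[1] + k)

def analyze(bound=100):
    """Intervals for: i = 0; while i < bound: i = i + 1 (loop head, then exit)."""
    init = (0, 0)
    step = lambda head: join(init, shift(below(head, bound), 1))
    head = init
    while True:                          # ascending iterations with widening
        nxt = widen(head, step(head))
        if nxt == head:
            break
        head = nxt
    widened = head
    head = narrow(head, step(head))      # one narrowing step recovers precision
    return widened, head, at_least(head, bound)

print(analyze())  # ((0, inf), (0, 100), (100, 100))
```

Widening guarantees termination in two iterations regardless of the loop bound, at the price of the imprecise invariant 0 ≤ i; the narrowing step then recovers 0 ≤ i ≤ 100 at the loop head and i = 100 at the exit.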

SLIDE 29

DYNAMIC SYMBOLIC EXECUTION (DSE, Godefroid 2005)

Perfect for intensive testing

  • Correct, relatively complete
  • No false alarm
  • Robust
  • Scales in some ways

// incomplete

SLIDE 30

DSE: PATH PREDICATE COMPUTATION (DSE, Godefroid 2005)

SLIDE 31

DSE: GLOBAL PROCEDURE (DSE, Godefroid 2005)
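
The path predicate computation and the global procedure can be sketched together in Python, under strong simplifying assumptions. The program is a hypothetical function whose branch conditions are recorded by hand (a real engine derives the path predicate from the executed instructions), a brute-force search over 8-bit inputs stands in for the SMT solver, and pending inputs are explored in LIFO order.

```python
from itertools import product

MASK = 0xFF  # hypothetical 8-bit inputs

def program(a, b, trace):
    """Hypothetical program under test; appends (condition, outcome) per branch."""
    c1 = lambda x, y: x > 100
    trace.append((c1, c1(a, b)))
    if c1(a, b):
        c2 = lambda x, y: ((x + y) & MASK) == 0x2A
        trace.append((c2, c2(a, b)))
        if c2(a, b):
            return "bug"
    return "ok"

def solve(constraints):
    """Brute-force stand-in for an SMT solver over two 8-bit inputs."""
    for a, b in product(range(MASK + 1), repeat=2):
        if all(cond(a, b) == outcome for cond, outcome in constraints):
            return a, b
    return None

def dse(seed):
    """Run, collect the path predicate, negate one branch, solve, repeat."""
    worklist, tried, results = [seed], set(), {}
    while worklist:
        inputs = worklist.pop()
        trace = []
        results[inputs] = program(*inputs, trace)
        tried.add(tuple(outcome for _, outcome in trace))   # path just executed
        for i in range(len(trace)):
            # Keep the first i branch outcomes and flip the next one.
            target = trace[:i] + [(trace[i][0], not trace[i][1])]
            key = tuple(outcome for _, outcome in target)
            if key in tried:
                continue
            tried.add(key)
            model = solve(target)
            if model is not None:
                worklist.append(model)
    return results

print(dse((0, 0)))  # {(0, 0): 'ok', (101, 0): 'ok', (101, 197): 'bug'}
```

Starting from the seed (0, 0), each new input is obtained by solving a path predicate with one branch negated, so every execution follows a real path (no false alarm), and the bug is reached after three runs.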

SLIDE 32

ABOUT ROBUSTNESS (imo, the major advantage)

« Concretization »

  • Keep going when symbolic reasoning fails
  • Tune the tradeoff between genericity and cost

SLIDE 33

DSE

Three key ingredients

  • Path predicate & solving
  • Path enumeration
  • C/S policy

Limits

  • #paths -> better heuristics (?), state merging, distributed search, path pruning, adaptation to coverage objectives, etc.
  • solving cost -> preprocessing, caching, incremental solving, aggressive concretization (good?) [wait for better solvers]
  • Preconditions/postconditions/advanced stubs

SLIDE 34

DSE: PATH PREDICATE MAY BE COMPLICATED

SLIDE 35

DSE: SEARCH

  • Search heuristics matter
  • But no good choice (hint: DFS is often the worst)
  • The engine must provide flexibility

SLIDE 36

DSE: SEARCH (2)

Generic engine

  • Score each active prefix
  • Pick the best & expand
  • Easy encoding of many heuristics
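
A minimal Python sketch of such a generic engine, assuming a priority queue of active prefixes ordered by a user-supplied score. The tree of branch decisions and the two scoring functions are hypothetical examples; switching heuristics only means switching the score function.

```python
import heapq
from itertools import count

def explore(root, children, score, budget):
    """Generic search: always expand the best-scored active prefix.

    score(prefix) encodes the heuristic (lower is better); children(prefix)
    returns the extensions of a prefix; budget bounds the number of expansions.
    """
    tie = count()                                # FIFO order among equal scores
    active = [(score(root), next(tie), root)]
    expanded = []
    while active and len(expanded) < budget:
        _, _, prefix = heapq.heappop(active)     # pick the best
        expanded.append(prefix)
        for child in children(prefix):           # expand
            heapq.heappush(active, (score(child), next(tie), child))
    return expanded

def children(prefix):
    """Hypothetical program: every path goes through exactly 3 two-way branches."""
    return [] if len(prefix) == 3 else [prefix + (0,), prefix + (1,)]

dfs = lambda prefix: -len(prefix)    # deepest prefix first
bfs = lambda prefix: len(prefix)     # shallowest prefix first

print(explore((), children, dfs, 4))  # [(), (0,), (0, 0), (0, 0, 0)]
print(explore((), children, bfs, 4))  # [(), (0,), (1,), (0, 0)]
```

Under this design, a coverage-oriented heuristic would simply be another score function over prefixes, with no change to the engine itself.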

SLIDE 37

C/S POLICIES

SLIDE 38

C/S POLICIES (2)

  • C/S policy matters
  • But no good choice
  • The engine must provide flexibility

SLIDE 39

C/S POLICIES (3)

Generic engine

  • C/S specification
  • DSE parametrized by C/S
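
A C/S policy decides, for each value, whether it stays symbolic or is concretized to its runtime value. The minimal Python sketch below shows only the solving step of a DSE engine parametrized by such a policy; the path predicate, the input names a and b, and both policies are hypothetical, and a brute-force search stands in for the SMT solver.

```python
from itertools import product

# Hypothetical path predicate for reaching a target branch (8-bit inputs a, b).
PATH_TO_TARGET = [
    lambda env: env["a"] > 100,
    lambda env: ((env["a"] + env["b"]) & 0xFF) == 0x2A,
]

def solve(constraints, names, concrete, policy):
    """Brute-force stand-in for an SMT solver, parametrized by a C/S policy.

    policy(name) -> True keeps the input symbolic; False concretizes it,
    i.e. pins it to its value in the current concrete run.
    """
    domains = [range(256) if policy(n) else [concrete[n]] for n in names]
    for values in product(*domains):
        env = dict(zip(names, values))
        if all(c(env) for c in constraints):
            return env
    return None

concrete_run = {"a": 0, "b": 0}            # values observed in the current run
all_symbolic = lambda name: True
concretize_b = lambda name: name != "b"    # e.g. "b is too costly to track"

print(solve(PATH_TO_TARGET, ["a", "b"], concrete_run, all_symbolic))  # {'a': 101, 'b': 197}
print(solve(PATH_TO_TARGET, ["a", "b"], concrete_run, concretize_b))  # None
```

Concretizing b makes the query cheaper (256 candidates instead of 65,536) but here discards the only solution: the policy trades completeness for cost, which is why the engine should let the user specify it.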

SLIDE 40

OUTLINE

  • Why binary-level analysis?
  • Some background on source-level formal methods
  • The hard journey from source to binary
  • A few case-studies
  • Conclusion

SLIDE 41

NOW: BINARY-LEVEL SECURITY

SLIDE 42

THE HARD JOURNEY FROM SOURCE TO BINARY

Wanted

  • robustness
  • precision
  • scale

SLIDE 43

ADAPTING DSE and AI to BINARY: two very different stories

Problems

  • Low-level control: jump eax
  • Low-level data: memory
  • Low-level data: flags

Problem solved: multi-architecture

  • rely on some IR

DSE is quite easy to adapt

  • thanks to SMT solvers (arrays + bitvectors)
  • thanks to concretization
  • yet, performance degrades

AI is much more complicated

  • Even for « normal » code
  • btw, cannot expect better than source-level precision
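
Low-level memory is one reason binary-level DSE relies on SMT arrays and bitvectors: memory is a flat map from addresses to bytes, and wider accesses are assembled from bytes. The concrete Python sketch below (hypothetical class and method names) shows what this buys: overlapping accesses of different sizes, which source-level variables never exhibit, are handled uniformly at byte granularity.

```python
MASK32 = 0xFFFFFFFF

class ByteMemory:
    """Flat memory: a map from 32-bit addresses to bytes (the 'array' view).

    32-bit accesses are split into 4 byte accesses, little-endian as on x86.
    Unwritten bytes read as 0 here; a symbolic engine would leave them free.
    """

    def __init__(self):
        self.bytes = {}

    def store32(self, addr, value):
        for i in range(4):
            self.bytes[(addr + i) & MASK32] = (value >> (8 * i)) & 0xFF

    def load32(self, addr):
        return sum(self.bytes.get((addr + i) & MASK32, 0) << (8 * i) for i in range(4))

    def load8(self, addr):
        return self.bytes.get(addr & MASK32, 0)

mem = ByteMemory()
mem.store32(0x1000, 0x11223344)
mem.store32(0x1002, 0xAABBCCDD)    # overlaps the upper half of the first store
print(hex(mem.load32(0x1000)))     # 0xccdd3344
print(hex(mem.load8(0x1003)))      # 0xcc
```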

SLIDE 44

FULL DISCLOSURE: the BINSEC tool

Still very young!

Semantic analysis for binary-level security

  • Help make sense of binary
  • more robust than syntactic
  • more exhaustive than dynamic

Some features

  • Help to recover a simple model
  • Identify feasible events (+ input)
  • Identify infeasible events (eg, protections)
  • Multi-architecture

SLIDE 45

UNDER THE HOOD

SLIDE 46

INTERMEDIATE REPRESENTATION

  • Concise
  • Well-defined
  • Clear, side-effect free
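
To illustrate what a concise, side-effect-free IR buys, here is a hypothetical Python sketch that lifts the x86 instruction add dst, src into single-assignment statements, making the implicit flag updates explicit. The statement format and names are assumptions for illustration; this is not BINSEC's actual IR.

```python
MASK = 0xFFFFFFFF   # 32-bit registers
SIGN = 0x80000000   # sign bit

def lift_add(dst, src):
    """Hypothetical lifting of x86 `add dst, src` to a toy IR.

    Each IR statement assigns exactly one location from a pure expression of
    the current state; the implicit flag updates become explicit statements.
    (AF and PF are omitted to keep the sketch short.)
    """
    return [
        ("tmp", lambda s: (s[dst] + s[src]) & MASK),
        ("CF", lambda s: int(s[dst] + s[src] > MASK)),
        ("OF", lambda s: int((~(s[dst] ^ s[src]) & (s[dst] ^ s["tmp"]) & SIGN) != 0)),
        ("ZF", lambda s: int(s["tmp"] == 0)),
        ("SF", lambda s: int((s["tmp"] & SIGN) != 0)),
        (dst, lambda s: s["tmp"]),
    ]

def run(ir, state):
    """Interpret IR statements in order; every step returns a fresh state."""
    for loc, expr in ir:
        state = {**state, loc: expr(state)}
    return state

out = run(lift_add("eax", "ebx"), {"eax": 0x7FFFFFFF, "ebx": 1})
print(hex(out["eax"]), out["CF"], out["OF"], out["ZF"], out["SF"])  # 0x80000000 0 1 0 1
```

Because dst is written last, the flag statements still see the original operands, and every effect of the instruction is visible to later analyses instead of being hidden in the instruction's semantics.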

SLIDE 47

INTERMEDIATE REPRESENTATION + simplifications

  • IR level
  • machine-instruction level
  • program level
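
One typical simplification removes flag computations that are overwritten before being read. The removal is only sound because IR statements are side-effect free. Here is a minimal Python sketch, assuming a straight-line block where each statement is summarized by the location it writes and the locations it reads (a hypothetical encoding); the instruction sequence and the fact that eax is still used after the block are also assumptions.

```python
def eliminate_dead(block, live_out):
    """Drop IR assignments whose result is never read (backward liveness).

    block:    straight-line list of (written_location, set_of_read_locations)
    live_out: locations still needed after the block
    Dropping is safe only because IR statements have no hidden side effects.
    """
    live, kept = set(live_out), []
    for dst, reads in reversed(block):
        if dst in live:
            kept.append((dst, reads))
            live = (live - {dst}) | set(reads)
    kept.reverse()
    return kept

# Hypothetical lifting of: add eax, ebx ; add eax, ecx ; jz target
block = [
    ("CF", {"eax", "ebx"}), ("eax", {"eax", "ebx"}), ("ZF", {"eax"}), ("SF", {"eax"}),
    ("CF", {"eax", "ecx"}), ("eax", {"eax", "ecx"}), ("ZF", {"eax"}), ("SF", {"eax"}),
]
simplified = eliminate_dead(block, live_out={"eax", "ZF"})   # jz reads ZF
print([dst for dst, _ in simplified])  # ['eax', 'eax', 'ZF']
```

Eight statements shrink to three, which directly reduces the size of the formulas later sent to the solver.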

SLIDE 48

BINARY-LEVEL DSE (Godefroid)

For deobfuscation

  • find new real paths
  • robust
  • still incomplete

« dynamic analysis on steroids »

SLIDE 49

DSE COMPLEMENTS DYNAMIC ANALYSIS

SLIDE 50

IN PRACTICE

Can recover useful semantic information

  • More precise disassembly
  • Exact semantics of instructions
  • Input of interest

SLIDE 51

ABSTRACT INTERPRETATION IS VERY VERY HARD ON BINARY CODE

Problems

  • Jump eax
  • memory
  • Bit reasoning

SLIDE 52

ISSUE: GLOBAL MEMORY

Problems

  • Jump eax
  • memory
  • Bit reasoning
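
Why global memory is a problem for abstract interpretation: when the analysis cannot pin down the target of a store, it must assume the store may hit any cell, including code pointers, and precision quickly collapses. Here is a minimal Python sketch with a hypothetical finite-set value domain, showing the standard strong update, weak update and unknown-target cases; addresses and values are made up for illustration.

```python
def store(memory, addresses, value):
    """Abstract store in a memory mapping each address to a set of possible values.

    addresses: set of possible target addresses, or None when unknown (top).
    value:     set of possible values written.
    """
    if addresses is None:
        # Unknown target: every tracked cell may have been overwritten.
        return {a: vals | value for a, vals in memory.items()}
    if len(addresses) == 1:
        # Single known target: strong update, the old contents are replaced.
        (a,) = addresses
        return {**memory, a: set(value)}
    # Several possible targets: weak update, old and new values are merged.
    updated = dict(memory)
    for a in addresses:
        updated[a] = memory.get(a, set()) | value
    return updated

# Hypothetical memory: 0x2000 holds a code pointer later used by `jmp [0x2000]`.
mem = {0x1000: {1}, 0x2000: {0x401000}, 0x3000: {2}}
strong = store(mem, {0x1000}, {7})          # only 0x1000 changes, now {7}
weak = store(mem, {0x1000, 0x2000}, {7})    # 0x2000 may now also hold 7
top = store(mem, None, {7})                 # every cell, 0x3000 included, may hold 7
print(sorted(top[0x2000]))  # [7, 4198400]
```

After the unknown store, the jump through 0x2000 has a bogus extra target, so the imprecision on data immediately becomes imprecision on control flow (the jump eax problem).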

SLIDE 53

ISSUE: LACK of HIGH-LEVEL STRUCTURE

High-level conditions are translated into low-level flag predicates: conditions are on flags, not on registers (nor on the stack)

Problems

  • Jump eax
  • memory
  • Bit reasoning

SLIDE 54

LOW-LEVEL CONDITIONS

SLIDE 55

LOW-LEVEL CONDITIONS

SLIDE 56

SOLUTIONS?

  • Precision refinement [Brauer, 2011]
  • Degraded mode [Kinder, 2012]

SLIDE 57

SOLUTIONS? (2)

SLIDE 58

HIGH-LEVEL CONDITION RECOVERY
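
High-level condition recovery amounts to proving that a flag predicate is equivalent to a comparison on the original operands. For instance, after cmp a, b, the x86 jle tests ZF or (SF != OF), which is the signed a <= b, and jbe tests CF or ZF, which is the unsigned a <= b. The Python sketch below checks both equivalences exhaustively on 8-bit values. A real analysis would rather discharge them symbolically, for instance with an SMT solver, and then rewrite the condition. The function names are hypothetical.

```python
def flags_after_cmp(a, b, bits=8):
    """ZF, SF, OF, CF as set by `cmp a, b`, i.e. by the subtraction a - b."""
    mask, sign = (1 << bits) - 1, 1 << (bits - 1)
    res = (a - b) & mask
    zf = res == 0
    sf = (res & sign) != 0
    of = ((a ^ b) & (a ^ res) & sign) != 0   # operand signs differ, result sign flips
    cf = a < b                               # unsigned borrow
    return zf, sf, of, cf

def to_signed(x, bits=8):
    """Two's complement interpretation of an unsigned bits-wide value."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

def recovered_conditions_hold(bits=8):
    """Exhaustively check two flag predicates against high-level comparisons."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            zf, sf, of, cf = flags_after_cmp(a, b, bits)
            if (zf or sf != of) != (to_signed(a, bits) <= to_signed(b, bits)):
                return False                 # jle: ZF or SF != OF  vs  signed a <= b
            if (cf or zf) != (a <= b):
                return False                 # jbe: CF or ZF  vs  unsigned a <= b
    return True

print(recovered_conditions_hold())  # True
```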

SLIDE 59

STATIC ANALYSIS IN BINSEC: AN OVERVIEW

SLIDE 60

OVERVIEW

                     Correct   Complete   Efficient   Robust
  Static syntactic   X         X / --     OK          X
  Dynamic            OK        XX         OK          OK
  DSE                OK        -          X           OK
  Static semantic    X         OK / X     X           X

SLIDE 61

OUTLINE

  • Why binary-level analysis?
  • Some background on source-level formal methods
  • The hard journey from source to binary
  • A few case-studies
  • Conclusion

SLIDE 62

APPLICATION: VULNERABILITY DETECTION

Find vulnerabilities before the bad guys

  • On the whole program
  • At binary-level
  • Know only the entry point and program input format

SLIDE 63

APPLICATION: VULNERABILITY DETECTION

Many successful applications of pure DSE

  • SAGE @ Microsoft
  • Mayhem/VeriT @ ForallSecure
  • cf. Cyber Grand Challenge

SLIDE 64

APPLICATION: VULNERABILITY DETECTION [SSPREW 2016, with VERIMAG]

Here:

  • Focus on use-after-free
  • Combine static and DSE

SLIDE 65

KEY IDEAS (Josselin Feist)

A pragmatic 2-step approach

  • Static: scale, not complete, not correct
  • Symbolic: correct, directed by static
  • Combination: scalable and correct
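
The 2-step idea can be illustrated on a toy example. In this minimal Python sketch, a cheap static pass ignores branch conditions and over-approximates use-after-free candidates, then a symbolic check keeps only the candidates whose path condition is satisfiable. The CFG, guards and event labels are hypothetical, and a brute-force search stands in for the symbolic engine. This is not the actual analysis from the SSPREW 2016 work, which operates on binaries.

```python
# Hypothetical acyclic CFG: node -> [(successor, guard on the 8-bit input x)].
CFG = {
    "entry": [("A", lambda x: True)],
    "A": [("F", lambda x: x > 200), ("G", lambda x: x <= 200)],
    "F": [("B", lambda x: True)],
    "G": [("B", lambda x: True)],
    "B": [("U", lambda x: x < 100), ("exit", lambda x: x >= 100)],
    "U": [("exit", lambda x: True)],
    "exit": [],
}
EVENTS = {"A": "alloc", "F": "free", "G": "free", "U": "use"}

def paths(node="entry", prefix=()):
    """Enumerate entry-to-exit paths as tuples of (node, guard) steps."""
    if not CFG[node]:
        yield prefix
    for succ, guard in CFG[node]:
        yield from paths(succ, prefix + ((succ, guard),))

def static_candidates():
    """Step 1, cheap and over-approximate: guards ignored, keep alloc -> free -> use."""
    for path in paths():
        events = [EVENTS[n] for n, _ in path if n in EVENTS]
        if events == ["alloc", "free", "use"]:
            yield path

def confirm(path):
    """Step 2, precise: an input satisfying every guard on the path, or None."""
    return next((x for x in range(256) if all(guard(x) for _, guard in path)), None)

for path in static_candidates():
    print([n for n, _ in path], confirm(path))
# ['A', 'F', 'B', 'U', 'exit'] None   <- spurious candidate, discarded
# ['A', 'G', 'B', 'U', 'exit'] 0      <- confirmed, with a triggering input
```

The static pass alone would report two alarms (one false); the symbolic check alone would have to explore every path; combined, only the confirmed candidate is reported, together with a triggering input.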

SLIDE 66

EXPERIMENTAL EVALUATION

On these examples:

  • Better than DSE alone
  • Better than blackbox fuzzing
  • Better than greybox fuzzing with no seed

SLIDE 67

APPLICATION: MALWARE DEOBFUSCATION [S&P 2017, with LORIA]

The day after: malware comprehension

  • understand what has been going on
  • mitigate, fix and clean
  • improve defense

Goal: help malware comprehension

  • Reverse of heavily obfuscated code
  • Identify and simplify protections

APT: highly sophisticated attacks

  • Targeted malware
  • Written by experts
  • Attack: 0-days
  • Defense: stealth, obfuscation
  • Sponsored by states or mafia

USA elections: DNC Hack

SLIDE 68

REVERSE CAN BECOME A NIGHTMARE (OBFUSCATION)

Obfuscation: make code hard to reverse

  • self-modification
  • encryption
  • virtualization
  • code overlapping
  • opaque predicates
  • callstack tampering

Goal: help malware comprehension

  • Identify and simplify protections
  • Ideal = revert protections

SLIDE 69

EXAMPLE: OPAQUE PREDICATE

Constant-value predicates (always true, always false)

  • dead branch points to spurious code
  • goal = waste the reverser's time & effort
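
A textbook arithmetic opaque predicate is "x * (x + 1) mod 2^n is even". The product of two consecutive integers is always even, and reduction modulo a power of two preserves parity, so the else-branch can only lead to spurious code. The Python sketch below checks this exhaustively for a hypothetical 16-bit machine. The predicate is a classic illustration, not necessarily one of those found in the samples discussed later.

```python
def opaque(x, bits=16):
    """x * (x + 1) computed with n-bit wrap-around; its low bit is always 0."""
    mask = (1 << bits) - 1
    prod = (x * ((x + 1) & mask)) & mask
    return (prod & 1) == 0

def else_branch_reachable(bits=16):
    """Exhaustive check over all n-bit inputs (only feasible for small n)."""
    return any(not opaque(x, bits) for x in range(1 << bits))

print(else_branch_reachable())  # False
```

Testing alone only ever sees the true branch and cannot distinguish "dead" from "not yet covered", while exhaustive enumeration does not scale to 32- or 64-bit inputs.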

SLIDE 70

EXAMPLE: STACK TAMPERING

Alter the standard compilation scheme: ret does not go back to the call site

  • hide the real target
  • return site may be spurious code

SLIDE 71

STANDARD DISASSEMBLY TECHNIQUES ARE NOT ENOUGH

Static analysis

  • too fragile vs obfuscation
  • junk instructions, missed instructions

Dynamic analysis

  • robust vs obfuscation
  • too incomplete

SLIDE 72

DYNAMIC SYMBOLIC EXECUTION CAN HELP (Debray, Kruegel, …)

For deobfuscation

  • find new real paths
  • robust
  • still incomplete

« dynamic analysis on steroids »

SLIDE 73

YET … WHAT ABOUT INFEASIBILITY QUESTIONS?

Prove that something is always true (resp. false)

Many such issues in reverse

  • is a branch dead?
  • does the ret always return to the call?
  • have I found all targets of a dynamic jump?

And more

  • does this malicious ret always go there?
  • does this expression always evaluate to 15?
  • does this self-modification always write this opcode?
  • does this self-modification always rewrite this instr.?

Not addressed by DSE

  • Cannot enumerate all paths

SLIDE 74

OUR PROPOSAL: BACKWARD-BOUNDED SYMBOLIC EXECUTION

Insight 1: symbolic reasoning

  • precision
  • But: need finite #paths

Insight 2: backward-bounded

  • pre_k(c) = ∅ (unsatisfiable) => c is infeasible
  • finite #paths
  • efficient, depends on k
  • But: backward on jump eax?

Insight 3: dynamic partial CFG

  • solve (partially) dyn. jumps
  • robustness

False negative (FN)

  • can miss infeasibility
  • why: k too small (misses conjunctive /\-constraints)

False positive (FP)

  • wrongly asserts infeasibility
  • why: CFG too partial (misses disjunctive \/-constraints)

Low FP/FN rates in practice

  • ground-truth experiments
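
The backward-bounded idea can be illustrated on a toy opaque predicate, c := (x * (x + 1)) & 1. The Python sketch below is a hypothetical illustration, not the BINSEC implementation. It walks k assignments backward from a conditional jump, substituting each one into the jump condition (a weakest-precondition style computation of pre_k), then checks satisfiability by brute force over 4-bit registers in place of an SMT solver.

```python
from itertools import product

WIDTH = 4                      # toy 4-bit registers keep brute force cheap
MASK = (1 << WIDTH) - 1
REGS = ["x", "y", "z", "c"]

# Hypothetical code preceding a conditional jump, as (destination, expression).
# The jump goes to the suspicious target when c == 1.
TRACE = [
    ("y", lambda s: (s["x"] + 1) & MASK),        # y := x + 1
    ("z", lambda s: (s["x"] * s["y"]) & MASK),   # z := x * y
    ("c", lambda s: s["z"] & 1),                 # c := z & 1
]
TARGET = lambda s: s["c"] == 1

def subst(dst, expr, post):
    """Backward step over `dst := expr`: pre(s) = post(s[dst <- expr(s)])."""
    return lambda s: post({**s, dst: expr(s)})

def backward_pre(trace, cond, k):
    """Condition on the state k instructions before the jump (pre_k)."""
    for dst, expr in reversed(trace[max(0, len(trace) - k):]):
        cond = subst(dst, expr, cond)
    return cond

def satisfiable(pred):
    """Brute-force stand-in for an SMT solver; all registers start unconstrained."""
    return any(pred(dict(zip(REGS, v)))
               for v in product(range(1 << WIDTH), repeat=len(REGS)))

def proved_infeasible(k):
    return not satisfiable(backward_pre(TRACE, TARGET, k))

print([proved_infeasible(k) for k in (1, 2, 3)])  # [False, False, True]
```

With k = 1 or 2, the registers defined earlier are unconstrained, so the predicate is satisfiable and the infeasibility is missed (a false negative, k too small). With k = 3 the whole computation is visible and the jump to the suspicious target is proved infeasible. The trace is straight-line, so the false-positive source (a too-partial CFG) does not arise here.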

SLIDE 75

FORWARD & BACKWARD SYMBOLIC EXECUTION

SLIDE 76

EXPERIMENTAL EVALUATION

  • Controlled experiments (ground truth) -> precision
  • Large-scale experiment: packers -> scalability, robustness
  • Case-study: X-tunnel malware -> usefulness

SLIDE 77

CONTROLLED EXPERIMENTS

  • Goal = assess the precision of the technique
  • ground truth value
  • Experiment 1: opaque predicates (o-llvm)
  • 100 core utils, 5x20 obfuscated codes
  • k=16: 3.46% error, no false negative
  • robust to k
  • efficient: 0.02s / query
  • Experiment 2: stack tampering (tigress)
  • 5 obfuscated codes, 5 core utils
  • almost all genuine ret are proved (no false positive)
  • many malicious ret are proved « single-targets »
  • Very precise results
  • Seems efficient

SLIDE 78

CASE-STUDY: PACKERS

Packers: legitimate software protection tools (for basic malware, the sole protection)

SLIDE 79

CASE-STUDY: PACKERS (fun facts)

SLIDE 80

CASE-STUDY: THE XTUNNEL MALWARE (part of DNC hack)

Two heavily obfuscated samples

  • Many opaque predicates

Goal: detect & remove protections

  • Identify 50% of code as spurious
  • Fully automatic, < 3h

SLIDE 81

CASE-STUDY: THE XTUNNEL MALWARE (fun facts)

  • Protection seems to rely only on opaque predicates
  • Only two families of opaque predicates
  • Yet, quite sophisticated
  • original OPs
  • interleaving between payload and OP computation
  • sharing among OP computations
  • possibly long dependency chains (avg 8.7, up to 230)

SLIDE 82

SECURITY ANALYSIS: COUNTER-MEASURES (and mitigations)

  • Long dependency chains (evading the bound k)
  • Concluding does not always require the whole chain!
  • Can use a more flexible notion of bound (data dependencies, formula size)
  • Hard-to-solve predicates (causing timeouts)
  • A timeout is already valuable information
  • Opportunity to find infeasible patterns (then matching), or signatures
  • Tradeoff between performance penalty and protection focus
  • Note: must be input-dependent, otherwise removed by standard DSE optimizations
  • Anti-dynamic tricks (fool the initial dynamic recovery)
  • Can use the appropriate mitigations
  • Note: some tricks can be circumvented by symbolic reasoning

Current state-of-the-art

  • push the cat-and-mouse game further
  • raise the bar for malware designers

Also

  • « Probabilistic obfuscation »
  • Covert channels

SLIDE 83

OUTLINE

  • Why binary-level analysis?
  • Some background on source-level formal methods
  • The hard journey from source to binary
  • A few case-studies
  • Conclusion

SLIDE 84

SUMMARY

                     Feasibility   Infeasibility   Efficient   Robust
  Static syntactic   X             X               OK          X
  Dynamic            -             X               OK          OK
  DSE                OK            X               X           OK
  Static semantic    X             OK              X           X
  BB-DSE             X             OK (fp, fn)     OK          OK

SLIDE 85

CONCLUSION

  • Semantic analysis can change the game of binary-level security
  • Current syntactic and dynamic methods are not enough
  • [complement existing approaches and help the expert, not replace everything]
  • Explore more, Prove invariance, Simplify
  • Yet, challenging to adapt from source-level safety-critical
  • Need robustness, precision and scale!!
  • « Correct-enough » and « Complete-enough » are enough (room for better definition!)
  • DSE much easier to adapt than AI
  • New challenges and variations, so much to do

SLIDE 86

FUTURE DIRECTION

SLIDE 87

Commissariat à l’énergie atomique et aux énergies alternatives Institut List | CEA SACLAY NANO-INNOV | BAT. 861 – PC142 91191 Gif-sur-Yvette Cedex - FRANCE www-list.cea.fr Établissement public à caractère industriel et commercial | RCS Paris B 775 685 019