Stack Overflow Considered Harmful? The Impact of Copy&Paste on Android Application Security


slide-1
SLIDE 1

Stack Overflow Considered Harmful?

The Impact of Copy&Paste on Android Application Security

  • F. Fischer*, K. Böttinger*, H. Xiao*, C. Stransky†, Y. Acar†, M. Backes†, S. Fahl†

*Fraunhofer AISEC †CISPA, Saarland University

Presentation by Kevin Liao

slide-2
SLIDE 2

Code copypasta insecure?

slide-3
SLIDE 3

How prolific are security-related code snippets from Stack Overflow in Android applications?

Research question

slide-4
SLIDE 4

  • Rather than discuss results at end…
  • Present results first, then analyze the methodology
  • Does the methodology convince us of the results?

This talk

slide-5
SLIDE 5

The high-level approach

slide-6
SLIDE 6

The high-level approach

Extract security-related snippets

slide-7
SLIDE 7

The high-level approach

Security analysis

slide-8
SLIDE 8

The high-level approach

Identify code reuse

slide-9
SLIDE 9

Results: Alarming (potentially)

slide-10
SLIDE 10

Extracted snippets

  • 30 million posts
  • 2 million Android-related posts
  • ~4,000 security-related snippets

slide-11
SLIDE 11

Security classification

Secure 70% Insecure 30%

slide-12
SLIDE 12

Prevalence of code reuse

  • 1.3 million free apps
  • 2,673 secure snippets
  • 1,161 insecure snippets

slide-13
SLIDE 13

Prevalence of code reuse

slide-14
SLIDE 14

Prevalence of code reuse

slide-15
SLIDE 15

Prevalence of code reuse

slide-16
SLIDE 16

Apps with security-related snippets

Secure 2% Insecure 98%

slide-17
SLIDE 17

Top-offender? TLS…

Other 8% Empty TrustManager 92%

  • 180k apps w/ empty TrustManager
  • Deactivates server verification
  • Can lead to MITM
slide-18
SLIDE 18

Next top-offender? Symmetric crypto

Other 91% AES/ECB 9%

  • 18k apps with AES in ECB mode
  • Hard-coded keys
slide-19
SLIDE 19

Next top-offender? Symmetric crypto

Other 91% AES/ECB 9%

  • 18k apps with AES in ECB mode
  • Hard-coded keys
slide-20
SLIDE 20

Do insecure snippets have lower scores?

slide-21
SLIDE 21

Do insecure snippets with a warning have lower scores?

slide-22
SLIDE 22

Are high view count/score snippets copy&pasted more?

slide-23
SLIDE 23

Are high view count/score snippets with a warning copy&pasted less?

slide-24
SLIDE 24

Discussion of methodology

Extract security-related snippets

slide-25
SLIDE 25

Extract security-related snippets

  • 1. Get all posts with ‘Android’ tag
  • 2. Filter code snippets that use security APIs
      • TLS/SSL
      • Symmetric/asymmetric crypto
      • RNG
      • Signatures
      • Message digests
      • Authentication/access control
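
A minimal sketch of step 2, assuming a snippet counts as security-related when it mentions a class from a hand-picked list of Java security APIs. The `SECURITY_APIS` mapping and the `categories` helper are hypothetical names for illustration; the slides do not give the actual filter list, and the authentication/access control category is left out here because no API names for it appear in the slides.

```python
import re

# Hypothetical category -> API identifier mapping (an assumption, not the
# filter list used in the study).
SECURITY_APIS = {
    "TLS/SSL": ["SSLContext", "TrustManager", "X509TrustManager",
                "HostnameVerifier", "SSLSocketFactory"],
    "Symmetric/asymmetric crypto": ["Cipher", "SecretKeySpec", "KeyPairGenerator"],
    "RNG": ["SecureRandom"],
    "Signatures": ["Signature"],
    "Message digests": ["MessageDigest"],
}


def categories(snippet):
    """Return the set of categories whose API names occur in the snippet."""
    found = set()
    for category, names in SECURITY_APIS.items():
        for name in names:
            # \b requires a whole-identifier match, so "Cipher" does not
            # match inside a longer name such as "MyCipherWrapper".
            if re.search(r"\b" + re.escape(name) + r"\b", snippet):
                found.add(category)
    return found


def is_security_related(snippet):
    """A snippet is kept if it touches at least one category."""
    return bool(categories(snippet))
```

A pure keyword match like this will also keep snippets that only mention an API in a comment, which is one of the points worth raising when discussing snippet extraction.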
slide-26
SLIDE 26

Discuss snippet extraction

slide-27
SLIDE 27

Discussion of methodology

Security analysis

slide-28
SLIDE 28

Security analysis

  • 1. Manually label snippets as secure or insecure
  • 2. Train a binary classifier to automatically determine security/insecurity of all snippets

slide-29
SLIDE 29

tl;dr for labeling rules

  • SSL/TLS: Use TLS v1.1 or greater; don’t use old crypto
  • Symmetric: Don’t use old crypto; don’t use ECB; don’t use static/zeroed/derived keys or IVs
  • Asymmetric: Use >= 2048 bit RSA; use >= 224 bit ECC
  • Hashing: Don’t use MD-family
  • RNG: Use crypto-secure RNG; securely random seed
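
A few of these rules can be sketched as simple checks. This is only an illustration under stated assumptions: the function names are hypothetical, the `WEAK_CIPHERS` set is an example reading of "old crypto", and the labeling in the study was done manually rather than by code like this.

```python
# Example members of "old crypto"; the exact list is an assumption.
WEAK_CIPHERS = {"DES", "RC4"}
# The MD family of hash functions.
WEAK_HASHES = {"MD2", "MD4", "MD5"}


def cipher_issues(transformation):
    """Check a Java-style transformation string such as "AES/CBC/PKCS5Padding".

    Returns a list of rule violations (empty if none were found).
    """
    parts = transformation.upper().split("/")
    algorithm = parts[0]
    mode = parts[1] if len(parts) > 1 else None
    issues = []
    if algorithm in WEAK_CIPHERS:
        issues.append("old cipher")
    if mode == "ECB":
        issues.append("ECB mode")
    return issues


def rsa_key_ok(bits):
    """Asymmetric rule: RSA keys of at least 2048 bits."""
    return bits >= 2048


def hash_ok(name):
    """Hashing rule: reject the MD family."""
    return name.upper() not in WEAK_HASHES
```

Rules about static or zeroed keys and IVs are harder to express this way, since they depend on where a value comes from rather than on a single string.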

slide-30
SLIDE 30

Security score of training set

slide-31
SLIDE 31

Train SVM binary classifier

slide-32
SLIDE 32

Feature selection

  • Based on tf-idf
  • “The features rely merely on the vocabulary level of input code snippets, without even understanding how they are functioning.”
  • Claim: Can be more accurate and more scalable than rule-based methods
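
To make "vocabulary level" concrete, here is a minimal sketch of tf-idf over code snippets, assuming the textbook definition (term frequency times log of inverse document frequency) and a naive identifier tokenizer. Both choices are assumptions; the slides do not say which tf-idf variant or tokenizer the authors used.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split code into identifier-like tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)


def tfidf(snippets):
    """Return one {token: weight} dict per snippet.

    tf  = token count / total tokens in the snippet
    idf = log(number of snippets / number of snippets containing the token)
    """
    docs = [Counter(tokenize(s)) for s in snippets]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # each snippet counts once per token
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append(
            {tok: (cnt / total) * math.log(n / df[tok]) for tok, cnt in doc.items()}
        )
    return vectors


snippets = [
    'Cipher.getInstance("AES/ECB/PKCS5Padding")',
    'Cipher.getInstance("AES/CBC/PKCS5Padding")',
]
vectors = tfidf(snippets)
```

Tokens shared by every snippet (`Cipher`, `getInstance`, `AES`) get weight zero, while the distinguishing tokens (`ECB`, `CBC`) get positive weight. That is the kind of signal a classifier trained on these vectors can pick up without analyzing what the code does.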

slide-33
SLIDE 33

https://chrisalbon.com/machine_learning/preprocessing_text/tf-idf/

slide-34
SLIDE 34

Security classification

Secure 70% Insecure 30%

slide-35
SLIDE 35

Discuss security classification

slide-36
SLIDE 36

Discussion of methodology

Identify code reuse

slide-37
SLIDE 37

Identify code reuse

  • 1. Transform source code and Dalvik executables into same IR
  • 2. Identify similar code snippets using Program Dependency Graphs (PDGs)

slide-38
SLIDE 38

IR transformation

[Diagram labels: Source code, Typed AST, PPA; Dalvik executable, Bytecode, Lift]

slide-39
SLIDE 39

Program Dependency Graphs

  • Generate PDG for each method
  • Nodes: Statements in methods
  • Edges: Data and control dependence
slide-40
SLIDE 40

Dependency edges

  • Data: S2 depends on S1, since A is read in S2.
  • Control: S2 depends on A, since A determines S2’s execution.
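
The two edge types can be sketched on a three-statement method. This is a simplified illustration, not the paper's implementation: the `Stmt` class and `pdg_edges` function are hypothetical, statements are described by hand rather than parsed, and data dependence uses only the most recent earlier definition of a variable (no real reaching-definitions analysis across branches).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stmt:
    label: str
    defs: set = field(default_factory=set)   # variables written
    uses: set = field(default_factory=set)   # variables read
    guard: Optional[str] = None              # label of the controlling branch, if any


def pdg_edges(stmts):
    """Return a set of (source, target, kind) dependence edges."""
    edges = set()
    last_def = {}  # variable -> label of the most recent statement defining it
    for s in stmts:
        for var in s.uses:
            if var in last_def:
                edges.add((last_def[var], s.label, "data"))
        if s.guard is not None:
            edges.add((s.guard, s.label, "control"))
        for var in s.defs:
            last_def[var] = s.label
    return edges


# S1: a = read()
# S2: if (a > 0)
# S3:     b = a + 1
program = [
    Stmt("S1", defs={"a"}),
    Stmt("S2", uses={"a"}),
    Stmt("S3", defs={"b"}, uses={"a"}, guard="S2"),
]
edges = pdg_edges(program)
```

Here S2 and S3 are data dependent on S1 because both read `a`, and S3 is control dependent on S2 because the branch decides whether S3 runs.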

slide-41
SLIDE 41

Examples of PDGs

slide-42
SLIDE 42

Prevalence of code reuse

slide-43
SLIDE 43

Discuss identification of code reuse

slide-44
SLIDE 44

Final discussion

  • About results?
  • About methodology?
  • About future work?