Augmenting Stack Overflow with API Usage Patterns Mined from GitHub




slide-1
SLIDE 1

Augmenting Stack Overflow with API Usage Patterns Mined from GitHub

Anastasia Reinhart (1,2,*), Tianyi Zhang (1), Mihir Marthur (1), Miryung Kim (1)

(1) University of California, Los Angeles
(2) George Fox University

* Work done as a research intern at UCLA.

slide-2
SLIDE 2

Using APIs properly is becoming a key challenge

e.g., JDK APIs, Android SDK

slide-3
SLIDE 3

The Status Quo of Learning APIs

Developers often search online for code examples to learn APIs [Sadowski et al. 2016]

slide-4
SLIDE 4

The Limitation of Online Code Examples

  • Programmers can only inspect a handful of search results [Brandt et al., 2009; Starke et al., 2009; Duala-Ekoko and Robillard, 2012].
  • Individual code examples may suffer from:
    – insecure coding practices [Fischer et al., 2017]
    – unchecked obsolete usage [Zhou and Walker, 2016]
    – low readability [Treude and Robillard, 2017]

slide-5
SLIDE 5

The Limitation of Online Code Examples


A recent study shows that 31% of SO posts have potential API usage violations.

Zhang et al., Are Online Code Examples Reliable? A Study of API Misuse on Stack Overflow, ICSE 2018.
Dataset: http://web.cs.ucla.edu/~tianyi.zhang/examplecheck.html

slide-6
SLIDE 6

Missing If Checks

https://stackoverflow.com/questions/21983867

slide-7
SLIDE 7

Missing If Checks

This example throws NoSuchElementException: firstKey() must not be called on an empty TreeMap, so the call needs an emptiness check first.
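A minimal sketch of the missing if check, not taken from the Stack Overflow post. The class FirstKeyGuard and the helper safeFirstKey are hypothetical names used only for illustration; TreeMap.isEmpty() and TreeMap.firstKey() are standard JDK methods.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.TreeMap;

final class FirstKeyGuard {
    // Hypothetical helper: returns the smallest key, or Optional.empty() for an empty map.
    static Optional<String> safeFirstKey(TreeMap<String, Integer> map) {
        if (map.isEmpty()) {               // the if check that the unsafe usage omits
            return Optional.empty();
        }
        return Optional.of(map.firstKey());
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> scores = new TreeMap<>();
        try {
            scores.firstKey();             // unguarded call on an empty map
        } catch (NoSuchElementException e) {
            System.out.println("unguarded firstKey() threw NoSuchElementException");
        }
        System.out.println(safeFirstKey(scores).isPresent());
        scores.put("b", 2);
        scores.put("a", 1);
        System.out.println(safeFirstKey(scores).get());
    }
}
```

Returning an Optional is only one possible design; a plain `if (!map.isEmpty())` around the call would satisfy the same guard.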

slide-8
SLIDE 8

Missing API Calls

https://stackoverflow.com/questions/12100651

slide-9
SLIDE 9

Missing API Calls

This example throws BufferUnderflowException. After writing to a ByteBuffer, you must call ByteBuffer.flip() before reading: it sets the limit to the current position and resets the position to zero.
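A small sketch of the missing call, not the code from the Stack Overflow post. The class FlipBeforeRead and the method writeThenRead are hypothetical names; the ByteBuffer calls are standard java.nio methods.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

final class FlipBeforeRead {
    // Writes one int, then reads it back; 'flip' controls whether flip() is called in between.
    static int writeThenRead(boolean flip) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        buf.putInt(42);          // position advances to 4, which equals the limit
        if (flip) {
            buf.flip();          // limit = old position (4), position = 0
        }
        return buf.getInt();     // without flip(): nothing remains, so this throws
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead(true));
        try {
            writeThenRead(false);
        } catch (BufferUnderflowException e) {
            System.out.println("missing flip(): BufferUnderflowException");
        }
    }
}
```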

slide-10
SLIDE 10

ExampleCheck: Augmenting Stack Overflow with API Usage Patterns Mined from GitHub

slide-11
SLIDE 11

Now available at Chrome Web Store!

slide-12
SLIDE 12

ExampleCheck Workflow

[Figure: ExampleCheck workflow. Web browser: code extraction from the <code> … </code> blocks of a Stack Overflow page, and pop-up generation for each reported API misuse. ExampleCheck server: API usage mining on GitHub (offline) and API misuse detection.]

slide-13
SLIDE 13

API Usage Mining from GitHub [ICSE 2018]

[Figure: mining pipeline with three numbered stages]

380K Java repositories on GitHub → Code Search → Program Slicing → Call Sequence Extraction → structured API call sequences → Frequent Sequence Mining → SMT-based Guard Condition Mining → API usage patterns

slide-14
SLIDE 14

Insight 1: Mining a Large Code Corpus

  • Our code corpus includes 380K GitHub projects with at least 100 revisions and 2 contributors.

Dyer et al. Boa: A language and infrastructure for analyzing ultra-large-scale software repositories. ICSE 2013.

slide-15
SLIDE 15

Insight 2: Removing Irrelevant Statements via Program Slicing

  • We perform backward and forward slicing to identify the statements that have data or control dependencies with an API method of interest.

slide-16
SLIDE 16

void initInterfaceProperties(String temp, File dDir) {
    if(!temp.equals("props.txt")) {
        log.error("Wrong Template.");
        return;
    }
    // load default properties
    FileInputStream in = new FileInputStream(temp);
    Properties prop = new Properties();
    prop.load(in);
    ... init properties ...
    // write to the property file
    String fPath=dDir.getAbsolutePath()+"/interface.prop";
    File file = new File(fPath);
    if(!file.exists()) {
        file.createNewFile();
    }
    FileOutputStream out = new FileOutputStream(file);
    prop.store(out, null);
    in.close();
}

GitHub example of File.createNewFile. The focal API method is the call file.createNewFile().

slide-17
SLIDE 17

[Figure: the same initInterfaceProperties example, highlighting the statements related to the focal API method File.createNewFile by data dependency and by control dependency.]

Data dependency is followed up to one hop, i.e., direct dependency only.
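The slicing step can be pictured with a toy model. This is a sketch under stated assumptions, not ExampleCheck's implementation: the class OneHopSlicer, its Stmt model (variables defined, variables used, index of the enclosing if), and the rule that a later statement using the same variable as the focal call counts as forward-dependent are all simplifications introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OneHopSlicer {
    // Hypothetical statement model: text, defined variables, used variables, enclosing if (-1 if none).
    static final class Stmt {
        final String text;
        final Set<String> defs;
        final Set<String> uses;
        final int enclosingIf;

        Stmt(String text, String defs, String uses, int enclosingIf) {
            this.text = text;
            this.defs = split(defs);
            this.uses = split(uses);
            this.enclosingIf = enclosingIf;
        }

        private static Set<String> split(String csv) {
            Set<String> names = new HashSet<>();
            if (!csv.isEmpty()) names.addAll(Arrays.asList(csv.split(",")));
            return names;
        }
    }

    // Keeps the focal statement, its enclosing condition, and statements one data-dependency hop away.
    static List<String> slice(List<Stmt> body, int focal) {
        Stmt f = body.get(focal);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < body.size(); i++) {
            Stmt s = body.get(i);
            boolean control = i == f.enclosingIf;
            boolean backward = i < focal && !Collections.disjoint(s.defs, f.uses);
            // Assumption: the focal call may modify its receiver, so later uses of it count as dependent.
            boolean forward = i > focal
                    && (!Collections.disjoint(f.defs, s.uses) || !Collections.disjoint(f.uses, s.uses));
            if (i == focal || control || backward || forward) kept.add(s.text);
        }
        return kept;
    }

    // The statements of the GitHub example, annotated by hand for this sketch.
    static List<Stmt> example() {
        return Arrays.asList(
            new Stmt("FileInputStream in = new FileInputStream(temp);", "in", "temp", -1),
            new Stmt("Properties prop = new Properties();", "prop", "", -1),
            new Stmt("prop.load(in);", "", "prop,in", -1),
            new Stmt("String fPath = dDir.getAbsolutePath() + \"/interface.prop\";", "fPath", "dDir", -1),
            new Stmt("File file = new File(fPath);", "file", "fPath", -1),
            new Stmt("if (!file.exists())", "", "file", -1),
            new Stmt("file.createNewFile();", "", "file", 5),
            new Stmt("FileOutputStream out = new FileOutputStream(file);", "out", "file", -1),
            new Stmt("prop.store(out, null);", "", "prop,out", -1),
            new Stmt("in.close();", "", "in", -1));
    }

    public static void main(String[] args) {
        for (String line : slice(example(), 6)) System.out.println(line);
    }
}
```

Under these assumptions the slice keeps the File construction, the exists() check, the focal createNewFile() call, and the FileOutputStream construction. The statement that builds fPath is dropped because it is two hops away from the focal call.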

slide-18
SLIDE 18

Insight 3: Capture the Semantics of API Usage

  • It is important to capture the temporal ordering, enclosing control structures, and appropriate guard conditions of API calls.

slide-19
SLIDE 19

Insight 3: Capture the Semantics of API Usage

Example structured call sequence:

new File(String); try {; new FileInputStream(File)@arg0.exists(); } catch (IOException) {; }

[Figure: Grammar of Structured Call Sequences]
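To make the notation concrete, here is a hedged sketch of a data structure that holds such a sequence and prints it in the slide's form (elements separated by "; ", guards attached with "@"). The class StructuredSequence and its methods are hypothetical and are not ExampleCheck's internal representation; the full grammar from the figure is not reproduced.

```java
import java.util.ArrayList;
import java.util.List;

final class StructuredSequence {
    private final List<String> elements = new ArrayList<>();

    // An API call with no explicit guard condition.
    StructuredSequence call(String signature) {
        elements.add(signature);
        return this;
    }

    // An API call annotated with the guard condition that must hold before it.
    StructuredSequence guardedCall(String signature, String guard) {
        elements.add(signature + "@" + guard);
        return this;
    }

    // A control-structure token such as "try {" or "}".
    StructuredSequence control(String token) {
        elements.add(token);
        return this;
    }

    String render() {
        return String.join("; ", elements);
    }

    // Rebuilds the FileInputStream example shown on the slide.
    static StructuredSequence fileInputStreamExample() {
        return new StructuredSequence()
                .call("new File(String)")
                .control("try {")
                .guardedCall("new FileInputStream(File)", "arg0.exists()")
                .control("} catch (IOException) {")
                .control("}");
    }

    public static void main(String[] args) {
        System.out.println(fileInputStreamExample().render());
    }
}
```

Keeping control tokens in the same flat list as the calls preserves both the temporal ordering and the enclosing try-catch structure in one sequence.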

slide-23
SLIDE 23

Insight 4: SMT-based Guard Condition Mining

  • GitHub developers may write the same predicate in different ways.

slide-24
SLIDE 24

Insight 4: SMT-based Guard Condition Mining

  • We group guard conditions based on their logical equivalence.
  • We use Z3 to prove the logical equivalence of guard conditions.
  • p ⇔ q is valid iff ¬((¬p ∨ q) ∧ (p ∨ ¬q)) is UNSAT.

Two equivalent but syntactically different guard conditions for substring(int):

arg0>=0 && arg0<=rcv.length() ⇔ arg0>-1 && arg0<rcv.length()+1
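The validity condition can be illustrated without an SMT solver. The sketch below swaps Z3 for a brute-force search over a small integer range, so it only shows the idea and proves nothing beyond that range. The class GuardEquivalence, the method equivalentOnRange, and the choice of bound are assumptions made here; `len` stands in for rcv.length().

```java
import java.util.function.BiPredicate;

final class GuardEquivalence {
    // Searches for an assignment satisfying ¬((¬p ∨ q) ∧ (p ∨ ¬q)); finding none means
    // p and q agree on every sampled (arg0, len) pair.
    static boolean equivalentOnRange(BiPredicate<Integer, Integer> p,
                                     BiPredicate<Integer, Integer> q,
                                     int bound) {
        for (int arg0 = -bound; arg0 <= bound; arg0++) {
            for (int len = 0; len <= bound; len++) {
                boolean pv = p.test(arg0, len);
                boolean qv = q.test(arg0, len);
                boolean counterexample = !((!pv || qv) && (pv || !qv));
                if (counterexample) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BiPredicate<Integer, Integer> first = (arg0, len) -> arg0 >= 0 && arg0 <= len;
        BiPredicate<Integer, Integer> second = (arg0, len) -> arg0 > -1 && arg0 < len + 1;
        BiPredicate<Integer, Integer> stricter = (arg0, len) -> arg0 >= 0 && arg0 < len;
        System.out.println(equivalentOnRange(first, second, 8));
        System.out.println(equivalentOnRange(first, stricter, 8));
    }
}
```

A solver such as Z3 decides the same question symbolically for all integers, which is why the two substring(int) guards above can be grouped together.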

slide-25
SLIDE 25

API Misuse Detection

  • Contrast SO code snippets with mined API usage patterns automatically.

[Figure: Java Stack Overflow snippets → Call Sequence Extraction → structured API call sequences → query pattern(s) → Temporal Ordering Check → Guard Condition Check → API usage violations]

slide-26
SLIDE 26

Extract Structured Call Sequence

JsonObject obj = root.getAsJsonObject();
JsonElement match_number = obj.get("match_number");
...
System.out.println(match_number.getAsString());

SO code example [Post 29860000]

slide-27
SLIDE 27

Extract Structured Call Sequence

Structured call sequence extracted from the SO code example [Post 29860000]:

getAsJsonObject()@true; get(String)@true; ... getAsString()@true; println(String)@true
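The ordering of the extracted calls follows evaluation order, so getAsString() comes before the println that consumes its result. The sketch below is a toy, string-based approximation of that ordering only; it is not how ExampleCheck extracts sequences, and it does not recover argument types or guard conditions. The class CallOrderSketch and its small keyword list are assumptions introduced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class CallOrderSketch {
    private static final List<String> KEYWORDS = Arrays.asList("if", "for", "while", "switch", "catch");

    // Emits a method name when its closing parenthesis is reached, so calls nested
    // in the argument list are emitted before the call that takes them as arguments.
    static List<String> callsInOrder(String code) {
        List<String> calls = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();
        StringBuilder ident = new StringBuilder();
        boolean inString = false;
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            if (c == '"') {                      // toggle on string literals (escapes are not handled)
                inString = !inString;
                ident.setLength(0);
                continue;
            }
            if (inString) continue;
            if (Character.isJavaIdentifierPart(c)) {
                ident.append(c);
                continue;
            }
            if (c == '(') {
                open.push(ident.toString());     // an empty name marks a plain parenthesis
            } else if (c == ')' && !open.isEmpty()) {
                String name = open.pop();
                if (!name.isEmpty() && !KEYWORDS.contains(name)) calls.add(name);
            }
            ident.setLength(0);
        }
        return calls;
    }

    public static void main(String[] args) {
        String snippet = "JsonObject obj = root.getAsJsonObject(); "
                + "JsonElement match_number = obj.get(\"match_number\"); "
                + "System.out.println(match_number.getAsString());";
        System.out.println(callsInOrder(snippet));
    }
}
```

A real extractor would work on parsed code rather than raw text, which is what makes resolving types such as get(String) and attaching guards possible.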

slide-29
SLIDE 29

API Usage Pattern for JsonElement.getAsString()

Structured call sequence (from the SO code example):

getAsJsonObject()@true; get(String)@true; ... getAsString()@true; println(String)@true

An API usage pattern of JsonElement.getAsString():

try {; getAsString()@rcv.isJsonPrimitive(); }; catch (Exception) {; };

slide-30
SLIDE 30

Temporal Ordering Check

In the structured call sequence of the SO snippet, getAsString() is not enclosed in the try { ... } catch (Exception) { ... } block that the API usage pattern of JsonElement.getAsString() requires.

The snippet is missing a try-catch block!
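One way to picture this check is as a subsequence test. The sketch assumes (the slide does not spell this out) that a snippet satisfies a pattern when the pattern's elements all occur in the snippet's sequence in the same order. Guards are stripped here, and the class TemporalOrderingCheck and method followsPattern are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

final class TemporalOrderingCheck {
    // Returns true when every pattern element appears in the sequence, in pattern order.
    static boolean followsPattern(List<String> sequence, List<String> pattern) {
        int next = 0;
        for (String element : sequence) {
            if (next < pattern.size() && element.equals(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    public static void main(String[] args) {
        List<String> pattern = Arrays.asList(
                "try {", "getAsString()", "}", "catch (Exception) {", "}");
        List<String> snippet = Arrays.asList(
                "getAsJsonObject()", "get(String)", "getAsString()", "println(String)");
        List<String> repaired = Arrays.asList(
                "getAsJsonObject()", "get(String)", "try {", "getAsString()",
                "println(String)", "}", "catch (Exception) {", "}");
        System.out.println(followsPattern(snippet, pattern));
        System.out.println(followsPattern(repaired, pattern));
    }
}
```

The snippet's sequence fails because the "try {" element never appears, which corresponds to the missing try-catch block reported above.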

slide-31
SLIDE 31

Guard Condition Check

In the SO snippet, getAsString() is called under the guard true, whereas the API usage pattern of JsonElement.getAsString() requires the guard rcv.isJsonPrimitive().

true ⇏ rcv.isJsonPrimitive(), and thus the guard condition is incorrect!
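The implication test can be sketched by enumerating the truth values of the single atom rcv.isJsonPrimitive(). This brute-force enumeration stands in for the SMT-based check and is an assumption of this example; the class GuardImplication and method implies are hypothetical names.

```java
import java.util.function.Predicate;

final class GuardImplication {
    // Returns true when snippetGuard implies patternGuard for both truth values of the atom.
    static boolean implies(Predicate<Boolean> snippetGuard, Predicate<Boolean> patternGuard) {
        for (boolean isJsonPrimitive : new boolean[] {false, true}) {
            if (snippetGuard.test(isJsonPrimitive) && !patternGuard.test(isJsonPrimitive)) {
                return false;            // the snippet allows a call that the pattern forbids
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Predicate<Boolean> alwaysTrue = atom -> true;        // guard of the SO snippet
        Predicate<Boolean> requiresPrimitive = atom -> atom; // guard required by the pattern
        System.out.println(implies(alwaysTrue, requiresPrimitive));
        System.out.println(implies(requiresPrimitive, requiresPrimitive));
    }
}
```

The first call reports that the implication fails, because the unguarded snippet reaches getAsString() even when the receiver is not a JSON primitive.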

slide-32
SLIDE 32

Evaluation Results [ICSE 2018]

  • 31% of 217K SO posts contain API usage violations.
  • 72% of sampled posts with violations may cause program crashes, resource leaks, etc.
  • Highly-voted posts are not necessarily more reliable in terms of correct API usage.

slide-33
SLIDE 33

Live Demo

slide-34
SLIDE 34

Summary

  • Alert users about potential API usage violations in Stack Overflow using patterns mined from 380K GitHub projects
  • Expand the scope of APIs beyond 100 Java and Android APIs
  • Automate the end-to-end pipeline of API usage mining to keep API usage patterns up-to-date

Tool: https://chrome.google.com/webstore/detail/examplecheck/amliempebckaiaklimcpopomlnklkioe
Dataset: http://web.cs.ucla.edu/~tianyi.zhang/examplecheck.html

slide-35
SLIDE 35

Q&A
