Are Code Examples on an Online Q&A Forum Reliable? A Study of API Misuse on Stack Overflow


slide-1
SLIDE 1

Are Code Examples on an Online Q&A Forum Reliable? A Study of API Misuse on Stack Overflow

Tianyi Zhang¹, Ganesha Upadhyaya², Anastasia Reinhart³, Hridesh Rajan², Miryung Kim¹

¹ University of California, Los Angeles; ² Iowa State University; ³ George Fox University

slide-2
SLIDE 2

Using APIs properly is a key challenge in programming

e.g., Java APIs

slide-3
SLIDE 3

The Status Quo of Learning APIs

Developers often search online for code examples to learn APIs [Sadowski et al. 2016]

slide-4
SLIDE 4

The Limitation of Online Code Examples

  • Programmers can only inspect a handful of search results [Brandt et al. 2009, Starke et al. 2009, Duala-Ekoko and Robillard 2012].
  • Individual code examples may suffer from:
      – insecure coding practices [Fischer et al. 2017]
      – unchecked obsolete usage [Zhou and Walker 2016]
      – low readability [Treude and Robillard 2017]

slide-5
SLIDE 5

“How do I write data to a file using FileChannel?”

slide-7
SLIDE 7

“How do I write data to a file using FileChannel?”

This example forgets to close the FileChannel object properly.

slide-9
SLIDE 9

“How do I write data to a file using FileChannel?”

This example forgets to handle potential exceptions such as IOException and FileNotFoundException.
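
For contrast with the two problems named on these slides, here is a minimal sketch of a FileChannel write that closes the channel and handles I/O errors. It is not taken from the talk or from any Stack Overflow post; the class name, method name, and error-handling policy (log and return false) are assumptions made for illustration. It uses FileChannel.open, which reports a missing parent directory as an IOException subclass rather than FileNotFoundException.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FileChannelWriteSketch {

    /** Writes text to path. Returns true on success, false if an I/O error occurred. */
    static boolean writeText(Path path, String text) {
        // wrap() yields a buffer that is already positioned for reading, so no flip() is needed.
        ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        // try-with-resources closes the channel even if write() throws.
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            // A single write() may not drain the buffer, so loop until it is empty.
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            return true;
        } catch (IOException e) {
            // Illustrative policy only: report the failure and signal it to the caller.
            System.err.println("Could not write " + path + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("filechannel-sketch", ".txt");
        System.out.println("written: " + writeText(tmp, "hello"));
        Files.deleteIfExists(tmp);
    }
}
```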

slide-10
SLIDE 10

Research Questions

  • RQ1. Is API misuse prevalent on Stack Overflow?
  • RQ2. Are highly voted posts more reliable?
  • RQ3. What are the characteristics of API misuse?

slide-11
SLIDE 11

Outline

  • Problem Statement
  • API usage mining from 380K Java Projects on GitHub
  • An Empirical Study of API Misuse on Stack Overflow

slide-12
SLIDE 12

API Usage Mining from GitHub

  • We contrast SO snippets with API usage patterns mined from 380K GitHub projects.

[Figure: mining pipeline in three numbered stages. 380K Java repositories on GitHub → code search, program slicing, and call sequence extraction → structured API call sequences → frequent sequence mining → SMT-based guard condition mining → API usage patterns]

slide-13
SLIDE 13

Insight 1: Mining a Large Code Corpus

  • Our code corpus includes 380K GitHub projects with at least 100 revisions and 2 contributors.

Dyer et al. Boa: A language and infrastructure for analyzing ultra-large-scale software repositories. ICSE 2013.

slide-14
SLIDE 14

Insight 2: Removing Irrelevant Statements via Program Slicing

  • We perform backward and forward slicing to identify the statements that are data- or control-dependent on an API method of interest.

slide-15
SLIDE 15

GitHub example of File.createNewFile (the focal API method):

    void initInterfaceProperties(String temp, File dDir) {
        if (!temp.equals("props.txt")) {
            log.error("Wrong Template.");
            return;
        }
        // load default properties
        FileInputStream in = new FileInputStream(temp);
        Properties prop = new Properties();
        prop.load(in);
        ... init properties ...
        // write to the property file
        String fPath = dDir.getAbsolutePath() + "/interface.prop";
        File file = new File(fPath);
        if (!file.exists()) {
            file.createNewFile();
        }
        FileOutputStream out = new FileOutputStream(file);
        prop.store(out, null);
        in.close();
    }

slide-16
SLIDE 16

The same initInterfaceProperties example as on the previous slide, with the focal API method File.createNewFile highlighted together with its control dependencies and its data dependencies up to one hop, i.e., direct dependencies.

slide-17
SLIDE 17

The same initInterfaceProperties example, with the focal API method File.createNewFile highlighted together with its control dependencies and its data dependencies up to two hops.
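
The hop-bounded data dependency shown on these slides can be sketched as a bounded traversal over definition/use sets. The sketch below is a simplified illustration, not the slicer used in the study: the Stmt model, the hand-written def/use sets, and the convention that a call on a receiver both uses and defines that receiver are all assumptions, and control dependence and reaching-definition (kill) analysis are omitted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SliceSketch {

    /** Hypothetical statement model: source text plus variables defined and used. */
    static final class Stmt {
        final String text;
        final Set<String> defs;
        final Set<String> uses;

        Stmt(String text, List<String> defs, List<String> uses) {
            this.text = text;
            this.defs = new HashSet<>(defs);
            this.uses = new HashSet<>(uses);
        }
    }

    /** True if a variable defined by 'from' is used by 'to'. */
    static boolean flows(Stmt from, Stmt to) {
        for (String v : from.defs) {
            if (to.uses.contains(v)) return true;
        }
        return false;
    }

    /** Indices of statements within maxHops data-dependency hops of the focal statement. */
    static TreeSet<Integer> slice(List<Stmt> stmts, int focal, int maxHops) {
        TreeSet<Integer> result = new TreeSet<>(Collections.singleton(focal));
        Set<Integer> back = new HashSet<>(Collections.singleton(focal));
        Set<Integer> fwd = new HashSet<>(Collections.singleton(focal));
        for (int hop = 0; hop < maxHops; hop++) {
            Set<Integer> nextBack = new HashSet<>();
            for (int j : back) {                       // backward: earlier definitions we read
                for (int i = 0; i < j; i++) {
                    if (flows(stmts.get(i), stmts.get(j))) nextBack.add(i);
                }
            }
            Set<Integer> nextFwd = new HashSet<>();
            for (int i : fwd) {                        // forward: later uses of what we define
                for (int j = i + 1; j < stmts.size(); j++) {
                    if (flows(stmts.get(i), stmts.get(j))) nextFwd.add(j);
                }
            }
            result.addAll(nextBack);
            result.addAll(nextFwd);
            back = nextBack;
            fwd = nextFwd;
        }
        return result;
    }

    /** Straight-line statements of the slide's example with hand-written def/use sets. */
    static List<Stmt> example() {
        return Arrays.asList(
            new Stmt("FileInputStream in = new FileInputStream(temp);", Arrays.asList("in"), Arrays.asList("temp")),
            new Stmt("Properties prop = new Properties();", Arrays.asList("prop"), Collections.<String>emptyList()),
            new Stmt("prop.load(in);", Arrays.asList("prop"), Arrays.asList("prop", "in")),
            new Stmt("String fPath = dDir.getAbsolutePath() + ...;", Arrays.asList("fPath"), Arrays.asList("dDir")),
            new Stmt("File file = new File(fPath);", Arrays.asList("file"), Arrays.asList("fPath")),
            new Stmt("file.createNewFile();", Arrays.asList("file"), Arrays.asList("file")),
            new Stmt("FileOutputStream out = new FileOutputStream(file);", Arrays.asList("out"), Arrays.asList("file")),
            new Stmt("prop.store(out, null);", Arrays.asList("prop"), Arrays.asList("prop", "out")),
            new Stmt("in.close();", Arrays.asList("in"), Arrays.asList("in")));
    }

    public static void main(String[] args) {
        List<Stmt> stmts = example();
        for (int k = 1; k <= 2; k++) {
            System.out.println("hops=" + k);
            for (int idx : slice(stmts, 5, k)) System.out.println("  " + stmts.get(idx).text);
        }
    }
}
```

Under this model, a one-hop slice around file.createNewFile() keeps the statements that create and consume file, while two hops also pull in the construction of fPath and the call to prop.store; the unrelated property-loading statements stay out in both cases.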

slide-18
SLIDE 18

Insight 3: Capture Semantic Information in API Usage

  • It is important to capture the temporal ordering, enclosing control structures, and appropriate guard conditions of API calls.

slide-19
SLIDE 19

Insight 3: Capture Semantic Information in API Usage

Example of a structured API call sequence (the text after @ is the guard condition under which the call is made):

    new File(String); try {; new FileInputStream(File)@arg0.exists(); } catch (IOException) {; }

slide-23
SLIDE 23

Insight 4: Variations in Guard Conditions

  • Guard conditions are canonicalized and grouped based on logical equivalence.

Two equivalent guard conditions for String.substring:

    arg0>=0 && arg0<=rcv.length()  ⇔  arg0>-1 && arg0<rcv.length()+1

slide-26
SLIDE 26

Insight 4: Variations in Guard Conditions

  • We use Z3 to prove the logical equivalence of guard conditions.
  • p ⇔ q is valid iff ¬((¬p ∨ q) ∧ (p ∨ ¬q)) is UNSAT.

    if (start>=0 && start<=s.length()) { s.substring(start); }
    if (i>-1 && i<log.length()+1) { log.substring(i); }

    p : arg0>=0 && arg0<=rcv.length()
    q : arg0>-1 && arg0<rcv.length()+1
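
The validity check above can be illustrated without a solver. The sketch below is not Z3 and is not the study's implementation: it is a bounded brute-force search for a counterexample, i.e. an assignment that satisfies ¬((¬p ∨ q) ∧ (p ∨ ¬q)). The class name, the bound, and the third predicate R are assumptions for illustration. An SMT solver proves the absence of a counterexample over all integers, whereas this loop only covers a finite range.

```java
import java.util.function.BiPredicate;

class GuardEquivalenceSketch {

    // Guards over (arg0, rcv.length()), written as on the slide.
    static final BiPredicate<Integer, Integer> P = (arg0, len) -> arg0 >= 0 && arg0 <= len;
    static final BiPredicate<Integer, Integer> Q = (arg0, len) -> arg0 > -1 && arg0 < len + 1;
    // A deliberately different guard (hypothetical) that excludes arg0 == 0.
    static final BiPredicate<Integer, Integer> R = (arg0, len) -> arg0 > 0 && arg0 <= len;

    /**
     * Returns true if no assignment with arg0 in [-bound, bound] and
     * length in [0, bound] distinguishes p from q.
     */
    static boolean equivalentWithin(BiPredicate<Integer, Integer> p,
                                    BiPredicate<Integer, Integer> q,
                                    int bound) {
        for (int len = 0; len <= bound; len++) {
            for (int arg0 = -bound; arg0 <= bound; arg0++) {
                boolean pv = p.test(arg0, len);
                boolean qv = q.test(arg0, len);
                // A counterexample satisfies the negated biconditional.
                boolean negatedBiconditional = !((!pv || qv) && (pv || !qv));
                if (negatedBiconditional) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("p and q equivalent within bound: " + equivalentWithin(P, Q, 50));
        System.out.println("p and r equivalent within bound: " + equivalentWithin(P, R, 50));
    }
}
```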

slide-27
SLIDE 27

Outline

  • Problem Statement
  • API usage mining from 380K Java Projects on GitHub
  • An Empirical Study of API Misuse on Stack Overflow

slide-28
SLIDE 28

Data Set: API Usage Patterns

  • We learn 245 API usage patterns of 100 Java API methods.
  • We manually inspect these patterns.
  • 180 patterns can be confirmed by online documentation.

    API method              Pattern
    ArrayList.get           loop {; get(int)@arg0<rcv.size(); }
    Scanner.nextLine        loop {; nextLine()@rcv.hasNextLine(); }
    SQLiteDatabase.query    query(…)@true; close()@true
    TypedArray.getString    getString(int)@true; recycle()@true
    FileChannel.write       try {; write(ByteBuffer)@true; }; catch(IOException) {; }
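
To make the pattern notation concrete, here is a minimal sketch of what code conforming to two of the listed patterns might look like: nextLine() guarded by hasNextLine() inside a loop, and get(int) guarded by an index check against size(). The class and method names are hypothetical, and the extra lower-bound check on the index is an addition not present in the mined pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class GuardedUsageSketch {

    /** Scanner.nextLine pattern: loop {; nextLine()@rcv.hasNextLine(); } */
    static List<String> readLines(String input) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNextLine()) {      // guard: rcv.hasNextLine()
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    /** ArrayList.get pattern: get(int)@arg0<rcv.size(), plus a lower-bound check. */
    static String getOrDefault(List<String> list, int index, String fallback) {
        if (index >= 0 && index < list.size()) { // guard: arg0 < rcv.size()
            return list.get(index);
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<String> lines = readLines("first\nsecond");
        System.out.println(lines);
        System.out.println(getOrDefault(lines, 5, "<none>"));
    }
}
```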

slide-29
SLIDE 29

Data Set: SO Posts

  • We extract code snippets enclosed in <code> markup from SO posts with the Java tag.
  • We parse and analyze these snippets using a partial program parsing and type resolution technique, JavaBaker.
  • We find 217,818 SO posts with code snippets that use the 100 Java API methods in our study scope.

Siddharth Subramanian, Laura Inozemtseva, and Reid Holmes. Live API Documentation. ICSE 2014.

slide-30
SLIDE 30

Method of API Misuse Detection

  • We examine 220K SO posts with 180 confirmed patterns.

[Figure: detection pipeline. Stack Overflow snippets tagged java → call sequence extraction → structured API call sequences → subsequence check and guard condition check against the queried pattern(s) from the 180 valid patterns → API usage violations]

Dataset: http://web.cs.ucla.edu/~tianyi.zhang/examplecheck.html
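
The subsequence check in this pipeline can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's detector: call sequences are modeled as lists of string tokens, the snippet sequences in main are invented, and the guard condition check is left out entirely.

```java
import java.util.Arrays;
import java.util.List;

class SubsequenceCheckSketch {

    /** True if every pattern element appears in calls, in order (gaps allowed). */
    static boolean isSubsequence(List<String> pattern, List<String> calls) {
        int matched = 0;
        for (String call : calls) {
            if (matched < pattern.size() && pattern.get(matched).equals(call)) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    /** A snippet is flagged when its call sequence does not contain the pattern. */
    static boolean violates(List<String> pattern, List<String> calls) {
        return !isSubsequence(pattern, calls);
    }

    // FileChannel.write pattern from the pattern table, with guards dropped.
    static final List<String> WRITE_PATTERN = Arrays.asList(
        "try {", "write(ByteBuffer)", "}", "catch(IOException) {", "}");

    public static void main(String[] args) {
        // Hypothetical call sequences extracted from two snippets.
        List<String> unguarded = Arrays.asList(
            "new FileOutputStream(String)", "getChannel()", "write(ByteBuffer)");
        List<String> guarded = Arrays.asList(
            "new FileOutputStream(String)", "getChannel()",
            "try {", "write(ByteBuffer)", "}", "catch(IOException) {", "}");
        System.out.println("unguarded violates: " + violates(WRITE_PATTERN, unguarded));
        System.out.println("guarded violates: " + violates(WRITE_PATTERN, guarded));
    }
}
```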

slide-35
SLIDE 35

RQ1. Is API Misuse Prevalent on Stack Overflow?

  • 31% of SO posts contain API usage violations.
  • Two authors independently inspected 400 SO posts with reported API usage violations.
  • 289 posts (72%) could negatively impact production code.

slide-36
SLIDE 36

A code example of getting data from SQLite database in Android [Post ID 31531250]:

    public ArrayList get_user_by_id(String id) {
        ArrayList listUserInfo = new ArrayList();
        SQLiteDatabase db = this.getReadableDatabase();
        Cursor cursor = db.query(...);
        if (cursor != null) {
            while (cursor.moveToNext()) {
                UserInfo userInfo = new UserInfo();
                userInfo.setAppId(cursor.getString(cursor.getColumnIndex(COLUMN_APP_ID)));
                // HERE YOU CAN MULTIPLE RECORD AND ADD TO LIST
                listUserInfo.add(userInfo);
            }
        }
        return listUserInfo;
    }

A Cursor object is created but never released.

slide-41
SLIDE 41

RQ1. Is API Misuse Prevalent on Stack Overflow?

  • Many SO snippets use hardcoded input for illustration.
  • They may crash with real-world input data.

A code example of extracting the src field from an HTML string [Post ID 12742734]:

    String text = "<img src=\"mysrc\" width=\"128\" height=\"92\" border=\"0\" alt=\"alt\" /><p><strong>";
    text = text.substring(text.indexOf("src=\""));
    text = text.substring("src=\"".length());
    text = text.substring(0, text.indexOf("\""));
    System.out.println(text);
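
When the input has no src attribute, indexOf returns -1 and the substring call in the snippet throws StringIndexOutOfBoundsException. A minimal guarded variant might look like the sketch below. It is an illustration only: the class name, the method name, and the choice of Optional as the return type are assumptions, and it keeps the snippet's string-search approach rather than using an HTML parser.

```java
import java.util.Optional;

class SrcAttributeSketch {

    /** Returns the value of the first src="..." attribute, if one is present and terminated. */
    static Optional<String> extractSrc(String html) {
        String marker = "src=\"";
        int start = html.indexOf(marker);
        if (start < 0) {
            return Optional.empty();          // guard: no src attribute at all
        }
        int valueStart = start + marker.length();
        int end = html.indexOf('"', valueStart);
        if (end < 0) {
            return Optional.empty();          // guard: closing quote is missing
        }
        return Optional.of(html.substring(valueStart, end));
    }

    public static void main(String[] args) {
        System.out.println(extractSrc("<img src=\"mysrc\" width=\"128\" />"));
        System.out.println(extractSrc("<p>no image here</p>"));
    }
}
```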

slide-43
SLIDE 43

RQ2. Are highly voted posts more reliable?

  • Highly voted posts are not necessarily more reliable in terms of correct API usage.

slide-44
SLIDE 44

RQ3. What are the characteristics of API misuse?

  • API misuse has three main causes: missing control structures, missing or incorrectly ordered API calls, and incorrect guard conditions.

Missing finally block [Post ID 31427468]:

    Cursor emails = db.query(Email.CONTENT_URI,...);
    while (emails.moveToNext()) {
        String email = emails.getString(..);
    }
    emails.close();
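
The effect of the missing finally block can be shown without the Android SDK. In the sketch below, FakeCursor is a hypothetical stand-in for android.database.Cursor (it is not the real class and only mimics moveToNext, getString, and close); the point is that close() runs in finally even when reading a row throws.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class FinallyCloseSketch {

    /** Hypothetical stand-in for a database cursor; not the Android API. */
    static final class FakeCursor {
        private final Iterator<String> rows;
        private String current;
        boolean closed = false;

        FakeCursor(List<String> rows) {
            this.rows = rows.iterator();
        }

        boolean moveToNext() {
            if (!rows.hasNext()) return false;
            current = rows.next();
            return true;
        }

        String getString() {
            // Simulates a failure while reading a malformed row.
            if (current == null || current.isEmpty()) {
                throw new IllegalStateException("malformed row");
            }
            return current;
        }

        void close() {
            closed = true;
        }
    }

    /** Reads all rows and releases the cursor on both normal and exceptional exit. */
    static List<String> readAll(FakeCursor cursor) {
        List<String> emails = new ArrayList<>();
        try {
            while (cursor.moveToNext()) {
                emails.add(cursor.getString());
            }
        } finally {
            cursor.close();   // runs even if getString() throws
        }
        return emails;
    }

    public static void main(String[] args) {
        FakeCursor cursor = new FakeCursor(java.util.Arrays.asList("a@example.org", "b@example.org"));
        System.out.println(readAll(cursor) + ", closed=" + cursor.closed);
    }
}
```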

slide-46
SLIDE 46

RQ3. What are the characteristics of API misuse?

Missing an API call, bb.flip() [Post ID 12100651]:

    ByteBuffer bb = ByteBuffer.allocate(4);
    bb.put(newArgb);
    int i = bb.getInt();
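
After put() fills the 4-byte buffer, its position equals its limit, so a relative getInt() has nothing left to read and throws BufferUnderflowException. A minimal sketch of the corrected sequence follows; it assumes newArgb is a 4-byte array (the post's actual type is not shown on the slide), and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

class ByteBufferFlipSketch {

    /** Converts four bytes to an int using the buffer's default big-endian order. */
    static int bytesToInt(byte[] fourBytes) {
        if (fourBytes.length != 4) {
            throw new IllegalArgumentException("expected exactly 4 bytes");
        }
        ByteBuffer bb = ByteBuffer.allocate(4);
        bb.put(fourBytes);   // position is now 4, equal to the limit
        bb.flip();           // limit = 4, position = 0: ready for reading
        return bb.getInt();
    }

    public static void main(String[] args) {
        System.out.println(bytesToInt(new byte[] {0, 0, 1, 2}));
    }
}
```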

slide-47
SLIDE 47

RQ3. What are the characteristics of API misuse?

Incorrect guard condition; the expected guard is !map.isEmpty() [Post ID 21983867]:

    TreeMap map = new TreeMap();
    // OR SortedMap map = new TreeMap();
    map.firstKey();
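
Calling firstKey() on an empty TreeMap throws NoSuchElementException, which is why the isEmpty() guard matters. A minimal guarded sketch is shown below; the helper name and the Optional return type are assumptions for illustration, and the helper assumes the map holds no null keys.

```java
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

class FirstKeySketch {

    /** Returns the smallest key, or an empty Optional when the map has no entries. */
    static <K, V> Optional<K> firstKeyOf(SortedMap<K, V> map) {
        if (map.isEmpty()) {               // guard: !map.isEmpty()
            return Optional.empty();
        }
        return Optional.of(map.firstKey());
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> map = new TreeMap<>();
        System.out.println(firstKeyOf(map).isPresent());
        map.put("b", 2);
        map.put("a", 1);
        System.out.println(firstKeyOf(map).get());
    }
}
```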

slide-48
SLIDE 48

RQ3. What are the characteristics of API misuse?

  • Network, database, IO, crypto, and string manipulation APIs are more likely to be misused.

slide-50
SLIDE 50

ExampleCheck: Checking API Misuse in Stack Overflow using Patterns Mined from GitHub

https://chrome.google.com/webstore/detail/examplecheck/amliempebckaiaklimcpopomlnklkioe

slide-51
SLIDE 51

Examplore: Visualizing API Usage Examples at Scale [CHI 2018]

  • Examplore visualizes API usage features in hundreds of code examples along with their statistical distribution in histograms.

Tool: http://examplore.cs.ucla.edu:3000/

slide-52
SLIDE 52

Our Contributions

  • An API usage mining technique that extracts patterns from over 380K GitHub projects
  • A large-scale empirical study of API misuse in 220K SO posts
  • A Chrome extension that augments Stack Overflow with API usage patterns mined from GitHub