How Developers Read and Comprehend Stack Overflow Questions for Tag - - PowerPoint PPT Presentation



SLIDE 1

How Developers Read and Comprehend Stack Overflow Questions for Tag Prediction

Senior Capstone Project By: Ali Morris

SLIDE 2

Objectives

  • Determine what developers focus on when reading Stack Overflow questions to assign tags, using eye-tracking
  • Determine valuable areas of interest (AOIs) for tag assignment, especially keywords

Research Questions

  • RQ1. Which sections of postings are most valuable when assigning tags (code, title, etc.)?
  • RQ2. How do non-novice developers compare with novice developers in tag assignment accuracy, reading patterns, and areas of interest?
  • RQ3. How can this information be used to enhance existing automatic tag-generation techniques?

SLIDE 3

Stack Overflow

  • The largest online community for programmers to learn & share their knowledge

○ 2 million questions, 19 million answers, and 47 million comments
○ Available to download as a 70 GB data dump

  • Forum format where developers can post questions and others can respond
  • Organization of site dependent on classification scheme driven by tagging system
  • Why is auto-tagging important? →

○ Users may not know how to categorize their questions correctly
○ Stack Overflow depends on tags for its organization and usefulness

  • Current auto-generation tag accuracy: 68.47% [1]

SLIDE 4

Related Work

  • Prior studies auto-generate tags for Stack Overflow without eye-tracking
  • Current approaches are all similar:

○ Data Mining & Machine Learning Algorithms [1]-[3]
■ Extract important features by tokenizing many postings
■ Train algorithms on existing data to predict tags for new postings

  • Can use these concepts for future work
  • Can improve tag accuracy with eye tracking as implicit feedback
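The tokenize-then-train pipeline described above can be sketched in a few lines. This is a minimal illustration only, not the method of [1]-[3]: it assumes a single tag per posting, uses a hand-written multinomial Naive Bayes with add-one smoothing, and the `tokenize` helper, the `NaiveBayesTagger` class, and the toy postings are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a posting and split it into word-like tokens (keeps c++, c#)."""
    return re.findall(r"[a-z][a-z0-9+#]*", text.lower())


class NaiveBayesTagger:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical helper)."""

    def fit(self, postings, tags):
        self.tag_counts = Counter(tags)          # postings per tag
        self.word_counts = defaultdict(Counter)  # token counts per tag
        for text, tag in zip(postings, tags):
            self.word_counts[tag].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Tokens never seen during training carry no evidence, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, n in self.tag_counts.items():
            counts = self.word_counts[tag]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # log likelihood
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Toy training data, invented for illustration only.
train_posts = [
    "Segmentation fault when dereferencing a null pointer",
    "Pointer to pointer and malloc memory leak",
    "For loop skips the last array element",
    "While loop never terminates when reading input",
]
train_tags = ["pointers", "pointers", "loops", "loops"]

tagger = NaiveBayesTagger().fit(train_posts, train_tags)
print(tagger.predict("Dangling pointer after free causes segmentation fault"))
```

Real Stack Overflow questions carry several tags, so a practical system would need a multi-label variant; gaze-derived weights (for example, boosting tokens inside heavily fixated AOIs) would be one possible way to feed eye-tracking results into such a model.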

SLIDE 5

Eye-Tracking

  • Gaze data holds information about visual attention

○ Thought processes, strategies, user technique

  • A new field: Eye-tracking to study how developers work
  • Huge amount of data per session:

○ Running at 60Hz → 60 samples per second

  • Different types of gaze data holding different information
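To give a sense of scale for the sampling rate above, the short sketch below counts raw gaze samples. The 60 Hz rate and the 9 tasks come from the slides; the 2-minute task length and the `gaze_sample_count` helper are assumptions made only for this illustration.

```python
def gaze_sample_count(sampling_rate_hz, duration_seconds):
    """Number of raw gaze samples recorded at a fixed sampling rate."""
    return round(sampling_rate_hz * duration_seconds)


# Hypothetical session: 9 tasks of 2 minutes each, recorded at 60 Hz.
samples_per_task = gaze_sample_count(60, 2 * 60)
samples_per_session = 9 * samples_per_task
print(samples_per_task, samples_per_session)  # 7200 64800
```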

SLIDE 6

Eye-Tracking

  • Types of gaze data & analysis:

○ Fixation: focus point where the eyes remain stationary for some time
○ Duration: total fixation time for an area
○ Saccade: quick eye movement between fixations
○ Scanpath: an interconnected sequence of saccade-fixation-saccade movements
○ Area of Interest (AOI): a specific area of the screen for which quantitative eye-movement measures (fixation counts and durations) are calculated
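As an illustration of how fixations can be separated from saccades in raw gaze samples, here is a minimal sketch of a dispersion-threshold (I-DT style) detector. It is not the filter used in the study: Tobii Studio applies its own fixation filters, and the thresholds, sample layout, and function names below are assumptions.

```python
import math


def dispersion(points):
    """Spread of a window of (x, y) samples: x-range plus y-range."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=30.0, min_duration_ms=100.0,
                     sample_interval_ms=1000.0 / 60.0):
    """Group consecutive samples that stay within a small area into fixations.

    samples: chronological list of (x, y) gaze points at a fixed rate.
    Samples that fall between two fixations are treated as saccades.
    """
    min_len = max(1, math.ceil(min_duration_ms / sample_interval_ms))
    fixations = []
    i, n = 0, len(samples)
    while i + min_len <= n:
        j = i + min_len
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the points stay close together.
            while j < n and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            window = samples[i:j]
            fixations.append({
                "start": i,
                "end": j - 1,
                "x": sum(p[0] for p in window) / len(window),
                "y": sum(p[1] for p in window) / len(window),
                "duration_ms": len(window) * sample_interval_ms,
            })
            i = j
        else:
            i += 1  # Slide past a sample that belongs to a saccade.
    return fixations


# Synthetic 60 Hz data: two stable clusters separated by a quick jump.
gaze = ([(100 + k % 2, 100) for k in range(8)]
        + [(200, 160), (300, 240)]
        + [(400, 300 + k % 2) for k in range(8)])
for fixation in detect_fixations(gaze):
    print(fixation)
```

The ordered list of fixations returned here is effectively a scanpath; the gaps between consecutive fixations correspond to saccades.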

SLIDE 7

Experiment Design

  • Conducted in eye-tracking lab utilizing Tobii Studio
  • 7 participants

○ CS, CIS, & EE majors attending Youngstown State University
○ C/C++ coding experience ranging from less than a year to 5 years
○ Each briefed on the study and completed pre- and post-study surveys

  • Participants presented with 9 tasks from 3 different categories

○ Sourced directly from Stack Overflow
○ Questions relevant to C/C++
○ Categories increased in complexity & were curated based on defined criteria

  • Participants assigned up to 5 tags from a Suggested Tags list

○ 10 possible tags: 5 relevant, 5 distractors
○ Participants allowed to suggest tags not in the list if necessary

SLIDE 8

Task Categories

Simple: Content commonly taught in CS1

  • Simple data types
  • Operators
  • Control structures
  • Basic properties of the C/C++ language

Average: Knowledge beyond the CS1 level that comes from development experience

  • Specific details of data structures
  • Involved application of aspects from the simple level

Complex: Applications of more difficult/compound topics

  • Algorithm designs
  • Complicated memory management techniques
  • Obscure/intense properties of the C++ language

SLIDE 9

Figure 1. Sample Task Representation

SLIDE 10

Analysis

  • AOI groups assigned to each task:

○ Title
○ Description
○ Code
○ Relevant Tags
○ Distractor Tags
○ Keywords
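A rough sketch of how fixation counts and durations can be aggregated per AOI follows. Tobii Studio computes these metrics itself; the rectangle coordinates, the fixation tuples, and the helper names here are hypothetical, and the AOIs are assumed not to overlap (a Keywords AOI nested inside Description would need separate handling).

```python
# Hypothetical AOI rectangles in screen pixels: (left, top, right, bottom).
AOIS = {
    "Title": (0, 0, 800, 50),
    "Description": (0, 50, 800, 300),
    "Code": (0, 300, 800, 600),
}


def aoi_metrics(fixations, aois):
    """Fixation count and total fixation duration for each AOI.

    fixations: list of (x, y, duration_ms) tuples.
    Fixations outside every AOI are ignored.
    """
    metrics = {name: {"count": 0, "duration_ms": 0.0} for name in aois}
    for x, y, duration_ms in fixations:
        for name, (left, top, right, bottom) in aois.items():
            if left <= x < right and top <= y < bottom:
                metrics[name]["count"] += 1
                metrics[name]["duration_ms"] += duration_ms
                break  # AOIs are assumed not to overlap.
    return metrics


def duration_share(metrics):
    """Each AOI's share of the total fixation duration across all AOIs."""
    total = sum(m["duration_ms"] for m in metrics.values())
    return {name: (m["duration_ms"] / total if total else 0.0)
            for name, m in metrics.items()}


# Invented fixations; the last one falls outside every AOI.
sample_fixations = [
    (100, 20, 200.0),
    (100, 100, 300.0),
    (100, 400, 500.0),
    (100, 450, 250.0),
    (900, 900, 100.0),
]
metrics = aoi_metrics(sample_fixations, AOIS)
print(metrics)
print(duration_share(metrics))
```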

SLIDE 11

Figure 2. AOI Representation

SLIDE 12

Analysis: Tag Accuracy

  • Average Accuracy: 90.57%
  • Average Tags per Task: 3
  • Feedback on overall confidence levels generally reflected accuracy
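The slides do not state how tag accuracy was computed. One plausible reading, shown here purely as an assumption, is the share of a participant's assigned tags that belong to the task's relevant set; the `tag_accuracy` function and the example tags are hypothetical.

```python
def tag_accuracy(assigned, relevant):
    """Fraction of assigned tags that are in the relevant set.

    Assumed definition for illustration; the study's exact formula is
    not given in the slides.
    """
    assigned = set(assigned)
    if not assigned:
        return 0.0
    return len(assigned & set(relevant)) / len(assigned)


# Invented task: 5 relevant tags, participant assigns 3 (one distractor).
relevant_tags = {"c++", "pointers", "arrays", "memory-management", "segmentation-fault"}
assigned_tags = ["c++", "pointers", "java"]
print(tag_accuracy(assigned_tags, relevant_tags))
```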

SLIDE 13

Analysis: Tag Accuracy

Tag accuracy decreases with difficulty

SLIDE 14

Analysis: Overall Fixation Duration

  • Averages of all recordings
  • Relevant and Distractor Tags approximately equal
  • Most focus time on Description & Code

  • Least focus time on Title

SLIDE 15

Analysis: Overall Fixation Duration over Categories

Noticeable duration trends in Code & Title fixations

SLIDE 16

Analysis: Overall Fixation Count

  • Averages of all recordings
  • Approximately consistent with fixation durations

SLIDE 17

Analysis: Overall Fixation Count over Categories

Same trends appear with changes in Code and Title

SLIDE 18

Analysis: Accuracy Non-novice v. Novice

  • Non-novice performed slightly better
  • Where novice excelled:

○ Average-level tasks
○ Also only assigned 1-2 tags in this category

Average Tag Assignment

  • Non-novice: 3-4 tags
  • Novice: 2 tags

Non-novice generally more confident in tag assignment

SLIDE 19

Analysis: Fixation Duration Non-Novice v. Novice

Non-novice duration ratios: Code: 32%, Title & Description: 37%
Novice duration ratios: Code: 22%, Title & Description: 46%

SLIDE 20

Analysis: Fixation Count Non-Novice v Novice

Non-novice count ratios: Code: 32%, Title & Description: 43% (Code: 13 s, Title & Description: 27 s)
Novice count ratios: Code: 24%, Title & Description: 50% (Code: 13 s, Title & Description: 27 s)

SLIDE 21

Analysis: Keywords

  • Time to first fixation

○ Tags not evaluated before posting

  • Notice: on average, a quick fixation on keywords...

SLIDE 22

Analysis: Keywords

  • Readers often go back to keywords after the first fixation
  • On average, 26% of fixations fall on keywords, which occupy only a small portion of the screen
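The keyword measures discussed above (time to first fixation, revisits, and share of fixations) can be derived from a chronological fixation log. The sketch below assumes each fixation has already been mapped to an AOI name; the log format, the function name, and the sample values are hypothetical and are not taken from the study data.

```python
def keyword_metrics(fixation_log, target="Keywords"):
    """Time to first fixation, visit count, and fixation share for one AOI.

    fixation_log: chronological list of (start_ms, aoi_name) tuples.
    A visit is a run of consecutive fixations inside the target AOI.
    """
    first_ms = None
    visits = 0
    hits = 0
    previous = None
    for start_ms, aoi in fixation_log:
        if aoi == target:
            hits += 1
            if first_ms is None:
                first_ms = start_ms
            if previous != target:
                visits += 1  # Entering the AOI starts a new visit.
        previous = aoi
    share = hits / len(fixation_log) if fixation_log else 0.0
    return first_ms, visits, share


# Invented log: the reader reaches the keywords quickly and returns once.
log = [
    (0, "Title"),
    (250, "Keywords"),
    (500, "Keywords"),
    (800, "Code"),
    (1200, "Keywords"),
    (1500, "Relevant Tags"),
]
print(keyword_metrics(log))
```

A visit count above 1 corresponds to the revisiting behaviour noted on the slide, and comparing the fixation share with the AOI's share of screen area shows whether keywords attract disproportionate attention.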

SLIDE 23

Fixation Count vs Duration vs Visits [4]

SLIDE 24

Conclusions

  • Fixation count & duration often correlate
  • Approximately equal time spent evaluating Relevant and Distractor tags
  • With an increase in difficulty →

■ Increase of fixations on Code
■ Decrease of fixations on Title (especially true for non-novice programmers)

  • Non-novice programmers: performed better, assigned more tags, focused more on code than novices & used it more as questions became more difficult
  • Novice programmers: lower accuracy in tag assignment, assigned fewer tags, focused mostly on description & title
  • From visual and statistical analysis: developers tend to evaluate postings first and tags after (sequential pattern)

○ Learning styles & reading patterns can affect outcome [5]

  • Developers quickly focus on keywords & revisit them frequently throughout evaluation

SLIDE 25

Future Work

Continuation of this project:

  • Machine learning algorithms (informed by eye-gaze) to predict tags:

○ Linear Support Vector Machines (SVM), Naive Bayes, Random Forest

Keyword Identification:

  • Identify keywords in text automatically

○ Consider existing models for tag generation compounded with eye-tracking

  • Recognize code as relevant keywords

○ Will differ with different languages

SLIDE 26

References

[1] A. K. Saha, R. K. Saha, and K. A. Schneider, “A discriminative model approach for suggesting tags automatically for stack overflow questions,” in Proceedings of the 10th Working Conference on Mining Software Repositories, 2013.

[2] C. Stanley and M. D. Byrne, “Predicting tags for stackoverflow posts,” in Proceedings of ICCM, 2013, vol. 2013.

[3] S. Schuster, W. Zhu, and Y. Cheng, “Predicting Tags for Stack Overflow Questions,” 2013.

[4] Tobii AB, “Tobii Studio User’s Manual,” Version 3.4.5, 2016.

[5] A. Goswami, G. Walia, M. McCourt, and G. Padmanabhan, “Using Eye Tracking to Investigate Reading Patterns and Learning Styles of Software Requirement Inspectors to Enhance Inspection Team Outcome,” in Proceedings of ESEM, 2016.