Inferring semantically related words from software context
Jinqiu Yang, Lin Tan University of Waterloo
Motivation
I need to find all functions that disable interrupts in the Linux kernel. Hmmm, so I search for disable*interrupt.
Real comments and identifiers from the Linux kernel
Real comments from Apache HTTPD Server
Parsing
Apache
maybe add a higher-level description
min of spare daemons
data in the appropriate order
the compiled max daemons
an iovec to store the trailer sent after the file
data in the wrong order
an iovec to store the headers sent before the file
return err maybe add a higher-level desc
if a user manually creates a data file
SimilarityMeasure = (Number of Common Words in the Two Sequences) / (Total Number of Words in the Shorter Sequence)

the compiled max threads
min of spare threads
SimilarityMeasure = 1/4

an iovec to store the trailer sent after the file
an iovec to store the headers sent before the file
SimilarityMeasure = 8/10
You can find how different thresholds affect our results in our paper.
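The similarity measure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that "common words" are counted as the longest common subsequence of the two word lists, and the names `lcs_length` and `similarity` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for word_a in a:
        curr = [0]
        for j, word_b in enumerate(b, start=1):
            if word_a == word_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(seq1, seq2):
    """Common words divided by the word count of the shorter sequence."""
    words1, words2 = seq1.split(), seq2.split()
    shorter = min(len(words1), len(words2))
    if shorter == 0:
        return 0.0
    # Assumption: common words are counted via the longest common subsequence.
    return lcs_length(words1, words2) / shorter


if __name__ == "__main__":
    print(similarity("the compiled max threads", "min of spare threads"))
    print(similarity(
        "an iovec to store the trailer sent after the file",
        "an iovec to store the headers sent before the file",
    ))
```

On the two example pairs from the slide, this sketch gives 1/4 and 8/10 respectively.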
maybe add a higher-level description
return err maybe add a higher-level desc

an iovec to store the headers sent before the file
an iovec to store the trailer sent after the file

data in the appropriate order
data in the wrong order

min of spare daemons

the compiled max daemons

if a user manually creates a data file
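The grouping of the Apache comments could be reproduced with a pairwise comparison, sketched below. The threshold of 0.7 is a hypothetical value chosen for illustration (the slides do not state one), the longest-common-subsequence count is an assumption, and `similar_pairs` is an illustrative name rather than part of the authors' tool.

```python
from itertools import combinations

THRESHOLD = 0.7  # hypothetical value, not taken from the slides

COMMENTS = [
    "maybe add a higher-level description",
    "min of spare daemons",
    "data in the appropriate order",
    "the compiled max daemons",
    "an iovec to store the trailer sent after the file",
    "data in the wrong order",
    "an iovec to store the headers sent before the file",
    "return err maybe add a higher-level desc",
    "if a user manually creates a data file",
]


def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for word_a in a:
        curr = [0]
        for j, word_b in enumerate(b, start=1):
            if word_a == word_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(words1, words2):
    """Common words (assumed LCS) over the length of the shorter list."""
    shorter = min(len(words1), len(words2))
    return lcs_length(words1, words2) / shorter if shorter else 0.0


def similar_pairs(sequences, threshold=THRESHOLD):
    """Return every pair of sequences whose similarity reaches the threshold."""
    tokenized = [s.split() for s in sequences]
    pairs = []
    for i, j in combinations(range(len(sequences)), 2):
        if similarity(tokenized[i], tokenized[j]) >= threshold:
            pairs.append((sequences[i], sequences[j]))
    return pairs


if __name__ == "__main__":
    for first, second in similar_pairs(COMMENTS):
        print(first, "<->", second)
```

With this threshold, only the description/desc, trailer/headers, and appropriate/wrong pairs are reported; the two daemons comments share a single word out of four and stay ungrouped.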
Software        rPairs     Accuracy   Not in Webster or WordNet
Linux           108,571    47%        76.6%
HTTPD           1,428      47%        93.6%
Collections     469        74%        97.3%
iReport         878        84%        95.2%
jBidWatcher     111        64%        98.4%
javaHMO         144        56%        91.1%
jajuk           203        69%        94.2%
Total/Average   111,804    63%        91.7%
We randomly sample 100 rPairs per project for manual verification (all 111 for jBidWatcher).
SWUM gold set
Our approach (55 words): Precision = 3/55 = 5.5%, Recall = 3/3 = 100%
SWUM (84 words): Precision = 2/84 = 2.4%, Recall = 2/3 = 66.7%
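The precision and recall figures follow directly from the counts on the slide, as the short sketch below shows. The helper names `precision` and `recall` are illustrative, and the gold set size of 3 is read off the recall denominators.

```python
def precision(correct, returned):
    """Fraction of returned words that are in the gold set."""
    return correct / returned if returned else 0.0


def recall(correct, gold_size):
    """Fraction of gold-set words that were returned."""
    return correct / gold_size if gold_size else 0.0


GOLD_SIZE = 3  # inferred from the recall denominators on the slide

# (correct words, total words returned), as given on the slide
RESULTS = {
    "Our approach": (3, 55),
    "SWUM": (2, 84),
}

if __name__ == "__main__":
    for name, (correct, returned) in RESULTS.items():
        p = precision(correct, returned)
        r = recall(correct, GOLD_SIZE)
        print(f"{name}: precision = {p:.1%}, recall = {r:.1%}")
```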