Toward Mining Concept Keywords from Identifiers in Large Software - - PowerPoint PPT Presentation



SLIDE 1

Toward Mining “Concept Keywords” from Identifiers in Large Software Projects

Masaru Ohba and Katsuhiko Gondow Tokyo Institute of Technology

SLIDE 2

What are “concept keywords”?

  • Most programmers try to name identifiers meaningfully.
  • Concept keywords are defined as terms that describe key concepts, to aid in program understanding.

– e.g., in read_dirent(), dirent is a concept keyword.
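Finding a keyword such as dirent starts with splitting identifiers like read_dirent() into words. Below is a minimal sketch, assuming identifiers follow snake_case or camelCase conventions. The slides do not say how udos identifiers were split, and `split_identifier` is a hypothetical helper, not the authors' tool.

```python
import re


def split_identifier(identifier: str) -> list[str]:
    """Split an identifier into words, preserving case.

    Handles snake_case, camelCase, and runs of capitals such as "PTEEntry"
    (split as "PTE", "Entry"). Digits stay attached to the preceding letters,
    so "int8" and "FAT12" each remain one word.
    """
    words = []
    # First split on underscores and other non-word characters.
    for part in re.split(r"[_\W]+", identifier):
        # Then split camelCase boundaries, keeping acronyms together.
        words.extend(re.findall(r"[A-Z]+(?![a-z])\d*|[A-Z]?[a-z]+\d*|\d+", part))
    return [w for w in words if w]


print(split_identifier("read_dirent"))  # ['read', 'dirent']
print(split_identifier("PTEEntry"))     # ['PTE', 'Entry']
```

Case is kept so that acronyms like PTE and FAT12 survive as they appear in the table below. A real miner would probably also normalize case before counting words.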

Concept keywords: dirent, root, PTE, tss, path, signal, yield
Grouping words: kbd, vga, FAT12, sys, H, t
Attributes, less important concepts: busy, byte, offset, name, memory, end, int8, again
Generic verbs: read, set, is, move, wait, print, dump, make, init

Human-selected concept keywords and words of other categories in udos

SLIDE 3

Suggestion

  • We should use more “concept keywords” in program understanding tools.

– concept keywords are concise and descriptive

  • Our solution:

– provides a way to mine concept keywords.

  • ckTF/IDF methods / Identifier Exploratory Framework

– could be used to build tools that support and utilize extracted concept keywords (future work).
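The slides name the ckTF/IDF methods but do not define them. The name suggests a variant of TF-IDF, so as background only, the sketch below computes standard TF-IDF. It treats each source file as a document and the words of its identifiers as terms. It is not the authors' ckTF/IDF. The file contents, the `C_NOISE` stop list, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop list: C keywords and macros that are not concept words.
C_NOISE = {"int", "void", "char", "return", "if", "else", "for", "while", "struct", "NULL"}


def identifier_words(source: str) -> list[str]:
    """Collect words from C-style identifiers, splitting on '_' only."""
    words = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        if ident not in C_NOISE:
            words.extend(w for w in ident.split("_") if w)
    return words


def tf_idf(files: dict[str, str]) -> dict[str, dict[str, float]]:
    """Standard TF-IDF per file: (count / words in file) * log(N / df)."""
    counts = {name: Counter(identifier_words(src)) for name, src in files.items()}
    n_docs = len(counts)
    df = Counter()  # number of files in which each word occurs
    for c in counts.values():
        df.update(c.keys())
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            term: (freq / total) * math.log(n_docs / df[term])
            for term, freq in c.items()
        }
    return scores


# Toy files loosely based on the udos examples on these slides.
files = {
    "fat12.c": "dirent *read_dirent(int fd) { return NULL; }",
    "task.c": "void sys_signal(void) { sys_kill(); }",
    "console.c": "int sys_write(int fd) { return vga_print(fd); }",
}
scores = tf_idf(files)
for name, table in scores.items():
    # Highest-scoring word per file (ties go to the first word seen).
    print(name, max(table, key=table.get))
```

On this toy corpus, dirent scores highest in fat12.c. In task.c, sys scores lower than signal because sys also occurs in console.c.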

SLIDE 4

Future work

  • Applying concept keywords to a Bug Tracking System (BTS) to see the relationship between bug reports and the source code responsible for each problem.

Bug report no. 1, overview: “It could not read directories.” → concept keyword dirent → fat12.c: read_dirent() { return NULL; }
Bug report no. 3, overview: “I could not catch system calls.” → concept keyword signal → task.c: sys_signal() { sys_kill(); }

Concept keywords can bridge the gap between bug reports and source code.
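The bug-report example above can be sketched as a simple lookup. The sketch relies on a hypothetical, hand-written glossary that maps report words to concept keywords, because neither report mentions dirent or signal literally, and the slides do not say how such a mapping would be built. The file contents come from the slide. `GLOSSARY` and `link_report` are illustrative names only.

```python
import re

# Hypothetical hand-written glossary from natural-language words to concept
# keywords; the slides do not say how such a mapping would be obtained.
GLOSSARY = {
    "directory": "dirent",
    "directories": "dirent",
    "catch": "signal",  # as in "catch a signal"
}


def identifier_words(source: str) -> set[str]:
    """Words of C-style identifiers, split on '_' (camelCase not handled)."""
    words = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        words.update(w for w in ident.split("_") if w)
    return words


def link_report(report: str, files: dict[str, str]) -> dict[str, list[str]]:
    """Map each concept keyword implied by a bug report to the files using it."""
    report_words = re.findall(r"[a-z]+", report.lower())
    keywords = {GLOSSARY[w] for w in report_words if w in GLOSSARY}
    return {
        kw: sorted(name for name, src in files.items() if kw in identifier_words(src))
        for kw in sorted(keywords)
    }


files = {
    "fat12.c": "read_dirent() { return NULL; }",
    "task.c": "sys_signal(){ sys_kill(); }",
}
print(link_report("It could not read directories.", files))   # {'dirent': ['fat12.c']}
print(link_report("I could not catch system calls.", files))   # {'signal': ['task.c']}
```

A real BTS integration would need a better way to connect report vocabulary to mined keywords than a fixed glossary. The sketch only shows where concept keywords would sit in that pipeline.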