Discovering Information Explaining API Types Using Text Classification
Presented by: Sunyam Bagga
Course Instructor: Dr. Jin Guo
TEXT CLASSIFICATION
Source: https://www.python-course.eu/text_classification_introduction.php
Task: given an [API type, Section fragment] pair, classify the section fragment as Relevant or Irrelevant to the API type.
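As a rough, hypothetical sketch of such a relevant/irrelevant classifier (not the paper's exact pipeline), using scikit-learn's logistic regression, which is a maximum-entropy model; the fragments and labels are made up:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical [API type, section fragment] training data: each fragment is
# labeled Relevant (1) or Irrelevant (0) to the API type it was paired with.
fragments = [
    "DateTime provides accessors such as year() and monthOfYear().",
    "See the installation guide for build instructions.",
]
labels = [1, 0]

# tf-idf features feeding a logistic regression (a maximum-entropy classifier).
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(fragments, labels)

print(clf.predict(["DateTime exposes year() for extracting the year."]))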
Technical Concepts
1. RecoDoc
"Recovering Traceability Links between an API and Its Learning Resources"
Aim: link code-like terms (CLTs) mentioned in learning resources, e.g. "DateTime….such as year() or monthOfYear()", to the specific code elements they refer to (e.g., DateTime.year()).
Ambiguity
▪ Declaration Ambiguity: CLTs are rarely fully qualified.
▪ Overload Ambiguity: CLTs do not indicate the number/type of parameters (the method may be overloaded).
▪ External Reference Ambiguity: CLTs may refer to code elements in external libraries.
▪ Language Ambiguity: human errors such as typos (HtttpClient), case errors, forgetting parameters, etc.
Parsing Artifacts and Recovering Traceability Links
For each CLT found when parsing the artifacts, RecoDoc looks for candidate code elements in the codebase whose name matches the term, then resolves the ambiguities above to recover the traceability link.
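As a toy illustration of why this matching is ambiguous, a sketch assuming a hypothetical index of fully qualified names; the real RecoDoc filtering is considerably richer:

# Hypothetical codebase index: simple name -> fully qualified candidates.
CODEBASE = {
    "year": ["org.joda.time.DateTime.year()", "org.joda.time.LocalDate.year()"],
    "monthOfYear": ["org.joda.time.DateTime.monthOfYear()"],
}

def candidates(clt: str) -> list[str]:
    """Return every code element whose simple name matches the CLT."""
    simple = clt.rstrip("()").split(".")[-1]
    return CODEBASE.get(simple, [])

# "year()" is declaration-ambiguous: two types declare a matching member.
print(candidates("year()"))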
2. LOOCV
"Evaluating a classifier’s performance"
Leave-one-out Cross Validation: for n labeled examples, train on the other n - 1 and test on the one held out; repeating n times means every example is used for testing exactly once.
Source: https://towardsdatascience.com/train-test-split-and-cross-validation-in-python-80b61beca4b6
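A minimal sketch of LOOCV with scikit-learn, using the Iris dataset as a stand-in for the paper's corpus:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# One fold per example: train on n-1 samples, test on the single held-out one.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print(f"LOOCV accuracy: {scores.mean():.3f} over {len(scores)} folds")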
3. MaxEnt Classifier
"Using Maximum Entropy for Text Classification" by Nigam et al.
Maximum Entropy principle: of all probability distributions that satisfy the constraints estimated from data, choose the distribution that has the maximum entropy (the most uniform one).
Example
Source: NLP by Dan Jurafsky and Chris Manning
▪ Add Noun feature: f1 = {NN, NNS, NNP, NNPS}
▪ Add Proper Noun feature: f2 = {NNP, NNPS}
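To make the example concrete, a small sketch that computes the maximum-entropy distribution over a toy tag set subject to the two feature constraints; the tag set and the expected-value targets (0.8 and 0.5) are hypothetical, not the numbers from the Jurafsky and Manning slides:

import numpy as np
from scipy.optimize import minimize

# Hypothetical tag set and hypothetical constraint targets, chosen only to
# illustrate the principle: pick the most uniform distribution that still
# satisfies the feature constraints.
tags = ["NN", "NNS", "NNP", "NNPS", "VBZ", "VBD"]
f1 = np.array([1, 1, 1, 1, 0, 0])  # Noun feature
f2 = np.array([0, 0, 1, 1, 0, 0])  # Proper Noun feature

def neg_entropy(p):
    return np.sum(p * np.log(p + 1e-12))

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},  # valid distribution
    {"type": "eq", "fun": lambda p: p @ f1 - 0.8},   # E[f1] = 0.8 (hypothetical)
    {"type": "eq", "fun": lambda p: p @ f2 - 0.5},   # E[f2] = 0.5 (hypothetical)
]
p0 = np.full(len(tags), 1 / len(tags))
res = minimize(neg_entropy, p0, bounds=[(0, 1)] * len(tags), constraints=constraints)

# Mass is spread as evenly as the constraints allow: 0.25 on each
# proper-noun tag, 0.15 on NN/NNS, and 0.10 on each verb tag.
print(dict(zip(tags, res.x.round(3))))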
Constraints and Features
Constrain the model's expected value for each feature f_i to match the expected value for that feature as seen in the training data, D:

(1/|D|) Σ_{d in D} f_i(d, c(d)) = (1/|D|) Σ_{d in D} Σ_c P(c|d) f_i(d, c)

The maximum-entropy distribution satisfying these constraints has the exponential form P(c|d) = exp(Σ_i λ_i f_i(d, c)) / Z(d), where each λ_i is a learned feature weight and Z(d) normalizes over the classes.
4. Cosine Similarity with tf-idf
"Comparison with Information Retrieval"
Tf-Idf
Term frequency-inverse document frequency: scores a term by how often it occurs in a document, discounted by how many documents contain it, so as to up-weight rare words.
Cosine Similarity
Measures the cosine of the angle between the two tf-idf vectors: cos(θ) = (A · B) / (‖A‖ ‖B‖).
A section is marked relevant if the similarity value is higher than a certain threshold.
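A minimal sketch of this IR baseline with scikit-learn; the section texts, the query built from the API type, and the 0.1 threshold are all hypothetical:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

api_type = "DateTime year monthOfYear"
sections = [
    "DateTime provides year() and monthOfYear() accessors.",
    "This chapter covers build configuration and packaging.",
]

# Vectorize the query and the candidate sections in one tf-idf space.
vec = TfidfVectorizer()
matrix = vec.fit_transform([api_type] + sections)

# Cosine similarity between the API type (row 0) and each section.
sims = cosine_similarity(matrix[0], matrix[1:]).ravel()

THRESHOLD = 0.1  # hypothetical cut-off
for section, sim in zip(sections, sims):
    label = "Relevant" if sim > THRESHOLD else "Irrelevant"
    print(f"{sim:.2f}  {label}: {section}")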
5. Kappa Score
"Annotating the Experimental Corpus"
Kappa formula: Kappa = (Po - Pe) / (1 - Pe)
▪ Po: observed agreement among annotators
▪ Pe: hypothetical probability of chance agreement
▪ More robust than a simple percent-agreement calculation
Kappa Example: two annotators each label 50 items Yes/No (the 2x2 table of counts is reconstructed from the figures below; it is the worked example from the Wikipedia source):

            B: Yes   B: No
A: Yes        20       5
A: No         10      15

▪ Po = (20 + 15) / 50 = 0.7
▪ P(Yes) = (25/50) * (30/50) = 0.5 * 0.6 = 0.3
▪ P(No) = (25/50) * (20/50) = 0.5 * 0.4 = 0.2
▪ Pe = P(Yes) + P(No) = 0.5
▪ Kappa = (0.7 - 0.5) / (1 - 0.5) = 0.4
Source: https://en.wikipedia.org/wiki/Cohen%27s_kappa
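The same arithmetic in Python, rebuilding the 50 annotations from the table above (scikit-learn's cohen_kappa_score implements this formula):

from sklearn.metrics import cohen_kappa_score

# Rebuild the 50 annotations from the 2x2 table:
# 20 Yes/Yes, 5 Yes/No, 10 No/Yes, 15 No/No.
a = ["Y"] * 20 + ["Y"] * 5 + ["N"] * 10 + ["N"] * 15
b = ["Y"] * 20 + ["N"] * 5 + ["Y"] * 10 + ["N"] * 15

print(cohen_kappa_score(a, b))  # 0.4, matching the hand calculation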
Any questions?