Are Deep Neural Networks the Best Choice for Modeling Source Code? Authors: Vincent J. Hellendoorn, Premkumar Devanbu
COMP 762: Paper Presentation (Technical Details)
Alexander Nicholson
Overview:
▪ Big question in the title
▪ Creates a robust baseline model (Important!)
▪ Optimizations for a specific, practical task
Scope
▪ Language-specific scoping?
▪ Static vs. Dynamic
Why online training/dynamism with an (R)NN is hard:
E.g., 50,000 × 512 = 25,600,000 parameters between the first and second layers.
Image From: Towards Deep Learning Software Repositories, White et al.
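The parameter count above can be checked with a few lines. This is only a sketch: reading 50,000 as a vocabulary size and 512 as a layer width is an assumption, and the variable names are hypothetical.

```python
# Assumed reading of the slide's numbers (not stated explicitly there):
vocab_size = 50_000   # one input unit per vocabulary token
hidden_size = 512     # width of the second layer

# A dense connection has one weight per (input unit, output unit) pair.
weights = vocab_size * hidden_size
print(f"{weights:,}")  # prints 25,600,000

# Under this reading, every token added to the vocabulary
# brings another full row of weights to train.
weights_per_new_token = hidden_size
```

Under that reading, a vocabulary that keeps growing during online use changes the size of the largest weight matrix, which is one way to see why dynamic updates are awkward for such models.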
Smoothing (Discounting/Correction and Interpolation)
Resources:
https://www.youtube.com/watch?v=FUS7XkhYBLo&list=PLBv09BD7ez_7Ke6U7yGBvfP4_Hau3ZGj2&index=5
https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf
Discounting/Correction: Subtract a number from, or add a number to, the counts.
Interpolation: “Add information from known distributions.” Weight the additional distributions using λ.
Laplace/Lidstone Correction
Add a constant to every count and re-normalize.
Image From: Stanford Smoothing Tutorial - https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf
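A minimal sketch of the Laplace/Lidstone correction, assuming a small closed vocabulary. The function name `lidstone`, the parameter `alpha`, and the toy tokens are hypothetical and chosen only for illustration.

```python
from collections import Counter

def lidstone(counts, vocab, alpha=1.0):
    """Add alpha to every count, then re-normalize.

    alpha = 1 gives the Laplace correction; other positive values
    give the more general Lidstone correction.
    """
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

# Toy data: "while" is in the vocabulary but was never observed.
counts = Counter(["int", "int", "int", "for"])
vocab = ["int", "for", "while"]
probs = lidstone(counts, vocab, alpha=1.0)
# The unseen token now has probability 1/7 instead of 0.
```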
Absolute Discounting
Subtract a fixed discount from every nonzero count and re-normalize.
The paper’s modification uses three values of the discount.
Image From: Stanford Smoothing Tutorial - https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf
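A sketch of absolute discounting with a single discount, interpolated with a lower-order distribution. This is the textbook form, not the paper's three-discount modification; the function name, the default `d=0.75`, and the toy data are assumptions.

```python
def absolute_discount(counts, backoff, d=0.75):
    """counts: token -> count observed after one fixed context (integers >= 1).
    backoff: token -> lower-order probability (must sum to 1 and
             cover every token in counts).
    """
    total = sum(counts.values())
    seen = sum(1 for c in counts.values() if c > 0)
    # Probability mass freed by subtracting d from each seen count.
    reserved = d * seen / total
    return {
        w: max(counts.get(w, 0) - d, 0) / total + reserved * p
        for w, p in backoff.items()
    }

counts = {"int": 3, "for": 1}
backoff = {"int": 1 / 3, "for": 1 / 3, "while": 1 / 3}
probs = absolute_discount(counts, backoff)
# The freed mass (0.375 here) is spread according to the backoff
# distribution, so the unseen token "while" receives 0.125.
```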
Kneser-Ney Smoothing
The paper’s modification uses three values of the discount.
Image From: Stanford Smoothing Tutorial - https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf
Jelinek-Mercer Smoothing
General Interpolation: λP(X) + (1 − λ)P(Z)
In J-M: λ is a constant.
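The interpolation on this slide, with a constant weight, can be sketched as follows. The name `lam` and the example probabilities are hypothetical; the weight used in the paper is not given on the slide.

```python
def jelinek_mercer(p_high, p_low, lam=0.5):
    """Mix a higher-order estimate with a lower-order one
    using a single constant weight lam in [0, 1]."""
    return lam * p_high + (1 - lam) * p_low

# An n-gram never seen in training (p_high = 0) still gets
# probability from the lower-order model.
p = jelinek_mercer(p_high=0.0, p_low=0.2, lam=0.5)
```

Applied recursively from the longest context down to unigrams, the same formula yields a full smoothed n-gram model.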
Witten-Bell Smoothing
Image From: Stanford Smoothing Tutorial - https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf
Trie Data-structure
From Wikipedia: all children of a node share a common prefix.
Trie Data-structure
Each scope has its own trie.
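A minimal sketch of a trie that stores token-sequence counts, with one trie per scope. The class names, the scope labels, and the toy token sequences are hypothetical; the paper's actual implementation is not reproduced here.

```python
class TrieNode:
    def __init__(self):
        self.count = 0       # how often the prefix ending here was seen
        self.children = {}   # next token -> TrieNode

class CountTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, tokens):
        """Record one occurrence of the sequence and of each of its prefixes."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
            node.count += 1

    def count(self, tokens):
        """Return how often this exact prefix was recorded (0 if never)."""
        node = self.root
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                return 0
        return node.count

# One trie per scope (scope names are illustrative).
tries = {"global": CountTrie(), "local": CountTrie()}
tries["local"].add(["for", "(", "int"])
tries["local"].add(["for", "(", "x"])
shared = tries["local"].count(["for", "("])   # both sequences share this prefix
```

Because sequences with a common prefix share nodes, counts for a context and for all its extensions sit next to each other, and a scope's trie can be updated or discarded without touching the others.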
Zipf’s Law
Image From: https://phys.org/news/2017-08-unzipping-zipf-law-solution-century-old.html
Memoization
- Optimization technique commonly used in dynamic programming
- Acts as a cache to avoid recomputing the same result.
http://cs.mcgill.ca/~jcheung/teaching/fall-2017/comp550/index.html
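A small illustration of memoization using Python's standard `functools.lru_cache`. The Fibonacci function and the `calls` counter are only a stand-in example and are not taken from the paper.

```python
from functools import lru_cache

calls = 0  # counts how many times the function body actually runs

@lru_cache(maxsize=None)
def fib(n):
    global calls
    calls += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(30)
# Each value from fib(0) to fib(30) is computed once and then
# served from the cache, so the body runs 31 times in total.
```

Without the cache, the same call would recompute the small subproblems over and over, which is the repeated work memoization avoids.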
Dependency Models (Trees)
https://en.wikibooks.org/wiki/LaTeX/Linguistics
Dropout
http://cs.mcgill.ca/~hvanho2/comp551/
Evaluation Terms
- Mean Reciprocal Rank
  - Explained well in Sec. 4.1
- Two-tailed t-test
  - Statistical significance test
  - Tests whether the target is higher OR lower than the reference.
- Cohen’s d
  - Effect size
  - Used with the t-test
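Two of the evaluation terms can be sketched in a few lines. Treating a missing correct answer as a reciprocal rank of 0 and using the pooled standard deviation for Cohen's d are assumptions made here; the function names and sample numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct answer for each prediction,
    or None when the correct answer was not in the returned list."""
    return mean([0.0 if r is None else 1.0 / r for r in ranks])

def cohens_d(a, b):
    """Effect size: difference of the means divided by the pooled
    sample standard deviation of the two groups."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                  / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

mrr = mean_reciprocal_rank([1, 2, None, 4])   # (1 + 0.5 + 0 + 0.25) / 4
d = cohens_d([2, 4, 6], [1, 3, 5])            # means differ by 1, pooled sd is 2
```

The t-test says whether a difference is statistically significant; Cohen's d complements it by expressing how large the difference is relative to the spread of the data.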