Smelling Source Code Using Deep Learning Tushar Sharma - PowerPoint PPT Presentation

Smelling Source Code Using Deep Learning Tushar Sharma http://www.tusharma.in

What is a smell? …certain structures in the code that suggest (sometimes they scream for) the possibility of refactoring . - Kent Beck 20 Definitions of smells: http://www.tusharma.in/smells/smellDefs.html Smells’ catalog: http://www.tusharma.in/smells/

Implementation smells

Design Smells

Architecture Smells

How smells get detected?

Metrics-based smell detection Source model < > Metrics Code (or source artifact) < ! > Smells

Machine learning-based smell detection < > Existing < > examples < ! > Machine learning algorithm < > Code Source model (or source artifact) f(x) < ! > f(x) f(x) f(x) Smells Trained model

Machine learning-based smell detection Existing academic work: Take metrics as the - Support vector machines features/input - Bayesian belief network - Logistic regression - CNN m f(m) Validation on balanced samples

Research questions RQ1: Would it be possible to use deep learning methods to detect code smells? RQ2: Is transfer-learning feasible in the context of detecting smells? Transfer-learning refers to the technique where a learning algorithm exploits the commonalities between different learning tasks to enable knowledge transfer across the tasks

Overview Detected smells C# C# </> C# CodeSplit Code Java fragments Java Java -- -- ---- Learning data ---- -- -- generator Preprocess 23 51 23 51 -- -- 32 200 Tokenized 32 200 -- -- ---- 11 45 ---- 11 45 ---- ---- -- -- -- -- samples Research questions -- -- -- -- ---- ---- -- -- ---- -- ---- Deep learning -- ---- ---- -- -- Tokenizer -- -- models Positive and negative samples

Data Curation

Repositories download 2,528 and C# Java 1,072 selected 100 repositories C# Java repositories C# Java Architecture Community CI Documentation History License Issues Unit test Stars

Splitting code fragments C# </> -- ---- -- C# -- ---- -- ---- -- -- C# Code fragments CodeSplit (methods or classes) Java </> -- ---- -- -- ---- -- ---- -- Java -- Java https://github.com/tushartushar/CodeSplitJava

Smell detection C# C# C# Detected code smells Java Java Java Java https://github.com/tushartushar/DesigniteJava http://www.designite-tools.com/

Generating training and evaluation samples Code smells -- Code -- -- ---- -- ---- -- -- ---- ---- -- -- -- ---- ---- -- -- -- ---- -- fragments -- -- ---- -- ---- -- -- Sample generator Positive and negative samples

Tokenizing learning samples -- -- 23 51 23 51 ---- -- 32 200 -- ---- -- -- 32 200 -- 11 45 -- -- ---- -- ---- 11 45 -- ---- -- ---- -- ---- -- ---- -- -- ---- ---- -- -- -- -- Code fragments Tokenizer Tokenized samples https://github.com/dspinellis/tokenizer

Tokenizing learning samples public void InternalCallback(object state) 1-D { 123 2002 40 2003 41 59 474 123 2004 46 2005 Callback(State); try { 2-D timer.Change(Period, TimeSpan.Zero); 123 2002 40 2003 41 59 } 474 123 2004 46 2005 40 2006 44 2007 46 2008 catch (ObjectDisposedException) 125 329 40 2009 41 123 125 125 { } } -- -- 23 51 23 51 ---- -- 32 200 -- ---- -- -- 32 200 -- 11 45 -- -- ---- -- ---- 11 45 -- ---- -- ---- -- ---- -- ---- -- -- ---- ---- -- -- -- --

Data preparation 5,146 311,533 70-30 3,602 218,073 93,460 1,544 split 3,602 3,602 93,460 1,544 Training samples Evaluation samples

Selection of smells an unexplained the method has numeric literal is high cyclomatic used in an complexity • Complex method expression • Magic number • Empty catch block • Multifaceted abstraction a class has more a catch block of than one an exception is responsibility empty assigned to it

Architecture - CNN Inputs • Filters = {8, 16, 32, 64} Convolution layer • Kernel size = {5, 7, 11} Repeat this set of • Pooling window = {2, 3, 4, 5} Batch normalization layer hidden units Max pooling layer • Dynamic Batch size = {32, 64, 128, Dropout layer 0.1 256} Flatten layer 32, relu • Callbacks Dense layer 1 • Early stopping (patience = 5) 1, sigmoid Dense layer 2 • Model check point Output

Architecture - RNN • Dimensionality of embedding layer = {16, 32} Inputs • LSTM units = {32, 64, 128} Embedding layer Repeat this set of LSTM layer hidden units • Dynamic Batch size = {32, 64, 128, 0.2 Dropout layer 256} 1, sigmoid Dense layer • Callbacks Output • Early stopping (patience = 2) • Model check point

Running experiments • Phase 1 – Grid search for optimal hyper- parameters • Validation set – 20% • Number of configurations GRNET Super • CNN = 144 computing facility • RNN = 18 Each experiment using 1 GPU with 64 GB • Phase 2 – experiment with the optimal memory hyper-parameters

Results

RQ1. Would it be possible to use deep learning methods to detect code smells? F1 0.7 AUC-ROC 0.57 0.6 0.90 0.5 0.41 0.80 0.38 0.4 0.35 0.31 0.29 0.70 0.3 0.22 0.60 0.2 0.09 0.50 0.06 0.1 0.04 0.02 0.02 0.40 0 CM ECB MN MA CM ECB MN MA CM ECB MN MA CM ECB MN MA CM ECB MN MA CM ECB MN MA CNN-1D CNN-2D RNN CNN-1D CNN-2D RNN

CNN-1D vs CNN-2D CNN-1D (max) - 0.05 CNN-1D (max) - 0.40 CNN-2D (max) - 0.04 CNN-2D (max) - 0.39 CNN-1D (max) - 0.36 CNN-1D (max) - 0.18 CNN-2D (max) - 0.35 CNN-2D (max) - 0.16

CNN vs RNN RNN and RNN and CNN-1D CNN-2D CM -22.94 -33.81 ECB 80.23 91.94 MN 48.96 38.58 MA -349.12 -205.26 Difference in percentage; comparing max F1

Are more deep layers always good? Layers CM ECB MN MA 1 0.36 0.05 0.36 0.08 CNN- 2 0.40 0.05 0.36 0.18 1D 3 0.40 0.05 0.36 0.19 1 0.39 0.04 0.35 0.07 CNN- 2 0.39 0.04 0.34 0.16 2D 3 0.39 0.05 0.34 0.10 1 0.34 0.21 0.48 0.28 RNN 2 0.36 0.24 0.48 0.22 3 0.37 0.23 0.48 0.20

RQ2: Is transfer-learning feasible in the context of detecting smells? F1 F1 0.57 0.57 0.6 0.54 0.54 0.60 0.49 0.49 0.49 0.49 0.5 0.50 0.41 0.38 0.35 0.4 0.40 0.29 0.3 0.30 0.2 0.14 0.14 0.20 0.09 0.07 0.07 0.06 0.06 0.04 0.1 0.03 0.03 0.02 0.10 0.01 0 0.00 CM ECB MN MA CM ECB MN MA CM ECB MN MA CM ECB MN MA CNN-1D CNN-2D CNN-1D CNN-2D Transfer-learning Direct-learning

It is feasible to make the deep learning model Transfer-learning is learn to detect smells feasible. Conclusions Improvements – many possibilities - Performance - Add more smells – different kinds

Relevant links Source code and data https://github.com/tushartushar/DeepLearningSmells Smell detection tool Java - https://github.com/tushartushar/DesigniteJava C# - http://www.designite-tools.com CodeSplit </> Java - https://github.com/tushartushar/CodeSplitJava C# - https://github.com/tushartushar/DeepLearningSmells/tree/master/CodeSplit Tokenizer https://github.com/dspinellis/tokenizer

Thank you!! Courtesy: spikedmath.com

Smelling Source Code Using Deep Learning Tushar Sharma - PowerPoint PPT Presentation

Smelling Source Code Using Deep Learning Tushar Sharma http://www.tusharma.in What is a smell? certain structures in the code that suggest (sometimes they scream for) the possibility of refactoring . - Kent Beck 20 Definitions of smells:

in practice source code source code javac scalac groovyc jrubyc 0xCAFEBABE byte code

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

and Retrieval Source: H. Jegou Source: H. Jegou Source: H. Jegou Source: H. Jegou Source: H.

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Presentation about Deep Learning --- Zhongwu xie Contents 1.Brief introduction of Deep learning.

Deep Learning on GPUs March 2016 What is Deep Learning? GPUs and DL AGENDA DL in practice

Deep learning Deep reinforcement learning Hamid Beigy Sharif university of technology December

Differen'able Func'onal Programming Noel Welsh @noelwelsh underscore Goals Deep learning

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 6: Deep Learning Systems 1 Outline

What is a Compiler? Compiler A program that translates code in one language (source code) to

Code Generation Machine code generation cs4713 1 Machine code generation machine Intermediate

{Sequential Code} {Sequential Code} {Sequential Code} {Sequential Code} {Sequential Code}

ACCELERATE DEEP LEARNING WITH NVIDIA'S DEEP LEARNING PLATFORM | STEPHEN JONES | GTC16 DEEP

An Empirical Examination of the Relationship Between Code Smells and Merge Conflicts Iftekhar

Week 4 They Smell Like Sheep Characteris3cs of the Sheep Sheep People Poor eyesight when

Improving the Quality of Software Using Testing and Fault Prediction Professor Iftekhar Ahmed

DATA ANALYTICS USING DEEP LEARNING GT 8803 // VENKATA KISHORE PATCHA L E C T U R E # 0 6 : S M

Fine-Grained Generic Aspects Tobias Rho, Gnter Kniesel, Malte Appeltauer Dept. of Computer

Small Number is a 5 year-old boy who gets into a lot of mischief. He lives with his Grandma and

MasteringtheVim*Language Chris&Toomey @thoughtbot @christoomey h"ps:/ /ctoomey .com

Towards a Taxonomy of Grammar Smells V. Zaytsev @ SLE17 @ SPLASH joint work with M. Stijlaart

Smelling Source Code Using Deep Learning Tushar Sharma - PowerPoint PPT Presentation

Smelling Source Code Using Deep Learning Tushar Sharma http://www.tusharma.in What is a smell? certain structures in the code that suggest (sometimes they scream for) the possibility of refactoring . - Kent Beck 20 Definitions of smells:

in practice source code source code javac scalac groovyc jrubyc 0xCAFEBABE byte code

Hao Su July 6, 2017 Outline Overview of 3D deep learning 3D deep learning algorithms

All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from

Deep Neural Networks and Deep Reinforcement Learning Deep Learning, Goodfellow, Bengio and

and Retrieval Source: H. Jegou Source: H. Jegou Source: H. Jegou Source: H. Jegou Source: H.

AGN deep multiwavelength AGN deep multiwavelength AGN deep multiwavelength surveys: surveys:

Deep Learning: Theory and Practice Deep Learning - Practical 02-04-2020 Considerations

Presentation about Deep Learning --- Zhongwu xie Contents 1.Brief introduction of Deep learning.

Deep Learning on GPUs March 2016 What is Deep Learning? GPUs and DL AGENDA DL in practice

Deep learning Deep reinforcement learning Hamid Beigy Sharif university of technology December

Differen'able Func'onal Programming Noel Welsh @noelwelsh underscore Goals Deep learning

DSC 102 Systems for Scalable Analytics Arun Kumar Topic 6: Deep Learning Systems 1 Outline

What is a Compiler? Compiler A program that translates code in one language (source code) to

Code Generation Machine code generation cs4713 1 Machine code generation machine Intermediate

{Sequential Code} {Sequential Code} {Sequential Code} {Sequential Code} {Sequential Code}

ACCELERATE DEEP LEARNING WITH NVIDIA'S DEEP LEARNING PLATFORM | STEPHEN JONES | GTC16 DEEP

An Empirical Examination of the Relationship Between Code Smells and Merge Conflicts Iftekhar

Week 4 They Smell Like Sheep Characteris3cs of the Sheep Sheep People Poor eyesight when

Improving the Quality of Software Using Testing and Fault Prediction Professor Iftekhar Ahmed

DATA ANALYTICS USING DEEP LEARNING GT 8803 // VENKATA KISHORE PATCHA L E C T U R E # 0 6 : S M

Fine-Grained Generic Aspects Tobias Rho, Gnter Kniesel, Malte Appeltauer Dept. of Computer

Small Number is a 5 year-old boy who gets into a lot of mischief. He lives with his Grandma and

Mastering*the*Vim*Language Chris&amp;Toomey @thoughtbot @christoomey h&quot;ps:/ /ctoomey .com

Towards a Taxonomy of Grammar Smells V. Zaytsev @ SLE17 @ SPLASH joint work with M. Stijlaart

MasteringtheVim*Language Chris&Toomey @thoughtbot @christoomey h"ps:/ /ctoomey .com