Software Analytics: Opportunities and Challenges Olga Baysal - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Software Analytics: Opportunities and Challenges

Olga Baysal

School of Computer Science
Carleton University
olga.baysal@carleton.ca
olgabaysal.com
@olgabaysal

Latifa Guerrouj

Département de GL et des TI
École de Technologie Supérieure
latifa.guerrouj@etsmtl.ca
latifaguerrouj.ca
@Latifa_Guerrouj

slide-2
SLIDE 2

Development Data

Development (Big) Data

2

slide-3
SLIDE 3

Mining Software Repositories

3

  • Other artifacts
    – Docs
    – Issues
    – Mailing lists
  • Source code information
    – Code
    – Tests
    – Build/Config
    – Version control
  • Runtime data
    – Crash repos
    – Field logs
    – Usage data

Detection and analysis of hidden patterns and trends

slide-4
SLIDE 4

Problem

4

Development artifacts Stakeholders

Data Is NOT Actionable

slide-5
SLIDE 5

Decisions Drive Development!

5

Stakeholders: QA, developers, managers, release engineers

Concerns: program correctness; product deadlines; risks, cost, operation, planning; program defects

Questions:

  • When do we release?
  • Are we ready to release?
  • How do I fix this bug?
  • How effective is our test suite?

Decisions are often based on intuition or experience

slide-6
SLIDE 6

Solution – Analytics

6

slide-7
SLIDE 7

Analytics In Industry

7

slide-8
SLIDE 8

Software Analytics

8

slide-9
SLIDE 9

Benefits of Software Analytics

9

slide-10
SLIDE 10

Supporting Development Decisions

10

Data-driven decision making, fact-based views of projects

Stakeholders: QA, developers, managers, release engineers

Concerns: product deadlines; risks, cost, operation, planning; program defects; program correctness

Questions:

  • When do we release?
  • Are we ready to release?
  • How do I fix this bug?
  • How effective is our test suite?

slide-11
SLIDE 11

Software Artifacts

  • Source Code
  • Execution Trace
  • Development History
  • Bug Reports
  • Code Reviews
  • Developer Activities
  • Software Forums
  • Software Microblogs

11

slide-12
SLIDE 12

Artifact: Source Code

  • Various languages
  • Various kinds of systems
  • Various scales: small, medium, large
  • Various complexities
  • Various programming styles

12

slide-13
SLIDE 13

Artifact: Source Code

13

slide-14
SLIDE 14

Artifact: Source Code

  • Where to find code?
    – GitHub: https://github.com/
    – Google Code: http://code.google.com/
    – Many other places online
  • How to analyze source code?
    – Program analysis tools
      • WALA: http://wala.sourceforge.net
      • JPF: http://javapathfinder.sourceforge.net/
      • Soot: http://sable.github.io/soot/
      • Clang: http://clang-analyzer.llvm.org/
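
As a small illustration of what program analysis produces, the sketch below uses Python's standard `ast` module to pull two simple facts out of a source string: the function names and the number of branching statements. This is not one of the tools listed above; the sample source, the `summarize` helper, and the choice of metrics are assumptions made only for illustration.

```python
import ast

# Hypothetical code under analysis (an assumption for this sketch).
SAMPLE_SOURCE = '''
def area(w, h):
    if w < 0 or h < 0:
        raise ValueError("negative size")
    return w * h

def label(n):
    for i in range(n):
        if i % 2:
            print(i)
'''

def summarize(source: str) -> dict:
    """Return sorted function names and a count of if/for/while statements."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    functions = sorted(n.name for n in nodes if isinstance(n, ast.FunctionDef))
    branches = sum(isinstance(n, (ast.If, ast.For, ast.While)) for n in nodes)
    return {"functions": functions, "branches": branches}

if __name__ == "__main__":
    print(summarize(SAMPLE_SOURCE))
```

Counting branch statements is only a rough proxy for complexity; dedicated analyzers build call graphs, control-flow graphs, or symbolic models instead.
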
slide-15
SLIDE 15

Artifact: Execution Traces

  • Information collected when a program is run
  • What kind of information is collected?
    – Sequences of methods that are executed
    – State of various variables at various times
    – State of various invariants at various times
    – Which components are loaded at various times

15

slide-16
SLIDE 16

Artifact: Execution Traces

16

Caller | Callee | Method Signature

slide-17
SLIDE 17

Artifact: Execution Traces

  • How to collect?
    – Insert instrumentation code
    – Execute program
    – Instrumentation code writes a log file
  • What tools are available to collect traces?
    – Daikon Chicory: http://groups.csail.mit.edu/pag/daikon/dist/doc/daikon.html
    – PIN: http://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool
    – Valgrind: http://valgrind.org/
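
The three collection steps (insert instrumentation, execute the program, write a log) can be sketched in Python with a decorator that records a caller, callee, and method signature for every call. The `traced` decorator, the in-memory `TRACE` list standing in for a log file, and the sample functions are all hypothetical; the tools listed here instrument bytecode or binaries rather than source-level functions.

```python
import functools
import inspect

TRACE = []  # in-memory stand-in for the log file (an assumption of this sketch)

def traced(func):
    """Instrument `func` so that each call appends (caller, callee, signature)."""
    signature = f"{func.__name__}{inspect.signature(func)}"

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Frame 0 is this wrapper; frame 1 is whoever called the traced function.
        caller = inspect.stack(0)[1].function
        TRACE.append((caller, func.__name__, signature))
        return func(*args, **kwargs)

    return wrapper

@traced
def parse(text):
    return text.split(",")

@traced
def load(text):
    return [int(x) for x in parse(text)]

def main():
    return load("1,2,3")

if __name__ == "__main__":
    main()
    for caller, callee, sig in TRACE:
        print(f"{caller} | {callee} | {sig}")
```

Each recorded tuple corresponds to one row of a caller/callee/signature table; a real setup would write the rows to a file and usually record variable state as well.
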


17

slide-18
SLIDE 18

Artifact: Development History

  • What code is
    – Added
    – Deleted
    – Edited
  • When
  • By whom
  • For what reason
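
A minimal sketch of recovering these four facts from a Git history, assuming the log was exported with `git log --numstat --pretty=format:'@@%h|%an|%aI|%s'`. The `@@` prefix is a marker chosen for this sketch so that header lines are easy to spot, and the sample log text, commit IDs, and author names are made up.

```python
from collections import defaultdict

# Made-up sample in the shape of the assumed git log invocation above.
SAMPLE_LOG = """\
@@a1b2c3d|Alice|2015-03-01T10:00:00+00:00|Fix null check in parser
3\t1\tsrc/parser.py
0\t12\tsrc/legacy.py

@@d4e5f6a|Bob|2015-03-02T09:30:00+00:00|Add trace export
40\t0\tsrc/export.py
-\t-\tdocs/diagram.png
"""

def parse_log(text):
    """Turn the log text into a list of commits with per-file line counts."""
    commits = []
    for line in text.splitlines():
        if line.startswith("@@"):
            # Header: who changed the code, when, and for what stated reason.
            sha, author, date, subject = line[2:].split("|", 3)
            commits.append({"sha": sha, "author": author, "date": date,
                            "subject": subject, "files": []})
        elif line.count("\t") >= 2 and commits:
            # numstat row: lines added, lines deleted, path.
            added, deleted, path = line.split("\t", 2)
            commits[-1]["files"].append({
                "path": path,
                # Binary files report "-" instead of counts; treat as 0.
                "added": int(added) if added.isdigit() else 0,
                "deleted": int(deleted) if deleted.isdigit() else 0,
            })
    return commits

def churn_by_author(commits):
    """Total lines added plus deleted, grouped by author."""
    totals = defaultdict(int)
    for commit in commits:
        for changed in commit["files"]:
            totals[commit["author"]] += changed["added"] + changed["deleted"]
    return dict(totals)

if __name__ == "__main__":
    print(churn_by_author(parse_log(SAMPLE_LOG)))
```

Git reports an edited line as one deletion plus one addition, so "edited" has to be inferred from the two counts rather than read directly.
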

18

slide-19
SLIDE 19

Artifact: Development History

19

slide-20
SLIDE 20

Artifact: Development History

20

slide-21
SLIDE 21

Artifact: Development History

  • Various tools
    – CVS: version per file
    – SVN: version per snapshot
    – Git: distributed

  • Slightly different ways to manage content

21

slide-22
SLIDE 22

Artifact: Bug Reports

  • People report errors and issues that they encounter in the field
  • These reports include:
    – Description of the bugs
    – Steps to reproduce the bugs
    – Severity level
    – Parts of the system affected by the bug
    – Failure traces
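
The fields listed above map naturally onto a small record type. The sketch below is one possible shape, not the schema of Bugzilla or JIRA: the `BugReport` class, its field names, the severity labels, and the three sample reports are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class BugReport:
    title: str
    description: str
    steps_to_reproduce: list
    severity: str                                   # assumed labels: critical, major, minor
    components: list = field(default_factory=list)  # parts of the system affected
    failure_trace: str = ""

def severity_histogram(reports):
    """Count reports per severity level."""
    return Counter(report.severity for report in reports)

def reports_touching(reports, component):
    """Titles of reports that name `component` as affected."""
    return [r.title for r in reports if component in r.components]

# Made-up sample data.
REPORTS = [
    BugReport(
        title="Crash when opening empty file",
        description="The editor exits immediately on a zero-byte file.",
        steps_to_reproduce=["Create an empty file", "Open it in the editor"],
        severity="critical",
        components=["editor", "file-io"],
        failure_trace="IndexError: list index out of range",
    ),
    BugReport(
        title="Typo in settings dialog",
        description="A label in the dialog is misspelled.",
        steps_to_reproduce=["Open the settings dialog"],
        severity="minor",
        components=["ui"],
    ),
    BugReport(
        title="Save fails on read-only disk",
        description="No error message is shown when saving fails.",
        steps_to_reproduce=["Mount a read-only disk", "Edit and save a file"],
        severity="major",
        components=["file-io"],
    ),
]

if __name__ == "__main__":
    print(severity_histogram(REPORTS))
    print(reports_touching(REPORTS, "file-io"))
```
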

22

slide-23
SLIDE 23

Artifact: Bug Reports

  • Various kinds of bug repositories
    – Bugzilla: http://www.bugzilla.org/
      Example site: https://bugzilla.mozilla.org/
    – JIRA: http://www.atlassian.com/software/jira/
      Example site: https://issues.apache.org/jira/browse

23

slide-24
SLIDE 24

Artifact: Bug Reports

24

Title

slide-25
SLIDE 25

Artifact: Bug Reports

25

Detailed Description

slide-26
SLIDE 26

Artifact: Code Reviews

26

slide-27
SLIDE 27

Artifact: Code Reviews

27

slide-28
SLIDE 28

28

Artifact: Code Reviews

slide-29
SLIDE 29

Artifact: Developer Activities

  • Developers form a social network
    – Developers work on various projects
    – Projects have various types, programming languages, and developers
    – Developers follow updates from various other developers and projects
  • A heterogeneous social network is formed
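
A heterogeneous network differs from a plain graph in that nodes and edges carry types. A minimal sketch, assuming two node types (developer, project) and two relations (`works_on`, `follows`); the `HeteroGraph` class, the relation names, and the sample developers and projects are hypothetical.

```python
from collections import defaultdict

class HeteroGraph:
    """Typed nodes plus edges grouped by (source, relation)."""

    def __init__(self):
        self.node_type = {}             # node name -> "developer" or "project"
        self.edges = defaultdict(set)   # (source, relation) -> set of targets

    def add_node(self, name, kind):
        self.node_type[name] = kind

    def add_edge(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def neighbors(self, source, relation):
        return set(self.edges.get((source, relation), set()))

def co_workers(graph, dev):
    """Developers who work on at least one project that `dev` works on."""
    projects = graph.neighbors(dev, "works_on")
    return {other for other, kind in graph.node_type.items()
            if kind == "developer" and other != dev
            and graph.neighbors(other, "works_on") & projects}

def build_example():
    graph = HeteroGraph()
    for dev in ("alice", "bob", "carol"):
        graph.add_node(dev, "developer")
    for project in ("parser", "webui"):
        graph.add_node(project, "project")
    graph.add_edge("alice", "works_on", "parser")
    graph.add_edge("bob", "works_on", "parser")
    graph.add_edge("carol", "works_on", "webui")
    graph.add_edge("carol", "follows", "alice")   # developer follows developer
    graph.add_edge("alice", "follows", "webui")   # developer follows project
    return graph

if __name__ == "__main__":
    print(co_workers(build_example(), "alice"))
```

Keeping the relation in the edge key lets "follows" target either a developer or a project, which is what makes the network heterogeneous.
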

29

slide-30
SLIDE 30

Artifact: Developer Activities

30

  • GitHub’s CEO, Tom Preston-Werner:

“We like the ideas of social networking. We think that developers work more effectively when they work together. So let’s take the ideas of a social network and add on top of that code hosting, and let’s create a site that makes it easy to share and collaborate on code”.

slide-31
SLIDE 31

Artifact: Developer Activities

There are more than 12M people collaborating right now on GitHub on over 31M projects using a powerful collaborative development workflow.

slide-32
SLIDE 32

32

slide-33
SLIDE 33

33

slide-34
SLIDE 34

34

slide-35
SLIDE 35

35

slide-36
SLIDE 36

Artifact: Software Forums

  • Developers ask and answer questions
  • About various topics
  • In various threads, some of which are very long
  • Stored in various sites
    – Stack Overflow: http://stackoverflow.com/
    – SoftwareTipsAndTricks: http://www.softwaretipsandtricks.com/forum/

36

slide-37
SLIDE 37

Artifact: Software Forums

37

slide-38
SLIDE 38

Artifact: Software Forums

38

slide-39
SLIDE 39

Artifact: Software Microblogs

  • Developers microblog too
  • Developers microblog about various activities:
    – Advertisements
    – Code and tools
    – News
    – Q&A
    – Events
    – Opinions
    – Tips
    – Etc.

39

slide-40
SLIDE 40

Artifact: Software Microblogs

40

slide-41
SLIDE 41

Opportunities

  • Help developers and managers understand their projects and cope with their evolution, and support their decision-making.
  • Extract relevant and insightful information, analyze it, and transform it into decisions for the future.
  • Find trends, anticipate issues, and raise awareness of weaknesses or conditions that affect future decisions.
  • Make proactive decisions using proactive analytics: predictive modelling, data mining, machine learning, statistical analysis, etc.

41

slide-42
SLIDE 42

Opportunities

  • Leading tech companies need insights to create actionable tools and to improve quality, efficiency, services, and risk management.
  • Organizations apply analytics to create opportunities for growth, innovation, and competitive advantage.
  • Data analytics identifies patterns, trends, and opportunities for improvement, making it possible to spot which initiatives work and which fail, and to adjust accordingly.

42

slide-43
SLIDE 43

Challenges

  • Software engineering (SE) data often has no explicit format.
  • SE data is plentiful.
  • Acting on results from data analysis is not easy.
  • Analytics tools? Maybe, but they should meet the need and be easy to use.
  • Adopting software analytics in software development processes.
  • Developing and integrating analytics tools in practical settings.

43