Challenges in Mining Whole Software Universe



SLIDE 1

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Challenges in Mining Whole Software Universe

Katsuro Inoue Osaka University

SLIDE 2

Analyzing Evolution of kern_malloc

[Figure: Cover Ratio (0.4 to 1) plotted against Last modified time (1993/01/31 to 2012/04/01) for files numbered 1 to 67; each file is marked as either "File in Original BSD License" or "File in New BSD License"]

Results by G (Google Code Search) and K (Koders)

No.     Project (source)
1       Lites 1.0 (G)
2       Kernel Source Archive - CMU Mach 3.0 (K)
3       Lites 1.1.u3 (G)
4       Lites 1.1-950808 (G)
5       The Rio (RAM I/O) Project (K)
6       ftp in The University of Edinburgh (G)
7       Mip-summer98 (G)
8       freeBSD/SPARC (G)
9-12    ftp in Stockholm University (G)
13      freeBSD-cam2.1.5R (G)
14-15   SonicOSX (K)
16      Labyrinth BSD (labyrinthos) (K)
17      Oskit (G)
18      Psumip (G)
19      Mach (G)
20-22   Savannah (G)
23      Unofficial OSKit source (K)
24-26   Unofficial OSKit source (oskit) (K)
27      ftp in Stockholm University (G)
28-33   Kame (G)
34-36   SimOS (K)
37-46   Kame (G)
47      Netnice (G)
48      Kame (G)
49-50   Psumip (G)
51      Netnice (G)
52      Reflexprotocol (G)
53      Netnice (G)
54      NetBSD v1.105 (K)
55      OpenBSD PV Xen (G)
56      OpenBSD v1.73 (K)
57      Pmon (G)
58-62   Proyecto A.T.L.D. GNU/hurd (extremelinux) (K)
63      OpenBSD v1.74 (K)
64      Pmon (G)
65      774 (G)
66      Chord-ns3 (G)
67      Openbsd-loongson-vc (G)

SLIDE 3

Analyzing Reuse of Outdated Libraries

[Figure: Vulnerability of 50 OSS Projects Using libpng. libpng versions shown: v1.0.11, v1.2.1, v1.2.7, v1.2.8, v1.2.5, v1.2.12, v1.2.16, v1.2.21, v1.2.22, v1.2.23, v1.2.24, v1.2.27, v1.2.29, v1.2.32, v1.2.33, v1.2.34, v1.2.35, v1.2.37, v1.2.39, v1.2.40, v1.2.42, v1.4.1, v1.2.43, v1.4.2, v1.4.4, v1.2.44, v1.4.6beta06, v1.5.1, v1.5.4, v1.5.7, v1.2.46, v1.4.8, v1.2.49, v1.5.9, v1.5.10, v1.5.12, v1.5.13. Legend: Vulnerabilities reported / No defects reported]

Result from Google Code Search and Koders

SLIDE 4

Experience and Concern

Mining source code repositories, e.g.,

SourceForge, GitHub, Open Hub (Black Duck), Google Code, Maven, ...

– Outcomes depend heavily on repository contents
– Aren't we mining a small world?
– There may be much other source code in the universe

SLIDE 5

SLIDE 6

SLIDE 7

SLIDE 8

SLIDE 9

Whole Software Universe U

  • Whole Software Universe

U ≡ Collection of All Software Developed by Humans in the Past

– Open source software
– Personally-developed software
– Proprietary software
... any others

  • P : Set of all meaningful software

(a countably infinite set)

  • U ⊆ P
SLIDE 10

Questions for U

A) How do we get U?
B) What do we mine from U?
C) How do we mine U?
D) Why do we mine U?

SLIDE 11

A) How Do We Get U?

  • No one knows the actual U
  • So we would collect many repositories and construct a subset U′ ⊆ U
  • U′ should be as large as possible, of course
  • U′ should reflect the characteristics of U
  • Challenges

– Collecting and unifying different repositories into U′ (duplication, coherence, ...)
– Performance and capacity for U′
– Updating and maintaining U′
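
The duplication challenge above can be illustrated with a minimal sketch, assuming each repository is available as a mapping from file path to source text. The repository names, the `normalize` helper, and the whitespace-only normalization are hypothetical choices for illustration; they are not a method described in the slides.

```python
import hashlib
from collections import defaultdict


def normalize(source: str) -> bytes:
    """Unify line endings and drop trailing whitespace so that
    trivially different copies of a file hash to the same value."""
    lines = [line.rstrip() for line in source.replace("\r\n", "\n").split("\n")]
    return "\n".join(lines).strip().encode("utf-8")


def unify(repositories: dict) -> dict:
    """Map each content hash to the (repository, path) pairs where it occurs."""
    index = defaultdict(list)
    for repo, files in repositories.items():
        for path, source in files.items():
            digest = hashlib.sha256(normalize(source)).hexdigest()
            index[digest].append((repo, path))
    return dict(index)


# Hypothetical toy repositories; real collections would be read from disk.
repos = {
    "repo_a": {"kern/malloc.c": "int x;\n", "README": "hello\n"},
    "repo_b": {"sys/kern_malloc.c": "int x;  \r\n"},
}
collected = unify(repos)
duplicates = {h: locs for h, locs in collected.items() if len(locs) > 1}
print(duplicates)
```

Hashing normalized content only catches exact or near-exact copies. Detecting modified copies (clones, variations) would need a similarity measure instead, and that is a separate design decision.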

SLIDE 12

B) What Do We Mine from U?

Examples

  • Simple metrics of U over history

– Size |U| at times t1, t2, …
– Language usage
…

  • Density of U with respect to P
  • History and evolution of code c in U

– Origin version of c
– Closely related code c′ (clone, variation, family, ...)
– Future prediction for c
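
The simple metrics over history can be sketched as follows, assuming each file in the collected subset carries the date on which it first appeared. The records, the extension-to-language table, and the function names are hypothetical placeholders, not data from the talk.

```python
from collections import Counter
from datetime import date
from pathlib import PurePosixPath

# Hypothetical records: (file path, date the file first appeared).
records = [
    ("a/kern_malloc.c", date(1995, 10, 28)),
    ("b/png.c", date(2001, 4, 19)),
    ("c/Main.java", date(2009, 7, 6)),
    ("d/util.py", date(2012, 4, 1)),
]

# Assumed mapping from file extension to language; anything else is "other".
EXT_TO_LANG = {".c": "C", ".java": "Java", ".py": "Python"}


def size_at(records, t):
    """Number of files that existed at time t (the size metric at t)."""
    return sum(1 for _, seen in records if seen <= t)


def language_usage(records, t):
    """Count files per language among those that existed at time t."""
    return Counter(
        EXT_TO_LANG.get(PurePosixPath(path).suffix, "other")
        for path, seen in records
        if seen <= t
    )


for t in (date(1998, 7, 24), date(2004, 1, 14), date(2012, 4, 1)):
    print(t, size_at(records, t), dict(language_usage(records, t)))
```

Counting files is only one possible notion of size. Lines of code or number of projects would fit the same structure by changing what each record represents.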

SLIDE 13

C) How Do We Mine U (U′)?

  • 1. Direct mining

– Good model
– Powerful machine

  • 2. Indirect mining

– Use external services
– Reconstruct the mining result from those external services

SLIDE 14

Direct Mining

[Diagram labels: U; U′; Copy of U′]

SLIDE 15

Indirect Mining

[Diagram labels: U; U′; Mashup Engine (Query Decomposition and Result Composition); "Want to know about U′"]
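
Query decomposition and result composition can be sketched as below, assuming each external service is a function from a search term to a list of hits. The two in-memory services, the project names, and the term-splitting rule are all hypothetical stand-ins; no real search API is modeled here.

```python
def make_service(name, corpus):
    """Build a hypothetical search service over an in-memory corpus
    of (project, file) pairs; it stands in for an external service."""
    def search(term):
        return [
            {"service": name, "project": project, "file": file}
            for project, file in corpus
            if term in file
        ]
    return search


def decompose(query):
    """Split a compound query into one sub-query per term."""
    return [term for term in query.split() if term]


def compose(partials):
    """Merge partial results, removing duplicates and recording
    which services reported each (project, file) hit."""
    merged = {}
    for hit in partials:
        key = (hit["project"], hit["file"])
        merged.setdefault(key, set()).add(hit["service"])
    return merged


def mashup(query, services):
    """Send every sub-query to every service and compose the answers."""
    partials = []
    for term in decompose(query):
        for search in services:
            partials.extend(search(term))
    return compose(partials)


g = make_service("G", [("proj1", "kern_malloc.c"), ("proj2", "kern_malloc.c"), ("proj2", "png.c")])
k = make_service("K", [("proj2", "kern_malloc.c"), ("proj3", "kern_malloc.c")])
result = mashup("kern_malloc png", [g, k])
for key, sources in sorted(result.items()):
    print(key, sorted(sources))
```

Recording which services reported each hit keeps the provenance visible, in the same spirit as the G and K labels used for the search results earlier in the talk.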

SLIDE 16

D) Why Do We Mine U?

Objectives of mining U

  • Reuse and knowledge transfer

– We do not want to reinvent the wheel

  • Historical archive

– Frontier's wisdom

...

SLIDE 17

Discussion!

  • Is it an interesting research topic?
  • Can we get useful research results?
  • Is it a feasible research target?
SLIDE 18

Thank you

SLIDE 19