Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Challenges in Mining Whole Software Universe
Katsuro Inoue
Osaka University
Analyzing Evolution of kern_malloc
[Figure: cover ratio (axis range 0.4 to 1) against last modified time (axis range 1993/01/31 to 2012/04/01); nodes 1 to 67 are the files listed below]
Legend: File in Original BSD License / File in New BSD License
Results by G (Google Code Search) and K (Koders)

ID      Project (source)
1       Lites 1.0 (G)
2       Kernel Source Archive - CMU Mach 3.0 (K)
3       Lites 1.1.u3 (G)
4       Lites 1.1-950808 (G)
5       The Rio (RAM I/O) Project (K)
6       ftp in The University of Edinburgh (G)
7       Mip-summer98 (G)
8       freeBSD/SPARC (G)
9-12    ftp in Stockholm University (G)
13      freeBSD-cam2.1.5R (G)
14-15   SonicOSX (K)
16      Labyrinth BSD(labyrinthos) (K)
17      Oskit (G)
18      Psumip (G)
19      Mach (G)
20-22   Savannah (G)
23      Unofficial OSKit source (K)
24-26   Unofficial OSKit source(oskit) (K)
27      ftp in Stockholm University (G)
28-33   Kame (G)
34-36   SimOS (K)
37-46   Kame (G)
47      Netnice (G)
48      Kame (G)
49-50   Psumip (G)
51      Netnice (G)
52      Reflexprotocol (G)
53      Netnice (G)
54      NetBSD v1.105 (K)
55      OpenBSD PV Xen (G)
56      OpenBSD v1.73 (K)
57      Pmon (G)
58-62   Proyecto A.T.L.D. GNU/hurd(extremelinux) (K)
63      OpenBSD v1.74 (K)
64      Pmon (G)
65      774 (G)
66      Chord-ns3 (G)
67      Openbsd-loongson-vc (G)
Analyzing Reuse of Outdated Libraries
[Figure: Vulnerability of 50 OSS Projects Using libpng]
Legend: Vulnerabilities reported / No defects reported
libpng versions shown: v1.0.11, v1.2.1, v1.2.7, v1.2.8, v1.2.5, v1.2.12, v1.2.16, v1.2.21, v1.2.22, v1.2.23, v1.2.24, v1.2.27, v1.2.29, v1.2.32, v1.2.33, v1.2.34, v1.2.35, v1.2.37, v1.2.39, v1.2.40, v1.2.42, v1.4.1, v1.2.43, v1.4.2, v1.4.4, v1.2.44, v1.4.6beta06, v1.5.1, v1.5.4, v1.5.7, v1.2.46, v1.4.8, v1.2.49, v1.5.9, v1.5.10, v1.5.12, v1.5.13
Result from Google Code Search and Koders
Experience and Concern
- Mining source code repositories, e.g., SourceForge, GitHub, Open Hub (Black Duck), Google Code, Maven, ...
– Outcomes heavily depend on repository contents
– Aren't we mining a small world?
– There may be many other source code contents in the universe
Whole Software Universe 𝑉
- Whole Software Universe
𝑉 ≡ Collection of All Software Developed by Humans in the Past
– Open source software
– Personally-developed software
– Proprietary software
– ... any others
- 𝑄 : Set of all meaningful software (a countably infinite set)
- 𝑉 ⊆ 𝑄
Questions for 𝑉
A) How do we get 𝑉?
B) What do we mine from 𝑉?
C) How do we mine 𝑉?
D) Why do we mine 𝑉?
A) How Do We Get 𝑉?
- No one knows the actual 𝑉
- So we would collect many repositories and construct a subset 𝑉′ ⊆ 𝑉
- 𝑉′ should be as large as possible, of course
- 𝑉′ should reflect the characteristics of 𝑉
- Challenges
– Collecting and unifying different repositories into 𝑉′ (duplication, coherence, ...)
– Performance and capacity for 𝑉′
– Updating and maintaining 𝑉′
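
The duplication challenge above can be illustrated with a minimal sketch. It assumes each repository is already available as a mapping from file path to file text, and it treats two files as duplicates when their content matches after trailing whitespace is trimmed. The names `content_key`, `build_subset`, `repo_a`, and `repo_b` are hypothetical, and a real collection would likely need a more robust notion of equivalence (clones, variations).

```python
import hashlib
from collections import defaultdict


def content_key(source_text):
    """Hash of the file content with trailing whitespace trimmed on each line."""
    normalized = "\n".join(line.rstrip() for line in source_text.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_subset(repositories):
    """Merge repositories into one index: content hash -> list of (repository, path)."""
    index = defaultdict(list)
    for repo_name, files in repositories.items():
        for path, text in files.items():
            index[content_key(text)].append((repo_name, path))
    return dict(index)


# Hypothetical toy repositories; two of the three files share the same content.
repos = {
    "repo_a": {"kern/malloc.c": "int x;\n", "README": "hello\n"},
    "repo_b": {"sys/kern_malloc.c": "int x;   \n"},
}
subset = build_subset(repos)
unique_files = len(subset)  # number of distinct contents in the merged index
```

Keeping every (repository, path) pair under one hash preserves where each copy came from, which matters later when tracing the origin of a piece of code.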
B) What Do We Mine from 𝑉?
Examples
- Simple metrics of 𝑉 over history
– Size: |𝑉|_{𝑢1}, |𝑉|_{𝑢2}, …
– Language usage
– …
- Density of 𝑉 with respect to 𝑄
- History and evolution of code 𝑑 in 𝑉
– Origin version of 𝑑
– Closely related code 𝑑′ (clone, variation, family, ...)
– Future prediction for 𝑑
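
As a rough illustration of the simple metrics listed above, the following sketch computes the size of a collected subset at a given point in time and its language usage. It assumes each file is recorded with the date it was first seen and that the language can be guessed from the file extension. The records, the `EXT_TO_LANG` table, and the function names are all hypothetical.

```python
import os
from collections import Counter
from datetime import date

# Hypothetical records: (file path, date the file was first seen).
records = [
    ("kern_malloc.c", date(1993, 1, 31)),
    ("png.c", date(1998, 7, 24)),
    ("Main.java", date(2004, 1, 14)),
    ("util.py", date(2009, 7, 6)),
]

# Assumed extension-to-language table; anything else counts as "other".
EXT_TO_LANG = {".c": "C", ".java": "Java", ".py": "Python"}


def size_at(records, t):
    """Number of files that were already present at time t."""
    return sum(1 for _, seen in records if seen <= t)


def language_usage(records, t):
    """Count of files per language among those present at time t."""
    counts = Counter()
    for path, seen in records:
        if seen <= t:
            ext = os.path.splitext(path)[1]
            counts[EXT_TO_LANG.get(ext, "other")] += 1
    return counts


size_2000 = size_at(records, date(2000, 1, 1))
usage_2012 = language_usage(records, date(2012, 4, 1))
```

Evaluating `size_at` at a series of dates gives the size history, and the same filtering can be reused for other per-snapshot metrics.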
C) How Do We Mine 𝑉 (𝑉′)?
- 1. Direct mining
– Good model
– Powerful machine
- 2. Indirect mining
– Use external services
– Reconstruct mining result from those external services
Direct Mining
[Diagram labels: 𝑽, 𝑽′, Copy of 𝑽′]
Indirect Mining
[Diagram labels: 𝑽, 𝑽′, Mashup Engine (Query Decomposition and Result Composition), Want to know about 𝑽′]
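
One possible reading of "query decomposition and result composition" is sketched below. It assumes each external service can be wrapped as a function from a keyword to a list of hits. The two services here are local stubs with made-up data, not real APIs, and `decompose`, `compose`, and `mine_indirectly` are hypothetical names.

```python
def service_g(keyword):
    """Stub standing in for one external code search service (made-up data)."""
    data = {"malloc": ["project_a", "project_b"]}
    return data.get(keyword, [])


def service_k(keyword):
    """Stub standing in for a second external service (made-up data)."""
    data = {"malloc": ["project_b", "project_c"]}
    return data.get(keyword, [])


def decompose(query, services):
    """Split a multi-keyword query into one sub-query per (service, keyword)."""
    return [(name, keyword) for name in services for keyword in query]


def compose(partial_results):
    """Merge per-service hits into one map: hit -> set of services reporting it."""
    merged = {}
    for service_name, hits in partial_results:
        for hit in hits:
            merged.setdefault(hit, set()).add(service_name)
    return merged


def mine_indirectly(query, services):
    """Run every sub-query against its service and compose the answers."""
    sub_queries = decompose(query, services)
    partial = [(name, services[name](keyword)) for name, keyword in sub_queries]
    return compose(partial)


services = {"G": service_g, "K": service_k}
result = mine_indirectly(["malloc"], services)
```

Recording which services reported each hit keeps the provenance visible, in the same spirit as the G and K markers used for the search results earlier in the deck.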
D) Why Do We Mine 𝑉?
Objectives of mining 𝑉
- Reuse and knowledge transfer
– We do not want to reinvent the wheel
- Historical archive
– Frontier's wisdom
...
Discussion!
- Is it an interesting research topic?
- Can we get useful research results?
- Is it a feasible research target?
Thank you