SLIDE 1

Introduction QA tasks Infrastructure Results Future Work Conclusion

Use of Grid Computing for Debian Quality Assurance

Lucas Nussbaum lucas@debian.org – lucas.nussbaum@imag.fr

Laboratoire d’Informatique de Grenoble - Projet MESCAL

Lucas Nussbaum Use of Grid Computing for Debian QA 1 / 31

SLIDE 2

Summary

1. Introduction
2. QA tasks
3. Infrastructure
4. Results
5. Future Work
6. Conclusion

SLIDE 3

Summary

1. Introduction
      • Quality Assurance in Debian
      • Grid’5000
2. QA tasks
3. Infrastructure
4. Results
5. Future Work
6. Conclusion

SLIDE 4

Quality Assurance in Debian

Debian:
  • the largest volunteer-based GNU/Linux distribution
  • renowned for its quality

QA in general plays a crucial role:
  • to ensure a minimal quality level for all packages
  • to track not-so-well maintained packages
  • ...

SLIDE 5

Quality Assurance in Debian (2)

But some QA tasks require a lot of computing power:
  • e.g. rebuilding all packages in Debian: about 10 days on a single computer

Such tasks are difficult for volunteers who pay their own electricity bills, especially on a regular basis.

SLIDE 6

Grid’5000

  • aims at building a highly reconfigurable, controllable and monitorable experimental grid dedicated to computer science research
  • funded by the French ministry of research, INRIA, CNRS, ACI Grid, and other public organizations
  • gathers 1200 compute nodes (2500 CPUs) in 13 clusters
  • typical node: dual Opteron 2 GHz, 2 GB of RAM
  • high-speed network (10GbE)
  • free time slots during nights and weekends

SLIDE 7

Grid’5000 (2)

SLIDE 8

Grid’5000 (3)

SLIDE 9

(Obvious) idea: use Grid’5000 to work on Debian QA

  • Which tests are suitable?
  • With which infrastructure?

SLIDE 10

Summary

1. Introduction
2. QA tasks
      • Overview
      • Rebuilding packages
      • Installation testing using piuparts
3. Infrastructure
4. Results
5. Future Work
6. Conclusion

SLIDE 11

QA tasks performed on Grid’5000

Ideal task:
  • consumes a lot of time
  • can be distributed over a lot of nodes
  • doesn’t generate too many false positives
  • would improve Debian quality

Two different tasks performed on Grid’5000:
  • Rebuild of all packages in Debian
  • Installation and removal testing using piuparts

SLIDE 12

Rebuilding all packages in Debian

  • Arch: all packages are only built on the developer’s machine
  • Arch: any packages are only built automatically before they reach unstable
  • After that, the build environment changes:
      • newer/older compiler and libraries
      • build-dependencies removed
  • Not tested automatically, but important for the release: Etch must be self-contained (think of security upgrades!)
  • Easy to distribute (build in parallel)

SLIDE 13

Installation and Removal testing

  • Installability can be tested statically (see debcheck, edos-debcheck)
  • But packages have maintainer scripts:
      • executed during package installation and removal
      • to configure stuff, start services
      • helper scripts exist (debconf, update-{rc.d,modules,inetd})
      • lots of bugs: missing dependencies, shell scripting mistakes, etc.

SLIDE 14

Installation and Removal testing (2)

piuparts automatically:
  • installs the package in a near-empty chroot
  • removes it
  • removes as many packages as possible
  • purges it
⇒ most extreme test for maintainer scripts

But quite a lot of false positives:
  • packages that prompt without debconf
  • packages that depend on a DBMS (mysqld, ...)

Easy to distribute (test packages in parallel)
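The install/remove/purge cycle can be sketched as a short sequence of commands run inside a chroot. This is not how piuparts itself is implemented: the apt-get and dpkg invocations below are an assumed approximation of the four steps, and the names `lifecycle_commands`, `check_package` and the `run` callable are hypothetical.

```python
def lifecycle_commands(package, chroot_dir):
    """Return (step name, command) pairs for one test cycle in a chroot.

    Assumption: these commands only approximate the steps listed on the
    slide; they are not taken from piuparts.
    """
    in_chroot = ["chroot", chroot_dir]
    return [
        ("install", in_chroot + ["apt-get", "-y", "install", package]),
        ("remove", in_chroot + ["apt-get", "-y", "remove", package]),
        # Stand-in for "removes as many packages as possible".
        ("autoremove", in_chroot + ["apt-get", "-y", "autoremove"]),
        ("purge", in_chroot + ["dpkg", "--purge", package]),
    ]


def check_package(package, chroot_dir, run):
    """Run the steps in order; return the first failing step, or None.

    `run` takes a command (list of strings) and returns its exit status,
    for example subprocess.call.
    """
    for step, command in lifecycle_commands(package, chroot_dir):
        if run(command) != 0:
            return step
    return None
```

Passing `run` as a parameter keeps the sketch testable without root access or a real chroot; a failing step name is the kind of result that would then need a human to separate real bugs from false positives.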

SLIDE 15

Summary

1. Introduction
2. QA tasks
3. Infrastructure
      • Principles
      • Architecture
      • Typical job
4. Results
5. Future Work
6. Conclusion

SLIDE 16

Infrastructure for QA tests on Grid’5000

Principles

  • connection to Grid’5000 nodes via SSH
  • one task per node (easier to manage)
  • simple master/slave architecture

SLIDE 17

Infrastructure for QA tests on Grid’5000

Architecture

3 central points:
  • Master node that schedules jobs
  • Shared NFS directory to write results
  • Internal Debian mirror

[Figure: architecture diagram showing the master node, the shared NFS directory, the Debian mirror, and nodes 1 to n]
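A minimal sketch of the master side of such an architecture, assuming one controlling thread per node and a single task per node at a time. The function `run_master`, the `run_task` callable (which in practice might wrap an SSH command) and the in-memory `results` dictionary standing in for the shared NFS directory are all hypothetical, not the scripts used on Grid’5000.

```python
import queue
import threading


def run_master(tasks, node_names, run_task):
    """Hand tasks to nodes one at a time; return {task: (node, outcome)}."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}  # stands in for the shared results directory
    results_lock = threading.Lock()

    def drive_node(node):
        # One thread per node, so each node runs a single task at a time.
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to schedule on this node
            outcome = run_task(node, task)
            with results_lock:
                results[task] = (node, outcome)

    threads = [threading.Thread(target=drive_node, args=(name,))
               for name in node_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For example, `run_master(["pkg-a", "pkg-b", "pkg-c"], ["node1", "node2"], lambda node, task: "ok")` returns one entry per package. Pulling from a shared queue means a fast node simply picks up more tasks, which is what keeps the design simple.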

SLIDE 18

Infrastructure for QA tests on Grid’5000

Typical job (piuparts test)

  • 55 nodes are reserved; deployment of a Debian Sid environment using Kadeploy is started.
  • After 12 minutes: environment deployed on 43 nodes.
  • First node is used as master node:
      • prepares the other nodes (installs required packages, etc.)
      • locally updates the chroots
      • starts the script responsible for controlling the other nodes
  • After 2 minutes, preparation is finished: the master node starts to schedule jobs on the other nodes.
  • After 3 hours and 46 minutes, the 18156 packages in Etch have been tested.
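The figures in this job give a rough idea of throughput. The calculation below is only a back-of-the-envelope sketch: it assumes that the master node runs no tests itself (leaving 42 workers) and that the 3 hours and 46 minutes cover testing only; neither assumption is stated on the slide.

```python
RESERVED_NODES = 55
DEPLOYED_NODES = 43
WORKER_NODES = DEPLOYED_NODES - 1   # assumption: the master runs no tests
PACKAGES = 18156
TEST_MINUTES = 3 * 60 + 46          # assumption: testing time only

# Share of reserved nodes on which the environment was deployed.
deployment_success = DEPLOYED_NODES / RESERVED_NODES

# Overall rate at which packages were tested.
packages_per_minute = PACKAGES / TEST_MINUTES

# Node time spent per package, idle time included, so this is an upper
# estimate of the average duration of a single test.
node_seconds_per_package = WORKER_NODES * TEST_MINUTES * 60 / PACKAGES

print(f"deployment success: {deployment_success:.0%}")
print(f"throughput: {packages_per_minute:.1f} packages/minute")
print(f"node time per package: {node_seconds_per_package:.1f} s")
```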

SLIDE 19

Summary

1. Introduction
2. QA tasks
3. Infrastructure
4. Results
      • Grid’5000 bugs
      • Debian Bug reports
      • Speed-up
5. Future Work
6. Conclusion

SLIDE 20

Results - Grid’5000 bugs

  • These experiments uncovered a few important problems on Grid’5000: misconfigurations, performance problems, etc.
  • In the future, they will serve as a test case to validate extensions to the platform

SLIDE 21

Results - Debian Bug Reports

  • About 200 release-critical (RC) bugs found (and fixed) in Debian Etch:
      • about 100 from rebuilds
      • about 100 from piuparts testing
  • Efforts welcomed by a majority of developers (but not all :-)

SLIDE 22

Results - speed-up

  • Rebuilding the 10217 packages in Debian Etch: about 10 days on a single computer ⇒ about 7.5 hours on Grid’5000
  • Testing the 18153 binary packages in Etch: about 5 days on a single computer ⇒ about 3 hours and 46 minutes on Grid’5000
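Both results correspond to a speed-up of roughly 32. The short calculation below works this out from the durations quoted above; since those are all stated as approximate, the ratios are approximate too.

```python
# Durations quoted on the slide, converted to hours.
REBUILD_SINGLE_HOURS = 10 * 24      # about 10 days on a single computer
REBUILD_GRID_HOURS = 7.5

PIUPARTS_SINGLE_HOURS = 5 * 24      # about 5 days on a single computer
PIUPARTS_GRID_HOURS = 3 + 46 / 60   # 3 hours and 46 minutes

rebuild_speedup = REBUILD_SINGLE_HOURS / REBUILD_GRID_HOURS
piuparts_speedup = PIUPARTS_SINGLE_HOURS / PIUPARTS_GRID_HOURS

print(f"rebuild speed-up:  {rebuild_speedup:.1f}x")
print(f"piuparts speed-up: {piuparts_speedup:.1f}x")
```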

SLIDE 23

Summary

1. Introduction
2. QA tasks
3. Infrastructure
4. Results
5. Future Work
      • Overview
      • Rebuild speed-up
      • Improving the log reviewing
6. Conclusion

SLIDE 24

Future Work

  • Improve the infrastructure:
      • jobs using several Grid’5000 clusters at the same time
      • central Debian mirror is a bottleneck ⇒ local cache on the nodes
      • shared NFS directory for logs is a bottleneck ⇒ try other solutions
  • Other QA tasks (less critical ones)
  • Increase the rebuild speed-up

SLIDE 25

Increasing the rebuild speed-up

Most packages take a very short time to build, but a few packages take a very long time (hours)

[Figure: cumulative distribution F(x) of package build time in seconds; logarithmic x-axis from 1 to 10000, y-axis from 0.1 to 1]
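A curve of this kind is an empirical cumulative distribution: F(x) is the fraction of packages whose build took at most x seconds. The sketch below shows one way to compute it. The sample values are hypothetical and chosen only to illustrate the shape (many short builds, a few very long ones); they are not the measurements behind the plot.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return F, where F(x) is the fraction of samples that are <= x."""
    ordered = sorted(samples)

    def F(x):
        return bisect_right(ordered, x) / len(ordered)

    return F


# Hypothetical build times in seconds, for illustration only.
EXAMPLE_BUILD_SECONDS = [8, 15, 22, 40, 65, 90, 240, 900, 5940, 26040]

F = empirical_cdf(EXAMPLE_BUILD_SECONDS)
for x in (10, 100, 1000, 10000):
    print(f"F({x}) = {F(x):.1f}")
```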

SLIDE 26

Increasing the rebuild speed-up (2)

Top ten packages

Source package              Time
openoffice.org              7 h 14 min
latex-cjk-chinese-arphic    6 h 18 min
linux-2.6                   5 h 43 min
gcc-4.1                     2 h 52 min
gcj-4.1                     2 h 44 min
gnat-4.1                    1 h 52 min
gcc-3.4                     1 h 50 min
installation-guide          1 h 45 min
axiom                       1 h 44 min
k3d                         1 h 39 min

SLIDE 27

Increasing the rebuild speed-up (3)

  • Using more nodes is useless: the run cannot finish before the longest single build does
  • Already scheduling longest builds first

[Figure: timeline of builds on nodes 1 to 40, with openoffice.org and linux-2.6 labeled; total duration ~ 7.5 hours]
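Why extra nodes stop helping can be illustrated with a small longest-first scheduling sketch. It uses the ten build times from the "Top ten packages" slide and a greedy rule that gives each build to the least-loaded node; this is an assumed textbook heuristic, and the function `lpt_makespan` is hypothetical, not the scheduler actually run on Grid’5000.

```python
import heapq

# Build times in minutes, from the "Top ten packages" slide.
BUILD_MINUTES = {
    "openoffice.org": 7 * 60 + 14,
    "latex-cjk-chinese-arphic": 6 * 60 + 18,
    "linux-2.6": 5 * 60 + 43,
    "gcc-4.1": 2 * 60 + 52,
    "gcj-4.1": 2 * 60 + 44,
    "gnat-4.1": 1 * 60 + 52,
    "gcc-3.4": 1 * 60 + 50,
    "installation-guide": 1 * 60 + 45,
    "axiom": 1 * 60 + 44,
    "k3d": 1 * 60 + 39,
}


def lpt_makespan(durations, node_count):
    """Schedule longest builds first, each on the least-loaded node.

    Returns the time at which the last node finishes.
    """
    loads = [0] * node_count
    heapq.heapify(loads)
    for duration in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)


for nodes in (5, 10, 40):
    finish = lpt_makespan(BUILD_MINUTES.values(), nodes)
    print(f"{nodes} nodes: {finish // 60} h {finish % 60} min")
```

With these ten builds, going from 10 to 40 nodes changes nothing: the finish time stays at 7 h 14 min, the duration of the openoffice.org build alone.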

SLIDE 28

Increasing the rebuild speed-up (4)

Possible solution: "make -j"

  • Grid’5000 nodes have several CPUs, but only one is used during a build
  • No standard way to tell a package "use more than one CPU" (Debian bug #209008)
  • Some packages fail to build when told to use several CPUs

⇒ Possible solution:
  • only work on the few packages that annoy us...
  • or just ignore them.

SLIDE 29

Real bottleneck : manpower for log reviewing

  • So many logs, so little time...
  • Such QA tasks were traditionally solitaire games
  • Sharing the load is necessary to continue in the long term

SLIDE 30

Summary

1. Introduction
2. QA tasks
3. Infrastructure
4. Results
5. Future Work
6. Conclusion

SLIDE 31

Conclusion

  • Grid’5000: a really nice tool, well suited to running such tasks
  • Quality Assurance in Free Software projects:
      • could really benefit from using such a tool
      • needs improvement, both technically (better testing tools, fewer false positives) and on the human side (collaboration on reviewing the generated data)
