Ultimate Debian Database Lucas Nussbaum Debconf16 Lucas Nussbaum - - PowerPoint PPT Presentation

ultimate debian database
SMART_READER_LITE
LIVE PREVIEW

Ultimate Debian Database Lucas Nussbaum Debconf16 Lucas Nussbaum - - PowerPoint PPT Presentation

Ultimate Debian Database Lucas Nussbaum Debconf16 Lucas Nussbaum Ultimate Debian Database 1 / 15 Debian: the data hell A lot of different sources of data in Debian With different data formats: text files, BerkeleyDB, SQL databases, JSON,


slide-1
SLIDE 1

Ultimate Debian Database

Lucas Nussbaum Debconf’16

Lucas Nussbaum Ultimate Debian Database 1 / 15

slide-2
SLIDE 2

Debian: the data hell

A lot of different sources of data in Debian With different data formats: text files, BerkeleyDB, SQL databases, JSON, YAML ... Need to combine them all: Mainly for Quality Assurance, e.g:

◮ Packages of priority ≥ standard with RC bugs? ◮ Maintainers with lots of outdated/buggy packages?

Lucas Nussbaum Ultimate Debian Database 2 / 15

slide-3
SLIDE 3

Ultimate Debian Database

Idea:

◮ Import all the data in a single (Postgre)SQL DB ◮ Easier to query (relatively well-known interface) ◮ The proper way of joining data together ◮ No need to write problem-specific scripts

Lucas Nussbaum Ultimate Debian Database 3 / 15

slide-4
SLIDE 4

History

Started as a Google Summer of Code project in 2008 Student: Christian von Essen (Neronus) Mentors:

◮ Lucas Nussbaum (lucas) ◮ Marc Brockschmidt (HE) ◮ Stefano Zacchiroli (zack)

Lucas Nussbaum Ultimate Debian Database 4 / 15

slide-5
SLIDE 5

Design choices

Not problem-specific, no typical queries (not projectb!) Schema:

◮ Typical user == human ◮ Make it easy to write/run queries ◮ Performance? important, but not a critical goal ◮ No surrogate keys

Lucas Nussbaum Ultimate Debian Database 5 / 15

slide-6
SLIDE 6

Surrogate key

◮ Unique identifier (usually integer) ◮ Used as primary key ◮ Not derived from any application data

packages (package_id, package_name, ...) MySQL: AUTO_INCREMENT PostgreSQL: serial Has both advantages and disadvantages Details: http://en.wikipedia.org/wiki/Surrogate_key

Lucas Nussbaum Ultimate Debian Database 6 / 15

slide-7
SLIDE 7

Design choices (2)

Data:

◮ Correctness is critical ◮ Partial updates? Often difficult/risky ◮ Solution: complete data reloads (for most importers) Using transactions to avoid temporary unavailability

Lucas Nussbaum Ultimate Debian Database 7 / 15

slide-8
SLIDE 8

Design choices (3)

Debian is inconsistent

◮ What does "package" mean?

Inconsistency can be interesting for QA → Keep inconsistency in UDD → No foreign keys between data sources → Some VIEWs that hide inconsistency

Lucas Nussbaum Ultimate Debian Database 8 / 15

slide-9
SLIDE 9

Current status

◮ Running as udd.debian.org ◮ Uses PostgreSQL 9.4 ◮ You can connect from {qa,alioth}.d.o using: psql service=udd ◮ Even non-DDs can connect!

More info: http://wiki.debian.org/UltimateDebianDatabase

Lucas Nussbaum Ultimate Debian Database 9 / 15

slide-10
SLIDE 10

What’s in it?

◮ Main stuff: Sources and Packages Bugs (including archived

bugs)

◮ Identities: Carnivore Debian LDAP (restricted to

DDs)

Debian Maintainers PTS subscriptions (OLD) ◮ Derivatives: Sources/Packages for

Ubuntu, Aptosid

Ubuntu bugs Ubuntu upload history Ubuntu popcon Ubuntu lintian ◮ Additional stuff:

  • Migrations to testing
  • History of uploads
  • FTPmasters NEW, removals,

autorejects, deferred queue

  • Debtags
  • DDTP (translation status)
  • Popularity contest
  • Orphaned packages
  • Upstream status (uscan)
  • Screenshots
  • Lintian
  • Reproducible builds
  • security tracker
  • VCSwatch
  • Mentors
  • CI
  • Duck
  • Key packages
  • Release team hints
  • wanna-build
  • testing auto-removals

Lucas Nussbaum Ultimate Debian Database 10 / 15

slide-11
SLIDE 11

So, what can we find out about Debian using UDD?

Lucas Nussbaum Ultimate Debian Database 11 / 15

slide-12
SLIDE 12

Number of different lintian errors or warnings

select package, count(distinct tag) as cnt from lintian where tag_type in(’error’,’warning’) group by package

  • rder by cnt desc limit 15;

Lucas Nussbaum Ultimate Debian Database 12 / 15

slide-13
SLIDE 13

Number of different lintian errors or warnings

select package, count(distinct tag) as cnt from lintian where tag_type in(’error’,’warning’) group by package

  • rder by cnt desc limit 15;

package count muddleftpd 19 cyclades-serial-client 18 lpr 18 spellcast 17 bonnie++ 16 vflib3 16 scalapack-doc 15 jbofihe 15 netmaze 15 bookview 15 jailer 15 debbugs 15

Lucas Nussbaum Ultimate Debian Database 12 / 15

slide-14
SLIDE 14

Who uploaded sid’s packages?

select changed_by, count(*) from sources s, upload_history uh where s.source = uh.source and s.version = uh.version and s.distribution=’debian’ and s.release = ’sid’ group by changed_by order by count desc limit 8;

Lucas Nussbaum Ultimate Debian Database 13 / 15

slide-15
SLIDE 15

Who uploaded sid’s packages?

select changed_by, count(*) from sources s, upload_history uh where s.source = uh.source and s.version = uh.version and s.distribution=’debian’ and s.release = ’sid’ group by changed_by order by count desc limit 8;

changed_by count gregor herrmann <gregoa@debian.org> 1324 Clint Adams <clint@debian.org> 527 Andreas Tille <tille@debian.org> 512 Emmanuel Bourg <ebourg@apache.org> 462 Maximiliano Curia <maxy@debian.org> 383 Thomas Goirand <zigo@debian.org> 382 Jonas Smedegaard <dr@jones.dk> 374 Joachim Breitner <nomeata@debian.org> 296

Lucas Nussbaum Ultimate Debian Database 13 / 15

slide-16
SLIDE 16

UDD Dashboards

◮ UDD bugs search

https://udd.debian.org/bugs/ (example)

◮ UDD Maintainer Dashboard

https://udd.debian.org/dmd/ (example)

◮ UDD Bapase (search for "interesting" packages)

Lucas Nussbaum Ultimate Debian Database 14 / 15

slide-17
SLIDE 17

Things to improve

◮ Mostly a one-person project Good starting point: debug importers failures

https://udd.debian.org/udd-status.cgi

◮ No real good development environment Vagrant environment now available ◮ More collaboration with DDPO and Tracker Hard to share data inside Debian infrastructure

Contact: #debian-qa or debian-qa@l.d.o

http://wiki.debian.org/UltimateDebianDatabase

Lucas Nussbaum Ultimate Debian Database 15 / 15