ultimate debian database
play

Ultimate Debian Database Lucas Nussbaum Debconf16 Lucas Nussbaum - PowerPoint PPT Presentation

Ultimate Debian Database Lucas Nussbaum Debconf16 Lucas Nussbaum Ultimate Debian Database 1 / 15 Debian: the data hell A lot of different sources of data in Debian With different data formats: text files, BerkeleyDB, SQL databases, JSON,


  1. Ultimate Debian Database Lucas Nussbaum Debconf’16 Lucas Nussbaum Ultimate Debian Database 1 / 15

  2. Debian: the data hell A lot of different sources of data in Debian With different data formats: text files, BerkeleyDB, SQL databases, JSON, YAML ... Need to combine them all: Mainly for Quality Assurance, e.g: ◮ Packages of priority ≥ standard with RC bugs? ◮ Maintainers with lots of outdated/buggy packages? Lucas Nussbaum Ultimate Debian Database 2 / 15

  3. Ultimate Debian Database Idea: ◮ Import all the data in a single (Postgre)SQL DB ◮ Easier to query (relatively well-known interface) ◮ The proper way of joining data together ◮ No need to write problem-specific scripts Lucas Nussbaum Ultimate Debian Database 3 / 15

  4. History Started as a Google Summer of Code project in 2008 Student: Christian von Essen (Neronus) Mentors: ◮ Lucas Nussbaum (lucas) ◮ Marc Brockschmidt (HE) ◮ Stefano Zacchiroli (zack) Lucas Nussbaum Ultimate Debian Database 4 / 15

  5. Design choices Not problem-specific, no typical queries (not projectb !) Schema: ◮ Typical user == human ◮ Make it easy to write/run queries ◮ Performance? important, but not a critical goal ◮ No surrogate keys Lucas Nussbaum Ultimate Debian Database 5 / 15

  6. Surrogate key ◮ Unique identifier (usually integer) ◮ Used as primary key ◮ Not derived from any application data packages ( package_id , package_name, ...) MySQL: AUTO_INCREMENT PostgreSQL: serial Has both advantages and disadvantages Details: http://en.wikipedia.org/wiki/Surrogate_key Lucas Nussbaum Ultimate Debian Database 6 / 15

  7. Design choices (2) Data: ◮ Correctness is critical ◮ Partial updates? Often difficult/risky ◮ Solution: complete data reloads (for most importers) � Using transactions to avoid temporary unavailability Lucas Nussbaum Ultimate Debian Database 7 / 15

  8. Design choices (3) Debian is inconsistent ◮ What does "package" mean? Inconsistency can be interesting for QA → Keep inconsistency in UDD → No foreign keys between data sources → Some VIEWs that hide inconsistency Lucas Nussbaum Ultimate Debian Database 8 / 15

  9. Current status ◮ Running as udd.debian.org ◮ Uses PostgreSQL 9.4 ◮ You can connect from {qa,alioth}.d.o using: psql service=udd ◮ Even non-DDs can connect! More info: http://wiki.debian.org/UltimateDebianDatabase Lucas Nussbaum Ultimate Debian Database 9 / 15

  10. What’s in it? ◮ Main stuff: ◮ Additional stuff: � Sources and Packages � Migrations to testing � History of uploads � Bugs (including archived � FTPmasters NEW, removals, bugs) autorejects, deferred queue ◮ Identities: � Debtags � DDTP (translation status) � Carnivore � Popularity contest � Debian LDAP (restricted to � Orphaned packages DDs) � Upstream status ( uscan ) � Debian Maintainers � Screenshots � Lintian � PTS subscriptions (OLD) � Reproducible builds ◮ Derivatives: � security tracker � VCSwatch � Sources/Packages for � Mentors Ubuntu, Aptosid � CI � Ubuntu bugs � Duck � Ubuntu upload history � Key packages � Release team hints � Ubuntu popcon � wanna-build � Ubuntu lintian � testing auto-removals Lucas Nussbaum Ultimate Debian Database 10 / 15

  11. So, what can we find out about Debian using UDD? Lucas Nussbaum Ultimate Debian Database 11 / 15

  12. Number of different lintian errors or warnings select package, count(distinct tag) as cnt from lintian where tag_type in(’error’,’warning’) group by package order by cnt desc limit 15; Lucas Nussbaum Ultimate Debian Database 12 / 15

  13. Number of different lintian errors or warnings select package, count(distinct tag) as cnt from lintian where tag_type in(’error’,’warning’) group by package order by cnt desc limit 15; package count muddleftpd 19 cyclades-serial-client 18 lpr 18 spellcast 17 bonnie++ 16 vflib3 16 scalapack-doc 15 jbofihe 15 netmaze 15 bookview 15 jailer 15 debbugs 15 Lucas Nussbaum Ultimate Debian Database 12 / 15

  14. Who uploaded sid’s packages? select changed_by, count(*) from sources s, upload_history uh where s.source = uh.source and s.version = uh.version and s.distribution=’debian’ and s.release = ’sid’ group by changed_by order by count desc limit 8; Lucas Nussbaum Ultimate Debian Database 13 / 15

  15. Who uploaded sid’s packages? select changed_by, count(*) from sources s, upload_history uh where s.source = uh.source and s.version = uh.version and s.distribution=’debian’ and s.release = ’sid’ group by changed_by order by count desc limit 8; changed_by count gregor herrmann <gregoa@debian.org> 1324 Clint Adams <clint@debian.org> 527 Andreas Tille <tille@debian.org> 512 Emmanuel Bourg <ebourg@apache.org> 462 Maximiliano Curia <maxy@debian.org> 383 Thomas Goirand <zigo@debian.org> 382 Jonas Smedegaard <dr@jones.dk> 374 Joachim Breitner <nomeata@debian.org> 296 Lucas Nussbaum Ultimate Debian Database 13 / 15

  16. UDD Dashboards ◮ UDD bugs search https://udd.debian.org/bugs/ (example) ◮ UDD Maintainer Dashboard https://udd.debian.org/dmd/ (example) ◮ UDD Bapase (search for "interesting" packages) Lucas Nussbaum Ultimate Debian Database 14 / 15

  17. Things to improve ◮ Mostly a one-person project � Good starting point: debug importers failures https://udd.debian.org/udd-status.cgi ◮ No real good development environment � Vagrant environment now available ◮ More collaboration with DDPO and Tracker � Hard to share data inside Debian infrastructure Contact: #debian-qa or debian-qa@l.d.o http://wiki.debian.org/UltimateDebianDatabase Lucas Nussbaum Ultimate Debian Database 15 / 15

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend