SLIDE 1

Introduction The solution Interesting bits still to figure out HELP!

apt-xapian-index

Everything You Always Wanted to Index About Debian Packages, But Were Afraid to Ask
Enrico Zini
enrico@debian.org
23 February 2008

Enrico Zini enrico@debian.org apt-xapian-index

SLIDE 2

Outline

1. Introduction
   Please help me with the notes
   Introduction
2. The solution
   A tour of apt-xapian-index
   Code examples
3. Interesting bits still to figure out
4. HELP!

SLIDE 3

Please help me with the notes

1. apt-get install gobby
2. Run gobby
3. Connect to the session at 192.168.42.217, port 6522, password enrico
4. Join document notes.txt

SLIDE 5

The problem

What I want to see happening
- Build smart interfaces to browse the large Debian archive.

The first problem I think needs solving:
- The only fast package index we have at the moment is APT
- The task of the APT index is to solve dependencies
- APT shouldn’t be expanded (bloated) to do much more

Solution: create another index to complement APT

SLIDE 6

What the new index should have

- Fast full text searches
- Fast tag searches
- Extensible, to accommodate new ideas for data to index

SLIDE 8

A tour of apt-xapian-index

The technology

- Sits in /var/lib/apt-xapian-index/index
- Based on Xapian
  - Indexes text as well as numbers and dates
  - Decent bindings in all sorts of languages
  - Stretchable and abusable to great lengths
- Self-documented in /var/lib/apt-xapian-index/README

SLIDE 9

A tour of apt-xapian-index

Indexing

- Done by /usr/sbin/update-apt-xapian-index
- Can be run interactively
- Runs in a weekly cron job
- Packages can inject extra data by adding plugins in /usr/share/apt-xapian-index/plugins

SLIDE 10

A tour of apt-xapian-index

Searching

- You just need the plain Xapian API
- /var/lib/apt-xapian-index/README documents the index layout
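A minimal Python sketch of what such a search could look like, assuming the Python bindings for Xapian are installed and that the index sits at /var/lib/apt-xapian-index/index, next to the README. The "XT" prefix for Debtags tags and the idea that the document data holds the package name are assumptions about the index layout; the README is the authority there. The names tag_terms and search are made up for this example.

```python
# Sketch only. INDEX_PATH and TAG_PREFIX are assumptions: check them against
# /var/lib/apt-xapian-index/README before relying on them.
INDEX_PATH = "/var/lib/apt-xapian-index/index"
TAG_PREFIX = "XT"  # assumed term prefix for Debtags tags


def tag_terms(tags):
    """Turn Debtags tags into the prefixed terms the index is assumed to use."""
    return [TAG_PREFIX + tag for tag in tags]


def search(text, tags=(), limit=10, index_path=INDEX_PATH):
    """Return (percent, data) pairs for the best matches of a text query.

    If tags are given, only documents carrying all of them are returned.
    """
    # Imported here so that tag_terms() stays usable without the bindings.
    import xapian

    db = xapian.Database(index_path)

    # Parse the free-text part of the query.
    parser = xapian.QueryParser()
    parser.set_database(db)
    query = parser.parse_query(text)

    # Restrict the results to documents that have every requested tag.
    if tags:
        tag_query = xapian.Query(xapian.Query.OP_AND, tag_terms(tags))
        query = xapian.Query(xapian.Query.OP_FILTER, query, tag_query)

    enquire = xapian.Enquire(db)
    enquire.set_query(query)
    # The document data is assumed to hold the package name.
    return [(m.percent, m.document.get_data())
            for m in enquire.get_mset(0, limit)]
```

Something like search("image editor", tags=["role::program"]) would then list the ten best matches, provided the tag prefix matches what the README says.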

SLIDE 11

Tools using it

- goplay (golearn, goadmin, ...)
- debtags.debian.net (just started)

SLIDE 13

This page is sneakily left blank to divert your attention elsewhere.

SLIDE 14

Getting more data into the system

My proposal

- One package per dataset to get
- Ship a copy of the dataset in the package, to use if everything fails
- A tool that can be run to fetch the data, or
- A plugin system to fetch the data using a single tool instead?
- Download new versions using a cron job
- Provide the data somewhere under /var
- Add an apt-xapian-index plugin to index it

For example: popcon, bts statistics, iterating.com
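The "use the shipped copy if everything fails" part of the proposal could be sketched roughly like this in Python. The function name pick_dataset and the paths in the usage note are hypothetical; nothing here is an existing tool.

```python
import os


def pick_dataset(candidates):
    """Return the first candidate path that exists and is not empty.

    Candidates are listed in order of preference, e.g. the copy fetched by
    the cron job first and the copy shipped in the package last.
    Returns None if no candidate is usable.
    """
    for path in candidates:
        try:
            if os.path.getsize(path) > 0:
                return path
        except OSError:
            # Missing or unreadable file: fall through to the next candidate.
            continue
    return None
```

An indexing plugin for, say, popcon could then call pick_dataset(["/var/lib/popcon-data/by_inst", "/usr/share/popcon-data/by_inst"]) (both paths invented for illustration) and skip indexing when it gets None back.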

SLIDE 15

More indexing ideas

Debian specific stemming

- “libfoo” becomes “library” and “foo”; “debfoo” becomes “debian” and “foo”
- “cvsdelta”, “cvsgraph”, “gnomecatalog”, “gnomeradio”, “gnusomething” (but not “gnustep”), “kdesomething”...
- More generally, how to index “Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz”?
- How to provide the same stemming algorithm at query time?
- Compensate with improved descriptions?
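One naive way to try the prefix idea, sketched in Python. Only the "lib" and "deb" expansions come from the examples above; the rest of the prefix table, the exception list, and the names split_name and expand are guesses made for illustration. Running the same expand() on the package text at indexing time and on the user's words at query time is one possible answer to the query-time question.

```python
# Hypothetical prefix table. Only "lib" -> "library" and "deb" -> "debian"
# come from the slides; the other entries are guesses based on the examples.
PREFIXES = {
    "lib": "library",
    "deb": "debian",
    "cvs": "cvs",
    "gnome": "gnome",
    "gnu": "gnu",
    "kde": "kde",
}

# Names that only look like they carry a prefix.
EXCEPTIONS = {"gnustep"}


def split_name(word):
    """Split one word into index terms, e.g. "libfoo" -> ["library", "foo"]."""
    word = word.lower()
    if word in EXCEPTIONS:
        return [word]
    # Longest prefix first, so a longer prefix is never shadowed by a shorter one.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix):
            return [PREFIXES[prefix], word[len(prefix):]]
    return [word]


def expand(text):
    """Apply split_name() to every word; meant for both indexing and querying."""
    return [term for word in text.split() for term in split_name(word)]
```

The sketch also shows why this is still an open question: split_name("debian") yields "debian" and "ian", so the exception list would have to grow a lot, and long compound words are not helped at all.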

SLIDE 16

More indexing ideas

What else to index?

- popcon
- bts statistics
- iterating.com
- more ideas?

SLIDE 17

i18n

How about searching translated descriptions?

- Xapian already supports stemming for many languages
- Is it useful, with such short descriptions?
- One index per language?
- How about disk space, and indexing time?

SLIDE 18

Index update

Can it be improved?

- Incremental updates
  - Need to track what’s new after an apt-get update
  - Increases index size
- Suid update script to run goplay right after installing it
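Tracking what is new after an apt-get update could start from something as simple as comparing two snapshots of package versions. This Python sketch assumes the indexer keeps a name-to-version mapping from its previous run; that snapshot and the function name diff_packages are hypothetical, and how the mappings are read from APT is left out.

```python
def diff_packages(old, new):
    """Compare two {package name: version} snapshots.

    Returns (to_index, to_remove): packages that are new or whose version
    changed and need (re)indexing, and packages that disappeared and whose
    documents should be deleted from the index.
    """
    to_index = sorted(name for name, version in new.items()
                      if old.get(name) != version)
    to_remove = sorted(name for name in old if name not in new)
    return to_index, to_remove
```

For example, with old = {"a": "1", "b": "1", "c": "1"} and new = {"a": "1", "b": "2", "d": "1"}, the result is (["b", "d"], ["c"]): only two documents need indexing instead of the whole archive.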
