Creating dictionaries for Apache OpenOffice and maintaining them through web services
Andrea Pescetti
pescetti@apache.org

Andrea Pescetti
• VP, Apache OpenOffice
• Unaffiliated volunteer
• Dictionary packager
• Day job: web developer

Getting Started

OpenOffice Language Support
$ svn ls https://svn.apache.org/repos/asf/openoffice/trunk/extras/l10n/source/ | grep -c /
112

Writing Aids: An Overview
• Spell checker
• Thesaurus
• Hyphenation patterns
• Grammar checker

Spell Checker
• Engine: Hunspell, integrated in OpenOffice.
• Hunspell dictionaries available for 100+ languages.
• http://hunspell.sf.net

Thesaurus
• Engine: integrated.
• OpenOffice-specific format.
• Must start from scratch.
• lingucomponent.openoffice.org

Hyphenation Patterns
• Engine: Hyphen, from Hunspell.
• Integrated in OpenOffice.
• Format: tool-specific.
• But you can convert TeX patterns: http://ctan.org/

Grammar Checker
• Available only as API.
• Options as extensions: LanguageTool, LightProof, CoGrOO and more.
• Format: tool-dependent.

Licensing Issues

Mere Aggregation
• Crazy variety of licenses.
• Many incompatible with AL2.
• But bundling is allowed: "mere aggregation", LEGAL-117.

Extensions (OXT)
• Writing Aids are now extensions (XML+data+ZIP).
• Hosted anywhere, bundled at build time.
• Reinforces "mere aggregation".

Choose your license
• AL2: Apache License, free and permissive, GPLv3 compatible.
• LGPLv3/GPLv3: can be used through mere aggregation.
• AGPLv3: untested so far, but likely mere aggregation too.

(Don't) Meet Apache Legal
• Extensions are externally hosted.
• extensions.openoffice.org is considered external too.
• No paperwork needed!

Distributed Management

Use a repository
• Make sources available in an online repository.
• Use version control.
• Expose a web-based change tracking interface.

Spell Checker
• One file in text format.
• Human readable, except rules.
• Good for collaborative editing.

Spell Checker: example
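
A minimal sketch of why the word list lends itself to collaborative editing, assuming the usual Hunspell `.dic` layout: an entry count on the first line, then one `word/FLAGS` entry per line. The sample words and flags are made up for illustration; the meaning of each flag lives in the companion `.aff` file (the "rules" part), which this sketch does not interpret.

```python
# Parse a tiny Hunspell-style .dic word list.
# SAMPLE_DIC is invented for illustration; it is not taken from a real dictionary.
SAMPLE_DIC = """\
3
cat/S
walk/DG
office
"""


def parse_dic(text):
    """Return (declared_count, {word: flags}) for .dic content."""
    lines = text.splitlines()
    declared = int(lines[0].split()[0])  # first line: approximate entry count
    entries = {}
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        # Everything after the first "/" is the flag string; it may be empty.
        word, _, flags = line.partition("/")
        entries[word] = flags
    return declared, entries


declared, entries = parse_dic(SAMPLE_DIC)
print(declared, entries)
```

Because every entry is a single line, a version control diff shows exactly which words a contributor added or removed.
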

Thesaurus
• One file in text format.
• A generated index.
• Human readable.
• Good for collaborative editing.

Thesaurus: example
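
The "generated index" can be sketched as follows. This is an illustration under stated assumptions, not the project's own tool: it assumes a data file with an encoding line followed by `word|number_of_meanings` entries, each followed by one line per meaning, and an index made of the encoding, the entry count, and sorted `word|byte_offset` lines. The sample entries and the function name `build_index` are hypothetical.

```python
# Build an index of byte offsets for a thesaurus data file.
# The file layout is an assumption described in the text above.
SAMPLE_DAT = (
    "UTF-8\n"
    "easy|1\n"
    "(adj)|simple|plain\n"
    "word|2\n"
    "(noun)|term\n"
    "(verb)|phrase|express\n"
)


def build_index(dat_bytes):
    """Return the index text for thesaurus data given as bytes."""
    lines = dat_bytes.split(b"\n")
    encoding = lines[0].decode("ascii")
    offset = len(lines[0]) + 1  # byte position of the first entry line
    entries = []
    i = 1
    while i < len(lines) and lines[i]:
        word, count = lines[i].rsplit(b"|", 1)
        entries.append((word.decode(encoding), offset))
        # Advance past the entry line and its meaning lines.
        for j in range(i, i + 1 + int(count)):
            offset += len(lines[j]) + 1
        i += 1 + int(count)
    entries.sort()
    out = [encoding, str(len(entries))]
    out += [f"{word}|{position}" for word, position in entries]
    return "\n".join(out) + "\n"


index_text = build_index(SAMPLE_DAT.encode("utf-8"))
print(index_text)
```

Since the index is derived entirely from the data file, only the data file needs to be edited by hand; the index can be regenerated after every change.
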

Hyphenation
• One text file.
• Format: less readable than Perl!
• Changes very rarely.
• Fix bugs upstream, in TeX.

Hyphenation: example
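
To give a feeling for how TeX-style patterns work, here is a minimal sketch of Liang's pattern-matching technique. The four patterns are invented for this example and are not taken from any real pattern file; the function names and the `left_min`/`right_min` parameters are likewise assumptions. Digits in a pattern score the gap between letters: the highest score for each gap wins, and an odd score allows a break while an even one forbids it.

```python
# Toy patterns, made up for illustration only.
PATTERNS = ["c1t", "io1n", "1o", "i2o"]


def parse_pattern(pattern):
    """Split 'c1t' into letters 'ct' and gap weights [0, 1, 0]."""
    letters = ""
    weights = [0]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters += ch
            weights.append(0)
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert hyphens where the winning gap weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[g]: gap before padded[g]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for k, weight in enumerate(weights):
                    scores[start + k] = max(scores[start + k], weight)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        # The gap before word[i] is the gap before padded[i + 1].
        if scores[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("dictionary", PATTERNS))  # dic-tio-nary
```

Here `1o` alone would allow a break before "o", but the higher even weight in `i2o` overrides it. This interplay is what makes real pattern files hard to read and edit by hand.
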

Grammar checker
• LanguageTool: rules in XML.
• Fix upstream, in LanguageTool.
• Collaboration possible.

Grammar checker: example

Packaging
• Generation of the OXT extension is scriptable.
• Post-commit hook possible.
• Keep generated OXT files in the same repository.
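
Since an OXT is a ZIP archive, the packaging step can be sketched in a few lines. This is a minimal illustration, assuming a source directory that already contains the extension's XML descriptors and dictionary data. The file names in the demo (`description.xml`, `META-INF/manifest.xml`, `dictionaries.xcu`, `xx_XX.dic`, `xx_XX.aff`) are placeholders with dummy content, and `package_oxt` is a hypothetical helper name.

```python
import tempfile
import zipfile
from pathlib import Path


def package_oxt(source_dir, oxt_path):
    """Zip every file under source_dir into oxt_path, keeping relative paths."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(oxt_path, "w", zipfile.ZIP_DEFLATED) as oxt:
        # Sorted order keeps the archive layout stable between runs.
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                oxt.write(path, path.relative_to(source_dir).as_posix())
    return oxt_path


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "dict-xx"
    (src / "META-INF").mkdir(parents=True)
    for name in ("description.xml", "META-INF/manifest.xml",
                 "dictionaries.xcu", "xx_XX.dic", "xx_XX.aff"):
        (src / name).write_text("placeholder\n", encoding="utf-8")
    oxt_file = package_oxt(src, Path(tmp) / "dict-xx.oxt")
    with zipfile.ZipFile(oxt_file) as archive:
        packaged = sorted(archive.namelist())

print(packaged)
```

A script like this could be called from a post-commit hook so that every change to the sources produces a fresh package.
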

Team Structure
• Collaboration possible in every component.
• A script to package the extension.
• A release manager to make stable versions available.

Community Involvement

Going 2.0
• Native-lang community: best people to improve N-L tools.
• Motivated users, interested in improving OpenOffice.
• Issue: providing efficient infrastructure.

Web-based interface
• An idea from OOoCon 2010.
• Report missing or erroneous words from within OpenOffice.
• Easy to set up as a web service.
• Notifications: e-mail to maintainers, suggestions in DB.

Web-based interface: example

Expose web services
• Direct usage of the web application via browser.
• Access available through web services too.
• Suitable for applications or macros.

Web services in OXT
• Embed a macro in the OXT dictionary package.
• Right-click on a word:
  • Nominate for inclusion in dictionary
  • Nominate for removal from dictionary
  • Report wrong hyphenation
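
A client for such a service could look roughly like the sketch below. Everything specific here is an assumption made for illustration: the endpoint URL, the JSON field names and the action names are hypothetical, and a macro embedded in an extension would not necessarily be written in Python. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the real service URL is not given in the slides.
SERVICE_URL = "https://dictionary.example.org/api/reports"

# One action per right-click menu entry (names are assumptions).
ACTIONS = {"add", "remove", "hyphenation"}


def build_report(word, action, language, comment=""):
    """Prepare a POST request describing one user report."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    payload = {
        "word": word,
        "action": action,
        "language": language,
        "comment": comment,
    }
    return urllib.request.Request(
        SERVICE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_report("OpenOffice", "add", "en-US")
print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen() would submit the report.
```
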

Thesaurus maintenance
• Vithesaurus: free online tool for collaboratively creating and maintaining a thesaurus.
• In use (German) at http://www.openthesaurus.de
• https://github.com/danielnaber

Handling Duplicates
• Millions of users can lead to duplicate reports.
• But it's a plus: use frequency for ranking.
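
Ranking by frequency can be sketched with a simple counter. The sample reports and the `(word, action)` representation are assumptions made for this example, not the data model of the actual web application.

```python
from collections import Counter

# Invented sample: each report is a (word, action) pair.
REPORTS = [
    ("teh", "remove"),
    ("selfie", "add"),
    ("teh", "remove"),
    ("blog", "add"),
    ("selfie", "add"),
    ("teh", "remove"),
]


def rank_reports(reports):
    """Return distinct reports with their counts, most frequent first."""
    return Counter(reports).most_common()


ranking = rank_reports(REPORTS)
for (word, action), count in ranking:
    print(count, action, word)
```

Maintainers can then review the most frequently reported words first instead of wading through every individual submission.
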

Handling Wrong Reports
• Annoying: users make some wrong suggestions and repeat them!
• The web application supports "motivated blacklisting": repeated wrong submissions are handled and a message can be shown to the user.
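
One possible reading of "motivated blacklisting" is a list of rejected suggestions, each stored with the reason for the rejection, so that a repeated submission gets an explanation instead of reaching the maintainers again. The sketch below follows that reading; the data structure, the sample entry and the messages are all hypothetical.

```python
# Rejected (word, action) pairs with the motivation shown to the user.
# The entry is invented for illustration.
BLACKLIST = {
    ("alot", "add"): "'alot' is a common misspelling of 'a lot'.",
}


def handle_submission(word, action, pending):
    """Queue a new suggestion, or explain why a blacklisted one is rejected."""
    key = (word.lower(), action)
    reason = BLACKLIST.get(key)
    if reason is not None:
        return f"Not accepted: {reason}"
    pending.append(key)
    return "Thank you, your suggestion was recorded."


pending = []
first = handle_submission("Alot", "add", pending)
second = handle_submission("selfie", "add", pending)
print(first)
print(second)
```
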

Thanks for your attention
Andrea Pescetti
pescetti@apache.org
www.openoffice.org
Image credits: Flickr, PLIO Archives.