Creating dictionaries for Apache OpenOffice and maintaining them through web services (PowerPoint presentation)

Andrea Pescetti


  1. Creating dictionaries for Apache OpenOffice and maintaining them through web services • Andrea Pescetti • pescetti@apache.org

  2. Andrea Pescetti • VP, Apache OpenOffice • Unaffiliated volunteer • Dictionary packager • Day job: web developer

  3. Getting Started

  4. OpenOffice Language Support • $ svn ls https://svn.apache.org/repos/asf/openoffice/trunk/extras/l10n/source/ | grep -c / • Output: 112

  5. Writing Aids: An Overview • Spell checker • Thesaurus • Hyphenation Patterns • Grammar Checker

  6. Spell Checker • Engine: Hunspell, integrated in OpenOffice. • Hunspell dictionaries available for 100+ languages. • http://hunspell.sf.net

  7. Thesaurus • Engine: integrated. • OpenOffice-specific format. • Must start from scratch. • lingucomponent.openoffice.org

  8. Hyphenation Patterns • Engine: Hyphen, from the Hunspell project. • Integrated in OpenOffice. • Format: tool-specific. • But you can convert TeX patterns: http://ctan.org/

  9. Grammar Checker • Available only as an API. • Available as extensions: LanguageTool, LightProof, CoGrOO and more. • Format: tool-dependent.

  10. Licensing Issues

  11. Mere Aggregation • Crazy variety of licenses. • Many are incompatible with AL2. • But bundling is allowed: “mere aggregation”, LEGAL-117.

  12. Extensions (OXT) • Writing Aids are now extensions (XML + data, packed as a ZIP archive). • Hosted anywhere, bundled at build time. • Reinforces “mere aggregation”.

  13. Choose your license • AL2: Apache License, free and permissive, GPLv3-compatible. • LGPLv3/GPLv3: can be used through mere aggregation. • AGPLv3: untested so far, but likely usable through mere aggregation too.

  14. (Don't) Meet Apache Legal • Extensions are externally hosted. • extensions.openoffice.org is considered external too. • No paperwork needed!

  15. Distributed Management

  16. Use a repository • Make sources available in an online repository. • Use version control. • Expose a web-based change tracking interface.

  17. Spell Checker • One file in text format. • Human-readable, except the affix rules. • Good for collaborative editing.

  18. Spell Checker: example
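The example slide's image is not reproduced here. As a minimal sketch of why the format suits collaborative editing: assuming the usual Hunspell layout, a .dic file starts with an approximate entry count, followed by one word per line, optionally with affix flags after a slash (the rules behind those flags live in the companion .aff file). The hypothetical helpers parse_dic and add_word read the entries and add a word while keeping the count in sync; escaped slashes and non-default FLAG settings are not handled.

```python
def parse_dic(text: str) -> dict[str, str]:
    """Map each word in Hunspell .dic text to its affix-flag string ("" if none)."""
    entries: dict[str, str] = {}
    # Line 1 is the approximate entry count, so skip it.
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        # Keep only the first field; optional morphological fields follow whitespace.
        word, _, flags = line.split()[0].partition("/")
        entries[word] = flags
    return entries


def add_word(text: str, word: str, flags: str = "") -> str:
    """Return new .dic text with `word` added, lines sorted and count updated."""
    if word in parse_dic(text):
        return text  # already present: leave the file untouched
    lines = [line for line in text.splitlines()[1:] if line.strip()]
    lines.append(f"{word}/{flags}" if flags else word)
    lines.sort()
    return "\n".join([str(len(lines)), *lines]) + "\n"
```

Sorting the lines is a choice made here so that repository diffs stay small and easy to review; as far as we know, Hunspell itself does not require sorted entries.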

  19. Thesaurus • One file in text format. • A generated index. • Human-readable. • Good for collaborative editing.

  20. Thesaurus: example
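The index is generated rather than edited by hand. A sketch, assuming the MyThes layout used by OpenOffice thesauri: the .dat file has an encoding line, then for each entry a "word|N" header followed by N meaning lines; the .idx file has the encoding, the number of entries, and one "word|byte offset" line per entry. The build_index function is a hypothetical stand-in for whatever script a project actually uses.

```python
def build_index(dat: bytes) -> bytes:
    """Generate MyThes-style .idx content from the bytes of a .dat file."""
    lines = dat.splitlines(keepends=True)
    encoding = lines[0].strip().decode("ascii")  # e.g. "UTF-8" or "ISO8859-1"
    offset = len(lines[0])  # byte offset of the line being examined
    entries = []
    i = 1
    while i < len(lines):
        line = lines[i]
        if not line.strip():  # tolerate blank lines
            offset += len(line)
            i += 1
            continue
        # Entry header "word|N": N meaning lines follow it.
        word, _, count = line.strip().decode(encoding).rpartition("|")
        entries.append((word, offset))
        n = int(count)
        for meaning in lines[i:i + n + 1]:  # header plus its N meanings
            offset += len(meaning)
        i += n + 1
    entries.sort()  # sorted, assuming the reader binary-searches the index
    out = [encoding, str(len(entries))] + [f"{w}|{o}" for w, o in entries]
    return ("\n".join(out) + "\n").encode(encoding)
```

Offsets are counted on raw bytes, which is why the .dat file is handled as bytes rather than decoded text.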

  21. Hyphenation • One text file. • Format: less readable than Perl! • Changes very rarely. • Fix bugs upstream, in TeX.

  22. Hyphenation: example
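Hyphen's patterns are derived from TeX patterns, which follow Liang's algorithm: digits between letters are weights, the highest weight at each position wins, and odd values allow a break. The sketch below is an illustrative reimplementation of that classic scheme, not code from the Hyphen library. It ignores exception lists and Hyphen's extensions such as non-standard hyphenation, and the left_min/right_min defaults of 2 are assumptions.

```python
import re


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a TeX-style pattern such as 'hy3ph' into letters and inter-letter weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)  # weights[k]: slot before letters[k]
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


def hyphenate(word: str, patterns: list[str], left_min: int = 2, right_min: int = 2) -> list[str]:
    """Split `word` at odd-valued points, Liang style (no exception list)."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark word boundaries in patterns
    points = [0] * (len(padded) + 1)
    for i in range(len(padded)):
        for j in range(i + 1, len(padded) + 1):
            weights = table.get(padded[i:j])
            if weights:
                for k, value in enumerate(weights):
                    points[i + k] = max(points[i + k], value)
    pieces, start = [], 0
    for m in range(left_min, len(word) - right_min + 1):
        # Slot between word[m-1] and word[m] is points[m + 1] (shifted by the leading dot).
        if points[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces
```

For instance, with the single pattern "hy3ph" the word "hyphen" splits into "hy" and "phen"; adding "y4p" raises the weight at that point to an even value and suppresses the break.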

  23. Grammar checker • LanguageTool: rules in XML. • Fix upstream, in LanguageTool. • Collaboration possible.

  24. Grammar checker: example
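Because LanguageTool rules are XML, fixes can be proposed and reviewed like any other text change. The sketch below reads a rule written in the style of LanguageTool's rule files and matches it against a sentence. It covers only a simplified subset (literal token elements; no regular expressions, part-of-speech tags or suggestions). The DOUBLE_THE rule is made up for illustration, and this is not LanguageTool's own matcher.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule written in the style of LanguageTool's XML rules.
RULE_XML = """
<rule id="DOUBLE_THE" name="Repeated word: the">
  <pattern>
    <token>the</token>
    <token>the</token>
  </pattern>
  <message>Possible typo: the word is repeated.</message>
</rule>
"""


def load_rule(xml_text: str) -> tuple[str, list[str], str]:
    """Read the rule id, its literal token sequence and its message."""
    rule = ET.fromstring(xml_text.strip())
    tokens = [t.text.strip().lower() for t in rule.find("pattern").findall("token")]
    return rule.get("id"), tokens, rule.findtext("message").strip()


def check(sentence: str, rule: tuple[str, list[str], str]) -> list[tuple[str, int, str]]:
    """Return (rule id, word index, message) for every match in `sentence`."""
    rule_id, tokens, message = rule
    words = re.findall(r"\w+", sentence.lower())
    return [
        (rule_id, i, message)
        for i in range(len(words) - len(tokens) + 1)
        if words[i:i + len(tokens)] == tokens
    ]
```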

  25. Packaging • Generation of the OXT extension is scriptable. • Post-commit hook possible. • Keep generated OXT files in the same repository.
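Since an OXT is a ZIP archive of XML descriptors and data, packaging is easy to script. Here is a minimal sketch using Python's zipfile module. The REQUIRED file list reflects a typical dictionary extension layout and is an assumption, as is the build_oxt name.

```python
import zipfile
from pathlib import Path

# Files a dictionary extension typically contains (an assumption; adjust per project).
REQUIRED = ("META-INF/manifest.xml", "description.xml", "dictionaries.xcu")


def build_oxt(src_dir: Path, out_file: Path) -> Path:
    """Zip the extension sources in `src_dir` into the OXT archive `out_file`."""
    src = Path(src_dir)
    missing = [name for name in REQUIRED if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError("missing from extension sources: " + ", ".join(missing))
    with zipfile.ZipFile(out_file, "w", compression=zipfile.ZIP_DEFLATED) as oxt:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Archive names are relative to the source dir, with forward slashes.
                oxt.write(path, path.relative_to(src).as_posix())
    return Path(out_file)
```

A Subversion post-commit hook receives the repository path and revision number as arguments, so it could export the tree and call build_oxt. Keep out_file outside src_dir; otherwise the archive would try to include itself.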

  26. Team Structure • Collaboration possible in every component. • A script to package the extension. • A release manager to make stable versions available.

  27. Community Involvement

  28. Going 2.0 • Native-language community: the best people to improve native-language tools. • Motivated users, interested in improving OpenOffice. • Issue: providing efficient infrastructure.

  29. Web-based interface • An idea from OOoCon 2010. • Report missing or erroneous words from within OpenOffice. • Easy to set up as a web service. • Notifications: e-mail to maintainers, suggestions stored in a database.
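Here is a sketch of the storage side. It assumes SQLite for the suggestions database, with a notify callback standing in for the e-mail to maintainers. The table layout, the action names, and the choice to notify only on a word's first report (so maintainers are not flooded) are all hypothetical.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS suggestion (
    word    TEXT NOT NULL,
    action  TEXT NOT NULL CHECK (action IN ('add', 'remove', 'hyphenation')),
    lang    TEXT NOT NULL,
    created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the suggestions database."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def record(db: sqlite3.Connection, word: str, action: str, lang: str,
           notify: Callable[[str], None]) -> int:
    """Store one report; call `notify` (e.g. an e-mail sender) on the first one only."""
    with db:  # commits the insert
        db.execute("INSERT INTO suggestion (word, action, lang) VALUES (?, ?, ?)",
                   (word, action, lang))
    (count,) = db.execute(
        "SELECT COUNT(*) FROM suggestion WHERE word = ? AND action = ? AND lang = ?",
        (word, action, lang),
    ).fetchone()
    if count == 1:
        notify(f"[{lang}] {action}: {word}")
    return count
```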

  30. Web-based interface: example

  31. Expose web services • Direct use of the web application via a browser. • Access available through web services too. • Suitable for applications or macros.
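One way to serve both the browser and applications or macros is a single HTTP endpoint. The sketch below is a plain WSGI application that uses only the standard library. The /suggest path, the form fields and the JSON replies are hypothetical, and the in-memory store list stands in for a real database.

```python
import json
from urllib.parse import parse_qs

ACTIONS = {"add", "remove", "hyphenation"}


def _respond(start_response, status, body, extra_headers=()):
    """Send `body` as JSON with the given status line."""
    data = json.dumps(body).encode("utf-8")
    headers = [("Content-Type", "application/json"), ("Content-Length", str(len(data)))]
    start_response(status, headers + list(extra_headers))
    return [data]


def make_app(store: list):
    """Build a WSGI app that accepts POST /suggest with word, action and lang form fields."""
    def app(environ, start_response):
        if environ.get("PATH_INFO") != "/suggest":
            return _respond(start_response, "404 Not Found", {"ok": False})
        if environ.get("REQUEST_METHOD") != "POST":
            return _respond(start_response, "405 Method Not Allowed", {"ok": False},
                            [("Allow", "POST")])
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        word = form.get("word", [""])[0].strip()
        action = form.get("action", [""])[0]
        lang = form.get("lang", [""])[0]
        if not word or action not in ACTIONS:
            return _respond(start_response, "400 Bad Request", {"ok": False})
        store.append((word, action, lang))
        return _respond(start_response, "200 OK", {"ok": True, "word": word, "action": action})
    return app
```

For local experiments, wsgiref.simple_server.make_server("", 8000, make_app([])).serve_forever() serves it on port 8000. An HTML form posting to /suggest then gives the browser interface.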

  32. Web services in OXT • Embed a macro in the OXT dictionary package. • Right-click on a word to: nominate it for inclusion in the dictionary; nominate it for removal from the dictionary; report wrong hyphenation.
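OpenOffice macros can be written in Python as well as Basic. The sketch below covers only the network part: turning the clicked word and the chosen menu entry into a POST request. SERVICE_URL, the action codes and build_report are hypothetical. Hooking into the right-click menu (for example through the UNO context-menu interceptor API) and reading the word under the cursor are left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the real service URL.
SERVICE_URL = "https://dictionary.example.org/suggest"

# Context-menu entries from the slide, mapped to hypothetical service actions.
MENU_ACTIONS = {
    "Nominate for inclusion in dictionary": "add",
    "Nominate for removal from dictionary": "remove",
    "Report wrong hyphenation": "hyphenation",
}


def build_report(word: str, menu_label: str, lang: str) -> Request:
    """Prepare the POST request a macro would send for the clicked word."""
    form = {"word": word, "action": MENU_ACTIONS[menu_label], "lang": lang}
    return Request(
        SERVICE_URL,
        data=urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The macro would then send the request with urllib.request.urlopen(request, timeout=10), keeping the timeout short so the user interface is not blocked for long.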

  33. Thesaurus maintenance • Vithesaurus: free online tool for collaboratively creating and maintaining a thesaurus. • In use (German) at http://www.openthesaurus.de • https://github.com/danielnaber

  34. Handling Duplicates • Millions of users can lead to duplicate reports. • But that's a plus: use report frequency for ranking.
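Ranking by frequency amounts to counting identical reports. Here is a minimal sketch using collections.Counter. The (word, action) report shape and the min_reports threshold are assumptions.

```python
from collections import Counter


def rank(reports: list[tuple[str, str]], min_reports: int = 1) -> list[tuple[str, str, int]]:
    """Merge duplicate (word, action) reports and order them by frequency."""
    # Trim whitespace but keep case: capitalization matters in a dictionary.
    counts = Counter((word.strip(), action) for word, action in reports)
    return [(w, a, n) for (w, a), n in counts.most_common() if n >= min_reports]
```

Ties keep the order in which reports first arrived, because Counter.most_common orders equal counts by first insertion.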

  35. Handling Wrong Reports • Annoying: users make some wrong suggestions and repeat them! • The web application supports “motivated blacklisting”: repeated wrong submissions are recognized, and a message explaining why can be shown to the user.
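Here is a minimal sketch of "motivated blacklisting". It assumes that maintainers record a reason when they reject a suggestion, and that this reason becomes the message shown on later identical submissions. The reject/submit names and the dictionary-based blacklist are hypothetical; a real service would persist both.

```python
THANKS = "Thank you, your suggestion has been recorded."


def reject(blacklist: dict, word: str, action: str, reason: str) -> None:
    """Maintainer decision: remember why this suggestion is wrong."""
    blacklist[(word, action)] = reason


def submit(blacklist: dict, store: list, word: str, action: str) -> tuple[bool, str]:
    """Accept a suggestion unless it was already rejected; return (accepted, message)."""
    reason = blacklist.get((word, action))
    if reason is not None:
        return False, reason  # the "motivation" shown to the user
    store.append((word, action))
    return True, THANKS
```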

  36. Thanks for your attention • Andrea Pescetti • pescetti@apache.org • www.openoffice.org • Image credits: Flickr, PLIO Archives.
