Creating dictionaries for Apache OpenOffice and maintaining them through web services
Andrea Pescetti


  1. Creating dictionaries for Apache OpenOffice and maintaining them through web services. Andrea Pescetti, pescetti@apache.org

  2. Andrea Pescetti • VP, Apache OpenOffice • Unaffiliated volunteer • Dictionary packager • Day job: web developer

  3. Getting Started

  4. OpenOffice Language Support • $ svn ls https://svn.apache.org/repos/asf/openoffice/trunk/extras/l10n/source/ | grep -c / • Output: 112

  5. Writing Aids: An Overview • Spell checker • Thesaurus • Hyphenation Patterns • Grammar Checker

  6. Spell Checker • Engine: Hunspell, integrated in OpenOffice. • Hunspell dictionaries available for 100+ languages. • http://hunspell.sf.net

  7. Thesaurus • Engine: integrated. • OpenOffice-specific format. • Must start from scratch. • lingucomponent.openoffice.org

  8. Hyphenation Patterns • Engine: Hyphen, from Hunspell. • Integrated in OpenOffice. • Format: tool-specific. • But you can convert TeX patterns: http://ctan.org/

  9. Grammar Checker • Available only as API. • Options as extensions: LanguageTool, LightProof, CoGrOO and more. • Format: tool-dependent.

  10. Licensing Issues

  11. Mere Aggregation • Crazy variety of licenses. • Many incompatible with AL2. • But bundling is allowed: “mere aggregation”, LEGAL-117.

  12. Extensions (OXT) • Writing Aids are now extensions (XML+data+ZIP). • Hosted anywhere, bundled at build time. • Reinforces “mere aggregation”.

  13. Choose your license • AL2: Apache License, free and permissive, GPLv3 compatible. • LGPLv3/GPLv3: can be used through mere aggregation. • AGPLv3: untested so far, but likely mere aggregation too.

  14. (Don't) Meet Apache Legal • Extensions are externally hosted. • extensions.openoffice.org is considered external too. • No paperwork needed!

  15. Distributed Management

  16. Use a repository • Make sources available in an online repository. • Use version control. • Expose a web-based change tracking interface.

  17. Spell Checker • One file in text format. • Human readable, except rules. • Good for collaborative editing.

  18. Spell Checker: example
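
The example from this slide is not preserved in the transcript. As a stand-in, here is a minimal sketch of how a word list and its affix rules fit together in the Hunspell style. The data is a made-up toy, the flag `S` is an arbitrary choice, and the code assumes single-character flags and ignores the condition field of each rule, so it shows the idea rather than what Hunspell actually does.

```python
# Toy Hunspell-style data (hypothetical); real dictionaries are far larger.
AFF = """\
SET UTF-8
SFX S Y 1
SFX S 0 s .
"""

DIC = """\
3
dictionary
word/S
office/S
"""

def parse_suffix_rules(aff_text):
    """Return {flag: [(strip, add), ...]} for the SFX rule lines."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # Rule lines have 5 fields: SFX flag strip add condition.
        # Header lines (SFX flag cross-product count) have 4 and are skipped.
        if len(parts) == 5 and parts[0] == "SFX":
            _, flag, strip, add, _condition = parts
            rules.setdefault(flag, []).append(("" if strip == "0" else strip, add))
    return rules

def expand(dic_text, rules):
    """List every stem plus the forms produced by its suffix flags."""
    words = set()
    for line in dic_text.splitlines()[1:]:  # first line is the entry count
        if not line.strip():
            continue
        stem, _, flags = line.partition("/")
        words.add(stem)
        for flag in flags:  # assumption: one character per flag
            for strip, add in rules.get(flag, []):
                base = stem[:-len(strip)] if strip else stem
                words.add(base + add)
    return sorted(words)

print(expand(DIC, parse_suffix_rules(AFF)))
```

The word list is the part contributors can edit without special knowledge; the affix rules are the less readable part mentioned on the previous slide.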

  19. Thesaurus • One file in text format. • A generated index. • Human readable. • Good for collaborative editing.

  20. Thesaurus: example
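
The slide's own example is missing from the transcript. The sketch below illustrates the "one text file plus a generated index" idea under stated assumptions: the data file starts with an encoding line, each entry is a `word|N` line followed by N meaning lines, and the index lists each headword with its byte offset in the data file. This layout reflects my understanding of the format and should be checked against an existing thesaurus package; the sample entries are invented.

```python
# Hypothetical thesaurus data: encoding line, then "word|N" + N meaning lines.
DAT = (
    b"UTF-8\n"
    b"quick|1\n"
    b"(adj)|fast|rapid\n"
    b"big|2\n"
    b"(adj)|large|huge\n"
    b"(adj)|important\n"
)

def build_index(dat_bytes):
    """Generate index text: encoding, entry count, then word|byte-offset lines."""
    lines = dat_bytes.splitlines(keepends=True)
    encoding = lines[0].decode("ascii").strip()
    entries = []
    offset = len(lines[0])  # byte position of the first entry
    i = 1
    while i < len(lines):
        head = lines[i].decode(encoding).rstrip("\n")
        word, _, count = head.partition("|")
        entries.append((word, offset))
        block = lines[i:i + 1 + int(count)]  # headword line + its meanings
        offset += sum(len(line) for line in block)
        i += len(block)
    entries.sort()  # assumption: plain code-point ordering
    out = [encoding, str(len(entries))] + [f"{w}|{o}" for w, o in entries]
    return "\n".join(out) + "\n"

print(build_index(DAT))
```

Because the index is derived entirely from the data file, only the data file needs to be edited by hand; the index can be regenerated on every commit.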

  21. Hyphenation • One text file. • Format: less readable than Perl! • Changes very rarely. • Fix bugs upstream, in TeX.

  22. Hyphenation: example
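
The original example is not in the transcript. To show why the format is hard to read, here is a small sketch of how TeX-style patterns are usually applied (Liang's method): digits between letters are priority levels, the highest level at each gap wins, and odd levels allow a break. The four patterns are invented for this illustration and do not come from any real pattern file; `left_min` and `right_min` are assumed parameter names.

```python
import re

def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into letters 'hyph' and levels [0,0,3,0,0]."""
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)  # level of the gap before letter number `pos`
        else:
            pos += 1
    return letters, levels

def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert '-' at every gap whose highest matching level is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] = level of the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = table.get(padded[start:end])
            if levels:
                for k, level in enumerate(levels):
                    scores[start + k] = max(scores[start + k], level)
    pieces, last = [], 0
    for j in range(left_min, len(word) - right_min + 1):
        if scores[j + 1] % 2 == 1:     # gap before word[j]
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return "-".join(pieces)

TOY_PATTERNS = ["hy3ph", "n1a", "1ti", "a2ti"]  # made up for this example
print(hyphenate("hyphenation", TOY_PATTERNS))   # hy-phen-ation
```

Here `1ti` alone would allow a break before "tion", but the higher even level in `a2ti` suppresses it. Interactions like this are why fixes are best made upstream rather than by hand-editing a converted file.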

  23. Grammar checker • LanguageTool: rules in XML. • Fix upstream, in LanguageTool. • Collaboration possible.

  24. Grammar checker: example
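
The slide's example is not included in the transcript. The following sketch shows roughly what an XML rule looks like and how a script could list the rules in a file, for instance to review contributions. The rule itself (`COULD_OF`) is a hypothetical illustration written in the general shape of LanguageTool rule files; element and attribute details should be verified against the LanguageTool documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule in the general shape of a LanguageTool rule file.
RULE_XML = """\
<rules lang="en">
  <category id="EXAMPLES" name="Illustrative rules">
    <rule id="COULD_OF" name="could of (could have)">
      <pattern>
        <token>could</token>
        <token>of</token>
      </pattern>
      <message>Did you mean <suggestion>could have</suggestion>?</message>
      <example correction="could have">I <marker>could of</marker> gone.</example>
    </rule>
  </category>
</rules>
"""

def summarize_rules(xml_text):
    """Return (rule id, list of pattern tokens) for every rule in the file."""
    root = ET.fromstring(xml_text)
    summary = []
    for rule in root.iter("rule"):
        tokens = [token.text for token in rule.findall("./pattern/token")]
        summary.append((rule.get("id"), tokens))
    return summary

print(summarize_rules(RULE_XML))
```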

  25. Packaging • Generation of the OXT extension is scriptable. • Post-commit hook possible. • Keep generated OXT files in the same repository.
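
As an illustration of "scriptable", the sketch below builds an OXT as a ZIP archive in memory. It is a sketch under assumptions, not the script used for any real dictionary: the file names (`description.xml`, `dictionaries.xcu`, `xx_XX.dic`) and the manifest content follow the layout I would expect in a dictionary extension, and the file contents passed in are placeholders. Compare with an existing OXT before relying on it.

```python
import io
import zipfile

# Assumed manifest: declares dictionaries.xcu as configuration data.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="http://openoffice.org/2001/manifest">
  <manifest:file-entry
      manifest:media-type="application/vnd.sun.star.configuration-data"
      manifest:full-path="dictionaries.xcu"/>
</manifest:manifest>
"""

def build_oxt(files):
    """Return the bytes of an OXT (ZIP) holding the manifest plus `files`.

    `files` maps archive paths to their contents (str or bytes).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as oxt:
        oxt.writestr("META-INF/manifest.xml", MANIFEST)
        for name, data in sorted(files.items()):
            oxt.writestr(name, data)
    return buffer.getvalue()

# Placeholder contents; a real package needs valid XML and dictionary data.
oxt_bytes = build_oxt({
    "description.xml": "<placeholder/>",
    "dictionaries.xcu": "<placeholder/>",
    "xx_XX.dic": "1\nword\n",
})
print(len(oxt_bytes) > 0)
```

A post-commit hook could call a function like this and write the result to a file, so that every change to the sources yields an installable package.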

  26. Team Structure • Collaboration possible in every component. • A script to package the extension. • A release manager to make stable versions available.

  27. Community Involvement

  28. Going 2.0 • Native-lang community: best people to improve N-L tools. • Motivated users, interested in improving OpenOffice. • Issue: providing efficient infrastructure.

  29. Web-based interface • An idea from OOoCon 2010. • Report missing or erroneous words from within OpenOffice. • Easy to set up as a web service. • Notifications: e-mail to maintainers, suggestions in DB.

  30. Web-based interface: example

  31. Expose web services • Direct usage of the web application via browser. • Access available through web services too. • Suitable for applications or macros.
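
To make the web-service idea concrete, here is a minimal sketch of a reporting endpoint written as a plain WSGI application. Everything specific in it is an assumption for illustration: the `/report` path, the JSON fields `word` and `action`, the three action names, and the in-memory counter standing in for the suggestions database. It is not the application described in the talk.

```python
import json
from collections import Counter

REPORTS = Counter()  # in-memory stand-in for the suggestions database
VALID_ACTIONS = {"add", "remove", "hyphenation"}  # hypothetical action names

def _respond(start_response, status, payload):
    """Send a JSON response with the given status line."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]

def app(environ, start_response):
    """WSGI endpoint: POST a JSON object {"word": ..., "action": ...} to /report."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/report":
        return _respond(start_response, "404 Not Found", {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        word = str(payload["word"]).strip()
        action = str(payload["action"])
    except (ValueError, KeyError, TypeError):
        return _respond(start_response, "400 Bad Request", {"error": "invalid report"})
    if not word or action not in VALID_ACTIONS:
        return _respond(start_response, "400 Bad Request", {"error": "invalid report"})
    REPORTS[(word, action)] += 1
    return _respond(start_response, "200 OK",
                    {"word": word, "action": action, "count": REPORTS[(word, action)]})
```

For a local trial, the standard library can serve it with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. The same endpoint then works for a browser form, an application, or a macro.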

  32. Web services in OXT • Embed a macro in the OXT dictionary package. • Right-click on a word: nominate for inclusion in dictionary, nominate for removal from dictionary, or report wrong hyphenation.
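
The network side of such a macro can be very small. The sketch below only builds the HTTP request for one of the three menu actions; hooking it into the OpenOffice context menu needs the extension API and is not shown. The endpoint URL, the action names, and the JSON field names are all hypothetical.

```python
import json
import urllib.request

SERVICE_URL = "https://dictionary.example.org/report"  # hypothetical endpoint

# Hypothetical mapping from the three menu entries to action names.
ACTIONS = {
    "Nominate for inclusion in dictionary": "add",
    "Nominate for removal from dictionary": "remove",
    "Report wrong hyphenation": "hyphenation",
}

def build_report(word, menu_entry, language):
    """Build (but do not send) the POST request for one user report."""
    payload = json.dumps({
        "word": word,
        "action": ACTIONS[menu_entry],
        "language": language,
    }).encode("utf-8")
    return urllib.request.Request(
        SERVICE_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_report("OpenOffice", "Nominate for inclusion in dictionary", "en-US")
print(request.get_method(), request.full_url)
```

Sending the report is then a single `urllib.request.urlopen(request)` call, ideally with a timeout so a slow server never blocks the editor.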

  33. Thesaurus maintenance • Vithesaurus: free online tool for collaboratively creating and maintaining a thesaurus. • In use (German) at http://www.openthesaurus.de • https://github.com/danielnaber

  34. Handling Duplicates • Millions of users can lead to duplicate reports. • But it's a plus: use frequency for ranking.
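
Using frequency for ranking can be as simple as counting identical reports. A minimal sketch, assuming each report is a (word, action) pair with hypothetical action names:

```python
from collections import Counter

def rank_reports(reports):
    """Merge duplicate (word, action) reports and sort by how often each was sent."""
    counts = Counter((word.strip(), action) for word, action in reports)
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

reports = [
    ("teh", "remove"),
    ("OpenOffice", "add"),
    ("OpenOffice", "add"),
    ("OpenOffice ", "add"),  # trailing space: still the same word
]
print(rank_reports(reports))
```

Maintainers then review the top of the list first, so duplicates become a signal of importance rather than noise.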

  35. Handling Wrong Reports • Annoying: users make some wrong suggestions and repeat them! • The web application supports “motivated blacklisting”: repeated wrong submissions are handled and a message can be shown to the user.
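
One way to read “motivated blacklisting” is a list of rejected suggestions, each stored with the reason for rejection, so that a repeated submission is answered with that reason instead of reaching the maintainers again. The sketch below assumes this reading; the data structure, function name, and sample entry are hypothetical.

```python
# Hypothetical blacklist: (word, action) -> reason shown to the user.
BLACKLIST = {
    ("alot", "add"): 'Already reviewed and rejected: the standard spelling is "a lot".',
}

def submit(word, action, queue, blacklist=BLACKLIST):
    """Queue a report unless it was rejected before; always return a user message."""
    reason = blacklist.get((word, action))
    if reason is not None:
        return {"accepted": False, "message": reason}
    queue.append((word, action))
    return {"accepted": True, "message": "Thank you, your report was recorded."}

queue = []
print(submit("alot", "add", queue))
print(submit("OpenOffice", "add", queue))
```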

  36. Thanks for attention • Andrea Pescetti • pescetti@apache.org • www.openoffice.org • Image credits: Flickr, PLIO Archives.
