Creating dictionaries for Apache OpenOffice and maintaining them through web services
Andrea Pescetti


slide-1
SLIDE 1

Creating dictionaries for Apache OpenOffice and maintaining them through web services

Andrea Pescetti pescetti@apache.org

slide-2
SLIDE 2

Andrea Pescetti

  • VP, Apache OpenOffice
  • Unaffiliated volunteer
  • Dictionary packager
  • Day job: web developer
slide-3
SLIDE 3

Andrea Pescetti:

Getting Started

slide-4
SLIDE 4

OpenOffice Language Support

$ svn ls https://svn.apache.org/repos/asf/openoffice/trunk/extras/l10n/source/ | grep -c /
112

slide-5
SLIDE 5

Writing Aids: An Overview

  • Spell checker
  • Thesaurus
  • Hyphenation Patterns
  • Grammar Checker
slide-6
SLIDE 6

Spell Checker

  • Engine: Hunspell, integrated in OpenOffice.
  • Hunspell dictionaries available for 100+ languages.

  • http://hunspell.sf.net
slide-7
SLIDE 7

Thesaurus

  • Engine: integrated.
  • OpenOffice-specific format.
  • Must start from scratch.
  • lingucomponent.openoffice.org
slide-8
SLIDE 8

Hyphenation Patterns

  • Engine: Hyphen, from Hunspell.
  • Integrated in OpenOffice.
  • Format: tool-specific.
  • But you can convert TeX patterns: http://ctan.org/

slide-9
SLIDE 9

Grammar Checker

  • Available only as API.
  • Options as extensions: LanguageTool, LightProof, CoGrOO and more.

  • Format: tool-dependent.
slide-10
SLIDE 10

Andrea Pescetti:

Licensing Issues

slide-11
SLIDE 11

Mere Aggregation

  • Crazy variety of licenses.
  • Many incompatible with AL2.
  • But bundling is allowed: “mere aggregation”, LEGAL-117

slide-12
SLIDE 12

Extensions (OXT)

  • Writing Aids are now extensions (XML+data+ZIP)
  • Hosted anywhere, bundled at build time.

  • Reinforces “mere aggregation”.
slide-13
SLIDE 13

Choose your license

  • AL2: Apache License, free and permissive, GPLv3 compatible.
  • LGPLv3/GPLv3: can be used through mere aggregation.
  • AGPLv3: untested so far, but likely mere aggregation too.

slide-14
SLIDE 14

(Don't) Meet Apache Legal

  • Extensions are externally hosted
  • extensions.openoffice.org considered external too.

  • No paperwork needed!
slide-15
SLIDE 15

Andrea Pescetti:

Distributed Management

slide-16
SLIDE 16

Use a repository

  • Make sources available in an online repository.
  • Use version control.
  • Expose a web-based change tracking interface.

slide-17
SLIDE 17

Spell Checker

  • One file in text format.
  • Human readable, except rules.
  • Good for collaborative editing.
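
To make the "one file in text format" point concrete, here is a minimal Python sketch of reading and re-writing a Hunspell-style word list. It assumes the common layout of a `.dic` file: a first line with the entry count, then one `word` or `word/FLAGS` entry per line. The sample words and the flags `DGS` and `S` are made up for illustration; what flags mean is defined by the rules in the matching `.aff` file, which is the part that is hard to read. Escaped slashes and morphological fields are deliberately ignored here.

```python
def parse_dic(text):
    """Parse a simplified Hunspell .dic file into (declared_count, {word: flags})."""
    lines = text.splitlines()
    declared = int(lines[0].strip())  # first line: number of entries
    entries = {}
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        # Flags follow the first slash; they are kept as an opaque string.
        word, _, flags = line.partition("/")
        entries[word] = flags
    return declared, entries


def render_dic(entries):
    """Write entries back in sorted order with an updated entry count."""
    body = [f"{word}/{flags}" if flags else word
            for word, flags in sorted(entries.items())]
    return "\n".join([str(len(body))] + body) + "\n"


# Hypothetical sample content, not taken from a real dictionary.
declared, entries = parse_dic("2\nhello\nwork/DGS\n")
entries["aid"] = "S"  # a contributor adds one word
print(render_dic(entries), end="")
```

Keeping the output sorted means that two contributors adding different words produce small, mergeable diffs in version control.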

slide-18
SLIDE 18

Spell Checker: example

slide-19
SLIDE 19

Thesaurus

  • One file in text format.
  • A generated index.
  • Human readable.
  • Good for collaborative editing.
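
The "generated index" can be sketched in a few lines of Python. This is an illustration under assumptions, not the packaging tool itself: it assumes a thesaurus data file whose first line names the encoding, followed by entries of the form `word|N` and then N meaning lines, and an index that lists the encoding, the entry count, and one `word|byte_offset` line per entry in sorted order. The sample entries are invented.

```python
def build_index(dat_bytes):
    """Build index text (encoding, count, word|offset lines) for thesaurus data."""
    lines = dat_bytes.split(b"\n")
    encoding = lines[0].decode("ascii").strip()
    offset = len(lines[0]) + 1  # bytes consumed so far, newline included
    index = []
    i = 1
    while i < len(lines) and lines[i]:
        # Entry line: "word|number_of_meaning_lines"
        word, count = lines[i].decode(encoding).rsplit("|", 1)
        index.append((word, offset))
        # Advance past the entry line and its meaning lines.
        block = lines[i:i + 1 + int(count)]
        offset += sum(len(line) + 1 for line in block)
        i += len(block)
    index.sort()
    rows = [encoding, str(len(index))] + [f"{w}|{o}" for w, o in index]
    return "\n".join(rows) + "\n"


# Hypothetical sample data, for illustration only.
sample = (b"UTF-8\n"
          b"simple|1\n(adj)|elementary|uncomplicated\n"
          b"big|2\n(adj)|large|huge\n(adj)|important\n")
print(build_index(sample), end="")
```

Because the index is derived entirely from the data file, only the data file needs to be edited by hand; the index can be regenerated at packaging time.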

slide-20
SLIDE 20

Thesaurus: example

slide-21
SLIDE 21

Hyphenation

  • One text file.
  • Format: less readable than Perl!

  • Changes very rarely.
  • Fix bugs upstream, in TeX.
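
To show why the pattern format is hard to read, here is a minimal Python sketch of how TeX-style (Liang) hyphenation patterns are applied to a word. Digits between letters are priorities: at each gap the highest digit from any matching pattern wins, and an odd value allows a break. The two patterns used below are toy patterns invented for this example, not taken from any real language file, and the sketch ignores the extra features of real pattern files.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as '1na' into its letters and per-gap weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # digit applies to the gap before the next letter
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert hyphens where the strongest matching pattern weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] is the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for offset, weight in enumerate(weights):
                    scores[start + offset] = max(scores[start + offset], weight)
    pieces, last = [], 0
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:  # gap before word[k]
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("banana", ["1na"]))          # ba-na-na
print(hyphenate("banana", ["1na", "2nan"]))  # bana-na
```

The second call shows how an even weight from a longer pattern suppresses a break that a shorter pattern allowed, which is why a single changed pattern can affect many words and why fixes are best made upstream.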
slide-22
SLIDE 22

Hyphenation: example

slide-23
SLIDE 23

Grammar checker

  • LanguageTool: rules in XML.
  • Fix upstream, in LanguageTool.

  • Collaboration possible.
slide-24
SLIDE 24

Grammar checker: example

slide-25
SLIDE 25

Packaging

  • Generation of the OXT extension is scriptable.

  • Post-commit hook possible.
  • Keep generated OXT files in the same repository.
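
Since an OXT is a ZIP archive of XML and data files, the packaging step can be sketched with Python's standard `zipfile` module. This is a minimal sketch under assumptions: the list of required files is assumed for illustration, the file names `it_IT.dic` and `it_IT.aff` are hypothetical, and the XML contents below are placeholders rather than valid extension metadata.

```python
import io
import zipfile

# Assumed minimum set of metadata files; adjust to the real extension layout.
REQUIRED = ("META-INF/manifest.xml", "description.xml")


def build_oxt(files, out):
    """Write {archive_path: bytes} into a ZIP-based OXT on the file object `out`."""
    missing = [name for name in REQUIRED if name not in files]
    if missing:
        raise ValueError("missing required files: " + ", ".join(missing))
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as oxt:
        for name in sorted(files):  # stable order gives reproducible archives
            oxt.writestr(name, files[name])
    return out


# Placeholder contents only; a real package needs real XML and dictionary data.
package = build_oxt(
    {
        "META-INF/manifest.xml": b"<manifest/>",
        "description.xml": b"<description/>",
        "dictionaries.xcu": b"<xcu/>",
        "it_IT.dic": b"1\nciao\n",
        "it_IT.aff": b"SET UTF-8\n",
    },
    io.BytesIO(),
)
print(zipfile.ZipFile(package).namelist())
```

A function like this could be called from a post-commit hook, writing to a file in the repository instead of an in-memory buffer.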

slide-26
SLIDE 26

Team Structure

  • Collaboration possible in every component.
  • A script to package the extension.
  • A release manager to make stable versions available.

slide-27
SLIDE 27

Andrea Pescetti:

Community Involvement

slide-28
SLIDE 28

Going 2.0

  • Native-lang community: best people to improve N-L tools.
  • Motivated users, interested in improving OpenOffice.
  • Issue: providing efficient infrastructure.

slide-29
SLIDE 29

Web-based interface

  • An idea from OOoCon 2010.
  • Report missing or erroneous words from within OpenOffice.
  • Easy to set up as a web service.
  • Notifications: e-mail to maintainers, suggestions in DB.

slide-30
SLIDE 30

Web-based interface: example

slide-31
SLIDE 31

Expose web services

  • Direct usage of the web application via browser.
  • Access available through web services too.
  • Suitable for applications or macros.

slide-32
SLIDE 32

Web services in OXT

  • Embed a macro in the OXT dictionary package.
  • Right-click on a word:
      • Nominate for inclusion in dictionary
      • Nominate for removal from dictionary
      • Report wrong hyphenation
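
The three right-click actions above map naturally onto a single web service call. The following Python sketch stands in for the embedded macro and only shows what such a request could look like. The endpoint URL, the JSON field names and the action names are all hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the real service URL is not given in the slides.
REPORT_URL = "https://dictionary.example.org/api/reports"

# Assumed action names for the three right-click options.
ACTIONS = {"add", "remove", "hyphenation"}


def build_report(word, language, action, url=REPORT_URL):
    """Build (but do not send) a POST request reporting a word to the service."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    payload = json.dumps(
        {"word": word, "language": language, "action": action}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_report("pescetti", "it", "add")
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
```

Keeping the payload this small makes the same service usable from a macro, a browser form, or any other application.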
slide-33
SLIDE 33

Thesaurus maintenance

  • Vithesaurus: free online tool for collaboratively creating and maintaining a thesaurus.
  • In use (German) at http://www.openthesaurus.de

  • https://github.com/danielnaber
slide-34
SLIDE 34

Handling Duplicates

  • Millions of users can lead to duplicate reports.
  • But it's a plus: use frequency for ranking.
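
Using frequency for ranking can be as simple as counting identical reports. A minimal Python sketch, assuming each report is stored as a `(word, action)` pair; the sample reports are invented:

```python
from collections import Counter


def rank_reports(reports):
    """Return distinct (word, action) reports, most frequently submitted first."""
    return Counter(reports).most_common()


# Hypothetical incoming reports; duplicates are expected and useful.
reports = [
    ("selfie", "add"),
    ("teh", "remove"),
    ("selfie", "add"),
    ("selfie", "add"),
    ("teh", "remove"),
    ("blog", "add"),
]
for (word, action), count in rank_reports(reports):
    print(count, action, word)
```

Maintainers can then review the most requested changes first instead of reading every submission.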

slide-35
SLIDE 35

Handling Wrong Reports

  • Annoying: users make some wrong suggestions and repeat them!
  • The web application supports “motivated blacklisting”: repeated wrong submissions are handled and a message can be shown to the user.
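
One way to picture "motivated blacklisting" is a rejection list that stores a reason with every rejected suggestion, so that a repeated submission gets an explanation instead of reaching the maintainers again. The Python sketch below is an assumption about how this could work, not a description of the actual web application; the class name, messages and sample word are hypothetical.

```python
class SuggestionInbox:
    """Collects suggestions and answers blacklisted ones with the stored reason."""

    def __init__(self):
        self.blacklist = {}  # (word, action) -> reason given by a maintainer
        self.pending = []    # suggestions waiting for review

    def reject(self, word, action, reason):
        """Blacklist a suggestion with a motivation and drop pending copies."""
        self.blacklist[(word, action)] = reason
        self.pending = [p for p in self.pending if p != (word, action)]

    def submit(self, word, action):
        """Record a suggestion, or explain why it was already rejected."""
        reason = self.blacklist.get((word, action))
        if reason is not None:
            return f"Not accepted: {reason}"
        self.pending.append((word, action))
        return "Thanks, your suggestion was recorded."


inbox = SuggestionInbox()
print(inbox.submit("alot", "add"))
inbox.reject("alot", "add", "the correct spelling is 'a lot'")
print(inbox.submit("alot", "add"))
```

Showing the reason to the user is what makes the blacklisting "motivated": the same wrong suggestion stops arriving, and the user learns why.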

slide-36
SLIDE 36

Thanks for attention

Andrea Pescetti

pescetti@apache.org www.openoffice.org

Image credits: Flickr, PLIO Archives.