GATE Applications as Web Services - PowerPoint PPT Presentation



SLIDE 1

GATE Applications as Web Services

Ian Roberts

SLIDE 2

University of Sheffield NLP

Introduction

  • Scenario:
      • Implementing a web service (or other web application) that uses GATE Embedded to process requests.
  • Want to support multiple concurrent requests
  • Long-running process: need to be careful to avoid memory leaks, etc.
  • Example used is a plain HttpServlet
      • Principles apply to other frameworks (Struts, Spring MVC, Metro/CXF, Grails…)

SLIDE 3

Setting up

  • GATE libraries in WEB-INF/lib
      • gate.jar + JARs from lib
  • Usual GATE Embedded requirements:
      • A directory to be "gate.home"
      • Site and user config files
      • Plugins directory
  • Call Gate.init() once (and only once) before using any other GATE APIs

SLIDE 4

Initialisation using a ServletContextListener

  • ServletContextListener is registered in web.xml
  • Called when the application starts up

    <listener>
      <listener-class>gate.web.example.GateInitListener</listener-class>
    </listener>

    public void contextInitialized(ServletContextEvent e) {
      ServletContext ctx = e.getServletContext();

      // use WEB-INF as the GATE home directory
      File gateHome = new File(ctx.getRealPath("/WEB-INF"));
      Gate.setGateHome(gateHome);

      File userConfig = new File(ctx.getRealPath("/WEB-INF/user.xml"));
      Gate.setUserConfigFile(userConfig);

      // site config is gateHome/gate.xml
      // plugins dir is gateHome/plugins
      try {
        Gate.init();
      } catch(GateException ex) {
        // contextInitialized cannot throw checked exceptions
        throw new IllegalStateException("GATE initialisation failed", ex);
      }
    }

SLIDE 5

GATE in a multithreaded environment

  • GATE PRs are not thread-safe
      • Due to design of parameter-passing as JavaBean properties
  • Must ensure that a given PR/Controller instance is only used by one thread at a time
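
To make the JavaBean point concrete, here is a minimal sketch that uses no GATE classes. ToyResource and SharedInstanceDemo are hypothetical names invented for this illustration: the toy class takes its input through a setter, as a PR does, and the demo replays in a single thread one interleaving that two request threads could produce if they shared an instance.

```java
// Toy stand-in for a processing resource; NOT a GATE class.
// Like a PR, it receives its input through a setter rather than
// as an argument to execute().
class ToyResource {
  private String text; // shared mutable "parameter"

  void setText(String text) { this.text = text; }

  // Works on whatever the most recent setText() call stored.
  int execute() { return text.length(); }
}

class SharedInstanceDemo {
  // Replays, in one thread, an interleaving that two request threads
  // could produce when they share a single instance.
  static int[] interleaved() {
    ToyResource shared = new ToyResource();
    shared.setText("short");                 // request A sets its input
    shared.setText("a much longer request"); // request B overwrites it
    int a = shared.execute();                // A now processes B's text
    int b = shared.execute();
    return new int[] { a, b };
  }

  // One instance per request: no shared state, so no interference.
  static int[] separate() {
    ToyResource forA = new ToyResource();
    ToyResource forB = new ToyResource();
    forA.setText("short");
    forB.setText("a much longer request");
    return new int[] { forA.execute(), forB.execute() };
  }
}
```

In interleaved(), request A ends up with the result for B's text; in separate(), each request gets the result for its own text. The following slides are about getting the second behaviour without paying for a new instance on every request.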

SLIDE 6

First attempt: one instance per request

  • Naïve approach: create new PRs for each request

    public void doPost(request, response) {
      ProcessingResource pr = Factory.createResource(...);
      try {
        Document doc = Factory.newDocument(getTextFromRequest(request));
        try {
          // do some stuff
        } finally {
          Factory.deleteResource(doc);
        }
      } finally {
        Factory.deleteResource(pr);
      }
    }

Many levels of nested try/finally: ugly but necessary to make sure we clean up even when errors occur. You will get very used to these…

SLIDE 7

Problems with this approach

  • Guarantees no interference between threads
  • But inefficient, particularly with complex PRs (large gazetteers, etc.)
  • Hidden problem with JAPE:
      • Parsing a JAPE grammar creates and compiles Java classes
      • Once created, classes are never unloaded
      • Even with simple grammars, eventually OutOfMemoryError (PermGen space)

SLIDE 8

Second attempt: using ThreadLocals

  • Store the PR/Controller in a thread local variable

    private ThreadLocal<CorpusController> controller =
        new ThreadLocal<CorpusController>() {
          protected CorpusController initialValue() {
            return loadController();
          }
        };

    private CorpusController loadController() {
      //...
    }

    public void doPost(request, response) {
      CorpusController c = controller.get();
      // do stuff with the controller
    }

SLIDE 9

Better than attempt 1…

  • Only initialise resources once per thread
  • Interacts nicely with typical web server thread pooling
  • But if a thread dies, no way to clean up its controller
      • Possibility of memory leaks
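
The first two points can be checked with a small sketch that uses only the JDK. ThreadLocalDemo is a hypothetical class written for this illustration, and a StringBuilder stands in for an expensive controller: many tasks run on a small fixed thread pool while the sketch counts how often the thread-local value is built.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLocalDemo {
  // Counts how many times the (pretend) expensive resource is built.
  static final AtomicInteger loads = new AtomicInteger();

  // Each thread lazily gets its own value on its first get().
  static final ThreadLocal<StringBuilder> resource =
      ThreadLocal.withInitial(() -> {
        loads.incrementAndGet();
        return new StringBuilder();
      });

  // Runs `tasks` jobs on `threads` pooled threads and returns how many
  // times the resource was built while doing so.
  static int run(int threads, int tasks) {
    loads.set(0);
    ExecutorService workers = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < tasks; i++) {
      workers.execute(() -> { resource.get().append('x'); });
    }
    workers.shutdown();
    try {
      workers.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return loads.get();
  }
}
```

ThreadLocalDemo.run(2, 10) should report no more than two loads for ten tasks, because each pooled thread reuses its own value. Note that the sketch never disposes of those values when the pool shuts down, which is the clean-up gap the last bullet describes.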

SLIDE 10

A solution: object pooling

  • Manage your own pool of Controller instances
  • Take a controller from the pool at the start of a request, return it (in a finally!) at the end
  • Number of instances in the pool determines maximum concurrency level

SLIDE 11

Simple example

    private BlockingQueue<CorpusController> pool;

    public void init() {
      pool = new LinkedBlockingQueue<CorpusController>();
      for(int i = 0; i < POOL_SIZE; i++) {
        pool.add(loadController());
      }
    }

    public void doPost(request, response) {
      CorpusController c = pool.take();
      try {
        // do stuff
      } finally {
        pool.add(c);
      }
    }

    public void destroy() {
      for(CorpusController c : pool) Factory.deleteResource(c);
    }

take() blocks if the pool is empty (and can throw InterruptedException): use poll() if you want to handle an empty pool yourself
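
As a rough sketch of the poll() variant, the generic helper below waits a bounded time for a free instance and lets the caller decide what to do when none is available. SimplePool and withInstance are hypothetical names, not part of GATE, and the sketch assumes the work function never returns null.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical helper, not part of GATE: a tiny fixed-size object pool.
class SimplePool<T> {
  private final BlockingQueue<T> idle = new LinkedBlockingQueue<>();

  SimplePool(Iterable<T> instances) {
    for (T t : instances) idle.add(t);
  }

  // Borrows an instance, runs the work, and always returns the instance.
  // Yields Optional.empty() if nothing became free within timeoutMillis.
  // Assumes `work` returns a non-null result.
  <R> Optional<R> withInstance(long timeoutMillis, Function<T, R> work) {
    T instance;
    try {
      instance = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      return Optional.empty();
    }
    if (instance == null) {
      return Optional.empty(); // pool exhausted: the caller decides
    }
    try {
      return Optional.of(work.apply(instance));
    } finally {
      idle.add(instance); // returned even if the work throws
    }
  }

  // Number of instances currently free.
  int available() { return idle.size(); }
}
```

In a servlet, an empty result could be mapped to an HTTP 503 response rather than leaving the request thread blocked indefinitely; that mapping is a design choice, not something the slides prescribe.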

SLIDE 12

Further reading

  • Spring Framework
      • http://www.springsource.org/
      • Handles application startup and shutdown
      • Configure your business objects and connections between them using XML
      • GATE provides helpers to initialise GATE, load saved applications, etc.
      • Built-in support for object pooling
      • Web application framework (Spring MVC)
      • Used by other frameworks (Grails, CXF, …)

SLIDE 13

Conclusions

  • Only use GATE Resources in one thread at a time
  • Make sure to clean up after yourself, even when things go wrong
      • try/finally
      • Whenever you createResource, be sure to deleteResource