Machine Translation of Medical Text in the KConnect Project

SLIDE 1

Machine Translation of Medical Text in the KConnect Project

Petra Galuščáková, Jan Hajič, Jindřich Libovický, Pavel Pecina, Aleš Tamchyna
Charles University in Prague
Institute of Formal and Applied Linguistics

SLIDE 2

Introduction

  • KConnect is a follow-up project of Khresmoi
  • goals: provide components developed in Khresmoi as commercialized cloud services
  • role of MT: provide cross-lingual search and access to medical documents
      – search queries
      – document summaries

SLIDE 3

Training Data

  • new languages:
      – Swedish, Spanish, Polish, Hungarian
  • in-domain corpora collected and processed
      – UMLS, EMEA, MuchMore, Wikipedia, PatTR, COPPA, MeSH, subtitles, ...

SLIDE 4

Training Data: Statistics

        parallel                      monolingual only
        in-domain   general domain    in-domain   general domain
  cs    21          665               1           93
  de    126         310               4           699
  es    74          1248              2           474
  fr    193         896               2           589
  hu    19          641               1           98
  pl    17          606               1           205
  sv    24          409               21          158
  en    –           –                 6087        2100

Training data sizes; all figures are in millions of words.
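The imbalance between in-domain and general-domain parallel data is easier to see as a ratio. The short sketch below only recomputes figures from the statistics above (parallel data, millions of words); the names `PARALLEL` and `in_domain_share` are illustrative and not part of the project.

```python
# Parallel training data in millions of words, copied from the statistics above:
# language -> (in-domain, general domain)
PARALLEL = {
    "cs": (21, 665),
    "de": (126, 310),
    "es": (74, 1248),
    "fr": (193, 896),
    "hu": (19, 641),
    "pl": (17, 606),
    "sv": (24, 409),
}


def in_domain_share(in_domain, general):
    """Fraction of the parallel data for one language that is in-domain."""
    return in_domain / (in_domain + general)


shares = {lang: in_domain_share(*sizes) for lang, sizes in PARALLEL.items()}

# List languages from the smallest to the largest in-domain share.
for lang, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{lang}: {share:.1%}")
```

For most languages the in-domain part is a small fraction of the parallel data, which is the situation that domain adaptation is meant to address.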

SLIDE 5

Domain Adaptation

  • Data selection
      – divide data into "medical-like" and "general" parts (based on language model perplexity)
  • Model interpolation
      – build separate models (phrase table, language model) for each part
      – use linear interpolation to combine them
          · SRILM (language models)
          · TMCombine (phrase tables)
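The two steps above can be illustrated with a toy example. This is a minimal sketch under loud assumptions: it uses add-one-smoothed unigram models instead of the n-gram models and phrase tables a real system would use, and the threshold and weight (`10.0`, `0.7`) are arbitrary illustration values, not settings from the project. All function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return an add-one-smoothed unigram probability function (toy LM)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words

    def prob(token):
        return (counts[token] + 1) / (total + vocab)

    return prob


def perplexity(prob, sentence):
    """Per-token perplexity of a sentence under the given model."""
    tokens = sentence.split()
    if not tokens:
        return float("inf")
    log_sum = sum(math.log(prob(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def split_by_perplexity(prob, sentences, threshold):
    """Data selection: low perplexity under the in-domain LM = medical-like."""
    medical_like, general = [], []
    for s in sentences:
        target = medical_like if perplexity(prob, s) <= threshold else general
        target.append(s)
    return medical_like, general


def interpolate(p_in, p_gen, lam):
    """Linear interpolation: lam * in-domain + (1 - lam) * general."""
    def prob(token):
        return lam * p_in(token) + (1 - lam) * p_gen(token)
    return prob


# Toy data (invented for illustration only).
p_in = train_unigram(["the patient has a fever", "the patient needs insulin"])
p_gen = train_unigram(["the match ended in a draw", "the team won the cup"])

pool = ["the patient has diabetes", "the match ended in a draw"]
medical_like, general = split_by_perplexity(p_in, pool, threshold=10.0)

p_mix = interpolate(p_in, p_gen, lam=0.7)
```

In practice the interpolation weight would be tuned on held-out in-domain data rather than fixed by hand; the sketch only shows the shape of the computation.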
SLIDE 6

MT as a Web Service

  • MTMonkey
  • developed within Khresmoi, now actively extended and maintained

  • runs in a cluster of 20 servers
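A client of such a web service would typically send a JSON request over HTTP. The sketch below only builds the request without sending it. The endpoint URL is hypothetical, and the field names (`action`, `sourceLang`, `targetLang`, `text`) are assumptions about the request format that should be checked against the MTMonkey documentation before use.

```python
import json
import urllib.request

# Hypothetical endpoint; a real deployment would publish its own URL.
SERVICE_URL = "http://localhost:8080/translate"


def build_translate_request(text, source_lang, target_lang, url=SERVICE_URL):
    """Prepare (but do not send) a JSON POST request for one translation.

    The payload field names are assumed, not taken from the slides.
    """
    payload = {
        "action": "translate",
        "sourceLang": source_lang,
        "targetLang": target_lang,
        "text": text,
    }
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_translate_request("chronic kidney disease", "en", "cs")
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` and decoding the JSON response; the response layout is not described in the slides, so it is left out here.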
SLIDE 7

Training Toolkit

  • Eman Lite
  • fully automated MT system training
  • command-line application implemented
  • goal: web-based interface, tight integration with MTMonkey

SLIDE 8

Thank you!

Questions?