NyanCAT MT Project Riddhi Desai, Noelle Hollister, Maya Katzir, - - PowerPoint PPT Presentation



SLIDE 1

NyanCAT MT Project

Riddhi Desai, Noelle Hollister, Maya Katzir, Shaun Kelly, Shiori Misawa

SLIDE 2
SLIDE 3

Pilot Project: The Basics

Objective: To establish a basis for estimating the work involved in training an MT system.

Topic: Automobile company financial reports, JA->EN

Goals:

  • Timing: PEMT 25% faster than HT
  • Pricing: PEMT 25% cheaper than HT
  • Quality: PEMT must score under 10 on a modified QA model based on the LISA model

Final Goal: To determine whether our pilot meets the requirements for a full-scale MT project.

SLIDE 4

Pilot Project: Timeline

SLIDE 5

Pilot Project: Details

  • Challenges in preparing aligned data

Financial statements:

○ were only available as PDF files
○ contained embedded tables
○ contained many numbers

Toyota financial statements: http://www.toyota-global.com/investors/financial_result/

SLIDE 6

Pilot Project: Details

PDF converted with Adobe Acrobat

CAT tool :

SLIDE 7

Pilot Project: Details

2-column bilingual spreadsheet :

EN JA

SLIDE 8

Changes to Procedures

  • Testing data

○ Increased the testing data from 872 segments to 1596 segments. => Recruited a volunteer

  • Training data

○ Converted all the PDFs into Word docs. => Used an online PDF converter (http://pdf2docx.com/)
○ Increased the scope of data.

  • Post-editing and QA

=> To be covered later

SLIDE 9

Quality Evaluation

  • Original Plan - 3 QA Evals

○ First (successful) training, middle, and last
○ First attempt (BLEU: 27.43) → MT system improvements needed
■ Gave us an idea of what issues we needed to address

  • Later added financial statements (FS) for U.S. car companies (Ford, GM)
  • Revised Plan - 2 QA Evals

○ Middle, and second-to-last (last system had a lower BLEU score)
○ After System #4 (BLEU: 37.25) and System #12 (BLEU: 37.45)

  • Time Constraints

○ Original plan: Post-edit 1500 words
○ Revised plan: Post-edit 1500 words OR 30 minutes, whichever came first

SLIDE 10

Quality Evaluation

Error Types

  • Mistranslation
  • Addition/Omission
  • Over-translation/Under-translation
  • Source text untranslated
  • Spelling
  • Grammar
  • Inconsistency
  • Inconsistent use of terminology

Error Severity Levels

  • Minor (1 point)
  • Major (5 points)
  • Critical (10 points)
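A minimal sketch of how a score could be tallied from these error types and severity levels. It assumes the QA score is the plain sum of severity points over all logged errors, with a pass mark of "under 10" taken from the pilot goals; the deck does not say whether the modified model normalizes by word count. The function names and the sample annotations are hypothetical, not the team's actual error log.

```python
# Severity weights as listed on the slide.
SEVERITY_POINTS = {"minor": 1, "major": 5, "critical": 10}

# Error categories as listed on the slide.
ERROR_TYPES = {
    "Mistranslation",
    "Addition/Omission",
    "Over-translation/Under-translation",
    "Source text untranslated",
    "Spelling",
    "Grammar",
    "Inconsistency",
    "Inconsistent use of terminology",
}

PASS_THRESHOLD = 10  # pilot goal: the score must be under 10


def qa_score(errors):
    """Sum severity points over (error_type, severity) pairs.

    Assumption: an unnormalized sum; the deck does not give the formula.
    """
    total = 0
    for error_type, severity in errors:
        if error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {error_type}")
        total += SEVERITY_POINTS[severity]
    return total


def passes(score, threshold=PASS_THRESHOLD):
    """A sample passes when its score is strictly under the threshold."""
    return score < threshold


# Hypothetical annotations for illustration only.
sample = [
    ("Mistranslation", "major"),
    ("Over-translation/Under-translation", "minor"),
    ("Inconsistent use of terminology", "minor"),
]

score = qa_score(sample)
print(score, "passed" if passes(score) else "failed")
```

Under this reading, a score of 14 fails and a score of 4 passes.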

SLIDE 11

Quality Results - System 4

  • QA Score: 14 → Failed
  • Post-editor #2 did 1500 words (errors mostly due to speed)
  • Biggest issues were Mistranslation and Over-Translation
SLIDE 12

Quality Results - System 12

  • QA Score: 4 → Passed
  • Both post-editors performed 30 minutes of post-editing work
  • Issues that remain: Mistranslation, Over-Translation, Under-Translation
SLIDE 13

QA Summary/Caveats

  • The system passed.
  • Post-editing and QA are both extremely time-consuming
  • Did not account for (assuming meaning was uncompromised):

○ Grammar
○ Spelling

  • Subject matter experts (SMEs) would definitely be required
SLIDE 14

Estimated Costs

Task                   Time (hours)   Quantity   Rate (hourly)   Subtotal
Document alignment     10.5           1          $35.00          $367.50
Round of MT training   0.5            10         $35.00          $175.00
Post-editing           0.5            3          $35.00          $52.50
QA                     0.5            3          $35.00          $52.50
Total                                                            $647.50

Machine translation time and costs will be compared with human translation at $0.25 per word and with the industry-standard translation speed of 2500 words per day (313 words per hour).

SLIDE 15

Actual Costs

Task                          Time (hours)                         Quantity   Rate (hourly)   Subtotal
Document alignment            25 (including volunteer x 2 hours)   1          $35.00          $805.00
Volunteer & Employee Morale   -                                    1          $0.00           $29.36
Round of MT training          1.25                                 13         $35.00          $568.75
Post-editing                  0.5                                  2          $35.00          $35.00
QA                            2                                    2          $35.00          $140.00
Total                                                                                         $1,578.11
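The subtotals on the two cost slides can be reproduced with a short tally. This is a sketch under one assumption that the deck does not state outright: the $805 document-alignment figure corresponds to 23 billable hours, that is, the 25 hours minus the 2 unpaid volunteer hours. The morale line is treated as a flat expense. The helper name `subtotal` is hypothetical.

```python
from decimal import Decimal

RATE = Decimal("35.00")  # hourly rate used on both cost slides


def subtotal(hours, quantity, rate=RATE):
    """Cost of one line item: hours per unit x number of units x hourly rate."""
    return Decimal(hours) * quantity * rate


# Estimated costs, as listed on the Estimated Costs slide.
estimated = [
    subtotal("10.5", 1),   # document alignment
    subtotal("0.5", 10),   # rounds of MT training
    subtotal("0.5", 3),    # post-editing
    subtotal("0.5", 3),    # QA
]

# Actual costs, as listed on the Actual Costs slide.
actual = [
    # Assumption: 25 hours minus 2 unpaid volunteer hours = 23 billable hours,
    # which is what reproduces the $805 subtotal.
    subtotal("23", 1),
    Decimal("29.36"),      # volunteer & employee morale, flat expense
    subtotal("1.25", 13),  # rounds of MT training
    subtotal("0.5", 2),    # post-editing
    subtotal("2", 2),      # QA
]

estimated_total = sum(estimated)
actual_total = sum(actual)
overrun = actual_total / estimated_total

print(f"Estimated: ${estimated_total:.2f}")
print(f"Actual:    ${actual_total:.2f}")
print(f"Actual / estimated: {overrun:.2f}x")
```

With these inputs the totals match the slides, and the actual cost comes out at roughly 2.4 times the estimate.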

SLIDE 16

Managing Morale

= managing expectations

SLIDE 17

Costs: a Cautionary Tale

  • File preparation takes time!
  • “Morale”
  • Post-editing estimate was accurate, but QA took more time
SLIDE 18

Results: BLEU Score

Note: A BLEU score of 0 indicates a failed test (reported as N/A in Microsoft Translator Hub).

SLIDE 19

Data Added Per Training

SLIDE 20

Effects of Data on BLEU Score

SLIDE 21

Cost and Time Results

Estimated Cost

  • Human Translation: 14,300 x $0.25/word = $3,575 (assuming a Subject Matter Expert translator)
  • Post-Edited Machine Translation: 14,300 x $0.15/word = $2,145 + $1,578.11 training costs = $3,723.11 (assuming a Subject Matter Expert post-editor)
  • Result: MT is 4% more expensive than Human Translation

Estimated Time Required

  • Human Translation: 14,300 words / 313 words per hour = 46 hours (6 working days)
  • Post-Edited Machine Translation: 14,300 words / 623 words per hour = 23 hours (3 working days)
  • Result: MT is 50% faster than Human Translation
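The comparison can be recomputed from the figures on this slide and checked against the pilot's 25% goals. This is a sketch: the 8-hour working day is an assumption (it is what 2500 words per day at about 313 words per hour implies), and the variable names are illustrative.

```python
import math

WORDS = 14_300
HT_RATE = 0.25           # $/word, human translation
PE_RATE = 0.15           # $/word, post-editing
TRAINING_COST = 1578.11  # actual MT training costs
HT_SPEED = 313           # words per hour, human translation
PEMT_SPEED = 623         # words per hour, post-edited MT
HOURS_PER_DAY = 8        # assumption: 2500 words/day is about 313 words/hour
GOAL = 0.25              # pilot goals: 25% cheaper and 25% faster

# Cost: post-editing fee plus training costs versus the human translation fee.
ht_cost = WORDS * HT_RATE
pemt_cost = WORDS * PE_RATE + TRAINING_COST
cost_change = (pemt_cost - ht_cost) / ht_cost  # positive means more expensive

# Time: hours needed at each throughput, rounded up to whole working days.
ht_hours = WORDS / HT_SPEED
pemt_hours = WORDS / PEMT_SPEED
ht_days = math.ceil(ht_hours / HOURS_PER_DAY)
pemt_days = math.ceil(pemt_hours / HOURS_PER_DAY)
time_saving = (ht_hours - pemt_hours) / ht_hours

meets_pricing_goal = cost_change <= -GOAL
meets_timing_goal = time_saving >= GOAL

print(f"Cost change vs HT: {cost_change:+.1%} (goal met: {meets_pricing_goal})")
print(f"Time saving vs HT: {time_saving:.1%} (goal met: {meets_timing_goal})")
print(f"Working days: HT {ht_days}, PEMT {pemt_days}")
```

On these numbers the timing goal is met and the pricing goal is not, which matches the results stated on the slide.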

SLIDE 22

Recommendations

  • System requires further training to:

○ Use proper terminology
○ Be compliant with US GAAP.

  • However, we do not recommend further training

http://marketbusinessnews.com/wp-content/uploads/2014/08/GAAP.jpg

SLIDE 23

Recommendations

  • More cost-effective solution: Creating a bilingual template for the financial statement portion of annual and quarterly reports
  • Pay for a professional translator to translate only the Management’s Discussion and Analysis (MD&A) portions of the reports.