NyanCAT MT Project
Riddhi Desai, Noelle Hollister, Maya Katzir, Shaun Kelly, Shiori Misawa
Pilot Project: The Basics
Objective: To establish a basis for estimating the work involved in training an MT system.
Topic: Automobile company financial reports, JA->EN
Goals:
based off the LISA model.
Final Goal: To determine if our pilot meets the requirements for a full-scale MT project.
Financial statements:
■ were only available as PDF files
■ contained embedded tables
■ contained many numbers
http://www.toyota-global.com/investors/financial_result/
Toyota financial statements
PDF converted with Adobe Acrobat
CAT tool :
2-column bilingual spreadsheet :
EN JA
○ Increased the testing data from 872 segments to 1596 segments. => Recruited a volunteer
○ Converted all the PDFs into Word docs. => Used an online PDF converter
○ Increased the scope of data. => To be covered later
http://pdf2docx.com/
○ First (successful) training, middle, and last
○ First attempt (BLEU: 27.43) → MT system improvements needed
■ Gave us an idea of what issues we needed to address
○ Middle, and second-to-last (last system had a lower BLEU score)
○ After System #4 (BLEU: 37.25) and System #12 (BLEU: 37.45)
○ Original plan: Post-edit 1500 words
○ Revised plan: Post-edit 1500 words OR 30 minutes, whichever came first
Error Types:
○ Mistranslation
○ Addition/Omission
○ Over-translation/Under-translation
○ Source text untranslated
○ Spelling
○ Grammar
○ Inconsistency
○ Inconsistent use of terminology
Error Severity Levels:
○ Minor (1 point)
○ Major (5 points)
○ Critical (10 points)
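The severity weights above lend themselves to a simple penalty total per post-edited sample. The sketch below is only an illustration of that idea: the slides do not say how the team aggregated points, so the function name `penalty_score` and the sample annotations are hypothetical.

```python
# Severity weights as listed on the slide.
SEVERITY_POINTS = {"minor": 1, "major": 5, "critical": 10}

# Error types as listed on the slide.
ERROR_TYPES = {
    "Mistranslation",
    "Addition/Omission",
    "Over-translation/Under-translation",
    "Source text untranslated",
    "Spelling",
    "Grammar",
    "Inconsistency",
    "Inconsistent use of terminology",
}


def penalty_score(errors):
    """Sum severity points over (error_type, severity) pairs.

    Assumption: the score is a plain sum of points; the slides do not
    state the aggregation rule the team actually used.
    """
    total = 0
    for error_type, severity in errors:
        if error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {error_type}")
        total += SEVERITY_POINTS[severity]
    return total


# Hypothetical annotations for one sample, not data from the project.
sample_errors = [
    ("Mistranslation", "major"),
    ("Spelling", "minor"),
    ("Source text untranslated", "critical"),
]
print(penalty_score(sample_errors))  # 5 + 1 + 10 = 16
```

Summing points keeps a single critical error as costly as ten minor ones, which matches the 1/5/10 weighting on the slide.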
○ Grammar
○ Spelling
Task                  | Time (hours) | Quantity | Rate (hourly) | Subtotal
Document alignment    | 10.5         | 1        | $35.00        | $367.50
Round of MT training  | 0.5          | 10       | $35.00        | $175.00
Post-editing          | 0.5          | 3        | $35.00        | $52.50
QA                    | 0.5          | 3        | $35.00        | $52.50
Total: $647.50
Machine translation time and costs will be compared with human translation at $0.25 per word and with the industry standard translation speed of 2,500 words per day (313 words per hour).
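Each subtotal in the table above is hours x quantity x hourly rate. The following sketch reproduces the arithmetic from the slide's figures. The 8-hour working day is an assumption, inferred from the slide equating 2,500 words per day with 313 words per hour; the variable names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE = Decimal("35.00")  # hourly rate from the slide

# (task, hours per unit, quantity) as given in the table.
estimated_tasks = [
    ("Document alignment", Decimal("10.5"), 1),
    ("Round of MT training", Decimal("0.5"), 10),
    ("Post-editing", Decimal("0.5"), 3),
    ("QA", Decimal("0.5"), 3),
]


def subtotal(hours, quantity, rate=RATE):
    """Cost of one line item: hours x quantity x hourly rate."""
    return hours * quantity * rate


estimated_total = sum(subtotal(h, q) for _, h, q in estimated_tasks)

# Assumption: an 8-hour working day, which is what makes
# 2,500 words/day round to the slide's 313 words/hour.
HOURS_PER_DAY = Decimal("8")
words_per_hour = (Decimal("2500") / HOURS_PER_DAY).quantize(
    Decimal("1"), rounding=ROUND_HALF_UP
)

for name, h, q in estimated_tasks:
    print(f"{name}: ${subtotal(h, q):.2f}")
print(f"Total: ${estimated_total:.2f}")
print(f"Human translation speed: {words_per_hour} words/hour")
```

`Decimal` is used instead of floats so the dollar amounts match the slide to the cent.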
Task                         | Time (hours)                          | Quantity | Rate (hourly) | Subtotal
Document alignment           | 25 (including volunteer x 2 hours)    | 1        | $35.00        | $805.00
Volunteer & Employee Morale  |                                       | 1        | $0.00         | $29.36
Round of MT training         | 1.25                                  | 13       | $35.00        | $568.75
Post-editing                 | 0.5                                   | 2        | $35.00        | $35.00
QA                           | 2                                     | 2        | $35.00        | $140.00
Total: $1578.11
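The figures in this table can be checked the same way. The sketch below assumes the $805 alignment subtotal reflects 23 paid hours (25 hours minus the 2 volunteer hours), since that is the reading that reproduces the slide's number, and it treats the morale item as a flat expense. It also compares the result with the $647.50 total from the earlier table, assuming that table was the initial estimate.

```python
from decimal import Decimal

RATE = Decimal("35.00")  # hourly rate from the slide

# Assumption: the 2 volunteer hours were unpaid, leaving 23 billable hours.
PAID_ALIGNMENT_HOURS = Decimal("25") - Decimal("2")

actual_subtotals = {
    "Document alignment": PAID_ALIGNMENT_HOURS * 1 * RATE,
    # Flat expense as listed on the slide (not hours x rate).
    "Volunteer & Employee Morale": Decimal("29.36"),
    "Round of MT training": Decimal("1.25") * 13 * RATE,
    "Post-editing": Decimal("0.5") * 2 * RATE,
    "QA": Decimal("2") * 2 * RATE,
}

actual_total = sum(actual_subtotals.values())

# Total from the earlier table, assumed here to be the initial estimate.
ESTIMATED_TOTAL = Decimal("647.50")
overrun_ratio = actual_total / ESTIMATED_TOTAL

print(f"Actual total: ${actual_total:.2f}")
print(f"Actual / estimated: {overrun_ratio:.2f}x")
```

Under these assumptions, most of the gap comes from document alignment and from the longer, more numerous MT training rounds.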
= managing expectations
Note: A BLEU score of 0 means a failed test; it is reported as N/A in Microsoft Translator Hub.
Estimated Cost
○ Human Translation: 14,300 x $0.25/word = $3,575 (*assuming a Subject Matter Expert translator)
○ Post-Edited Machine Translation: 14,300 x $0.15/word = $2,145 + $1,578.11 training costs = $3,723.11 (*assuming a Subject Matter Expert post-editor)
○ Result: MT is 4% more expensive than human translation
Estimated Time Required
○ Human Translation: 14,300 words / 313 words per hour = 46 hours (6 working days)
○ Post-Edited Machine Translation: 14,300 words / 623 words per hour = 23 hours (3 working days)
○ Result: MT is 50% faster than human translation
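The comparison above can be reproduced from the slide's figures, and the same numbers give a rough break-even volume. This is a sketch under two assumptions that the slides do not state: the training cost is a one-time fixed cost, and the per-word rates stay constant at any volume. The break-even figure is derived here and is not a result reported by the team.

```python
from decimal import Decimal

# Figures taken from the slide.
WORDS = Decimal("14300")
HT_RATE = Decimal("0.25")          # human translation, $/word
PE_RATE = Decimal("0.15")          # post-editing, $/word
TRAINING_COST = Decimal("1578.11")
HT_SPEED = Decimal("313")          # words/hour, human translation
PE_SPEED = Decimal("623")          # words/hour, post-editing

ht_cost = WORDS * HT_RATE
pemt_cost = WORDS * PE_RATE + TRAINING_COST
cost_increase = (pemt_cost - ht_cost) / ht_cost

ht_hours = WORDS / HT_SPEED
pemt_hours = WORDS / PE_SPEED
time_saving = 1 - pemt_hours / ht_hours

# Assumption: training is a fixed cost, so post-edited MT becomes
# cheaper once the per-word saving has paid it off.
break_even_words = TRAINING_COST / (HT_RATE - PE_RATE)

print(f"HT cost: ${ht_cost:.2f}, PEMT cost: ${pemt_cost:.2f}")
print(f"PEMT cost increase: {cost_increase:.1%}")
print(f"HT hours: {ht_hours:.0f}, PEMT hours: {pemt_hours:.0f}")
print(f"Time saving: {time_saving:.0%}")
print(f"Break-even volume: {break_even_words:.0f} words")
```

On these assumptions the break-even point sits slightly above the 14,300-word pilot volume, which is consistent with MT coming out marginally more expensive at this scale.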
training to:
○ Use proper terminology
○ Be compliant with US GAAP.
recommend further training
http://marketbusinessnews.com/wp-content/uploads/2014/08/GAAP.jpg
Creating a bilingual template for the financial statement portion of annual and quarterly reports
translator to translate only the Management’s Discussion and Analysis (MD&A) portions of the reports.