
NyanCAT MT Project: Riddhi Desai, Noelle Hollister, Maya Katzir, Shaun Kelly, Shiori Misawa (PowerPoint presentation)



  1. NyanCAT MT Project Riddhi Desai, Noelle Hollister, Maya Katzir, Shaun Kelly, Shiori Misawa

  2. Pilot Project: The Basics
     Objective: To establish a basis for estimating the work involved in training an MT system.
     Topic: Automobile company financial reports, JA->EN
     Goals:
     ● Timing: post-edited machine translation (PEMT) 25% faster than human translation (HT)
     ● Pricing: PEMT 25% cheaper than HT
     ● Quality: PEMT must score under 10 on a modified QA model based on the LISA QA model
     Final goal: To determine whether our pilot meets the requirements for a full-scale MT project.
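
The three goals amount to a simple pass/fail check. The sketch below is hypothetical (the class and function names are illustrative, not part of the project's tooling) and assumes that "25% faster" and "25% cheaper" mean PEMT may use at most 75% of the human-translation time and cost. Plugging in the figures reported on slides 11 and 20 of this deck shows timing and quality passing while pricing fails.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    ht_hours: float      # human translation time for the job
    pemt_hours: float    # post-edited MT time for the same job
    ht_cost: float       # human translation cost (USD)
    pemt_cost: float     # PEMT cost including training (USD)
    qa_score: float      # score from the modified LISA-based QA model


def check_goals(r: PilotResult, margin: float = 0.25, qa_limit: float = 10) -> dict:
    """Return pass/fail for each pilot goal.

    Assumption: "25% faster/cheaper" is read as PEMT needing at most
    (1 - margin) of the human-translation time/cost.
    """
    return {
        "timing": r.pemt_hours <= (1 - margin) * r.ht_hours,
        "pricing": r.pemt_cost <= (1 - margin) * r.ht_cost,
        "quality": r.qa_score < qa_limit,
    }


# Figures reported on slides 11 and 20 of this deck.
result = PilotResult(ht_hours=46, pemt_hours=23, ht_cost=3575.00,
                     pemt_cost=3723.11, qa_score=4)
print(check_goals(result))  # {'timing': True, 'pricing': False, 'quality': True}
```

Keeping the margin and QA limit as parameters makes it easy to rerun the check if the goals are renegotiated for a full-scale project.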

  3. Pilot Project: Timeline

  4. Pilot Project: Details
     ● Challenges in preparing aligned data from financial statements
       ■ Toyota's financial statements:
         ■ were only available as PDF files
         ■ contained embedded tables
         ■ contained many numbers
       http://www.toyota-global.com/investors/financial_result/

  5. Pilot Project: Details
     CAT tool: PDF converted with Adobe Acrobat

  6. Pilot Project: Details
     2-column bilingual spreadsheet: JA | EN
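
One way such a sheet could be turned into training input is to export it as CSV and split it into two line-aligned plain-text files, a common parallel-corpus layout. This is a hypothetical sketch: the CSV export, the "JA,EN" header, the function names, and the output file names are assumptions, and the deck does not say exactly which file format was uploaded to the MT system.

```python
import csv
import io


def split_bilingual_sheet(csv_text: str) -> tuple[list[str], list[str]]:
    """Split a 2-column JA/EN sheet (exported as CSV with a "JA,EN" header)
    into two aligned segment lists.

    Rows where either cell is empty are dropped so that item i of the
    Japanese list always pairs with item i of the English list.
    """
    ja_segments, en_segments = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Collapse internal whitespace/newlines so one segment = one line.
        ja = " ".join((row.get("JA") or "").split())
        en = " ".join((row.get("EN") or "").split())
        if ja and en:
            ja_segments.append(ja)
            en_segments.append(en)
    return ja_segments, en_segments


def write_parallel_files(ja_segments, en_segments,
                         ja_path="train.ja.txt", en_path="train.en.txt"):
    # Hypothetical file names; one segment per line, UTF-8 encoded.
    for path, segments in ((ja_path, ja_segments), (en_path, en_segments)):
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(segments) + "\n")


sample = "JA,EN\n売上高,Net sales\n営業利益,\n当期純利益,Net income\n"
ja, en = split_bilingual_sheet(sample)
print(ja, en)  # the untranslated 営業利益 row is dropped
```

Dropping half-empty rows, rather than padding them, is what keeps the two files aligned line by line; a misaligned corpus silently teaches the system wrong pairings.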

  7. Changes to Procedures
     ● Testing data
       ○ Increased the testing data from 872 segments to 1596 segments => recruited a volunteer
     ● Training data
       ○ Converted all the PDFs into Word documents => used an online PDF converter (http://pdf2docx.com/)
       ○ Increased the scope of the data
     ● Post-editing and QA => to be covered later

  8. Quality Evaluation
     ● Original plan: 3 QA evaluations
       ○ First (successful) training, middle, and last
       ○ First attempt (BLEU: 27.43) → MT system improvements needed
         ■ Gave us an idea of which issues we needed to address
     ● Later added financial statements (FS) for U.S. car companies (Ford, GM)
     ● Revised plan: 2 QA evaluations
       ○ Middle and second-to-last systems (the last system had a lower BLEU score)
       ○ After System #4 (BLEU: 37.25) and System #12 (BLEU: 37.45)
     ● Time constraints
       ○ Original plan: post-edit 1500 words
       ○ Revised plan: post-edit 1500 words or 30 minutes, whichever came first
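
BLEU, the automatic metric quoted for each system, measures n-gram overlap between MT output and a reference translation. The sketch below is a minimal, unsmoothed corpus-level BLEU (single reference, n-grams up to 4, brevity penalty, scaled to 0-100) for illustration only. It assumes whitespace tokenization and will not reproduce Microsoft Translator Hub's exact scores, since the deck does not describe that system's tokenization or implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU on a 0-100 scale, one reference per segment."""
    clipped = [0] * max_n   # matched n-grams, clipped by reference counts
    totals = [0] * max_n    # n-grams produced by the MT output
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            clipped[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0  # any zero n-gram precision makes unsmoothed BLEU zero
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
```

Because the score is computed over the whole test set rather than averaged per segment, enlarging the test set (as on slide 7) changes which segments dominate the n-gram counts.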

  9. Quality Evaluation
     Error types:
     ● Mistranslation
     ● Addition/Omission
     ● Over-translation/Under-translation
     ● Source text untranslated
     ● Spelling
     ● Grammar
     ● Inconsistency (inconsistent use of terminology)
     Error severity levels:
     ● Minor (1 point)
     ● Major (5 points)
     ● Critical (10 points)
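
A reviewer's error log can be turned into a QA score by summing severity points. This is a hypothetical sketch: the deck does not say whether scores were normalized by word count, so it assumes a raw sum, and the example log is invented. The optional `ignored_types` parameter (an illustrative name) mirrors the caveat on slide 12 that grammar and spelling were not counted when meaning was uncompromised.

```python
SEVERITY_POINTS = {"minor": 1, "major": 5, "critical": 10}

ERROR_TYPES = {
    "mistranslation",
    "addition/omission",
    "over-translation/under-translation",
    "source text untranslated",
    "spelling",
    "grammar",
    "inconsistency",  # inconsistent use of terminology
}


def qa_score(errors, ignored_types=frozenset()):
    """Sum severity points over (error_type, severity) pairs.

    Assumption: the score is a raw sum with no per-word normalization.
    Types listed in ignored_types (e.g. {"grammar", "spelling"}) are skipped.
    """
    total = 0
    for error_type, severity in errors:
        if error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {error_type!r}")
        if error_type in ignored_types:
            continue
        total += SEVERITY_POINTS[severity]
    return total


def passes(score, threshold=10):
    # Pilot goal: PEMT must score under 10.
    return score < threshold


# Hypothetical error log for one post-edited sample.
log = [("mistranslation", "major"), ("grammar", "minor"), ("spelling", "minor")]
print(qa_score(log), passes(qa_score(log)))                  # 7 True
print(qa_score(log, ignored_types={"grammar", "spelling"}))  # 5
```

Under a raw sum, a single critical error is enough to fail a sample, which matches the intent of a 10-point critical severity against a threshold of 10.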

  10. Quality Results - System 4
     ● QA score: 14 → Failed
       ○ Post-editor #2 did 1500 words (errors mostly due to speed)
     ● Biggest issues were Mistranslation and Over-translation

  11. Quality Results - System 12
     ● QA score: 4 → Passed
       ○ Both post-editors performed 30 minutes of post-editing work
     ● Remaining issues: Mistranslation, Over-translation, Under-translation

  12. QA Summary/Caveats
     ● The system passed.
       ○ Post-editing and QA are both extremely time-consuming.
     ● Did not account for the following (assuming meaning was uncompromised):
       ○ Grammar
       ○ Spelling
     ● Subject matter experts (SMEs) would definitely be required.

  13. Estimated Costs
     Task                 | Time (hours) | Quantity | Rate (hourly) | Subtotal
     Document alignment   | 10.5         | 1        | $35.00        | $367.50
     Round of MT training | 0.5          | 10       | $35.00        | $175.00
     Post-editing         | 0.5          | 3        | $35.00        | $52.50
     QA                   | 0.5          | 3        | $35.00        | $52.50
     Total: $647.50
     Machine translation time and costs will be compared with human translation at $0.25 per word and with the industry-standard translation speed of 2500 words per day (313 words per hour, assuming an 8-hour day).
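
The estimate is straightforward arithmetic (subtotal = time x quantity x hourly rate), and it can be checked with a few lines of code. This sketch simply re-enters the table's figures; the function names are illustrative, and the 8-hour working day is the assumption implied by 2500 words/day = 313 words/hour.

```python
# (task, hours per unit, quantity, hourly rate in USD) from the estimate table
ESTIMATED_TASKS = [
    ("Document alignment", 10.5, 1, 35.00),
    ("Round of MT training", 0.5, 10, 35.00),
    ("Post-editing", 0.5, 3, 35.00),
    ("QA", 0.5, 3, 35.00),
]


def subtotal(hours, quantity, rate):
    return hours * quantity * rate


def total(tasks):
    return round(sum(subtotal(h, q, r) for _, h, q, r in tasks), 2)


def words_per_hour(words_per_day=2500, hours_per_day=8):
    # Round half up: 2500 / 8 = 312.5 -> 313. Python's built-in round()
    # rounds half to even and would give 312 here.
    return int(words_per_day / hours_per_day + 0.5)


for task, h, q, r in ESTIMATED_TASKS:
    print(f"{task}: ${subtotal(h, q, r):.2f}")
print(f"Total: ${total(ESTIMATED_TASKS):.2f}")  # Total: $647.50
print(words_per_hour())                          # 313
```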

  14. Actual Costs
     Task                        | Time (hours)                  | Quantity | Rate (hourly) | Subtotal
     Document alignment          | 25 (including 2 volunteer hours) | 1     | $35.00        | $805.00
     Volunteer & Employee Morale |                               | 1        | $0.00         | $29.36
     Round of MT training        | 1.25                          | 13       | $35.00        | $568.75
     Post-editing                | 0.5                           | 2        | $35.00        | $35.00
     QA                          | 2                             | 2        | $35.00        | $140.00
     Total: $1578.11
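
The actual-cost total can be reproduced the same way, with two adjustments that the table implies but does not state. The sketch assumes the 2 volunteer hours in document alignment were unpaid (23 paid hours x $35 = $805) and treats the morale line as a flat $29.36 expense rather than an hourly charge. The names are illustrative.

```python
# (task, billed hours per unit, quantity, hourly rate, flat expense) in USD.
# Assumption: the 2 volunteer hours in document alignment were unpaid,
# which is what makes 25 hours come to $805 (23 x $35).
ACTUAL_TASKS = [
    ("Document alignment", 25 - 2, 1, 35.00, 0.00),
    ("Volunteer & employee morale", 0, 1, 0.00, 29.36),  # flat expense
    ("Round of MT training", 1.25, 13, 35.00, 0.00),
    ("Post-editing", 0.5, 2, 35.00, 0.00),
    ("QA", 2, 2, 35.00, 0.00),
]


def line_cost(hours, quantity, rate, flat):
    return hours * quantity * rate + flat


def total(tasks):
    return round(sum(line_cost(*row[1:]) for row in tasks), 2)


for row in ACTUAL_TASKS:
    print(f"{row[0]}: ${line_cost(*row[1:]):.2f}")
print(f"Total: ${total(ACTUAL_TASKS):.2f}")  # Total: $1578.11
```

Keeping billed hours separate from flat expenses makes it clear which lines would scale with a larger project and which would not.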

  15. Managing Morale = managing expectations

  16. Costs: a Cautionary Tale
     ● File preparation takes time!
     ● “Morale”
     ● Post-editing estimate was accurate, but QA took more time (2 hours per evaluation vs. 0.5 hours estimated)

  17. Results: BLEU Score
     Note: A BLEU score of 0 indicates a failed test (reported as N/A in Microsoft Translator Hub).

  18. Data Added Per Training

  19. Effects of Data on BLEU Score

  20. Cost and Time Results
     Estimated cost
     ● Human translation (assuming a subject matter expert translator): 14,300 words x $0.25/word = $3,575
     ● Post-edited machine translation (assuming a subject matter expert post-editor): 14,300 words x $0.15/word = $2,145, + $1,578.11 training costs = $3,723.11
     ● Result: MT is 4% more expensive than human translation
     Estimated time required
     ● Human translation: 14,300 words / 313 words per hour = 46 hours (6 working days)
     ● Post-edited machine translation: 14,300 words / 623 words per hour = 23 hours (3 working days)
     ● Result: MT is 50% faster than human translation
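
The comparison on this slide can be reproduced with a short script. It is a sketch under stated assumptions: an 8-hour working day (consistent with 2500 words/day = 313 words/hour on slide 13), hours rounded to the nearest whole hour, and working days rounded up. The 623 words/hour PEMT speed and the $0.15/word post-editing rate are taken from the slide as given; the function name is illustrative.

```python
import math


def compare(words=14_300, ht_rate=0.25, pemt_rate=0.15, training_cost=1578.11,
            ht_wph=313, pemt_wph=623, hours_per_day=8):
    """Reproduce the slide's cost and time comparison (rates in USD/word)."""
    ht_cost = words * ht_rate
    pemt_cost = words * pemt_rate + training_cost
    # Hours rounded to whole hours; working days rounded up.
    ht_hours = round(words / ht_wph)
    pemt_hours = round(words / pemt_wph)
    return {
        "ht_cost": round(ht_cost, 2),
        "pemt_cost": round(pemt_cost, 2),
        "cost_change_pct": round((pemt_cost / ht_cost - 1) * 100, 1),
        "ht_hours": ht_hours,
        "pemt_hours": pemt_hours,
        "ht_days": math.ceil(ht_hours / hours_per_day),
        "pemt_days": math.ceil(pemt_hours / hours_per_day),
        "time_saved_pct": round((1 - pemt_hours / ht_hours) * 100, 1),
    }


print(compare())
```

With the defaults it gives a cost change of about +4.1% and a time saving of 50% (46 vs. 23 hours, 6 vs. 3 working days), matching the slide's rounded results. Measured against the pilot goals on slide 2, the timing goal is met but the 25% pricing goal is not.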

  21. Recommendations
     ● System requires further training to:
       ○ Use proper terminology
       ○ Be compliant with US GAAP
     ● However, we do not recommend further training.
     http://marketbusinessnews.com/wp-content/uploads/2014/08/GAAP.jpg

  22. Recommendations
     ● More cost-effective solution:
       ○ Create a bilingual template for the financial statement portion of annual and quarterly reports
       ○ Pay a professional translator to translate only the Management's Discussion and Analysis (MD&A) portions of the reports
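
A bilingual template works because financial statement line items repeat from report to report while only the figures change. The sketch below is hypothetical: the template entries, names, and sample rows are illustrative assumptions, and a real template would be built from the company's own past reports and checked by an SME. Labels are looked up in the template, figures are copied unchanged, and any label not in the template is collected for the human translator.

```python
# Hypothetical template entries: fixed Japanese line-item labels mapped to
# pre-approved English wording.
TEMPLATE = {
    "売上高": "Net sales",
    "営業利益": "Operating income",
    "当期純利益": "Net income",
}


def apply_template(rows):
    """rows: list of (ja_label, [figures]). Returns (translated, unmatched).

    Figures are copied unchanged; labels missing from the template are
    collected so they can be sent to a human translator.
    """
    translated, unmatched = [], []
    for label, figures in rows:
        en_label = TEMPLATE.get(label)
        if en_label is None:
            unmatched.append(label)
        else:
            translated.append((en_label, list(figures)))
    return translated, unmatched


rows = [("売上高", ["1,000", "1,200"]), ("特別損失", ["50", "40"])]
print(apply_template(rows))
```

Routing unmatched labels to a person, rather than guessing, keeps the template approach deterministic and leaves judgment calls to the translator who is already handling the MD&A.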
