
Empirical Study of a Two-Step Approach to Estimate Translation Quality



  1. Empirical Study of a Two-Step Approach to Estimate Translation Quality
     J. González-Rubio, J.R. Navarro-Cerdán, F. Casacuberta
     jegonzalez@dsic.upv.es, jonacer@iti.upv.es, fcn@dsic.upv.es
     Pattern Recognition and Human Language Technology Group
     Instituto Tecnológico de Informática
     Universitat Politècnica de València (Spain)
     Work supported by the EU 7th Framework program (FP/2007-2013) under the CasMaCat project (grant no. 287576)
     JGR,JNC,FCN Empirical Study of a Two-Step Approach to Estimate Translation Quality IWSLT'13

  2. Overview
     • Introduction
     • Proposed Two-Step Quality Estimation Approach
     • Experimental Setup
     • Results
     • Conclusions

  3. Introduction

  4. Motivation
     • Quality estimation (QE) is a key element in practical translation systems
     • Usually addressed as a regression problem
       – Predict a quality score from a set of translation features
     • Problem: translation features are ambiguous, noisy, and collinear
     • Chosen solution: a two-step training methodology
     [Diagram: source sentence, translation, and additional sources of information → feature computation → original feature set → dimensionality reduction → reduced feature set → machine learning model → prediction]

  5. Two-Step Quality Estimation Approach

  6. Dimensionality Reduction
     • Based on Partial Least Squares Regression (PLSR)
     • The widely used PCA takes into account only the features
       – Principal components (PCs) contain almost no redundancy...
       – ...but they are not necessarily the best features for prediction
     • In contrast, PLSR does take into account the values to be predicted
       – The new set of Latent Variables (LVs) contains almost no redundancy
       – Additionally, the LVs explain most of the variability in the quality scores
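
As a rough illustration of the contrast drawn on this slide, the sketch below compares PCA scores with latent variables from a basic single-response PLS (the NIPALS PLS1 algorithm). The function names, the synthetic data, and the choice of NIPALS are assumptions made for illustration; the slides do not say which PLSR implementation the authors used.

```python
import numpy as np

def pca_scores(X, k):
    """Project centered X onto its top-k principal components (y is ignored)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def pls1_scores(X, y, k):
    """Extract k latent variables with NIPALS PLS1 (single response)."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    scores = []
    for _ in range(k):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # latent variable (score vector)
        tt = t @ t
        p = Xk.T @ t / tt              # X loading for this latent variable
        Xk = Xk - np.outer(t, p)       # deflate X: later LVs are orthogonal to t
        yk = yk - t * (t @ yk) / tt    # deflate y
        scores.append(t)
    return np.column_stack(scores)

# Synthetic data (an assumption, not the WMT12 features): two nearly identical
# high-variance columns unrelated to y, plus one low-variance column driving y.
rng = np.random.default_rng(0)
n = 2000
noise = 3.0 * rng.normal(size=n)
X = np.column_stack([noise, noise + 0.01 * rng.normal(size=n), rng.normal(size=n)])
y = X[:, 2] + 0.1 * rng.normal(size=n)

T_pca = pca_scores(X, 2)
T_pls = pls1_scores(X, y, 2)
corr_pca = abs(np.corrcoef(T_pca[:, 0], y)[0, 1])
corr_pls = abs(np.corrcoef(T_pls[:, 0], y)[0, 1])
print(f"|corr(first PC, y)| = {corr_pca:.2f}")
print(f"|corr(first LV, y)| = {corr_pls:.2f}")
```

On data like this, the first PC follows the redundant high-variance columns, while the first LV follows the column that actually predicts the score.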

  7. Prediction Model
     • Goal: predict the actual quality scores from the LVs
     • Model: Support Vector Machines for regression (SVR)
     • Good empirical prediction accuracy in a number of tasks
     • Widely used in the QE literature

  8. Experimental Setup

  9. Corpus
     • English-Spanish news texts from the WMT 2012 QE task
     • 1832 translations for training and 422 for test
     • Each translation has a real-valued score between one and five
     • Post-editing effort Likert scale:
       5: The translation requires little editing to be publishable
       4: 10% – 25% of the translation needs to be edited
       3: 25% – 50% of the translation needs to be edited
       2: 50% – 70% of the translation needs to be edited
       1: The translation must be translated from scratch

  10. Feature Sets
     [Bar chart: percentage of clean, collinear, and constant features in each feature set. Number of features per set, in the order shown: DCU-SYMC (308), LORIA (49), SDLLW (15), TCD (43), UEDIN (56), UPV (497), UU (82), WLV-SHEF (147)]
     • Wide variety of experimental conditions
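
The slide does not state how features were assigned to the clean, collinear, and constant groups. One plausible screening, sketched below, treats zero-variance columns as constant and flags a column as collinear when its absolute Pearson correlation with an earlier retained column reaches a threshold. The function name and the 0.99 threshold are hypothetical.

```python
import numpy as np

def screen_features(X, corr_threshold=0.99):
    """Label each column of X as 'constant', 'collinear', or 'clean'."""
    n_features = X.shape[1]
    std = X.std(axis=0)
    labels = ["constant" if std[j] <= 1e-12 else "clean" for j in range(n_features)]
    varying = [j for j in range(n_features) if labels[j] == "clean"]
    if len(varying) > 1:
        corr = np.corrcoef(X[:, varying], rowvar=False)
        kept = []  # positions (within `varying`) of columns retained so far
        for a, j in enumerate(varying):
            if any(abs(corr[a, b]) >= corr_threshold for b in kept):
                labels[j] = "collinear"   # near-duplicate of a retained column
            else:
                kept.append(a)
    return labels

# Toy example: column 1 is a linear function of column 0, column 2 is constant.
rng = np.random.default_rng(0)
x0 = rng.normal(size=100)
x3 = rng.normal(size=100)
X_demo = np.column_stack([x0, 2.0 * x0 + 1.0, np.full(100, 3.0), x3])
demo_labels = screen_features(X_demo)
print(demo_labels)
```

This pairwise check only catches features that duplicate a single other feature; collinearity involving several features at once would need a different test.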

  11. Experimental Methodology
     • Evaluation metric: Root Mean Squared Error (RMSE)
     • Free parameters optimized by 10-fold cross-validation
       – Number of LVs and SVR meta-parameters
     • Eight dev-train folds, one dev-tuning fold, and one dev-test fold
       – Result: averaged prediction accuracy over the separate dev-test folds
     • Final models built with the whole training set using the best parameter values
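
A minimal sketch of the metric and of the fold rotation described above, in plain Python. How the samples were assigned to folds and which fold served as dev-tuning in each round are not given in the slides; the interleaved assignment and the "next fold" rule below are assumptions, as are the function names.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def rotating_splits(n_samples, n_folds=10):
    """Yield (dev_train, dev_tuning, dev_test) index lists, one per round."""
    # Assumption: interleaved fold assignment (sample i goes to fold i % n_folds).
    folds = [list(range(f, n_samples, n_folds)) for f in range(n_folds)]
    for r in range(n_folds):
        test_fold = r
        tuning_fold = (r + 1) % n_folds  # assumption: tuning fold follows test fold
        dev_train = [i for f in range(n_folds)
                     if f not in (test_fold, tuning_fold)
                     for i in folds[f]]
        yield dev_train, folds[tuning_fold], folds[test_fold]

splits = list(rotating_splits(1832))  # 1832 = size of the training partition
for dev_train, dev_tuning, dev_test in splits[:2]:
    print(len(dev_train), len(dev_tuning), len(dev_test))
```

In each round the parameters would be chosen on the dev-tuning fold and the error measured on the dev-test fold; the reported figure is the average of those ten dev-test errors.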

  12. Results

  13. Cross-Validation RMSE Results
     [Bar chart: cross-validation RMSE of Baseline, PCA, and our approach for each feature set (DCU-SYMC, LORIA, SDLLW, TCD, UEDIN, UPV, UU, WLV-SHEF)]
     • Equal or lower prediction error than Baseline and PCA
       – Error reduction correlated with the number of noisy features

  14. Cross-Validation Feature Reduction Ratio
     [Bar chart: percentage of the original features retained by Baseline, PCA, and our approach for each feature set (DCU-SYMC, LORIA, SDLLW, TCD, UEDIN, UPV, UU, WLV-SHEF)]
     • About half as many LVs as PCs
     • Operational time of the QE system greatly reduced

  15. Cross-Validation Learning Curves
     [Two plots: cross-validation RMSE against the number of latent variables for the SDLLW and UPV feature sets, comparing Baseline, PCA, and our approach. Band indicates the 95% confidence interval of prediction accuracy (RMSE)]
     • Larger and faster error reduction for highly redundant sets (left plot)
     • Same accuracy with fewer features for concise sets (right plot)

  16. Test Results
     [Bar chart: test RMSE of Baseline, PCA, and our approach for each feature set (DCU-SYMC, LORIA, SDLLW, TCD, UEDIN, UPV, UU, WLV-SHEF)]
     • Results differ from those of cross-validation. Why?

  17. Analysis of Test Results
     • Hypothesis: training partition did not adequately represent test
     • Studied by a series of Hotelling's two-sample T² tests
       – Multivariate analog of Student's t-test
       – Compares two independently drawn samples
         ∗ E.g., training and test partitions
       – Do they belong to the same population?
     [Diagram: samples drawn from the same population yield p > 0.01; samples drawn from different populations yield p < 0.01]
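
For reference, a two-sample Hotelling T² test can be sketched with NumPy and SciPy as below. It uses the standard pooled-covariance statistic and its F-distribution conversion, and assumes the pooled covariance is invertible (so constant or exactly collinear features would have to be removed first). The function name and the synthetic samples are illustrative; this is not the authors' code.

```python
import numpy as np
from scipy import stats

def hotelling_t2(A, B):
    """Two-sample Hotelling T^2 test. Returns (T^2 statistic, p-value)."""
    n1, p = A.shape
    n2 = B.shape[0]
    diff = A.mean(axis=0) - B.mean(axis=0)
    S1 = np.atleast_2d(np.cov(A, rowvar=False))
    S2 = np.atleast_2d(np.cov(B, rowvar=False))
    pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff))
    # Under the null hypothesis, the rescaled statistic follows F(p, n1 + n2 - p - 1).
    dof2 = n1 + n2 - p - 1
    f_stat = t2 * dof2 / (p * (n1 + n2 - 2))
    p_value = stats.f.sf(f_stat, p, dof2)
    return float(t2), float(p_value)

# Synthetic samples with the partition sizes from the corpus slide.
rng = np.random.default_rng(0)
train = rng.normal(size=(1832, 5))
test_same = rng.normal(size=(422, 5))   # drawn from the same distribution
test_shifted = test_same + 0.5          # mean shifted in every feature

t2_same, p_same = hotelling_t2(train, test_same)
t2_shift, p_shift = hotelling_t2(train, test_shifted)
print(f"same population:    T2 = {t2_same:.2f}, p = {p_same:.3f}")
print(f"shifted population: T2 = {t2_shift:.2f}, p = {p_shift:.3g}")
```

A p-value below the 0.01 level shown on the slide would lead to rejecting the hypothesis that both samples come from the same population.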

  18. Analysis of Test Results II
     • T² tests indicated that training and test were from different populations
       – Main reason: data scarcity (only 1832 training samples)
     • Further analysis of each individual feature:
       – Most had statistically different values in test
       – Between one quarter and three quarters depending on the set
     • In contrast, only about 1% between dev-train and dev-test folds
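
The slides do not name the test applied to each individual feature. As one possible way to obtain such a proportion, the sketch below runs Welch's t-test per feature with `scipy.stats.ttest_ind` and reports the share of features whose means differ at the 0.01 level. The choice of test, the threshold, and the synthetic data are all assumptions.

```python
import numpy as np
from scipy import stats

def fraction_shifted(train, test, alpha=0.01):
    """Share of features whose train/test means differ (Welch's t-test)."""
    result = stats.ttest_ind(train, test, axis=0, equal_var=False)
    flagged = result.pvalue < alpha   # one boolean per feature
    return float(flagged.mean()), flagged

# Synthetic partitions: the first 10 of 20 features are shifted in "test".
rng = np.random.default_rng(0)
train = rng.normal(size=(1832, 20))
test = rng.normal(size=(422, 20))
test[:, :10] += 0.5

fraction, flagged = fraction_shifted(train, test)
print(f"{fraction:.0%} of the features differ between the partitions")
```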

  19. Conclusions

  20. Conclusions
     • Empirical results showed the soundness of the proposed approach
       – Improvements in prediction accuracy
       – Large feature reduction ratios
     • Test results were not as good, due to data scarcity
     • Feature reduction boosts QE scalability and time-efficiency
       – Suitable for scenarios with time constraints
       – Allows the use of thousands of features

  21. Thank you, questions?
