Empirical Study of a Two-Step Approach to Estimate Translation Quality



SLIDE 1

Empirical Study of a Two-Step Approach to Estimate Translation Quality

J. González-Rubio, J.R. Navarro-Cerdán, F. Casacuberta

jegonzalez@dsic.upv.es, jonacer@iti.upv.es, fcn@dsic.upv.es

Pattern Recognition and Human Language Technology Group
Instituto Tecnológico de Informática
Universitat Politècnica de València (Spain)

Work supported by the EU 7th Framework Programme (FP7/2007-2013) under the CasMaCat project (grant no. 287576)

JGR,JNC,FCN Empirical Study of a Two-Step Approach to Estimate Translation Quality IWSLT’13

SLIDE 2

Overview

  • Introduction
  • Proposed Two-Step Quality Estimation Approach
  • Experimental Setup
  • Results
  • Conclusions

SLIDE 3

Introduction

SLIDE 4

Motivation

  • Quality estimation (QE) is a key element in practical translation systems
  • Usually addressed as a regression problem

– Predict a quality score from a set of translation features

  • Problem: translation features are ambiguous, noisy, and collinear
  • Chosen solution: a two-step training methodology

[Diagram: the translation, the source sentence, and additional sources of information feed the feature computation, which yields the original feature set. Dimensionality reduction turns it into a reduced feature set, and a machine learning model produces the prediction.]

SLIDE 5

Two-Step Quality Estimation Approach

SLIDE 6

Dimensionality Reduction

  • Based on Partial Least Squares Regression (PLSR)
  • The widely used Principal Component Analysis (PCA) takes into account only the features

– Principal components (PCs) contain almost no redundancy...
– ...but they are not necessarily the best features for prediction

  • In contrast, PLSR does take into account the values to be predicted

– The new set of Latent Variables (LVs) contains almost no redundancy
– Additionally, the LVs explain most of the variability in the quality scores
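The contrast between PCA and PLSR can be illustrated with a toy example. The sketch below is not the authors' implementation: it computes only the weight vector of the first latent variable in the single-response case (PLS1), and the data and the names `first_pls_weights`, `noisy`, and `useful` are made up for illustration.

```python
import math

def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def variances(columns):
    """Sample variance of each feature column (what PCA looks at)."""
    return [sum(v * v for v in center(c)) / (len(c) - 1) for c in columns]

def first_pls_weights(columns, scores):
    """Weights of the first PLS1 latent variable.

    The weights are proportional to the covariance between each
    centered feature and the centered scores, so the scores drive
    the projection direction.
    """
    y = center(scores)
    cov = [sum(x * t for x, t in zip(center(c), y)) for c in columns]
    norm = math.sqrt(sum(v * v for v in cov))
    return [v / norm for v in cov]

# Hypothetical data: five translations with quality scores 1..5.
quality = [1.0, 2.0, 3.0, 4.0, 5.0]
noisy = [4.0, -2.0, -4.0, -2.0, 4.0]   # high variance, unrelated to quality
useful = [0.5, 1.0, 1.5, 2.0, 2.5]     # low variance, tracks quality

# The two features are uncorrelated here, so the first principal
# component is simply the feature with the larger variance.
print(variances([noisy, useful]))
# The PLS weights ignore variance that does not covary with the scores.
print(first_pls_weights([noisy, useful], quality))
```

In this toy case PCA would rank `noisy` first because of its larger variance, while the PLS weight falls on `useful`. A full PLSR would go on to deflate the data and extract further latent variables.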

SLIDE 7

Prediction Model

  • Goal: predict the actual quality scores from the LVs
  • Model: Support Vector Machines for regression (SVR)
  • Good empirical prediction accuracy in a number of tasks
  • Widely used in the QE literature
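As background on the model choice, standard SVR training uses an epsilon-insensitive loss: prediction errors smaller than a tolerance are ignored, and larger ones are penalized linearly. The sketch below shows only that loss. The function name and the tolerance value are illustrative assumptions, not settings taken from the slides.

```python
def epsilon_insensitive_loss(predicted, reference, epsilon=0.1):
    """Per-sample epsilon-insensitive loss used in standard SVR training.

    Errors with absolute value below `epsilon` cost nothing; larger
    errors cost their excess over `epsilon`.
    """
    return [max(0.0, abs(p - r) - epsilon) for p, r in zip(predicted, reference)]

# Hypothetical predicted and reference quality scores on the 1-5 scale.
losses = epsilon_insensitive_loss([3.0, 3.05, 4.0], [3.0, 3.0, 3.5])
print(losses)
```

Only the third prediction, which is off by 0.5, contributes to the loss; the first two fall inside the tolerance.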

SLIDE 8

Experimental Setup

SLIDE 9

Corpus

  • English-Spanish news texts from WMT 2012 QE task
  • 1832 translations for training and 422 for test
  • Each translation has a real-valued score between one and five
  • Post-editing effort Likert scale:

5: The translation requires little editing to be publishable
4: 10%–25% of the translation needs to be edited
3: 25%–50% of the translation needs to be edited
2: 50%–70% of the translation needs to be edited
1: The translation must be translated from scratch

SLIDE 10

Feature Sets

[Stacked bar chart: percentage (0% to 100%) of clean, collinear, and constant features in each feature set. Number of features per set: DCU-SYMC (308), LORIA (49), SDLLW (15), TCD (43), UEDIN (56), UPV (497), UU (82), WLV-SHEF (147).]

  • Wide variety of experimental conditions
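One way to make the clean, collinear, and constant categories concrete is sketched below. The slides do not say how the categories were computed, so everything here is an assumption: a feature is called constant if it takes a single value, and collinear if it is almost perfectly correlated with an earlier non-constant feature. The function names, feature names, and threshold are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two non-constant columns of equal length."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den

def classify_features(columns, threshold=0.999):
    """Label each feature as 'constant', 'collinear', or 'clean'.

    Pairwise correlation is a simplification: it misses features that
    are linear combinations of several other features.
    """
    labels, kept = {}, []
    for name, col in columns.items():
        if max(col) == min(col):
            labels[name] = "constant"
        elif any(abs(pearson(col, columns[k])) >= threshold for k in kept):
            labels[name] = "collinear"
        else:
            labels[name] = "clean"
            kept.append(name)
    return labels

# Hypothetical features for four translations.
features = {
    "length": [1.0, 2.0, 3.0, 4.0],
    "length_x2": [2.0, 4.0, 6.0, 8.0],  # exact multiple of "length"
    "bias": [1.0, 1.0, 1.0, 1.0],       # never changes
    "lm_score": [0.3, 0.1, 0.4, 0.2],
}
print(classify_features(features))
```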

SLIDE 11

Experimental Methodology

  • Evaluation metric: Root Mean Squared Error (RMSE)
  • Free parameters optimized by 10-fold cross-validation

– Number of LVs and SVR meta-parameters

  • 8 dev-train folds, one dev-tuning fold, and one dev-test fold

– Result: averaged prediction accuracy for the separated dev-test folds

  • Final models built on the whole training set using the best parameter values
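The evaluation metric and the fold rotation described above can be sketched as follows. The slides state the 8/1/1 split but not which fold plays which role in each round, so taking the fold after the dev-test fold as the dev-tuning fold is an assumption, as are the function names.

```python
import math

def rmse(predicted, reference):
    """Root mean squared error between predictions and reference scores."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))

def fold_roles(num_folds=10):
    """Yield (dev-train folds, dev-tuning fold, dev-test fold) per round.

    Each fold serves as dev-test exactly once. The next fold (cyclically)
    is used for tuning and the remaining eight for training.
    """
    for test in range(num_folds):
        tuning = (test + 1) % num_folds
        train = [f for f in range(num_folds) if f not in (test, tuning)]
        yield train, tuning, test

for train, tuning, test in fold_roles():
    print(train, tuning, test)

# Hypothetical predictions against reference scores.
print(rmse([2.0, 4.0], [3.0, 3.0]))
```

The reported cross-validation figure would then be the average of the dev-test RMSE over the ten rounds.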

SLIDE 12

Results

SLIDE 13

Cross-Validation RMSE Results

[Bar chart: cross-validation RMSE (axis from 0.6 to 0.8) of Baseline, PCA, and our approach for each feature set: DCU-SYMC, LORIA, SDLLW, TCD, UEDIN, UPV, UU, WLV-SHEF.]

  • Equal or lower prediction error than Baseline and PCA

– Error reduction correlated with the number of noisy features

SLIDE 14

Cross-Validation Feature Reduction Ratio

[Bar chart: percentage of the original features (axis from 20 to 100) retained by Baseline, PCA, and our approach for each feature set: DCU-SYMC, LORIA, SDLLW, TCD, UEDIN, UPV, UU, WLV-SHEF.]

  • About half as many LVs as PCs
  • Operational time of the QE system greatly reduced

SLIDE 15

Cross-Validation Learning Curves

[Two line plots: cross-validation RMSE against the number of latent variables for Baseline, PCA, and our approach. Left: UPV feature set (1 to 100 latent variables). Right: SDLLW feature set (1 to 15 latent variables). The band indicates the 95% confidence interval of the prediction accuracy (RMSE).]

  • Larger and faster error reduction for highly redundant sets (left plot)
  • Same accuracy with fewer features for concise sets (right plot)

SLIDE 16

Test Results

[Bar chart: test RMSE (axis from 0.7 to 1.1) of Baseline, PCA, and our approach for each feature set: DCU-SYMC, LORIA, SDLLW, TCD, UEDIN, UPV, UU, WLV-SHEF.]

  • Different results with respect to cross-validation. Why?

SLIDE 17

Analysis of Test Results

  • Hypothesis: training partition did not adequately represent test
  • Studied by a series of Hotelling’s two-sample T² tests

– Multivariate analog of Student’s t-test
– Compares two independently drawn samples
∗ E.g., training and test partitions
– Do they belong to the same population?

[Diagram: two samples drawn from the same population give p > 0.01; two samples drawn from different populations give p < 0.01.]
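The two-sample Hotelling statistic can be sketched in a few lines. This is the textbook form with a pooled covariance matrix, written in plain Python for illustration; it is not the authors' code, the function names and toy samples are made up, and the p-value lookup in the F distribution is left out.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter(rows, mean):
    """Sum of outer products of the deviations from the mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [row[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with pivoting.

    Assumes the matrix is non-singular.
    """
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def hotelling_t2(sample1, sample2):
    """Return (T^2, F) for two samples of p-dimensional vectors.

    Under the null hypothesis of equal means, F follows an
    F distribution with (p, n1 + n2 - p - 1) degrees of freedom.
    """
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean_vector(sample1), mean_vector(sample2)
    p = len(m1)
    s1, s2 = scatter(sample1, m1), scatter(sample2, m2)
    pooled = [[(s1[a][b] + s2[a][b]) / (n1 + n2 - 2) for b in range(p)]
              for a in range(p)]
    diff = [m1[j] - m2[j] for j in range(p)]
    z = solve(pooled, diff)
    t2 = n1 * n2 / (n1 + n2) * sum(d * v for d, v in zip(diff, z))
    f = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f

# Hypothetical two-feature samples; the second is shifted by one unit.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
test = [(1.0, 0.0), (2.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
print(hotelling_t2(train, test))
```

A larger F value corresponds to a smaller p-value, that is, stronger evidence that the two samples do not share a mean vector.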

SLIDE 18

Analysis of Test Results II

  • T² tests indicated that training and test were from different populations

– Main reason: data scarcity (only 1832 training samples)

  • Further analysis of each individual feature:

– Most had statistically different values in test
– Between one quarter and three quarters, depending on the set

  • In contrast, only about 1% differed between dev-train and dev-test folds

SLIDE 19

Conclusions

SLIDE 20

Conclusions

  • Empirical results showed the soundness of the proposed approach

– Improvements in prediction accuracy
– Large feature reduction ratios

  • Weaker test results, attributed to data scarcity
  • Feature reduction boosts QE scalability and time-efficiency

– Suitable for scenarios with time restrictions
– Allows the use of thousands of features

SLIDE 21

Thank you, questions?
