  1. TDNN: A Two-stage Deep Neural Network for Prompt-independent Automated Essay Scoring

  2. Outline • Background • Method • Experiments • Conclusions

  3. What is Automated Essay Scoring (AES)? • Computer produces a summative assessment for evaluation • Aim: reduce human workload • AES has been in practical use at ETS since 1999

  4. Prompt-specific and Prompt-independent AES • Most existing AES approaches are prompt-specific – Require human labels for each prompt to train – Can achieve satisfactory human-machine agreement • Quadratic weighted kappa (QWK) > 0.75 [Taghipour & Ng, EMNLP 2016] • Inter-human agreement: QWK = 0.754 • Prompt-independent AES remains a challenge – Only non-target human labels are available

  5. Challenges in Prompt-independent AES [Figure: a model is learned on rated essays from source prompts (Prompt 1: Winter Olympics, Prompt 2: Rugby World Cup, Prompt 3: Australian Open) and used to predict ratings for the target prompt (2018 World Cup)]

  6. Challenges in Prompt-independent AES • Unavailability of rated essays written for the target prompt

  7. Challenges in Prompt-independent AES • Previous approaches learn on source prompts – Domain adaptation [Phandi et al., EMNLP 2015] – Cross-domain learning [Dong & Zhang, EMNLP 2016] – Achieved Avg. QWK = 0.6395 at best with up to 100 labeled target essays

  8. Challenges in Prompt-independent AES • Off-topic: essays written for source prompts are mostly irrelevant to the target prompt

  9. Outline • Background • Method • Experiments • Conclusions

  10. TDNN: A Two-stage Deep Neural Network for Prompt-independent AES • Based on the idea of transductive transfer learning • Learn on target essays • Utilize the content of target essays for rating

  11. The Two-stage Architecture • Prompt-independent stage: train a shallow model to create pseudo labels on the target prompt

  12. The Two-stage Architecture • Prompt-dependent stage: learn an end-to-end model to predict essay ratings for the target prompt

  13. Prompt-independent stage • Train a robust prompt-independent AES model using non-target prompts • Learning algorithm: RankSVM for AES • Pre-defined prompt-independent features • Select confident essays written for the target prompt

  14. Prompt-independent stage [Figure: target-prompt essays placed on a 0–10 axis by predicted score]

  15. Prompt-independent stage • Predicted ratings in [0, 4] taken as negative examples

  16. Prompt-independent stage • Predicted ratings in [8, 10] taken as positive examples

  17. Prompt-independent stage • Selected essays converted to 0/1 labels

  18. Prompt-independent stage • Common sense: a rating ≥ 8 is good, < 5 is bad • Thresholds chosen to enlarge the sample size
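A minimal sketch of the selection step above, assuming predicted ratings on a 0–10 scale (the essay ids and the `make_pseudo_labels` helper are hypothetical, not from the paper):

```python
def make_pseudo_labels(predicted_scores, neg_max=4, pos_min=8):
    """Select confident target-prompt essays via score thresholds.

    Essays whose prompt-independent predicted rating (0-10 scale)
    falls in [0, neg_max] become negative (0) pseudo examples; those
    in [pos_min, 10] become positive (1). Mid-range essays, where the
    first-stage model is least confident, are discarded.
    """
    labeled = []
    for essay_id, score in predicted_scores.items():
        if score <= neg_max:
            labeled.append((essay_id, 0))
        elif score >= pos_min:
            labeled.append((essay_id, 1))
    return labeled

# Hypothetical first-stage scores for four target-prompt essays:
scores = {"e1": 2.5, "e2": 6.0, "e3": 9.1, "e4": 3.9}
print(make_pseudo_labels(scores))  # -> [('e1', 0), ('e3', 1), ('e4', 0)]
```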

  19. Prompt-dependent stage • Train a hybrid deep model for prompt-dependent assessment • An end-to-end neural network with three input channels: • Word semantic embeddings • Part-of-speech (POS) tags • Syntactic tags

  20. Architecture of the hybrid deep model • Multi-layer structure: words – (phrases) – sentences – essay

  21. Architecture of the hybrid deep model • GloVe word embeddings

  22. Architecture of the hybrid deep model • Part-of-speech (POS) tags

  23. Architecture of the hybrid deep model • Syntactic tags
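The bottom-up, three-channel composition described above can be sketched as follows; simple averaging stands in for the model's CNN/LSTM layers, and all embedding tables and tag values here are toy assumptions:

```python
def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def essay_representation(essay, word_emb, pos_emb, syn_emb):
    """Compose word -> sentence -> essay vectors from three channels.

    `essay` is a list of sentences, each a list of
    (word, pos_tag, syntactic_tag) triples. Per-word vectors from the
    three channels are concatenated, then composed bottom-up
    (averaging is a stand-in for the real CNN/LSTM layers).
    """
    sentence_vecs = []
    for sentence in essay:
        word_vecs = [word_emb[w] + pos_emb[p] + syn_emb[s]  # list concat = channel concat
                     for w, p, s in sentence]
        sentence_vecs.append(average(word_vecs))
    return average(sentence_vecs)

# Toy, untrained embeddings for illustration only:
word_emb = {"the": [0.1, 0.2], "cat": [0.3, 0.4]}
pos_emb = {"DT": [1.0], "NN": [2.0]}
syn_emb = {"NP": [0.5]}
essay = [[("the", "DT", "NP"), ("cat", "NN", "NP")]]
vec = essay_representation(essay, word_emb, pos_emb, syn_emb)
print(len(vec))  # -> 4 (2 word dims + 1 POS dim + 1 syntactic dim)
```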

  26. Model Training • Training loss: MSE on 0/1 pseudo labels • Validation metric: kappa on a held-out 30% of non-target essays – Select the model checkpoint that rates these essays best
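The validation step can be sketched as below; the checkpoint names and validation data are hypothetical, and negative mean squared error stands in for the kappa metric to keep the sketch self-contained:

```python
def select_best_model(checkpoints, validation_set, agreement):
    """Pick the checkpoint that best agrees with human ratings on the
    held-out non-target essays.

    `checkpoints` maps names to scoring functions, `validation_set` is
    a list of (essay, human_rating) pairs, and `agreement` is any
    human-machine agreement metric over two rating lists (kappa in the
    actual setup).
    """
    human = [rating for _, rating in validation_set]

    def agreement_of(item):
        _, model = item
        predicted = [model(essay) for essay, _ in validation_set]
        return agreement(human, predicted)

    return max(checkpoints.items(), key=agreement_of)[0]

# Hypothetical checkpoints and validation data; negative MSE as metric:
checkpoints = {"epoch-1": lambda e: 0, "epoch-2": lambda e: len(e) % 4}
validation = [("abcd", 0), ("abc", 3), ("ab", 2)]
neg_mse = lambda a, b: -sum((x - y) ** 2 for x, y in zip(a, b))
print(select_best_model(checkpoints, validation, neg_mse))  # -> epoch-2
```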

  27. Outline • Background • Method • Experiments • Conclusions

  28. Dataset & Metrics • We use the standard ASAP corpus – 8 prompts with >10K essays in total • Prompt-independent AES: 7 prompts are used for training, 1 for testing • Report on common human-machine agreement metrics – Pearson’s correlation coefficient (PCC) – Spearman’s correlation coefficient (SCC) – Quadratic weighted Kappa (QWK)
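QWK, the headline agreement metric above, can be computed from two rating lists as below (a plain-Python sketch of the standard definition: one minus the ratio of quadratically weighted observed to expected disagreement):

```python
from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two equal-length lists of
    integer ratings in [min_rating, max_rating]."""
    n = max_rating - min_rating + 1
    num_items = len(rater_a)
    # Observed co-occurrence matrix of rating pairs
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    hist_a, hist_b = Counter(rater_a), Counter(rater_b)
    numerator = denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic disagreement penalty
            expected = (hist_a[i + min_rating] * hist_b[j + min_rating]) / num_items
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator

print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3))  # -> 1.0
```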

  29. Baselines • RankSVM based on prompt-independent handcrafted features • Also used in the prompt-independent stage of TDNN • 2L-LSTM [Alikaniotis et al., ACL 2016] • Two LSTM layers + a linear layer • CNN-LSTM [Taghipour & Ng, EMNLP 2016] • CNN + LSTM + linear layer • CNN-LSTM-ATT [Dong et al., CoNLL 2017] • CNN-LSTM + attention

  30. RankSVM is the most robust baseline • DNN baselines' performance varies widely across the 8 prompts • Possibly caused by learning on non-target prompts • RankSVM appears to be the most stable baseline • This justifies the use of RankSVM in the first stage of TDNN

  31. Comparison to the best baseline • TDNN outperforms the best baseline on 7 out of 8 prompts • Performance improvements gained by learning on the target prompt

  32. Average performance on 8 prompts

  Method              QWK     PCC     SCC
  Baselines
    RankSVM           .5462   .6072   .5976
    2L-LSTM           .4687   .6548   .6214
    CNN-LSTM          .5362   .6569   .6139
    CNN-LSTM-ATT      .5057   .6535   .6368
  TDNN
    TDNN(Sem)         .5875   .6779   .6795
    TDNN(Sem+POS)     .6582   .7103   .7130
    TDNN(Sem+Synt)    .6856   .7244   .7365
    TDNN(POS+Synt)    .6784   .7189   .7322
    TDNN(ALL)         .6682   .7176   .7258

  35. Sanity Check: Relative Precision • How does the quality of the pseudo examples affect TDNN's performance? • Measure the sanity of the selected essays, namely the number of positive (negative) essays that are indeed better (worse) than all negative (positive) essays • Such relative precision is at least 80% and mostly beyond 90% across prompts • TDNN thus mostly learns from correct 0/1 labels
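The relative-precision check can be sketched as follows, assuming the true (human) ratings of the selected pseudo-positive and pseudo-negative essays are available at evaluation time (the `relative_precision` helper and its inputs are hypothetical):

```python
def relative_precision(pos_ratings, neg_ratings):
    """Fraction of selected essays consistent with their pseudo label:
    positives rated above every negative, plus negatives rated below
    every positive, over all selected essays."""
    best_neg, worst_pos = max(neg_ratings), min(pos_ratings)
    consistent = sum(1 for r in pos_ratings if r > best_neg) \
               + sum(1 for r in neg_ratings if r < worst_pos)
    return consistent / (len(pos_ratings) + len(neg_ratings))

# Hypothetical true ratings of selected essays: fully consistent case
print(relative_precision([9, 8, 6], [5, 3, 2]))  # -> 1.0
```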

  36. Conclusions • It is beneficial to learn an AES model on the target prompt • Syntactic features are a useful addition to the widely used Word2Vec embeddings • Sanity check: small overlap between positive and negative examples • Prompt-independent AES remains an open problem – ETS expects kappa > 0.70 – TDNN achieves 0.68 at best

  37. Thank you!
