
Automatic Detection of Dementia in Mandarin Chinese


  1. Automatic Detection of Dementia in Mandarin Chinese • Bai Li, advised by Frank Rudzicz • Li B., Hsu Y-T., Rudzicz F. “Detecting dementia in Mandarin Chinese using transfer learning from a parallel corpus”. To appear at NAACL 2019.

  2. Alzheimer’s Disease (AD) and Dementia • Neurodegenerative disease • 5.7 million patients in the USA, 50 million worldwide • Symptoms: • Early: forgetfulness, language impairment • Late: loss of motor control, death • One of the most costly diseases • No cure is known • For this presentation, Alzheimer’s disease ≈ Dementia

  3. Why detect Alzheimer’s disease? • Early treatment • No known drugs slow the progression of AD • But treatment can reduce symptoms! • Clinical trials • Current treatments may be ineffective because they are started too late!

  4. Detecting AD • Many tests: MRI, PET scan • Cognitive tests • Category naming • Picture naming • Picture description • Cognitive tests are cheap and non-intrusive, so they work as a screening mechanism

  5. Category Fluency • Name as many {animals, fruits, colours} as possible in 60 seconds

  6. Picture Description • Describe this picture in as much detail as possible

  7. Linguistic impairment of AD • People with dementia use language differently! • Word finding difficulties • “the boy is standing on the chair” • More pronouns / adverbial constructions • “he’s reaching up there” • Acoustic abnormality • Higher pause rate, slower speech • Less complex sentences

  8. Feature extraction for AD detection • Automated tools to extract relevant features • Length of narration • Vocabulary diversity • Type-token ratio: $\frac{\text{unique words}}{\text{total words}}$ • Frequency metrics in corpus
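
As an illustration of the type-token ratio on this slide, here is a minimal sketch. It assumes whitespace tokenization and lowercasing, which the slides do not specify; the sample narration is made up.

```python
def type_token_ratio(tokens):
    """Return (number of unique words) / (total number of words)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Made-up narration, tokenized by whitespace (an assumption for this sketch).
narration = "The boy is on the stool and the stool is tipping".lower().split()
ttr = type_token_ratio(narration)
print(f"type-token ratio: {ttr:.3f}")
```

A lower ratio means more repeated words, which is one way to quantify reduced vocabulary diversity.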

  9. Syntactic features • Part-of-speech tag counts (e.g. #adj, #noun, #pronoun/#noun) • Constituency parse tree • Max, mean, median heights • Production rule counts • Length of clauses, dependent clauses, coordinate phrases • Dependency parse tree • Mean, median, max dependency distance
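
The dependency-distance features can be sketched as follows. This assumes a parse is already available as a list of head indices; the heads below are assigned by hand for illustration and are not the output of any particular parser.

```python
from statistics import mean, median

def dependency_distances(heads):
    """heads[i] is the 1-based position of the head of token i+1; 0 marks the root.

    The distance of a token is the absolute difference between its own
    position and its head's position. The root has no head and is skipped.
    """
    return [abs(head - pos) for pos, head in enumerate(heads, start=1) if head != 0]

# "she is standing on a chair" with hand-assigned heads (illustrative only).
heads = [3, 3, 0, 6, 6, 3]
dists = dependency_distances(heads)
features = {"mean": mean(dists), "median": median(dists), "max": max(dists)}
print(features)
```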

  10. Machine learning to detect dementia • Fraser et al. (2016) extract over 400 features and achieve 81% classification accuracy using logistic regression • Fraser, Kathleen C., Jed A. Meltzer, and Frank Rudzicz. "Linguistic features identify Alzheimer’s disease in narrative speech." Journal of Alzheimer's Disease 49.2 (2016): 407-422.

  11. DementiaBank • Collected between 1983 and 1988 at the University of Pittsburgh • 551 Cookie Theft narrations (241 healthy, 310 dementia) • Mini Mental State Exam (MMSE), scored out of 30 • Other tasks • Demographic information, diagnosis

  12. Mandarin Dataset: Lu Corpus • 49 speakers of Taiwanese Mandarin • Several tasks for each speaker • Cookie theft picture description • Category Fluency (animals, fruits, colours, places in Taiwan) • Picture Naming (30 items) • Transcripts of the picture description available • Diagnostic information unknown

  13. Dementia Score using PCA • Derive a proxy score for dementia
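
One way a PCA-based proxy score could be computed is sketched below. The slide does not say which variables feed the PCA; this sketch assumes each speaker's scores on a few cognitive tasks and takes the first principal component as the proxy. The numbers are invented.

```python
import numpy as np

def pca_proxy_score(task_scores):
    """Project standardized task scores onto their first principal component."""
    X = np.asarray(task_scores, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each task column
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]  # one score per speaker

# Invented scores: rows are speakers, columns are hypothetical tasks
# (three fluency categories and picture naming).
scores = np.array([
    [20, 18, 15, 28],
    [12, 10, 9, 20],
    [16, 14, 12, 25],
    [8, 7, 6, 14],
    [18, 15, 13, 27],
])
proxy = pca_proxy_score(scores)
print(np.round(proxy, 2))
```

The sign of a principal component is arbitrary, so in practice the score would need to be oriented (for example, so that higher means more impaired) before use.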

  14. How to detect dementia in Chinese • Q: Why not just do the same thing that we did with English? • A: Not enough data • Solution: Need to combine datasets somehow, across different languages • Use transfer learning!

  15. Some Domain Adaptation Methods • Large corpus in domain S, small corpus in domain T • Want accurate model for domain T • Existing methods require same features in S and T • Daumé III, Hal. "Frustratingly Easy Domain Adaptation." Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 2007.
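
For context, the feature augmentation from the cited Daumé III (2007) paper can be sketched as below: each feature vector is copied into a shared block plus a domain-specific block. The function name and the toy inputs are assumptions for this sketch. It also makes the limitation on this slide concrete, since the copy only makes sense when S and T share one feature space.

```python
import numpy as np

def augment(X, domain):
    """Map x to (x, x, 0) for the source domain and (x, 0, x) for the target."""
    X = np.asarray(X, dtype=float)
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    if domain == "target":
        return np.hstack([X, zeros, X])
    raise ValueError("domain must be 'source' or 'target'")

# Toy feature vectors; both domains must use the same two features.
source_aug = augment([[1, 2]], "source")
target_aug = augment([[1, 2]], "target")
print(source_aug, target_aug)
```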

  16. Cross-language features: Difficulties • Chinese: 她站在一个椅子上 (gloss: she-stand-at-one-CL-chair-on) • English: “she's standing on a chair” • English tags: she/PRP 's/VBZ standing/VBG on/IN a/DT chair/NN • Chinese tags: 她/PN 站/VV 在/P 一/CD 个/M 椅子/NN 上/LC • How do the two tag sequences correspond? ????

  17. Cross-language features: Difficulties • Experiments: poor accuracy with universal cross-language features • Model needs to learn not only to detect dementia • It also needs to learn how features correspond across languages • We only have n=49 samples!

  18. Idea: Extract features separately, learn correspondences using out-of-domain data!

  19. Movie subtitles! • OpenSubtitles corpus, containing aligned subtitles in 62 languages

  20. Baselines 1. Unilingual: train model using Mandarin data only, evaluate using cross-validation 2. Google Translate: translate Mandarin narration into English, then run the English classifier

  21. Evaluation: Spearman’s correlation between model’s output and dementia score
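
Spearman's correlation is the Pearson correlation of the two rank sequences. A dependency-free sketch follows, with ties given their average rank; the model outputs and dementia scores below are invented.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Pearson correlation computed on the ranks of a and b."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

model_output = [0.10, 0.40, 0.35, 0.80]  # invented classifier outputs
dementia_score = [2, 1, 3, 4]            # invented proxy scores
rho = spearman(model_output, dementia_score)
print(f"Spearman's rho: {rho:.2f}")
```

Because only ranks matter, the metric does not require the model's output and the proxy score to be on the same scale.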

  22. Proposed Model (Learning Feature Correspondence) 1. Extract feature vector $x$ in Chinese 2. Extract feature vector $y$ in English, independently 3. Learn mapping function $f: x \to y$ using OpenSubtitles movie dialogue corpus • This is a multi-output regression problem
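
The correspondence step can be sketched as a multi-output linear regression from Chinese features to English features. Everything below is an assumption for illustration: the data is synthetic rather than extracted from OpenSubtitles, the feature counts are arbitrary, and ordinary least squares stands in for whatever regularized regressor is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for parallel data: 200 sentence pairs,
# 5 Chinese features and 4 English features per pair.
X_zh = rng.normal(size=(200, 5))
true_W = rng.normal(size=(5, 4))
Y_en = X_zh @ true_W + 0.01 * rng.normal(size=(200, 4))

def fit_mapping(X, Y):
    """Least-squares fit of Y from X with an intercept row appended."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return W

def apply_mapping(W, X):
    """Predict English-side features from Chinese-side features."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return X1 @ W

W = fit_mapping(X_zh, Y_en)
Y_hat = apply_mapping(W, X_zh)
print(Y_hat.shape)
```

At test time, features from a Mandarin narration would be mapped this way and the result passed to a classifier trained on English data.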

  23. Proposed Model (Learning Feature Correspondence) • Unsupervised: only English dementia data used during training!

  24. Independent Linear Regressions • For each target feature, train a separate linear regression • Use ElasticNet regularization, with an independent hyperparameter search per feature • [Figure: Chinese and English feature vectors (num_characters, num_words, pronoun_count, noun_verb_ratio, …)]

  25. Reduced Rank Regression • Problem: independent regressions do not take advantage of relationships between outputs • Solution: constrain the coefficient matrix to have rank at most $R$ • Note: equivalent to a linear neural network with a hidden layer of size $R$ • [Figure: Chinese and English feature vectors (num_words, num_characters, pronoun_count, noun_verb_ratio, …)]
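
A minimal sketch of reduced rank regression, using the classical closed form with identity weighting: fit ordinary least squares, then project the coefficients onto the leading right singular vectors of the fitted values. It assumes centered inputs and synthetic data; the slides do not state which estimator or weighting was used.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Return a coefficient matrix of rank at most `rank` mapping X to Y."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Leading right singular vectors of the fitted values X @ B_ols.
    _, _, vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V = vt[:rank].T
    return B_ols @ V @ V.T

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
# Synthetic ground truth with rank 2: a 6x2 factor times a 2x5 factor.
B_true = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
Y = X @ B_true + 0.01 * rng.normal(size=(300, 5))

B_hat = reduced_rank_regression(X, Y, rank=2)
print(np.linalg.matrix_rank(B_hat, tol=1e-8))
```

The factorization `(B_ols @ V) @ V.T` is what makes this equivalent to a linear network whose hidden layer width equals the rank.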

  26. Joint Feature Selection • Problem: some features are noisy or impossible to reconstruct • Solution: order by $R^2$, use only the top $K$ features • [Figure: example features (num_words, num_characters, pronoun_count, noun_verb_ratio, …) annotated with $R^2 = 0.8$ and $R^2 = 0.1$]
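
The selection step can be sketched as: score each reconstructed English feature by its coefficient of determination, then keep the best few. The predictions here are simulated by adding noise of different sizes to synthetic targets; the helper names and noise levels are assumptions for this sketch.

```python
import numpy as np

def r2_per_feature(Y_true, Y_pred):
    """Coefficient of determination for each output column."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def top_k_features(r2, names, k):
    """Names of the k features that are reconstructed best."""
    order = np.argsort(r2)[::-1][:k]
    return [names[i] for i in order]

rng = np.random.default_rng(0)
names = ["num_words", "num_characters", "pronoun_count", "noun_verb_ratio"]
Y_true = rng.normal(size=(500, 4))
# Simulated reconstructions: later columns are increasingly noisy.
noise_scale = np.array([0.1, 0.2, 0.5, 3.0])
Y_pred = Y_true + noise_scale * rng.normal(size=(500, 4))

r2 = r2_per_feature(Y_true, Y_pred)
selected = top_k_features(r2, names, k=2)
print(selected)
```

In practice the scores should come from held-out parallel sentences, so that the ranking reflects how well a feature transfers rather than how well it was fit.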

  27. Results • $p = 0.06$ • Initial model not very good • Reduced rank regression also not effective • Joint feature selection beats baselines

  28. Results: Number of features • [Figure: accuracy of the English classifier using K features, and performance of the whole model]

  30. Ablation study • About 1000-2000 parallel sentences needed

  31. Summary • First use of NLP to detect dementia in Mandarin Chinese 1. Extracted lexicosyntactic features in English and Chinese 2. Used out-of-domain corpus to learn correspondence model 3. Combined with English dementia classifier

  32. Future Work • Need for human transcripts • Incorporate speech data • Apply to other languages (French, Korean) • Collect quality data in multiple languages
