

SLIDE 1

Exploring Measures of “Readability” for Spoken Language

Analyzing linguistic features of subtitles to identify age-specific TV programs

Sowmya Vajjala and Detmar Meurers
University of Tübingen, Germany

The 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations @ EACL, Gothenburg, April 27, 2014

Outline:
◮ Introduction
◮ Our Approach: The Corpus, Features, Tools and Resources
◮ Experiments: Setup and General Results, Feature Selection, Ablation Test Results, Confusion Matrix, Effect of Text Size
◮ Conclusions

SLIDE 2

The talk in a nutshell

◮ Idea: investigate if features from readability assessment can be used to characterize age-specific TV programs
  ◮ based on a corpus of BBC subtitles
  ◮ using a text classification approach
◮ We show that the authentic materials targeting specific age groups exhibit
  ◮ a broad range of linguistic and psycholinguistic characteristics
  ◮ indicative of the complexity of the language used.
◮ Our approach reaches an accuracy of 95.9%.

SLIDE 3

Motivation

◮ Reading, listening and watching TV are all ways to obtain information.
◮ Some TV programs are also created for particular age groups (similar to graded readers).
◮ Audio-visual presentation and language are important factors in creating age-specific TV programs.
◮ How characteristic of the targeted age group is the language by itself?
◮ We hypothesize that the linguistic complexity of the subtitles is a good predictor.
◮ We explore this hypothesis using features from automatic readability assessment.

SLIDE 4

Our Approach: Overview

◮ Corpus: BBC subtitles (Van Heuven et al. 2014)
  ◮ TV programs targeting different age groups
◮ Features: range of properties, mostly from Second Language Acquisition and psycholinguistic research
◮ Modeling: three-class text classification
◮ Evaluation: accuracy, with 10-fold cross-validation

SLIDE 5

The BBC Subtitles Corpus

◮ The BBC started subtitling all scheduled programs on its main channels in 2008.
◮ Van Heuven et al. (2014) compiled a subtitles corpus from nine BBC TV channels.
◮ Subtitles of four channels are annotated: CBeebies, CBBC, News and Parliament.
◮ Corpus in numbers:

  Program Category            Age group    # texts   avg. tokens per text   avg. sentence length (in words)
  CBeebies                    < 6 years    4846      1144                   4.9
  CBBC                        6–12 years   4840      2710                   6.7
  Adults (News + Parliament)  > 12 years   3776      4182                   12.9

◮ We use a balanced set consisting of 3776 texts per class.
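The balanced set can be obtained by downsampling each class to the size of the smallest one (Adults, with 3776 texts). A minimal stdlib sketch, using hypothetical placeholder text IDs rather than the real subtitle files:

```python
import random

def balance_classes(texts_by_class, seed=42):
    # Downsample every class to the size of the smallest one,
    # matching the 3776-texts-per-class setup described above.
    rng = random.Random(seed)
    n = min(len(texts) for texts in texts_by_class.values())
    return {label: rng.sample(texts, n)
            for label, texts in texts_by_class.items()}

# Placeholder IDs standing in for the actual subtitle texts:
corpus = {
    "CBeebies": [f"cbeebies_{i}" for i in range(4846)],
    "CBBC":     [f"cbbc_{i}" for i in range(4840)],
    "Adults":   [f"adult_{i}" for i in range(3776)],
}
balanced = balance_classes(corpus)  # 3776 texts in each class
```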

SLIDE 6

Features 1

◮ Lexical Features
  ◮ lexical richness features from Second Language Acquisition (SLA) research
    e.g., type-token ratio, noun variation, ...
  ◮ POS density features
    e.g., # nouns/# words, # adverbs/# words, ...
  ◮ traditional features and formulae
    e.g., # characters per word, Flesch-Kincaid score, ...
◮ Syntactic Features
  ◮ syntactic complexity features from SLA research
    e.g., # dependent clauses/clause, average clause length, ...
  ◮ other parse tree features
    e.g., # NPs per sentence, avg. parse tree height, ...

= Features from Vajjala & Meurers (2012)
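Two of the lexical measures named above, type-token ratio and average characters per word, are straightforward to compute. A toy sketch with naive regex tokenization (the paper's actual pipeline tokenizes and tags with the Stanford Tagger):

```python
import re

def lexical_features(text):
    # Naive tokenization; a POS tagger is used in the real pipeline.
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "avg_chars_per_word": sum(len(t) for t in tokens) / len(tokens),
    }

feats = lexical_features("The cat sat on the mat and the dog sat too")
# 8 distinct types over 11 tokens -> type_token_ratio = 8/11
```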

SLIDE 7

Features 2

◮ Morphological properties of words
  ◮ e.g., does the word contain a stem along with an affix? abundant = abound + -ant
◮ Age of Acquisition (AoA)
  ◮ average age of acquisition of the words in a text
◮ Other psycholinguistic features
  ◮ e.g., word abstractness
  ◮ avg. number of senses per word (obtained from WordNet)
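The AoA feature amounts to a lexicon lookup averaged over the rated words of a text. A sketch with a hypothetical three-word toy lexicon standing in for Kuperman et al. (2012)'s ratings:

```python
def mean_aoa(tokens, aoa_ratings):
    # Average the ratings of the words found in the lexicon,
    # silently skipping unrated words.
    rated = [aoa_ratings[t] for t in tokens if t in aoa_ratings]
    return sum(rated) / len(rated) if rated else None

# Hypothetical toy ratings; the real resource covers ~30,000 words.
toy_aoa = {"dog": 2.5, "cat": 2.3, "parliament": 11.8}
score = mean_aoa(["the", "dog", "and", "the", "cat"], toy_aoa)
# only "dog" and "cat" are rated -> (2.5 + 2.3) / 2 = 2.4
```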

SLIDE 8

Implementation details

Tools, Resources and Algorithms used

◮ Tools:
  ◮ for lexical features: Stanford Tagger (Toutanova et al. 2003)
  ◮ for syntactic features: Berkeley Parser (Petrov & Klein 2007) and Tregex pattern matcher (Levy & Andrew 2006)
  ◮ for classification: algorithms implemented in WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
◮ Resources:
  ◮ CELEX Lexical Database (http://celex.mpi.nl)
  ◮ Kuperman et al. (2012)'s AoA ratings
  ◮ MRC Psycholinguistic Database (http://ota.oucs.ox.ac.uk/headers/1054.xml)
  ◮ WordNet Database (http://wordnet.princeton.edu)

SLIDE 9

Classification Experiments

◮ We explored several classification algorithms (SMO, J48 decision tree, logistic regression).
  ◮ SMO marginally outperformed the others (by 1–1.5%).
  ◮ So all further experiments were performed with SMO.
◮ Random baseline: 33%
◮ Sentence length baseline: 71.4%
◮ Accuracy using the full set of 152 features: 95.9%
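A single-feature baseline like the sentence-length one can be pictured as simple thresholding. The sketch below is illustrative only: the thresholds are assumptions, placed midway between the per-class mean sentence lengths reported for the corpus (4.9, 6.7 and 12.9 words), whereas the 71.4% baseline above comes from a trained classifier.

```python
def sentence_length_baseline(avg_sentence_length):
    # Hypothetical cutoffs midway between the per-class means
    # (4.9, 6.7 and 12.9 words) -- an assumption for illustration,
    # not the classifier actually evaluated in the paper.
    if avg_sentence_length < (4.9 + 6.7) / 2:    # < 5.8
        return "CBeebies"
    if avg_sentence_length < (6.7 + 12.9) / 2:   # < 9.8
        return "CBBC"
    return "Adults"

label = sentence_length_baseline(7.5)  # -> "CBBC"
```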

SLIDE 10

Feature Selection

We explored two feature selection approaches to understand which features contribute the most to classification accuracy:

1. Select features individually based on Information Gain (IG)
   ◮ implemented as InfoGainAttributeEval in WEKA
2. Select a subset of features that do not correlate with each other but are highly predictive
   ◮ implemented as CfsSubsetEval (Hall 1999) in WEKA
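Information gain for a discrete (or discretized) feature is the class entropy minus the class entropy remaining after splitting on that feature. A stdlib sketch of the quantity InfoGainAttributeEval ranks features by (WEKA additionally discretizes numeric features first):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    # H(class) minus the weighted class entropy within each
    # feature-value subset.
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels)
                  if f == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# A perfectly predictive feature recovers the full class entropy:
ig = information_gain(["short", "short", "long", "long"],
                      ["kids", "kids", "adults", "adults"])  # -> 1.0 bit
```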

SLIDE 11

Results for Top 10 IG features

Rank  Feature                                               Accuracy
1     avg. AoA (Kuperman et al. 2012)                       82.4%
2     avg. # PPs in a sentence                              74.0%
3     avg. # instances where the lemma has stem and affix   77.7%
4     avg. parse tree height                                73.4%
5     avg. # NPs in a sentence                              73.0%
6     avg. # instances of affix substitution                74.3%
7     avg. # prepositions in a sentence                     72.0%
8     avg. # instances where a lemma is not a count noun    68.3%
9     avg. # clauses per sentence                           72.5%
10    sentence length                                       71.4%

Accuracy with all 10 features together: 84.5%.

SLIDE 12

Feature selection with CfsSubsetEval

◮ 41 of the total 152 features are selected.
◮ The full list of selected features is provided in the paper.
◮ Classification accuracy with the 41 features: 93.9%
  → only 2% less than classification with all the features

SLIDE 13

Feature selection: Result Summary

Feature Subset (#)         Accuracy   SD on 10-fold CV
All Features (152)         95.9%      0.37
Cfs on all features (41)   93.9%      0.59
Top-10 IG features (10)    84.5%      0.70

The avg. SD over the test sets in all CV folds is given to make comparisons in terms of statistical significance possible.

SLIDE 14

Ablation Test Results

Features                                    Acc.     SD
All Features                                95.9%    0.37
All − AoA Kup Lem                           95.9%    0.37
All − All AoA Features                      95.6%    0.58
All − psych                                 95.8%    0.31
All − celex                                 94.7%*   0.51
All − celex − psych                         93.6%*   0.66
All − celex − psych − lex (= syntax only)   77.5%*   0.99
lex                                         93.1%*   0.70
celex                                       90.0%*   0.79
psych                                       84.5%*   1.12

The * indicates results that are statistically different from the model with all features.

SLIDE 15

Confusion matrix for the model with all features

classified as →   CBeebies   CBBC   Adults
CBeebies (0–6)    3619       156    1
CBBC (6–12)       214        3526   36
Adults (12+)      2          58     3716
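The overall accuracy follows directly from the diagonal of this matrix. A minimal sketch that recovers the 95.9% figure reported earlier:

```python
def accuracy_from_confusion(matrix):
    # Rows are true classes, columns are predictions;
    # diagonal cells count correct classifications.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

cm = [[3619, 156,    1],   # CBeebies
      [214, 3526,   36],   # CBBC
      [2,     58, 3716]]   # Adults
acc = accuracy_from_confusion(cm)  # -> 0.9588..., i.e. 95.9%
```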

SLIDE 16

Effect of Text Size on Classification Accuracy

[Figure: classification accuracy (in percent) vs. max. text size (100–900 words), plotted for the All Features, PSYCH, LEX, SYN and CELEX feature sets]

◮ Longer texts support higher accuracy.
◮ But even with 100 words per text, we reach >80%.

SLIDE 17

Summary

◮ The rich (psycho)linguistic feature set performs very well, achieving a classification accuracy of 95.9%.
◮ Single most predictive feature: AoA (accuracy: 82.4%)
  ◮ But removing this feature did not affect the accuracy.
⇒ The age-specific nature of authentic material is reflected in a wide range of linguistic and psychological properties.
◮ For practical tasks, accuracies above 90% can also be achieved with feature subsets and relatively short texts.
◮ Longer texts supported more accurate predictions.

SLIDE 18

Outlook

◮ Explore the effects of using a parser tuned to spoken language, to see if the syntactic features can be improved.
◮ More qualitative error analysis of misclassified cases.
◮ Perform a cross-genre evaluation comparing written and spoken texts in terms of their complexity.

SLIDE 19

Thank you!

◮ Questions? :-)
◮ Contact: sowmya@sfs.uni-tuebingen.de

SLIDE 20

References

Hall, M. A. (1999). Correlation-based Feature Selection for Machine Learning. Ph.D. thesis, The University of Waikato, Hamilton, New Zealand.

Kuperman, V., H. Stadthagen-Gonzalez & M. Brysbaert (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods 44(4), 978–990. URL http://crr.ugent.be/archives/806.

Levy, R. & G. Andrew (2006). Tregex and Tsurgeon: tools for querying and manipulating tree data structures. In 5th International Conference on Language Resources and Evaluation. Genoa, Italy: European Language Resources Association (ELRA), pp. 2231–2234.

Petrov, S. & D. Klein (2007). Improved Inference for Unlexicalized Parsing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference. Rochester, New York, pp. 404–411.

Toutanova, K., D. Klein, C. D. Manning & Y. Singer (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. In HLT-NAACL. Edmonton, Canada, pp. 252–259.

Vajjala, S. & D. Meurers (2012). On Improving the Accuracy of Readability Classification using Insights from Second Language Acquisition. In Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at NAACL-HLT. Montréal, Canada: ACL, pp. 163–173. URL http://aclweb.org/anthology/W12-2019.pdf.

Van Heuven, W. J., P. Mandera, E. Keuleers & M. Brysbaert (2014). Subtlex-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology, pp. 1–15. URL http://dx.doi.org/10.1080/17470218.2013.850521.