Oct 03, 2022

Sociolinguistics and Human Language Technologies Or why we all need large data sets, automatic tools and sharing!
Thesis
• LDC and others collect LARGE data sets to drive speech technology research (LID, ASR, DID, etc.)
• LARGE =
  – Hundreds/thousands of hours of data per language/dialect
  – Hundreds/thousands of speakers
  – E.g. Mixer, Fisher, HUB4-5, etc.
• Many of the technologies that have been developed could support dialect/variation research!
  – Analysis of large data (word usage, pronunciation, etc.)
  – Measurement of speaker/dialect variability (intra and inter)
  – Measurement of channel effects

Case 1: British English vs. American English
• WSJ (US English): 200+ hours of read speech
• WSJ-CAM0 (British): 90+ hours of read speech
• 200+ speakers
• Use ASR techniques to learn pronunciation models

Literature rule                      Learned rule (proposed system)                         Prob
[ae] -> [aa] / _ [+fric, -voiced]    [ae] -> [aa] / _ [+fric, -voiced, +front]              0.84
(trap-bath split)                    [ae] -> [aa] / [-voiced] _ [+fric, -voiced, -front]    0.52
[r] -> ø / _ [+cons]                 [er] ins -> [ah] / [+vowel] _ [+affric]                1.0
(R dropping)                         [er] -> [ah] / l _ [+affric]                           1.0

We rediscover known rules AND automatically measure their prevalence.
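
One way to picture how a "Prob" value for a rule could be measured is a relative-frequency count over aligned phones: out of all the times a reference phone occurs in a given context, how often does the other dialect realize it differently? The sketch below is not the system described on the slide. The function name, the (reference phone, right context, surface phone) tuple layout, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def rule_probabilities(aligned_tokens):
    """Estimate P(surface phone | reference phone, right context).

    aligned_tokens: iterable of (ref_phone, right_context, surface_phone)
    tuples (a hypothetical format, not taken from the slides).
    Returns only the entries where the surface phone differs from the
    reference phone, i.e. candidate transformation rules.
    """
    context_counts = Counter()
    rule_counts = Counter()
    for ref, right, sur in aligned_tokens:
        context_counts[(ref, right)] += 1
        rule_counts[(ref, right, sur)] += 1
    return {
        rule: count / context_counts[rule[:2]]
        for rule, count in rule_counts.items()
        if rule[0] != rule[2]
    }

# Invented toy data: [ae] before [s] surfaces as [aa] in 3 of 4 tokens.
toy_tokens = [
    ("ae", "s", "aa"),
    ("ae", "s", "aa"),
    ("ae", "s", "aa"),
    ("ae", "s", "ae"),
    ("ae", "t", "ae"),
]
print(rule_probabilities(toy_tokens))
```

With real data the contexts would be feature classes such as [+fric, -voiced] rather than single phones, and the counts would come from automatically aligned speech rather than a hand-written list.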

Case 2: AAVE/non-AAVE variability
• StoryCorps: oral history collection of AAVE/non-AAVE talkers
• Simultaneous collection in 15 US cities for NPR
• 300+ speakers, 400+ hours / dialect
• Automatically identify and retrieve instances of AAVE-specific transformations (21 from Wolfram 2005)
[Figure: F-measure and Precision (recall=0.1), on a 0 to 0.5 scale, for S2: Tri PPM (standard), S3: Tri PPM (sophisticated), and A2: Tri APM (standard)]
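
For readers less familiar with the retrieval metrics named in the figure, the following minimal sketch computes precision, recall, and the balanced F-measure from a set of retrieved instances and a set of true instances. The function name and the integer IDs are hypothetical; they stand in for whatever identifies a detected transformation in a corpus and are not drawn from the StoryCorps experiments.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-measure) for two sets of item IDs."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Invented example: 4 retrieved instances, 6 true instances, 2 in common.
retrieved_ids = {1, 2, 3, 4}
relevant_ids = {2, 4, 5, 6, 7, 8}
p, r, f = precision_recall_f(retrieved_ids, relevant_ids)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

"Precision (recall=0.1)" in the figure would then be the precision read off at the operating point where the system retrieves 10% of the true instances.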

Mining data for analysis
Using the model to explore your corpus
Learned rules: uw-[l]: uw-l
Sur.:  t iy ch ih z aa r r iy l k uw
Ref.:  t iy ch er z aa r r iy l k uw l
Words: Teachers are real cool
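
The comparison between the two phone strings on this slide can be reproduced with a plain sequence alignment. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; this is an assumption for illustration, not the model used in the talk, and the function name `phone_differences` is hypothetical. The two phone strings are the ones shown on the slide.

```python
from difflib import SequenceMatcher

def phone_differences(ref_phones, sur_phones):
    """List the places where the surface phones depart from the reference.

    Each entry is (operation, reference phones, surface phones, left context),
    where left context is the reference phone just before the difference.
    """
    matcher = SequenceMatcher(None, ref_phones, sur_phones, autojunk=False)
    diffs = []
    for op, r0, r1, s0, s1 in matcher.get_opcodes():
        if op == "equal":
            continue
        left = ref_phones[r0 - 1] if r0 > 0 else None
        diffs.append((op, ref_phones[r0:r1], sur_phones[s0:s1], left))
    return diffs

ref = "t iy ch er z aa r r iy l k uw l".split()
sur = "t iy ch ih z aa r r iy l k uw".split()
for diff in phone_differences(ref, sur):
    print(diff)
# ('replace', ['er'], ['ih'], 'ch')
# ('delete', ['l'], [], 'uw')
```

The second difference is the deletion of [l] after [uw] in "cool". Running the same comparison over a whole corpus and collecting the differences is one simple way to retrieve candidate instances of a transformation for closer study.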

This is just the beginning
With more data we will be able to:
1. Characterize in-dialect speaker variability
2. Measure acoustic variability that is too subtle for categorical labeling (see [Shen 09] and [Chen/Shen 11])
3. Learn rare transformations that are difficult to observe in small data sets. [Chen 10] proposed 700+ AAVE-specific pronunciation transforms
4. Speed data analysis: find regions of dialectal difference using automatic methods
