
Natural Language Processing & Machine Learning for Speech
Paula Buttery, Andrew Caines, Helen Yannakoudakis; NLIP Group, Dept. Computer Science & Technology, Cambridge.
Source: https://commons.wikimedia.org/wiki/File:Spectrogram-19thC.png

Overview
● Speech vs Writing: Speech and writing share some commonalities but exhibit many differences, not just in mode of transmission, but form, construction and grammar
● Speech processing: How to treat transcriptions of speech so that we can apply natural language processing techniques: more training data, test data normalisation, domain adaptation
● Speech scoring: Example of NLP and machine learning for speech applications: automated assessment

Speech vs Writing Andrew Caines Source: https://commons.wikimedia.org/wiki/File:Spectrogram-19thC.png

Characteristics of Speech
● Speech is very different from writing
  ○ mode of transmission
  ○ phonetics, prosody, gesture (including sign language)
  ○ put these aside for now: consider the aspects we can examine in transcriptions
  ○ i.e. the lexis, morphology, syntax, semantics, pragmatics, discourse
  ○ note that the default speech mode involves interaction, no editing, multimodal grounding, background noise, facial expression and gesture
● Background reading:
  ○ Carter & McCarthy, 2017, ‘Spoken Grammar: Where are we and where are we going?’ Applied Linguistics.
  ○ Lau, Clark & Lappin, 2017, ‘Grammaticality, Acceptability, and Probability: A probabilistic view of linguistic knowledge.’ Cognitive Science.
  ○ Plank, 2016, ‘What to do about non-standard (or non-canonical) language in NLP.’ KONVENS.
  ○ Jurafsky & Martin, 2nd edn., Ch. 9 & 10.

Characteristics of Speech
● Spoken corpus examples
  ○ um he’s a closet yuppie is what he is (Leech 2000)
  ○ I played, I played against um (Leech 2000)
  ○ You’re happy to -- welcome to include it (Levelt 1989)
● British National Corpus conversations:
  ○ Oi you, he's playing with your
  ○ Oh let's have a, is it in there?
  ○ (unclear) no
  ○ (pause) right, we'll have another cup of tea and then we'll have that nice cake
  ○ https://corpus.byu.edu/bnc [KGC]
● BBC News ‘Brexit: May to make plea to MPs for time to change deal’ https://www.bbc.co.uk/news/uk-47187491
  ○ Prime Minister Theresa May will ask MPs to give her more time to secure changes to the controversial part of her Brexit deal - the Northern Irish backstop. Mrs May is due to report back to MPs this week, after trying to persuade the EU to make last-minute changes. Labour wants to hold Mrs May to her word and make sure the vote is held. The shadow Brexit secretary, Sir Keir Starmer, has said Labour has drafted an amendment which, if passed this week, would guarantee a vote by the end of the month.

Characteristics of Speech
● BBC Radio 4 In Touch: Navigating University https://www.bbc.co.uk/sounds/play/m0001f1d (2:20)
  ○ Megan: When I started as an undergraduate, I’d chosen the University of Gloucestershire and when I went on the open days they were the only university who gave me a prospectus in braille. I was so made up. It was interesting because I actually applied two years in advance because I took a year out to go and teach English in Germany. And by the time I came back, all the disability staff who were clued up seemed to have gone or moved on and the disability department was completely different. That was the only thing I got in braille, pretty much, the seven years I was there. So, they hooked me in and then yeah …
  ○ White: And didn’t really follow through.
  ○ Megan: No and the sad thing was, as well, I’d emailed the disability department just before I got on the plane to Germany and I said – please, could you make the lecturers aware that I’m registered blind so that we can start those discussions early with two years to go. And when I started at the university I walked in to my lectures and I was met with dismay, indifference and my lecturers had no clue about me arriving at all.

Characteristics of Speech
● Speech is very different from writing
● Even when viewed in writing
● (vice versa: imagine hearing written text read aloud, as in speeches, prayers, old-school conference papers)
● Become an observer!
● Problems for NLP:
  ○ Disfluencies
  ○ Tendency for long coordinated structures / Speech-unit delimitation
  ○ Overlap, interruption, subject-less structures, verb-less structures, acceptability, appropriateness and clarity over absolute grammaticality, incomplete propositions
  ○ Co-construction, multimodal physical context, background inter-personal relations & common ground
  ○ Creativity and language play

NLP of speech ● Caines, McCarthy, Buttery, SCNLP 2017

NLP of speech
● Caines & Buttery, SANCL 2014
A = ‘as is’; B = less disfluency; C = less morpho-syntactic error; D = less lexical error

What to do about speech
● Annotate more data
  ○ e.g. Switchboard, British National Corpus, CrowdED, ...
● Bring training and test data closer together: i.e. ‘normalisation’ of speech to written-like form
  ○ e.g. Moore et al 2015, 2016 https://aclanthology.info/papers/C16-1075/c16-1075
● Domain adaptation
  ○ e.g. Daumé III 2009, 2010 https://aclanthology.info/papers/W10-2608/w10-2608
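To make the domain adaptation option concrete, here is a minimal sketch of feature augmentation, a simple scheme described in Daumé III's domain adaptation work. That the lecture had this particular scheme in mind is an assumption, as are the function name `augment`, the choice of written text as the source domain and speech transcripts as the target domain, and the toy feature values.

```python
def augment(features, domain):
    """Copy a feature vector into <shared, source-only, target-only> blocks.

    A downstream linear model can then learn one weight per feature that is
    shared across domains and one that is specific to each domain.
    """
    zeros = [0.0] * len(features)
    if domain == "source":   # assumed here: written text
        return features + features + zeros
    if domain == "target":   # assumed here: speech transcripts
        return features + zeros + features
    raise ValueError("domain must be 'source' or 'target'")


# Toy, made-up feature values for one written and one spoken example.
written_example = augment([1.0, 0.0], "source")
spoken_example = augment([0.0, 1.0], "target")
print(written_example)
print(spoken_example)
```

The augmented vectors can be passed to any standard classifier unchanged, which is why the approach needs no changes to the learning algorithm itself.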

The normalisation approach
SOUND → ASR → SEGMENT → CLEAN → NLP
● SOUND: Input speech stream
● ASR: Transcription through automatic speech recognition
● SEGMENT: Speech-unit delimitation
● CLEAN: Detect and remove disfluencies
● NLP: Proceed as normal with all the NLP
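The SEGMENT and CLEAN stages can be illustrated with a toy sketch that starts from an existing transcript (the ASR stage is skipped). Everything here is an assumption made for illustration, not the method used in the lecture: the filled-pause list, splitting speech units on a transcribed `(pause)` marker, treating an immediately repeated word sequence as a disfluency, and the function names `segment`, `clean` and `normalise`.

```python
import re

# Assumed, deliberately small list of filled pauses.
FILLED_PAUSES = {"um", "uh", "er", "erm"}


def segment(transcript):
    """Naive speech-unit delimitation: split on transcribed '(pause)' markers."""
    units = re.split(r"\(pause\)", transcript)
    return [u.strip() for u in units if u.strip()]


def clean(unit):
    """Drop filled pauses and immediate repetitions of up to three words.

    Punctuation is discarded during tokenisation, so the output is a plain
    space-separated word sequence.
    """
    tokens = [t for t in re.findall(r"[A-Za-z'’]+", unit)
              if t.lower() not in FILLED_PAUSES]
    out = []
    i = 0
    while i < len(tokens):
        skipped = False
        for n in (3, 2, 1):
            first = [t.lower() for t in tokens[i:i + n]]
            second = [t.lower() for t in tokens[i + n:i + 2 * n]]
            if len(second) == n and first == second:
                i += n          # skip the first copy, keep the repeat
                skipped = True
                break
        if not skipped:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def normalise(transcript):
    """SEGMENT then CLEAN; units that end up empty are discarded."""
    cleaned = [clean(unit) for unit in segment(transcript)]
    return [c for c in cleaned if c]


print(normalise("I played, I played against um"))  # ['I played against']
```

Real disfluency detection is usually treated as a learned sequence-labelling problem; the rules above only show where such a component would sit in the pipeline.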

Speech Processing Paula Buttery Source: https://commons.wikimedia.org/wiki/File:Spectrogram-19thC.png

Which grammar should we use?
Let’s consider grammars you’ve encountered:
● Phrase Structure Grammars
● Dependency Grammars
● Categorial Grammars
● Feature Structure Grammars

Phrase Structure Grammars

Dependency Grammars

Categorial Grammar

Feature Structure Grammars

Parsing can be informed by extra linguistic info

Parsing can be informed by features

Parsing can be informed by extra linguistic info

Speech-unit delimitation
