Speech Processing 15-492/18-492: Spoken Dialog Systems (SDS)


  1. Speech Processing 15-492/18-492: Spoken Dialog Systems, SDS components

  2. Spoken Dialog Systems
     - More than just ASR and TTS
       - Recognition
       - Parsing
       - Manipulation of utterances
       - Generation of new information
       - Text generation
       - Synthesis

  3. SDS Architecture

  4. SDS Internals
     - Parser
       - From words to structure
     - Dialog Manager
       - State of dialog (who is talking)
       - Direction of dialog (what next)
       - References, user profile, etc.
       - Interaction with database/internet
     - Language Generation
       - From structure to words

  5. Parsing
     - Parsing of SPEECH, not TEXT
       - “Eh, I wanna go, wanna go to Boston tomorrow”
       - “If it’s not too much trouble I’d be very grateful if one might be able to aid me in arranging my travel arrangements to Boston, Logan airport, at sometime tomorrow morning, thank you.”
       - “Boston, tomorrow”

  6. Parsing: Output structure
     - “I wanna go to Boston, tomorrow”
     - Convert speech to structure
       - Sufficient for further processing/query

  7. Phoenix Parser

     [Place]
         (carnegie mellon university)
         (downtown)
         (robinson towne center)
         (the airport)
         (south hills junction)
         (mount oliver)
         (the south side)
         (oakland)
         (bloomfield)
         (polish hill)
         (the strip district)
         (the north side)
     ;

     [NextBus]
         (*WHEN_IS *the next *BUS)
         (*WHEN_IS *the BUS after that *BUS)
     WHEN_IS
         (when is)
         (when's)
     BUS
         (bus)
         (one)
     ;

  8. Phoenix Parser
     - Parse what is important
     - Ignore other parts
     - Map known parts to useful information
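The idea of parsing only what matters and ignoring the rest can be sketched in a few lines of Python. This is a rough illustration, not the Phoenix parser itself: the phrase list is copied from the [Place] net above, the regular expression is a hand-written approximation of the [NextBus] net (treating `*` as "optional"), and the `parse` function and its frame layout are assumptions made for this example.

```python
import re

# Phrases copied from the [Place] net on the slide.
PLACES = [
    "carnegie mellon university", "downtown", "robinson towne center",
    "the airport", "south hills junction", "mount oliver",
    "the south side", "oakland", "bloomfield", "polish hill",
    "the strip district", "the north side",
]

# Hand-written approximation of the [NextBus] net; "*" items become optional.
WHEN_IS = r"(?:when is|when's)"
BUS = r"(?:bus|one)"
NEXT_BUS = re.compile(
    rf"\b(?:{WHEN_IS} )?(?:the )?next(?: {BUS})?\b"
    rf"|\b(?:{WHEN_IS} )?(?:the )?{BUS} after that(?: {BUS})?\b"
)


def parse(utterance: str) -> dict:
    """Spot known phrases and ignore everything else (hypothetical helper)."""
    # Normalize: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^a-z' ]+", " ", utterance.lower())
    text = " ".join(text.split())

    frame = {}
    for place in PLACES:
        if re.search(r"\b" + re.escape(place) + r"\b", text):
            frame.setdefault("Place", []).append(place)
    if NEXT_BUS.search(text):
        frame["NextBus"] = True
    return frame


print(parse("Uh, when's the next bus to the airport, please?"))
```

Filler such as "uh" and "please" never reaches the frame, which is the point of the slide: unknown words are skipped rather than causing the parse to fail.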

  9. Parsing vs Language Model
     - Language Model
       - Model what actually gets said
     - Parsing
       - Extract the information you want
     - Models *can* be shared
       - Only accept things in the grammar
       - Can be over-limiting

  10. Dialog Manager
      - Maintain state
        - Where are we in the dialog
        - Whose turn is it
          - Waiting for speaker
          - Waiting for database query (stall user)
        - Deal with barge-in
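The state a dialog manager tracks can be sketched as a small state machine. This is a minimal illustration under assumptions: the `Turn` states, the `DialogManager` class, and the stall phrase "One moment please." are invented for the example and are not taken from the course materials.

```python
from enum import Enum, auto


class Turn(Enum):
    SYSTEM_SPEAKING = auto()
    WAITING_FOR_USER = auto()
    WAITING_FOR_DATABASE = auto()


class DialogManager:
    """Hypothetical dialog manager that tracks whose turn it is."""

    def __init__(self):
        self.turn = Turn.SYSTEM_SPEAKING
        self.frame = {}   # information collected so far
        self.log = []     # what the system has said

    def say(self, text):
        self.turn = Turn.SYSTEM_SPEAKING
        self.log.append(text)

    def finish_prompt(self):
        # The prompt finished playing; now it is the user's turn.
        self.turn = Turn.WAITING_FOR_USER

    def on_user_speech(self, slots):
        # Speech arriving while the system is still talking is a barge-in.
        barged_in = self.turn is Turn.SYSTEM_SPEAKING
        if barged_in:
            self.log.append("<prompt cut short>")
        self.frame.update(slots)
        # Stall the user while the database query runs.
        self.turn = Turn.WAITING_FOR_DATABASE
        self.log.append("One moment please.")
        return barged_in

    def on_database_result(self, answer):
        self.say(answer)


dm = DialogManager()
dm.say("Where are you traveling to?")
print(dm.on_user_speech({"END": "Boston"}), dm.turn)
```

Keeping the turn explicit is what lets the manager tell a barge-in apart from an ordinary reply.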

  11. Language Generation
      - Query for flights to Boston
      - Template fill answer(s)
        - The next flight to DEST leaves at DEPART_TIME arriving at ARRIVE_TIME.
      - Templates may be much more complex
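Template filling as described above can be sketched with Python's standard `string.Template`. The template text comes from the slide; the `fill` helper and the sample answer values (including the arrival time) are made up for illustration.

```python
from string import Template

# Template text from the slide, with slot names marked by "$".
FLIGHT_TEMPLATE = Template(
    "The next flight to $DEST leaves at $DEPART_TIME arriving at $ARRIVE_TIME."
)


def fill(template: Template, answer: dict) -> str:
    """Fill one template with one query answer (hypothetical helper)."""
    # substitute() raises KeyError if a slot is missing from the answer.
    return template.substitute(answer)


# Illustrative query result; the values are invented for this example.
answers = [{"DEST": "Boston", "DEPART_TIME": "07:45", "ARRIVE_TIME": "09:20"}]
for answer in answers:
    print(fill(FLIGHT_TEMPLATE, answer))
```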

  12. Language Generation
      - Choose which template to use
        - Based on state, answer type
        - Natural variation
        - Statistical variation
      - Include <ssml> tags to help synthesis
        - Can <emph>emphasize</emph> parts
        - Can identify dates, numbers, etc.
      - Humans like variation in the output
        - It is rare for a human to repeat things exactly

  13. Language Generation
      - Frame structures to (marked up) text
        - START: Pittsburgh
        - END: Boston
        - DATE: 20081028
        - TIME: 07:45
        - FLIGHT: US075
      - Can generate
        - I have US 075 leaving at 07:45 tomorrow
        - US Airways has a flight departing tomorrow at 07:45
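Going from a frame to varied text can be sketched as follows. The frame values and the two output sentences come from the slide; the `generate` and `relative_day` helpers, the carrier lookup table, and the uniform random choice between templates are assumptions made for this example.

```python
import random
from datetime import date, datetime, timedelta

# The two phrasings from the slide, written as templates.
TEMPLATES = [
    "I have {carrier_code} {number} leaving at {TIME} {day}.",
    "{carrier_name} has a flight departing {day} at {TIME}.",
]

# Hypothetical lookup from carrier code to spoken name.
CARRIERS = {"US": "US Airways"}


def relative_day(yyyymmdd: str, today: date) -> str:
    """Turn a DATE slot into a phrase relative to 'today'."""
    day = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    if day == today:
        return "today"
    if day == today + timedelta(days=1):
        return "tomorrow"
    return "on " + day.strftime("%B %d, %Y")


def generate(frame: dict, today: date, rng: random.Random) -> str:
    """Pick a template at random and fill it from the frame."""
    code, number = frame["FLIGHT"][:2], frame["FLIGHT"][2:]
    fields = {
        "carrier_code": code,
        "carrier_name": CARRIERS[code],
        "number": number,
        "TIME": frame["TIME"],
        "day": relative_day(frame["DATE"], today),
    }
    # Unused fields are ignored by str.format, so one dict serves all templates.
    return rng.choice(TEMPLATES).format(**fields)


frame = {"START": "Pittsburgh", "END": "Boston", "DATE": "20081028",
         "TIME": "07:45", "FLIGHT": "US075"}
print(generate(frame, today=date(2008, 10, 27), rng=random.Random()))
```

Passing `today` and the random generator in as arguments keeps the function easy to test; a real system would likely weight the template choice by dialog state and answer type rather than choosing uniformly.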

  14. Standardized things
      - Help
        - User should be able to get help at any time
        - Explain where they are and what they are expected to say (with explicit examples)
      - Errors
        - “I didn’t understand” …
      - Confirmation
        - Did you say “Boston”?

  15. Confirmation
      - Explicit confirmation
        - System: Where are you traveling to?
        - User: Boston
        - System: Boston, did I get that right?
        - User: Yes

  16. Confirmation
      - Implicit confirmation
        - System: Where are you traveling to?
        - User: Boston
        - System: Boston, where … <can barge in>

  17. Confirmation
      - Explicit confirmation
        - Safe but slow
      - Implicit confirmation
        - Natural, but requires good support for barge-in
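The trade-off between the two strategies shows up in how many system prompts it takes to reach the next question. The sketch below is illustrative only: the `confirmation_prompts` function and the follow-up question are invented for this example.

```python
def confirmation_prompts(value: str, next_question: str, strategy: str) -> list:
    """Return the system prompts needed to confirm a value and move on."""
    if strategy == "explicit":
        # Safe but slow: a dedicated yes/no turn before the next question.
        return [f"{value}, did I get that right?", next_question]
    if strategy == "implicit":
        # Natural: echo the value inside the next prompt; the user must be
        # able to barge in if the echoed value is wrong.
        return [f"{value}. {next_question}"]
    raise ValueError(f"unknown strategy: {strategy}")


question = "When do you want to travel?"  # hypothetical follow-up question
for strategy in ("explicit", "implicit"):
    print(strategy, confirmation_prompts("Boston", question, strategy))
```

Explicit confirmation costs one extra exchange per slot, while implicit confirmation saves that exchange but only works if the recognizer can handle a correction spoken over the prompt.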

  18. Grounding
      - Showing evidence the system understands
        - System: Where are you traveling to?
        - User: Boston.
        - System: Right. Where ….
        - System (alternative): Boston, right. Where ….

  19. Designing Prompts
      - Constrain your questions:
        - How may I help you?
          - Long story reply
        - What bus number would you like schedules for?
          - Expect bus number replies
