NICT Use Cases and Requirements for New Models of Human Language to Support Mobile Conversational Systems
Chiori Hori and Teruhisa Misu Spoken Language Communication Group NICT, Japan
W3C: Workshop on Conversational Applications, June 2010
1986 2006
Advanced Telecommunications Research Institute International
Speech-to-Speech Translation + Spoken Dialog System
WFST arc labels from the dialogue diagram (input : output; eps is the empty symbol):
eps : Grt(start)
Stt(prf(spot/general)) : eps
eps : Extrct_kywd
eps : Mk_rcmdlist(kwd)
eps : Set_tgt
eps : Expln(tgt)
eps : OQ(DST)
Rqst(rcmd) : eps
eps : Set_rcmdlist
eps : Rcmd(tgt)
eps : Chck_forloop(rcmdlist)
Accept : Set_imprs(Decided)
eps : Cnfrm(dcst)
eps : Prcs4imprs(tgt)
Neutral : Keep(tgt, rcmdlist)
Decided : Mv(tgt, rcmdlist, dcsnlist)
Negative : Remove(tgt, rcmdlist)
Experienced : Remove(tgt, rcmdlist)
eps : Trnst(if_forloop_done)
eps : Stt(next_act)
eps : Trnst(if_forloop_not_end)
eps : Agree
Stt(ro_requirement) : Grt(end)
eps : Stt(prcs(rcmd))
Stt(exprnc) : Set_imprs(Experienced)
Stt(imprs(Next/Bad)) : Set_imprs(Neutral/Bad)
eps : Rspns2imprs
eps : Grt(end)
eps : Chck_rcmdlist
eps : Trnst(if_data_in_rcmdlist)
eps : Rqst(dcsn4rcmdlist)
Stt(no_prefered_tgt) : eps
eps : Trnst(if_nodata_in_rcmdlist)
Stt(prf(tgt)) : Set A as tgt Set_imprs(Positive)
eps : Set_imprs(Decided)
Good : Set_imprs(Positive)
Make a recommendation list
Recommend each spot in the list
Check user’s preference
Confirm users’ final decision
Corpus-based DM
Convert
Weighted Finite State Transducer (WFST)
input and output symbols with weights
the weights.
Slot-Filling for Origin and Destination
User input                      System response
Input          Concept tag      Action tag     Response
(none)         ε                Ask_ORG        From where?
From Osaka.    From_<city>      Fill_ORG       ε
(none)         ε                Ask_DST        To where?
To Tokyo       To_<city>        Fill_DST       ε
(none)         ε                exit
WFST arcs (input : output/weight):
ε : Ask_ORG/0
From_<city> : Fill_ORG/0
ε : Ask_DST/1
To_<city> : Fill_DST/0
ε : ε/0
ε : exit/2
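As a rough illustration of the slot-filling transducer above, the following Python sketch walks the arcs for the origin/destination dialogue. The arc labels and weights come from the slide; the state numbers, the linear topology, the omission of the ε:ε arc, and the names `ARCS` and `run_dialogue` are assumptions made for this example, since the slide text does not preserve the state layout.

```python
EPS = "ε"

# (source state, input concept tag) -> (output action tag, weight, next state)
# Labels and weights follow the slide; state numbers are assumed.
ARCS = {
    (0, EPS): ("Ask_ORG", 0, 1),
    (1, "From_<city>"): ("Fill_ORG", 0, 2),
    (2, EPS): ("Ask_DST", 1, 3),
    (3, "To_<city>"): ("Fill_DST", 0, 4),
    (4, EPS): ("exit", 2, 5),
}


def run_dialogue(concept_tags):
    """Feed user concept tags through the transducer.

    Returns the emitted action tags and the accumulated path weight.
    """
    state, actions, total = 0, [], 0
    pending = list(concept_tags)
    while True:
        if (state, EPS) in ARCS:
            # Epsilon-input arc: the system acts without consuming user input.
            action, weight, state = ARCS[(state, EPS)]
        elif pending and (state, pending[0]) in ARCS:
            # Consume one user concept tag that matches an outgoing arc.
            action, weight, state = ARCS[(state, pending.pop(0))]
        else:
            break  # no applicable arc: wait for more input or stop
        actions.append(action)
        total += weight
    return actions, total


actions, total = run_dialogue(["From_<city>", "To_<city>"])
print(actions, total)
```

With no user input, the same function stops after the first system prompt, which mirrors the system asking "From where?" and then waiting.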
* Slot handling
<word-class label="station">
  Tokyo Kyoto
</word-class>
<keyword-class label="origin">
  (station)
</keyword-class>
<keyword-class label="time">
  six seven eight nine ten eleven twelve
</keyword-class>
<keyword-class label="destination">
  (station)
</keyword-class>
<plan repeat="true">
  from,(origin) to,(destination)
</plan>
<depart>
  at,(time)
</depart>
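One possible reading of the slot-handling markup above is that a cue word ("from", "to", "at") followed by a member of a keyword class fills the corresponding slot. The Python sketch below implements only that reading; the interpretation of the comma patterns, the data structures, and the function name `fill_slots` are assumptions, and the `repeat="true"` attribute is not modeled.

```python
# Word and keyword classes copied from the markup above.
WORD_CLASSES = {"station": {"Tokyo", "Kyoto"}}
KEYWORD_CLASSES = {
    "origin": WORD_CLASSES["station"],
    "destination": WORD_CLASSES["station"],
    "time": {"six", "seven", "eight", "nine", "ten", "eleven", "twelve"},
}

# Assumed meaning of "from,(origin)" etc.: cue word -> slot filled by the next word.
PATTERNS = {"from": "origin", "to": "destination", "at": "time"}


def fill_slots(utterance):
    """Return a dict of slots found as (cue word, class member) word pairs."""
    words = utterance.replace(",", " ").replace(".", " ").split()
    slots = {}
    for cue, value in zip(words, words[1:]):
        slot = PATTERNS.get(cue.lower())
        if slot is not None and value in KEYWORD_CLASSES[slot]:
            slots[slot] = value
    return slots


slots = fill_slots("I want to go from Kyoto to Tokyo at nine")
print(slots)
```

Note that "to go" does not fill the destination slot, because "go" is not a member of the station class.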
Statistical language models for ASR need to be tuned to the current dialogue context, which is determined by the previous system prompt and the dialogue situation.
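To make this requirement concrete, the sketch below keeps one small bigram language model per previous system action and uses it to rescore competing ASR hypotheses. Everything here is illustrative: the toy training sentences, the add-one smoothing, and the names `CONTEXT_DATA`, `train_bigram`, `log_prob`, and `best_hypothesis` are assumptions for the example, not part of the presented system.

```python
import math
from collections import Counter

# Hypothetical in-context training utterances, keyed by the previous system action.
CONTEXT_DATA = {
    "Ask_ORG": ["from osaka", "from tokyo", "from kyoto"],
    "Ask_DST": ["to osaka", "to tokyo", "to kyoto"],
}


def train_bigram(sentences):
    """Count history unigrams and bigrams over whitespace-tokenized sentences."""
    uni, bi, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        uni.update(tokens[:-1])              # counts of bigram histories
        bi.update(zip(tokens, tokens[1:]))   # counts of adjacent pairs
        vocab.update(sentence.split())
    return uni, bi, len(vocab) + 1           # +1 reserves mass for unknown words


def log_prob(model, sentence):
    """Add-one smoothed bigram log-probability of a sentence."""
    uni, bi, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bi[(a, b)] + 1) / (uni[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


MODELS = {ctx: train_bigram(data) for ctx, data in CONTEXT_DATA.items()}


def best_hypothesis(previous_action, hypotheses):
    """Pick the hypothesis preferred by the LM for the current dialogue context."""
    model = MODELS[previous_action]
    return max(hypotheses, key=lambda h: log_prob(model, h))


print(best_hypothesis("Ask_ORG", ["to tokyo", "from tokyo"]))
print(best_hypothesis("Ask_DST", ["to tokyo", "from tokyo"]))
```

The same pair of hypotheses is ranked differently depending on whether the system has just asked for the origin or the destination, which is the effect the context-dependent tuning is meant to achieve.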
We need speech recognition systems that are more robust to natural language expressions; N-gram language models are one possible solution. With N-gram recognition, we then also need a framework for attaching semantic annotations to the ASR results.
To realize context-sensitive semantic annotation for SLU, we need a description format for WFSTs.