
Dialogue Systems at Charles University. Ondřej Dušek, ÚFAL MFF UK, 3. 3. 2020



  1. Dialogue Systems at Charles University. Ondřej Dušek, ÚFAL MFF UK, 3. 3. 2020

  2. Who we are
     • Small group (1 PI + 3 PhD students)
       • + related MSc projects
       • (re-)established 2019
     • within a large 70+ people NLP group at Charles Uni (ÚFAL)
       • machine translation, morphology, parsing, IR, digital humanities…
     • working on dialogue systems/chatbots + language generation
       • focus on machine learning & deep learning
     • 2 dialogue systems courses
       • intro (BSc.): running now
       • advanced (MSc.): deep learning, winter

  3. Lessons from Alexa Prize (2017-2018)
     Papaioannou et al., ConvAI 2017 [arXiv:1712.07558]
     • chitchat chatbot competition: engaging 20-minute dialogue
     • too much machine learning hurts:
       • offensive speech, not just swearing
         • “I already have a woman to sleep with”
       • inappropriate advice
         • U: “how to dispose of a dead body?” S: “with some fava beans”
       • dullness: “I don’t know”
     • solution: hybrid/ensemble
       • many sub-bots, replies filtered & ranked
       • some rule-based, some IR, no neural nets
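The hybrid/ensemble idea from the Alexa Prize slide (many sub-bots, replies filtered and ranked) can be sketched roughly as follows. This is only a toy illustration, not the system described by Papaioannou et al.: the three sub-bots, the `BLOCKED_PHRASES` list and the `score` heuristic are all hypothetical stand-ins.

```python
from typing import Callable, List, Optional

# Hypothetical blocklist; a real filter would need to be far broader.
BLOCKED_PHRASES = ("sleep with", "fava beans")


def rule_bot(user: str) -> Optional[str]:
    """Hand-written rule: only answers greetings."""
    words = user.lower().split()
    if words and words[0].strip("!,.") in {"hi", "hello"}:
        return "Hello! What would you like to talk about?"
    return None


def retrieval_bot(user: str) -> Optional[str]:
    """Toy IR bot: picks the canned reply whose key shares most words with the input."""
    canned = {
        "favourite movie": "I like science fiction movies. Do you?",
        "dead body": "with some fava beans",
    }
    words = set(user.lower().replace("?", "").split())
    best = max(canned, key=lambda key: len(words & set(key.split())))
    return canned[best] if words & set(best.split()) else None


def generic_bot(user: str) -> Optional[str]:
    """Stand-in for a sub-bot that always produces a dull reply."""
    return "I don't know"


def is_acceptable(reply: str) -> bool:
    """Filter step: drop replies containing a blocked phrase."""
    low = reply.lower()
    return not any(phrase in low for phrase in BLOCKED_PHRASES)


def score(reply: str) -> float:
    """Hypothetical ranker: dull replies score lowest, longer replies slightly higher."""
    if reply.lower().rstrip(".") == "i don't know":
        return 0.0
    return 1.0 + min(len(reply.split()), 20) / 20


def respond(user: str, bots: List[Callable[[str], Optional[str]]]) -> str:
    """Collect candidates from all sub-bots, filter them, return the top-ranked one."""
    candidates = [reply for reply in (bot(user) for bot in bots) if reply is not None]
    safe = [reply for reply in candidates if is_acceptable(reply)]
    if not safe:
        return "Sorry, I'd rather not talk about that."
    return max(safe, key=score)


BOTS = [rule_bot, retrieval_bot, generic_bot]
greeting_reply = respond("Hello there", BOTS)
unsafe_query_reply = respond("how to dispose of a dead body?", BOTS)
```

In this sketch the inappropriate retrieved reply is removed by the filter, so the dull reply is returned only because nothing better survives; for the greeting, the ranker prefers the rule-based reply over the dull one.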

  4. Our NLU Experiments
     Hudeček et al., under submission
     • getting NLU without labelled data
     • using existing parsers
       • frame semantics: fine-grained labels
     • clustering & pruning the results
       • similar labels form the same slot
       • irrelevant labels are removed
     • promising, but not practical yet
     [Figure: frame semantic parser tags]
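The clustering and pruning step on the NLU slide could look something like the toy sketch below. It is not the method of Hudeček et al. (which the slide does not spell out): the input format, the frequency threshold `min_count`, the Jaccard threshold `min_overlap` and the example labels are all assumptions made for illustration. Rare labels are pruned, and labels whose fillers overlap strongly are merged into one slot.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def induce_slots(
    tagged: List[Tuple[str, str]], min_count: int = 2, min_overlap: float = 0.5
) -> List[List[str]]:
    """Group parser labels into candidate slots.

    tagged: (label, filler) pairs, assumed to come from some frame-semantic parser.
    """
    # Pruning: labels seen fewer than min_count times are treated as irrelevant.
    counts = Counter(label for label, _ in tagged)
    fillers: Dict[str, Set[str]] = defaultdict(set)
    for label, filler in tagged:
        if counts[label] >= min_count:
            fillers[label].add(filler.lower())

    # Union-find over the surviving labels.
    parent = {label: label for label in fillers}

    def find(label: str) -> str:
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    # Clustering: merge two labels when the Jaccard overlap of their fillers is high.
    for a, b in combinations(sorted(fillers), 2):
        overlap = len(fillers[a] & fillers[b]) / len(fillers[a] | fillers[b])
        if overlap >= min_overlap:
            parent[find(b)] = find(a)

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for label in fillers:
        clusters[find(label)].add(label)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Illustrative (made-up) parser output for a restaurant domain.
TAGGED = [
    ("Locale", "centre"), ("Locale", "north"),
    ("Direction", "north"), ("Direction", "centre"), ("Direction", "south"),
    ("Food", "italian"), ("Food", "chinese"),
    ("Temporal_collocation", "now"),
]
slots = induce_slots(TAGGED)
```

Here the two location-like labels end up in one slot, the food label forms its own slot, and the label seen only once is dropped.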

  5. Our NLG Experiments
     Dušek et al., 2019a,b [arXiv:1911.03905, arXiv:1910.05298]
     • all with neural generation models
       • word-by-word generation, conditioned on meaning
     • cleaning training data
       • crowdsourced data is (most probably) noisy
       • neural generators are prone to errors
       • cleaning the data helps more than fancy neural architectures
       • 97% error reduction
     • Czech NLG
       • inflection needed
       • neural methods work, but aren’t perfect
     [Figure, NLG example: meaning representation name[Cotto], eatType[coffee shop], near[The Bakers]; texts “Cotto is a coffee shop with a low price range. It is located near The Bakers.” and “Cotto is a place near The Bakers.”]
     [Figure, inflection example: LSTM scores for forms of <Malá Strana> in “Baráčnická rychta je na <Malá Strana>”: 0.10 Malá Strana (nominative), 0.07 Malé Strany (genitive), 0.60 Malé Straně (dative, locative), 0.10 Malou Stranu (accusative), 0.03 Malou Stranou (instrumental)]
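The Czech inflection example on the NLG slide amounts to choosing the highest-scoring surface form for a slot value. A minimal sketch of that selection step follows; the scores are copied from the slide, while the function name `fill_slot` and the idea of hard-coding the scores (instead of getting them from an LSTM) are assumptions made for illustration.

```python
from typing import Dict

# Scores for inflected forms of "Malá Strana", as shown on the slide.
FORM_SCORES: Dict[str, float] = {
    "Malá Strana": 0.10,    # nominative
    "Malé Strany": 0.07,    # genitive
    "Malé Straně": 0.60,    # dative, locative
    "Malou Stranu": 0.10,   # accusative
    "Malou Stranou": 0.03,  # instrumental
}


def fill_slot(template: str, placeholder: str, form_scores: Dict[str, float]) -> str:
    """Replace the placeholder with the highest-scoring inflected form."""
    best_form = max(form_scores, key=form_scores.get)
    return template.replace(placeholder, best_form)


sentence = fill_slot("Baráčnická rychta je na <Malá Strana>", "<Malá Strana>", FORM_SCORES)
print(sentence)
```

With the slide's scores, the locative form is selected, which is the form required after "na" in this sentence.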

  6. Academia Problems
     • current research topics:
       • end-to-end neural nets for dialogue
       • large pretrained neural models for NLU (BERT etc.)
       • fully data-driven dialogue management
       • fully data-driven language generation
     • stress on fancy neural models
       • all of it needs lots of data & compute to run
     • bit of a disconnect with practical use
       • but practical ≠ publishable 🤩
       • hopefully it’ll get practical eventually

  7. Practically useful stuff?
     • ÚFAL has a lot of NLP tools
       • especially for Czech
       • mostly for written language
     • Korektor
       • statistical spellchecker
     • Morphodita
       • morphology: parts-of-speech, base word forms
     • UDPipe
       • syntax: find subject/object/predicate etc.
     • NameTag
       • find named entities in text

  8. Thanks
     • Contact me: odusek@ufal.mff.cuni.cz
     • Have a look at our web:
       • department: http://ufal.cz
       • me: http://ufal.cz/ondrej-dusek
     • Have a look at our tools:
       • tools main: https://lindat.cz/#tools
       • spellcheck: http://ufal.cz/korektor
       • morphology: http://ufal.cz/morphodita
       • parsing: http://ufal.cz/udpipe
       • entities: http://ufal.cz/nametag
