Preliminary Meeting of the NLP Lab Course SS2020 - Master Lab Course

SLIDE 1

Research Group Social Computing Department of Informatics Technical University of Munich

Preliminary Meeting of the NLP Lab Course SS2020

Master Lab Course - Machine Learning for Natural Language Processing Applications (IN2106, IN4249) Gerhard Hagerer, M.Sc., Monika Wintergerst, M.Sc., Maximilian Wich, M.Sc., PD Dr. Georg Groh

Research Group Social Computing, Department of Informatics, Technical University of Munich

31.01.2020

SLIDE 2

Outline

1 Requirements
2 Registration
3 Procedure
4 Domains

− Opinion Mining
− Virtual Dietary Advisor
− Hate Speech Detection

SLIDE 3

Requirements

Minimum:

  • Master's student in computer science, data engineering, or a similar field
  • Sufficient English skills
  • Basic programming and machine learning knowledge

Important:

  • Hands-on experience in Python, pandas, NumPy, and SciPy
  • Basic knowledge about artificial neural networks
  • Basic knowledge about natural language processing

Optimal:

  • Practical experience with deep learning frameworks such as PyTorch, TensorFlow, Theano, or Keras

SLIDE 4

Registration

  • By 12 Feb, send an email to ghagerer@mytum.de, maximilian.wich@tum.de, and monika.wintergerst@tum.de containing

− the subject "NLP Lab Course Registration - Domain X",
− your CV,
− your transcript of records,
− a ranked list (1., 2., 3.) of the domains you are interested in,
− a motivational statement (one paragraph).

  • This email is taken into account when ranking the students interested in the course.
  • By 12 Feb, you also have to register for the course in the matching system.
  • By 20 Feb, the matching system will (probably) notify you about your participation status.
  • By the end of February, I will inform you about the available topics.

SLIDE 5

Email template

Only emails following this format will be considered:

To: ghagerer@mytum.de; maximilian.wich@tum.de; monika.wintergerst@tum.de
Subject: NLP Lab Course Registration - Domain 1
Text:

Hi,

I would like to participate in the NLP Lab Course. My domain priorities are:

  1. Domain 1
  2. Domain 2
  3. Domain 3

Motivation: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. (max. 300 characters)

Relevant skills/experiences: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. (max. 300 characters)

Best,
Your Name

CV.pdf
Transcript.pdf

SLIDE 6

Procedure

Project teams:

  • You will work in teams of two or three people on one project topic.
  • You can choose whom you work with on the project topic.
  • Every team member has to contribute and report equally (no dirty business!).

Procedure:

  • There will be one kickoff meeting at the beginning of the semester.
  • There will be biweekly consultation and progress report sessions.
  • You have to give a final project presentation and submit a report at the end of the semester.

Everything else will be announced at the beginning of the semester.

SLIDE 7

Domains

The course consists of three parts that are independent of each other in both content and organization.

  • Opinion Mining – Gerhard Hagerer
  • Virtual Dietary Advisor – Monika Wintergerst
  • Hate Speech Detection – Maximilian Wich

In your registration email, tell us which domain you find most interesting by ranking all three domains from 1 (most interesting) to 3 (least interesting). Please do not forget to mention your preferred domain in the subject line!

SLIDE 8

Domains – Opinion Mining

Gerhard Hagerer, M.Sc.

Our case study:

  • Research Questions:

− What are the consumer beliefs on organic food expressed in social media?
− In cooperation with TUM School of Management, Chair of Marketing and Consumer Research

[Diagram: Internet -> Web Crawler (relevance filter) -> Review Collection -> Opinion Mining System -> Aspects + Sentiments]

SLIDE 9

Domains – Opinion Mining

Gerhard Hagerer, M.Sc.

Methodology:

  • Deep pre-trained embedding models

− BERT, Universal Sentence Encoder, GloVe, fastText, word2vec, ...
− Semantically aligned multi-lingual embeddings (XLING, MUSE, ...)
− Derive meaningful document representations from these.

  • Clustering techniques

− Optimization of semantic coherence
− Density-based vs. convex clustering

  • Tasks

− Unsupervised Aspect Extraction
− Sentiment Analysis
− Neural Topic Modelling
− Cluster visualization
− Class coherence and overlap analysis
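
One simple way to derive a document representation from word embeddings is mean pooling: average the vectors of the words in a text, then compare documents by cosine similarity. The following is only a minimal sketch under stated assumptions. TOY_VECTORS is a hypothetical three-dimensional stand-in for a real pre-trained model such as GloVe or fastText, and the function names embed_document and cosine are made up for this illustration.

```python
import math

# Hypothetical toy vectors; in practice these would be loaded from a
# pre-trained embedding model (e.g., GloVe or fastText).
TOY_VECTORS = {
    "organic": [0.9, 0.1, 0.0],
    "food": [0.8, 0.2, 0.1],
    "healthy": [0.7, 0.3, 0.0],
    "price": [0.1, 0.9, 0.2],
    "expensive": [0.0, 0.8, 0.3],
}


def embed_document(text, vectors=TOY_VECTORS):
    """Mean-pool the vectors of all known words; None if no word is known."""
    known = [vectors[word] for word in text.lower().split() if word in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equally long vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


reviews = [
    "organic food is healthy",
    "healthy organic food",
    "the price is expensive",
]
doc_vectors = [embed_document(review) for review in reviews]

# The first two reviews share the same known words, so they are close to 1.
print(cosine(doc_vectors[0], doc_vectors[1]))
# The third review is about a different aspect, so the similarity is lower.
print(cosine(doc_vectors[0], doc_vectors[2]))
```

Document vectors of this kind could then serve as input to a clustering algorithm. Contextual models such as BERT or the Universal Sentence Encoder produce sentence vectors directly and would replace the averaging step.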

[Figure: Samples by nationality]

SLIDE 10

Domains – Virtual Dietary Advisor

Monika Wintergerst, M.Sc.

SLIDE 11

Domains – Virtual Dietary Advisor

Monika Wintergerst, M.Sc.

Areas of interest:

  • Substitute recommendation

− Motivation: make favorite dishes healthier through small changes
− Use recipe texts and ontological knowledge
− Identify similar ingredient alternatives

  • Dialog interaction

− Motivation: emulate a real dietician
− Detect a user's state of mind and react empathically
− Encourage self-reflection and mindfulness
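
For the substitute recommendation area, one conceivable baseline is to treat ingredients that appear alongside the same other ingredients as candidate substitutes. The sketch below is an assumption-laden illustration, not the method used in the course. RECIPES is hypothetical toy data, and context_vectors, cosine, and substitutes are names invented for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy recipes, each given as a set of ingredient names.
RECIPES = [
    {"butter", "flour", "sugar", "egg"},
    {"margarine", "flour", "sugar", "egg"},
    {"butter", "bread", "jam"},
    {"margarine", "bread", "jam"},
    {"olive oil", "tomato", "garlic", "pasta"},
]


def context_vectors(recipes):
    """Count, for every ingredient, which other ingredients it co-occurs with."""
    vectors = defaultdict(Counter)
    for recipe in recipes:
        for ingredient in recipe:
            for other in recipe:
                if other != ingredient:
                    vectors[ingredient][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def substitutes(ingredient, recipes, top_n=3):
    """Rank other ingredients by how similar their co-occurrence context is."""
    vectors = context_vectors(recipes)
    target = vectors[ingredient]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != ingredient
    ]
    scored.sort(reverse=True)
    return [(name, round(score, 2)) for score, name in scored[:top_n] if score > 0]


# Margarine shares butter's full context in the toy data and ranks first.
print(substitutes("butter", RECIPES))
```

Co-occurrence similarity only says that two ingredients are used in similar dishes. Whether a substitute is actually healthier would have to come from additional knowledge, for example the ontological knowledge mentioned above.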

SLIDE 12

Domains – Hate Speech Detection | Overview

Maximilian Wich, M.Sc.

  • Improve machine learning models for hate speech detection (e.g., integrating social media, identifying other relevant features besides text)
  • Make predictions of machine learning models more transparent (explainable AI)

SLIDE 13

Domains – Hate Speech Detection | Topics

Maximilian Wich, M.Sc.

Potential topics/ideas:

  • Multitask learning to combine data sets with different labeling schemes

− Problem: there are many hate speech data sets, but they use different labeling schemes
− Idea: train a multitask classifier (e.g., BERT) with shared layers based on several data sets

  • Learning from weak supervision to increase the amount of training data without manual labeling

− Problem: we do not have enough training data
− Idea: train classifiers on available data, collect new data with these classifiers, and retrain the classifiers

  • Classify hate speech based on stylistic elements (e.g., POS, usage of emojis, ...)

− Problem: implicit hate speech is often hard to identify
− Idea: use stylistic elements to find patterns in hate speech and train a classifier
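
The weak-supervision idea above (train on the available data, label new data with the classifier, retrain) is often called self-training. The sketch below illustrates the loop with a tiny Naive Bayes text classifier written from scratch. It is a toy under stated assumptions, not the course's approach: the example texts are harmless placeholders, label 1 stands for "offensive" and 0 for "neutral", and the functions train, predict_proba, and self_train as well as the threshold value are made up for this illustration.

```python
import math
from collections import Counter


def train(examples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs, labels 0/1."""
    word_counts = {0: Counter(), 1: Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, label_counts, vocab


def predict_proba(model, text):
    """Return the model's probability that the text has label 1."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        # Smoothed prior, so a label without examples does not cause log(0).
        score = math.log((label_counts[label] + 1) / (total + 2))
        for word in text.lower().split():
            if word in vocab:  # unknown words are ignored
                count = word_counts[label][word]
                score += math.log((count + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])


def self_train(labeled, unlabeled, threshold=0.8, rounds=3):
    """Repeatedly add confidently predicted texts to the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, uncertain = [], []
        for text in pool:
            p = predict_proba(model, text)
            if p >= threshold:
                confident.append((text, 1))
            elif p <= 1 - threshold:
                confident.append((text, 0))
            else:
                uncertain.append(text)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = uncertain
    return train(labeled), labeled


seed = [
    ("you are awful and stupid", 1),
    ("awful stupid people", 1),
    ("have a nice day", 0),
    ("what a nice sunny day", 0),
]
pool = ["stupid awful weather", "nice day today", "weather people"]
model, grown = self_train(seed, pool)
print(grown[len(seed):])  # the pseudo-labeled texts added by self-training
```

Self-training can reinforce the classifier's own mistakes, because wrong pseudo-labels are fed back as training data. A high confidence threshold and manual spot checks of the added examples are common safeguards.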

SLIDE 14

Questions?
