Data-driven Approaches for Detection of Antisocial Behavior



slide-1
SLIDE 1

PyData Bratislava

22nd May 2019

Data-driven Approaches for Detection of Antisocial Behavior

Veronika Žatková, Ivan Srba, Róbert Móro (FIIT STU)

slide-2
SLIDE 2

WHO ARE WE?

Ivan and Róbert - Researchers @FIIT STU
Veronika - Master student @FIIT STU

Our topics of interest:

▪ Data science
▪ Machine learning
▪ Data mining
▪ Computational social science
▪ Social computing

slide-3
SLIDE 3


Source: https://kinsta.com/blog/wordpress-social-media-plugins/

slide-4
SLIDE 4


Source: https://www.wsj.com/articles/scholars-get-the-real-scoop-on-fake-news-1515360315

slide-5
SLIDE 5


Source: https://www.poynter.org/fact-checking/2019/is-expert-crowdsourcing-the-solution-to-health-misinformation/

slide-6
SLIDE 6


Source: https://www.edutopia.org/blog/how-respond-when-students-use-hate-speech-richard-curwin

slide-7
SLIDE 7

How can DATA SCIENCE help to characterize, detect and mitigate such antisocial behavior?

slide-8
SLIDE 8

WHAT ARE WE WORKING ON?

Two research projects:

  • Antisocial behavior in general

  • Medical misinformation

https://rebelion.fiit.stuba.sk/

Cooperation:

slide-9
SLIDE 9

Antisocial behavior

Data science perspective

slide-10
SLIDE 10

ANTISOCIAL BEHAVIOR


slide-11
SLIDE 11

TASKS

Characterization

▪ what characterizes or distinguishes, e.g., fake news from true news, how does it spread, and who shares it?

Detection

▪ how can we automatically detect fake news, hate speech, etc.?

Mitigation

▪ how can we stop, e.g., the spread of fake news in a transparent, trustworthy, ethical way?

slide-12
SLIDE 12

TECHNIQUES

▪ Machine learning
▪ Data mining
▪ Natural language processing
▪ Neural networks and deep learning

slide-13
SLIDE 13

OPEN PROBLEMS

Exploiting content, user and context data

▪ Multisource approaches
▪ Multimodal approaches
▪ Multilingual approaches
▪ Extended context

slide-14
SLIDE 14

OPEN PROBLEMS


Addressing unlabelled and dynamic data

▪ Unsupervised, semi-supervised and ensemble models (e.g. multiview learning)
▪ Active learning
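
One common form of active learning is uncertainty sampling: when labelling effort is limited, ask annotators to label the items the current model is least sure about. The sketch below is only an illustration of that idea, not the approach used by the authors; the function name, the toy scores and the article identifiers are all hypothetical.

```python
from typing import Callable, Sequence


def select_for_labelling(
    items: Sequence[str],
    predict_proba: Callable[[str], float],
    budget: int,
) -> list[str]:
    """Return the `budget` items whose predicted probability is closest to 0.5."""
    # Distance from 0.5 measures confidence: a smaller distance means less certain.
    ranked = sorted(items, key=lambda item: abs(predict_proba(item) - 0.5))
    return ranked[:budget]


# Hypothetical scores standing in for a trained misinformation classifier.
toy_scores = {
    "article-a": 0.97,
    "article-b": 0.52,
    "article-c": 0.08,
    "article-d": 0.41,
}

to_label = select_for_labelling(list(toy_scores), toy_scores.__getitem__, budget=2)
print(to_label)
```

The two selected items are the ones nearest the decision boundary; once labelled, they would be added to the training set and the model retrained.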

slide-15
SLIDE 15

OPEN PROBLEMS


Investigating new mitigation approaches

▪ Early warning system
▪ On-site warning system
▪ Education and training

slide-16
SLIDE 16

OPEN PROBLEMS

No suitable content-rich and benchmark datasets

No suitable applications and platforms to deploy solutions

slide-17
SLIDE 17

Monant platform

Platform for monitoring antisocial behavior

slide-18
SLIDE 18


slide-19
SLIDE 19

IMPLEMENTATION

Primary implementation language: Python

DevOps:

▪ Docker
▪ Travis CI

slide-20
SLIDE 20

CENTRAL DATA STORAGE

Mediates data transfer between all platform modules

Three layers:

▪ Evidence layer
▪ Inference and prediction layer
▪ Platform management layer

slide-21
SLIDE 21

CENTRAL DATA STORAGE

Mediates data transfer between all platform modules

Implementation:

▪ Flask
▪ PostgreSQL
▪ REST APIs + Apistrap + Schematics
▪ Swagger

http://flask.pocoo.org/
https://github.com/Cognexa/apistrap
https://schematics.readthedocs.io/en/latest
https://swagger.io/

slide-22
SLIDE 22

WEB MONITORING

Crawls and parses data from various data sources by means of data providers

Data sources:

▪ News sites
▪ Fact-checking sites
▪ Social networks
▪ Existing datasets

Event-based architecture

Supports scheduling
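
To make "event-based architecture" and "supports scheduling" concrete, here is a minimal in-process sketch using only the Python standard library. It is not how Monant is implemented (the slides name Celery and RabbitMQ for that); the `EventBus` and `Scheduler` classes, the event names and the example URLs are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Callable


class EventBus:
    """Delivers each published event to every handler subscribed to its type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in list(self._handlers[event_type]):
            handler(payload)


class Scheduler:
    """Runs jobs in order of their due time, using a simulated clock."""

    def __init__(self) -> None:
        self._queue: list[tuple[float, int, Callable[[], None]]] = []
        self._counter = 0  # tie-breaker so jobs themselves are never compared

    def schedule(self, due: float, job: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (due, self._counter, job))
        self._counter += 1

    def run_until(self, now: float) -> None:
        while self._queue and self._queue[0][0] <= now:
            _, _, job = heapq.heappop(self._queue)
            job()


bus = EventBus()
stored: list[str] = []


def crawl(payload: dict) -> None:
    # Hypothetical crawl step: pretend exactly one article was found per source.
    bus.publish("article_found", {"url": payload["source"] + "/article-1"})


bus.subscribe("source_due", crawl)
bus.subscribe("article_found", lambda article: stored.append(article["url"]))

scheduler = Scheduler()
scheduler.schedule(0, lambda: bus.publish("source_due", {"source": "https://news.example"}))
scheduler.schedule(60, lambda: bus.publish("source_due", {"source": "https://factcheck.example"}))

scheduler.run_until(30)  # only the job due at t=0 has run so far
print(stored)
```

The crawler and the storage handler never call each other directly; they only share event names, which is what lets new data providers be added without touching existing ones.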

slide-23
SLIDE 23

WEB MONITORING

Crawls and parses data from various data sources by means of data providers

Data providers:

▪ Site-specific crawlers and parsers
▪ RSS feeds
▪ News site generic crawler and parser
▪ News API (https://newsapi.org/)

Chaining of data providers (e.g., an RSS feed followed by a site-specific parser)
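
Chaining can be pictured as one provider feeding its output to the next: an RSS provider discovers article URLs, and a site-specific parser then extracts the body of each page. The sketch below uses only the standard library instead of the Feedparser or Beautiful Soup libraries named in the talk; the feed, the pages, the function names and the `fetch` stand-in (used in place of real HTTP requests) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Callable, Iterable, Iterator

RSS_SAMPLE = """<rss version="2.0"><channel>
<item><title>First story</title><link>https://example.org/a</link></item>
<item><title>Second story</title><link>https://example.org/b</link></item>
</channel></rss>"""

# Stand-in for HTTP fetching, so the example runs offline.
PAGES = {
    "https://example.org/a": "<html><body><h1>First story</h1><p>Body A.</p><p>More A.</p></body></html>",
    "https://example.org/b": "<html><body><h1>Second story</h1><p>Body B.</p></body></html>",
}


def rss_provider(feed_xml: str) -> Iterator[dict]:
    """First provider in the chain: yield title and URL for each feed item."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        yield {"title": item.findtext("title"), "url": item.findtext("link")}


class ParagraphParser(HTMLParser):
    """Collects the text of every <p> element on a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_paragraph = False
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self.paragraphs[-1] += data


def site_parser(entries: Iterable[dict], fetch: Callable[[str], str]) -> Iterator[dict]:
    """Second provider in the chain: enrich each entry with the article body."""
    for entry in entries:
        parser = ParagraphParser()
        parser.feed(fetch(entry["url"]))
        yield {**entry, "body": " ".join(parser.paragraphs)}


articles = list(site_parser(rss_provider(RSS_SAMPLE), PAGES.__getitem__))
for article in articles:
    print(article["title"], "|", article["body"])
```

Because both providers are generators with the same dictionary shape, a different second stage (for example a generic news parser) could be swapped in without changing the first.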

slide-24
SLIDE 24

WEB MONITORING

Crawls and parses data from various data sources by means of data providers

Implementation:

▪ Scrapy library
▪ Beautiful Soup library
▪ Newspaper library
▪ Feedparser library
▪ Celery + RabbitMQ + Flower

https://scrapy.org/
https://www.crummy.com/software/BeautifulSoup/
https://github.com/codelucas/newspaper/tree/master/newspape
https://github.com/kurtmckee/feedparser

slide-25
SLIDE 25

PLATFORM MANAGEMENT

Manages the data flows between all platform modules

Web monitoring management:

▪ Monitors (e.g. “Monitoring of health misinformation in Europe”)

Data storage management:

▪ Access control to central data storage

slide-26
SLIDE 26

PLATFORM MANAGEMENT

Manages the data flows between all platform modules

Implementation:

▪ Django
▪ Flask-JWT (not implemented yet)

https://www.djangoproject.com/
https://flask-jwt-extended.readthedocs.io/en/latest/

slide-27
SLIDE 27


slide-28
SLIDE 28

AI CORE

Makes it easy to extend the platform with a wide variety of data-driven methods

User and domain modeling methods:

▪ Derive and maintain user and content characteristics
▪ Sources and their trust, authors’ credibility, ...

Prediction methods:

▪ Characterize and detect antisocial behavior

slide-29
SLIDE 29

AI CORE

Makes it easy to extend the platform with a wide variety of data-driven methods

Implementation:

▪ Independent of the platform
▪ Central storage allows easy data exchange between methods
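
One way to read these two points: each method only needs to know how to read from and write to the central storage, so methods can be added or replaced without knowing about each other. The sketch below illustrates that reading with an in-memory stand-in for the storage; the class names, record kinds and threshold are hypothetical and are not taken from the Monant code base.

```python
from typing import Protocol


class CentralStorage:
    """In-memory stand-in for the central data storage (the real one is a REST service)."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, object]] = {}

    def put(self, kind: str, key: str, value: object) -> None:
        self._records.setdefault(kind, {})[key] = value

    def get_all(self, kind: str) -> dict[str, object]:
        return dict(self._records.get(kind, {}))


class Method(Protocol):
    """Anything with a run(storage) method can be plugged into the AI core."""

    def run(self, storage: CentralStorage) -> None: ...


class SourceArticleCounter:
    """Hypothetical domain-modeling method: counts stored articles per source."""

    def run(self, storage: CentralStorage) -> None:
        counts: dict[str, int] = {}
        for article in storage.get_all("article").values():
            counts[article["source"]] = counts.get(article["source"], 0) + 1
        for source, count in counts.items():
            storage.put("source_stats", source, {"article_count": count})


class ProlificSourceFlagger:
    """Hypothetical second method that reads only what the first one stored."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold

    def run(self, storage: CentralStorage) -> None:
        for source, stats in storage.get_all("source_stats").items():
            storage.put("flag", source, stats["article_count"] >= self.threshold)


storage = CentralStorage()
storage.put("article", "1", {"source": "site-a"})
storage.put("article", "2", {"source": "site-a"})
storage.put("article", "3", {"source": "site-b"})

pipeline: list[Method] = [SourceArticleCounter(), ProlificSourceFlagger(threshold=2)]
for method in pipeline:
    method.run(storage)

print(storage.get_all("flag"))
```

The second method never imports the first; the only contract between them is the record kind they agree on in storage.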

slide-30
SLIDE 30

END-USER SERVICES

Serve as an interface for experts (e.g., journalists) and the general public

Examples:

▪ Real-time monitoring and visualization tool
▪ URL and user history verifier
▪ Education and training tool

slide-31
SLIDE 31


The first prototype of Monant was developed by a team of our students

slide-32
SLIDE 32


Source: https://patientengagementhit.com/news/patient-access-to-preventive-care-key-for-cancer-care-equity

slide-33
SLIDE 33


Source: https://www.cancer.news/2019-04-24-green-coffee-blueberries-tomatoes-strawberries-have-chlorogenic-acid.html

NATURAL NEWS NETWORK

slide-34
SLIDE 34

CASE STUDY - HEALTHCARE MISINFORMATION

Task: To characterize the number of misinformative articles containing false claims related to cancer treatment

Data providers:

▪ Custom crawlers and parsers of the Natural News network
▪ Additional data providers to be used:
  • badatel.net
  • RSS parser
  • Newspaper crawler and parser
  • News API

slide-35
SLIDE 35

CASE STUDY - HEALTHCARE MISINFORMATION


Articles: 40,198 news articles from 23 sites

slide-36
SLIDE 36

CASE STUDY - HEALTHCARE MISINFORMATION

Claims: 139 cancer "treatments"

Source of claims: https://docs.google.com/spreadsheets/d/1EyhHFv2WswRNrFZ-O6SjF5m_9EhnV6zCZ0RdSX5TtFM/edit#gid=0

slide-37
SLIDE 37

CASE STUDY - HEALTHCARE MISINFORMATION

Mapping: 6,222 news articles (15.5%) contain at least one cancer “treatment” claim

▪ The average number of claims per article is 1.93
▪ The maximum number of claims in a single article was 9
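
The slides do not say how claims were matched to articles. As a rough sketch of how such figures could be computed, the code below assumes the simplest possible matcher, a case-insensitive substring search for each claim name, and derives the same kinds of statistics on toy data. The function names and the toy articles are hypothetical; the last line only re-checks the reported 15.5% from the two counts given in the talk.

```python
from statistics import mean


def map_claims(articles: dict[int, str], claims: list[str]) -> dict[int, set[str]]:
    """Map article id to the set of claims it mentions (assumed: substring match)."""
    mapping: dict[int, set[str]] = {}
    for article_id, text in articles.items():
        lowered = text.lower()
        found = {claim for claim in claims if claim.lower() in lowered}
        if found:
            mapping[article_id] = found
    return mapping


def summarize(mapping: dict[int, set[str]], total_articles: int) -> dict[str, float]:
    """Compute share of articles with a claim, plus average and maximum claims per article."""
    counts = [len(found) for found in mapping.values()]
    return {
        "articles_with_claim": len(counts),
        "share_percent": round(100 * len(counts) / total_articles, 1),
        "avg_claims": round(mean(counts), 2),
        "max_claims": max(counts),
    }


# Toy data, invented for this example only.
toy_articles = {
    1: "Antioxidants and herbalism are praised in this piece.",
    2: "A report on local weather.",
    3: "Superfood smoothies every morning.",
    4: "City council meeting notes.",
}
toy_claims = ["antioxidants", "herbalism", "superfood"]

stats = summarize(map_claims(toy_articles, toy_claims), len(toy_articles))
print(stats)

# Consistency check of the reported share: 6,222 of 40,198 articles.
reported_share = round(100 * 6222 / 40198, 1)
print(reported_share)
```

Substring matching is crude (it misses paraphrases and can over-match short names), so it should be read as a baseline rather than as the method behind the reported numbers.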


slide-38
SLIDE 38

CASE STUDY - HEALTHCARE MISINFORMATION

The most frequent claims:

▪ Antioxidants (2,459 articles)
▪ Herbalism (1,715 articles)
▪ Poly-MVA (Lipoic Acid Mineral Complex, 723 articles)
▪ Superfood (609 articles)

slide-39
SLIDE 39

CONCLUSIONS

Monant addresses a lack of datasets and suitable platforms.

There is still a problem of missing labelled data.
slide-40
SLIDE 40

CONCLUSIONS

More interesting problems (e.g., automatic detection) lie ahead of us.

We have some first results in fake news detection that we plan to deploy to the platform.

slide-41
SLIDE 41

CONCLUSIONS

Interested in more info?

https://rebelion.fiit.stuba.sk/