Art Analysis Dataset Characterization MIEIC IC 2020/2021 - PowerPoint PPT Presentation



SLIDE 1

Art Analysis

Dataset Characterization

Ana Silva, up201604105 Fábio Araújo, up201607944 Gonçalo Santos, up201603265 Susana Lima, up201603634

MIEIC IC 2020/2021, Information Description, Storage and Retrieval

SLIDE 2

Data Collection

SLIDE 3

Conceptual Model

SLIDE 4

Data Characterization

SLIDE 5

Data Characterization

SLIDE 34

European Parliament Data

Dataset Preparation

SLIDE 35

What is our project about?

  • 705 MEPs
  • 27 countries
  • 7 political groups
  • 22 committees
  • 6000 voting sessions per year
  • ~200 national parties

SLIDE 36

Search tasks in the original source

SLIDE 37

The dataset

MEMBERS OF THE EUROPEAN PARLIAMENT

  • Name
  • Country
  • Age
  • Political group(s)
  • National party
  • Committees
  • Social media

FINAL REPORTS AND JOINT MOTIONS

  • Title
  • (A lot of) text
  • Rapporteur(s)
  • (Committee)

VOTES

  • + In favor
  • - Against
  • 0 Abstention

SLIDE 38

Data sources

SLIDE 39

Data preparation

SLIDE 40

Possible search tasks

  • Search for a report
  • Search for an MEP
  • Search for a committee
  • Get the votes cast by an MEP
  • Get the votes on a specific report
  • Get the reports of a committee
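The search tasks above can be sketched over a small in-memory model of the entities on the dataset slide. This is a minimal sketch, not the group's implementation: the record types, field names and sample values (`Report`, `Vote`, `report_id`, `"R1"`, `"MEP 1"`) are hypothetical, and votes use the slide's encoding (`+` in favor, `-` against, `0` abstention).

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record types: field names are illustrative, not the group's schema.
@dataclass(frozen=True)
class Report:
    report_id: str
    title: str
    committee: str

@dataclass(frozen=True)
class Vote:
    mep: str
    report_id: str
    value: str  # "+" in favor, "-" against, "0" abstention

VOTE_LABELS = {"+": "in favor", "-": "against", "0": "abstention"}

def search_meps(meps, query):
    """Search for an MEP by a case-insensitive fragment of the name."""
    q = query.lower()
    return [name for name in meps if q in name.lower()]

def votes_by_mep(votes, mep):
    """Get the votes cast by an MEP, as (report_id, label) pairs."""
    return [(v.report_id, VOTE_LABELS[v.value]) for v in votes if v.mep == mep]

def tally_report(votes, report_id):
    """Get the votes on a specific report, counted per label."""
    return Counter(VOTE_LABELS[v.value] for v in votes if v.report_id == report_id)

def reports_of_committee(reports, committee):
    """Get the titles of the reports of a committee."""
    return [r.title for r in reports if r.committee == committee]

if __name__ == "__main__":
    reports = [
        Report("R1", "Report on topic A", "Committee X"),
        Report("R2", "Report on topic B", "Committee Y"),
    ]
    votes = [
        Vote("MEP 1", "R1", "+"),
        Vote("MEP 2", "R1", "-"),
        Vote("MEP 3", "R1", "0"),
        Vote("MEP 1", "R2", "-"),
    ]
    print(votes_by_mep(votes, "MEP 1"))
    print(tally_report(votes, "R1"))
    print(reports_of_committee(reports, "Committee X"))
```

Keeping the raw `+`/`-`/`0` codes in the records and translating them only at query time keeps the stored data close to the source format.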

SLIDE 41

Diseases, Symptoms and Treatments

Group 6:

  • André Esteves - up201606673
  • Francisco Filipe - up201604601
  • Helena Montenegro - up201604184
  • Juliana Marques - up201605568

Information Description, Storage and Retrieval

2020/2021

SLIDE 42

Introduction

Problem:

  • Documents shown by current search mechanisms may not be reliable.
  • Misleading and exaggerated information may lead to panic.
  • There is a lack of search mechanisms focused on health matters.

Goal of the project:

  • Develop a search mechanism focused on diseases, treatments and symptoms.

SLIDE 43

Conceptual Model

SLIDE 44

Data Pipeline

SLIDE 45

Data Collection

Wikidata

Structural data obtained through API requests. The information is:

  • Unreliable.
  • Incomplete.

License:

  • Creative Commons CC0 1.0 Universal Public Domain Dedication
  • Free to modify and share, even for commercial purposes.

Wikipedia

Textual data obtained through scraping (with Scrapy). The information is:

  • Verifiable against authoritative sources.

License:

  • Creative Commons Attribution-ShareAlike 3.0 Unported
  • Free to modify and share, even for commercial purposes, as long as credit is given and derivatives are shared under the same license.

SLIDE 46

Data Storage

Azure SQL Database

  • Language: SQL Server (Transact-SQL)
  • Tools: Azure Data Studio

Data Enrichment

  • UPDATE statements to add the overviews scraped from Wikipedia to the database.
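The enrichment step can be sketched with parameterized `UPDATE` statements. This sketch swaps in Python's built-in `sqlite3` module for Azure SQL Database so that it runs locally; the `disease` table, its `name`/`overview` columns and the placeholder text are hypothetical. Against SQL Server the same statement could be sent through a DB-API driver such as pyodbc, which also uses `?` placeholders.

```python
import sqlite3

def add_overviews(conn, overviews):
    """Store scraped overviews with one parameterized UPDATE per disease.

    `overviews` maps a disease name to its Wikipedia overview text.
    Returns the total number of rows changed across all statements.
    """
    cur = conn.executemany(
        "UPDATE disease SET overview = ? WHERE name = ?",
        [(text, name) for name, text in overviews.items()],
    )
    conn.commit()
    return cur.rowcount

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the Azure SQL schema.
    conn.execute("CREATE TABLE disease (name TEXT PRIMARY KEY, overview TEXT)")
    conn.executemany("INSERT INTO disease (name) VALUES (?)", [("Influenza",), ("Measles",)])
    changed = add_overviews(conn, {"Influenza": "Placeholder overview text.", "Unknown": "Not in the table."})
    print(changed)
    print(conn.execute("SELECT name, overview FROM disease ORDER BY name").fetchall())
```

Binding the scraped text as a parameter, rather than concatenating it into the SQL string, avoids quoting problems with apostrophes that are common in Wikipedia prose.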

SLIDE 47

Data Cleaning

On the structural data:

  • Remove diseases that had fewer than 2 connections to other classes.
  • Remove all symptoms, treatments, drugs, causes and specialties that did not have a connection to a disease.

On the textual data:

  • Extract text using BeautifulSoup.
  • Remove special characters.
  • Remove references.
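Both cleaning passes can be sketched in plain Python, under stated assumptions. The structural pass assumes the Wikidata results were loaded as hypothetical `(disease, relation, item)` triples. The textual pass swaps in the standard-library `html.parser` for BeautifulSoup so the sketch has no dependencies, and it assumes references appear as bracketed markers such as `[1]` and that "special characters" means anything other than letters, digits, whitespace and basic punctuation.

```python
import re
from collections import Counter
from html.parser import HTMLParser

def prune_graph(edges, min_links=2):
    """Structural cleaning over hypothetical (disease, relation, item) triples.

    Drops diseases with fewer than `min_links` connections, then keeps only
    the items (symptoms, treatments, drugs, causes, specialties) that are
    still connected to at least one remaining disease.
    """
    degree = Counter(disease for disease, _, _ in edges)
    kept = [e for e in edges if degree[e[0]] >= min_links]
    items = {item for _, _, item in kept}
    return kept, items

class _TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def clean_text(html):
    """Extract text, then remove references and special characters."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"\[(?:\d+|citation needed)\]", "", text)  # references such as [1]
    text = re.sub(r"[^\w\s.,;:()'-]", "", text)  # assumed set of special characters
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace
```

Running the structural pass before loading the overviews means no text is scraped or cleaned for diseases that will be discarded anyway.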

SLIDE 48

Data Characterization

SLIDE 49

Data Characterization

SLIDE 50

System Results

Disease:

  • Overview.
  • List of symptoms, treatments, drugs, causes and health specialties.

Treatment:

  • Overview.
  • List of diseases.

Symptom:

  • Overview.
  • List of diseases.

SLIDE 51

Retrieval Tasks

  • Retrieve a disease, treatment or symptom based on its information (its name or a word in its overview).
  • Retrieve disease by symptom.
  • Retrieve disease by health specialty.
  • Retrieve treatment by disease.
  • Retrieve symptom by disease.
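The retrieval tasks above can be sketched with a small inverted index over the returned documents (disease, treatment, symptom). This is an illustration under assumptions: documents are plain dictionaries with hypothetical `type`, `name`, `overview` and relation-list fields, names are assumed unique across document types, and matching is a naive lowercase word lookup rather than the ranking a real search engine would apply.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

class HealthIndex:
    """Hypothetical in-memory index over disease/treatment/symptom documents."""

    def __init__(self, docs):
        self.docs = {d["name"]: d for d in docs}
        self.words = defaultdict(set)  # word -> names of documents containing it
        for d in docs:
            for w in tokenize(d["name"] + " " + d.get("overview", "")):
                self.words[w].add(d["name"])

    def search(self, query):
        """Documents whose name or overview contains every query word."""
        sets = [self.words.get(w, set()) for w in tokenize(query)]
        return sorted(set.intersection(*sets)) if sets else []

    def diseases_with(self, relation, value):
        """Disease by symptom or specialty, e.g. ("symptoms", "fever")."""
        return sorted(
            name for name, d in self.docs.items()
            if d["type"] == "disease" and value in d.get(relation, [])
        )

    def related(self, disease, relation):
        """Treatments or symptoms of a disease, e.g. ("Influenza", "treatments")."""
        return list(self.docs[disease].get(relation, []))
```

Indexing only names and overviews mirrors the first task; the relation lookups cover the remaining four without any text matching.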

SLIDE 59

BILLBOARD 200: POPULAR ALBUMS AND ARTISTS

DATASET PREPARATION

Group 8:

  • João Miguel (up201604241@fe.up.pt)
  • José Azevedo (up201506448@fe.up.pt)
  • Ricardo Ferreira (up200305418@fe.up.pt)

SLIDE 60

DATASET CHARACTERIZATION

Acoustic and meta features of albums and songs on the Billboard 200

  • SQL database format
  • Two tables (albums and acoustic features)
  • Free for use and download

Last.fm

  • Website with a huge list of artists, songs and albums
  • Easy to build URLs
  • Crawlers allowed

MetroLyrics

  • Website with song lyrics
  • Easy to build URLs
  • Crawlers allowed
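"Easy to build URLs" and "Crawlers allowed" can be sketched together: build a page URL from an artist and album name, then check it against robots.txt rules before crawling. The URL pattern below (`https://www.last.fm/music/<artist>/<album>`, with spaces encoded as `+`) is an assumption about Last.fm's page layout, not something stated in the slides, and the robots.txt rules are a made-up sample so the sketch runs offline; against the live site, `RobotFileParser.set_url()` followed by `read()` would fetch the real file. The agent name `ExampleBot` is hypothetical.

```python
from urllib.parse import quote_plus
from urllib.robotparser import RobotFileParser

def lastfm_album_url(artist, album):
    """Build an album page URL (assumed Last.fm layout; spaces become '+')."""
    return f"https://www.last.fm/music/{quote_plus(artist)}/{quote_plus(album)}"

def allowed(robots_lines, url, agent="ExampleBot"):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    url = lastfm_album_url("Pink Floyd", "The Wall")
    print(url)
    # Hypothetical robots.txt content, for illustration only.
    rules = ["User-agent: *", "Disallow: /login"]
    print(allowed(rules, url))
```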

SLIDE 61

DATA PIPELINE

SLIDE 62

CONCEPTUAL MODEL

SLIDE 63

SEARCH AND RETURNED DOCUMENTS

Returned documents:

  • Albums
  • Artists
  • Songs
  • Rank

Possible search tasks:

  • Rank by date (year, month, day): returns albums, artists and rank.
  • Artist (band or solo): returns the artist and their albums.
  • Location: returns artists.
  • Album: returns the album, artist, songs and best rank.
  • Release date (year, month, day): returns albums and artists.
  • Musical genre: returns albums, artists and songs.
  • Songs (by name or by words/sentences from the lyrics): returns songs.
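A few of these search tasks (rank by date, an album's best rank, songs by words from the lyrics) can be sketched over chart rows and lyrics. The row layout (`date`, `rank`, `artist`, `album`) and the ISO date format are assumptions about how the Billboard data might be stored, not the dataset's actual schema. Date search uses prefix matching so that a year, a year-month or a full date all work, and lyric search is a plain case-insensitive substring match.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChartEntry:
    date: str   # ISO date of the chart week, e.g. "2019-05-04" (assumed format)
    rank: int
    artist: str
    album: str

def rank_by_date(entries, date_prefix):
    """Rank by date: "2019", "2019-05" or "2019-05-04", returned in chart order."""
    hits = [e for e in entries if e.date.startswith(date_prefix)]
    return sorted(hits, key=lambda e: (e.date, e.rank))

def best_rank(entries, album):
    """Best (lowest) chart position an album reached, or None if it never charted."""
    return min((e.rank for e in entries if e.album == album), default=None)

def songs_by_lyrics(lyrics, phrase):
    """Titles of songs whose lyrics contain the phrase (case-insensitive)."""
    phrase = phrase.lower()
    return sorted(title for title, text in lyrics.items() if phrase in text.lower())
```

Zero-padded ISO dates make prefix matching safe, since "2019-05" cannot accidentally match a different month.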

SLIDE 64

DATASET PREPARATION

ANIMATION IN JAPAN, DAPI 2020/2021

SLIDE 65

INTRODUCTION

  • Japanese-style animated film.
  • A popular form of entertainment for all kinds of audiences.
  • Often originates as an adaptation of novels or video games.

SLIDE 66

DATASET

  • Fans gather on platforms to talk about anime.
  • Information about anime and anime reviews is collected and can be accessed.
  • The data is gathered separately.
  • All users can get an overview of anime ratings.

SLIDE 67

INFORMATION RETRIEVAL

SLIDE 68

DATA PREPARATION