How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval



slide-1
SLIDE 1

How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval

Noa Garcia & George Vogiatzis

4th Workshop on Computer Vision for Art Analysis

slide-2
SLIDE 2

Motivation

slide-3
SLIDE 3

Semantic Art Understanding

In this painting the church in Auvers has been transformed by the artist into a vision using form and colour. Painted in portrait format, the church towers up before the onlooker like a fortification. The path leading to it forks in the foreground into two narrow paths passing the church on either side. On the path to the left, her back turned toward us, a peasant woman is walking into the distance. The path is bathed in light, while the church is viewed against the backdrop of a dark blue sky that merges with the black-blue of the night sky at the edges of the picture. The brushwork is restless and full of movement, and the forms of the church are distorted in the Expressionist manner.

slide-9
SLIDE 9

Related Work

Existing art datasets:

  • Painting-91, 2014
  • PRINTART, 2012
  • Rijksmuseum, 2014
  • Paintings Database, 2014
  • Wikipaintings, 2014
  • Art500k, 2016

Tasks addressed: Classification (five datasets), Object Recognition (one dataset)

slide-10
SLIDE 10

SemArt Dataset

Data collected from the Web Gallery of Art

https://www.wga.hu/

slide-11
SLIDE 11

SemArt Dataset

Each sample in the dataset is a triplet: image, attributes and comments

slide-15
SLIDE 15

SemArt Dataset

Attributes

Author, Title, Date, Technique, Type, School, Timeframe
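
One way to hold such a triplet in code is a small record type. The sketch below is an illustration only: the class name `SemArtSample`, its field names and the placeholder values are assumptions, not the dataset's actual file format.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SemArtSample:
    """One (image, attributes, comment) triplet; all names are hypothetical."""
    image_path: str      # location of the painting image
    author: str
    title: str
    date: str
    technique: str
    painting_type: str   # the "Type" attribute
    school: str
    timeframe: str
    comment: str         # free-text artistic comment


# Placeholder values, invented purely for illustration.
sample = SemArtSample(
    image_path="images/example.jpg",
    author="Example Author",
    title="Example Title",
    date="example date",
    technique="example technique",
    painting_type="example type",
    school="example school",
    timeframe="example timeframe",
    comment="Example comment describing the painting.",
)

# The seven attributes are every field except the image and the comment.
attribute_names = [
    f.name for f in fields(sample) if f.name not in ("image_path", "comment")
]
```

Keeping the attributes as separate fields makes it easy to select only some of them, for example the title, when building a text encoding.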

slide-19
SLIDE 19

SemArt Dataset

Comments

70% of the comments contain 100 words or fewer

slide-20
SLIDE 20

SemArt Dataset

Data splits

Partition     Num. Triplets    %
Training      19,244           90
Validation    1,069            5
Test          1,069            5
Total         21,383           100
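
A split with the partition sizes listed in the table could be produced as sketched below. The random shuffle, the seed and the function name `split_indices` are assumptions; the slide does not say how samples were assigned to partitions. The sample count is taken here as the sum of the three partition sizes.

```python
import random


def split_indices(num_samples, num_val, num_test, seed=0):
    """Shuffle sample indices and cut them into train/val/test lists."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    val = indices[:num_val]
    test = indices[num_val:num_val + num_test]
    train = indices[num_val + num_test:]
    return train, val, test


# Partition sizes as listed in the table.
num_train, num_val, num_test = 19_244, 1_069, 1_069
train, val, test = split_indices(num_train + num_val + num_test, num_val, num_test)

# Rounded percentage share of each partition.
total = len(train) + len(val) + len(test)
shares = [round(100 * len(part) / total) for part in (train, val, test)]
```

With these sizes the rounded shares come out as 90, 5 and 5 percent, matching the last column of the table.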

slide-21
SLIDE 21

Text2Art Challenge

Multi-modal retrieval

slide-22
SLIDE 22

Text2Art Challenge

Text-to-Image Retrieval

slide-23
SLIDE 23

Text2Art Challenge

Image-to-Text Retrieval
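
Both retrieval directions can be sketched as ranking candidates by similarity to a query in a shared embedding space. The example below is a minimal illustration, assuming cosine similarity as the score; the toy 2-D embeddings and the function names are invented for the example and are not taken from the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy embeddings; index i of each list is assumed to be a matching pair.
text_embeddings = [[1.0, 0.1], [0.1, 1.0], [0.7, 0.7]]
image_embeddings = [[0.9, 0.0], [0.0, 0.8], [0.5, 0.6]]

# Text-to-image: rank all images for the first text.
text_to_image = rank_candidates(text_embeddings[0], image_embeddings)

# Image-to-text: rank all texts for the second image.
image_to_text = rank_candidates(image_embeddings[1], text_embeddings)
```

In this toy setup the matching item is ranked first in both directions; on real data, the position of the correct item in the ranking is what a retrieval metric would measure.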

slide-24
SLIDE 24

Models

We study 3 fundamental parts: visual encoding, text encoding and multi-modal transformation

slide-25
SLIDE 25

Models

Visual Encoding

We consider the following visual encoders:

  • VGG16 (Simonyan and Zisserman, 2014)
  • ResNets (He et al. 2016)
  • RMAC (Tolias et al. 2016)
slide-26
SLIDE 26

Models

Textual Encoding

We encode titles and comments independently and concatenate their vectors. We consider the following text encoders:

  • BOW (bag-of-words)
  • MLP (multilayer perceptron)
  • RNN (recurrent neural network)
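
The idea of encoding titles and comments independently and concatenating the results can be sketched with a plain bag-of-words encoder. This is a minimal illustration, assuming raw token counts, whitespace tokenisation and a separate vocabulary per field; the example texts and the helper names are hypothetical, and the talk does not specify these details.

```python
from collections import Counter


def build_vocab(texts):
    """Map each distinct lower-cased token to a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(text, vocab):
    """Count occurrences of each vocabulary token; unknown tokens are ignored."""
    counts = Counter(text.lower().split())
    return [counts.get(tok, 0) for tok in vocab]


# Toy corpus, invented for illustration.
titles = ["church at night", "portrait of a woman"]
comments = ["the church towers up", "a woman walking on the path"]

title_vocab = build_vocab(titles)
comment_vocab = build_vocab(comments)


def encode(title, comment):
    """Encode each field with its own vocabulary, then concatenate the vectors."""
    return bow_vector(title, title_vocab) + bow_vector(comment, comment_vocab)


encoded = encode(titles[0], comments[0])
```

The concatenated vector has one block of columns per field, so a word such as "church" contributes to different columns depending on whether it occurs in the title or in the comment.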

slide-27
SLIDE 27

Models

Multi-Modal Transformation

We map visual and text encodings into a common semantic space using the following methods: CCA, CML and AMD
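
The slide does not expand the acronyms. Assuming CML refers to a cosine margin loss, one common formulation pulls matching image-text pairs together and pushes non-matching pairs apart until their similarity drops below a margin. The sketch below shows that formulation only; the margin value and the function names are assumptions, not the authors' settings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_margin_loss(image_vec, text_vec, is_match, margin=0.1):
    """Loss for one projected image-text pair.

    Matching pairs are penalised by 1 - cos; non-matching pairs are
    penalised only when their similarity exceeds the margin.
    """
    sim = cosine(image_vec, text_vec)
    if is_match:
        return 1.0 - sim
    return max(0.0, sim - margin)


# Toy vectors in the common space, invented for illustration.
aligned_match = cosine_margin_loss([1.0, 0.0], [1.0, 0.0], is_match=True)
orthogonal_mismatch = cosine_margin_loss([1.0, 0.0], [0.0, 1.0], is_match=False)
aligned_mismatch = cosine_margin_loss([1.0, 0.0], [1.0, 0.0], is_match=False)
```

A perfectly aligned matching pair and an orthogonal non-matching pair both incur zero loss, while a non-matching pair that is perfectly aligned incurs the largest penalty.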

slide-30
SLIDE 30

Evaluation

Visual Encoding

ResNet152 is the best visual encoder

slide-31
SLIDE 31

Evaluation

Textual Encoding

Simple BOW performs better than recurrent models, as observed in other multi-modal retrieval work (Wang et al. 2018)
slide-32
SLIDE 32

Evaluation

Multi-Modal Transformation

CML is the best model

slide-33
SLIDE 33

Qualitative Results

slide-34
SLIDE 34

Human Evaluation

Easy Difficult

slide-38
SLIDE 38

Summary

  • SemArt dataset for semantic art understanding
  • Text2Art challenge as a retrieval task
  • Best model based on ResNet, BOW and CML
  • Not that far from human performance
slide-39
SLIDE 39

Thank you!

Noa Garcia
Aston University
Project Website: http://noagarciad.com/SemArt/

4th Workshop on Computer Vision for Art Analysis