SLIDE 1

Name-dropping in 18th Century Public Discourse

Aleksi Jalavala1, Annika Pensola1, Bruno Sartini3, David Rosson2, Peeter Tinits5, Selina Lehtoranta1, Sophie Schneider4, Veera Oksala1
Team leaders: Simon Hengchen1, Tanja Säily1

1University of Helsinki 2Aalto University 3University of Bologna 4Potsdam University of Applied Sciences 5Tallinn University

SLIDE 2

Research Questions

Which personal names are frequently mentioned in 18th-century British publications?
On the basis of the frequency and co-occurrence of individual names, what kind of patterns can we detect that are characteristic of genres and time periods?

SLIDE 3

Data

Eighteenth Century Collections Online (ECCO)

High representativeness: ca. 50% of all 18th century British printed texts (180,000 titles, 32 million pages)

OCR issues, unreliable metadata

SLIDE 4

OCR example

They do not ddfcovcr much taste or ingenuity in building their hoifllts; though the defe& is rather in the delign than the execution. Those of the lower people are poor huts, thole of the better are larger and more comfortable. Their houfcs, properly speaking, are thatched roofs or sheds supported by pofgs and r:fters dilpofed in a tolerably judicious manner.

SLIDE 5

Data

3 distinct subsets (history, religion, social affairs) combining the metadata and keyword analysis

SLIDE 6

Keyword analysis

Keyness analysis
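
The slide does not say which keyness statistic was used. As a rough illustration only, the sketch below computes log-likelihood (G2), one common keyness measure, for a word in a target subset versus a reference corpus. The function name and the example counts are assumptions made for this sketch, not figures from the project.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one word.

    freq_*: occurrences of the word in each corpus
    size_*: total number of tokens in each corpus
    """
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_target > 0:
        g2 += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * g2


# Invented counts: a word seen 30 times in a 1,000-token subset
# and 10 times in a 1,000-token reference corpus.
score = log_likelihood(30, 1000, 10, 1000)
print(round(score, 2))
```

Higher scores mean the word's frequency differs more strongly between the two corpora; words with high scores in a subset are candidates for its keywords.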

SLIDE 7

Final subset from ECCO

SLIDE 8

Methods

1) Data Extraction - subsets from ECCO Corpus
2) Named Entity Extraction from the subsets
3) NER Validation: both qualitative (manual checking) and quantitative (automatic match with DBpedia)
4) Sampling of the data
5) Visualization of the data through Networks and metadata filtering
6) Qualitative Examination of the results in the Visualization
7) Refinement of the quantitative techniques based on the feedback of the qualitative examination (Repeat step 3 until we are satisfied)
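
A minimal sketch of how steps 2, 3 and 5 might fit together, under loud assumptions: `doc_names` stands in for the output of a NER tool, `known_people` is a local placeholder for the DBpedia match, and all names and data are invented for illustration. It is not the team's actual code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical NER output: document id -> person names found in it.
# "Ciccro" imitates an OCR-damaged name that slipped through.
doc_names = {
    "doc_a": ["Cicero", "Virgil", "Jesus"],
    "doc_b": ["Cicero", "Virgil", "Ciccro"],
    "doc_c": ["Jesus", "Cicero", "Cicero"],
}

# Placeholder for the quantitative validation step: a real pipeline
# would look each name up in DBpedia instead of a hard-coded set.
known_people = {"Cicero", "Virgil", "Jesus"}


def cooccurrence_edges(docs, known):
    """Count how many documents mention each pair of validated names."""
    edges = Counter()
    for names in docs.values():
        # Deduplicate within a document and drop unvalidated names.
        validated = sorted(set(names) & known)
        for pair in combinations(validated, 2):
            edges[pair] += 1
    return edges


edges = cooccurrence_edges(doc_names, known_people)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

The resulting weighted pairs are the edge list of a name co-occurrence network, which could then be loaded into a graph tool for the visualization step.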

SLIDE 15

Quiz

Go to www.menti.com and use the code: 91 83 31

SLIDE 16

Results

Identified the most common people mentioned in texts

SLIDE 17

SLIDE 18

Results

Looked at this over time (20-year periods)

○ Religion more static, others more dynamic
○ Classics and religious figures, e.g. Jesus, Cicero, Virgil, are mentioned across genres & remain constant throughout the century
■ hints at name-dropping as a proof of one’s education
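
One way the 20-year comparison could be set up, as a sketch only: each mention is assigned to a period and names are counted per period. The records, the 1700 starting year and the helper name `period_start` are assumptions for illustration; the slides do not state the exact period boundaries.

```python
from collections import Counter, defaultdict

# Hypothetical records: (publication year, person name mentioned).
mentions = [
    (1705, "Cicero"), (1712, "Jesus"), (1733, "Cicero"),
    (1739, "Virgil"), (1781, "Jesus"), (1799, "Cicero"),
]


def period_start(year, start=1700, width=20):
    """Return the first year of the period that contains `year`."""
    return start + ((year - start) // width) * width


# period start -> Counter of names mentioned in that period
counts = defaultdict(Counter)
for year, name in mentions:
    counts[period_start(year)][name] += 1

for start in sorted(counts):
    print(f"{start}-{start + 19}: {dict(counts[start])}")
```

Comparing the top names of consecutive periods then shows which figures stay constant and which come and go.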

SLIDE 19

SLIDE 20

Results

The people mentioned in books do reveal similarity in content

○ Cluster similar to modules in ECCO
○ A way to approximate genres?
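
To illustrate how shared names can signal similar content, the sketch below scores pairs of books by the Jaccard overlap of the people they mention. The book ids and name sets are invented toy data, and Jaccard similarity is an assumed choice; the slides do not say which similarity or clustering method was used.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of names two books have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical books and the people mentioned in each.
books = {
    "sermon_1": {"Jesus", "Paul", "Moses"},
    "sermon_2": {"Jesus", "Paul", "Cicero"},
    "history_1": {"Caesar", "Cicero", "Cromwell"},
}

similarity = {
    (x, y): jaccard(books[x], books[y])
    for x, y in combinations(sorted(books), 2)
}

for pair, score in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Books with high pairwise scores would end up in the same cluster, which is the sense in which name overlap might approximate genre.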

SLIDE 21

SLIDE 22

Results

Patterns in types of people referred to by genre.

SLIDE 23

SLIDE 24

Future research

Automatic genre classification based on named entity networks
Improve NER and subsets by using domain-specific resources
Examine less popular entities and their role for certain genres
Focus on 1st editions, inspecting specific sections of books
Create interactive visualizations

SLIDE 25

Future visualization

SLIDE 26

Public outreach

SLIDE 27

TWITTER

@GenreAndStyle
Regular tweeting on process and related things

Audience: academic community, DHH19 participants

SLIDE 28

Blogs - medium.com and blogs.helsinki.fi

@GenreAndStyle

One post a day, each group member wrote once

Audience: academic community, DHH19 participants

SLIDE 29

INSTAGRAM

Personal accounts
○ Stories, links to blog
○ Audience: personal contacts within as well as outside of the academic community

SLIDE 30

Thank you!

Questions?

SLIDE 31