Name-dropping in 18th Century Public Discourse Aleksi Jalavala 1 , - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Name-dropping in 18th Century Public Discourse

Aleksi Jalavala1, Annika Pensola1, Bruno Sartini3, David Rosson2, Peeter Tinits5, Selina Lehtoranta1, Sophie Schneider4, Veera Oksala1
Team leaders: Simon Hengchen1, Tanja Säily1

1University of Helsinki 2Aalto University 3University of Bologna 4Potsdam University of Applied Sciences 5Tallinn University

slide-2
SLIDE 2

Research Questions

Which personal names are frequently mentioned in 18th-century British publications?
On the basis of the frequency and co-occurrence of individual names, what patterns can we detect that are characteristic of genres and time periods?

slide-3
SLIDE 3

Data

Eighteenth Century Collections Online (ECCO)

  • High representativeness: ca. 50% of all 18th century British printed texts (180,000 titles, 32 million pages)

  • OCR issues, unreliable metadata
slide-4
SLIDE 4

OCR example

They do not ddfcovcr much taste or ingenuity in building their hoifllts; though the defe& is rather in the delign than the execution. Those of the lower people are poor huts, thole of the better are larger and more comfortable. Their houfcs, properly speaking, are thatched roofs or sheds supported by pofgs and r:fters dilpofed in a tolerably judicious manner.

slide-5
SLIDE 5

Data

Eighteenth Century Collections Online (ECCO)

  • High representativeness: ca. 50% of all 18th century British printed texts (180,000 titles, 32 million pages)
  • OCR issues, unreliable metadata
  • 3 distinct subsets (history, religion, social affairs), created by combining metadata and keyword analysis

slide-6
SLIDE 6

Keyword analysis

Keyness analysis
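A keyness analysis compares how often a word occurs in a target subset with how often it occurs in a reference corpus. The slides do not say which statistic was used, so the sketch below assumes the two-term log-likelihood score that is common in corpus linguistics. The function names and the toy token lists are illustrative only and are not taken from the project.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood keyness score.

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    # Expected frequencies if the word were spread evenly over both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2 * score


def keywords(target_tokens, reference_tokens, top_n=10):
    """Rank words that are overrepresented in the target corpus."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words relatively more frequent in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical toy data: a "religion" subset against a general reference.
religion = ["god", "god", "church", "the"]
general = ["the", "the", "king", "the"]
top = keywords(religion, general)
```

Words that score highly for one subset could then serve as the keywords used to assign further documents to that subset.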

slide-7
SLIDE 7

Final subset from ECCO

slide-8
SLIDE 8

Methods

1) Data Extraction - subsets from ECCO Corpus
2) Named Entity Extraction from the subsets
3) NER Validation: both qualitative (manual checking) and quantitative (automatic match with DBpedia)
4) Sampling of the data
5) Visualization of the data through Networks and metadata filtering
6) Qualitative Examination of the results in the Visualization
7) Refinement of the quantitative techniques based on the feedback of the qualitative examination
(Repeat step 3 until we are satisfied)
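The network visualization step presupposes some way of turning extracted names into edges. A minimal sketch, assuming the NER output is available as a mapping from document ID to a list of person names: two names are linked whenever they appear in the same document, and the edge weight counts how many documents they share. The document IDs, the input shape, and the function name are hypothetical; the actual pipeline is not described in the slides.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(doc_entities, min_weight=1):
    """Build weighted name-name edges from per-document entity lists.

    doc_entities: dict mapping a document ID to the names found in it.
    Returns a dict mapping (name_a, name_b) to the number of documents
    in which both names occur, keeping edges with weight >= min_weight.
    """
    edges = Counter()
    for names in doc_entities.values():
        # Deduplicate within a document and sort so each pair has one key.
        for pair in combinations(sorted(set(names)), 2):
            edges[pair] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}


# Hypothetical NER output for three documents.
docs = {
    "doc_a": ["Cicero", "Virgil", "Cicero"],
    "doc_b": ["Jesus", "Cicero"],
    "doc_c": ["Virgil", "Cicero"],
}
edges = cooccurrence_edges(docs)
strong_edges = cooccurrence_edges(docs, min_weight=2)
```

The resulting edge list can be loaded into any network tool; raising min_weight is one simple way to filter out pairs that co-occur only once.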

slide-15
SLIDE 15

Quiz

Go to www.menti.com and use the code: 91 83 31

slide-16
SLIDE 16

Results

  • Identified the most common people mentioned in texts
slide-17
SLIDE 17
slide-18
SLIDE 18

Results

  • Identified the most common people mentioned in texts
  • Looked at this over time (20-year periods)

    ○ Religion more static, others more dynamic
    ○ Classical and religious figures, e.g. Jesus, Cicero, Virgil, are mentioned across genres & remain constant throughout the century
      ■ hints at name-dropping as a proof of one’s education
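Looking at names over 20-year periods amounts to binning each mention by publication year and counting per bin. The sketch below assumes bins starting at 1700 (1700-1719, 1720-1739, and so on) and an input of (year, name) pairs; both are assumptions, since the slides give neither the bin boundaries nor the data format.

```python
from collections import Counter, defaultdict


def period_start(year, first_year=1700, width=20):
    """Return the first year of the period that contains `year`."""
    return first_year + ((year - first_year) // width) * width


def top_names_by_period(mentions, top_n=3):
    """Count name mentions per period and list the most frequent names.

    mentions: iterable of (publication_year, name) pairs.
    """
    counts = defaultdict(Counter)
    for year, name in mentions:
        counts[period_start(year)][name] += 1
    return {
        period: [name for name, _ in counter.most_common(top_n)]
        for period, counter in sorted(counts.items())
    }


# Hypothetical mentions, not real ECCO data.
mentions = [(1705, "Cicero"), (1710, "Cicero"), (1712, "Jesus"), (1765, "Jesus")]
ranking = top_names_by_period(mentions)
```

Comparing the ranked lists from one period to the next gives a rough view of which subsets are static and which are dynamic.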

slide-19
SLIDE 19
slide-20
SLIDE 20

Results

  • Identified the most common people mentioned in texts
  • Looked at this over time (20-year periods)

    ○ Religion more static, others more dynamic
    ○ Classical and religious figures, e.g. Jesus, Cicero, Virgil, are mentioned across genres & remain constant throughout the century
      ■ hints at name-dropping as a proof of one’s education
  • The people mentioned in books do reveal similarity in content
    ○ Clusters similar to modules in ECCO
    ○ A way to approximate genres?
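One simple way to check whether the people mentioned reveal similarity in content is to compare the sets of names found in two books. The sketch below uses Jaccard similarity, which is an assumption chosen for illustration; the slides do not say how the clusters were computed, and the book IDs and names are made up.

```python
def jaccard(names_a, names_b):
    """Share of names two documents have in common (0.0 to 1.0)."""
    a, b = set(names_a), set(names_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target_id, doc_entities):
    """Return the ID of the document whose names overlap most with the target."""
    target = doc_entities[target_id]
    others = [doc_id for doc_id in doc_entities if doc_id != target_id]
    return max(others, key=lambda doc_id: jaccard(target, doc_entities[doc_id]))


# Hypothetical name lists for three books.
books = {
    "sermon_1": ["Jesus", "Paul", "Moses"],
    "sermon_2": ["Jesus", "Paul", "Cicero"],
    "history_1": ["Cicero", "Caesar", "Virgil"],
}
nearest = most_similar("sermon_1", books)
```

Pairwise scores of this kind could feed a standard clustering method, and the resulting groups could then be compared with the ECCO modules.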

slide-21
SLIDE 21
slide-22
SLIDE 22

Results

  • Identified the most common people mentioned in texts
  • Looked at this over time (20-year periods)
    ○ Religion more static, others more dynamic
    ○ Classical and religious figures, e.g. Jesus, Cicero, Virgil, are mentioned across genres & remain constant throughout the century
      ■ hints at name-dropping as a proof of one’s education
  • The people mentioned in books do reveal similarity in content
    ○ Clusters similar to modules in ECCO
    ○ A way to approximate genres?

  • Patterns in types of people referred to by genre.
slide-23
SLIDE 23
slide-24
SLIDE 24

Future research

  • Automatic genre classification based on named entity networks
  • Improve NER and subsets by using domain-specific resources
  • Examine less popular entities and their role in certain genres
  • Focus on 1st editions, inspecting specific sections of books
  • Create interactive visualizations
slide-25
SLIDE 25

Future visualization

slide-26
SLIDE 26

Public outreach

slide-27
SLIDE 27
TWITTER

  • @GenreAndStyle
  • Regular tweeting on process and related things
  • Audience: academic community, DHH19 participants

slide-28
SLIDE 28

Blogs - medium.com and blogs.helsinki.fi

@GenreAndStyle

  • One post a day, each group member wrote once
  • Audience: academic community, DHH19 participants

slide-29
SLIDE 29

INSTAGRAM

  • Personal accounts
    ○ Stories, links to blog
    ○ Audience: personal contacts within as well as outside of the academic community

slide-30
SLIDE 30

Thank you!

Questions?

slide-31
SLIDE 31