Who? Investigating the social entities in a corpus Max Kemman - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Who? Investigating the social entities in a corpus

Max Kemman

University of Luxembourg December 6, 2016

Doing Digital History: Introduction to Tools and Technology

slide-2
SLIDE 2

Where assignment

How is the assignment going so far? Any questions about the tools?

slide-3
SLIDE 3

Today

  • Final assignment
  • Networks
  • From Hermeneutics to Data to Networks
  • Next time
slide-4
SLIDE 4

Final assignment

  • 1. Analyse the 30k emails with the W-questions, or specify a subselection
  • 2. Reflect upon your analysis
slide-5
SLIDE 5

W questions

Can you come up with more W questions?

  • 1. What?

What are the emails about? How does this change over time?

  • 2. Where?

Where are the locations mentioned in the emails? What does this say about the (inter)national perspective of the writer(s)?

  • 3. When?

When were the emails sent? How do the emails change over time?

  • 4. Who?

Who are the emails sent from & to? Who are the people mentioned in the emails, and how do they relate to the writer & reader? What does this say about the social perspective of the writer?

slide-6
SLIDE 6

The report

  • Work in groups of three or four (in a group of three: discuss three W questions)
  • Include a link to your Google Sheet (via the Share button) or other sources
  • Hand in the assignment in HTML; include your name and a decent profile photo
  • 3000-5000 words, in English

slide-7
SLIDE 7

Grading

Grading of the course

  • Weekly assignments (40%)
  • Final group project (60%)

Grading of the final assignment

  • 1pt for the HTML
  • 1pt for CSS
  • 2pts for documentation of your process
  • 4pts for discussion of the W questions
  • 2pts for critical reflection
slide-8
SLIDE 8

Deadline

Send in your assignment before 20 January 2017 23:59 (tentative)
Send it to max.kemman@uni.lu as usual: I will confirm your submission

slide-9
SLIDE 9

Networks

Our final W question
Historical research incorporates:

  • What - what happened?
  • Where - where did this happen?
  • When - when did this happen?
  • Who - who was involved?
slide-10
SLIDE 10

How to describe the people

Given a corpus, there are multiple ways of describing the people in it:

  • A list of all the people
  • Biographies
  • Classes of people
  • Genealogies
  • Networks of people
slide-11
SLIDE 11

What is a network?

Two components:

  • 1. Actors - the people - represented as nodes
  • 2. Relations - the connections - represented as edges

(Images and information based on Martin Grandjean's tutorial)


slide-12
SLIDE 12

What is a network?

Attributes of nodes:

  • 1. Label
       Here: Name
  • 2. Colour
       Here: Gender
  • 3. Size
       Here: Number of connections (not in the data, but derived)


slide-13
SLIDE 13

What is a network?

Attributes of edges:

  • 1. Label
  • 2. Colour
  • 3. Size
  • 4. Direction

Networks can be directed or undirected
Here: directed
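
The node and edge attributes from the last three slides can be written down as plain data. The following is a minimal sketch, not the tutorial's own format: the names, genders and relations are invented, and the dictionary layout is just one possible assumption about how to store them.

```python
# Hypothetical example data: names, genders and relations are invented.
nodes = {
    "Anna": {"gender": "f"},
    "Ben": {"gender": "m"},
    "Clara": {"gender": "f"},
}

# Each directed edge is (source, target, attributes).
edges = [
    ("Anna", "Ben", {"label": "writes to"}),
    ("Clara", "Ben", {"label": "writes to"}),
    ("Ben", "Anna", {"label": "writes to"}),
]

# Node size is not stored in the data; it is derived by counting
# every edge that touches the node, regardless of direction.
size = {name: 0 for name in nodes}
for source, target, _attrs in edges:
    size[source] += 1
    size[target] += 1

for name, attrs in nodes.items():
    print(name, attrs["gender"], size[name])
```

Label and colour are stored with the node, while size is recomputed from the edges whenever the data changes.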

slide-14
SLIDE 14

Reading a network

Imagine the connection here means "likes"

  • John likes many people, but no one likes John
  • Everybody likes Diana, but Diana doesn't like anyone
  • There are no 2 people who like each other
  • Everyone is connected
  • No isolated nodes
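
The same reading can be done by counting incoming and outgoing edges. This is a sketch under assumptions: John and Diana come from the slide, but Anna, Ben and the exact set of "likes" edges are invented so that the observations above hold.

```python
# John and Diana are from the slide; Anna, Ben and the edges are assumed.
people = ["John", "Diana", "Anna", "Ben"]
likes = [
    ("John", "Diana"),
    ("John", "Anna"),
    ("John", "Ben"),
    ("Anna", "Diana"),
    ("Ben", "Diana"),
]

# Out-degree: how many people someone likes. In-degree: how many like them.
out_degree = {p: 0 for p in people}
in_degree = {p: 0 for p in people}
for source, target in likes:
    out_degree[source] += 1
    in_degree[target] += 1

# A mutual pair exists when both (a, b) and (b, a) are edges.
edge_set = set(likes)
mutual = [(a, b) for a, b in likes if a < b and (b, a) in edge_set]

# An isolated node has no edges at all.
isolated = [p for p in people if in_degree[p] + out_degree[p] == 0]

print(out_degree, in_degree, mutual, isolated)
```

In a directed network the two counts tell different stories: John has a high out-degree and an in-degree of zero, while Diana is the reverse.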
slide-15
SLIDE 15

Types of network

  • 1. Graphs - a web of relations including circles
  • 2. Trees - no circles
slide-16
SLIDE 16

Types of network

  • 1. Graphs - a web of relations including circles
  • 2. Trees - no circles
  • 3. Bipartite - 2 sets of nodes with links between the sets but not within each set

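
Whether a network is a tree or bipartite can be checked mechanically. The sketch below assumes undirected edges and uses two standard facts: a connected network with n nodes and n - 1 edges has no circles (cycles), and a network is bipartite when its nodes can be given two colours so that no edge joins two nodes of the same colour. The function names and sample data are hypothetical.

```python
from collections import deque


def neighbours(edges):
    """Build an undirected adjacency mapping from a list of node pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def is_tree(edges):
    """True if the network is connected and has exactly n - 1 distinct edges."""
    adjacency = neighbours(edges)
    if not adjacency:
        return False
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in adjacency[node] - seen:
            seen.add(other)
            queue.append(other)
    distinct_edges = {frozenset(edge) for edge in edges}
    return len(seen) == len(adjacency) and len(distinct_edges) == len(adjacency) - 1


def is_bipartite(edges):
    """True if the nodes can be two-coloured with no edge inside one colour."""
    adjacency = neighbours(edges)
    colour = {}
    for start in adjacency:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in adjacency[node]:
                if other not in colour:
                    colour[other] = 1 - colour[node]
                    queue.append(other)
                elif colour[other] == colour[node]:
                    return False
    return True


# Invented sample networks.
triangle = [("A", "B"), ("B", "C"), ("C", "A")]
chain = [("A", "B"), ("B", "C")]
people_films = [("Anna", "Film 1"), ("Ben", "Film 1"),
                ("Anna", "Film 2"), ("Ben", "Film 2")]

for name, sample in [("triangle", triangle), ("chain", chain),
                     ("people_films", people_films)]:
    print(name, is_tree(sample), is_bipartite(sample))
```

Note that the categories overlap: every tree is also bipartite, and the people-to-films network is bipartite but contains a circle, so it is not a tree.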

slide-17
SLIDE 17

Analysing the network

Four types of centrality measures

  • 1. Degree centrality - the number of connections
  • 2. Closeness centrality - closeness to the entire network
  • 3. Betweenness centrality - bridges
  • 4. Eigenvector centrality - connection to well-connected nodes

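
Two of the four measures are simple enough to compute by hand. The sketch below covers only degree and closeness centrality on an invented undirected network of two triangles joined by one link; betweenness and eigenvector centrality need more involved algorithms and are left out. Closeness is taken here as (number of reachable nodes) divided by (sum of shortest-path distances), which is one common definition and an assumption about what the slide intends.

```python
from collections import deque

# Invented network: two triangles (A-B-C and D-E-F) joined by the link C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]

adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


def degree_centrality(adjacency):
    """Number of connections per node."""
    return {node: len(others) for node, others in adjacency.items()}


def closeness_centrality(adjacency):
    """Reachable nodes divided by the total shortest-path distance to them."""
    result = {}
    for start in adjacency:
        # Breadth-first search gives shortest distances in an unweighted network.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in adjacency[node]:
                if other not in distance:
                    distance[other] = distance[node] + 1
                    queue.append(other)
        total = sum(distance.values())
        result[start] = (len(distance) - 1) / total if total else 0.0
    return result


degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
for node in sorted(adjacency):
    print(node, degree[node], round(closeness[node], 3))
```

In this sample, C and D score highest on both measures because they sit on the link between the two triangles.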

slide-18
SLIDE 18

Central nodes

  • 1. Which node has the most connections?
  • 2. Which node is the closest to the entire network?
  • 3. Which node acts as a bridge between different communities?
  • 4. Which node is connected to well-connected nodes?

Besides nodes, we see communities

slide-19
SLIDE 19

A network of letter writers

For historical research, letters are an interesting corpus for network analysis
We (usually) know:

  • 1. Sender
  • 2. Location of the sender
  • 3. Receiver
  • 4. Location of the receiver
  • 5. Date of the letter
  • 6. Contents of the letter
slide-20
SLIDE 20

ePistolarium

For example, ePistolarium or Six Degrees of Francis Bacon

slide-21
SLIDE 21

From Hermeneutics to Data to Networks

The following slides are based on Marten Düring's tutorial From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources
Available from http://programminghistorian.org/lessons/creating-network-diagrams-from-historical-sources

slide-22
SLIDE 22

Structured data

As mentioned, we can show letters (or emails) as a network
An Excel sheet of metadata of letters is what we call structured data

  • Nodes: senders & receivers
  • Edges: the sending of a letter
  • Attribute of nodes: location

But what if the data is unstructured?
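
For the structured case, turning a sheet of letter metadata into nodes and edges is mostly bookkeeping. The following sketch assumes a CSV export with invented column names and invented rows; a real sheet will have its own headers.

```python
import csv
import io
from collections import Counter

# Invented rows standing in for an exported spreadsheet of letter metadata.
sheet = """sender,sender_location,receiver,receiver_location,date
Anna,Amsterdam,Ben,Paris,1640-03-01
Anna,Amsterdam,Ben,Paris,1640-05-12
Ben,Paris,Clara,London,1641-01-20
"""

node_location = {}       # node -> location attribute
edge_weight = Counter()  # (sender, receiver) -> number of letters sent

for row in csv.DictReader(io.StringIO(sheet)):
    # Senders and receivers both become nodes, with location as attribute.
    node_location[row["sender"]] = row["sender_location"]
    node_location[row["receiver"]] = row["receiver_location"]
    # Each letter is one directed edge; repeated letters add to the weight.
    edge_weight[(row["sender"], row["receiver"])] += 1

print(node_location)
print(dict(edge_weight))
```

One design choice to be aware of: a person who writes from two places ends up with only the last location seen, so a real project would need to decide how to handle moving correspondents.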
slide-23
SLIDE 23

Anything goes

When the data does not itself define the relations, we can come up ourselves with the relations we are interested in

For example: besides people, nodes can be “a film, a place, a job title, a point in time, a venue”
Likewise, besides direct connections, edges can represent how “two theaters could be connected by a film shown in both of them, or by co-ownership, geographical proximity, or being in business in the same year”

The nature of the nodes and edges thus depends on your research interests

slide-24
SLIDE 24

Network Data Extraction

It is more difficult to extract network data from unstructured text
The challenge is to “systematize text interpretation”
The data will not represent the full complexity of the source, but acts as a model of the relationships you are interested in
Any data you produce will only be as clear as your coding scheme

slide-25
SLIDE 25

Developing a coding scheme

First task: decide who should be part of the network, and which relations between actors are to be coded
Questions to ask:

  • 1. Which aspects of relationships between two actors are relevant?
  • 2. Who is part of the network? Who is not?
  • 3. Which attributes matter?
  • 4. What do you aim to find?
slide-26
SLIDE 26

Düring's research

Marten Düring's PhD concerned covert support networks during WWII
Three research questions:

  • 1. To what extent can social relationships help explain why ordinary people took the risks associated with helping?
  • 2. How did such relationships enable people to provide these acts of help given that only very limited resources were available to them?
  • 3. How did social relationships help Jewish refugees to survive in the underground?

Case study: first-person narrative of Ralph Neuman, a Jewish survivor of the Holocaust
PDF: http://bit.ly/neumantext

slide-27
SLIDE 27

His answers to develop his coding scheme

  • 1. Which aspects of relationships between two actors are relevant?

“Any action which directly contributed to the survival of persecuted persons in hiding”

  • 2. Who is part of the network? Who is not?

“Anyone who is mentioned as a helper, involved in helping activities, involved in activities which aimed to suppress helping behaviour”

  • 3. Which attributes matter?

Concerning edges: “Rough categorizations of: Form of help, intensity of relationships, duration of help, time of help, time of first meeting (both coded in 6-months steps).”
Concerning nodes: “Mainly racial status according to National Socialist legislation.”

  • 4. What do you aim to find?

“A deeper understanding of who helps whom how, and discovery of patterns in the data that correspond to network theory”

slide-28
SLIDE 28

Creating our own coding schema

What do we know we will need to describe?
Let's create a Google Sheet with columns Giver and Recipient

  • Nodes: givers & recipients of help
  • Relations: help given
  • Attributes: ?

Consider the sentence: “Alice gave Paul some food for the road”. What can we describe?
Another sentence: “In September 1944 Paul stayed at his friend Alice’s place; they had met around Easter the year before”
We need at least two columns describing the attributes

slide-29
SLIDE 29

Coding the sample sentence

“In September 1944 Paul stayed at his friend Alice’s place; they had met around Easter the year before”

slide-30
SLIDE 30

Values

Notice that instead of text, the data contain numbers: easier to process afterwards
Notice the 99: this represents an unknown value
What if we have multiple values? For example: “In September 1944 Paul stayed at his friend Alice’s place; Alice gave Paul forged documents for the road”
Solution: Make another row to describe the second relation
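
Numeric codes with 99 for unknown values, and one row per relation, could look like this once loaded. This is only a sketch: the table from the slide is not reproduced here, so the code book, the column order and the choice to leave the second date unknown are all invented for illustration.

```python
UNKNOWN = 99  # the slide's marker for an unknown value

# Hypothetical code book; these numbers are invented, not Düring's codes.
FORM_OF_HELP = {1: "accommodation", 2: "forged documents"}

# One row per relation: (giver, recipient, form of help, year of help).
# The second relation between the same two people gets its own row.
rows = [
    ("Alice", "Paul", 1, 1944),
    ("Alice", "Paul", 2, UNKNOWN),  # year left unknown to show the marker
]


def decode(value, code_book=None):
    """Return None for the unknown marker, otherwise the decoded value."""
    if value == UNKNOWN:
        return None
    return code_book[value] if code_book is not None else value


relations = [
    {"giver": giver, "recipient": recipient,
     "form": decode(form, FORM_OF_HELP), "year": decode(year)}
    for giver, recipient, form, year in rows
]

for relation in relations:
    print(relation)
```

Translating 99 to None on loading keeps the unknown marker from being mistaken for a real value in later counts or averages.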

slide-31
SLIDE 31

Describing the actors

Now we know that Alice helped Paul, but what can we tell about these people?
Remember: Düring was interested in the helping of Jews, and self-help
In a new sheet, we can describe the actors
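
With relations in one sheet and actors in another, it helps to check that every name used in a relation also has a row in the actor sheet. A minimal sketch, assuming the two sheets have already been loaded; the attribute name, its codes and the extra actor Karl are hypothetical.

```python
# Relations sheet: (giver, recipient). Karl is an invented extra actor.
relations = [("Alice", "Paul"), ("Alice", "Paul"), ("Karl", "Paul")]

# Actor sheet: one entry per person, with a coded attribute (99 = unknown).
actors = {
    "Alice": {"status": 99},
    "Paul": {"status": 1},
}

# Every giver or recipient should be described in the actor sheet.
mentioned = {name for relation in relations for name in relation}
missing = sorted(mentioned - set(actors))

print("Actors without a description:", missing)
```

Running such a check after each coding session catches actors who were added to the relations but never described.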

slide-32
SLIDE 32

Coding all sources

Unfortunately, the source will rarely contain sentences like “Person A is connected to Persons B, C and D through relation X at time Y”
So, a lot of close reading is required
Moreover, when reading more sources, you will discover more actors and connections of interest, expanding your codes and forcing you to go back and update earlier coded sources

slide-33
SLIDE 33

Let's try

Let's try with the case study: http://bit.ly/neumantext
Look up p. 15, “Living underground”, and describe codes for the first 3 paragraphs

slide-34
SLIDE 34

To Networks

Now that we have structured data, we can create a network
This is for next week!

slide-35
SLIDE 35

For next time

13 December

Who? Investigating the social entities in a corpus

Reading: (see Moodle)

Weingart, S. (2013). Networks Demystified 8: When Networks are Inappropriate. http://www.scottbot.net/HIAL/?p=39600