SLIDE 1 Who? Investigating the social entities in a corpus
Max Kemman
University of Luxembourg December 7, 2015
Doing Digital History: Introduction to Tools and Technology
SLIDE 2 Today
- Final assignment
- From Hermeneutics to Data to Networks
- Preparing the data with Palladio
- Visualising with Palladio
- Reflections on the tools
- Next time
SLIDE 3 Final assignment
Sources in Moodle: Sonja Kmec (2004) Noblewomen and Family Fortunes in Seventeenth Century England and France. A Study of the Lives of the Countess of Derby and her sister-in-law, the Duchess de La Trémoïlle
Two collections of letters: two Protestant preachers in France, writing about daily life
- 1. Letters by André Rivet (sent between 1606-1646)
- 2. Letters by Abraham Rambour (sent between 1619-1650)
SLIDE 4 The assignment
- 1. Prepare the sources so they can be analysed as data
Describe all the letters in the Google spreadsheet (see link in Moodle).
This is a group effort: around 200 pages of letters, so approx. 12 pages per person.
Check the work of at least one other person.
Once the Google Sheet is done, make your own copy so you can annotate it further.
- 2. Analyse the letters with the W-questions
- 3. Reflect upon your analysis
SLIDE 5 W questions
Can you come up with more W questions?
What are the letters about? How does this change over time?
Where are the letters sent from & to? Where are the locations mentioned in the letters? What does this say about the (inter)national perspective of the writer?
When were the letters sent? How do the letters change over time?
Who are the letters sent from & to? Who are the people mentioned in the letters, and how do they relate to the writer & reader? What does this say about the social perspective of the writer?
SLIDE 6
The report
Work in groups of two or three.
Include a link to your Google Sheet (via the Share button) or other sources.
Hand in the assignment in HTML; include your name and a decent profile photo.
3000-4000 words, in English.
SLIDE 7 Grading
Grading of the course
- Weekly assignments (30%)
- Final group project (70%)
Grading of the final assignment
- 1pt for the HTML
- 1pt for CSS
- 2pts for documentation of your process
- 4pts for discussion of the W questions
- 2pts for critical reflection
SLIDE 8
Deadline
Send in your assignment before Sunday January 31st 2016, 23:59.
Send it to max.kemman@uni.lu as usual.
SLIDE 9
From Hermeneutics to Data to Networks
Today's lecture is based on Marten Düring's tutorial From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources.
Available from http://programminghistorian.org/lessons/creating-network-diagrams-from-historical-sources
Tools we will be using: Google Sheets and Palladio
SLIDE 10 Structured data
Last week we used letters as a network.
An Excel sheet of letters is what we call structured data:
- Nodes: senders & receivers
- Edges: the sending of a letter
- Attribute of nodes: location
But what if the data is unstructured?
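To make the mapping from a structured sheet to a network concrete, here is a minimal Python sketch. The letter rows and names are hypothetical; each row becomes a directed edge from sender to receiver, repeated letters add weight, and the place a letter was sent from is stored as an attribute of the sender node (keeping the last place seen, a simplification).

```python
from collections import Counter

# Hypothetical letter records: (sender, receiver, place the letter was sent from)
letters = [
    ("Alice", "Paul", "Paris"),
    ("Alice", "Bob", "Paris"),
    ("Paul", "Alice", "London"),
    ("Alice", "Paul", "Paris"),
]

def letters_to_network(rows):
    """Turn structured letter rows into nodes, weighted edges and a node attribute."""
    nodes = set()
    edges = Counter()   # (sender, receiver) -> number of letters sent
    location = {}       # node attribute: place a sender wrote from (last one seen)
    for sender, receiver, place in rows:
        nodes.update([sender, receiver])
        edges[(sender, receiver)] += 1
        location[sender] = place
    return nodes, edges, location

nodes, edges, location = letters_to_network(letters)
print(sorted(nodes))             # ['Alice', 'Bob', 'Paul']
print(edges[("Alice", "Paul")])  # 2
```

With unstructured text there is no such ready-made table: the rows themselves have to be produced by coding the source.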
SLIDE 11 Anything goes
When the data does not itself define the relations, we can come up with the relations we are interested in ourselves.
For example, besides people, nodes can be “a film, a place, a job title, a point in time, a venue”.
Likewise, besides direct connections, edges can represent how “two theaters could be connected by a film shown in both of them, or by co-ownership, geographical proximity, or being in business in the same year”.
The nature of the nodes and edges thus depends on your research interests
SLIDE 12
Network Data Extraction
It is more difficult to extract network data from unstructured text.
The challenge is to “systematize text interpretation”.
The data will not represent the full complexity of the source, but act as a model of the relationships you are interested in.
Any data you produce will only be as clear as your coding scheme.
SLIDE 13 Developing a coding scheme
First task: decide who should be part of the network, and which relations between actors are to be coded.
Questions to ask:
- 1. Which aspects of relationships between two actors are relevant?
- 2. Who is part of the network? Who is not?
- 3. Which attributes matter?
- 4. What do you aim to find?
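One way to keep a coding scheme systematic is to write it down as a codebook of allowed values per column and check every coded row against it. The sketch below assumes hypothetical column names and numeric codes (only 99 for "unknown" comes from these slides); it reports any value the codebook does not allow, including a missing column.

```python
# Hypothetical codebook: allowed codes per column (values are illustrative only)
CODEBOOK = {
    "Form of help": {1, 2, 3, 99},  # 99 = unknown
    "Intensity": {1, 2, 3, 99},     # 99 = unknown
}

def check_row(row, codebook=CODEBOOK):
    """Return a list of (column, value) pairs that the codebook does not allow."""
    problems = []
    for column, allowed in codebook.items():
        value = row.get(column)  # None if the column is missing from the row
        if value not in allowed:
            problems.append((column, value))
    return problems

print(check_row({"Form of help": 2, "Intensity": 99}))  # []
print(check_row({"Form of help": 7}))  # [('Form of help', 7), ('Intensity', None)]
```

Whenever the scheme grows (new codes, new columns), updating the codebook makes it obvious which earlier rows need to be revisited.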
SLIDE 14 Düring's research
Marten Düring's PhD concerned the covert support networks during WWII.
Three research questions:
- 1. To what extent can social relationships help explain why ordinary people took the risks associated with helping?
- 2. How did such relationships enable people to provide these acts of help given that only very limited resources were available to them?
- 3. How did social relationships help Jewish refugees to survive in the underground?
Case study: first-person narrative of Ralph Neuman, a Jewish survivor of the Holocaust.
PDF: http://bit.ly/neumantext
SLIDE 15 His answers to develop his coding scheme
- 1. Which aspects of relationships between two actors are relevant?
“Any action which directly contributed to the survival of persecuted persons in hiding”
- 2. Who is part of the network? Who is not?
“Anyone who is mentioned as a helper, involved in helping activities, involved in activities which aimed to suppress helping behaviour”
- 3. Which attributes matter?
Concerning edges: “Rough categorizations of: Form of help, intensity of relationships, duration of help, time of help, time of first meeting (both coded in 6-month steps).”
Concerning nodes: “Mainly racial status according to National Socialist legislation.”
- 4. What do you aim to find?
“A deeper understanding of who helps whom how, and discovery of patterns in the data that correspond to network theory”
SLIDE 16 Creating our own coding schema
What do we know we will need to describe? Let's create a Google Sheet with columns Giver and Recipient.
- Nodes: givers & recipients of help
- Relations: help given
- Attributes: ?
Consider the sentence: “Alice gave Paul some food for the road”. What can we describe?
Another sentence: “In September 1944 Paul stayed at his friend Alice’s place; they had met around Easter the year before”
We need at least two columns describing the attributes.
SLIDE 17
Coding the sample sentence
“In September 1944 Paul stayed at his friend Alice’s place; they had met around Easter the year before”
SLIDE 18
Values
Notice that instead of text, the data contain numbers: these are easier to process afterwards.
Notice the 99: this represents an unknown value.
What if we have multiple values? For example: “In September 1944 Paul stayed at his friend Alice’s place; Alice gave Paul forged documents for the road”
Solution: make another row to describe the second relation.
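The rules on this slide (numeric codes, 99 for unknown, one row per relation) can be sketched as a small Python data structure. The form-of-help codes are hypothetical, not Düring's, and time is simplified to whole years rather than his 6-month steps. The helper `known` turns 99 into `None` so that it is never mistaken for a real value in later calculations.

```python
from dataclasses import dataclass

UNKNOWN = 99  # the code for an unknown value

# Hypothetical codes for "Form of help"; the numbers are illustrative, not Düring's
FORM_OF_HELP = {1: "shelter", 2: "forged documents"}

@dataclass
class Relation:
    giver: str
    recipient: str
    form_of_help: int  # key into FORM_OF_HELP
    year_of_help: int  # simplified to a year; UNKNOWN if not stated
    year_met: int      # year of first meeting; UNKNOWN if not stated

rows = [
    # "In September 1944 Paul stayed at his friend Alice's place;
    #  they had met around Easter the year before"
    Relation("Alice", "Paul", 1, 1944, 1943),
    # "In September 1944 Paul stayed at his friend Alice's place;
    #  Alice gave Paul forged documents for the road": two relations, two rows
    Relation("Alice", "Paul", 1, 1944, UNKNOWN),
    Relation("Alice", "Paul", 2, 1944, UNKNOWN),
]

def known(value):
    """Map the UNKNOWN code to None so 99 is never read as a real year."""
    return None if value == UNKNOWN else value

for r in rows:
    print(r.giver, "->", r.recipient, FORM_OF_HELP[r.form_of_help], known(r.year_met))
```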
SLIDE 19
Describing the actors
Now we know that Alice helped Paul, but what can we tell about these people?
Remember: Düring was interested in the helping of Jews, and in self-help.
In a new sheet, we can describe the actors.
SLIDE 20
Coding all sources
Unfortunately, the sources will rarely contain sentences like “Person A is connected to Persons B, C and D through relation X at time Y”.
So, a lot of close reading is required.
Moreover, when reading more sources, you will discover more actors and connections of interest, expanding your codes and forcing you to go back and update sources you coded earlier.
SLIDE 21
Let's try
Let's try with the case study: http://bit.ly/neumantext
Look up p. 15, “Living underground”, and code the first 3 paragraphs.
SLIDE 22
Preparing the data with Palladio
To visualise the coded data, we will use Palladio: http://palladio.designhumanities.org/
First we need to prepare the data for Palladio.
We will use the sample data set from http://bit.ly/duringdata (no need to copy anything just yet).
SLIDE 23
Loading the actors/nodes
Select the sheet Attributes, which describes the actors.
Copy everything with Ctrl+A, Ctrl+C (Windows) or Cmd+A, Cmd+C (Apple).
In the Palladio screen, paste and click Load.
Palladio now contains a primary table of actors.
When using Chrome: rename the table to something like People.
SLIDE 24
Adding relations
Click on Person.
Click Add new table.
Go back to the Google Sheet, select the Relations sheet, copy & paste all the data into Palladio, click Load, and click Done.
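When you copy a range of cells from a spreadsheet, the clipboard holds plain text with a tab between cells and a newline between rows, and the first row carries the column names. The Python sketch below parses such text into one dictionary per row, roughly what happens when a table is pasted into a tool like Palladio; it is an illustration, not Palladio's code, and the pasted content is hypothetical.

```python
import csv
import io

# Hypothetical clipboard text after copying an actors sheet (tabs between cells)
pasted = "Name\tSex\nAlice\tf\nPaul\tm\n"

def parse_pasted_sheet(text):
    """Parse tab-separated text, header row first, into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

people = parse_pasted_sheet(pasted)
print(people[0]["Name"], people[1]["Sex"])  # Alice m
```

Because the header row becomes the column names, renaming a column in the Google Sheet also renames it in the pasted table.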
SLIDE 25
Connecting the two tables
We now have two tables in Palladio.
We will link them by the names of the actors involved.
These names act as identifiers.
SLIDE 26
Connecting the two tables
Select Giver in the second table.
At the bottom, select Extension and select the table of actors (such as People).
Click Done, and repeat the same for Recipient.
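Because the names act as identifiers, a name in the Relations sheet only links to an actor if it is spelled exactly as in the Attributes sheet. This Python sketch shows the idea behind linking the two tables, with hypothetical data; it is a conceptual illustration, not how Palladio implements Extension. It also collects the names that match no actor, which is a quick way to catch typos before pasting.

```python
# Hypothetical actors (keyed by name) and relations, as in the two Palladio tables
people = {"Alice": {"Sex": "f"}, "Paul": {"Sex": "m"}}
relations = [
    {"Giver": "Alice", "Recipient": "Paul"},
    {"Giver": "Alice", "Recipient": "Pual"},  # a typo: this name matches no actor
]

def link(relations, people, keys=("Giver", "Recipient")):
    """Attach actor attributes to each relation via the name columns.

    Returns the linked rows and the set of names that match no actor."""
    linked, unmatched = [], set()
    for rel in relations:
        row = dict(rel)
        for key in keys:
            name = rel[key]
            if name in people:
                row[key + " attributes"] = people[name]
            else:
                unmatched.add(name)
        linked.append(row)
    return linked, unmatched

linked, unmatched = link(relations, people)
print(unmatched)  # {'Pual'}
```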
SLIDE 27
Temporal data
For the When question, we can let Palladio use the Time data as temporal data.
Select Time Step Start, change the Data type to Date, and click Done.
Repeat the same for Time Step End.
You'll notice the data are not actual dates, but at least they show some chronology.
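If the time steps count 6-month periods, as in Düring's coding of time, they can be converted into real calendar dates once you decide which half-year step 1 stands for. The Python sketch below assumes a hypothetical base year of 1933 for step 1 and returns the first day of each half-year; the 99 code is mapped to `None`. This only works if the scheme never reaches 99 real steps.

```python
from datetime import date

UNKNOWN = 99      # code for an unknown value
BASE_YEAR = 1933  # hypothetical: step 1 is assumed to be the first half of this year

def step_to_date(step):
    """Map a 6-month step (1, 2, 3, ...) to the first day of that half-year."""
    if step == UNKNOWN:
        return None
    year = BASE_YEAR + (step - 1) // 2
    month = 1 if (step - 1) % 2 == 0 else 7
    return date(year, month, 1)

print(step_to_date(1).isoformat())   # 1933-01-01
print(step_to_date(24).isoformat())  # 1944-07-01
```

With real dates in the sheet, the timeline would show calendar time instead of bare step numbers.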
SLIDE 28
Ready?
Now we have the data ready for visualisation.
Do not close or refresh the Palladio tab, or you will have to start all over.
SLIDE 29
Visualising with Palladio
Now let's look at the network by selecting Graph in the top bar.
As the Source, choose Giver and close the popup.
As the Target, choose Recipient.
Watch the result!
SLIDE 30
Palladio Graph Settings
Try the two Highlighting check-boxes.
Try Size nodes.
What can we learn from this graph?
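One way to read a helper network is to count, for each actor, how many relations they give and receive (out-degree and in-degree). The Python sketch below computes this from hypothetical (giver, recipient) pairs; it is our own calculation to help interpret the graph, and we do not claim it is exactly what Palladio's Size nodes option measures. Note that repeated rows (the same help given twice) are counted twice.

```python
from collections import Counter

# Hypothetical (giver, recipient) pairs; one pair per row in the Relations sheet
edges = [("Alice", "Paul"), ("Alice", "Bob"), ("Carol", "Paul"), ("Alice", "Paul")]

def degrees(edges):
    """Count outgoing (help given) and incoming (help received) relations per actor."""
    out_degree = Counter(giver for giver, _ in edges)
    in_degree = Counter(recipient for _, recipient in edges)
    return out_degree, in_degree

out_degree, in_degree = degrees(edges)
print(out_degree.most_common(1))  # [('Alice', 3)]
print(in_degree["Paul"])          # 3
```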
SLIDE 31
Facet
To filter for certain attributes, select Facet in the lower-left corner.
As a Dimension, select Form of help and close the popup.
Now you can filter by the different forms of help.
To refine even further, we can add more facets by selecting the Dimension and choosing other options, such as Date of Activity.
To remove a filter, click the red trash can in the lower-right corner.
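Faceting can be thought of as filtering rows: within one facet, a row passes if its value is one of the chosen values; adding a second facet refines the result further. The Python sketch below assumes that behaviour (we have not checked how Palladio combines facets internally) and uses hypothetical rows.

```python
# Hypothetical relation rows with two coded dimensions
rows = [
    {"Giver": "Alice", "Form of help": "shelter", "Date of Activity": "1944"},
    {"Giver": "Alice", "Form of help": "documents", "Date of Activity": "1944"},
    {"Giver": "Carol", "Form of help": "shelter", "Date of Activity": "1943"},
]

def facet(rows, selection):
    """Keep rows whose value in every selected dimension is one of the chosen values."""
    return [row for row in rows
            if all(row.get(dim) in chosen for dim, chosen in selection.items())]

shelter = facet(rows, {"Form of help": {"shelter"}})
shelter_1944 = facet(rows, {"Form of help": {"shelter"}, "Date of Activity": {"1944"}})
print(len(shelter), len(shelter_1944))  # 2 1
```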
SLIDE 32 Bipartite networks
2 sets of nodes with links between the sets but not within each set.
In Palladio, change the settings for Source and Target nodes, for example:
- Source: Form of help, Target: Giver
- Source: Recipient, Target: Sex
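Choosing two different kinds of column as Source and Target gives two separate sets of nodes, which is what makes the network bipartite. The Python sketch below (hypothetical rows and function names) builds the edges between two columns and checks that no value appears on both sides. Form of help to Giver passes, while Giver to Recipient does not, because a person can both receive and give help.

```python
# Hypothetical relation rows; Paul both receives and gives help
rows = [
    {"Giver": "Alice", "Recipient": "Paul", "Form of help": "shelter"},
    {"Giver": "Paul", "Recipient": "Carol", "Form of help": "food"},
]

def network(rows, source, target):
    """Edges from the values in one column to the values in another."""
    return {(row[source], row[target]) for row in rows}

def is_bipartite_by_columns(rows, source, target):
    """True if no value appears as both a source node and a target node."""
    edges = network(rows, source, target)
    sources = {s for s, _ in edges}
    targets = {t for _, t in edges}
    return sources.isdisjoint(targets)

print(is_bipartite_by_columns(rows, "Form of help", "Giver"))  # True
print(is_bipartite_by_columns(rows, "Giver", "Recipient"))     # False: Paul is both
```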
SLIDE 33
Further visualisations
Put the graph back to Giver and Recipient.
You can also play around with the Timeline and Time step to see how the network changes over time (these features don't work in Firefox, but do work in Chrome).
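Seeing the network change over time amounts to showing only the relations whose time span overlaps a moving window. The Python sketch below uses hypothetical rows with integer Time Step Start and Time Step End values; a relation counts as active if its span overlaps the window (both ends inclusive). Rows coded 99 (unknown) are skipped, since 99 would otherwise look like a very late time step. This illustrates the idea and is not Palladio's implementation.

```python
UNKNOWN = 99  # code for an unknown value

# Hypothetical relations with coded start and end time steps
rows = [
    {"Giver": "Alice", "Recipient": "Paul", "Time Step Start": 20, "Time Step End": 22},
    {"Giver": "Carol", "Recipient": "Paul", "Time Step Start": 23, "Time Step End": 23},
    {"Giver": "Bob", "Recipient": "Alice", "Time Step Start": 18, "Time Step End": 19},
    {"Giver": "Dan", "Recipient": "Paul", "Time Step Start": 21, "Time Step End": UNKNOWN},
]

def active_in(rows, window_start, window_end):
    """Keep relations with known steps whose [start, end] overlaps the window."""
    return [row for row in rows
            if UNKNOWN not in (row["Time Step Start"], row["Time Step End"])
            and row["Time Step Start"] <= window_end
            and row["Time Step End"] >= window_start]

# Slide a two-step window over the data and list who is giving help
for start in (18, 20, 22):
    window = active_in(rows, start, start + 1)
    print(start, sorted(r["Giver"] for r in window))
```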
SLIDE 34
Example of another project
https://www.youtube.com/watch?v=41_DQQii628
SLIDE 35
Reflections on the tools
What did you think of the tools thus far?
SLIDE 36
Specific versus Generic tools
The most difficult tool proved to be Excel / Google Sheets.
Could this be because it offers far more functionality than what we needed?
SLIDE 37
Easy work
It appears that creating the network visualisation only comes at the very end, and is perhaps not even the most work.
Digital methods do not necessarily make your research easier, but they can provide you with new perspectives.
SLIDE 38
For next time
14 December
Digital Public History (Anita Lucchesi - @alucchesi)
Reading: (see Moodle)
Noiret, S. (2015) Digital Public History: bringing the public back in. Public History Weekly 3(13) DOI: http://doi.org/10.1515/phw-2014-2647