Who? Networks of social entities Max Kemman University of - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Who? Networks of social entities

Max Kemman

University of Luxembourg December 13, 2016

Doing Digital History: Introduction to Tools and Technology

slide-2
SLIDE 2

Today

  • Final assignment
  • Preparing the data with Palladio
  • (Cleaning the date column with Google Spreadsheets)
  • Visualising with Palladio
  • Next time
slide-3
SLIDE 3

Final assignment

Some additional info about the final assignment:

  • The computers in the TIC-Lab are powerful enough to work with all mails in Google Spreadsheets (you may also use Excel if you prefer, but it is more difficult for me to help when you're stuck)
  • Create a selection and argue why you chose this selection
  • Deadline: 20 January 2017, 23:59
  • You receive your grades on Friday 27 January 2017

slide-4
SLIDE 4

Final assignment data

All data is in Moodle in the folder Final Assignment:

  • allmails-metadata.csv & allmails-metadata.ods
  • allmails-ner.csv & allmails-metadata.ods (including mentioned people, organisations, locations)
  • allmails-geocoded.csv (about 108k locations)
  • Folder with text files per 1k
slide-5
SLIDE 5

Preparing the data with Palladio

To visualize the coded data, we will use Palladio: http://hdlab.stanford.edu/palladio/

First we need to prepare the data for Palladio

slide-6
SLIDE 6

Loading the data

  • Click Start
  • We will use the 1000mails-cleandate.csv file (from Moodle, in the Who folder)
  • Drag the CSV file onto the text input field
  • Click Load

slide-7
SLIDE 7

Preparing the data

  • You will get a list of the columns from the spreadsheet
  • You can already give your project and your data table a title
  • Do not close or refresh this tab, or you will have to start over!
  • Let's look at several columns

slide-8
SLIDE 8

From

  • Sort the values by Frequency
  • Check the data type
  • Click Close

slide-9
SLIDE 9

Date

  • To set the data type to date, we need the format YYYY-MM-DD
  • In our original CSV the format also included the time of day, but here the data is already in the right format, so it is recognised automatically
  • See the section "Cleaning the date column with Google Spreadsheets" for how to clean the date
  • Click Close

slide-10
SLIDE 10

People

  • This contains the named entities per email
  • To separate multiple people in an email, enter the delimiter | in the Multiple values box
  • Click Close
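To make the effect of the delimiter concrete, here is a minimal Python sketch of what splitting a multi-value People cell on | amounts to. It is only an illustration, not Palladio's own code; the sample rows and the helper name split_people are hypothetical.

```python
import csv
import io

# Hypothetical sample rows; the real file has more columns and many more rows.
SAMPLE = """From,To,People
alice@example.org,bob@example.org,Jane Doe|John Smith
bob@example.org,alice@example.org,
"""


def split_people(cell, delimiter="|"):
    """Split one People cell into a list of names, dropping empty entries."""
    return [name.strip() for name in cell.split(delimiter) if name.strip()]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
people_per_mail = [split_people(row["People"]) for row in rows]
print(people_per_mail)
```

Without the delimiter, "Jane Doe|John Smith" would be treated as one single value instead of two people.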

slide-11
SLIDE 11


slide-12
SLIDE 12

Cleaning the date column with Google Spreadsheets

Here we use Google Spreadsheets, but this is also possible in Excel and LibreOffice. You can skip this for now, but it is important for the final assignment.

slide-13
SLIDE 13

Cleaning the Date field

Select the Date column, and go to Format > Number > More Formats > More date and time formats

slide-14
SLIDE 14

Cleaning the Date field

Select the appropriate option YYYY-MM-DD and click Apply

slide-15
SLIDE 15

Cleaning the Date field

The Date column will now have the appropriate format
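If you prefer a script over the spreadsheet menu, the same clean-up could be sketched in Python as below. The input layout "YYYY-MM-DD HH:MM:SS" is an assumption, since the slides only say that the original format included the clock; the name clean_date is hypothetical, and the pattern should be adjusted to the actual export.

```python
from datetime import datetime

# Assumed layout of the original timestamps; change this to match the real data.
ASSUMED_INPUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def clean_date(value, input_format=ASSUMED_INPUT_FORMAT):
    """Return only the date part of a timestamp, formatted as YYYY-MM-DD."""
    parsed = datetime.strptime(value.strip(), input_format)
    return parsed.strftime("%Y-%m-%d")


print(clean_date("2016-12-13 14:05:00"))
```

A value that does not match the assumed pattern raises a ValueError, which is a useful signal that the column contains another date layout.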

slide-16
SLIDE 16

Exporting the CSV

Click File > Download as > Comma-separated values (.csv, current sheet)

slide-17
SLIDE 17

Visualising with Palladio

  • Now let's look at the network by selecting Graph in the top bar
  • As the source, choose From and close the popup
  • As the target, choose To and close the popup
  • Wait and watch the result!
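What a source and target selection means can be sketched in a few lines of Python: every email becomes a link from the From value to the To value, and repeated pairs add weight to the same link. This is an illustration under assumptions, not a description of how Palladio works internally; the sample rows and the name count_edges are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows with only the two columns used for the graph.
SAMPLE = """From,To
alice@example.org,bob@example.org
alice@example.org,bob@example.org
bob@example.org,carol@example.org
"""


def count_edges(rows):
    """Count how often each (source, target) pair occurs."""
    return Counter((row["From"], row["To"]) for row in rows)


edges = count_edges(csv.DictReader(io.StringIO(SAMPLE)))
for (source, target), weight in edges.most_common():
    print(source, "->", target, weight)
```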

slide-18
SLIDE 18
slide-19
SLIDE 19

Palladio Graph Settings

  • Try the two Highlighting check-boxes
  • Try Size nodes
  • What can we learn from this graph?

slide-20
SLIDE 20
slide-21
SLIDE 21

Facet

  • To filter for certain attributes, select Facet in the lower-left corner
  • As a Dimension, select From and close the popup
  • Now you can filter to show only the emails from one person
  • Alternatively, you could filter for emails mentioning a specific person, location, or organisation
  • To refine even further, add more facets by selecting Dimension and choosing more options
  • To remove a facet, click the red trashcan in the lower-right corner
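Conceptually, a facet keeps only the rows that match the selected values, and stacking facets narrows the selection further. A minimal Python sketch of that idea follows; the function name facet and the sample mails are hypothetical, and this does not reproduce Palladio's implementation.

```python
def facet(rows, dimension, selected, delimiter=None):
    """Keep rows whose value(s) for `dimension` include one of `selected`."""
    selected = set(selected)
    kept = []
    for row in rows:
        cell = row.get(dimension, "")
        # Multi-value columns such as People are split on the delimiter first.
        values = cell.split(delimiter) if delimiter else [cell]
        if selected.intersection(value.strip() for value in values):
            kept.append(row)
    return kept


# Hypothetical sample mails.
MAILS = [
    {"From": "alice@example.org", "People": "Jane Doe|John Smith"},
    {"From": "bob@example.org", "People": "Jane Doe"},
    {"From": "alice@example.org", "People": ""},
]

from_alice = facet(MAILS, "From", ["alice@example.org"])
mentions_john = facet(from_alice, "People", ["John Smith"], delimiter="|")
print(len(from_alice), len(mentions_john))
```

Applying the second facet to the result of the first mirrors refining a selection with more than one facet.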

slide-22
SLIDE 22

Facet selection from From column

slide-23
SLIDE 23

Facet selection from People column

slide-24
SLIDE 24

Timeline

  • We can also create a timeline of the emails by clicking Timeline
  • Drag the mouse in the timeline to create a bar that acts as a filter
  • Drag the bar to move it around so you can see how the network develops: you could compare months or years
  • To remove the timeline filter, click the red trashcan in the lower-right corner
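The timeline bar acts as a date window: only emails dated inside the window remain in the graph. A rough Python sketch of such a window filter is shown below, assuming the Date column is already in YYYY-MM-DD form; the name in_window and the sample mails are hypothetical.

```python
from datetime import date


def in_window(rows, start, end):
    """Keep rows whose Date (YYYY-MM-DD) lies between start and end, inclusive."""
    return [
        row for row in rows
        if start <= date.fromisoformat(row["Date"]) <= end
    ]


# Hypothetical sample mails.
MAILS = [
    {"Date": "2001-01-15", "From": "alice@example.org"},
    {"Date": "2001-02-03", "From": "bob@example.org"},
    {"Date": "2001-03-20", "From": "alice@example.org"},
]

january = in_window(MAILS, date(2001, 1, 1), date(2001, 1, 31))
february = in_window(MAILS, date(2001, 2, 1), date(2001, 2, 28))
print(len(january), len(february))
```

Comparing the results of two windows corresponds to moving the bar along the timeline to compare months or years.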

slide-25
SLIDE 25

Timeline

slide-26
SLIDE 26

Filtering one part of the timeline

slide-27
SLIDE 27

Filtering another part of the timeline

slide-28
SLIDE 28

Why filtering?

  • The network can become quite large when you have more emails, or when you select one of the people, locations, or organisations columns in the graph
  • Filtering helps to keep the spaghetti/graph readable
  • See the next slide for an example of a spaghetti ball (trying to do this might make your computer quite slow)

slide-29
SLIDE 29
slide-30
SLIDE 30

Sharing

To export a graph, click the Download button in the settings (the lower one). This will export an SVG file that you can embed in your HTML report with

<img src="Palladio Graph.svg" alt="graph">
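If the report is generated by a script, the image tag can be built programmatically. The sketch below is one possible approach, not a required step: it URL-encodes the space in the file name and escapes the alt text. The helper name img_tag is hypothetical.

```python
from html import escape
from urllib.parse import quote


def img_tag(filename, alt):
    """Build an HTML img tag, URL-encoding the file name and escaping the alt text."""
    return '<img src="{}" alt="{}">'.format(quote(filename), escape(alt, quote=True))


print(img_tag("Palladio Graph.svg", "graph"))
```

Renaming the exported file so that it contains no space is a simpler alternative that avoids the encoding question altogether.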

slide-31
SLIDE 31

To export the entire workspace, click the upper Download button. This will export a JSON file that you can load next time (see next slide)

slide-32
SLIDE 32

If you previously exported your workspace, you can load it by selecting "Load an existing project" and choosing the JSON file. This is also useful for sharing with project partners.

slide-33
SLIDE 33

For next time

20 December

Wrap-up