When? Quantitative history Max Kemman University of Luxembourg - - PowerPoint PPT Presentation



slide-1
SLIDE 1

When? Quantitative history

Max Kemman

University of Luxembourg
November 15, 2015

While waiting: please log in to Moodle and Google Drive. Download the files luxembourg and 1000mails in both formats.

Doing Digital History: Introduction to Tools and Technology

slide-2
SLIDE 2

Today

  • Quantitative history
  • Quantitative data
  • Entering the data into Google Drive
  • Creating a timeline in Google Sheets
  • Sharing the Google Sheet
  • Editing values
  • Creating different timelines
  • Next time
  • Assignment
slide-3
SLIDE 3

Quantitative history

Why would we want to analyse history by the numbers?

slide-4
SLIDE 4

Longue Durée

What if you want to analyse:

  • 40 years?
  • 400 years?
  • 4,000 years?
  • 40,000 years?

We cannot focus on all the individual stories; something else is needed

slide-5
SLIDE 5

Cliometrics

[T]he study of History through the history of things that can be quantitatively measured – wealth, goods, and services that were taxed and recorded, and population.

Guldi & Armitage, p97

slide-6
SLIDE 6

Causality?

Big data enhance our ability to grapple with historical information. They may help us to decide the hierarchy of causality – which events mark watershed moments in their history, and which are merely part of a larger pattern.

Guldi & Armitage, p89

But in our discussion of Big Data, we focused on correlation

slide-7
SLIDE 7

Comparisons and correlations

It could be interesting to see:

  • How two properties evolve over time: are they correlated?
  • How the same property compares between two or more countries
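
As a minimal sketch of what "are they correlated?" could mean in practice, the Python snippet below computes a Pearson correlation coefficient for two series measured in the same years. The values and the names series_a and series_b are invented for illustration; they are not World Bank data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented values for two properties measured in the same five years.
series_a = [1.2, 1.4, 2.0, 2.6, 3.1]
series_b = [10.0, 11.5, 15.0, 19.0, 22.5]

r = pearson(series_a, series_b)
print(round(r, 3))  # close to 1: the two series rise together
```

A value near 1 or -1 only says the two lines move together (or in opposite directions); it does not say that one causes the other.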
slide-8
SLIDE 8

Quantitative vs Qualitative

Time on the Cross tried this approach: is the qualitative judgement of slavery also a quantitative one?

Questions: did slaves really live in such awful circumstances? And was slavery economically inefficient?

The book's answer: not all slaves had it bad, and Southern states were 35% more efficient than Northern states

slide-9
SLIDE 9

Criticisms

[T]he authors argued that each slave was only whipped something like 7.2 times per year and so slavery wasn’t as brutal as its conventional image. As if one severe whipping in an entire lifetime wouldn’t be bad enough.

(Source)

Time on the Cross also received quantitative criticisms: statistical mistakes or wrong assumptions

slide-10
SLIDE 10

Quantitative vs Qualitative

Leezenberg & De Vries (2001, Wetenschapsfilosofie voor Geesteswetenschappen) ask: Does quantitative history 'undress the historical argument' (Nawrotzki & Dougherty)?

  • Does this mean scientific historiography doesn't work and should be stopped? Or
  • Did the explicit data and method enable scholarly discussion?
slide-11
SLIDE 11

Quantitative data

For your next assignment you will download quantitative data to analyze

http://data.worldbank.org

We will browse the data by country; let's look up Luxembourg

slide-12
SLIDE 12

Data formats

There are three data formats:

  1. Excel / OpenOffice
  2. XML
  3. CSV

We will be using the CSV file luxembourg.csv. But also download the Excel/OpenOffice file just in case

slide-13
SLIDE 13

CSV

Comma Separated Values is an open standard

In HTML we learned how to represent data in a table:

                         1960   1970   1980
  Luxembourg property    1.2    1.4    2.0

In CSV:

  "","1960","1970","1980"
  "Luxembourg property","1.2","1.4","2.0"
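
To make the link between the CSV text and the table concrete, here is a small Python sketch that reads the two CSV lines from this slide with the standard csv module and pairs each year with its value. The variable names are illustrative only.

```python
import csv
import io

# The two CSV lines from the example on this slide.
csv_text = '"","1960","1970","1980"\n"Luxembourg property","1.2","1.4","2.0"\n'

# csv.reader splits each line into cells and removes the surrounding quotes.
rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1]

# Pair each year with its value, skipping the first (label) column.
values_by_year = {year: float(value) for year, value in zip(header[1:], data[1:])}

print(data[0])
print(values_by_year)
```

Every cell comes out of the file as text; converting "1.2" to the number 1.2 is a separate, explicit step.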

slide-14
SLIDE 14

Why make it so difficult?

Because CSV is a standard:

  • Many programs can read it
  • Not dependent on any one commercial program
  • It will still be readable in many years
slide-15
SLIDE 15

Entering the data into Google Drive

Go to http://drive.google.com, log in, and click the big red "NEW" button and select "File upload"

slide-16
SLIDE 16

Find the file on your hard drive, select it, and click "Open" to upload it to Google Drive

When your Google Drive is in English you can select the CSV; otherwise the Excel file will work better

slide-17
SLIDE 17

Find the file in your Google Drive, right-click, select "Open with" and select Google Sheets

slide-18
SLIDE 18

Google Sheets should now open a nicely ordered sheet as shown here. To clean it up, select the first 4 rows, right-click, and select delete

slide-19
SLIDE 19

Select the first 2 columns, right-click, and select delete

slide-20
SLIDE 20

Select the 2nd column with Indicator Codes, right-click, and select delete
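
The three clean-up steps (delete the first 4 rows, the first 2 columns, then the 2nd column) can also be sketched in Python. The sample text below is invented and only assumes the layout these steps imply: four leading rows, then a header row whose columns are country name, country code, indicator name, indicator code and the years. The real luxembourg.csv may differ.

```python
import csv
import io

# Invented sample: 4 leading rows (two of them empty), then header and data rows.
raw = '''"Data Source","World Development Indicators"

"Last Updated Date","(hypothetical)"

"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961"
"Luxembourg","LUX","Indicator A","CODE.A","1.0","2.0"
"Luxembourg","LUX","Indicator B","CODE.B","3.0","4.0"
'''

rows = list(csv.reader(io.StringIO(raw)))   # empty lines become empty rows
rows = rows[4:]                             # delete the first 4 rows
rows = [row[2:] for row in rows]            # delete the first 2 columns
rows = [row[:1] + row[2:] for row in rows]  # delete the 2nd column (Indicator Code)

for row in rows:
    print(row)
```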

slide-21
SLIDE 21

Select the 1st row with the years and copy using ctrl+c (Windows) or cmd+c (Apple)
slide-22
SLIDE 22

Click the + in the lower-left corner (encircled) to create a new sheet, and paste the row here using ctrl+v (Windows) or cmd+v (Apple)

slide-23
SLIDE 23

To search in the first sheet, select it at the bottom, and use ctrl+f (Windows) or cmd+f (Apple) to search for gdp (current US$). Select and copy it

slide-24
SLIDE 24

After copying the row with gdp (current US$), go to your new sheet and paste
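
As a rough sketch of the search-copy-paste steps, the Python below looks up one indicator row by (part of) its name and places it under the row of years. The sheet contents and the helper name find_row are invented for illustration.

```python
# Invented, already cleaned-up sheet: a header row with years, then one row per indicator.
sheet = [
    ["Indicator Name", "1960", "1961", "1962"],
    ["Electric power consumption (kWh per capita)", "10", "11", "12"],
    ["GDP (current US$)", "100", "110", "125"],
]

def find_row(rows, term):
    """Return the first data row whose label contains term, ignoring case."""
    for row in rows[1:]:
        if term.lower() in row[0].lower():
            return row
    return None

years = sheet[0]
gdp = find_row(sheet, "gdp (current us$)")

# The new sheet holds just the two rows needed for the chart.
new_sheet = [years, gdp]
for row in new_sheet:
    print(row)
```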

slide-25
SLIDE 25

Creating a timeline in Google Sheets

Select the two rows by dragging your mouse from the 1 in row 1 to the 2 in row 2

slide-26
SLIDE 26

Select "Insert" in the menu bar and select "Chart..."

slide-27
SLIDE 27

Google Sheets will suggest several charts. Choose the second line chart and select "Insert"

If your chart looks completely different even though you used the same property and uploaded the CSV, go back and upload the Excel file instead

slide-28
SLIDE 28

Go back to your first sheet and search for "electric power" to find the row electric power consumption (kWh per capita). Select this row and copy

slide-29
SLIDE 29

Paste the row in the new sheet under the rows you have. The chart should be updated automatically

slide-30
SLIDE 30

The 2 properties have very different values. Hold the mouse over the second line until you see "Edit series", and select the _l symbol (encircled) to create a second y-axis

slide-31
SLIDE 31

You should now see 2 lines that you can compare. Change the x-axis title by clicking it and entering "Year"

slide-32
SLIDE 32

Press enter to apply

slide-33
SLIDE 33

To edit the chart further, select the chart and click the triangle in the upper-right corner and select "Advanced edit..."

slide-34
SLIDE 34

In this window you can further customize the chart

slide-35
SLIDE 35

One interesting visual change is to select "Smooth". Click "Update" once you're done
slide-36
SLIDE 36

Sharing the Google Sheet

To share the Google Sheet, click the big blue "Share" button in the top-right corner and click "Get shareable link"

slide-37
SLIDE 37

The sharing window will now show a URL you can copy-paste into your report.

slide-38
SLIDE 38

When you click the dropdown "Anyone with the link can view" you are offered other options

slide-39
SLIDE 39

To share only the chart, you can click the triangle in the upper-right corner and select "Save image"

slide-40
SLIDE 40

Editing values

Download the 1000mails.csv file from Moodle and upload it to Google Sheets as before

slide-41
SLIDE 41

Specifying column headers

Drag the gray line above row 1 to below row 1 (see red circle)

This way you can easily sort columns alphabetically or otherwise without losing the headers

slide-42
SLIDE 42

Working with the Date field: selecting characters

6/30/2010 11:53:00 → M(M)/DD/YYYY HH:MM:SS

(In the ODS file the date may be written out slightly differently, but the same principle applies)

Rather than as a date, we can treat this as a string of 18 or 19 characters

  • Create a new column next to Date, call it Date2
  • With =LEFT(field;length) we can select a number of characters
  • To select the month: =LEFT(G2;2)
  • Now we have months 6/ and 12, etc
  • To repeat for all rows: select the field, grab its bottom-right corner, and drag all the way down
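
For comparison, a minimal Python sketch of what =LEFT does to the date string. The helper left is a stand-in for the spreadsheet function, and the second date is invented to show a two-digit month.

```python
def left(text, length):
    """Rough equivalent of the spreadsheet function LEFT(text; length)."""
    return text[:length]

# First value is the example from this slide; the second is invented.
dates = ["6/30/2010 11:53:00", "12/01/2010 08:15:00"]

for date in dates:
    # One-digit months keep the slash, two-digit months do not.
    print(len(date), repr(left(date, 2)))
```

The lengths 18 and 19 match the "18 or 19 characters" above: the difference is the extra digit in the month.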
slide-43
SLIDE 43

Selecting parts of Date

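To work from the other end, use =RIGHT(field;length)

To get, for example, only the year, select the first 10 characters with LEFT, then in another column take the last 4 characters of that with RIGHT

  • Create a column next to Date, Date2
  • With =LEFT(G2;10) we select 6/30/2010
  • Create another column next to Date2, Date3
  • With =RIGHT(H2;4) we select 2010
  • Note: 6/30/2010 is only 9 characters long, so with a one-digit month =LEFT(G2;10) also picks up the following space; use =LEFT(G2;9) for those rows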
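
The LEFT-then-RIGHT combination can be tried out in Python as below. The helpers left and right are stand-ins for the spreadsheet functions, and the two-digit-month date is invented. The sketch also shows how the length passed to LEFT interacts with one-digit months, since 6/30/2010 is 9 characters and 12/30/2010 is 10.

```python
def left(text, length):
    """Rough equivalent of LEFT(text; length)."""
    return text[:length]

def right(text, length):
    """Rough equivalent of RIGHT(text; length)."""
    return text[-length:] if length > 0 else ""

one_digit_month = "6/30/2010 11:53:00"   # example from the slides
two_digit_month = "12/30/2010 11:53:00"  # invented example

# With a two-digit month, the first 10 characters are exactly the date.
print(repr(right(left(two_digit_month, 10), 4)))

# With a one-digit month, the 10th character is the space after the date.
print(repr(left(one_digit_month, 10)))
print(repr(right(left(one_digit_month, 10), 4)))

# Taking 9 characters instead gives the year for one-digit months.
print(repr(right(left(one_digit_month, 9), 4)))
```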
slide-44
SLIDE 44

Removing specific characters

Of course, 6/ isn't a real month

To remove the /, we will use =SUBSTITUTE(field,"char","")

  • Create another column next to Date2, call it Date3
  • Write =SUBSTITUTE(H2,"/",""): the result should be 6
  • To repeat for all rows: select the field, grab its bottom-right corner, and drag all the way down
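
A short Python sketch of the same clean-up: the helper substitute stands in for =SUBSTITUTE, and the list of raw month values is invented to mirror what the two-character selection produces.

```python
def substitute(text, old, new):
    """Rough equivalent of SUBSTITUTE(text, old, new): replace every occurrence."""
    return text.replace(old, new)

# Invented sample of values as they come out of the two-character selection.
raw_months = ["6/", "12", "1/"]

months = [substitute(value, "/", "") for value in raw_months]
print(months)
```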
slide-45
SLIDE 45

Sorting by the new time column

To save only the result and not the formula, copy the entire Date3 column and create a new column Date4

Right-click the new column, and select paste special > paste values only
slide-46
SLIDE 46

Select the Date4 column with all the values, click the 123 button in the top bar and click Number (even when this is already checked)

Now you can sort by the Date4 column
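
One likely reason for setting the column to Number (an assumption; the slides do not spell it out) is that text and numbers sort differently. The Python sketch below shows the difference on an invented list of month values.

```python
# Invented month values, stored as text the way string formulas return them.
months_as_text = ["6", "12", "1", "10"]

# Sorted as text: compared character by character, so "10" and "12" come before "6".
sorted_as_text = sorted(months_as_text)

# Sorted as numbers: the order you would expect for months.
sorted_as_numbers = sorted(int(month) for month in months_as_text)

print(sorted_as_text)
print(sorted_as_numbers)
```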

slide-47
SLIDE 47

Creating different timelines

The spreadsheet contains a number of other fields

To make a timeline of just emails written by Clinton rather than others, sort the From field and select only relevant emails

(Tip: maybe copy only the relevant rows to a new sheet to keep a view of what you want)

This way you could compare between different email authors

slide-48
SLIDE 48

Advanced: Creating different timelines

(Skip this if you can't get the formula to work)

Try to make a selection per question (e.g., all emails written by HC, or all mentioning a specific organisation)

  • To find fields with a specific value, create another column
  • Write the formula =COUNTIF(field;"value")
  • To get all occurrences, use wildcards: e.g. =COUNTIF(field,"*department*")
  • The result will be 1 (yes) or 0 (no)
  • Sort by these values to select only emails with those terms and create a timeline of that
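
What the wildcard formula does per row can be sketched in Python as a "contains" check that returns 1 or 0. The helper name contains_flag and the sample subjects are invented, and the sketch assumes the match ignores upper and lower case.

```python
def contains_flag(cell, term):
    """Return 1 if term occurs anywhere in cell (ignoring case), else 0."""
    return 1 if term.lower() in cell.lower() else 0

# Invented sample cells standing in for one column of the email spreadsheet.
subjects = ["State Department schedule", "Lunch", "Re: department memo"]

flags = [contains_flag(subject, "department") for subject in subjects]
print(flags)

# Keeping only the rows flagged 1 is the same selection as sorting by the flag column.
selected = [subject for subject, flag in zip(subjects, flags) if flag == 1]
print(selected)
```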
slide-49
SLIDE 49

Separating data for different timelines

For example, if we want to show a timeline for just the emails by Hillary Clinton, we:

  • Sort the spreadsheet by the From column
  • Select all the emails where From is Hillary Clinton
  • Copy these to a new sheet

To visualise, see the next slide

slide-50
SLIDE 50

Visualising the emails

In your chart, the X-axis will be what you selected from Date (for example, the months), the Y-axis the number of emails

For example, if we want to show a timeline for the emails by Hillary Clinton, we:

  • Use the previously created spreadsheet of just emails from Hillary Clinton
  • Sort the spreadsheet by the new time column you created so this is 1-12
slide-51
SLIDE 51

Select the time column, and insert a chart

Aggregate by the time column, so that it shows the number of occurrences of each month

You're done!
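
The filter-then-aggregate idea behind this timeline can be sketched in Python as follows. The rows are invented (sender, month) pairs standing in for the From column and the new time column; they are not taken from the actual email files.

```python
from collections import Counter

# Invented rows: (From, month) pairs.
emails = [
    ("Hillary Clinton", 6),
    ("Hillary Clinton", 6),
    ("Someone Else", 6),
    ("Hillary Clinton", 12),
    ("Hillary Clinton", 1),
]

# Keep only the months of emails written by one author.
months = [month for sender, month in emails if sender == "Hillary Clinton"]

# Aggregate: count how often each month occurs.
counts = Counter(months)

# One (month, number of emails) pair per month, 1-12, including months with 0 emails.
timeline = [(month, counts.get(month, 0)) for month in range(1, 13)]
for month, number in timeline:
    print(month, number)
```

The timeline list has the same shape as the chart: months on the X-axis, number of emails on the Y-axis.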

slide-52
SLIDE 52

For next time

22 November

Where? Maps in History (Catherine Jones)

slide-53
SLIDE 53

Lunchtime seminar: Discover hidden history in the city

Catherine Jones & Daniele Guido

Thursday 17 November 2016, 12:00

"Aquarium", 4th floor MSH

www.crosscult.eu

slide-54
SLIDE 54

Assignment

Take the 1000mails.csv file and work with it in Google Sheets

Try to create several timelines of interest using modifications of the Date field

For the brave: you can also use the files 10kmails.csv or allmails.csv

slide-55
SLIDE 55

Assignment

Work in groups of two or three

Link to the original data and include a link to your Google Sheet (via the Share button)

Hand in the assignment in HTML; include your name and a decent profile photo

slide-56
SLIDE 56

Assignment

800-1500 words, in English

Grading

  • 1pt for free
  • 3pts for HTML and CSS
  • 3pts for documentation of your process (why these charts?)
  • 3pts for critical reflection on your charts (what can we learn from the charts?)

Email to max.kemman@uni.lu before the start of the lecture of 29 November