Beatrice Alex balex@inf.ed.ac.uk
Digital scholarship: day of ideas 2, Edinburgh, 02/05/2013
Digital history and big data: Text mining historical documents on trade in the British Empire
Overview: What is text mining? Text mining in digital …
Cinchona plantations in George King's A Manual of Cinchona Cultivation in India (1880).
Global Fats Supply 1894-98
Trading Consequences
Bea Alex, Timothy Bristow, Jim Clifford, Colin Coates, Ian Fieldhouse, Claire Grover, Uta Hinrichs, Ewan Klein, Clare Llewellyn, Nicola Osborne, Aaron Quigley, James Reid and Richard Tobin
Contact: dig-trade@inf.ed.ac.uk; Twitter: @digtrade; Blog: http://tradingconsequences.blogs.edina.ac.uk/
Information visualisation
Text mining and ontology management
Historical analysis
Data integration & dissemination
From Padang was exported, in 1871, 6,127 piculs of cassia bark, of which a large portion was shipped to America (Fliickiger and Hanbury). ... (excerpt from Spices, Ridley, 1912)
Early Canadiana Online
AMD Confidential Prints
ProQuest's House of Commons Parliamentary Papers
Kew Gardens' Director's Correspondence Archive
JSTOR's Foreign and Commonwealth Office collection (sample)
"Captive Tomes" by traceyp3031 on Flickr "Library Archives 05” by peteashton on Flickr :Cinnamon_Spice skos:prefLabel :Spice skos:narrowerThan cassia bark cinnamon cinnamomum vera skos:prefLabel skos:altLabel tc:Cassia_Bark skos:narrowerThan cinnamomum cassia skos:altLabel Document id spices1912ridley docid spices1912ridley title Spices url http://archive.org/details/spiceshenry00ridlrich pubdate 1912 type text author Ridley, Henry N. lang eng Collection id books text books Lo Location id geonames: 1633419 text Padang latitudeDigital scholarship: day of ideas 2, Edinburgh, 02/05/2013
resource commodities in the nineteenth century?
commodity: cassia bark
date: 1871
location: Padang
location: America
quantity + unit: 6,127 piculs
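The extracted entities above can be approximated with a toy rule-based extractor. This is a minimal sketch, assuming hypothetical hand-picked lexicons (COMMODITIES, PLACES, UNITS) and simple regular expressions for years and quantities; it is not the Trading Consequences text-mining pipeline, which the slides do not describe at this level of detail.

```python
import re

# Hypothetical toy lexicons for illustration only; a real system relies on
# much larger commodity lists and a gazetteer rather than hand-picked strings.
COMMODITIES = ["cassia bark", "cinnamon"]
PLACES = ["Padang", "America"]
UNITS = ["piculs", "tons", "cwt"]

# Four-digit years between 1500 and 1999.
YEAR_RE = re.compile(r"\b1[5-9]\d\d\b")
# A number (optionally with thousands separators) followed by a known unit.
QUANTITY_RE = re.compile(
    r"\b(?:\d{1,3}(?:,\d{3})+|\d+)\s+(?:" + "|".join(UNITS) + r")\b"
)


def lexicon_matches(label, terms, text):
    """Yield (label, surface string, offset) for every lexicon term found."""
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            yield (label, m.group(0), m.start())


def extract_entities(text):
    """Return (type, surface string, offset) tuples sorted by position."""
    found = list(lexicon_matches("commodity", COMMODITIES, text))
    found += lexicon_matches("location", PLACES, text)
    found += [("date", m.group(0), m.start()) for m in YEAR_RE.finditer(text)]
    found += [("quantity + unit", m.group(0), m.start())
              for m in QUANTITY_RE.finditer(text)]
    return sorted(found, key=lambda e: e[2])
```

Run on the Ridley excerpt, this yields Padang, 1871, 6,127 piculs, cassia bark and America in sentence order, matching the slide; lexicon lookup of this kind cannot recognise commodities or places missing from its lists, which is why larger resources matter.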
commodity: cassia bark
date: 1871 (year=1871)
location: Padang (lat=-0.94924;long=100.35427;country=ID)
location: America (lat=39.76;long=-98.50;country=n/a)
quantity + unit: 6,127 piculs
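Grounding attaches normalised values to each mention: a year for the date, and coordinates plus a country code for each location. A minimal sketch, assuming a hypothetical two-entry GAZETTEER dictionary filled with the coordinates shown on this slide; a real geo-resolution step would query a full gazetteer such as GeoNames and choose between candidate places that share a name.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroundedLocation:
    name: str
    lat: float
    long: float
    country: Optional[str]  # ISO country code, or None when not applicable


# Hypothetical gazetteer with the values shown on the slide; keys are
# lower-cased so lookup is case-insensitive.
GAZETTEER = {
    "padang": GroundedLocation("Padang", -0.94924, 100.35427, "ID"),
    "america": GroundedLocation("America", 39.76, -98.50, None),
}


def ground_location(surface):
    """Return the gazetteer entry for a place mention, or None if unknown."""
    return GAZETTEER.get(surface.lower())


def normalise_date(surface):
    """Extract a four-digit year (1500-1999) from a date mention."""
    m = re.search(r"\b(1[5-9]\d\d)\b", surface)
    return {"year": int(m.group(1))} if m else None
```

Returning None for unknown places, rather than guessing, keeps ungrounded mentions visible so that they can be reviewed later.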
destination location: America
commodity–date relation: cassia bark – 1871
commodity–location relation: cassia bark – Padang
commodity–location relation: cassia bark – America
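The relations on this slide can be approximated by a sentence-level co-occurrence baseline plus a cue phrase for destinations. A sketch under stated assumptions: extract_relations and the DESTINATION_CUE pattern are hypothetical, and pairing every commodity with every date and location in a sentence is a deliberately simple heuristic, not the project's method.

```python
import re

# Hypothetical cue phrases: a location directly preceded by one of these
# verbs followed by "to" is labelled as a destination.
DESTINATION_CUE = re.compile(
    r"\b(?:shipped|exported|sent|carried)\s+to\s*$", re.IGNORECASE
)


def extract_relations(sentence, entities):
    """Pair entities found in one sentence.

    entities: (type, surface string, character offset) tuples.
    Returns tuples such as ("commodity-date", "cassia bark", "1871").
    """
    relations = []
    # Destination: the cue must end right where the location mention starts.
    for etype, surface, offset in entities:
        if etype == "location" and DESTINATION_CUE.search(sentence[:offset]):
            relations.append(("destination location", surface))
    # Co-occurrence baseline: link every commodity to every date and location.
    for ctype, commodity, _ in entities:
        if ctype != "commodity":
            continue
        for etype, surface, _ in entities:
            if etype in ("date", "location"):
                relations.append((f"commodity-{etype}", commodity, surface))
    return relations
```

Co-occurrence gives good recall on short sentences like this one but over-generates when a sentence mentions several commodities; a learned relation classifier is one common refinement.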
:Cinnamon_Spice skos:prefLabel "cinnamon"; skos:altLabel "cinnamomum vera"; skos:narrowerThan :Spice
tc:Cassia_Bark skos:prefLabel "cassia bark"; skos:altLabel "cinnamomum cassia"; skos:narrowerThan :Cinnamon_Spice
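One way to use this ontology fragment is to map any surface label (preferred or alternative) to its concept and then walk up the hierarchy, so that a mention of "cinnamomum cassia" counts towards cassia bark, cinnamon and spice. A minimal sketch in plain Python, assuming the triples below reflect the slide's diagram and that each concept has a single parent. Note that skos:narrowerThan as used on the slide is not a standard SKOS property; standard SKOS would express the same link with skos:broader.

```python
# Triples (subject, predicate, object) as read from the slide's diagram.
TRIPLES = [
    (":Cinnamon_Spice", "skos:prefLabel", "cinnamon"),
    (":Cinnamon_Spice", "skos:altLabel", "cinnamomum vera"),
    (":Cinnamon_Spice", "skos:narrowerThan", ":Spice"),
    ("tc:Cassia_Bark", "skos:prefLabel", "cassia bark"),
    ("tc:Cassia_Bark", "skos:altLabel", "cinnamomum cassia"),
    ("tc:Cassia_Bark", "skos:narrowerThan", ":Cinnamon_Spice"),
]

LABEL_PREDICATES = {"skos:prefLabel", "skos:altLabel"}


def build_label_index(triples):
    """Map each lower-cased preferred or alternative label to its concept."""
    return {o.lower(): s for s, p, o in triples if p in LABEL_PREDICATES}


def broader_chain(concept, triples):
    """List the ancestors of a concept, nearest first.

    Assumes at most one parent per concept; stops if a cycle is found.
    """
    parents = {s: o for s, p, o in triples if p == "skos:narrowerThan"}
    chain, seen = [], {concept}
    while concept in parents:
        concept = parents[concept]
        if concept in seen:
            break
        seen.add(concept)
        chain.append(concept)
    return chain
```

For example, build_label_index(TRIPLES)["cinnamomum cassia"] resolves to tc:Cassia_Bark, whose ancestors are :Cinnamon_Spice and then :Spice.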
Seminar Talk, School of Computing, Dundee, 19/03/2013
Sophistication of the OCR engine and scanning equipment.
Quality of the original print and paper.
Use of historical language.
Information in page margins (header, page numbers, etc.).
Information in tables.
Language of the text.
Dehyphenate all token-splitting hyphens (words broken across line ends) using a dictionary-based approach.
Convert all false f characters (the long s, ſ, misread as f by OCR) to s using a corpus-based approach.
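Both cleaning steps can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical DICTIONARY for dehyphenation and a hypothetical CORPUS_FREQ table of word counts for the long-s correction; the real resources used in the project are not shown in the slides. Dehyphenation joins a word split across a line end only if the joined form is a dictionary word; long-s correction tries every f/s combination in a token and keeps the variant that is most frequent in the corpus.

```python
import re
from itertools import product

# Hypothetical resources for illustration only.
DICTIONARY = {"plantations", "exported", "bark", "shipped", "well", "known"}
CORPUS_FREQ = {"was": 500, "shipped": 40, "first": 60, "fist": 2}

# A word, a hyphen at the end of a line, and the continuation on the next.
LINE_BREAK_HYPHEN = re.compile(r"(\w+)-\n(\w+)")


def dehyphenate(text, dictionary=DICTIONARY):
    """Join line-broken words if the joined form is in the dictionary;
    otherwise keep the hyphen (e.g. "well-known") but drop the line break."""
    def join(m):
        left, right = m.group(1), m.group(2)
        joined = left + right
        if joined.lower() in dictionary:
            return joined
        return f"{left}-{right}"
    return LINE_BREAK_HYPHEN.sub(join, text)


def fix_long_s(token, freq=CORPUS_FREQ):
    """Replace lowercase f with s wherever that yields a more frequent word."""
    positions = [i for i, ch in enumerate(token) if ch == "f"]
    if not positions:
        return token
    best, best_count = token, freq.get(token.lower(), 0)
    for choice in product("fs", repeat=len(positions)):
        chars = list(token)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        candidate = "".join(chars)
        count = freq.get(candidate.lower(), 0)
        if count > best_count:
            best, best_count = candidate, count
    return best


def fix_long_s_text(text):
    """Apply fix_long_s to every alphabetic token in a string."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_long_s(m.group(0)), text)
```

Enumerating every f/s combination grows as 2^n in the number of f characters per token, which is acceptable for ordinary words but would need pruning for unusually long tokens. A word that is genuinely spelt with f, such as "fist", stays unchanged as long as it is more frequent in the corpus than its s variants.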
Extract from document 10.2307/60238580 in the FCOC (JSTOR's Foreign and Commonwealth Office collection).
Everything is based on the use cases and builds on users' hypotheses and research questions. The users identify relevant collections and are involved in developing the ontology. They also provide feedback that helps us improve the technology iteratively:
Partners at York use the prototype for their research and track errors.
Workshop at CHESS 2013 with a group of independent historians.