GeoParsing: the digitization and historical georeferencing of text documents (PowerPoint presentation)

  1. GeoParsing: the digitization and historical georeferencing of text documents. Stuart Dunn, Centre for e-Research, King’s College London. ISGC, Taipei, 10th March 2010

  2. • Northern Ireland's bicameral parliament at Stormont, 1921-1972 • Transcripts of all debates (Hansards) • Fundamental aim: to broaden access

  3. • 2004: digitization of the Lower House Hansards (80 volumes) • 2008: digitization of the Upper House Hansards (53 volumes) • Aim: to co-locate the collections in a single, sustainable repository • Georeferencing based on a named entity recognition (NER) approach

  4. Georeferencing: basic principles • Informal: based on place names • Formal: based on coordinates or some other mathematical expression. Benefits: • Resolving ambiguity • Ease of access to data objects • Integration of data from heterogeneous sources • Resolving space and time

  5. [Diagram of a gazetteer entry, labelled Gazetteer ID, Geometric location, Feature type and Toponym]

  6. [Figure with two panels, labelled "From the parsed text" and "From a reference gazetteer"]

  7. Problems: • Identification of place names (as opposed to, e.g., person names) • Disambiguation of place names (e.g. Belfast, Antrim versus Belfast, Maine) • Document structure, which inevitably affects how the Geoparser works with individual corpora • Lack of a standardized way of dealing with georeferencing • Output limited to point data

  8. Defining spatial footprints [Figure labelled ANDROS, annotated with the values 34.87 and 24.87]

  9. Point data is problematic... [Figure: numeric labels only, not recoverable]

  10. • ‘Enforced crispness’ • ‘The camera (or the geovisualization) never lies’ • Some attempts have been made to improve this model, e.g. anchor theory and buffering procedures

  11. Other applications

  12. How do we get more out of digitization? • Not just about ‘linear’ reading • Need for authoritative cross-domain vocabularies, gazetteers, feature type thesauri (FTTs), etc. • Trusted repositories • Linking between resources • Useful and usable interfaces
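Slide 3 says the georeferencing is based on an NER approach. The sketch below does not reproduce that approach. As a simplified stand-in, it spots toponyms by looking them up in a small gazetteer. The gazetteer contents, the IDs, the sample sentence and the function name `spot_toponyms` are all hypothetical.

```python
import re

# Hypothetical mini-gazetteer (illustrative only): toponym -> candidate gazetteer IDs.
GAZETTEER: dict[str, list[str]] = {
    "Belfast": ["gz:belfast-antrim", "gz:belfast-maine"],
    "Antrim": ["gz:antrim"],
    "Maine": ["gz:maine"],
}


def spot_toponyms(text: str, gazetteer: dict[str, list[str]]) -> list[tuple[int, str]]:
    """Return (character offset, toponym) for each gazetteer name found in text.

    Names are tried longest first, and word boundaries stop a name matching
    inside a longer word. This is dictionary lookup, not statistical NER, so it
    cannot tell a place name from an identical person name.
    """
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.start(), m.group(1)) for m in pattern.finditer(text)]


if __name__ == "__main__":
    # Made-up sentence in the style of a debate transcript.
    sentence = "The member for Belfast raised the state of the roads in Antrim."
    for offset, name in spot_toponyms(sentence, GAZETTEER):
        print(offset, name, GAZETTEER[name])
```

Note that "Belfast" comes back with two candidate IDs. Spotting a name says nothing yet about which place is meant.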
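Slide 5 names the components of a gazetteer entry (gazetteer ID, geometric location, feature type, toponym). Slide 4 distinguishes informal (place-name) from formal (coordinate) georeferencing. The following minimal sketch shows how the two meet: an entry type with those four fields, and a lookup that turns a place name into coordinate-bearing entries. The class, field and function names, the IDs, the feature-type strings and the point-only location are assumptions for illustration. The coordinates are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    """A gazetteer record holding the four components named on slide 5.

    Assumption: the geometric location is a single (latitude, longitude) point
    in decimal degrees; a real gazetteer may also store lines or polygons.
    """
    gazetteer_id: str
    toponym: str
    feature_type: str
    location: tuple[float, float]


# Illustrative entries with approximate coordinates; the IDs are made up.
ENTRIES = [
    GazetteerEntry("gz:belfast-antrim", "Belfast", "populated place", (54.60, -5.93)),
    GazetteerEntry("gz:belfast-maine", "Belfast", "populated place", (44.43, -69.01)),
]


def georeference(toponym: str, entries: list[GazetteerEntry]) -> list[GazetteerEntry]:
    """Informal to formal: return every entry whose toponym matches exactly."""
    return [e for e in entries if e.toponym == toponym]


if __name__ == "__main__":
    for entry in georeference("Belfast", ENTRIES):
        print(entry.gazetteer_id, entry.location)
```

Looking up "Belfast" returns two entries. This is the ambiguity that slide 7 lists as a problem.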
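Slide 7's disambiguation example (Belfast, Antrim versus Belfast, Maine) can be illustrated with one common heuristic: prefer the candidate whose containing places are also mentioned in the same document. This is a sketch of that generic heuristic and does not describe how the Geoparser resolves names. The candidate data, the `parents` field and the `disambiguate` function are hypothetical.

```python
from typing import Optional

# Hypothetical candidate records: each lists larger places that contain it.
CANDIDATES: dict[str, list[dict]] = {
    "Belfast": [
        {"id": "belfast-antrim", "parents": {"Antrim", "Northern Ireland"}},
        {"id": "belfast-maine", "parents": {"Waldo County", "Maine"}},
    ],
}


def disambiguate(toponym: str, context: set[str],
                 candidates: dict[str, list[dict]] = CANDIDATES) -> Optional[str]:
    """Choose the candidate whose parent places best overlap the other
    toponyms found in the same document.

    Returns None if the name is unknown, if no candidate shares any parent
    with the context, or if the top two candidates tie.
    """
    options = candidates.get(toponym, [])
    if not options:
        return None
    # Score = number of a candidate's parent places that also appear in the context.
    scored = sorted(((len(o["parents"] & context), o["id"]) for o in options), reverse=True)
    best_score, best_id = scored[0]
    if best_score == 0 or (len(scored) > 1 and scored[1][0] == best_score):
        return None
    return best_id


if __name__ == "__main__":
    print(disambiguate("Belfast", {"Antrim"}))  # belfast-antrim
    print(disambiguate("Belfast", {"Maine"}))   # belfast-maine
    print(disambiguate("Belfast", set()))       # None: no evidence either way
```

Returning None rather than a default guess keeps unresolved names visible. That matters when the output feeds a map that will otherwise look more certain than the evidence allows.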
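Slide 10 mentions buffering procedures as one way of softening the ‘enforced crispness’ of point data. The following minimal sketch shows the simplest such procedure, assuming a fixed-radius circular buffer around a gazetteer point. `within_buffer` tests membership by great-circle (haversine) distance, and `buffer_polygon` approximates the circle as a polygon for display. The function names and the radius values are assumptions. Anchor theory is not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def buffer_polygon(lat: float, lon: float, radius_km: float, n: int = 16) -> list[tuple[float, float]]:
    """Approximate a circular buffer as n (lat, lon) vertices (open ring).

    Uses a local equirectangular approximation: adequate for buffers of a few
    km away from the poles, not for large radii.
    """
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlon = dlat / math.cos(math.radians(lat))  # degrees of longitude shrink with latitude
    return [
        (lat + dlat * math.sin(2 * math.pi * k / n), lon + dlon * math.cos(2 * math.pi * k / n))
        for k in range(n)
    ]


def within_buffer(point: tuple[float, float], centre: tuple[float, float], radius_km: float) -> bool:
    """True if point lies within radius_km of centre (great-circle distance)."""
    return haversine_km(*centre, *point) <= radius_km


if __name__ == "__main__":
    centre = (54.60, -5.93)  # approximate point for Belfast
    print(within_buffer((54.70, -5.93), centre, 5.0))   # False
    print(within_buffer((54.70, -5.93), centre, 15.0))  # True
```

A 0.1° shift in latitude is about 11 km. That point therefore falls outside a 5 km buffer but inside a 15 km one. The buffer turns a single crisp point into an explicit zone of uncertainty, though choosing the radius remains a judgement call.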
