GeoParsing: the digitization and historical georeferencing of text




SLIDE 1

GeoParsing: the digitization and historical georeferencing of text documents

Stuart Dunn
Centre for e-Research, King’s College London
ISGC, Taipei, 10th March 2010

SLIDE 2
  • Bicameral parliament at Stormont 1921-1972
  • Transcripts of all debates - Hansards
  • Fundamental aim - to broaden access
SLIDE 3
  • 2004: Digitization of Lower House Hansards (80 volumes)
  • 2008: Digitization of Upper House Hansards (53 volumes)
  • Aim is to co-locate the collections in a single, sustainable repository
  • Georeferencing, based on a named entity recognition (NER) approach
SLIDE 4

Georeferencing: basic principles

  • Informal: based on placenames
  • Formal: based on coordinates, or some other mathematical expression

Benefits

  • Resolving ambiguity
  • Ease of access to data objects
  • Integration of data from heterogeneous sources
  • Resolving space and time
SLIDE 5
SLIDE 6

  • Gazetteer ID
  • Geometric location
  • Toponym
  • Feature type
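The four labels on this slide can be read as the fields of a single gazetteer record. As a minimal sketch of that reading, the class name, field names, identifier format and sample values below are all illustrative assumptions and are not taken from the talk; the coordinates are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    """One gazetteer record (hypothetical layout, for illustration only)."""
    gazetteer_id: str   # identifier within the gazetteer; the format is made up
    toponym: str        # the place name as listed
    feature_type: str   # kind of place, e.g. "city" or "county"
    lat: float          # geometric location: latitude in decimal degrees
    lon: float          # geometric location: longitude in decimal degrees


# Approximate, illustrative values for Belfast in County Antrim.
belfast = GazetteerEntry("demo-0001", "Belfast", "city", 54.60, -5.93)
print(belfast.toponym, belfast.feature_type, (belfast.lat, belfast.lon))
```

Keeping the toponym (informal reference) and the coordinates (formal reference) in the same record is what lets a place name found in text be turned into a location.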

SLIDE 7

  • From the parsed text
  • From a reference gazetteer
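The pairing of parsed text with a reference gazetteer can be sketched as a simple lookup. This is a deliberately naive stand-in for the NER step, not the project's Geoparser: the mini-gazetteer, the function name and the sample sentence are assumptions made for illustration.

```python
import re

# Hypothetical mini-gazetteer keyed by lower-cased toponym (illustrative values).
REFERENCE_GAZETTEER = {
    "belfast": "city",
    "antrim": "county",
    "armagh": "city",
}


def find_candidate_toponyms(text, gazetteer):
    """Return (surface form, character offset, feature type) for every
    capitalised token in the text that also appears in the gazetteer."""
    hits = []
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        key = match.group().lower()
        if key in gazetteer:
            hits.append((match.group(), match.start(), gazetteer[key]))
    return hits


sentence = "The Member for Antrim asked about the road from Belfast to Armagh."
hits = find_candidate_toponyms(sentence, REFERENCE_GAZETTEER)
for name, offset, feature_type in hits:
    print(name, offset, feature_type)
```

A lookup like this only finds names the gazetteer already knows, and it cannot tell a place from a person of the same name, which is why identification and disambiguation remain separate problems.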

SLIDE 8
SLIDE 9

Problems:

  • Identification of place names (as opposed to, e.g., person names)
  • Disambiguation of place names (e.g. Belfast, Antrim versus Belfast, Maine)
  • Document structure - inevitably affects how the Geoparser works with individual corpora
  • Lack of a standardized way of dealing with georeferencing

  • Only point data
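The Belfast example can be made concrete with one common heuristic: prefer the candidate closest to the region the corpus is about. This is a sketch of that heuristic only and is not claimed to be the method used in the project. The candidate coordinates are approximate, and the focus points and function names are assumptions.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


# Approximate coordinates for the two candidates named on the slide.
CANDIDATES = {
    "Belfast": [
        ("Belfast, Antrim", 54.60, -5.93),
        ("Belfast, Maine", 44.43, -69.01),
    ],
}


def disambiguate(name, focus):
    """Pick the candidate nearest to the (lat, lon) focus of the corpus."""
    return min(
        CANDIDATES[name],
        key=lambda c: haversine_km(focus[0], focus[1], c[1], c[2]),
    )


stormont_focus = (54.6, -6.7)   # assumed rough centre of Northern Ireland
maine_focus = (44.0, -69.5)     # assumed focus for a hypothetical Maine corpus
print(disambiguate("Belfast", stormont_focus)[0])
print(disambiguate("Belfast", maine_focus)[0])
```

The heuristic depends entirely on knowing what the corpus is about, which is easy for parliamentary debates from Stormont and much harder for mixed collections.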
SLIDE 10
SLIDE 11

Defining spatial footprints

[Figure: ANDROS, labelled 24.87, 34.87]

SLIDE 12

Point data is problematic...

[Figure: grid reference 618722 plotted on a numbered grid]
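One way to see the problem, assuming the 618722 on this slide is a six-figure grid reference of the usual national-grid kind, is that such a reference names a 100 m square and not a point. The sketch below makes that explicit; the function name and the 100 km parent square are assumptions, and the offsets are relative to the corner of that parent square.

```python
def grid_ref_to_cell(ref, square_size_m=100_000):
    """Expand a numeric grid reference into the square it denotes.

    The first half of the digits is the easting and the second half the
    northing. Returns (min_e, min_n, max_e, max_n) in metres from the
    south-west corner of the assumed parent square.
    """
    half = len(ref) // 2
    resolution = square_size_m // 10 ** half   # 3 digits each -> 100 m
    easting = int(ref[:half]) * resolution
    northing = int(ref[half:]) * resolution
    return (easting, northing, easting + resolution, northing + resolution)


cell = grid_ref_to_cell("618722")
print(cell)
```

Plotting only the corner or the centre of that square as a dot hides the fact that the place could be anywhere inside it.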

SLIDE 13
  • ‘Enforced crispness’
  • The camera (or the geovisualization) never lies
  • Some attempts to improve this model, e.g. anchor theory, buffering procedures
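A buffering procedure can be sketched as replacing a point with an area around it. The version below builds a simple bounding box of a chosen radius; the function name, the 10 km radius and the figure of roughly 111.32 km per degree of latitude are assumptions for illustration, and the approximation degrades near the poles.

```python
from math import cos, radians

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def buffer_bbox(lat, lon, radius_km):
    """Return (min_lat, min_lon, max_lat, max_lon) around a point.

    Longitude is widened by 1 / cos(latitude) because meridians converge
    away from the equator.
    """
    dlat = radius_km / KM_PER_DEGREE
    dlon = radius_km / (KM_PER_DEGREE * cos(radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


# A 10 km buffer around an approximate point for Belfast.
box = buffer_bbox(54.60, -5.93, 10)
print(box)
```

The buffer makes the uncertainty visible, but the choice of radius is still arbitrary, so the footprint is only as trustworthy as the reasoning behind that number.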

SLIDE 14

Other applications

SLIDE 15
SLIDE 16
  • Need for authoritative cross-domain vocabularies and gazetteers, FTTs etc.

How do we get more out of digitization?

  • Not just about ‘linear’ reading
  • Trusted repositories
  • Useful and useable interfaces
  • Linking between resources