Introduction to Historical Text Reuse Detection

SLIDE 1
  • 20 October 2015

KITAB DH Hackathon 2015

Introduction to Historical Text Reuse Detection

Marco Büchler, Emily Franzini, Greta Franzini, Maria Moritz
eTRAP Research Group
Göttingen Centre for Digital Humanities
Institute of Computer Science
Georg August University Göttingen, Germany

SLIDE 2

2015 DH Estonia – Text Reuse Hackathon

Overview

  • What is text reuse?
  • Aspects of text reuse
  • ACID for the Digital Humanities
  • Big (Humanities) Data
  • Language Model

SLIDE 3

My interests :)

SLIDE 4

What do you associate with text reuse/intertextuality?

SLIDE 5

Typical expectation of a computer scientist: oversimplification

SLIDE 6

Typical expectation of a humanist: oversimplification

SLIDE 7

Text Reuse for Humanities and Computer Science

  • Question: Why is text reuse so relevant for the Humanities and Computer Science?
  • Premise: The amount of digitally available data is growing exponentially (Big Data)
  • Humanities:
    – Lines of transmission and textual criticism
    – Transmission of ideas/thoughts under different circumstances and conditions
  • Computer Science:
    – Text decontamination for stylometry and authorship attribution, dating of texts
    – Text mining in general, corpus linguistics

SLIDE 8

Temperature Map

SLIDE 9

Respect to the topic

  • ACID for the Digital Humanities:

    – Acceptance
    – Complexity
    – Interoperability
    – Diversity

SLIDE 10

ACID for the Digital Humanities – Acceptance I

SLIDE 11

ACID for the Digital Humanities – Acceptance II

How can text mining be accepted by humanists if it is a black box we cannot look into?

SLIDE 12

ACID for the Digital Humanities – Acceptance III

Transparency: How can we provide user-friendly insights into complex mining techniques and machine learning?

SLIDE 13

Current approach

SLIDE 14

ACID for the Digital Humanities – Acceptance IV

SLIDE 15

ACID for the Digital Humanities – Acceptance V

SLIDE 16

ACID for the Digital Humanities – Acceptance VI

SLIDE 17

ACID for the Digital Humanities – Acceptance VII

SLIDE 18

ACID for the Digital Humanities – Acceptance VIII

SLIDE 19

ACID for the Digital Humanities – Complexity

SLIDE 20

ACID for the Digital Humanities – Interoperability

SLIDE 21

ACID for the Digital Humanities – Diversity (Reuse Types)

  • Stability (yellow)
  • Purpose (green)
  • Size of text reuse (blue)
  • Classification (light blue)
  • Degree of distribution (purple)
  • Written and oral transmission

SLIDE 22

ACID for the Digital Humanities – Diversity (Reuse Styles)

SLIDE 23

Key problem

Basic question: The distribution of reuse types and reuse styles is often unknown. Which model(s) should be chosen?

SLIDE 24

Outline

SLIDE 25

DH Hackathon 2015: "Don't leave your data problems at home!"

Thank you!

"Stealing from one is plagiarism, stealing from many is research" (Wilson Mizner, 1876-1933)

Visit us at http://etrap.gcdh.de