Some Remarks on Text Data Visualization and Codec Transparency

Bryan Jurish

jurish@bbaw.de

VisiHu 2017: Visualisierungsprozesse in den Humanities
Universität Zürich
17th July, 2017


2017-07-17 / Jurish / Text Data Visualization & Codec Transparency 1

Overview

Preliminaries

• Full Disclosure
• Terminology: Data, Text, & Visualization

Remarks

• Pipelines, Parameters, & (visualization) Procedures
• Visualizations as Filters
• Lossiness, Compression, & ‘Universal’ Filters
• ‘Intuitivity’, Exploitation, & Coherence
• Co-operation & Codec Transparency

Summary

Full Disclosure

• I am a computational linguist
  ◦ tinker of algorithms
  ◦ tweaker of data structures
  ◦ not a philosopher (. . . but I played one as an undergraduate)
• . . . I am also an incorrigible Platonist
  ◦ ∃x.x = ∅
  ◦ formal (mathematical) objects really exist!
  ◦ good company:
• Please adjust your interpretative apparatus if and where required
  ◦ to accommodate my bottomless naïveté, and/or
  ◦ according to your own epistemological commitments (or lack thereof)

Terminology

Visualization

• an algorithmic procedure by which an underlying data source is transformed to graphical form for direct human consumption
• e.g. as a network graph, tag cloud, motion chart, etc.

Text Data

• a (digital) text corpus, possibly including extralinguistic information such as bibliographic metadata, document structure, etc.

Text Data Visualization

• a visualization procedure using a (digital) text corpus as its underlying data source (usually indirectly)

Visualization Pipeline

• a cascade of algorithmic procedures by which (raw) text data is prepared for and formatted by a particular visualization procedure, including any preprocessing and application-specific modeling
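To make the notion of a pipeline concrete, here is a minimal Python sketch of a three-stage cascade feeding a tag cloud. It is a hypothetical illustration, not the author's tooling: the regex tokenizer, the top-`k` cut-off, and the linear frequency-to-font-size mapping (`min_pt`, `max_pt`) are all assumed modeling choices.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Preprocessing (assumed): lowercase word tokens via a simple regex."""
    return re.findall(r"\w+", text.lower())


def count_terms(tokens: list[str]) -> Counter:
    """Modeling step: raw term frequencies."""
    return Counter(tokens)


def tag_cloud_sizes(counts: Counter, k: int = 3,
                    min_pt: int = 10, max_pt: int = 40) -> dict[str, int]:
    """Application-specific step (hypothetical): keep the k most frequent
    terms and map their counts linearly onto integer font sizes."""
    top = counts.most_common(k)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {term: min_pt + (n - lo) * (max_pt - min_pt) // span
            for term, n in top}


def pipeline(text: str, k: int = 3) -> dict[str, int]:
    """The cascade: raw text -> tokens -> counts -> tag-cloud parameters."""
    return tag_cloud_sizes(count_terms(tokenize(text)), k=k)


print(pipeline("a a a b b c"))  # {'a': 40, 'b': 25, 'c': 10}
```

Changing any single stage (the tokenizer, `k`, or the size mapping) changes what the final tag cloud is able to show.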


Remark 1: Pipelines versus Procedures

Facts

• raw text data itself does not directly support most visualization procedures
• each visualization procedure imposes formal constraints on its parameters

Claim

• (preprocessing) pipelines and (visualization) procedures are not orthogonal
• “generic” visualization procedures cannot be clearly distinguished from the preprocessing machinery (“pipeline”) which supplies their input

Rhetoric

• Q: how does one visualize a flat list of unweighted terms as a network graph?
  A: one doesn’t! (at least not in any meaningful way)
• Q: why is Mike Bostock’s D3.js API so mind-bogglingly complex?
  A: because it needs to be! (“generic” visualization procedures are fictional)
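The first rhetorical question can be illustrated with a short Python sketch. A network-graph procedure expects weighted edges, which a flat list of unweighted terms does not contain; some pipeline stage has to manufacture them. Here, as a purely hypothetical modeling choice, edges are derived from sentence-level co-occurrence; a different context window, association measure, or threshold would yield a different graph.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences: list[list[str]]) -> list[tuple[str, str, int]]:
    """Derive undirected, weighted edges (source, target, weight) from
    sentence-level co-occurrence: one assumed way of meeting a network
    graph's formal input constraints."""
    weights = Counter()
    for sentence in sentences:
        # distinct terms only, sorted so each unordered pair has a single key
        for a, b in combinations(sorted(set(sentence)), 2):
            weights[(a, b)] += 1
    return [(a, b, w) for (a, b), w in sorted(weights.items())]


sentences = [["text", "data"], ["text", "data", "codec"]]
print(cooccurrence_edges(sentences))
# [('codec', 'data', 1), ('codec', 'text', 1), ('data', 'text', 2)]
```

Feeding a flat term list in as a single "sentence" (e.g. `[["a", "b", "c"]]`) produces a complete graph with uniform weights, which carries no information beyond the list itself.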


Remark 2: Visualizations ∼ Filters

[Figure: Shannon's schematic of a communication system: Information Source → (Message) → Transmitter (Encoder) → (Signal, perturbed by a Noise Source) → (Received Signal) → Receiver (Decoder) → (Message) → Destination]

• noisy channel model of communication (Shannon 1948)

  ◦ “codec” = encoder ⊕ decoder
[Figure: Shannon's diagram annotated for text data visualization: Information Source (Text), Transmitter (Preprocessing), Receiver (Visualization), Destination (User's Eye)]


• text data visualization codec (naïve tinker’s version): not the whole story!

[Figure: the diagram revised for text data visualization: Information Source (Author); Transmitter (Encoder+Filter), labelled lossy compression, NLG, and TxtVis; Noise Source; Receiver (Filter+Decoder), labelled optical intake and interpretation; Destination (User)]


• natural language is a lossy codec (Reddy 1979)
• text data visualization is a (lossy) filter (transmission side)
• what about the decoder? reception (interpretation) is filtered too!
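The two-sided filtering in the revised diagram can be sketched as two composed Python functions. This is a toy model under explicit assumptions, not something taken from the talk: the transmission-side filter keeps only the `k` most frequent terms, and the reception-side filter assumes, hypothetically, that a reader can distinguish only a few coarse size levels.

```python
from collections import Counter


def transmit(counts: Counter, k: int) -> dict[str, int]:
    """Transmission-side filter (TxtVis, assumed): keep only the k most frequent terms."""
    return dict(counts.most_common(k))


def perceive(signal: dict[str, int], levels: int = 3) -> dict[str, int]:
    """Reception-side filter (optical intake), under the hypothetical
    assumption that a reader only distinguishes `levels` coarse size
    steps, numbered 0 .. levels-1."""
    if not signal:
        return {}
    hi = max(signal.values())
    return {term: n * levels // (hi + 1) for term, n in signal.items()}


counts = Counter({"text": 10, "data": 9, "codec": 3, "noise": 1})
received = perceive(transmit(counts, k=3))
print(received)  # {'text': 2, 'data': 2, 'codec': 0}
```

In this toy run, "noise" is lost on the transmission side, while "text" and "data", whose counts differ, only become indistinguishable on the reception side.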

Remark 3: Lossiness & ‘Universal’ Filters

Visualization Pipelines ∼ Lossy Compression

• information is lost when messages are passed through the codec
  ◦ usually by design (we already have the text-encoding)
  ◦ no lossless formal model of natural language available (yet)

‘Universal’ Filters

• as humans, we’re already equipped with a whole bevy of (lossy) filters:
  ◦ linguistic (minimal attachment, semantic priming)
  ◦ perceptual (motion detection, color sensitivity)
  ◦ cognitive (object independence, causal relations)
  ◦ cultural (common knowledge, conventional signs)

Lossiness ∼ ‘Distance’

• lossy filters increase “reading distance” (Moretti 2013)
• the communication channel was already fallible
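Lossiness "by design" has a simple formal signature: the filter maps distinct messages onto the same signal, so no decoder can tell them apart afterwards. A minimal Python sketch, assuming a bag-of-words filter (a common preprocessing step, chosen here only for illustration):

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """A typical lossy filter (illustrative): keep term counts, discard word order."""
    return Counter(text.lower().split())


def collisions(filt, messages: list[str]) -> list[tuple[str, str]]:
    """Pairs of distinct messages that the filter maps to the same signal.
    Any such pair means the filter is not injective on these messages,
    i.e. the information distinguishing them is irrecoverably lost."""
    found = []
    for i, a in enumerate(messages):
        for b in messages[i + 1:]:
            if a != b and filt(a) == filt(b):
                found.append((a, b))
    return found


print(collisions(bag_of_words, ["dog bites man", "man bites dog", "man bites"]))
# [('dog bites man', 'man bites dog')]
```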

Remark 4: ‘Intuitivity’ ∼ Exploitation

‘Intuitivity’

• ‘intuitive’ visualizations exploit users’ pre-existing (‘universal’) filters
  ◦ perceptual: size, motion, color
  ◦ cognitive: physical simulations, display “objects”
  ◦ cultural: shared conventional signs
• reduced recipient processing load
  ◦ “progressive disclosure”, conscious focus

Exploitation & Coherence

• successful exploitation ⇔ coherence of pipeline- & user-filters
  ◦ all and only relevant information passes unchanged through both codecs
  ◦ relevance depends on the user’s individual research question

Remark 5: Co-operation ∼ Transparency

Co-operation

“Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.” (Grice 1975)

Codec Transparency

• no perceptible data loss (e.g. MP3 or Ogg Vorbis audio codecs)
• for visualization: no apprehensible (relevant) data loss

Visualization as (Co-operative) Communication

• Task: maximize transparency, i.e. optimize for users’ common research goals
• Challenges:
  ◦ research goals vary widely between users and projects
  ◦ commonalities can be hard to identify and formally model
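Transparency relative to a research question can be phrased as a checkable condition: a filter preserves what a question needs exactly when any two messages with the same filtered signal also receive the same answer. The Python sketch below tests that condition on a finite sample. The bag-of-words filter and both example questions are hypothetical, and a passing check on a sample is evidence, not proof.

```python
from collections import Counter
from collections.abc import Callable, Hashable


def bag_of_words_signal(text: str) -> frozenset:
    """Lossy filter (assumed): term counts without word order, as a hashable value."""
    return frozenset(Counter(text.lower().split()).items())


def answerable_from_signal(question: Callable[[str], object],
                           filt: Callable[[str], Hashable],
                           samples: list[str]) -> bool:
    """True iff, on these samples, some decoder could recover question(x)
    from filt(x) alone, i.e. equal signals never carry different answers."""
    answer_for_signal = {}
    for x in samples:
        signal, answer = filt(x), question(x)
        if answer_for_signal.setdefault(signal, answer) != answer:
            return False
    return True


def dog_share(text: str) -> float:
    """Hypothetical frequency question: relative frequency of 'dog'."""
    tokens = text.lower().split()
    return tokens.count("dog") / len(tokens)


def first_word(text: str) -> str:
    """Hypothetical word-order question: the first token."""
    return text.lower().split()[0]


samples = ["dog bites man", "man bites dog", "the cat sat"]
print(answerable_from_signal(dog_share, bag_of_words_signal, samples))   # True
print(answerable_from_signal(first_word, bag_of_words_signal, samples))  # False
```

Under this sketch, the same filter is transparent for the frequency question but not for the word-order question: which loss counts as "apprehensible" depends on what the user is asking.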

Summary

Visualization Procedures

• non-modular, interface constraints (preprocessing pipelines)

Visualization Pipelines

• noisy-channel filters (lossy, usually by design)

‘Universal’ Filters

• recipient-internal (perceptual, cognitive, cultural)

‘Intuitivity’

• exploitation of recipient filters (relevance, coherence)

Co-operative Communication

• maximize codec transparency (minimize apprehensible loss)


— The End —

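[Figure: closing chart of German terms on a 0.0 to 6.0 scale: treu (loyal), wirklich (really), lieb (dear), herzlich (heartfelt), lächeln (to smile), gut (good), schön (beautiful), persönlich (personal), warm (warm), letzte (last), danken (to thank), klein (small), glücklich (happy), kurz (short), liebenswürdig (kind), jung (young), ganz (whole), freundschaftlich (amicable), gehorsam (obedient), freundlich (friendly)]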

Thank you for listening!

http://kaskade.dwds.de/~jurish/visihu2017/danke


References

  • P. Grice. Logic and conversation. In P. Cole and J. Morgan, editors, Syntax and Semantics, volume 3: Speech Acts, pages 41–58. Academic Press, 1975.

  • F. Moretti. Distant reading. Verso Books, 2013.
  • M. J. Reddy. The conduit metaphor: A case of frame conflict in our language about language. In A. Ortony, editor, Metaphor and Thought, pages 284–310. Cambridge University Press, 1979.

  • C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.