Some Remarks on Text Data Visualization and Codec Transparency
Bryan Jurish
jurish@bbaw.de
VisiHu 2017: Visualisierungsprozesse in den Humanities
Universität Zürich, 17th July, 2017
2017-07-17 / Jurish / Text Data Visualization & Codec Transparency 1
Overview
Preliminaries
- Full Disclosure
- Terminology: Data, Text, & Visualization
Remarks
- Pipelines, Parameters, & (visualization) Procedures
- Visualizations as Filters
- Lossiness, Compression, & ‘Universal’ Filters
- ‘Intuitivity’, Exploitation, & Coherence
- Co-operation & Codec Transparency
Summary
Full Disclosure
- I am a computational linguist
  - tinker of algorithms
  - tweaker of data structures
  - not a philosopher (... but I played one as an undergraduate)
- ... I am also an incorrigible Platonist
  - ∃x.x = ∅
  - formal (mathematical) objects really exist!
  - good company:
- Please adjust your interpretative apparatus if and where required
  - to accommodate my bottomless naïveté, and/or
  - according to your own epistemological commitments (or lack thereof)
Terminology
Visualization
- an algorithmic procedure by which an underlying data source is transformed to graphical form for direct human consumption
- e.g. as a network graph, tag cloud, motion chart, etc.
Text Data
- a (digital) text corpus, possibly including extralinguistic information such as bibliographic meta-data, document structure, etc.
Text Data Visualization
- a visualization procedure using a (digital) text corpus as its underlying data source (usually indirectly)
Visualization Pipeline
- a cascade of algorithmic procedures by which (raw) text data is prepared for and formatted by a particular visualization procedure, including any preprocessing and application-specific modeling
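As a rough illustration of the "cascade of algorithmic procedures" in the last definition, here is a minimal Python sketch of a three-stage pipeline feeding a tag-cloud-like procedure. The stage names (`tokenize`, `count_terms`, `tag_cloud_sizes`), the `run_pipeline` helper, and the font-size mapping are assumptions made for this example; they are not taken from the talk.

```python
import re
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    """Preprocessing (hypothetical stage): lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())


def count_terms(tokens: Iterable[str]) -> Dict[str, int]:
    """Application-specific modeling (hypothetical stage): term frequencies."""
    return dict(Counter(tokens))


def tag_cloud_sizes(freqs: Dict[str, int], min_pt: int = 10, max_pt: int = 40) -> Dict[str, int]:
    """Stand-in for a visualization procedure: map frequencies to font sizes in points."""
    top = max(freqs.values())
    return {term: min_pt + (max_pt - min_pt) * f // top for term, f in freqs.items()}


def run_pipeline(text: str, stages: List[Callable[[Any], Any]]) -> Any:
    """Feed the output of each stage into the next one."""
    data: Any = text
    for stage in stages:
        data = stage(data)
    return data


sizes = run_pipeline(
    "the cat saw the dog and the bird",
    [tokenize, count_terms, tag_cloud_sizes],
)
```

Each stage only works because the previous one delivers exactly the shape of data it expects, which is the dependency that Remark 1 picks up.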
Remark 1: Pipelines versus Procedures
Facts
- raw text data itself does not directly support most visualization procedures
- each visualization procedure imposes formal constraints on its parameters
Claim
- (preprocessing) pipelines ⊥ (visualization) procedures
- “generic” visualization procedures cannot be clearly distinguished from the preprocessing machinery (“pipeline”) which supplies their input
Rhetoric
- Q: how does one visualize a flat list of unweighted terms as a network graph?
  A: one doesn’t! (at least not in any meaningful way)
- Q: why is Mike Bostock’s D3.js API so mind-bogglingly complex?
  A: because it needs to be! (“generic” visualization procedures are fictional)
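The first question can be made concrete with a small Python sketch. It assumes a hypothetical `draw_network` stand-in that accepts only weighted term pairs, and a hypothetical `cooccurrence_edges` preprocessing step that supplies them; neither name comes from the talk or from any real library.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def cooccurrence_edges(sentences: List[List[str]]) -> Dict[Edge, int]:
    """Preprocessing (hypothetical): count how often two terms share a sentence."""
    counts: Counter = Counter()
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def draw_network(edges: Dict[Edge, int]) -> List[str]:
    """Stand-in for a network-graph procedure: it only accepts weighted term pairs."""
    if not isinstance(edges, dict) or not all(
        isinstance(key, tuple) and len(key) == 2 for key in edges
    ):
        raise TypeError("network graph needs weighted (source, target) edges")
    # Render each edge as a text line instead of actual graphics.
    return [f"{a} --{w}-- {b}" for (a, b), w in sorted(edges.items())]


flat_terms = ["cat", "dog", "bird"]  # a flat, unweighted term list
sentences = [["cat", "dog"], ["cat", "dog", "bird"]]

graph_lines = draw_network(cooccurrence_edges(sentences))
```

Calling `draw_network(flat_terms)` raises a `TypeError`: the flat list carries no relation between terms, so the edges have to come from a modeling decision made upstream, in the pipeline.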
Remark 2: Visualizations ∼ Filters
[Figure: Shannon's noisy-channel diagram (Information Source → Message → Transmitter (Encoder) → Signal → Received Signal → Receiver (Decoder) → Message → Destination, with a Noise Source acting on the signal), annotated with (Text), (Preprocessing), (Visualization), and (User's Eye)]
- noisy channel model (Shannon 1948)
  - “codec” = encoder ⊕ decoder
- text data visualization codec (naïve tinker’s version)
  - not the whole story!
- natural language is a lossy codec (Reddy 1979)
- text data visualization is a (lossy) filter
  - what about the decoder?
[Figure: revised noisy-channel diagram: Information Source, Transmitter (Encoder+Filter), Noise Source, Receiver (Filter+Decoder), Destination; annotated with (Author), (User), (Lossy Compression), (NLG), (TxtVis), (optical intake), (interp.)]
- text data visualization is a (lossy) filter on the transmission side
- reception (interpretation) is filtered too!
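To see what "lossy filter" means on the transmission side, consider a toy encoder that keeps only the most frequent terms of a text. This is a sketch under stated assumptions: the `encode` function and the two example texts are invented for illustration and do not describe any particular visualization tool.

```python
from collections import Counter
from typing import List, Tuple


def encode(text: str, k: int = 2) -> List[Tuple[str, int]]:
    """Lossy encoder (hypothetical): keep only the k most frequent terms."""
    counts = Counter(text.lower().split())
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


text_a = "dog bites man dog barks"
text_b = "man barks dog bites dog"

signal_a = encode(text_a)
signal_b = encode(text_b)

# Two different messages yield the same signal, so no decoder can tell them apart.
same_signal = signal_a == signal_b
```

Because word order and the rarer terms are discarded, the two texts collapse onto one signal; whatever the recipient reconstructs from it, the difference between the originals is gone.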
Remark 3: Lossiness & ‘Universal’ Filters
Visualization Pipelines: Lossy Compression
- information is lost when messages are passed through the codec
  - usually by design (we already have the text-encoding)
  - no lossless formal model of natural language available (yet)
‘Universal’ Filters
- as humans, we’re already equipped with a whole bevy of (lossy) filters:
  - linguistic (minimal attachment, semantic priming)
  - perceptual (motion detection, color sensitivity)
  - cognitive (object independence, causal relations)
  - cultural (common knowledge, conventional signs)
Lossiness ∼ ‘Distance’
- lossy filters increase “reading distance” (Moretti 2013)
- the communication channel was already fallible
Remark 4: ‘Intuitivity’ ∼ Exploitation
‘Intuitivity’
- ‘intuitive’ visualizations exploit users’ pre-existing (‘universal’) filters
  - perceptual: size, motion, color
  - cognitive: physical simulations, display “objects”
  - cultural: shared conventional signs
- reduced recipient processing load
  - “progressive disclosure”, conscious focus
Exploitation & Coherence
- successful exploitation ⇔ coherence of pipeline- & user-filters
  - all and only relevant information passes unchanged through both codecs
  - relevance depends on user’s individual research question
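One way to read the coherence condition is as a statement about sets: whatever survives both the pipeline filter and the user's filters should equal the relevant information. The Python sketch below assumes that information can be modeled as a set of named features; the feature names and both helper functions are hypothetical.

```python
from typing import FrozenSet

Features = FrozenSet[str]


def passes_both(pipeline_keeps: Features, user_perceives: Features) -> Features:
    """Information that survives the pipeline filter and the user's own filters."""
    return pipeline_keeps & user_perceives


def is_coherent(pipeline_keeps: Features, user_perceives: Features, relevant: Features) -> bool:
    """True when all and only the relevant features pass through both filters."""
    return passes_both(pipeline_keeps, user_perceives) == relevant


# Relevance is fixed by one user's research question (hypothetical example).
relevant = frozenset({"term frequency", "publication date"})
user_perceives = frozenset({"term frequency", "publication date", "genre"})

good_pipeline = frozenset({"term frequency", "publication date"})
lossy_pipeline = frozenset({"term frequency", "genre"})

coherent_good = is_coherent(good_pipeline, user_perceives, relevant)
coherent_lossy = is_coherent(lossy_pipeline, user_perceives, relevant)
```

Here `good_pipeline` is coherent, while `lossy_pipeline` is not: it drops a relevant feature and lets an irrelevant one through. Changing `relevant` changes the verdict, which mirrors the point that relevance depends on the individual research question.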
Remark 5: Co-operation & Codec Transparency
Co-operation
“Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.” (Grice 1975)
Codec Transparency
- no perceptible data loss (e.g. mp3, ogg audio codecs)
- for visualization: no apprehensible (relevant) data loss
Visualization as (co-operative) Communication
- Task: maximize transparency: optimize for users’ common research goals
- Challenges:
  - research goals vary widely between users, projects
  - commonalities can be hard to identify and formally model
Summary
Visualization Procedures
- non-modular, interface constraints (preprocessing pipelines)
Visualization Pipelines
- noisy-channel filters (lossy, usually by design)
‘Universal’ Filters
- recipient-internal (perceptual, cognitive, cultural)
‘Intuitivity’
- exploitation of recipient filters (relevance, coherence)
Co-operative Communication
- maximize codec transparency (minimize apprehensible loss)
— The End —
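[Figure: German words treu ("faithful"), lächeln ("smile"), letzte ("last"), klein ("small"), glücklich ("happy"), liebenswürdig ("kind"), freundschaftlich ("friendly")]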
Thank you for listening!
http://kaskade.dwds.de/~jurish/visihu2017/danke
References
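Grice, H. P. Logic and conversation. In P. Cole and J. L. Morgan, editors, Syntax and Semantics, volume 3: Speech Acts, pages 41–58. Academic Press, 1975.
Reddy, M. J. The conduit metaphor: A case of frame conflict in our language about language. In A. Ortony, editor, Metaphor and Thought, pages 284–310. Cambridge University Press, 1979.
Shannon, C. E. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.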