Computational Tools and Strategies in Enhancing Access to Cultural Big Data Collections




slide-1
SLIDE 1

Computational Tools and Strategies in Enhancing Access to Cultural Big Data Collections

Richard MARCIANO & Ryan COX

marciano@umd.edu & ryan.cox@maryland.gov
Director & Research Archivist
Digital Curation Innovation Center (DCIC) & Maryland State Archives (MSA)
http://DCIC.umd.edu & http://slavery.msa.maryland.gov
Tuesday, April 30, 2019
Columbus Metropolitan Library, Main Library, Columbus, Ohio

slide-2
SLIDE 2

What are Computational Strategies?

Based on the concepts of Computational Archival Science (CAS), both case studies use visualization and analytical tools to connect and display data in new ways. The goal is to create transparency in cultural Big Data.

Case studies involve:

  • automating the detection of personally identifiable information (PII) in Japanese-American World War II Incarceration Camps
  • penetrating the complex pre-Civil War slave system in Maryland
slide-3
SLIDE 3

What is CAS?

A transdisciplinary field concerned with the application of:

  • computational methods and resources to large-scale records / archives:
  • processing, analysis, storage, long-term preservation, and access,
  • with the aim of improving efficiency, productivity and precision
  • in support of appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival materials.

PORTAL: http://dcicblog.umd.edu/cas/

GOAL: Explore computational treatments of archival and cultural content

GOOGLE GROUP: computational-archival-science@googlegroups.com

Foundational Book Chapter: May 2018

Book: “Advances in Librarianship – Re-Envisioning the MLIS: Perspectives on the Future of Library and Information Science Education”.

Book Chapter: “Archival Records and Training in the Age of Big Data”

slide-4
SLIDE 4

The Emergence of Computational XXX’s

  • XXX = Social Science
      • “Investigating social and behavioral relationships and interactions through: social simulation, modeling, network analysis, and media analysis”, Wikipedia
  • XXX = Biology
      • “The science of using biological data to develop algorithms or models to better understand biological systems”, Wikipedia
  • XXX = Journalism
      • “Finding and telling news stories, WITH, BY, or ABOUT algorithms”, Nick Diakopoulos
  • XXX = Archival Science?
      • The focus of this seminar

slide-5
SLIDE 5

Mission

  • Be a leader in the digital curation research and educational fields, and foster interdisciplinary partnerships using Big Records and Archival Analytics through public / industry / government collaborations.

Computational Archival Science (CAS)

The DCIC is pioneering advances in computational treatments of archival and cultural content. See our CAS portal for the latest developments: http://dcicblog.umd.edu/cas/

What is CAS?

An interdisciplinary field concerned with the application of computational methods and resources to large-scale records / archives processing, analysis, storage, long-term preservation, and access, with the aim of improving efficiency, productivity and precision in support of appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival materials.

CAS Founding Partners:

  • Richard Marciano, U. Maryland
  • Mark Hedges, King’s College London (UK)
  • Vicki Lemieux, U. British Columbia (Canada)
  • Maria Esteva, Texas Advanced Computing Center (TACC)
  • Michael Kurtz, U. Maryland
  • Bill Underwood, U. Maryland
  • Greg Jansen, U. Maryland
  • Mark Conrad, National Archives and Records Administration (NARA)

Hornbake South: digital lab for group learning, collaborative design, and hands-on digital curation project development; document scanning, image manipulation, and archival ingestion facility for group projects.

Atlantic Building: on-campus virtual machine farm for research data processing, storage, and hosting (VMware-powered Dell servers).

Amazon Cloud: dashboard-enabled virtual computing lab in the cloud for creating Windows / Ubuntu instances using Amazon Web Services (AWS).

UMD Cyberinfrastructure Center at the Rivertech Bldg: Digital Repository At Scale That Invites Computation (To Improve Collections), a petascale archival storage and preservation repository (based on the DRAS-TIC open-source software [NoSQL Cassandra database]) and computational infrastructure (Dell nodes).

http://dcic.umd.edu

Goals:

  • Sponsor interdisciplinary projects that explore the integration of archival research data, user-contributed data, and technology to generate new forms of analysis and historical research engagement, particularly in the areas of social justice, human rights, and cultural heritage.

Motto: “Integrating Education and Research”

slide-6
SLIDE 6
slide-7
SLIDE 7

An example: Mapping Inequality – a focus on Big Data [Racial Zoning]

UMD Student Team: Mary Kendig, Myeong Lee, Sydney Vaile, Maddie Allen, Martin Almirón, Jhon De La Cruz, Shaina Destine, Erin Durham, Darlene Reyes, Benjamin Sagay, Richard Bool

slide-8
SLIDE 8

Historical Context

  • Home Owners Loan Corporation 1930s - 1940s
  • Rated neighborhoods by racial makeup
  • Areas without loans fell apart
  • 1950s Urban Renewal targeted areas for clearance
  • Result: Mass displacement
  • RG195: Federal Home Loan Bank Board, HOLC, 1933 - 1951
  • Contains Maps, Neighborhood Surveys, Loan Information
  • NYTimes, Aug. 24, 2017: “Self-fulfilling prophecies: How redlining’s racist effects lasted for decades”

slide-9
SLIDE 9

Mapping Inequality

Documents

  • Each survey corresponded to a city map
  • Green: White / Wealthy = Best
  • Blue: White / Working = Still Desirable
  • Yellow: Foreign / Increase in PoC = Declining
  • Red: Black and Hispanic = Hazardous

Collection Statistics

  • 150 Boxes
  • Over 10,000 surveys alone
  • 250 cities

slide-10
SLIDE 10
slide-11
SLIDE 11

Neighborhood description for: Boyle Heights in L.A.: * Area D-53 * “Red” area

“Boyle Heights remained one of the most heterogeneous neighborhoods in the city for decades and it was a center of Jewish, Mexican and Japanese immigrant life in the early 20th century, and also hosted large Yugoslav and Russian populations.” Wikipedia, 6/17/2011

slide-12
SLIDE 12

… …

slide-13
SLIDE 13

Mapping Inequality

http://mappinginequality.net

  • U. Richmond
  • Virginia Tech
  • Johns Hopkins
  • U. Maryland
slide-14
SLIDE 14
A. Historical Lab Notebooks

Paper-based Lab Notebooks:

  • Used in science research
  • Represent a record of:
      • observations
      • experiments
      • ideas
      • notes
      • formulas
      • data

Electronic Lab Notebooks:

  • patient medical records

slide-15
SLIDE 15

Learning Goals

  • Archival Practices
  • Computational Thinking Practices
  • Ethics and Values Considerations

slide-16
SLIDE 16
B. Computational Framework for Library & Archival Education

https://dcicblog.umd.edu/ComputationalFrameworkForArchivalEducation/

April 4 / 5, 2019

Motivation for Introducing Computational Thinking into Library and Archival Studies Curriculum:

  • A basic understanding of the characteristics of digital materials is important for future librarians & archivists.
  • Archival collections are increasingly composed of digital materials.
  • The tools and practices associated with archival activities are increasingly dependent on computing.
  • The way users interact with archival collections reflects the increasingly computationally-mediated nature of our world.
  • For today’s learners to succeed in future archival tasks, it is essential that computational thinking is included as part of their training.

Approach:

  • Develop computational thinking enhanced lesson plans for archival topics that could be used by iSchool faculty to introduce computational thinking into their courses.
  • Build an international networked community of iSchool faculty and Library and Archives practitioners to engender these capabilities.

slide-17
SLIDE 17

https://www.smithsonianmag.com/smithsonian-institution/how-artificial-intelligence-could-revolutionize-museum-research-180967065/

C. How Artificial Intelligence Could Revolutionize Archival Museum Research (Nov. 3, 2017)

  • Deep learning software to help botanists
  • Botanical specimen categorization at museums (5 million specimens)
  • Two big data analytics questions:
      1. With what accuracy can a trained neural network sort mercury-stained plant specimens from clean ones? [90-94%]
      2. With what accuracy can machine learning algorithms recognize members of two similar plant families? [96-99%]

slide-18
SLIDE 18
D. UC Santa Barbara Library – Data Curation: March 15, 2019

  • “Responding to recent and exciting changes in scholarship that include the increasing use of digital data and computational methods”
  • “expanding and novel cross-disciplinary data reuse; new emphases on transparency and reproducibility; concern for preservation and long-term usability; and the rise of new forms of communication including data publication and open access publishing.”
  • “The successful candidate will… support new research methods being used by researchers as well as lead the adoption of new methods.”

slide-19
SLIDE 19

E. OCLC ResearchWorks: Shaping an Applied Research Agenda

Dublin, OH, April 25-26, 2019

  • ResearchWorks convened practitioners and researchers to shape a research agenda that charts engagement with data science and computational methods
  • Goal: make the case for building networks and partnerships between libraries and data science and computational methods
  • E.g.:
      • Machine Learning multiplies connections – allowing for expanded discoverability
      • New Tools increase access to collections – allowing for progress on global information equity
      • A Range of Methods are used to analyze collections at scale – allowing for actionable insights that support sustainability and the realization of core values.
  • Goal 2: Identify key challenges matched with questions, methods, actions and grounded by carefully considered ethical commitments.
  • Beginning of a movement?

slide-20
SLIDE 20

F. Launch of the International Research Collaboration Network in Computational Archival Science (IRCN-CAS)

  • The new ways in which the public and researchers wish to engage with archival materials are disruptive to traditional archival theories and practices.
  • The application of computational methods and tools to the archival problem space needs to be further explored.
  • The contextualization of records also needs to be explored, whether through:
      • capturing metadata,
      • enhancing records by semantic tagging,
      • linking records with other records,

https://computationalarchives.net/

Datathon at TNA (UK): Jun. 20-21, 2019
Datathon at MSA (US): Oct. 28-29, 2019
Research Symposium at TNA (UK): Jan. 2020

Feb. 1, 2019 – Jan. 31, 2020: UMD, KCL & MSA, TNA

slide-21
SLIDE 21
REST OF THE TALK

  • 1. Computational Thinking (CT)
  • 2. CT-Archives Mapping to Archives
      * Japanese American WWII Incarceration Camps
  • 3. Publishing / Sharing Computational Stories Using Digital Notebooks

slide-22
SLIDE 22
1. What is Computational Thinking?

“To reading, writing, and arithmetic, we should add computational thinking to every child’s analytical ability.” (Wing, 2006)

“… a form of problem solving that uses modeling, decomposition, pattern recognition, abstraction, algorithm design, and scale”

slide-23
SLIDE 23

“Computational thinking is the thought processes involved in formulating problems and their solutions so that the solutions are represented in a form that can effectively be carried out by an information-processing agent.”

(Cuny, Snyder & Wing, 2010)

slide-24
SLIDE 24

CT-STEM Practices Taxonomy

  • David Weintrop

Computational Thinking in STEM Practices Taxonomy

Data Practices:

  • Collecting Data
  • Creating Data
  • Manipulating Data
  • Analyzing Data
  • Visualizing Data

Modeling & Simulation Practices:

  • Using Computational Models to Understand a Concept
  • Using Computational Models to Find and Test Solutions
  • Assessing Computational Models
  • Designing Computational Models
  • Constructing Computational Models

Computational Problem Solving Practices:

  • Preparing Problems for Computational Solutions
  • Programming
  • Choosing Effective Computational Tools
  • Assessing Different Approaches/Solutions to a Problem
  • Developing Modular Computational Solutions
  • Creating Computational Abstractions
  • Troubleshooting and Debugging

Systems Thinking Practices:

  • Investigating a Complex System as a Whole
  • Understanding the Relationships within a System
  • Thinking in Levels
  • Communicating Information about a System
  • Defining Systems and Managing Complexity

slide-25
SLIDE 25
3. Automating the Detection of Personally Identifiable Information (PII) in Japanese-American WWII Incarceration Camps

Richard Marciano, William Underwood

slide-26
SLIDE 26

https://www.nationalgeographic.com/magazine/2018/10/japanese-internment-then-now-portraits/

slide-27
SLIDE 27

Tule Lake Camp in Northern California.

Pomona, CA

(Assembly Camp, LA Fairgrounds, racetrack, stables)

Arcadia, CA

(Assembly Camp, Santa Anita Racetrack, stables)

Tule Lake, CA

(Incarceration Camp)

slide-28
SLIDE 28

Automating the Detection of Personally Identifiable Information (PII) in Index Cards to Internal Security Case Reports

The records of the WRA (Record Group 210 from 1941-47) at the National Archives in Washington D.C. and Maryland are comprised of over 100 series with motion picture films, drawings of incarceration centers, photos, maps, correspondence, yearbooks, rosters, etc.

Series 51 & 52 have immense value for survivors of the camps, their families, and historians, yet they are still not accessible.

Series 51, the “Internal Security Case Reports” from 1942 to 1946, comprises narrative reports prepared by camp investigators, police officers, and directors of internal security, relating cases of alleged “disorderly conduct, rioting, seditious behavior,” etc. at each of the 10 camps, with detailed information on the names and addresses in the camps of the persons involved, the time and place where the alleged incident occurred, an account of what happened, and a statement of action taken by the investigating officer.

slide-29
SLIDE 29
slide-30
SLIDE 30

Graph Database Modeling and Visualization: Movements of Satsuki Ina’s Family within the Camp System

Graph Database Modeling and Visualization: People Deported to Tule Lake Clustered by City of Origin

Interactive Map of Tule Lake

Spring 2015

slide-31
SLIDE 31
U. Maryland College of Information Studies Student Team

MOU w. NARA

Carl Apgar, Luis Beteta, Waleed Falak, Marisa Gilman, Riss Hardcastle, Keona Holden, Yun Huang, David Baasch, Brittni Ballard, Tricia Glaser, Adam Gray, Leigh Plummer, Zeynep Diker, Mayanka Jha, Aakanksha Singh, Namrata Walanj

Fall 2017 – Spring 2018

slide-32
SLIDE 32

A. B. C. D. E. G. H. I. F.

slide-33
SLIDE 33
A. Creating Data

“The increasingly computational nature of working with data in” archival science “underscores the importance of developing computational thinking practices in the classroom.”

“Part of the challenge is teaching students that answers are drawn from the data available.”

“In many cases” archivists “use computational tools to generate data… at scales that would otherwise be impossible.”

slide-34
SLIDE 34

Columns (first table): Last Name, First Name, Birth Year, Original State, Gender, Birth Place, Family No, Individual No, File Number, Assembly Center

Columns (second table): #, Last, First, Family No, Sex, Birth, Citizenship, Alien #, Entry, Entry Date, Pre-evacuation Addr, Type of Final Departure, Date of Final Departure, Destination of Final Departure

Final Accountability Rosters (FAR)

WRA Form 26 register

“Japanese-American Internee Data File” NARA AAD

Box 8 -- #269

slide-35
SLIDE 35
B. Manipulating Data

“Computational tools make it possible to efficiently and reliably manipulate large and complex” archival holdings. “Data manipulation includes sorting, filtering, cleaning, normalizing, and joining disparate datasets.”
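The manipulation steps named in the quote (sorting, filtering, cleaning, normalizing, joining) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the field names, the helper names `normalize` and `join_on_family`, and every record value are invented placeholders, not data or code from the project.

```python
# Hypothetical roster rows; every value here is invented for illustration.
far_rows = [
    {"family_no": "101", "last": " tanaka ", "first": "Hana", "camp": "Tule Lake"},
    {"family_no": "102", "last": "SATO", "first": "Ken", "camp": "tule lake"},
    {"family_no": "103", "last": "Mori", "first": "Aiko", "camp": "Other Camp"},
]
# Hypothetical second dataset keyed by the same (assumed) family number.
form26_rows = [
    {"family_no": "101", "birth_year": "1920"},
    {"family_no": "102", "birth_year": "1915"},
]


def normalize(row):
    """Cleaning and normalizing: trim whitespace and unify capitalization."""
    return {
        **row,
        "last": row["last"].strip().title(),
        "camp": row["camp"].strip().title(),
    }


def join_on_family(left, right):
    """Joining: attach right-hand fields to left rows with the same family number."""
    index = {r["family_no"]: r for r in right}
    return [{**l, **index[l["family_no"]]} for l in left if l["family_no"] in index]


cleaned = [normalize(r) for r in far_rows]
tule_lake = [r for r in cleaned if r["camp"] == "Tule Lake"]  # filtering
joined = join_on_family(tule_lake, form26_rows)               # joining
result = sorted(joined, key=lambda r: r["last"])              # sorting
```

Keeping each step as its own small function or expression makes it easy to inspect intermediate results, which matters when the inputs are transcriptions of historical records with inconsistent spelling and capitalization.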

slide-36
SLIDE 36
C. Analyzing Data

  • We used named entity recognition (NER) software to extract metadata from the incident cards. This was done with the open source GATE, which is based on pattern matching through recognition rules. The matching rules are often refined through iterative tuning.
  • For example, a rule for recognizing a person’s name would be based on a last name, followed by a comma, followed by a Japanese first name, followed by an Anglo first name in parentheses. As we process additional cards we would note that there are other styles of names, so the pattern would be generalized to account for stylistic variations. If the pattern is made robust enough, it will eventually work on all of the instances of names.
  • GATE, General Architecture for Text Engineering, https://gate.ac.uk/

“There are many strategies that can be employed when analyzing data for use in” an archival context, “including looking for patterns or anomalies, defining rules to categorize data, and identifying trends and correlations.”
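The name rule described on this slide can be sketched with an ordinary regular expression. This is an illustration in Python's `re` module, not GATE's own rule language; the pattern, the function name `find_names`, and the sample card text are assumptions invented for the example.

```python
import re

# Rule as described on the slide: last name, comma, first name, and an
# Anglo first name in parentheses. Generalization step: the parenthesized
# name is made optional so that cards written in a plainer style still match.
NAME_RULE = re.compile(
    r"\b(?P<last>[A-Z][a-z]+),\s+"
    r"(?P<first>[A-Z][a-z]+)"
    r"(?:\s+\((?P<anglo>[A-Z][a-z]+)\))?"
)


def find_names(card_text):
    """Return one dict per matched name; 'anglo' is None when absent."""
    return [m.groupdict() for m in NAME_RULE.finditer(card_text)]


# Invented sample text, not taken from any real index card.
sample = "Subject: Yamada, Taro (Tom). Witness: Suzuki, Hanako."
names = find_names(sample)
```

A rule this simple will miss hyphenated or multi-part names, which is the kind of gap the iterative tuning described above is meant to close.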

slide-37
SLIDE 37
D. Visualizing Data

“Communicating results is an essential component of” understanding archival data “and computational tools can greatly facilitate that process. Tools include both conventional visualizations such as graphs and charts, as well as dynamic, interactive displays.”

Box 8 WRA Form 26 FAR Tule Lake

slide-38
SLIDE 38

E. Designing Computational Models

“The ability to create, refine, and use models of phenomena is a central practice.” “Models can include flowcharts and diagrams.” “Part of taking advantage of computational power… is designing new models that can be run on a computational device.” “There are many reasons that might motivate designing a computational model, including wanting to better understand a phenomenon under investigation, to test out a hypothesis.” “Students… will be able to define the components of the model, describe how they interact, decide what data will be produced by the model.”

slide-39
SLIDE 39

F. Constructing Computational Models

“An important practice is the ability to create new or extend existing computational models. This requires being able to encode the model features in a way that a computer can interpret.”
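One way to read "encode the model features in a way that a computer can interpret" is to write the record structure down as explicit types. The sketch below assumes a model built from the fields the talk lists for the Internal Security Case Reports (persons involved and their camp addresses, time and place, an account, and the action taken). The class names, field names, and all values are hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PersonInvolved:
    name: str
    camp_address: str


@dataclass
class IncidentReport:
    camp: str
    date: str          # kept as text, since source dates may be partial
    place: str
    account: str
    action_taken: str
    persons: list = field(default_factory=list)

    def pii_fields(self):
        """Return the (name, address) pairs a PII review would look at first."""
        return [(p.name, p.camp_address) for p in self.persons]


# Placeholder values only; nothing here comes from a real report.
report = IncidentReport(
    camp="Tule Lake",
    date="(placeholder date)",
    place="(placeholder location)",
    account="(placeholder narrative)",
    action_taken="(placeholder action)",
    persons=[PersonInvolved("Doe, Jane", "(placeholder address)")],
)
encoded = json.dumps(asdict(report))  # machine-readable form of the model
```

Serializing the model to JSON is one convenient choice for sharing it between notebooks; any structured format would serve the same purpose.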

slide-40
SLIDE 40
G. Computer Programming

“Enabling students to explore” archival problems “using computational problem solving practices such as programming, algorithm development, and creating computational abstractions.” “The ability to encode instructions in such a way that a computer can execute them is a powerful skill for investigating” archival problems. “Programs include ten-line Python scripts.”

slide-41
SLIDE 41
H. Developing Modular Computational Solutions

  • We make use of abstraction and functional programming through the use of modular components such as:
      • PII_DateCheck(),
      • FORM26_lookup(), and
      • FAR_lookup().
  • This allows for reusable chunks of code that can be tested locally. The larger program is the composition of these modules, which makes it both more readable and maintainable.

“When working toward a specific” archival “outcome, there are often a number of steps or components involved in the process; these steps, in turn, can be broken down in a variety of ways that impact their ability to be easily reused, repurposed, and debugged. Developing computational solutions in a modular, reusable way has many implications. By developing modular solutions, it is easier to incrementally construct solutions, test components independently, and increase the likelihood that components will be useful for future problems.”
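The modular style described on this slide might look like the sketch below. The slides name the components but do not show their signatures or behavior, so everything here is an assumption: the lowercase function names, the lookup tables, the placeholder records, and especially the age-based rule in `pii_date_check`, which is invented for illustration and is not the project's actual policy.

```python
# Hypothetical lookup tables standing in for the FAR and Form 26 data.
FAR = {"Doe, Jane": {"family_no": "999", "birth_year": 1920}}
FORM26 = {"999": {"pre_evacuation_city": "Placeholder City"}}


def far_lookup(name, far=FAR):
    """Return the roster entry for a name, or None if it is not listed."""
    return far.get(name)


def form26_lookup(family_no, form26=FORM26):
    """Return the Form 26 entry for a family number, or None."""
    return form26.get(family_no)


def pii_date_check(birth_year, as_of_year, max_age=100):
    """Assumed rule: treat a person as possibly living if younger than max_age."""
    return (as_of_year - birth_year) < max_age


def assess(name, as_of_year):
    """Compose the three modules into one decision for a single name."""
    entry = far_lookup(name)
    if entry is None:
        # Unknown person: err on the side of restricting access.
        return {"name": name, "found": False, "restrict": True}
    details = form26_lookup(entry["family_no"]) or {}
    restrict = pii_date_check(entry["birth_year"], as_of_year)
    return {"name": name, "found": True, "restrict": restrict, **details}
```

Because each function takes its table as a parameter with a default, it can be tested on its own with a small in-memory table before being composed into `assess`.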

slide-42
SLIDE 42
I. Troubleshooting and Debugging

  • To facilitate group debugging, we use an interactive server-based shared version of Jupyter Notebook.
  • “The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.”
  • Jupyter Notebook Documentation, see: https://media.readthedocs.org/pdf/jupyter-notebook/latest/jupyter-notebook.pdf
  • Project Jupyter, see: http://jupyter.org/

“Troubleshooting broadly refers to the process of figuring out why something is not working or behaving as expected. There are a number of strategies one can employ while troubleshooting a problem, including clearly identifying the issue, systematically testing the system to isolate the source of the error, and reproducing the problem so that potential solutions can be tested reliably.”

slide-43
SLIDE 43

5. (Jupyter) Digital Notebooks

See: https://cases.umd.edu

Educators are rapidly adopting Jupyter Notebooks for:

  * teaching
  * use in the classroom
  * developing teaching materials
  * creating computational stories
slide-44
SLIDE 44
3. Opportunities for Collaboration

  • 1. Best Practices Exchange (BPE)
      Computational Tools and Strategies in Enhancing Access to Cultural Big Data Collections
      • April 30, 2019, Columbus, Ohio
      https://bpexchange.wordpress.com/2019-conference/

  • 2. Records Management Journal – Themed call for papers
      ‘Technology and records management: disrupt or be disrupted?’
      • Extended: May 1, 2019
      • Full paper submitted: September 1, 2019
      • Review, revision and final acceptance: January 31, 2020
      http://dcic.umd.edu/technology-and-records-management-disrupt-or-be-disrupted-records-management-journal-themed-call-for-papers/

  • 3. AERI (Archival Education and Research Institute) 2019
      Developing a Computational Curriculum Framework for Archival Education
      • July 11, 2019, Liverpool, UK
      https://aeri2019.com/

  • 4. ARA (Archives & Records Association – UK & Ireland) Conference
      Shaping Digital Recordkeeping Competence
      • August 28, 2019, Leeds, UK
      https://conference.archives.org.uk

  • 5. Computational Archival Science Workshop (CAS#4)
      IEEE Big Data 2018, December 9-12, Los Angeles
      http://dcicblog.umd.edu/cas/ieee-big-data-2018-3rd-cas-workshop/

slide-45
SLIDE 45

CONTACTS

marciano@umd.edu
@umdDCIC
http://dcicblog.umd.edu/cas/
computational-archival-science@googlegroups.com