Documenting and preserving programming languages and software in Wikidata



SLIDE 1

Digital Preservation | John Samuel, Katherine Thornton


Documenting and preserving programming languages and software in Wikidata

John Samuel, Katherine Thornton, Kenneth Seals-Nutt

CPE Lyon, EaaSI

SWIB 2018, Bonn, 27 November 2018

SLIDE 2

Programming Languages

English Wikipedia Infoboxes of Programming Languages

SLIDE 3

Programming Languages

Programming Languages with the most multilingual labels

SLIDE 4

Programming Languages

Programming Language Paradigms

SLIDE 5

Programming Languages

Programming languages with the most distinct paradigms

SLIDE 6

Programming Languages

Programming languages with the most multilingual Wikipedia articles

SLIDE 7

Programming Languages

Wikipedia languages with the most articles on programming languages

SLIDE 8

Programming Languages

Languages with the most programming language labels

SLIDE 9

Software

English Wikipedia Infoboxes of Software

SLIDE 10

Software

Software with the most labels on Wikidata

SLIDE 11

Software

Software with the most articles on Wikipedia

SLIDE 12

Software

Wikipedia languages with the most articles on software

SLIDE 13

Software

Languages with the most software labels on Wikidata

SLIDE 14

Operating Systems

English Wikipedia Infoboxes of Operating Systems

SLIDE 15

SLIDE 16

Digital Preservation

  • OPF
  • Software Heritage
  • EaaSI

SLIDE 17

Wikidata

  • Started in 2012
  • A free, open, linked, structured, collaborative and multilingual knowledge base
  • From multi-(sub)domain multilingual Wikipedia sites to a single-domain multilingual website
  • Collaborative multilingual multi-domain ontology development

SLIDE 18

Wikipedia to Wikidata

EN FR DE IT NL ES HI ML

Importing structured data from Wikipedia Infoboxes to Wikidata

SLIDE 19

Wikidata to Wikipedia

EN FR DE IT NL ES HI ML

Exporting data from Wikidata to multiple multilingual Wikipedia articles

SLIDE 20

Wikipedia Infobox Properties

Existing English Wikipedia Infobox Properties of Programming Languages

SLIDE 21

Wikidata

Wikidata entry of Python Programming Language (labels)

SLIDE 22

Wikidata Properties

Wikidata entry of Python Programming Language (property values)

SLIDE 23

Wikidata Properties

Example of Wikidata Property

SLIDE 24

Wikidata Properties

  • Proposition (with possible translations)
  • Discussion
  • Voting
  • Creation
  • Translation
  • Usage
  • Proposition to Delete
  • Deletion

Property Creation on Wikidata
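
The stages listed on this slide can be sketched as an ordered enumeration. This is only an illustration: it assumes the stages are strictly sequential in the order shown, whereas in practice translation and usage continue in parallel and most properties are never proposed for deletion. The names `PropertyStage` and `next_stage` are hypothetical and not part of any Wikidata tool.

```python
from enum import Enum
from typing import Optional


class PropertyStage(Enum):
    """Stages of a Wikidata property, in the order listed on the slide."""
    PROPOSITION = "Proposition"
    DISCUSSION = "Discussion"
    VOTING = "Voting"
    CREATION = "Creation"
    TRANSLATION = "Translation"
    USAGE = "Usage"
    PROPOSITION_TO_DELETE = "Proposition to Delete"
    DELETION = "Deletion"


def next_stage(stage: PropertyStage) -> Optional[PropertyStage]:
    """Return the stage that follows `stage`, or None after deletion.

    Assumption: a simple linear ordering, which simplifies the real process.
    """
    stages = list(PropertyStage)
    index = stages.index(stage)
    return stages[index + 1] if index + 1 < len(stages) else None


print(next_stage(PropertyStage.VOTING).value)
```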

SLIDE 25

Wikidata Projects

Example Wikidata WikiProject

SLIDE 26

Wikidata Projects

Example Wikidata WikiProject and Property Suggestions

SLIDE 27

Tools: Histropedia

Timeline of Programming Languages

http://histropedia.com/timeline/d98rtpg9bg0t/Programming-languages

SLIDE 28

Status of software data

Wikidata 85,000 desktop applications research software FLOSS

SLIDE 29

Licenses approved by the Free Software Foundation

Licenses approved by the Free Software Foundation by count of software titles available under each

SLIDE 30

UNIX utilities

Some UNIX utilities have their own identifiers in the LoC Name Authority File or in the GND

SLIDE 31

Deutsches Forschungsnetz

Software developed by members of Deutsches Forschungsnetz

SLIDE 32

File format items

File format items that have a LoC FDD identifier, along with all other identifiers

SLIDE 33

Wikidata for Digital Preservation

  • Inspired by WikiGenomes
  • Streamlined interface
  • Property checklists tailored to digital preservation
  • Specialty searches (PUID, MIME type)

SLIDE 34

Wikidata for Digital Preservation

Development Team

  • Kenneth Seals-Nutt: software engineer
  • Katherine Thornton: data curation, data models, SPARQL queries
  • Carl Wilson: technical mentor
  • Euan Cochrane: digital preservation program of work

SLIDE 35

WikiGenomes

wikigenomes.org

SLIDE 36

Role of Portals

  • About 5,000 properties in Wikidata
  • Data models are not pre-defined
  • Portal has a domain-specific property checklist
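
A domain-specific property checklist can be pictured as a small lookup that reports which expected properties an item still lacks. The sketch below is an assumption about how such a check might look, not the portal's actual code. The checklist contents and the names `SOFTWARE_CHECKLIST` and `missing_properties` are hypothetical; the property IDs are the ones that appear in this deck's queries.

```python
from typing import Dict, List, Set

# Hypothetical checklist for software items. The IDs are taken from the
# SPARQL queries in this deck: instance of, developer, license.
SOFTWARE_CHECKLIST: Dict[str, str] = {
    "P31": "instance of",
    "P178": "developer",
    "P275": "license",
}


def missing_properties(item_properties: Set[str],
                       checklist: Dict[str, str]) -> List[str]:
    """List checklist entries for which the item has no statement."""
    return [f"{pid} ({label})" for pid, label in checklist.items()
            if pid not in item_properties]


# Example: an item that only has 'instance of' and 'license' statements.
print(missing_properties({"P31", "P275"}, SOFTWARE_CHECKLIST))
```

With roughly 5,000 properties and no pre-defined data model, a short list like this gives contributors a concrete target for each item type.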

SLIDE 37

Technologies

  • Python
  • Flask
  • SPARQL
  • Wikidata Integrator
  • MediaWiki API
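
As a rough illustration of how Python and SPARQL fit together, the sketch below sends a query to the public Wikidata Query Service and flattens the JSON result. It uses only the Python standard library as a stand-in; it is not how the portal or Wikidata Integrator does it. The helper names and the User-Agent string are hypothetical, and `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_url(query: str) -> str:
    """Encode a SPARQL query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def rows(result: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in result["results"]["bindings"]]


def run_query(query: str) -> List[Dict[str, str]]:
    """Send the query to the endpoint and return flattened rows."""
    request = urllib.request.Request(
        build_url(query),
        # Placeholder User-Agent; replace with a real tool name and contact.
        headers={"User-Agent": "example-sparql-client/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return rows(json.load(response))
```

Any of the queries at the end of this deck could be passed to `run_query` as a string.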

SLIDE 38

WikiDP.org

Screenshot of search results in the WikiDP portal

SLIDE 39

WDProp

1. WDProp: Collaborative multilingual multi-domain ontology development: is it possible to achieve a truly multilingual experience?
2. Goals:
  • Understanding Wikidata property proposal, creation and translation
  • Available templates and their usage
  • Providing real-time statistics to (multilingual) contributors

SLIDE 40

WDProp

Information on Wikidata Properties

SLIDE 41

WDProp

  • Get real-time translation statistics
  • Navigate supported languages, properties, datatypes, classes
  • Compare translation statistics
  • Find available properties for an entity
  • Uses Wikidata SPARQL endpoints and MediaWiki API
  • URL: https://tools.wmflabs.org/wdprop
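
A translation statistic of the kind mentioned here can be as simple as the share of properties that have a label in a given language. The sketch below shows that calculation on made-up data. It is an assumption about what such a statistic might compute, not WDProp's implementation, and the function name `translation_coverage` is hypothetical.

```python
from typing import Dict, Set


def translation_coverage(labels: Dict[str, Set[str]], language: str) -> float:
    """Share of properties (0.0 to 1.0) that have a label in `language`.

    `labels` maps a property ID to the set of language codes it is labelled in.
    """
    if not labels:
        return 0.0
    translated = sum(1 for languages in labels.values() if language in languages)
    return translated / len(labels)


# Made-up sample data, not real Wikidata statistics.
sample = {
    "P31": {"en", "fr", "de", "ml"},
    "P275": {"en", "fr"},
    "P3966": {"en"},
    "P178": {"en", "fr", "de"},
}
print(translation_coverage(sample, "fr"))
```

Comparing the value across languages shows where translation effort is most needed.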

SLIDE 42

Conclusion

  • Digital Heritage
  • Wikidata: Multilingual, Structured Knowledge Base
  • Need for Digital Preservation
  • Digital Preservation on Wikidata
  • Community participation: Property proposition, translation and item description
  • Tools using SPARQL endpoints and/or MediaWiki API

SLIDE 43

Tools and Projects

Tools

  • Wikidata SPARQL query endpoint
  • MediaWiki API
  • Wikidata Integrator
  • Histropedia
  • wdtaxonomy
  • WDProp
  • WikiDP
  • WikiDP Portal (GitHub)

SLIDE 44

Tools and Projects

WikiProjects

  • WikiProjects
  • WikiProject Informatics
  • WikiProject Informatics/Programming Language
  • WikiProject Informatics/Software/Properties
  • WikiProject Informatics/Operating System


SLIDE 46

Thank you

SLIDE 47

SPARQL Query

Programming paradigms with the count of programming languages

SELECT ?paradigmLabel (COUNT(?prog) AS ?count) {
  ?prog wdt:P31 wd:Q9143;
        wdt:P3966 ?paradigm.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?paradigmLabel
HAVING (?count > 1)
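
To make the grouping and filtering in this query concrete, the same aggregation can be written in plain Python over (language, paradigm) pairs. This is only an in-memory analogy for the GROUP BY and HAVING clauses: the sample pairs are illustrative rather than query output, and the function name `paradigm_counts` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def paradigm_counts(pairs: Iterable[Tuple[str, str]],
                    minimum: int = 1) -> Dict[str, int]:
    """Count languages per paradigm, keeping paradigms with count > minimum."""
    # Equivalent of GROUP BY ?paradigmLabel with COUNT(?prog).
    counts = Counter(paradigm for _language, paradigm in pairs)
    # Equivalent of HAVING (?count > 1) when minimum is 1.
    return {paradigm: n for paradigm, n in counts.items() if n > minimum}


# Illustrative (language, paradigm) pairs.
sample = [
    ("Python", "object-oriented programming"),
    ("Java", "object-oriented programming"),
    ("Haskell", "functional programming"),
    ("Python", "functional programming"),
    ("Prolog", "logic programming"),
]
print(paradigm_counts(sample))
```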

SLIDE 48

SPARQL Query

Programming languages with the count of programming paradigms

SELECT ?progLabel (COUNT(?paradigm) AS ?count) {
  ?prog wdt:P31 wd:Q9143;
        wdt:P3966 ?paradigm.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?progLabel
HAVING (?count > 2)

SLIDE 49

SPARQL Query

Programming languages with the count of multilingual labels

SELECT ?languageLabel (COUNT(?label) AS ?count) {
  {
    SELECT DISTINCT ?languageLabel ?label (LANG(?label) AS ?langLabel) {
      ?language wdt:P31/wdt:P279* wd:Q9143;
                rdfs:label ?label.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  }
}
GROUP BY ?languageLabel
HAVING (?count > 50)
ORDER BY DESC(?count)

SLIDE 50

SPARQL Query

Software with the count of multilingual labels

SELECT ?softwareLabel (COUNT(?label) AS ?count) {
  {
    SELECT DISTINCT ?softwareLabel ?label (LANG(?label) AS ?langLabel) {
      ?software wdt:P31/wdt:P279 wd:Q7397;
                rdfs:label ?label.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  }
}
GROUP BY ?softwareLabel
HAVING (?count > 40)
ORDER BY DESC(?count)

SLIDE 51

SPARQL Query

Languages with the count of programming language labels

SELECT ?langLabel (COUNT(?language) AS ?count) {
  {
    SELECT DISTINCT (LANG(?label) AS ?langLabel) ?language {
      ?language wdt:P31/wdt:P279* wd:Q9143;
                rdfs:label ?label.
    }
  }
}
GROUP BY ?langLabel
ORDER BY DESC(?count)

SLIDE 52

SPARQL Query

Languages with the count of software labels

SELECT ?langLabel (COUNT(?software) AS ?count) {
  {
    SELECT DISTINCT (LANG(?label) AS ?langLabel) ?software {
      ?software wdt:P31/wdt:P279* wd:Q7397;
                rdfs:label ?label.
    }
  }
}
GROUP BY ?langLabel
ORDER BY DESC(?count)

SLIDE 53

SPARQL Query

Programming languages with the count of Wikipedia articles

SELECT DISTINCT ?languageLabel ?sitelinks {
  ?language wdt:P31/wdt:P279* wd:Q9143;
            wikibase:sitelinks ?sitelinks.
  FILTER(?sitelinks > 20)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?sitelinks)

SLIDE 54

SPARQL Query

Software with the count of Wikipedia articles

SELECT DISTINCT ?softwareLabel ?sitelinks {
  ?software wdt:P31/wdt:P279* wd:Q7397;
            wikibase:sitelinks ?sitelinks.
  FILTER(?sitelinks > 100)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?sitelinks)

SLIDE 55

SPARQL Query

Languages with the count of Wikipedia articles on programming languages

SELECT ?lang (COUNT(?progLanguage) AS ?count) {
  {
    SELECT DISTINCT ?progLanguage ?lang {
      ?progLanguage wdt:P31/wdt:P279* wd:Q9143.
      [] schema:about ?progLanguage;
         schema:inLanguage ?lang.
    }
  }
}
GROUP BY ?lang
ORDER BY DESC(?count)

SLIDE 56

SPARQL Query

Languages with the count of Wikipedia articles on software

SELECT ?lang (COUNT(?software) AS ?count) {
  {
    SELECT DISTINCT ?software ?lang {
      ?software wdt:P31/wdt:P279* wd:Q7397.
      [] schema:about ?software;
         schema:inLanguage ?lang.
    }
  }
}
GROUP BY ?lang
ORDER BY DESC(?count)

SLIDE 57

SPARQL Query

Licenses approved by the Free Software Foundation by count of software titles available under each

SELECT ?item ?itemLabel (COUNT(DISTINCT ?software) AS ?count)
WHERE {
  ?software (wdt:P31/wdt:P279*) wd:Q7397.
  ?software wdt:P275 ?item.
  ?item wdt:P790 wd:Q48413.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?item ?itemLabel
ORDER BY DESC(?count)

SLIDE 58

SPARQL Query

UNIX utilities with identifiers in the LoC Name Authority File or in the GND

SELECT ?item ?itemLabel ?LCNAF ?GND
WHERE {
  ?item wdt:P31 wd:Q18343316.
  OPTIONAL { ?item wdt:P244 ?LCNAF }.
  OPTIONAL { ?item wdt:P227 ?GND }.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

SLIDE 59

SPARQL Query

Software developed by members of Deutsches Forschungsnetz

SELECT ?member ?memberLabel ?software ?softwareLabel
WHERE {
  ?member wdt:P463 wd:Q2514863.
  ?software wdt:P178 ?member.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}