LinkedSpending: OpenSpending becomes Linked Open Data, Konrad Höffner (PowerPoint presentation)

SLIDE 1

LinkedSpending: OpenSpending becomes Linked Open Data

Konrad Höffner
konrad.hoeffner@uni-leipzig.de

October 5, 2013

SLIDE 2

Open Spending Data

• government spending data available to everyone
• what amount of money was spent for which purpose, when, and by which department
• demanded by the public
• increases accountability
• reduces corruption
• voters can make better-informed decisions → strengthens democracy
• strengthens the government itself, which becomes more likely to commit to large projects

SLIDE 3

Source Data

http://openspending.org

• open platform
• public finance data from governments around the world
• more than 350 datasets, more than 17 million transactions
• updated regularly (about one new dataset a week)

SLIDE 4

Source Data

{
  "sub-programme": {
    "label": "Security and safeguarding liberties",
    "html_url": "http://openspending.org/eu-budget/sub-programme/security-and-safeguarding-liberties",
    "name": "security-and-safeguarding-liberties"
  },
  "html_url": "http://openspending.org/eu-budget/entries/017dfcb58d05671ef9eb5a9f77fef39c8b14150c",
  "amount": 41.2
}

Figure: simplified excerpt from an OpenSpending entry
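The conversion reads values like these out of each entry with JSON path expressions. The Python sketch below only illustrates that idea on the excerpt above; the dotted-path syntax and the helper `get_path` are hypothetical and are not the project's actual Java implementation.

```python
import json

# The simplified OpenSpending entry from the figure above.
ENTRY = """{
  "sub-programme": {
    "label": "Security and safeguarding liberties",
    "html_url": "http://openspending.org/eu-budget/sub-programme/security-and-safeguarding-liberties",
    "name": "security-and-safeguarding-liberties"
  },
  "html_url": "http://openspending.org/eu-budget/entries/017dfcb58d05671ef9eb5a9f77fef39c8b14150c",
  "amount": 41.2
}"""


def get_path(node, path):
    """Follow a dotted path such as 'sub-programme.label' (hypothetical syntax).

    Returns None if any step is missing, so one malformed entry can be
    counted as an error instead of aborting the whole conversion.
    """
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


entry = json.loads(ENTRY)
label = get_path(entry, "sub-programme.label")
amount = get_path(entry, "amount")
missing = get_path(entry, "sub-programme.code")
print(label, amount, missing)
```

Returning None rather than raising keeps missing fields visible to a later error-counting step.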

SLIDE 5

Why Convert the Data to RDF?

Problems with source data
• source data is structured (database) but not semantic
• open, but not linked
• data silo: own format, not interlinked with other knowledge bases, hard to integrate

Benefits of RDF
• multiple ways of access for both machines and people: resolvable URIs, SPARQL, RDF dump
• use of Linked Open Vocabularies: common vocabulary → easier integration with other spending data
• Semantic Web infrastructure such as question answering
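To make the RDF benefit concrete, here is a minimal Python sketch that writes the entry from slide 4 as an RDF Data Cube observation in N-Triples. The qb: namespace is the real W3C RDF Data Cube vocabulary; the example.org namespace and the property names under it are hypothetical placeholders for LinkedSpending's own ls:/lso: terms, not the actual LinkedSpending URIs.

```python
from decimal import Decimal

# qb: is the W3C RDF Data Cube vocabulary; EX is a HYPOTHETICAL stand-in for
# the LinkedSpending namespaces (ls:, lso:) that appear in the example queries.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/linkedspending/"


def entry_to_ntriples(dataset, entry_id, amount, sub_programme):
    """Describe one spending entry as a qb:Observation in N-Triples syntax."""
    obs = f"<{EX}observation/{dataset}/{entry_id}>"
    # Decimal + 'f' formatting yields a valid xsd:decimal lexical form
    # without an exponent, e.g. 41.2 -> "41.2".
    amount_lexical = format(Decimal(str(amount)), "f")
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{EX}dataset/{dataset}> .",
        f'{obs} <{EX}ontology/amount> "{amount_lexical}"^^<{XSD_DECIMAL}> .',
        f"{obs} <{EX}ontology/sub-programme> <{EX}{dataset}/sub-programme/{sub_programme}> .",
    ]


for line in entry_to_ntriples(
    "eu-budget",
    "017dfcb58d05671ef9eb5a9f77fef39c8b14150c",
    41.2,
    "security-and-safeguarding-liberties",
):
    print(line)
```

Because every observation is typed as qb:Observation and linked to its dataset via qb:dataSet, generic Data Cube tools and queries can work on it without knowing the OpenSpending JSON format.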

SLIDE 6

Problems with Conversions

• JSON API, no bulk download
• frequent changes
• errors in the data
• source data uses a specific model for statistical observations: data cube
• large amount of data → performance is important to avoid long waiting times
• common data model, but datasets use different vocabularies

SLIDE 7

How the problems were solved

Problem | Solution
JSON API | custom Java program defining JSON path expressions
frequent changes | two-step process: (1) download all JSON resources, (2) convert JSON to RDF
errors | error-rate thresholds for accepting or declining datasets
data cube model | use of the Linked Open Vocabulary RDF Data Cube
performance | persistent caching
different vocabularies | not yet solved, needs more time (cooperation with experts, student thesis?)
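The two-step process, the persistent cache and the error-rate threshold fit together roughly as in the following Python sketch. It is an illustration under stated assumptions, not the actual Java converter: the threshold value `MAX_ERROR_RATE`, the SHA-1 file naming and the toy `convert_entry` are all hypothetical.

```python
import hashlib
import json
import os
import tempfile

# HYPOTHETICAL value: the slides mention error-rate thresholds but give no number.
MAX_ERROR_RATE = 0.1


def download(url, cache_dir, fetch):
    """Step 1: fetch a JSON resource once and keep it in a persistent on-disk cache.

    `fetch` is any callable taking a URL and returning the response body as a
    string; passing it in keeps this sketch runnable without network access.
    """
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".json"
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(fetch(url))
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def convert_dataset(entries, convert_entry, max_error_rate=MAX_ERROR_RATE):
    """Step 2: convert cached entries and decline the dataset if too many fail."""
    triples, errors = [], 0
    for entry in entries:
        try:
            triples.extend(convert_entry(entry))
        except (KeyError, TypeError, ValueError):
            errors += 1  # count the broken entry instead of aborting
    error_rate = errors / len(entries) if entries else 0.0
    accepted = error_rate <= max_error_rate
    return accepted, error_rate, (triples if accepted else [])


def convert_entry(entry):
    """Toy converter: raises KeyError for entries without an amount."""
    return [(entry["id"], "amount", float(entry["amount"]))]


with tempfile.TemporaryDirectory() as cache:
    fetch_calls = []

    def fake_fetch(url):
        fetch_calls.append(url)
        return json.dumps([{"id": "a", "amount": "41.2"}, {"id": "b"}])

    entries = download("http://example.org/entries.json", cache, fake_fetch)
    again = download("http://example.org/entries.json", cache, fake_fetch)
    accepted, rate, triples = convert_dataset(entries, convert_entry)
    print(len(fetch_calls), accepted, rate)  # prints: 1 False 0.5
```

In this sketch the cache key is a SHA-1 hash of the URL rather than Python's built-in `hash()`, which is randomized per process and would not survive a restart. Keeping the downloads on disk means a change to the converter only requires repeating step 2.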

SLIDE 8

Outcome: LinkedSpending

 | Total | Average per dataset
number of datasets | 247 |
file size (RDF/N-Triples) | 10 GB | 41 MB
triples | 50 million | 200 000
observations | 2.4 million | 10 000

Table: total and average values (approximate)
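The averages are consistent with dividing the totals by the number of datasets. A quick Python check, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes); the triples-per-observation ratio is derived here and is not stated on the slide.

```python
# Approximate totals from the table above.
DATASETS = 247
TOTAL_BYTES = 10 * 10**9          # 10 GB, assuming decimal gigabytes
TOTAL_TRIPLES = 50 * 10**6
TOTAL_OBSERVATIONS = 2.4 * 10**6

avg_mb = TOTAL_BYTES / DATASETS / 10**6
avg_triples = TOTAL_TRIPLES / DATASETS
avg_observations = TOTAL_OBSERVATIONS / DATASETS
# Derived ratio: how many triples describe one observation on average.
triples_per_observation = TOTAL_TRIPLES / TOTAL_OBSERVATIONS

print(f"{avg_mb:.1f} MB, {avg_triples:,.0f} triples, "
      f"{avg_observations:,.0f} observations per dataset, "
      f"{triples_per_observation:.1f} triples per observation")
```

This gives about 40.5 MB, 202,000 triples and 9,700 observations per dataset, in line with the rounded table values, and roughly 21 triples per observation.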

available at http://linkedspending.aksw.org/ [1]
• RDF dump for access to the whole dataset
• public SPARQL endpoint for queries
• OntoWiki instance for browsing

[1] still under development
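A SPARQL endpoint like the one listed above can be queried over plain HTTP following the SPARQL 1.1 Protocol. The Python sketch below builds such a GET request and flattens a JSON result document. The endpoint path `/sparql` is a hypothetical assumption (the slides only give the base URL), so the network call is left commented out.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL endpoint path: the slides only give http://linkedspending.aksw.org/
ENDPOINT = "http://linkedspending.aksw.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings(results_json):
    """Flatten a SPARQL 1.1 JSON results document into a list of {var: value} dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


req = build_request(
    "select distinct ?d { ?o <http://purl.org/linked-data/cube#dataSet> ?d } limit 5"
)
print(req.full_url)
# To actually run it (needs network access and a live endpoint):
# with urllib.request.urlopen(req) as resp:
#     print(bindings(resp.read().decode("utf-8")))
```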

SLIDE 9

information need | SPARQL query

1. all years which have observations in the de-bund dataset from 2020 onwards

    select distinct ?date
    {
      ?o a qb:Observation .
      ?o qb:dataSet ls:de-bund .
      ?o sdmxd:refPeriod ?date .
      FILTER (xsd:date(?date) >= "2020-01-01"^^xsd:date)
    }

2. spendings of more than 100 billion €

    select *
    {
      ?o lso:amount ?a .
      ?o dbo:currency dbpedia:Euro .
      FILTER (xsd:double(?a) > 1E11)
    }

3. datasets with multiple years

    select ?d (count(?y) as ?count)
    {
      ?d a qb:DataSet .
      ?d lso:refYear ?y .
    }
    group by ?d
    having (count(?y) > 1)

4. sums of amounts for each reference year of the dataset berlin_de

    select ?y (sum(xsd:integer(?amount)) as ?sum)
    {
      ?o qb:dataSet ls:berlin_de .
      ?o lso:refYear ?y .
      ?o lso:amount ?amou …

5. datasets with currencies whose inflation rate is greater than 10 %

    select distinct ?d ?c ?r
    {
      ?o qb:dataSet ?d .
      ?o dbo:currency ?c .
      ?c dbp:inflationRate ?r .
      filter (?r > 10)
    }

6. Berlin city subsectors of research and education that have had their budget reduced from 2012 to 2013

    …
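What the grouping in queries 3 and 4 computes can be illustrated on a tiny in-memory graph. This Python sketch is only a conceptual model of GROUP BY, HAVING and SUM, not how a triple store evaluates SPARQL. The triples are hypothetical sample data, and since query 4 is truncated on the slide, the sketch assumes it groups by ?y.

```python
from collections import defaultdict

# HYPOTHETICAL sample data; prefixed names mirror those used in the queries.
TRIPLES = {
    ("ls:berlin_de", "a", "qb:DataSet"),
    ("ls:berlin_de", "lso:refYear", "2012"),
    ("ls:berlin_de", "lso:refYear", "2013"),
    ("ls:de-bund", "a", "qb:DataSet"),
    ("ls:de-bund", "lso:refYear", "2013"),
    ("obs1", "qb:dataSet", "ls:berlin_de"),
    ("obs1", "lso:refYear", "2012"),
    ("obs1", "lso:amount", "100"),
    ("obs2", "qb:dataSet", "ls:berlin_de"),
    ("obs2", "lso:refYear", "2012"),
    ("obs2", "lso:amount", "50"),
    ("obs3", "qb:dataSet", "ls:berlin_de"),
    ("obs3", "lso:refYear", "2013"),
    ("obs3", "lso:amount", "120"),
}


def objects(subject, predicate):
    """All ?x such that (subject, predicate, ?x) is in the graph."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def datasets_with_multiple_years():
    """Query 3: group reference years by dataset, keep groups with count > 1."""
    years = defaultdict(list)
    for s, p, o in TRIPLES:
        if p == "lso:refYear" and (s, "a", "qb:DataSet") in TRIPLES:
            years[s].append(o)
    return {d: len(ys) for d, ys in years.items() if len(ys) > 1}


def sum_per_year(dataset):
    """Query 4 (assuming it groups by ?y): sum of amounts per reference year."""
    sums = defaultdict(int)
    for s, p, o in TRIPLES:
        if p == "qb:dataSet" and o == dataset:
            for year in objects(s, "lso:refYear"):
                for amount in objects(s, "lso:amount"):
                    sums[year] += int(amount)
    return dict(sums)


print(datasets_with_multiple_years())
print(sum_per_year("ls:berlin_de"))
```

Here ls:de-bund is filtered out by the HAVING condition because it has only one reference year, while ls:berlin_de yields 150 for 2012 and 120 for 2013.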


SLIDE 10
