Science 2.0 VU: Big Science, e-Science and e-Infrastructures + Bibliometric Network Analysis



slide-1
SLIDE 1

www.tugraz.at n W I S S E N n T E C H N I K n L E I D E N S C H A F T

u www.tugraz.at

Science 2.0 VU

Big Science, e-Science and E- Infrastructures + Bibliometric Network Analysis

WS 2015/16 Elisabeth Lex KTI, TU Graz

slide-2
SLIDE 2

Agenda

  • Recap of last session: altmetrics / altmetrics in practice
  • Big Data and Science
  • e-Science
  • e-Infrastructures
  • Bibliometric Network Analysis
  • Your Assignment!

slide-3
SLIDE 3

Altmetrics (recap)

"Altmetrics is the creation and study of new metrics based on the Social Web for analyzing and informing scholarship"

  • Altmetrics Manifesto, http://altmetrics.org/about
  • Aggregated from many sources (e.g. Twitter, Mendeley, GitHub, SlideShare, ...)
  • Article-Level Metrics (ALM)
    • Multidimensional suite of transparent and established metrics at article level

slide-4
SLIDE 4

Examples of altmetrics sources (recap)

  • Usage
    • Views, downloads, ...
  • Captures
    • Bookmarks, readers, ...
  • Mentions
    • Blog posts, news stories, Wikipedia articles, comments, reviews
  • Social Media
    • Tweets, Google+, Facebook likes, shares, ratings
  • Citations
    • Web of Science, Scopus, Google Scholar, ...

slide-5
SLIDE 5

Examples: Altmetric.com

Source: http://www.altmetric.com/details.php?domain=www.altmetric.com&citation_id=843656

slide-6
SLIDE 6

Lessons learned (recap)

  • Alternative ways to assess the impact of various scientific outputs
  • No common understanding of altmetrics yet
    • What do they really express?
    • Are they useful, and for which part of the research process?
  • Not necessarily "better" metrics
    • E.g. gamification
  • Can help to get an overview of a research field
    • Visualizations based on altmetrics

slide-7
SLIDE 7

Modern Science: What has changed?

  • 150 years later: searching for new particles like the Higgs boson with the Large Hadron Collider
  • Built in collaboration with over 10,000 scientists and engineers from over 100 countries and hundreds of universities and laboratories, in a tunnel 27 km in circumference and up to 175 m deep, near Geneva

slide-8
SLIDE 8

Motivation

  • The Internet and scientific disciplines (e.g. physical sciences, biological sciences, medicine, and engineering) generate large and complex datasets (Big Data)
    • These require more advanced database and architectural support
  • A "new kind of research methodology" has emerged: the fourth paradigm of scientific exploration (Hey, 2007)
    • Based on statistical exploration of large amounts of data

http://www.ksi.mff.cuni.cz/astropara/

slide-9
SLIDE 9

Data-intensive scientific discovery

http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf

slide-10
SLIDE 10

Example: Big Data in Science - European Exascale Projects

http://exascale-projects.eu

Exascale computing: computers capable of at least one exaflops (10^18 floating-point operations per second) → not yet achieved; current systems are at the 10^15 (petaflops) scale

slide-11
SLIDE 11

Publications as Big Data

Cross-journal recommendation based on click streams

[Bollen et al., 2009]

slide-12
SLIDE 12

e-Science

  • Large-scale science (since 1999)
  • Data-driven discovery
  • Focus on computationally intensive science and how to tackle it using highly distributed environments in a collaborative manner
  • Powerful computers: supercomputers, High Performance Computing (HPC), Grid, ...
  • Distributed computing
  • Powerful research infrastructures: "e-infrastructures", grids, clouds

http://www.anandtech.com/show/6421/inside-the-titan-supercomputer-299k-amd-x86-cores-and-186k-nvidia-gpu-cores/3

slide-13
SLIDE 13

Supercomputers

  • Large, expensive systems, usually housed in a single room, in which multiple processors are connected by a fast local network
  • Suited for highly complex, real-time applications and simulations
  • Pros: data can move between processors rapidly → all processors can work together on the same tasks
  • Cons: expensive to build and maintain; do not scale well, e.g. adding more processors is challenging

http://www.top500.org/lists/2014/06/

http://www.wikihow.com/Build-a-Supercomputer

slide-14
SLIDE 14

Distributed Computing

  • Systems in which processors are not necessarily located in close proximity to one another (they can even be housed on different continents) but are connected via the Internet or other networks
  • Pros: much less expensive than supercomputers
  • Cons: lower speed than supercomputers

slide-15
SLIDE 15

Example: Hadoop

  • Ecosystem of tools for processing big data
  • Simple computational model
    • Two-stage method for processing large amounts of data
    • Design an algorithm that operates on one chunk of the data in two stages (a Map and a Reduce stage); MapReduce automatically distributes that algorithm across the cluster → complexity is hidden in the framework

http://hadoop.apache.org http://architects.dzone.com/articles/how-hadoop-mapreduce-works
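
The two-stage model can be sketched in plain Python. This is an illustrative single-machine word count, not Hadoop's API: the function names (`map_stage`, `shuffle`, `reduce_stage`) and the sample chunks are assumptions made for this sketch. In Hadoop, the framework performs the grouping step and distributes the map and reduce calls across the cluster.

```python
from collections import defaultdict


def map_stage(chunk):
    """Map: emit one (word, 1) pair per word in a single chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key (handled by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_stage(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently, which is what makes the model parallelizable.
    pairs = [pair for chunk in chunks for pair in map_stage(chunk)]
    return dict(reduce_stage(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input: two chunks of a larger text collection.
chunks = ["big data in science", "big science"]
counts = word_count(chunks)
print(counts)
```

The user only writes the map and reduce functions; everything else is generic, which is the sense in which the framework hides the complexity.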

slide-16
SLIDE 16

Hadoop in e-Science example: astronomical image processing

  • Large telescopes survey the sky over a prolonged period of time
  • Large Synoptic Survey Telescope (LSST), under construction, will capture 1/2 of the sky over 10 years: 30 TB of data every night, ~60 PB in 10 years
  • Astronomers pick out faint objects for study by capturing multiple images of the same area and combining them ("coaddition")
  • Challenge: how to organize and process all the resulting data

http://www.lsst.org/lsst/

slide-17
SLIDE 17

Using Hadoop to help with image coaddition

http://escience.washington.edu/get-help-now/astronomical-image-processing-hadoop
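
One way to picture coaddition as a map/reduce job is sketched below. It is a toy illustration, not the pipeline from the linked project: it assumes every exposure is already aligned to its sky patch and has the same size, and it uses an unweighted pixel-wise mean. The field names `patch` and `pixels` and the sample data are hypothetical.

```python
from collections import defaultdict


def map_image(exposure):
    """Map: key each exposure by the sky patch it covers."""
    return exposure["patch"], exposure["pixels"]


def reduce_patch(images):
    """Reduce: pixel-wise mean over all exposures of one patch."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / n for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2x2 exposures; two of them cover the same patch.
exposures = [
    {"patch": "p1", "pixels": [[1.0, 2.0], [3.0, 4.0]]},
    {"patch": "p1", "pixels": [[3.0, 2.0], [5.0, 0.0]]},
    {"patch": "p2", "pixels": [[10.0, 10.0], [10.0, 10.0]]},
]

# Group exposures by patch, then combine each group into one coadded image.
grouped = defaultdict(list)
for exposure in exposures:
    patch, pixels = map_image(exposure)
    grouped[patch].append(pixels)

coadds = {patch: reduce_patch(images) for patch, images in grouped.items()}
print(coadds["p1"])
```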

slide-18
SLIDE 18

Virtual Science Environments

  • Not only HPC but also sharing of knowledge and data is becoming a requirement for scientific discovery
  • Providing useful mechanisms to facilitate this sharing
  • Preserving and organizing research data

→ Virtual Science Environments: "virtual environments in which researchers work together through ubiquitous, trusted and easy access to services for scientific data, computing and networking, enabled by e-Infrastructures"

slide-19
SLIDE 19

Defining e-Infrastructures

European e-Infrastructure Reflection Group (e-IRG): 'The term e-Infrastructure refers to this new research environment in which all researchers, whether working in the context of their home institutions or in national or multinational scientific initiatives, have shared access to unique or distributed scientific facilities (including data, instruments, computing and communications), regardless of their type and location in the world.'

http://www.e-irg.eu/about-e-irg.html

slide-20
SLIDE 20

e-Infrastructures - Goals

  • Opening access to knowledge through reliable, distributed and participatory data e-infrastructures
  • Cost-effective infrastructures for preservation and curation for re-use of data
  • Persistent availability of information and linking people and data through flexible and robust digital identifiers
  • Interoperability for consistency of approaches on global data exchange (e.g. standards)
  • Enabling trust through authentication and authorisation mechanisms

http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/framework-for-action-in-h2020_en.pdf

slide-21
SLIDE 21

Example: e-Infrastructure OpenAIRE

  • The European Open Access Data Infrastructure for Scholarly and Scientific Communication
  • Functionality:
    • Harvesting and storing information about publications from various repositories (OAI-PMH)
    • Enables searching for publications and related information (e.g. funding, ...)
    • Provides a list of OA repositories that can be used to store publications
    • Orphan repository
    • Shows statistics on the stored data

https://www.openaire.eu
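
Harvesting over OAI-PMH amounts to sending HTTP requests with a `verb` parameter and parsing the XML that comes back. The sketch below builds a `ListRecords` request and extracts Dublin Core titles from a response. `verb=ListRecords` and `metadataPrefix=oai_dc` are standard OAI-PMH parameters; the endpoint `https://repository.example.org/oai` and the sample response are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements used by the oai_dc metadata format.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return all dc:title values found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.iterfind(".//dc:title", NAMESPACES)]


# Hypothetical repository endpoint (assumption, not an OpenAIRE URL).
url = list_records_url("https://repository.example.org/oai")

# Hand-written sample of what a (heavily shortened) response could look like.
sample_response = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example publication</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

titles = extract_titles(sample_response)
print(url)
print(titles)
```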

slide-22
SLIDE 22

OpenAIRE - Applications

slide-23
SLIDE 23

Example: e-Infrastructures Austria 1/2

http://www.e-infrastructures.at

slide-24
SLIDE 24

Example: e-Infrastructures Austria 2/2

slide-25
SLIDE 25

Take-away message

  • Big Science / e-Science: data-driven, large-scale science
  • Supercomputers and distributed computing
  • Virtual research environments
  • e-Infrastructures

slide-26
SLIDE 26

Bibliometric Network Analysis

slide-27
SLIDE 27

Bibliometrics

  • Quantitative study of all kinds of bibliographic data
    • Patterns of authorship, publications, citations
    • E.g. citation analysis of research outputs/publications
  • Assess the research impact of individuals, groups, institutions
  • Measuring by author (h-index), article (PLOS), or publication (Journal Impact Factor)
  • Measure of output, not quality (quantitative, not qualitative!)
  • Other measures could include funding received, number of patents, awards granted, or qualitative measures such as peer review

17/04/2015 Maynooth University
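
As a concrete author-level measure, the h-index is the largest number h such that an author has h publications with at least h citations each. A minimal sketch, with made-up citation counts:

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers.
print(h_index([10, 8, 5, 4, 3]))
```

The example also shows why this is a measure of output rather than quality: a single paper with 25 citations and one with 10 contribute equally once both exceed the threshold.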

slide-28
SLIDE 28

Why use Bibliometrics?

  • Measure the impact of research/publishing activity
    • CV, promotion, tenure, grants, feedback to funding bodies/industry/public
  • Showcase individual/group/institutional research
  • Identify areas of research strengths/weaknesses
  • Inform research priorities
  • Identify the highest-impact or top-performing journals in a subject area
    • Where to publish, learning about a particular subject area, identifying emerging areas of research
  • Identify the top researchers in a subject area
    • Collaborations/competitors
    • Recruitment
    • Learning about a subject area

17/04/2015 Maynooth University

slide-29
SLIDE 29

Bibliometric Networks

  • Represent scientific literature as networks, based on bibliographic data
  • Help provide an overview of the structure of scientific literature, e.g. in a domain or with respect to a topic
  • Applications
    • Identify main research areas within a field
    • Analyze relationships between research areas

slide-30
SLIDE 30

Bibliometric Networks

  • Co-authorship networks
  • Citation networks
  • Co-citation networks
  • Co-occurrence maps
    • Keywords, extracted topics, ...

slide-31
SLIDE 31

Co-authorship Networks

  • Scientific collaboration network
  • Nodes are authors of publications
  • Link between two authors if they co-authored a publication
  • Collaboration networks are scale-free
  • Co-authorship networks are affiliation networks
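
A co-authorship network can be derived from the author lists of publications: the underlying affiliation network links authors to papers, and projecting it onto authors gives one weighted link per pair of co-authors. A minimal sketch with hypothetical papers and author names:

```python
from collections import Counter
from itertools import combinations

# Hypothetical affiliation data: each paper with its list of authors.
papers = {
    "paper1": ["Alice", "Bob"],
    "paper2": ["Alice", "Bob", "Carol"],
    "paper3": ["Dave"],
}


def coauthorship_edges(papers):
    """Count, for every pair of authors, how many papers they wrote together."""
    edges = Counter()
    for authors in papers.values():
        # sorted(set(...)) gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


edges = coauthorship_edges(papers)
print(edges)
```

Single-author papers such as `paper3` add no links, so Dave stays an isolated node.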

slide-32
SLIDE 32

Co-authorship networks: Example

slide-33
SLIDE 33

Citation Networks

  • Nodes are publications
  • Directed link from one publication to another if the first cites the second
  • Reveals how often articles were cited

slide-34
SLIDE 34

Citation Networks

http://eduinf.eu/2012/03/15/co-citation-analysis-of-the-topic-social-network-analysis/

slide-35
SLIDE 35

Co-Citation Networks

  • Nodes are publications
    • Link between two nodes if the two publications were cited together in a paper
    • Link weight: how often the two articles were cited together by some third article
  • OR: nodes are authors
    • Link between two nodes if the authors were cited together
    • Used to identify clusters of authors
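
Co-citation counts can be computed from the reference lists of citing papers: every pair of publications that appears in the same reference list gains one co-citation. A minimal sketch; the paper identifiers are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: citing paper -> publications in its reference list.
reference_lists = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}


def cocitation_counts(reference_lists):
    """Count how often each pair of publications is cited by the same paper."""
    counts = Counter()
    for cited in reference_lists.values():
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


cocitations = cocitation_counts(reference_lists)
print(cocitations)
```

Replacing publication identifiers with the authors of the cited works would give the author co-citation variant.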

slide-36
SLIDE 36

Author co-citation network of 15 history & philosophy of science journals. Two authors are connected if they are cited together in some article, and connected more strongly if they are cited together frequently.

http://www.scottbot.net/HIAL/?p=38272

slide-37
SLIDE 37

Mining in Scientific Networks

  • Find influential researchers
  • Find influential papers
  • Investigate patterns of scientific collaboration
  • ...

slide-38
SLIDE 38

Centrality Measures

  • Degree centrality
    • Equals the number of links (connections) a node has
    → In citation networks, papers with high in-degree centrality have many citations
    → Widely used metric for measuring the scientific impact of a paper
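
In a citation network the in-degree of a paper is simply its citation count. A minimal sketch, assuming citations are given as (citing, cited) pairs; the paper identifiers are made up:

```python
from collections import Counter

# Hypothetical citation edges: (citing paper, cited paper).
citations = [("P2", "P1"), ("P3", "P1"), ("P3", "P2"), ("P4", "P1")]


def in_degree(edges):
    """Number of incoming links per node, i.e. how often each paper is cited."""
    return Counter(cited for _citing, cited in edges)


cited_counts = in_degree(citations)
print(cited_counts.most_common())
```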

slide-39
SLIDE 39

Centrality Measures

  • Eigenvector centrality: an "extension" of degree centrality
  • Degree centrality awards one centrality point for every neighbor a node has
  • However, not all neighbors are equally important
  • In many cases, the importance of a node is increased by having connections to other nodes that are themselves important
  • Eigenvector centrality: not only the number of neighbors matters but also the importance of those neighbors
  • Eigenvector centrality gives each node a score proportional to the sum of the scores of its neighbors
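
The definition above suggests a simple iterative computation: start with equal scores, repeatedly replace each score by the sum of the neighbors' scores, and normalize. The sketch below is a plain power iteration, assuming a connected, undirected, non-bipartite graph (otherwise this simple variant may not converge); the small co-authorship graph is hypothetical.

```python
import math


def eigenvector_centrality(adjacency, iterations=200, tol=1e-9):
    """Power iteration: each node's score becomes the sum of its neighbors' scores."""
    score = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        new = {node: sum(score[nb] for nb in adjacency[node]) for node in adjacency}
        # Normalize to unit length so the scores do not grow without bound.
        norm = math.sqrt(sum(value * value for value in new.values()))
        new = {node: value / norm for node, value in new.items()}
        if max(abs(new[node] - score[node]) for node in adjacency) < tol:
            return new
        score = new
    return score


# Hypothetical undirected graph: a triangle plus one pendant node.
graph = {
    "alice": ["bob", "carol", "dave"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob"],
    "dave": ["alice"],
}

scores = eigenvector_centrality(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

Dave and Bob illustrate the difference from degree centrality: Bob has only one more link than Dave, but his links go to better-connected nodes, so his score is clearly higher.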

slide-40
SLIDE 40

Centrality Measures in Python

https://networkx.github.io/documentation/latest/reference/algorithms.centrality.html
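
A short sketch of how these measures can be computed with NetworkX, assuming the package is installed. The two small graphs are hypothetical; note that NetworkX normalizes degree centralities by dividing by n - 1, where n is the number of nodes.

```python
import networkx as nx

# Hypothetical citation network: an edge (u, v) means "u cites v".
citation_graph = nx.DiGraph()
citation_graph.add_edges_from(
    [("P2", "P1"), ("P3", "P1"), ("P3", "P2"), ("P4", "P1")]
)
in_deg = nx.in_degree_centrality(citation_graph)

# Hypothetical co-authorship network (undirected).
coauthor_graph = nx.Graph()
coauthor_graph.add_edges_from(
    [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("alice", "dave")]
)
deg = nx.degree_centrality(coauthor_graph)
eig = nx.eigenvector_centrality(coauthor_graph, max_iter=1000)

print(in_deg)
print(deg)
print(eig)
```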

slide-41
SLIDE 41

Summary

  • Big Science
  • E-Science
  • E-Infrastructure
  • Bibliometrics
  • Bibliometric Network Analysis

slide-42
SLIDE 42

Thank you for your attention!