

SLIDE 1

Some Criteria for Building Reliable Bibliometric Indicators for Measuring Research Performance

  • On data sources, data quality and caveats in designing and applying indicators

Wolfgang Glänzel
Centre for R&D Monitoring and Dept MSI, KU Leuven, Belgium

SLIDE 2

 Structure of presentation 

  • 1. Prologue
  • 2. From information to evaluation
  • 3. Appropriateness of data sources
  • 4. Quality and cleanness of data
      4.1 Basic problems of author identification
      4.2 Institutional assignment
  • 5. Soundness of methodology
  • 6. Subject granularity
  • 7. Aggregation level
  • 8. Epilogue

Glänzel, Cinvestav, Mexico City, 2019

SLIDE 3

Chapter 1: From Information to Evaluation – The Big Turn

SLIDE 4

 Prologue 

The way from bibliographic data to appropriate bibliometric indicators for measuring research performance is a long and stony road.

  • In what follows I will attempt to sketch the basic criteria for a correct use of bibliographic databases for indicator building and for their sensible use in research assessment.
  • As a rule of experience, one can assume that bibliometricians spend about 80% of their time on data processing, cleaning and indicator testing.

SLIDE 5

 From Information to Evaluation 

Eugene Garfield (1925–2017) was the founder and chairman of the Institute for Scientific Information (now part of Clarivate Analytics). In the early 1960s he developed the Science Citation Index (SCI), the world's first large multi-disciplinary citation database. Although the SCI was developed for advanced information retrieval and for services in scientific information, it has become the common source for scientometric studies.

"The SCI was not originally created either to conduct quantitative studies, calculate impact factors, nor to facilitate the study of history of science."
  – Garfield, From information retrieval to scientometrics – is the dog still wagging its tail? (2009)

SLIDE 6

From Information to Evaluation

In the 1970s and 1980s, scientometrics/bibliometrics took a sharp rise and found a new orientation. Besides information science and sociology of science, science policy became the third driving force in the evolution of scientometrics. The evolution from "little scientometrics" to "big scientometrics" (Glänzel & Schoepflin, 1994) is characterised by two cardinal signs (Glänzel & Wouters, 2013).

SLIDE 7

From Information to Evaluation

  • 1. Scientometrics evolved from a sub-discipline of library and information science to an instrument for evaluation and benchmarking (Glänzel, 2006; Wouters, 2013).
  • As a consequence, several scientometric tools became used in a context for which they were not designed.
  • 2. Due to the dynamics in evaluation, the focus has shifted away from macro studies towards meso and micro studies of both actors and topics.
  • More recently, the evaluation of research teams and individual scientists has become a central issue in services based on bibliometric data.

SLIDE 8

From Information to Evaluation

☛ While in information services a certain incompleteness (false negatives) and even some errors (false positives) might be tolerable, as the results might still be useful, in benchmarking and evaluative contexts such errors can have fatal consequences.

  • Moed (2010) raised the question of errors, namely, of what is an acceptable "error rate" in the assessment process.

SLIDE 9

Chapter 2: Appropriateness of Data Sources

SLIDE 10

 Appropriateness of Data Sources 

Basic demands on bibliographic databases for possible bibliometric use include the following.

  • Indicators can only be as good as the underlying data source allows.
  • The database must provide a basis for global comparisons and benchmarking exercises.

SLIDE 11

Appropriateness of Data Sources

The appropriateness of databases for bibliometric use includes basic issues such as

  • subject coverage (e.g., applied sciences, SSH)
  • source coverage (full vs. selective coverage)
  • availability of necessary information (co-authors, abstracts, references, addresses)
  • completeness of information (e.g., all authors, all addresses, author–affiliation link, etc.)

Further criteria concern, among others,

  • publication type (journal, book, proceedings …)
  • document type (research article, review, letter …)
  • selection criteria (journals, books, proceedings)
  • possible redundancy of information (translations and different publication types of similar documents)

SLIDE 12

Chapter 3: Quality and Cleanness of Data

SLIDE 13

 Quality and Cleanness of Data 

The quality of data is mainly affected by the following groups:

  • the authors of the publications indexed in the database
  • the editors of the journals covered by the database
  • the database producer
  • the user of the database

Strongly affected fields and items are

  • author names
  • citations
  • addresses and institutions
  • document identifiers
  • funding information


SLIDE 15

Quality and Cleanness of Data

Authors themselves are responsible for many errors, including errors in names, references, addresses and titles. Note that wrong titles of cited work may result in incorrect KeyWords Plus entries in the WoS Core Collection. Data extracted from bibliographic databases require careful cleaning and processing before possible bibliometric use. This applies to

  • Author identification
  • Address cleaning and standardisation
  • Institute identification and assignment
  • Document identification


SLIDE 17

Quality and Cleanness of Data

Typical errors caused by authors are

  • various types of misspelling
  • incomplete information
  • erroneous data

Data therefore need careful cleaning, but many errors can practically not be corrected on the large scale.
☞ At lower aggregation levels, partial correction is possible, but it requires supplementary information from external sources.
☞ On the large scale, semi-automated processes can be applied, with remaining uncertainty.

SLIDE 18

Quality and Cleanness of Data

Example of an incorrectly cited document

1st Author   Journal         PY    Vol  1st Page  Cites  Share
SCHUBERT, A  SCIENTOMETRICS  1989  16   3         196    85.2%
SCHUBERT A   SCIENTOMETRICS  1989  16   1         17     7.4%
SCHUBERT A   SCIENTOMETRICS  1989  16   8         1      0.4%
SCHUBERT A   SCIENTOMETRICS  1989  16   18        1      0.4%
SCHUBERT A   SCIENTOMETRICS  1989  16   218       1      0.4%
SCHUBERT A   SCIENTOMETRICS  1989  16   239       1      0.4%
SCHUBERT A   SCIENTOMETRICS  1989  16   432       1      0.4%
SCHUBERT A   SCIENTOMETRICS  1989  16   463       2      0.9%
SCHUBERT A   SCIENTOMETRICS  1989  16   478       1      0.4%
SCHUBERT A   SCIENTOMETRICS  1989  16   –         2      0.9%
SCHUBERT A   SCIENTOMETRICS  1988  16   3         3      1.3%
SCHUBERT A   SCIENTOMETRICS  1987  16   3         1      0.4%
SCUBERT A    SCIENTOMETRICS  1989  16   3         1      0.4%
BRAUN T      SCIENTOMETRICS  1989  16   3         2      0.9%

Source: WoS Core Collection (Retrieved: May 2017)
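Variant records like those above are typically consolidated before citation counting. A minimal Python sketch follows; the record list mirrors a few rows of the table, while the matching key and the similarity threshold are assumptions for illustration, not the procedure actually used by WoS or ECOOM.

```python
from difflib import SequenceMatcher

# A few rows mimicking the variant table above:
# (first_author, journal, PY, vol, first_page, cites)
variants = [
    ("SCHUBERT, A", "SCIENTOMETRICS", 1989, 16, "3", 196),
    ("SCHUBERT A",  "SCIENTOMETRICS", 1989, 16, "1", 17),
    ("SCHUBERT A",  "SCIENTOMETRICS", 1988, 16, "3", 3),   # wrong publication year
    ("SCUBERT A",   "SCIENTOMETRICS", 1989, 16, "3", 1),   # misspelt author name
    ("BRAUN T",     "SCIENTOMETRICS", 1989, 16, "3", 2),   # co-author cited as first author
]

def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Tolerant string comparison; 0.8 is an arbitrary cut-off."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio() >= threshold

def consolidate(records, target):
    """Sum the citations of all records that plausibly denote `target`:
    same journal and volume, publication year within +/-1, near-identical author."""
    author, journal, year, vol = target
    return sum(cites for a, j, y, v, _p, cites in records
               if j == journal and v == vol and abs(y - year) <= 1 and similar(a, author))

print(consolidate(variants, ("SCHUBERT A", "SCIENTOMETRICS", 1989, 16)))  # -> 217
```

Note that the co-author variant (BRAUN T) is deliberately not merged by this key; real consolidation pipelines rely on richer evidence (DOIs, page numbers, reference context).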


SLIDE 19

Quality and Cleanness of Data

Edited vs. authored books – another source of uncertainty and errors

1st Editor/Author                        Book title            PY    1st Page  Cites
MOED HE                                  HDB QUANTITATIVE TEC  2004            1
MOED HF                                  HDB QUANTATIVE SCI T  2004            1
Moed, HF...Glanzel, W...Schmoch, U       HDB QUANTITATIVE SCI  2005  19        1
MOED HF                                  HDB QUANTITATIVE SCI  2005  CH11      1
Moed, H. F....Glanzel, W....Schmoch, U.  HDB QUANTITATIVE SCI  2005            15
Moed, H.F....Glanzel, W....Schmoch, U.   HDB QUANTITATIVE SCI  2004  1         1
Moed, Henk F.                            HDB QUANTITATIVE SCI  2004  389       1
Moed, H. F....Glanzel, W....Schmoch, U.  HDB QUANTITATIVE SCI  2004  785       1
Moed, H. F                               HDB QUANTITATIVE SCI  2004            82
Moed, W. Glanzel                         HDB QUANTITATIVE SCI  2004  51        1

Source: WoS Core Collection (Retrieved: May 2017)

SLIDE 20

Quality and Cleanness of Data

Edited vs. authored books – another source of uncertainty and errors

1st Author                  Book title                                                  PY    1st Page  Cites
Glaenzel, W                 HDB QUANTITATIVE SCI                                        2004  257       3
Glanzel, S.                 HDB QUANTITATIVE SCI                                        2005  257       1
Glanzel, W                  HANDBOOK OF QUANTITATIVE SCIENCE AND TECHNOLOGY RESEARCH …  2004  257       126
Glanzel, W.                 HDB BIBLIOMETRIC IND                                        2004            2
Schubert, A....Glanzel, W.  HDB QUALITATIVE SCI                                         2004  277       1
GLANZEL W                   HDB QUANTITATIVE SCI                                        2006            1
Glanzel, W.                 HDB QUANTITATIVE SCI                                        2005  257       74
Glanzel, W.                 HDB QUANTITATIVE SCI                                        2005  373       1
Glanzel, W.                 HDB QUANTITATIVE SCI                                        2005            6
GLANZEL W                   HDB QUANTITATIVE SCI                                        2004  57        1

Source: WoS Core Collection (Retrieved: May 2017)

SLIDE 21

Basic problems of author identification

Some basic problems of author identification in bibliographic databases are related to authors' names – synonyms:

  • Spelling variants (e.g., umlauts, transliteration, name particles, double names)
  • Misprints
  • Initials
  • Name changes
  • Database standards

SLIDE 22

Basic problems of author identification

The problem of synonyms

                 Variant 1        Variant 2                   Variant 3
Umlaut           Glänzel          Glanzel                     Glaenzel
Transliteration  王悦             Wang, Y
Name particles   Van De Broek, I  Broek, I Vande / Broek, IV  Vandebroek, I
Initials         Wemans, Andre    Wemans, ADV                 Wemans, A
Name change      Petre, Camelia   Stanciu, Camelia            Camelia, Stanciu
Database         VANRAAN, AFJ     VanRaan, AFJ                Van Raan, AFJ

Source: Heeffer et al., ISSI 2013
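Some of these synonym classes can be folded together automatically. Below is a minimal sketch of such a normalisation key in Python; the particle list and the behaviour shown are assumptions for illustration. It does not catch the "Glaenzel" transliteration variant or genuine name changes, which require authority data such as ORCID or publisher-provided researcher IDs.

```python
import unicodedata

def normalise_name(name: str) -> str:
    """Reduce a name to a crude matching key: strip diacritics,
    lower-case, drop punctuation, glue particles onto the next token."""
    # Fold umlauts/accents to plain ASCII (ä -> a): maps 'Glänzel' and
    # 'Glanzel' to the same key, but NOT the transliteration 'Glaenzel'.
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = folded.lower().replace(",", " ").replace(".", " ").split()
    # Illustrative particle list: 'van de broek' -> 'vandebroek'
    particles = {"van", "de", "von", "der", "vande"}
    out, buffer = [], ""
    for t in tokens:
        if t in particles:
            buffer += t
        else:
            out.append(buffer + t)
            buffer = ""
    return " ".join(out)

print(normalise_name("Glänzel, W"))       # -> 'glanzel w'
print(normalise_name("Van De Broek, I"))  # -> 'vandebroek i'
print(normalise_name("VANRAAN, AFJ"))     # -> 'vanraan afj'
```

Keys like this only group candidate synonyms; deciding whether two variants are actually the same person still needs supplementary evidence (affiliation, co-authors, subject profile).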


SLIDE 23

Basic problems of author identification

Further problems of author identification in bibliographic databases

  • Homonyms
  • Incomplete records
  • Incomplete double names or first names/initials
  • Missing links with affiliation
  • Missing, incomplete or incorrect corporate address
  • Ambiguous or changing email address
  • Mobility
  • Institutional restructuring

SLIDE 24

Institutional assignment

The quality of address-field information in bibliographic databases often does not meet bibliometric requirements. Typical bibliometric tasks are

  • Studies of institutional research performance
  • Funding formulas with bibliometric components
  • Sectoral assignment
  • Regional performance indicators

Main challenges to address cleaning and processing:

  • Organisational changes
  • National and regional peculiarities
  • Institutional information provided in bibliographic databases is often incomplete and error-prone, owing to the information supplied by authors/publishers and the database provider.

☛ The "Organization-Enhanced" field provided by the WoS often aggregates institutions to an umbrella organisation and is itself error-prone.

SLIDE 25

Institutional assignment

Example of erroneous institutional assignment in Flanders

Data source: WoS Core Collection

SLIDE 26

Institutional assignment

Example of the variety of spelling variants in the WoS database

UNIV ERLANGEN NURNBERG FRIEDRICH ALEXANDER UNIV UNIV HOSP ERLANGEN UNIV ERLANGEN UNIV ERLANGEN NUREMBERG FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG UNIV HOSP ERLANGEN NURNBERG UNIVERSITAT ERLANGEN NURNBERG KLINIKUM UNIV ERLANGEN NURNBERG FRIEDRICH ALEXANDER UNIVERSITAT ERLANGEN NURNBERG UNIV ERLANGEN NURNBERG POLIKLIN POLIKLIN UNIV ERLANGEN NURNBERG UNIV ERLANGEN NURNBERG KLINIKUM FRIEDRICH ALEXANDER UNIV ERLANGEN ERLANGEN UNIV HOSP FAU UNIV ERLANGEN KLINIKUM FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBE UNIV ERLANGEN NURNBERG KLIN UNIV KLINIKUM ERLANGEN NURNBERG ERLANGEN UNIV UNIV ERLANGEN NURNBERG HOSP FRIEDRICH ALEXANDER UNIV ERLANGEN NUREMBERG FRIEDRICH ALEXANDER UNIV POLIKLIN UNIV CLIN ERLANGEN NURNBERG UNIV ERLANGEN NURNEMBERG UNIV ERLANGER NURNBERG DR REMEIS STERNWARTE UNIV ERLANGEN NURNBERG ERLANGEN NUREMBERG UNIV ERLANGEN NUREMBURG UNIV FAU UNIV FREDRICH ALEXANDER UNIV ERLANGEN NURNBERG FRIEDRICH ALEXANDER UNIV ERLANGEN NUERNBERG FRIEDRICH ALEXANDER UNIV NURNBERG FRIEDRICH ALEXANDER UNIV NURNBERG ERLANGEN HOSP UNIV ERLANGEN NURNBERG KLIN FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG KOPFKLIN UNIV ERLANGEN NURNBERG KUNIV ERLANGEN NURNBERG UB ERLANGEN NURNBERG UNIV ERLANDGEN NURNBERG UNIV ERLANGEN NURNBERG KINDERKLIN UNIV ERLANGEN NURNBERG KOPFKLINIKUM UNIV ERLANGEN NURNBERG PSYCHIAT KLIN UNIV ERLANGEN NURNBURG UNIV ERLANTEN NURNBERG UNIV EYE CLIN ERLANGEN NURNBERG UNIV EYE HOSPITAL ERLANGEN NURNBERG UNIV GRLANGEN NURNBERG UNIV KLIN ERLANGEN NURNBERG UNIV LIB ERLANGEN NURNBERG UNIV NURNBERG ERLANGEN

FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG (#69)

Data source: WoS Core Collection
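In practice, variant lists like the one above are mapped onto a canonical organisation with a hand-maintained rule base or institutional thesaurus. A minimal illustrative sketch follows; the pattern below is a guess built for this example only, not ECOOM's or Clarivate's actual rules.

```python
import re

# A few of the WoS address variants listed above.
VARIANTS = [
    "UNIV ERLANGEN NURNBERG",
    "FRIEDRICH ALEXANDER UNIV ERLANGEN NUREMBERG",
    "UNIV ERLANGEN NUREMBURG",
    "ERLANGEN NUREMBERG UNIV",
    "FAU UNIV",
    "UNIV HOSP ERLANGEN NURNBERG",
]

CANON = "FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG"

# One rule: any address containing the city pair (in either order, in any of
# the observed spellings) or the FAU acronym maps to the umbrella organisation.
PATTERN = re.compile(
    r"\bFAU\b"
    r"|ERLANGEN.*(NURNBERG|NUREMBERG|NUREMBURG)"
    r"|(NURNBERG|NUREMBERG|NUREMBURG).*ERLANGEN"
)

def canonical(address: str) -> str:
    addr = re.sub(r"[^A-Z ]", " ", address.upper())
    # Unmatched addresses are kept as-is for manual review.
    return CANON if PATTERN.search(addr) else address

for v in VARIANTS:
    print(f"{v:45s} -> {canonical(v)}")
```

Real thesauri contain thousands of such rules per country, and exceptions (e.g., hospitals that should, or should not, be counted under the university) are exactly where the sectoral-assignment errors mentioned above creep in.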


SLIDE 27

Chapter 4: Soundness of Methodology

SLIDE 28

 Soundness of Methodology 

In this context we have to point to conceptual and theoretical issues. On the conceptual part, indicator designers and users should, most notably,

  • avoid comparing apples with oranges
  • pay attention to the interpretation (e.g., quality vs. citation impact)
  • avoid arbitrariness in designing measures, i.e., measures lacking theoretical foundation and empirical evidence

SLIDE 29

Soundness of Methodology

Bookstein (1997) noted three (among other) demons of measurement that pose challenges to quantitative approaches:

  • randomness
  • fuzziness
  • ambiguity

On the methodological part, indicator designers and users should avoid

  • over-sophistication (cf. composite indicators) and
  • over-simplification (ignoring essential properties)

SLIDE 30

Soundness of Methodology

Bibliometric indicators are statistics and, as such, random variables that are subject to the rules of mathematical statistics. This implies, among others, that

  • indicators have random errors with certain tolerance levels,
  • deviations of indicator values need not necessarily be significant (mind: rankings),
  • uncorrelatedness does not necessarily mean independence,
  • correlatedness does not necessarily imply causality.

In addition, the two following conditions should be observed.

  • Most underlying frequency distributions are discrete, non-negative, integer-valued, skewed and heavy-tailed.
  • The "sample size" sets natural limits to the Gaussian approximation of indicators.
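The consequences of skewness and limited "sample size" can be illustrated with a toy simulation; the "citation" population below is invented for illustration and not fitted to any real data:

```python
import random

random.seed(42)

# Invented, skewed citation-count population: most papers uncited or barely
# cited, a few highly cited (heavy right tail).
population = [0] * 60 + [1] * 20 + [2] * 10 + [5] * 7 + [20] * 2 + [100]

def mean_citation_rate(n_papers: int) -> float:
    """Mean citations of a random sample of n_papers publications."""
    return sum(random.choice(population) for _ in range(n_papers)) / n_papers

# Repeat the "measurement" for a small unit (research group, 30 papers)
# and a large one (e.g., a country, 3000 papers).
small = [mean_citation_rate(30) for _ in range(200)]
large = [mean_citation_rate(3000) for _ in range(200)]

def spread(xs):
    return max(xs) - min(xs)

print("spread of mean citation rate, 30 papers:  ", round(spread(small), 2))
print("spread of mean citation rate, 3000 papers:", round(spread(large), 2))
```

The small unit's indicator fluctuates by several citations per paper purely by chance, which is why observed differences at low aggregation levels need significance checks before any ranking is derived from them.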


SLIDE 31

Soundness of Methodology

Indicators that are applied to research assessment must meet several important preconditions:

  • valid (measures what it is intended to measure)
  • meaningful (significance of measurement)
  • reliable (in terms of underlying data)
  • robust (insensitive to negligible changes in the system)
  • normalisable (sometimes problematic: cf. h-index)
  • standardisable (i.a. for reasons of comparability and replicability)

☛ Chicago Workshop on Bibliometric Standards (1995); Bibliometric Evaluation Standards Debate (Berlin, 2013)

SLIDE 32

Chapter 5: Subject Granularity

SLIDE 33

 Subject Granularity 

Granularity is a central issue in subject normalisation: which hierarchical level is to be chosen as the subject standard? Zitt et al. (2005) found that over-normalisation can unduly level down topics and under-rate the role of articles in leading or central topics. They furthermore concluded that there is no single best level for normalisation.
☞ Once again: the objective determines the means.

SLIDE 34

Subject Granularity

Plot of subject-normalised citation impact based on ECOOM subfields (left) and major fields (right) vs. WoS Subject Categories for 676 European universities and research institutions

[Scatter plots; regression fits: NMCR based on 60 subfields vs. ISI Subject Categories, y = 1.011x (R² = 0.971); NMCR based on 12 major science fields vs. ISI Subject Categories, y = 1.052x (R² = 0.844)]

Source: Glänzel et al., 2009
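The NMCR (normalised mean citation rate) underlying these plots divides a unit's observed citations by the expected citations of its papers' fields, at a chosen level of granularity. A toy sketch of one common ratio-of-sums variant; the papers and field baselines below are invented for illustration, not real reference values:

```python
# Each paper: (citations, subfield, major_field).
papers = [
    (12, "condensed matter", "physics"),
    (3,  "condensed matter", "physics"),
    (8,  "optics",           "physics"),
    (1,  "sociology",        "social sciences"),
]

# Invented world-average citation rates per field, at both granularities.
baseline = {
    # subfield level
    "condensed matter": 6.0, "optics": 4.0, "sociology": 2.0,
    # major-field level
    "physics": 5.5, "social sciences": 2.5,
}

def nmcr(papers, level: int) -> float:
    """Observed citations / expected citations, where the expectation uses
    the field label at the chosen granularity (1 = subfield, 2 = major field)."""
    observed = sum(c for c, *_ in papers)
    expected = sum(baseline[p[level]] for p in papers)
    return observed / expected

print(round(nmcr(papers, 1), 3))  # subfield-normalised    -> 1.333
print(round(nmcr(papers, 2), 3))  # major-field-normalised -> 1.263
```

The two levels yield different impact values for the same paper set; this is the granularity effect discussed above, and, as Zitt et al. conclude, neither level is "the" correct one.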


SLIDE 35

Chapter 6: Aggregation Level

SLIDE 36

 Aggregation Level 

  • Glänzel (1996) and Moed (2010) stressed that the choice of indicators depends on the aggregation level, the purpose and the aspect of assessment. Consequently, indicators appropriate for one level are not necessarily applicable to another one.
  • In most situations, the context should determine which bibliometric methods should be applied, and how.
  • At lower aggregation levels, researchers have become more susceptible to the consequences of bibliometric practice, since they are more and more concerned by policy use and misuse of bibliometric methods (Glänzel & Debackere, 2003).

SLIDE 37

Aggregation Level

The weight of qualitative (peer evaluation) and quantitative (bibliometric) methods as a function of the aggregation level

[Diagram: aggregation levels ranging from micro (individuals, research groups, departments) through meso (universities, subjects, journals) to macro (fields, disciplines, countries)]

Source: Glänzel, 2011

SLIDE 38

Aggregation Level

  • Soundness and validity of methods is all the more necessary at the individual level, but not yet sufficient.
  • Accuracy, reliability and completeness of sources is an absolute imperative at this level.
  • Glänzel & Wouters (2013) recommend using individual-level bibliometrics in combination with "qualitative methods" and on the basis of the particular research portfolio.
  • The best way to do this may be to design individual researcher profiles combining bibliometrics with qualitative information about careers and working contexts. The profile includes the research mission and goals of the researcher.
  • Furthermore, they formulated 20 recommendations for bibliometrics (the dos and don'ts in individual-level bibliometrics).

SLIDE 39

Glänzel & Wouters – The dos and don'ts: ten things one must not do at the individual level

  • 1. Don't reduce individual performance to a single number
  • 2. Don't use IFs as measures of quality
  • 3. Don't apply hidden "bibliometric filters" for selection
  • 4. Don't apply arbitrary weights to co-authorship
  • 5. Don't rank scientists according to one indicator
  • 6. Don't merge incommensurable measures
  • 7. Don't use flawed statistics
  • 8. Don't blindly trust one-hit wonders
  • 9. Don't compare apples and oranges
  • 10. Don't allow deadlines and workload to compel you to drop good practices

SLIDE 40

Glänzel & Wouters – The dos and don'ts: ten things one might do at the individual level

  • 1. Remember that individual-level bibliometrics, too, is statistics
  • 2. Analyse collaboration profiles of researchers
  • 3. Always combine quantitative and qualitative methods
  • 4. Use citation context analysis
  • 5. Analyse subject profiles
  • 6. Make an explicit choice for oeuvre or time-window analysis
  • 7. Combine bibliometrics with career analysis
  • 8. Clean bibliographic data carefully and use external sources
  • 9. Even some "don'ts" are not taboo if properly applied
  • 10. Help users to interpret and apply your results

SLIDE 41

 Epilogue: "Unimpeded bibliometric standards" 

István Örkény (1912–1979): More One-Minute Stories. Corvina, Budapest, 2007. Selected and translated by Judith Sollosy.

Unimpeded production standards
"Hello? Machine shop?" "Skultéti here." "How much, Skultéti?" "Thirty-three, Comrade." "What's thirty-three, Skultéti?" "What's thirty-three, Comrade?" "Yes, what's thirty-three, Skultéti." "Why? Wasn't thirty-three the right answer, Comrade?" "The right answer to what, Skultéti." "To your question, Comrade." "Never mind, Skultéti, just resume where you left off." (Heavy industry folklore, 1978)

SLIDE 42

Thank you very much for your attention and for giving me the opportunity to share nearly four decades of experience in using bibliographic databases for bibliometric applications!