SLIDE 1

New Perspectives on Estimation and Inference about Research Productivity

Cinzia Daraio

Department of Computer, Control and Management Engineering A. Ruberti (DIAG) Sapienza University of Rome, Italy E-mail: cinzia.daraio@uniroma1.it

Plenary Session: Intriguing Applications of Efficiency Analysis North American Productivity Workshop (NAPW X) Miami Business School, University of Miami Miami, 12 June 2018

Daraio (2018) New Perspectives NAPW X 2018 1 / 40

SLIDE 2

Motivation

Motivation

Understanding the functioning of research: How does the production of knowledge work?
Assessing research and its impacts: How can we measure productivity/efficiency of research and its impacts?
What do we measure? Frascati Manual (OECD, 2015, p. 44-46): “R&D comprise creative and systematic work undertaken to increase the stock of knowledge and to devise new applications of available knowledge. Criteria: i) novel, ii) creative, iii) uncertain, iv) systematic, and v) transferable and/or reproducible”.
How do we measure it? (ambiguities of productivity)
Why do we measure it?
An “intriguing” complex research problem.
A relevant policy issue.

SLIDE 3

Motivation

Outline

1 Motivation
2 I: A framework and a performance evaluation model
3 II: Quality and its Impact on Productivity
    Our Approach to Quality
    Quality in Higher Education
    Conditional Efficiency, Separability and Unobserved Heterogeneity
    A Micro Application: Quality of European Universities
4 III: Inference for Nonparametric Productivity Networks
    Existing gaps
    Why tools from the Physics of Complex Systems?
    A new general production framework
    A Macro Application: Knowledge Production at Country Level
5 IV: Summary and Conclusions
6 Main References

SLIDE 4

I: A framework and a model

Introduction and Recent trends

The evaluation of research activities is a complex task for many reasons.
There are no perfect indicators or metrics that fit all purposes.
Each metric of research assessment is based on a model that can be implicitly or explicitly defined and discussed. If you do not specify the model behind your metric, you cannot check its robustness.

Changes:

In the knowledge production (Gibbons et al. 1994; Nowotny et al. 2001)
The crisis of technoscience (Bucchi, 2009) and science (Benessia et al. 2016)
Advent of the big data era (the computerization of evaluative informetrics, Moed, 2017)
In the communication of science (Bucchi and Trench, 2014)

Consequences (see the refs in Daraio, 2018):

On the demand and supply side
On scholars
On the assessment process
On the measurement of productivity/efficiency within an assessment process

SLIDE 5

I: A framework and a model

The need for a Framework

[Figure labels: Theory; EDU, RES, INNO; Openness; Knowledge Infrastructures]

Figure: The ability to develop (understand and use effectively) models for the assessment of research is linked and depends, among other factors, on the degree or depth of the conceptualization and formalization, in an unambiguous way, of the underlying idea of quality.

SLIDE 6

I: A framework and a model

The Generalized Implementation Problem

[Figure labels. Left panel: Intellectual resources, Intervention, Problem content; Context of intervention systems; LoA (O-committing), Model (O-committed), Properties, with the relations "identifies", "analysed at", "generates", "attributed to"; Theory, Method, Data; Theory, EDU, RES, INNO, Openness. Right panel: the same elements, plus Instantiation, Abstraction, LOCAL, GLOBAL and TRANSLATIONS.]

Figure: An illustration of the generalized implementation problem. Context of Intervention and LoA (Level of Analysis): Left Panel; inclusion of translations: Right Panel.

SLIDE 7

I: A framework and a model

A Doubly-Conditional Performance Evaluation Model

[Figure labels: Stakeholders (Policy and objectives); Time; Context; Input 1, ..., Input p; Combined/Transformed; PROCESSES; Output 1, ..., Output q; ACTORS; Potential Heterogeneity Factors; Other Contextual and/or Environmental Conditions; RESULTS; IMPACTS; Effectiveness; Efficiency; Criteria, Rules, Standards, Understandings, Incentives, Actions, Consequences]

Figure: A Doubly-Conditional Performance Evaluation Model.

SLIDE 8

II: Quality and Productivity

II: Quality and its Impact on Productivity

From a joint work with Léopold Simar and Paul W. Wilson.
Quality is important but not easy to measure.
Quality is linked to productivity and performance, but there may be trade-offs between quality and efficiency.
The role played by Quality is far from being unambiguously determined.
We propose to consider Quality as a latent factor of heterogeneity: we recognize that it is difficult to observe directly, yet it may have an impact on productivity and performance. This impact is not known a priori and must be empirically estimated.
Powell (1995), in the Strategic Management Journal, finds that certain tacit, behavioural, imperfectly imitable features, such as open culture, employee empowerment and executive commitment, can produce advantage; these tacit resources drive the success of TQM.

SLIDE 9

II: Quality and Productivity Our Approach to Quality

Introduction: Our Approach to Quality

Recent works (Bounfour and Edvinsson, 2012) show that measuring and managing the intellectual capital of communities has the potential to change how public sector planning and development is done.
Intangibles and intellectual capital have always been considered relevant factors for the productivity and competitiveness of the private sector as well as of the public sector (Guthrie and Dumay, 2015; Dumay, Guthrie and Puntillo, 2015; Secundo, Lombardi and Dumay, 2018).
The measurement of intellectual capital (Bryl, 2018) is an emerging research area in knowledge management (Tiwana, 2000; Alavi and Leidner, 2001; Liebowitz, 2012). However, being in its infancy, it still lacks a rigorous assessment methodology, as is the case for managerial quality, which remains difficult to measure directly and to include in a more general performance measurement system.
In our approach, Quality is a latent factor of heterogeneity linked to a labour input and/or an intellectual capital input. It may be used in different contexts: manufacturing and public and private service sectors.

SLIDE 10

II: Quality and Productivity Our Approach to Quality

Our approach to quality and its impact on productivity

Nonparametric frontier and efficiency analysis and its robust version: we use the flexible directional distance approach. We apply the new computational methods for directional distances (marginal, conditional and their robust versions) provided in Daraio, Simar and Wilson (2018) with Matlab codes.
Conditional frontier models to account for heterogeneity in the production process and analyze the impact of environmental, external factors.
We propose the identification of quality as a latent heterogeneity factor linked to some inputs: we use a flexible nonparametric nonseparable model.
We provide a methodology for identifying quality and analyzing its impact on the production process.
We illustrate the approach through the case of the performance of European Universities.

SLIDE 11

II: Quality and Productivity Quality in Higher Education

Quality in Higher Education

Universities carry out a complex production process. Multiple activities, such as teaching, research and third mission, are realized by combining different resources (human capital, financial stocks and infrastructures) to produce heterogeneous outputs, such as undergraduate degrees, PhD degrees, scientific publications, citations, service contracts, patents, spin-offs and so on, within a heterogeneous environment in which size and subject mix play an important role (e.g. Daraio et al. 2015 JI and Daraio et al. 2015 EJOR and the references cited there).
The concept of quality of HEIs is tricky (Sarrico et al. 2010), elusive and complex (Westerheijden et al. 2007) and multidimensional (Blackmur 2007).
Its modeling in quantitative analysis is compelling and challenging (econometric modeling of quality, Daraio, 2017a).
In Search of Academic Quality (Paradeise and Thoenig, 2015): “Academic quality still remains a black box not only with regard to assessing the outputs, but also in terms of the formal and informal social, cultural and organizational processes adopted by specific university governance regimes”.
In this book quality is linked to the academic staff.

SLIDE 12

II: Quality and Productivity Conditional Efficiency, Separability and Unobserved Heterogeneity

Modelling strategy: Conditional Efficiency

Model for the production process in the presence of observable heterogeneous factors $Z \in \mathcal{Z} \subseteq \mathbb{R}^d$, with inputs $X \in \mathbb{R}^p_+$ and one output $Y \in \mathbb{R}_+$:

$$H_{X,Y|Z}(x, y \mid Z = z) = P(X \le x, Y \ge y \mid Z = z) = S_{Y|X,Z}(y \mid X \le x, Z = z)\, F_{X|Z}(x \mid Z = z).$$

Conditional support of $(X, Y)$, conditionally on $Z = z$:

$$\Psi(z) = \{(x, y) \in \mathbb{R}^p_+ \times \mathbb{R}_+ \mid x \text{ can produce } y \text{ when } Z = z\}.$$

Marginal support of $(X, Y)$:

$$\Psi = \{(x, y) \in \mathbb{R}^p_+ \times \mathbb{R}_+ \mid x \text{ can produce } y\} = \bigcup_{z \in \mathcal{Z}} \Psi(z).$$

$\Psi$ is the (marginal) support of $(X, Y)$, e.g. of $H_{X,Y}(x, y) = P(X \le x, Y \ge y)$.

SLIDE 13

II: Quality and Productivity Conditional Efficiency, Separability and Unobserved Heterogeneity

Separability condition

“Separability” condition (Simar-Wilson, 2007): $\Psi(z) = \Psi$ for all $z \in \mathcal{Z}$.
Example from Simar-Wilson (2011):

$$g(X) = [1 - (X - 1)^2]^{1/2}, \qquad Y^{*} = g(X)\, e^{-(Z-2)^2 U}, \qquad Y^{**} = g(X)\, e^{-(Z-2)^2} e^{-U}.$$

[Figure: two panels plotting $y^{*}$ (left) and $y^{**}$ (right) against $x$ and $z$.]

Left Panel: Separable, Right Panel: Not Separable. In addition, the distribution of the inefficiency $U$ may or may not depend on $Z$.
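The two data-generating processes of the example can be simulated with a short sketch. The sampling ranges for X and Z and the exponential distribution for U are illustrative assumptions, not taken from the slides; the function names are hypothetical. In the first process the frontier (U = 0) is g(x) whatever the value of z, while in the second the frontier itself shifts with z.

```python
import math
import random


def g(x):
    """Frontier function g(x) = [1 - (x - 1)^2]^(1/2), for 0 <= x <= 1."""
    return math.sqrt(1.0 - (x - 1.0) ** 2)


def frontier_sep(x, z):
    """Frontier of Y*: obtained at U = 0, so it does not depend on z."""
    return g(x)


def frontier_non(x, z):
    """Frontier of Y**: obtained at U = 0, it still depends on z."""
    return g(x) * math.exp(-((z - 2.0) ** 2))


def simulate(n, seed=0):
    """Draw n tuples (x, z, y_sep, y_non) under assumed distributions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)   # assumed input range
        z = rng.uniform(1.0, 4.0)   # assumed range of the environmental factor
        u = rng.expovariate(1.0)    # assumed inefficiency distribution, U >= 0
        y_sep = g(x) * math.exp(-((z - 2.0) ** 2) * u)
        y_non = g(x) * math.exp(-((z - 2.0) ** 2)) * math.exp(-u)
        rows.append((x, z, y_sep, y_non))
    return rows


rows = simulate(200, seed=1)
```

In the separable case Z only changes how far units fall below a fixed frontier; in the non-separable case comparing a unit with the marginal frontier mixes inefficiency with the effect of Z.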

SLIDE 14

II: Quality and Productivity Conditional Efficiency, Separability and Unobserved Heterogeneity

Conditional Efficiency

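If $\exists z \in \mathcal{Z}$ with $\Psi(z) \ne \Psi$, the usual marginal efficiency score has no practical economic meaning. It is given by

$$\lambda(x, y) = \sup\{\lambda \mid (x, \lambda y) \in \Psi\} = \sup\{\lambda > 0 \mid H_{XY}(x, \lambda y) > 0\}.$$

What we need: conditional efficiency scores

$$\lambda(x, y \mid z) = \sup\{\lambda > 0 \mid (x, \lambda y) \in \Psi(z)\} = \sup\{\lambda > 0 \mid H_{XY|Z}(x, \lambda y \mid Z = z) > 0\}.$$

NB: The production frontier is $\varphi(x, z) = \lambda(x, y \mid z)\, y$.

Nonparametric Estimation

$$\hat{H}_{XY|Z}(x, y \mid Z = z) = \frac{\sum_{i=1}^{n} 1\!\mathrm{I}(X_i \le x, Y_i \ge y)\, K_{h_z}(Z_i - z)}{\sum_{i=1}^{n} K_{h_z}(Z_i - z)}.$$

Estimator of the conditional efficiency score (here FDH):

$$\hat{\lambda}(x, y \mid z) = \max_{\{i \,\mid\, X_i \le x,\ \|Z_i - z\| \le h_z\}} \left(\frac{Y_i}{y}\right).$$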
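The conditional FDH estimator above can be sketched in a few lines. This is a minimal illustration that assumes a single environmental variable Z and a fixed bandwidth h; the function name and the toy data are hypothetical, and no bandwidth selection is attempted.

```python
import math


def conditional_fdh(x, y, z, X, Y, Z, h):
    """Output-oriented conditional FDH score at (x, y) given Z = z.

    Takes the maximum of Y_i / y over observed units that use no more of
    every input than x and whose Z_i lies within bandwidth h of z.
    """
    ratios = [
        Yi / y
        for Xi, Yi, Zi in zip(X, Y, Z)
        if all(a <= b for a, b in zip(Xi, x)) and abs(Zi - z) <= h
    ]
    return max(ratios) if ratios else math.nan


# Hypothetical toy data: one input, one output, one environmental variable.
X = [(1.0,), (2.0,), (2.0,), (3.0,)]
Y = [1.0, 2.0, 4.0, 3.0]
Z = [0.0, 0.1, 1.0, 0.2]

# Score of the second unit among peers with similar Z, and against everyone.
conditional = conditional_fdh((2.0,), 2.0, 0.1, X, Y, Z, h=0.3)
marginal = conditional_fdh((2.0,), 2.0, 0.1, X, Y, Z, h=math.inf)
```

Setting h to infinity removes the localization in Z and gives the marginal FDH score. In this toy data the unit is on the frontier among units facing similar conditions but looks inefficient against the pooled sample.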

SLIDE 15

II: Quality and Productivity Conditional Efficiency, Separability and Unobserved Heterogeneity

Conditional Efficiency

Robust version (in SVVK 2016: order-$m$ and univariate $Y$):

$$\varphi_m(x, z) = E\big(\max(Y_1, \ldots, Y_m) \mid X \le x, Z = z\big) = \int_0^\infty \big[1 - F_{Y|X,Z}(y \mid X \le x, Z = z)^m\big]\, dy.$$

Order-$m$ conditional efficiency score:

$$\lambda_m(x, y \mid z) = \int_0^\infty \big[1 - F_{Y|X,Z}(uy \mid X \le x, Z = z)^m\big]\, du,$$

and we have $\varphi_m(x, z) = \lambda_m(x, y \mid z)\, y$.

Nonparametric estimator:

$$\hat{\lambda}_m(x, y \mid z) = \int_0^\infty \big[1 - \hat{F}_{Y|X,Z}(uy \mid X \le x, Z = z)^m\big]\, du.$$

Statistical properties: Cazals, Florens, Simar (2002) and Jeong, Park, Simar (2010), Bădin, Daraio, Simar (2012).
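Because the estimated conditional distribution function is a step function, the integral defining the order-m score can be evaluated exactly as a finite sum. The sketch below assumes the caller has already selected the units with X_i ≤ x and computed non-negative kernel weights for them; the function name and the toy numbers are hypothetical.

```python
def order_m_score(y, ys, weights, m):
    """Order-m output score: integral of 1 - F_hat(u * y)^m over u >= 0.

    ys      : outputs Y_i of the units with X_i <= x
    weights : non-negative kernel weights K_h(Z_i - z) of those units
    F_hat is the weighted empirical cdf of ys, so the integrand is
    piecewise constant and the integral is a finite sum.
    """
    total = sum(weights)
    integral, cdf, previous = 0.0, 0.0, 0.0
    for yi, wi in sorted(zip(ys, weights)):
        # On [previous, yi) the cdf is constant and equal to `cdf`.
        integral += (yi - previous) * (1.0 - cdf ** m)
        cdf += wi / total
        previous = yi
    return integral / y  # change of variable v = u * y


# Hypothetical example: three comparable units with equal weights.
score_m1 = order_m_score(2.0, [1.0, 2.0, 4.0], [1.0, 1.0, 1.0], m=1)
score_large_m = order_m_score(2.0, [1.0, 2.0, 4.0], [1.0, 1.0, 1.0], m=1000)
```

With m = 1 the score compares y with the weighted mean output of the comparable units; as m grows it approaches the FDH score, which is why large m is used to study the frontier and small m to study the distribution of inefficiencies.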

SLIDE 16

II: Quality and Productivity Conditional Efficiency, Separability and Unobserved Heterogeneity

Conditional Efficiency and Separability

Analysis of the effect of $Z$ on the production process (see Daraio, Simar, 2005, 2007, and Bădin, Daraio, Simar, 2012; Daraio, Simar, 2014).

Effect on the support of $(X, Y)$ (efficient frontier):
Comparison of conditional and unconditional full efficiency scores (ratios and nonparametric regressions).
Robust versions: order-$m$ (and order-$\alpha$ quantile) frontiers for large $m$ (and $\alpha \to 1$).

Effect on the distribution of the inefficiencies, by using e.g. $m = 1$ and $\alpha = 0.5$.
See Bădin, Daraio, Simar, 2012, 2014, Daraio and Simar, 2014, for details and tests.

Test of the separability condition (Daraio, Simar and Wilson, 2018): $H_0: \Psi(z) = \Psi$ for all $z \in \mathcal{Z}$ against $H_1: \Psi(z) \ne \Psi$ for some $z \in \mathcal{Z}$.
What to do if $Z$ (or some component of $Z$) is latent (not observed)?

SLIDE 17

II: Quality and Productivity Conditional Efficiency, Separability and Unobserved Heterogeneity

Unobserved Heterogeneity

Simar, Vanhems, Van Keilegom (2016): $V \in \mathbb{R}$ is not observed but may also influence the frontier level

$$\varphi(x, v, z) = \sup\{y \mid F_{Y|X,V,Z}(y \mid X \le x, V = v, Z = z) < 1\}.$$

NB: $Z$ is observed but may be $\emptyset$, i.e. no observed $Z$.
Simar, Vanhems and Van Keilegom follow Matzkin (2003): $V$ is linked to one of the inputs, $X^1$, through a flexible nonparametric nonseparable model

$$X^1 = \psi(W, V).$$

$W$ is an instrument, correlated with $X^1$ and independent of $V$.
$\psi$ is monotone increasing in $V$, and $V \sim \mathrm{Unif}(0, 1)$.
NB: We could also consider several relations $X^j = \psi_j(W^j, V^j)$, $j = 1, \ldots, p$.

SLIDE 18

II: Quality and Productivity Conditional Efficiency, Separability and Unobserved Heterogeneity

Unobserved Heterogeneity

Interpretation: $V$ can be viewed as the part of $X^1$ that is independent of $W$.

In our case: $W$ is the size of the university, $X^1$ is the input factor (or academic staff) and $V$ is the part of $X^1$ not explained by the size. $V$ could be linked to some proxies (see the following slides).

According to Matzkin (2003), $V$ is identified by the conditional cdf of $X^1$ given $W$: we have $V = F_{X^1|W}(X^1 \mid W)$. In $X^1 = \psi(W, V)$, $\psi$ is unknown and $V$ is like a “residual”, identified (under the monotonicity assumptions) by this conditional cdf. The choice of $V \sim \mathrm{Unif}(0, 1)$ is just a rescaling, and so $\psi$ can be interpreted as a quantile function.

SLIDE 19

II: Quality and Productivity Conditional Efficiency, Separability and Unobserved Heterogeneity

Nonparametric Estimation

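Estimation of $V_i$:

$$\hat{V}_i = \hat{F}_{X^1|W}(X^1_i \mid W_i) = \frac{\sum_{k=1}^{n} 1\!\mathrm{I}(X^1_k \le X^1_i)\, K_{h_w}(W_i - W_k)}{\sum_{k=1}^{n} K_{h_w}(W_i - W_k)}.$$

Efficiency scores:

$$\hat{\lambda}(x, y \mid v, z) = \sup\{\lambda \mid \hat{F}_{Y|X,\hat{V},Z}(\lambda y \mid X \le x, (\hat{V}, Z) = (v, z)) < 1\},$$

$$\hat{\lambda}_m(x, y \mid v, z) = \int_0^\infty \big[1 - \hat{F}_{Y|X,\hat{V},Z}(uy \mid X \le x, (\hat{V}, Z) = (v, z))^m\big]\, du,$$

$$\hat{F}_{Y|X,\hat{V},Z}(y \mid x, v, z) = \frac{\sum_{i=1}^{n} 1\!\mathrm{I}(X_i \le x, Y_i \le y)\, K_{h_v}(\hat{V}_i - v)\, K_{h_z}(Z_i - z)}{\sum_{i=1}^{n} 1\!\mathrm{I}(X_i \le x)\, K_{h_v}(\hat{V}_i - v)\, K_{h_z}(Z_i - z)}.$$

In practice we compute $\hat{\lambda}(X_i, Y_i \mid \hat{V}_i, Z_i)$ and $\hat{\lambda}_m(X_i, Y_i \mid \hat{V}_i, Z_i)$.
We extend the separability test of Daraio, Simar and Wilson (2018) to unobserved heterogeneous factors.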
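The first estimation step, recovering the latent factor as a kernel-smoothed conditional cdf of the input given the instrument, can be sketched as follows. The Gaussian kernel and the fixed bandwidth are assumptions made for illustration (the slides do not state which kernel or bandwidth rule is used), and the function names are hypothetical.

```python
import math


def gaussian_kernel(t):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * t * t)


def estimate_v(X1, W, h):
    """Estimate V_i as the kernel-weighted share of units k with X1_k <= X1_i.

    X1 : observed input values (for example academic staff)
    W  : observed instrument values (for example size)
    h  : bandwidth for W, taken as given here
    """
    n = len(X1)
    v_hat = []
    for i in range(n):
        weights = [gaussian_kernel((W[i] - W[k]) / h) for k in range(n)]
        numerator = sum(w for k, w in enumerate(weights) if X1[k] <= X1[i])
        v_hat.append(numerator / sum(weights))
    return v_hat


# Hypothetical check: with identical W the estimate reduces to rank / n.
v_same_w = estimate_v([3.0, 1.0, 2.0], [5.0, 5.0, 5.0], h=1.0)
```

Each estimate lies in (0, 1] and is large when a unit has more of the input than units of comparable size, which is the sense in which the latent factor is what remains of the resources after eliminating size.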

SLIDE 20

II: Quality and Productivity A Micro Application: Quality of European Universities

Data

Pioneering project Aquameth (Bonaccorsi and Daraio, 2007; Daraio et al. 2011) and the feasibility study EUMIDA (Bonaccorsi, 2014).
Micro data validated by NSAs and certified by a Data Quality Report are now available for the European Higher Education Institutions through the European Tertiary Education Register (ETER) project. Link to download the data for 2011, 2012, 2013 and 2014: http://eter.joanneum.at/imdas-eter/.
Data collection for 2015 is in progress and will be online by the end of the month!
Interesting across-country comparison.
Rich documentation is available.
Data integrated with other sources, including Scimago Global 2013 Rank, output years 2007-2011, such as PUB, %IC, NI and %Q1.

SLIDE 21

II: Quality and Productivity A Micro Application: Quality of European Universities

Data: Inputs, Outputs, Instrument

Inputs: ACAD staff, NON ACAD staff, Total expenditures (Input Factor)
Outputs: TDEG, Res Factor (PUB and PHD deg)
Instrument: Size (Total enrolled students)
V: unobserved quality (what remains from the resources after eliminating the size)
The Input factor explains 96% of the inertia; the Research factor explains 97% of the inertia.
The estimated latent factor V can be interpreted as the hidden component of the resources that contributes to the production of the considered outputs, after the elimination of the size component.
Interestingly, the same result is obtained if we estimate the latent factor not of the aggregated Input factor but only of the number of Academic Staff. This could confirm that the estimated latent factor (quality) is mainly related to the academic staff of the universities, as in the book of Paradeise and Thoenig (2015).

SLIDE 22

II: Quality and Productivity A Micro Application: Quality of European Universities

Unobserved quality: correlations with outputs and some observed proxies

Variable | Correlation with unobserved quality
ISCED5-7 | 0.112
ResFact | 0.614
%3PFUN | 0.558
%IC | 0.399
NI | 0.420
%Q1 | 0.463
%EXC | 0.444
%EWL | 0.351
WR | -0.666
RR | -0.6578
Found. Year | -0.270

SLIDE 23

II: Quality and Productivity A Micro Application: Quality of European Universities

Test of the separability condition with Z unobserved

$H_0: \Psi(z) = \Psi$ for all $z \in \mathcal{Z}$ against $H_1: \Psi(z) \ne \Psi$ for some $z \in \mathcal{Z}$.
We extend Daraio, Simar and Wilson (2018) to unobserved Z.
Random split by stratification, per decile, according to W.
p-value: 0.0037
We reject $H_0$: we have a non-separable production process!
Convexity is also strongly rejected (Kneip, Simar and Wilson, 2015), p-value: 0.0000166!

SLIDE 24

II: Quality and Productivity A Micro Application: Quality of European Universities

Impact on the efficient frontier

[Figure panels: Full directional distance β(z) (diff-eff) plotted against Vhat and SPEC, for fixed X = 0.83, X = 1.44 and X = 2.68.]

Figure: Impact of V = QUAL and Z = SPEC on the shift of the full frontier β(x, y; 0, d) − β(x, y; 0, d|z, v), where d = med(y) for fixed values of the Input Factor at the 3 quartiles: from the left to the right, small, median and large level of labor (academic staff).

SLIDE 25

II: Quality and Productivity A Micro Application: Quality of European Universities

Impact on the distribution of the (in-)efficiencies

[Figure panels: Order-m (m = 310) directional distance β(z) (cond-eff) plotted against Vhat and SPEC, for fixed X = 0.83, X = 1.44 and X = 2.68.]

Figure: Impact of V = QUAL and Z = SPEC on conditional order-m efficiency measures βm(x, y; 0, d|z, v), where d = med(y) for fixed values of the Input Factor at the 3 quartiles: from the left to the right, small, median and large level of labor (academic staff).

SLIDE 26

III: Inference for Nonparametric Productivity Networks

III: Inference for nonparametric productivity networks: Motivation and relevance

From a joint work with M. Bostian, R. Färe, S. Grosskopf, M.G. Izzo, L. Leuzzi, G. Ruocco, W. L. Weber.

Networks are general models, which can represent the relationships within or between given systems.
Rapid development of social networking and the growth of social media.
Development and expansion of networks in economics (Schweitzer et al. 2009a,b; Elsner et al. 2015): “Networks: a paradigm shift for Economics?” by Kirman, in Bramoullé, Y., Galeotti, A., Rogers, B. W. (Eds.) (2016), The Oxford Handbook of the Economics of Networks, Oxford University Press.
The structure and function of complex networks are widely studied in statistical mechanics (e.g. Albert and Barabasi (2002), Newman (2003), Barabasi (2016)).
What about Productivity Networks?
There is a rich literature on Network DEA: Färe and Grosskopf (2000), Färe, Grosskopf and Whittaker (2007, 2014), Cook et al. (2010, 2014), Bogetoft et al. (2009), Prieto and Zofío (2007), Kao (2014), ...

SLIDE 27

III: Inference for Nonparametric Productivity Networks Existing gaps

Inference for nonparametric productivity networks: Existing gaps

However, Network DEA typically analyzes networks in a descriptive rather than statistical framework.
One exception: Trinh and Zelenyuk (2015), a bootstrap-based comparison between average DEA and NDEA efficiency scores and their distributions, but assuming the NDEA structure!
Kao (2017), in the concluding section on extensions and open questions, discusses the choice of the Network Model to apply in a given empirical context and highlights that it is not possible to choose a priori the best Network Model to apply.
Hence, we need to infer the Network structure: this is exactly the aim of our paper!

SLIDE 28

III: Inference for Nonparametric Productivity Networks Why tools from the Physics of Complex Systems?

Why should productivity networks be modelled with tools from the Physics of Complex Systems?

Our approach is Semi-Parametric: it is based on a Parametric Bayesian framework (generalized multicomponent spin model) to make inference for Nonparametric Productivity Networks.
It is developed in a Bayesian framework and relies on some recent pseudo-likelihood techniques introduced in the physics of complex systems (Ravikumar et al. 2010; Aurell and Ekeberg 2012; Tyagi et al. 2016; Marruzzo et al. 2016).
The answer: Georgescu-Roegen (1971), The Entropy Law and the Economic Process.

SLIDE 29

III: Inference for Nonparametric Productivity Networks A new general production framework

A new general production framework

We end up with a new, more general framework for modeling the production process, which includes production frontiers, information-theoretic approaches to econometrics, machine learning and statistical inference from the physics of complex systems.
We show that all the different productivity network DEA models (Kao, 2017) can be seen as implementations of Georgescu-Roegen's Funds and Flows model.
Morroni (1992, 2006, 2014) expounds generalizations of the description of the production process based on Georgescu-Roegen's Funds and Flows, including the organizational aspects of the production process and their interactions at different levels of analysis.
To the best of our knowledge, this connection has not been made so far in the network DEA literature!

SLIDE 30

III: Inference for Nonparametric Productivity Networks A new general production framework

Our proposal for doing inference: A Pseudo-Likelihood Approach

SLIDE 31

III: Inference for Nonparametric Productivity Networks A new general production framework

The Ising Spin Glass Model

SLIDE 32

III: Inference for Nonparametric Productivity Networks A new general production framework

Model of the interactions of scientific productivity of disciplines

Statistical Physics | Productivity Analysis
Generalized multicomponent spin model with arbitrary network interrelations | Network-DEA in an Input-Output model
Node vectorial variable: spin $s_i$ | Productivity of discipline $i$
Node variable components: $s_i = \{s_{i,\gamma}\}$ | Productivity of country $\gamma$ in the discipline $i$
Node interactions or couplings: $J_{i,j}$ | Interdependencies or interrelations between productivity of disciplines
Hamiltonian, generalized cost function: $H = -\frac{1}{2}\beta \sum_{i,j=1}^{N} J_{ij}\, s_i(t) \cdot s_j(t) - \sum_{i=1}^{N} s_i(t) \cdot h_i$ | Social energy to be minimized
$\beta$: inverse of the temperature | $\beta$: external global parameter
$h_i$: external magnetic field on $i$ | Contextual environmental variables of discipline $i$
$J_{ii}$: chemical potential of $i$ |

SLIDE 33

III: Inference for Nonparametric Productivity Networks A new general production framework

Our proposal for doing inference

We treat our system as a disordered system at equilibrium, described by a generalized multicomponent spin model. Our variables $s_i^{(\gamma)}(t)$, the equivalent of the spin configurations, are defined as follows:

$$s_i^{(\gamma)}(t) = \frac{\Delta_i^{(\gamma)}(t)}{\sqrt{\sum_{\gamma=1}^{D} \Delta_i^{(\gamma)}(t)^2}}; \quad \Delta_i^{(\gamma)}(t) = \pi_i^{(\gamma)}(t) - \bar{\pi}_i(t); \quad \bar{\pi}_i(t) = \frac{1}{D} \sum_{\gamma=1}^{D} \pi_i^{(\gamma)}(t); \quad \gamma = 1, \ldots, D;\ i = 1, \ldots, N;\ t = 1, \ldots, T. \tag{1}$$

Here $\gamma = 1, \ldots, D$ are the countries, and $\pi_i^{(\gamma)}$ is the productivity of country $\gamma$ in a subject category $i$, for $i = 1, \ldots, N$, over the period $t = 1, \ldots, T$ (here 1996-2012). Finally, $\bar{\pi}_i(t)$ is the world average of productivity in subject $i$. They have the property that $\bar{s}_i = 0$ and $\overline{(s_i)^2} = 1/N$.
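The construction of the spin-like variables for one subject category and one year can be sketched as below. The sketch assumes that the deviations from the world average are divided by their Euclidean norm across countries, which is one reading of definition (1); the function name and the productivity figures are hypothetical.

```python
import math


def spin_components(productivities):
    """Centre one discipline's country productivities and scale to unit norm.

    productivities : values pi_i^(gamma)(t) for the D countries, for a
                     fixed subject category i and a fixed year t
    Assumption: the normalizing constant is the Euclidean norm of the
    deviations from the cross-country average.
    """
    d = len(productivities)
    average = sum(productivities) / d
    deviations = [p - average for p in productivities]
    norm = math.sqrt(sum(dev * dev for dev in deviations))
    if norm == 0.0:
        raise ValueError("all countries have the same productivity")
    return [dev / norm for dev in deviations]


# Hypothetical productivities of four countries in one subject and year.
s = spin_components([1.2, 0.8, 1.0, 2.0])
```

Under this reading the components sum to zero and their squares sum to one, so each discipline contributes a vector of fixed length whose direction records which countries are above or below the world average.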

SLIDE 34

III: Inference for Nonparametric Productivity Networks A new general production framework

The solution of the inverse generalized Ising Model

Given a model hypothesis (H) and a set of measurements of the spin configuration s, how can we infer the value of the set of model parameters J which generated the data? (All the technical details are in the paper!)
Maximum Likelihood estimation and the Boltzmann Machine Learning (Barber, 2012).
To solve the inverse problem we have to maximize the log-likelihood with respect to the set of parameters J:

$$l(\{J\}) = \log L(\{J\}) = \sum_{t=1}^{T} -H(\{s\}_t \mid \{J\}) - T \log Z(\{J\}). \tag{2}$$

This is computationally hard to solve, and for this reason a pseudo-likelihood approach, which involves Boltzmann-Gibbs machine learning, is applied.
The solution of the inverse problem consists in finding the optimal values of the set of parameters J, which are supposed to generate the observable set of data.
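To illustrate why the pseudo-likelihood avoids the partition function Z, here is a minimal sketch for the ordinary Ising model with binary spins (values -1 or +1). This is a simplification chosen for illustration and is not the generalized multicomponent model of the paper; the plain gradient ascent, the step size and the function names are all assumptions. Each spin's conditional probability given the others is a logistic function of its local field, so the objective is a sum of logistic log-likelihoods.

```python
import math


def log_sigmoid(a):
    """Numerically stable log(1 / (1 + exp(-a)))."""
    if a >= 0:
        return -math.log1p(math.exp(-a))
    return a - math.log1p(math.exp(a))


def pseudo_loglik(samples, J, h):
    """Sum over samples and spins of log P(s_i | all other spins)."""
    total = 0.0
    for s in samples:
        n = len(s)
        for i in range(n):
            field = h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
            total += log_sigmoid(2.0 * s[i] * field)
    return total


def fit_pseudo_likelihood(samples, steps=200, lr=0.05):
    """Gradient ascent on the pseudo-log-likelihood; returns (J, h)."""
    n, t = len(samples[0]), len(samples)
    J = [[0.0] * n for _ in range(n)]
    h = [0.0] * n
    for _ in range(steps):
        grad_J = [[0.0] * n for _ in range(n)]
        grad_h = [0.0] * n
        for s in samples:
            for i in range(n):
                field = h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
                prob = 1.0 / (1.0 + math.exp(-2.0 * s[i] * field))
                resid = 2.0 * s[i] * (1.0 - prob)  # derivative w.r.t. field
                grad_h[i] += resid
                for j in range(n):
                    if j != i:
                        grad_J[i][j] += resid * s[j]
        for i in range(n):
            h[i] += lr * grad_h[i] / t
            for j in range(n):
                J[i][j] += lr * grad_J[i][j] / t
    # Rows are fitted separately, so average them into a symmetric matrix.
    J_sym = [[0.5 * (J[i][j] + J[j][i]) for j in range(n)] for i in range(n)]
    return J_sym, h


# Hypothetical data: spins 0 and 1 always agree, spin 2 is unrelated.
samples = [(1, 1, 1), (1, 1, -1), (-1, -1, 1), (-1, -1, -1)] * 5
J_hat, h_hat = fit_pseudo_likelihood(samples)
```

On this toy data the fitted coupling between the two spins that always agree is positive, while the couplings with the unrelated spin stay at zero. In the application the analogous estimates are the interdependencies between disciplines.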

SLIDE 35

III: Inference for Nonparametric Productivity Networks A Macro Application: Knowledge Production at Country Level

Data

Data provided within the EBRP Project were extracted from the Scopus database and relate to the scientific production of world countries for 16 (in bold in the table) out of the 27 Scopus subject categories from 1996 to 2012.

Table: List of the 27 Scopus’ subject categories

asjc code | Subject Category | Description
10 | GENE | General
11 | AGRI | Agricultural and Biological Sciences
12 | ARTS | Arts and Humanities
13 | BIOC | Biochemistry, Genetics and Molecular Biology
14 | BUSI | Business, Management and Accounting
15 | CENG | Chemical Engineering
16 | CHEM | Chemistry
17 | COMP | Computer Science
18 | DECI | Decision Sciences
19 | EART | Earth and Planetary Sciences
20 | ECON | Economics, Econometrics and Finance
21 | ENER | Energy
22 | ENGI | Engineering
23 | ENVI | Environmental Science
24 | IMMU | Immunology and Microbiology
25 | MATE | Materials Science
26 | MATH | Mathematics
27 | MEDI | Medicine
28 | NEUR | Neuroscience
29 | NURS | Nursing
30 | PHAR | Pharmacology, Toxicology and Pharmaceutics
31 | PHYS | Physics and Astronomy
32 | PSYC | Psychology
33 | SOCI | Social Sciences
34 | VETE | Veterinary
35 | DENT | Dentistry
36 | HEAL | Health Professions

SLIDE 36

III: Inference for Nonparametric Productivity Networks A Macro Application: Knowledge Production at Country Level

Network DEA Models of Knowledge Production

Our network models borrow from Fukuyama, Weber and Xia (2016) and Weber (2017), while the production variable choices are modeled after Georgescu-Roegen, where we include both flow variables and funds variables.

Table: Network DEA Models of Knowledge Production

Variable | Quantity Model (1) | Quality Model (2)
flow input | Author count (NA) | Author count (NA)
fund input 1 | Own prev pub (Basic), Mod 1.1 | Own prev highly cited pub, Mod 2.1
fund input 2 (externality) | Other prev pub (NW), Mod 1.2 | Other prev highly cited pub, Mod 2.2
Outputs | Own current pub count (P) | Own current highly cited pub (HCP)
Outputs | Fractional current pub (F) |

SLIDE 37

III: Inference for Nonparametric Productivity Networks A Macro Application: Knowledge Production at Country Level

Illustrative Results

Figure: Quality Model 2.2: Estimated Jij (left panel) and inferred network topology (right panel).

SLIDE 38

III: Inference for Nonparametric Productivity Networks A Macro Application: Knowledge Production at Country Level

Selected Results

Figure: Overlaps (Qi,j) and Interdependencies (Jij) of the Model 2.2. The North-East values reported in bold are the Jij while the South-west values correspond to the Qi,j.

SLIDE 39

IV: Summary and Conclusions

IV: Summary and Conclusions

A Framework for developing models of metrics.
A Doubly-conditional performance evaluation model.
A new approach to estimate unobserved quality and investigate its impact on productivity.
A general approach to inference in Network DEA that leads to a more general approach to production, embracing information and complexity theory.
Summing up (Hendry, 1980, p. 403): “Econometricians may well tend to look too much where the light is and too little where the key might be found.” ... “Whether or not econometrics will prove to be more analogous to alchemy than to science depends primarily on the spirit with which the subject is tackled. ... The three golden rules of econometrics are test, test and test; that all three rules are broken regularly in empirical applications is fortunately easily remedied. Rigorously tested models, which adequately described the available data, encompassed previous findings and were derived from well based theories would greatly enhance any claim to be scientific.”
Intriguing applications may be developed by going deeper and beyond our traditional set-ups!

SLIDE 40

References

Main References

Daraio C. (2017a), A framework for the assessment of Research and its Impacts, Journal of Data and Information Science, Vol. 2 No. 4, pp. 7-42.
Daraio C. (2017b), Assessing research and its impacts: The generalized implementation problem and a doubly-conditional performance evaluation model, ISSI 2017 - 16th International Conference on Scientometrics and Informetrics, Conference Proceedings, pp. 1546-1557.
Daraio C. (2018a), Econometric approaches to the measurement of research productivity, in Springer Handbook of Science and Technology Indicators, edited by Glänzel W., Moed H.F., Schmoch H. and Thelwall M., forthcoming.
Daraio C. (2018b), The Democratization of Evaluation and Altmetrics, Technical Report n.1, 2018, Diag, Sapienza University of Rome.
Daraio C., Simar L. (2007), Advanced Robust and Nonparametric Methods in Efficiency Analysis. Methodology and Applications, Springer, New York (USA).
Daraio C., Simar L. (2014), Directional Distances and their Robust versions. Computational and Testing Issues, European Journal of Operational Research, 237, 358-369.
Daraio C., Simar L. (2016), Efficiency and benchmarking with directional distances: a data-driven approach, Journal of the Operational Research Society, 67 (7), 928-944.
Daraio C., Simar L., Wilson P. (2017), Central Limit Theorems for Conditional Efficiency Measures and Tests of the “Separability” Condition in Nonparametric, Two-Stage Models of Production, The Econometrics Journal, doi.org/10.1111/ectj.12103
Daraio C., Simar L., Wilson P. (2018a), Fast and Efficient Computation of Directional Distance Estimators, submitted.
Daraio C., Simar L., Wilson P. (2018b), Quality and its Impact on Productivity, Technical Report, Diag, Sapienza University of Rome, to appear.
Daraio C. et al. (2017), Inference for nonparametric productivity networks: A pseudo-likelihood approach, 10th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2017), 16-18 December 2017, Technical Report, Diag, Sapienza University of Rome, to appear.
