Identification of and correction for publication bias Isaiah Andrews - PowerPoint PPT Presentation

Identification of and correction for publication bias
Isaiah Andrews, Maximilian Kasy
December 13, 2017

Introduction
- Fundamental requirement of science: replicability
  - Different researchers should reach same conclusions
  - Methodological conventions should ensure this (e.g., randomized experiments)
- Replicability often appears to fail, e.g.
  - Experimental economics (Camerer et al., 2016)
  - Experimental psychology (Open Science Collaboration, 2015)
  - Medicine (Ioannidis, 2005)
  - Cell biology (Begley et al., 2012)
  - Neuroscience (Button et al., 2013)

Introduction
- Possible explanation: selective publication of results
- Due to:
  - Researcher decisions
  - Journal selectivity
- Possible selection criteria:
  - Statistically significant effects
  - Confirmation of prior beliefs
  - Novelty
- Consequences:
  - Conventional estimators are biased
  - Conventional inference does not control size

Introduction
Literature
- Identification of publication bias:
  - Good overview: Rothstein et al. (2006)
  - Regression based: Egger et al. (1997)
  - Symmetry of funnel plot ("trim and fill"): Duval and Tweedie (2000)
  - Parametric selection models: Hedges (1992), Iyengar and Greenhouse (1988)
  - Distribution of p-values, parametric distribution of true effects: Brodeur et al. (2016)

Introduction
Literature
- Corrected inference: McCrary et al. (2016)
- Replication- and meta-studies for empirical part:
  - Replication of econ experiments: Camerer et al. (2016)
  - Replication of psych experiments: Open Science Collaboration (2015)
  - Minimum wage: Wolfson and Belman (2015)
  - Deworming: Croke et al. (2016)

Introduction
Our contributions
1. Nonparametric identification of selectivity in the publication process, using
   a) Replication studies: Absent selectivity, original and replication estimates should be symmetrically distributed
   b) Meta-studies: Absent selectivity, distribution of estimates for small sample sizes should be noised-up version of distribution for larger sample sizes
2. Corrected inference when selectivity is known
   a) Median unbiased estimators
   b) Confidence sets with correct coverage
   c) Allow for nuisance parameters and multiple dimensions of selection
   d) Bayesian inference accounting for selection
3. Applications to
   a) Experimental economics
   b) Experimental psychology
   c) Effects of minimum wages on employment
   d) Effects of de-worming

Outline
1. Introduction
2. Setup
3. Identification
4. Bias-corrected inference
5. Applications
6. Conclusion

Setup
- Assume there is a population of latent studies indexed by $i$
- True parameter value in study $i$ is $\Theta^*_i$
  - $\Theta^*_i$ drawn from some population ⇒ empirical Bayes perspective
  - Different studies may recover different parameters
- Each study reports findings $X^*_i$
  - Distribution of $X^*_i$ given $\Theta^*_i$ known
- A given study may or may not be published
  - Determined by both researcher and journal: we don't try to disentangle
  - Probability of publication $P(D_i = 1 \mid X^*_i, \Theta^*_i) = p(X^*_i)$
- Published studies are indexed by $j$

Setup
Definition (General sampling process)
- Latent (unobserved) variables: $(D_i, X^*_i, \Theta^*_i)$, jointly i.i.d. across $i$:
  $\Theta^*_i \sim \mu$
  $X^*_i \mid \Theta^*_i \sim f_{X^*|\Theta^*}(x \mid \Theta^*_i)$
  $D_i \mid X^*_i, \Theta^*_i \sim \mathrm{Ber}(p(X^*_i))$
- Truncation: We observe i.i.d. draws of $X_j$, where
  $I_j = \min\{i : D_i = 1,\ i > I_{j-1}\}$
  $\Theta_j = \Theta^*_{I_j}$
  $X_j = X^*_{I_j}$

Setup
Example: treatment effects
- Journal receives a stream of studies $i = 1, 2, \ldots$
- Each reporting experimental estimates $X^*_i$ of treatment effects $\Theta^*_i$
- Distribution of $\Theta^*_i$: $\mu$
- Suppose that $X^*_i \mid \Theta^*_i \sim N(\Theta^*_i, 1)$
- Publication probability: "significance testing,"
  $p(X) = \begin{cases} 0.1 & |X| < 1.96 \\ 1 & |X| \ge 1.96 \end{cases}$
- Published studies: report estimate $X_j$ of treatment effect $\Theta_j$
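The sampling process in this example can be simulated directly. The following is a minimal sketch, not the authors' code: it fixes a single true effect `theta = 1.0` (an arbitrary illustrative value, where the slides let $\Theta^*_i$ vary according to $\mu$), draws latent estimates from $N(\theta, 1)$, and keeps each one with the step-function probability from the slide. The function names are hypothetical. It then summarizes the published draws by the median bias of $X_j$ and by the share of conventional 95% intervals $X_j \pm 1.96$ that contain $\theta$.

```python
import random
import statistics


def publication_probability(x, beta=0.1, cutoff=1.96):
    """Step-function p(x): insignificant results are published with probability beta."""
    return 1.0 if abs(x) >= cutoff else beta


def simulate_published(theta, n_published, rng):
    """Draw latent X* ~ N(theta, 1) and keep each draw with probability p(X*)."""
    published = []
    while len(published) < n_published:
        x_star = rng.gauss(theta, 1.0)
        if rng.random() < publication_probability(x_star):
            published.append(x_star)
    return published


rng = random.Random(0)
theta = 1.0  # illustrative true effect (assumption, not from the slides)
xs = simulate_published(theta, 20000, rng)

# Median bias of the naive estimator X_j, and coverage of X_j +/- 1.96.
median_bias = statistics.median(xs) - theta
coverage = sum(abs(x - theta) <= 1.96 for x in xs) / len(xs)
print(f"median bias: {median_bias:.2f}")
print(f"coverage of conventional 95% interval: {coverage:.2f}")
```

With selection this strong, the published estimates should sit well above the true effect and the conventional interval should cover less often than its nominal 95%.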

Setup
Example continued: publication bias
[Figure: two panels, each with a horizontal axis running from 0 to 5. Left: median bias of $\hat\theta_j = X_j$. Right: true coverage of conventional 95% confidence interval, shown against nominal coverage.]

Identification
Identification of the selection mechanism $p(\cdot)$
- Key unknown object in model: publication probability $p(\cdot)$
- We propose two approaches for identification:
  1. Replication experiments:
     - replication estimate $X^r$ for the same parameter $\Theta$
     - selectivity operates only on $X$, but not on $X^r$
  2. Meta-studies:
     - Variation in $\sigma^*$, where $X^* \sim N(\Theta^*, \sigma^{*2})$
     - Assume $\sigma^*$ is (conditionally) independent of $\Theta^*$ across latent studies $i$
     - Standard assumption in the meta-studies literature; validated in our applications by comparison to replications
- Advantages:
  1. Replications: Very credible
  2. Meta-studies: Widely applicable

Identification
Intuition: identification using replication studies
[Figure: two scatter panels with ticks at -1.96, 0 and 1.96 on both axes and two marked regions A and B. Left panel: $X^{*r}$ against $X^*$. Right panel: $X^r$ against $X$.]
- Left: no truncation ⇒ areas A and B have same probability
- Right: $p(Z) = 0.1 + 0.9 \cdot 1(|Z| > 1.96)$ ⇒ A more likely than B

Identification
Approach 1: Replication studies
Definition (Replication sampling process)
- Latent variables: as before,
  $\Theta^*_i \sim \mu$
  $X^*_i \mid \Theta^*_i \sim f_{X^*|\Theta^*}(x \mid \Theta^*_i)$
  $D_i \mid X^*_i, \Theta^*_i \sim \mathrm{Ber}(p(X^*_i))$
- Additionally: replication draws,
  $X^{*r}_i \mid X^*_i, D_i, \Theta^*_i \sim f_{X^*|\Theta^*}(x \mid \Theta^*_i)$
- Observability: as before,
  $I_j = \min\{i : D_i = 1,\ i > I_{j-1}\}$
  $\Theta_j = \Theta^*_{I_j}$
  $(X_j, X^r_j) = (X^*_{I_j}, X^{*r}_{I_j})$

Identification
Theorem (Identification using replication experiments)
Assume that the support of $f_{X^*_i, X^{*r}_i}$ is of the form $A \times A$ for some set $A$. Then $p(\cdot)$ is identified on $A$ up to scale.
Intuition of proof:
- Marginal density of $(X, X^r)$ is
  $f_{X,X^r}(x, x^r) = \frac{p(x)}{E[p(X^*_i)]} \int f_{X^*|\Theta^*}(x \mid \theta^*_i)\, f_{X^*|\Theta^*}(x^r \mid \theta^*_i)\, d\mu(\theta^*_i)$
- Thus, for all $a, b$, if $p(a) > 0$,
  $\frac{p(b)}{p(a)} = \frac{f_{X,X^r}(b, a)}{f_{X,X^r}(a, b)}$
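The ratio argument can be checked by simulation. The sketch below rests on assumptions that are not in the slides: $\mu$ is taken to be $N(0, 2^2)$, both the original and the replication have unit variance, and $p$ is the step function from the running example with $p = 0.1$ for insignificant results. Because the latent joint density of $(X^*, X^{*r})$ is symmetric, the number of published pairs with an insignificant original and a significant replication, divided by the number with the roles reversed, should estimate $p(\text{insignificant}) / p(\text{significant}) = 0.1$. All names are hypothetical.

```python
import random


def is_significant(value, cutoff=1.96):
    return abs(value) >= cutoff


def simulate_published_pairs(n_published, beta, rng):
    """Published (X, X^r) pairs; selection acts on the original estimate only."""
    pairs = []
    while len(pairs) < n_published:
        theta = rng.gauss(0.0, 2.0)      # assumed mu: N(0, 2^2)
        x_star = rng.gauss(theta, 1.0)   # original estimate
        x_rep = rng.gauss(theta, 1.0)    # replication of the same effect
        keep_probability = 1.0 if is_significant(x_star) else beta
        if rng.random() < keep_probability:
            pairs.append((x_star, x_rep))
    return pairs


rng = random.Random(1)
pairs = simulate_published_pairs(50000, beta=0.1, rng=rng)

# Counts in the two mirror-image regions of the (X, X^r) plane.
n_sig_insig = sum(is_significant(x) and not is_significant(xr) for x, xr in pairs)
n_insig_sig = sum(not is_significant(x) and is_significant(xr) for x, xr in pairs)

# Estimate of p(insignificant) / p(significant).
beta_hat = n_insig_sig / n_sig_insig
print(f"estimated relative publication probability: {beta_hat:.3f}")
```

Only the ratio is recovered, which matches the theorem's "up to scale": multiplying $p$ by a constant changes how many latent studies are needed but not the distribution of published pairs.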

Identification
Practical complication
- Replication experiments follow the same protocol ⇒ estimate same effect $\Theta$
- But often different sample size ⇒ different variance ⇒ symmetry breaks down
- Additionally: replication sample size often determined based on power calculations given initial estimate
- $p(\cdot)$ is still identified (up to scale):
  - Assume $X$ normally distributed
  - Intuition: Conditional on $X$, $\sigma$, (de-)convolve $X^r$ with normal noise to get symmetry back
  - $\mu$ is identified as well

Identification
Further complication
- What if selectivity is based not only on observed $X$, but also on unobserved $W$?
- Would imply general selectivity of the form
  $D_i \mid X^*_i, \Theta^*_i \sim \mathrm{Ber}(p(X^*_i, \Theta^*_i))$
- Again assume normality,
  $X^{*r}_i \mid \sigma_i, D_i, X^*_i, \Theta^*_i \sim N(\Theta^*_i, \sigma_i^2)$
- ⇒ Solution:
  - Identify $\mu_{\Theta|X}$ from $f_{X^r|X}$ by deconvolution
  - Recover $f_{X|\Theta}$ by Bayes' rule ($f_X$ is observed)
  - This density is all we need for bias corrected inference
- We use this to construct specification tests for our baseline model

Identification
Intuition: identification using meta-studies
[Figure: two panels with horizontal axes $X^*$ (left) and $X$ (right) running from -3 to 5, vertical axes running from 0 to 2.5, and two marked regions A and B.]
- Left: no truncation ⇒ dist for higher $\sigma$ noised up version of dist for lower $\sigma$
- Right: $p(Z) = 0.1 + 0.9 \cdot 1(|Z| > 1.96)$ ⇒ "missing data" inside the cone

Identification
Approach 2: meta-studies
Definition (Independent $\sigma$ sampling process)
  $\sigma^*_i \sim \mu_\sigma$
  $\Theta^*_i \mid \sigma^*_i \sim \mu_\Theta$
  $X^*_i \mid \Theta^*_i, \sigma^*_i \sim N(\Theta^*_i, \sigma^{*2}_i)$
  $D_i \mid X^*_i, \Theta^*_i, \sigma^*_i \sim \mathrm{Ber}(p(X^*_i / \sigma^*_i))$
- We observe i.i.d. draws of $(X_j, \sigma_j)$, where
  $I_j = \min\{i : D_i = 1,\ i > I_{j-1}\}$
  $(X_j, \sigma_j) = (X^*_{I_j}, \sigma^*_{I_j})$
- Define $Z^* = X^* / \sigma^*$ and $Z = X / \sigma$

Identification
Theorem (Nonparametric identification using variation in $\sigma$)
Suppose that the support of $\sigma$ contains a neighborhood of some point $\sigma_0$. Then $p(\cdot)$ is identified up to scale.
Intuition of proof:
- Conditional density of $Z$ given $\sigma$ is
  $f_{Z|\sigma}(z \mid \sigma) = \frac{p(z)}{E[p(Z^*) \mid \sigma]} \int \varphi(z - \theta/\sigma)\, d\mu(\theta)$
- Thus
  $\frac{f_{Z|\sigma}(z \mid \sigma_2)}{f_{Z|\sigma}(z \mid \sigma_1)} = \frac{E[p(Z^*) \mid \sigma = \sigma_1]}{E[p(Z^*) \mid \sigma = \sigma_2]} \cdot \frac{\int \varphi(z - \theta/\sigma_2)\, d\mu(\theta)}{\int \varphi(z - \theta/\sigma_1)\, d\mu(\theta)}$
- Recover $\mu$ from right hand side, then recover $p(\cdot)$ from first equation
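The point of the ratio is that $p(z)$ cancels, so comparing two values of $\sigma$ carries information about $\mu$ alone. A small numerical sketch of this, under assumptions that are not in the slides: $\mu$ is a hypothetical two-point distribution on $\{0, 1\}$ with equal weights, and $p$ is the step function from the running example. The helper names are made up for the illustration.

```python
import math

BETA, CUTOFF = 0.1, 1.96
MU = [(0.0, 0.5), (1.0, 0.5)]  # hypothetical mu as (theta, weight) pairs


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(u):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def p(z):
    return 1.0 if abs(z) >= CUTOFF else BETA


def latent_density(z, sigma):
    """Integral of phi(z - theta/sigma) d mu(theta): density of Z* given sigma."""
    return sum(w * phi(z - theta / sigma) for theta, w in MU)


def expected_p(sigma):
    """E[p(Z*) | sigma] for the step-function p."""
    total = 0.0
    for theta, w in MU:
        mean = theta / sigma
        prob_insig = Phi(CUTOFF - mean) - Phi(-CUTOFF - mean)
        total += w * (BETA * prob_insig + (1.0 - prob_insig))
    return total


def published_density(z, sigma):
    """f_{Z|sigma}(z | sigma), the density of published z-statistics."""
    return p(z) * latent_density(z, sigma) / expected_p(sigma)


z, s1, s2 = 2.5, 1.0, 2.0
lhs = published_density(z, s2) / published_density(z, s1)
rhs = (expected_p(s1) / expected_p(s2)) * (latent_density(z, s2) / latent_density(z, s1))

# Second step: once mu is known, the first equation gives p up to scale.
relative_p = (published_density(2.5, s1) / latent_density(2.5, s1)) / (
    published_density(1.0, s1) / latent_density(1.0, s1)
)
print(lhs, rhs, relative_p)
```

Here `lhs` and `rhs` agree without `p` appearing on the right, and `relative_p` returns $p(2.5)/p(1.0)$. This only verifies the algebra for a known $\mu$. It does not carry out the deconvolution that recovers $\mu$ from data.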

Bias-corrected inference
- Once we know $p(\cdot)$, can correct inference for selection
- For simplicity, here assume $X$, $\Theta$ both 1-dimensional
- Density of published $X$ given $\Theta$:
  $f_{X|\Theta}(x \mid \theta) = \frac{p(x)}{E[p(X^*) \mid \Theta^* = \theta]} \cdot f_{X^*|\Theta^*}(x \mid \theta)$
- Corresponding cumulative distribution function: $F_{X|\Theta}(x \mid \theta)$
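As a worked instance, consider the treatment-effects example from the setup, with $X^* \mid \Theta^* = \theta \sim N(\theta, 1)$ and the step-function publication probability. The symbols $\beta = 0.1$ and $c = 1.96$ are shorthand introduced here, with $\varphi$ and $\Phi$ the standard normal density and CDF. Under those assumptions the normalizing constant and the published density take a closed form:

```latex
\begin{align*}
E[p(X^*) \mid \Theta^* = \theta]
  &= \beta \, P(|X^*| < c \mid \theta) + P(|X^*| \ge c \mid \theta) \\
  &= 1 - (1 - \beta)\,\bigl[\Phi(c - \theta) - \Phi(-c - \theta)\bigr], \\[4pt]
f_{X|\Theta}(x \mid \theta)
  &= \frac{p(x)\,\varphi(x - \theta)}
          {1 - (1 - \beta)\,\bigl[\Phi(c - \theta) - \Phi(-c - \theta)\bigr]}.
\end{align*}
```

At $\theta = 0$ the bracket is approximately $0.95$, so $E[p(X^*) \mid \Theta^* = 0] \approx 1 - 0.9 \cdot 0.95 = 0.145$: about one latent study in seven is published when the true effect is zero.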

Bias-corrected inference
Corrected frequentist estimators and confidence sets
- We are interested in bias, and the coverage of confidence sets
- Condition on $\theta$: standard frequentist analysis
- Define $\hat\theta_\alpha(x)$ via
  $F_{X|\Theta}(x \mid \hat\theta_\alpha(x)) = \alpha$
- Under mild conditions, can show that
  $P(\hat\theta_\alpha(X) \le \theta \mid \theta) = \alpha \quad \forall \theta$
- Median-unbiased estimator: $\hat\theta_{1/2}(X)$ for $\theta$
- Equal-tailed level $1 - \alpha$ confidence interval:
  $[\hat\theta_{\alpha/2}(X),\ \hat\theta_{1-\alpha/2}(X)]$
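The defining equation for $\hat\theta_\alpha(x)$ can be solved numerically. The following is a sketch under the assumptions of the running example ($X^* \mid \theta \sim N(\theta, 1)$, step-function $p$ with value 0.1 below the 1.96 cutoff). The bisection solver, the search window and the function names are choices made for this illustration and are not taken from the slides. It relies on $F_{X|\Theta}(x \mid \theta)$ being decreasing in $\theta$, which holds in this normal model. The two endpoint estimates are sorted so that the interval is reported as (lower, upper).

```python
import math

BETA, CUTOFF = 0.1, 1.96


def Phi(u):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def weighted_mass(upper, theta):
    """Integral of p(t) * phi(t - theta) dt over (-inf, upper]."""
    below = Phi(min(upper, -CUTOFF) - theta)  # significant, negative side
    middle = BETA * max(0.0, Phi(min(upper, CUTOFF) - theta) - Phi(-CUTOFF - theta))
    above = max(0.0, Phi(upper - theta) - Phi(CUTOFF - theta))  # significant, positive side
    return below + middle + above


def published_cdf(x, theta):
    """F_{X|Theta}(x | theta): CDF of a published estimate given the true effect."""
    return weighted_mass(x, theta) / weighted_mass(math.inf, theta)


def quantile_unbiased(x, alpha, half_width=20.0, tol=1e-8):
    """Solve published_cdf(x, theta) = alpha for theta by bisection."""
    lo, hi = x - half_width, x + half_width
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if published_cdf(x, mid) > alpha:
            lo = mid  # CDF too large, so theta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = 2.5  # an illustrative published estimate
median_unbiased = quantile_unbiased(x, 0.5)
lower, upper = sorted([quantile_unbiased(x, 0.025), quantile_unbiased(x, 0.975)])
print(f"naive estimate: {x}")
print(f"median-unbiased estimate: {median_unbiased:.3f}")
print(f"equal-tailed 95% interval: [{lower:.3f}, {upper:.3f}]")
```

For a published estimate just past the significance cutoff, the corrected point estimate falls below the naive one, because significant results are over-represented among published studies.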
