
BETS: The dangers of selection bias in early analyses of the coronavirus disease (COVID-19) pandemic
Qingyuan Zhao
Statistical Laboratory, University of Cambridge
May 5, 2020 @ YSPH Biostatistics Seminar
Manuscript: arXiv:2004.07743
Slides: http://www.statslab.cam.ac.uk/~qz280/

Collaborators
Nianqiao (Phyllis) Ju, PhD student at Harvard
Sergio Bacallado, Stats Lab, Cambridge
Rajen Shah, Stats Lab, Cambridge
And many thanks to Cindy Chen, Yang Chen, Yunjin Choi, Hera He, Michael Levy, Marc Lipsitch, James Robins, Andrew Rosenfeld, Dylan Small, Yachong Yang, Zilu Zhou, and many others who have provided helpful suggestions.
Qingyuan Zhao (Stats Lab, Cambridge) BETS on COVID-19 May 5, 2020 1 / 53

COVID-19 is personal for everyone
My parents and I all grew up in Wuhan, China. (September 7, 2019)

Wuhan Lockdown (January 23, 2020)
[Photos: before the lockdown; after the lockdown]

The beginning of this project
On January 29, I heard from my parents that a close relative had just been diagnosed with “viral pneumonia”. This prompted me to start looking into the data available at the time.
However, epidemiological data from Wuhan are very unreliable! Some anecdotal evidence:
Inadequate testing: My relative could not get an RT-PCR test until mid-February, when she was already recovering.
False negative test: Her first test was negative. A few days later she was tested again and the result came back positive.
Insufficient contact tracing: Her husband, who also showed COVID symptoms, quickly recovered and was never tested.

Insufficient testing in Wuhan
A change of diagnostic criterion on February 12 led to a huge spike of cases.
Solution: Use cases “exported” from Wuhan. This has two benefits:
1. Testing and contact tracing were intensive in other locations.
2. Detailed case reports (instead of mere case counts) are often available.
This design was first used by Neil Ferguson’s team at Imperial College, who estimated on January 17 that there might already be over 1,700 cases in Wuhan.

Our first analysis

A puzzling comparison

Which one is correct?
[Figure: two panels of country-level curves, one plotting total cases against days since 100 cases, the other plotting total deaths against days since 10 deaths.]
In the countries hardest hit by COVID-19, the total cases and deaths grew about 100 times in the first 20 days (doubling time: 20 / log_2(100) = 3.01 days).
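
The doubling-time arithmetic on this slide can be checked in a few lines. This is only a sketch that assumes steady exponential growth over the whole window; the function name `doubling_time` is introduced here for illustration and does not come from the talk.

```python
import math


def doubling_time(fold_increase: float, days: float) -> float:
    """Doubling time implied by a given fold increase over `days` days,
    assuming steady exponential growth throughout the window."""
    if fold_increase <= 1 or days <= 0:
        raise ValueError("need fold_increase > 1 and days > 0")
    # log2(fold_increase) is the number of doublings needed to reach that increase.
    return days / math.log2(fold_increase)


if __name__ == "__main__":
    # The slide's figures: about 100-fold growth in the first 20 days.
    print(f"{doubling_time(100, 20):.2f} days")
```

With the slide's figures (100-fold growth in 20 days) this gives about 3.01 days, matching the number quoted above.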

How can the results be so different?
Spoilers... Similar data and models were used in these two studies, with one crucial difference: The Lancet study did not take into account the travel ban.

Rest of the talk
1. Overview of selection bias
2. Dataset
3. Model
4. Why were some early analyses severely biased?
5. Bayesian nonparametric inference
6. Conclusions

Bias (i): Under-ascertainment
This may occur if symptomatic patients did not seek healthcare or could not be diagnosed.
Susceptible studies: All studies using cases confirmed when testing is insufficient.
Direction of bias: Varied, depending on the pattern of under-ascertainment and the parameter of interest.
Solution: Use carefully considered and planned study designs.

Bias (ii): Non-random sample selection
Cases included in the study are not representative of the population.
Susceptible studies: All studies, as detailed information on COVID-19 cases is sparse, but especially those without clear inclusion criteria.
Direction of bias: Varied.
Solution: Follow a protocol for data collection and exclude data that do not meet the sample inclusion criterion.

Bias (iii): Travel ban
Outbound travel from Wuhan was banned from January 23, 2020 to April 8, 2020.
Susceptible studies: Studies that analyze cases exported from Wuhan.
Direction of bias: Under-estimation of epidemic growth and infection-to-recovery time.
Solution: Derive tailored likelihood functions to account for travel restrictions.
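
One way to see why ignoring the travel ban pulls growth estimates downward is the toy calculation below. It is a sketch under simplifying assumptions that are not taken from the talk: infections in Wuhan grow exponentially at rate `r`, each infected person leaves at a small constant daily rate until the ban on day `ban_day`, and a naive analyst fits a log-linear trend to the infection dates of exported cases. All function names and parameter values are illustrative.

```python
import math


def expected_exported(r, ban_day, leave_rate=0.01):
    """Expected exported infections by day of infection (toy model).

    Someone infected on day s has (ban_day - s) days left in which to leave,
    so for a small leave_rate the chance of being exported is roughly
    leave_rate * (ban_day - s). Growth is multiplied by that shrinking window.
    """
    return [math.exp(r * s) * leave_rate * (ban_day - s) for s in range(ban_day)]


def log_linear_slope(counts):
    """Ordinary least-squares slope of log(count) against the day index."""
    xs = list(range(len(counts)))
    ys = [math.log(c) for c in counts]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    true_r = math.log(2) / 3  # assumed 3-day doubling time
    naive_r = log_linear_slope(expected_exported(true_r, ban_day=22))
    print(f"true growth rate:  {true_r:.3f} per day")
    print(f"naive growth rate: {naive_r:.3f} per day")
```

In this toy model the naive slope is always below the true rate, because the factor `(ban_day - s)` shrinks as the ban approaches and so flattens the apparent growth of exported cases.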

Bias (iv): Epidemic growth
Patients were more likely to be infected towards the end of their exposure period.
Susceptible studies: Studies that treat infections as uniformly distributed over the exposure period.
Direction of bias: Over-estimation of the incubation period.
Solution: Derive tailored likelihood functions to account for epidemic growth.
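
The size of this effect can be illustrated with a closed-form calculation. The sketch below assumes, purely for illustration, that the infection time within an exposure period of length `exposure_days` has density proportional to exp(r * t); the function name and the parameter values are not from the talk.

```python
import math


def mean_time_since_infection(r, exposure_days):
    """Mean gap between infection and the end of the exposure period when the
    infection time has density proportional to exp(r * t) on [0, exposure_days]."""
    if r == 0:
        return exposure_days / 2  # uniform infection time
    # E[L - T] = 1/r - L / (exp(r * L) - 1) for the truncated exponential density.
    return 1 / r - exposure_days / math.expm1(r * exposure_days)


if __name__ == "__main__":
    r = math.log(2) / 3  # assumed 3-day doubling time
    print(f"uniform assumption: {mean_time_since_infection(0, 10):.2f} days")
    print(f"with growth:        {mean_time_since_infection(r, 10):.2f} days")
```

Under these assumptions, a 10-day exposure period puts the infection on average about 3.2 days before the end of exposure, compared with 5 days under the uniform assumption. Treating infections as uniform therefore places them too early and lengthens the implied incubation period.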

Bias (v): Right-truncation
Cases confirmed after a certain time are excluded from the dataset.
Susceptible studies: Studies that only use cases detected early in an epidemic.
Direction of bias: Under-estimation of the incubation period.
Solution:
1. Collect all cases that meet a selection criterion; do not end data collection prematurely.
2. Derive tailored likelihood functions to correct for right-truncation.
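
A small simulation makes the direction of this bias concrete. It is a sketch with assumed ingredients that are not from the talk: infection times grow exponentially up to a data cut-off `horizon`, incubation periods follow a gamma distribution with mean 5 days, and a case is kept only if its symptom onset falls before the cut-off (onset stands in for confirmation here).

```python
import math
import random


def simulate_incubation(n, r, horizon, shape, scale, seed=0):
    """Return (all incubation periods, those with onset before `horizon`)."""
    rng = random.Random(seed)
    growth = math.expm1(r * horizon)
    everyone, observed = [], []
    for _ in range(n):
        # Inverse-CDF draw of an infection time with density
        # proportional to exp(r * t) on [0, horizon].
        infected_at = math.log1p(rng.random() * growth) / r
        incubation = rng.gammavariate(shape, scale)  # mean = shape * scale
        everyone.append(incubation)
        if infected_at + incubation <= horizon:  # onset seen before the cut-off
            observed.append(incubation)
    return everyone, observed


if __name__ == "__main__":
    everyone, observed = simulate_incubation(
        n=50_000, r=math.log(2) / 3, horizon=30.0, shape=2.0, scale=2.5
    )
    print(f"mean incubation, all cases:      {sum(everyone) / len(everyone):.2f}")
    print(f"mean incubation, observed cases: {sum(observed) / len(observed):.2f}")
```

Cases with long incubation periods are the ones most likely to be cut off, so the mean among observed cases falls well below the true mean in this setup.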

Recap
Types of bias in COVID-19 analyses:
(i) Under-ascertainment.
(ii) Non-random sample selection.
(iii) Travel ban.
(iv) Epidemic growth.
(v) Right-truncation.
Keys to avoiding selection bias:
1. Carefully design the study and adhere to the sample inclusion criterion.
2. Start from a generative model and derive likelihood functions that adjust for sample selection.
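
The second key can be written generically as follows. This is the standard form of a likelihood conditional on selection, not a formula taken from the talk; the symbols $f_\theta$, $x_i$ and the selection set $S$ are notation introduced here.

```latex
L(\theta) \;=\; \prod_{i=1}^{n} \frac{f_\theta(x_i)}{P_\theta(X \in S)},
\qquad x_i \in S \ \text{for all } i
```

Here $f_\theta$ is the density of one case under the generative model and $S$ is the set of outcomes that meet the inclusion criterion. Dividing by $P_\theta(X \in S)$ is what adjusts for the fact that only selected cases are observed.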
