SLIDE 1

POPULATION STRUCTURE OF THE PNEUMOCOCCUS IS DRIVEN BY SELECTION ON THE GROEL HEAT-SHOCK PROTEIN

A WHOLE-GENOME MACHINE LEARNING PERSPECTIVE

Dr José Lourenço

DEPARTMENTAL LECTURER IN INFECTIOUS DISEASE, DEPARTMENT OF ZOOLOGY, UNIVERSITY OF OXFORD

FRIDAY, 8TH JUNE 2018
INSTITUTO DE TECNOLOGIA QUIMICA E BIOLOGICA (ITQB NOVA), OEIRAS, PORTUGAL

SLIDE 2

MY PATH / PAST EXPERIENCE

→ BSc Software Engineering @IST
→ RESEARCH ASSISTANT @IGC (2005-2008)
→ PhD (PDBC programme): 1y courses @IGC (2008), 3y project @Oxford (2009)
→ POSTDOC @ImperialCollege (2013-2014)
→ POSTDOC @Oxford (2014-2018)
→ DEPARTMENTAL RESEARCH LECTURER ON INFECTIOUS DISEASE @Oxford (2018-2022)

Pathogens: DENV, CHIKV, YFV, ZIKV, FluA, HIV, HCV, HBV, Pneumococcus

SLIDE 3

THE PNEUMOCOCCUS

SLIDE 4

STREPTOCOCCUS PNEUMONIAE

Bacteria
→ also known as the pneumococcus
→ Gram-positive bacterial pathogen
→ usually found in pairs (diplococci); does not form spores
→ shows high levels of recombination and horizontal gene transfer

Disease & carriage
→ commonly carried asymptomatically in the nasopharynx
→ can cause invasive disease, e.g. pneumonia and meningitis

Serotypes
→ the cell capsule dictates the antigenic type (serotype)
→ there are 100+ known serotypes
→ 10 to 15 serotypes are classically responsible for carriage and disease

SLIDE 5

THE CAPSULE & VACCINE

Pneumococcal conjugate vaccine (PCV)
→ contains the capsular sugars (polysaccharides)
→ PCV7: serotypes 4, 6B, 9V, 14, 18C, 19F and 23F
→ PCV13: PCV7 + 1, 3, 5, 6A, 7F and 19A

[Figure labels: polysaccharide capsule; surface proteins PspA, CbpA, LytB, LytA]

Engholm DH et al. FEMS Microbiology Reviews. 2017

SLIDE 6

EPIDEMIOLOGICAL POPULATION STRUCTURE

Pre-vaccination
→ disease is dominated by a small subset of serotypes (PCV7)
→ the majority of serotypes circulate at low frequency

Post-vaccination
→ disease caused by vaccine types is reduced
→ disease caused by non-vaccine types is increased

Waight PA et al. Lancet Infectious Diseases 2015 (England and Wales)

SLIDE 7

GENETIC POPULATION STRUCTURE

[Figure: panels A and B; label "23F"]

Azarian T et al. PLoS Pathogens 2018; Croucher et al. Nature Genetics 2013

SLIDE 8

OUR GROUP'S RESEARCH HISTORY

Themes: vaccine response; epidemiological population structure; genetic population structure

→ Chaos, Persistence, and Evolution of Strain Structure in Antigenically Diverse Infectious Agents. Science (1998).
→ Role of selection in the emergence of lineages and the evolution of virulence in Neisseria meningitidis. PNAS (2008).
→ Long-term evolution of antigen repertoires among carried meningococci. Proc. Roy. Soc. B (2010).
→ Lineage structure of Streptococcus pneumoniae may be driven by immune selection on the groEL heat-shock protein. Scientific Reports (2017).
→ Identifying Streptococcus pneumoniae genes associated with invasive disease using pangenome-based whole genome sequence typing. Under review (2018).
→ Vaccination can drive an increase in frequencies of antibiotic resistance among nonvaccine serotypes of Streptococcus pneumoniae. PNAS (2018).
→ Vaccination Drives Changes in Metabolic and Virulence Profiles of Streptococcus pneumoniae. PLoS Pathogens (2015).
→ High prevalence of vaccine serotype Streptococcus pneumoniae carriage six years after 13-valent pneumococcal vaccine introduction in Malawi: a prospective serial cross-sectional study. Under review (2018).

SLIDE 9

RESEARCH ON GENETIC POPULATION STRUCTURE

SLIDE 10

IN SEARCH OF POPULATION STRUCTURE

What determines genetic population structure?
→ which genes determine phylogenetic branching?
→ which genes determine sequence cluster (SC)?
→ which genes determine serotype (Sero)?
→ are the genes that determine SC the same as those that determine Sero?

Dataset
→ 616 S. pneumoniae genomes from Massachusetts (USA, 2001-2007)
→ full genomes, with 2135 genes per sample
→ known SC and Sero for each sample

Approach
→ whole-genome multi-locus sequence typing (wgMLST)
→ machine learning to explore the determinants of SC and Sero

Croucher et al. Nature Genetics 2013

SLIDE 11

WHOLE-GENOME MULTI-LOCUS SEQUENCE TYPING

Reference genome: ATCC700669 (serotype 23F)

For each gene (e.g. gene A):
1. find the gene in the reference genome
2. compare each sample's sequence of the gene to the reference gene's sequence
3. discretize and collapse the gene's sequence into an allele (an integer): sequences identical to the reference share allele 1, while a differing sequence receives a new allele number (e.g. 2)

Repeat for all 2135 genes to obtain the allelic matrix: one row per gene (gene A and the other genes), one column per sample (sample 1 ... sample N), with integer alleles such as 1, 2, 3, 4, 7, 8 as entries.
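The allele-calling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: sequences are assumed to be already extracted per gene (a real wgMLST workflow first locates each gene against the reference, e.g. by alignment), the reference sequence is taken as allele 1, any sequence difference creates a new allele, and the gene and sample names and the 0 code for a missing gene are hypothetical.

```python
# Minimal sketch of wgMLST-style allele calling (illustrative only).
# Assumptions, not taken from the slides: gene sequences are already
# extracted for every sample, the reference sequence is allele 1, any
# difference creates a new allele, and 0 (hypothetical) marks a gene that
# was not found in a sample.
from typing import Dict

Sequences = Dict[str, str]  # gene name -> nucleotide sequence


def call_alleles(reference: Sequences,
                 samples: Dict[str, Sequences]) -> Dict[str, Dict[str, int]]:
    """Return the allelic matrix as {gene: {sample: allele number}}."""
    matrix: Dict[str, Dict[str, int]] = {}
    for gene, ref_seq in reference.items():
        alleles: Dict[str, int] = {ref_seq: 1}  # sequence -> allele number
        row: Dict[str, int] = {}
        for sample, genes in samples.items():
            seq = genes.get(gene)
            if seq is None:
                row[sample] = 0
                continue
            if seq not in alleles:
                # First time this exact sequence is seen: next unused integer.
                alleles[seq] = len(alleles) + 1
            row[sample] = alleles[seq]
        matrix[gene] = row
    return matrix


# Toy input with two hypothetical genes and four samples.
reference = {"geneA": "ATGAAA", "geneB": "ATGCCC"}
samples = {
    "sample1": {"geneA": "ATGAAA", "geneB": "ATGCCC"},
    "sample2": {"geneA": "ATGAAT", "geneB": "ATGCCC"},
    "sample3": {"geneA": "ATGAAA", "geneB": "ATGCCG"},
    "sample4": {"geneA": "ATGAAT"},
}
allelic_matrix = call_alleles(reference, samples)
for gene, row in allelic_matrix.items():
    print(gene, list(row.values()))
# geneA [1, 2, 1, 2]
# geneB [1, 1, 2, 0]
```

Keeping one sequence-to-integer table per gene means allele numbers are only comparable within a gene, which matches the slide's "alleles of gene A": the same integer in two different genes carries no shared meaning.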

SLIDE 12

MACHINE LEARNING

→ The Random Forest Algorithm (RFA) is a collection of Classification Trees (CT)
→ We use predictor variables (genes) to classify a variable of interest (SC, Sero)

Critical outputs
→ how well the forest can classify (predict) the variables of interest
→ a score for each predictor variable, according to its importance in getting the classification right

[Figure: the allelic matrix (2135 predictors; columns sample 1 ... sample N) next to the variables of interest for each sample, e.g. SC: 10, 2, 4 and Sero: 6A, 19A, 19A]
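To make the second critical output concrete, here is a sketch of permutation importance: shuffle one gene's alleles across samples and measure how much classification accuracy drops. The slides do not state which importance score was used (Random Forest implementations commonly offer permutation-based and Gini-based measures), so treat this as an illustration. The toy matrix, the cluster labels SC_A/SC_B and `toy_model`, which stands in for a trained forest, are all hypothetical.

```python
# Sketch of permutation importance: how much does accuracy drop when one
# predictor (gene) column is shuffled? The slides do not say which
# importance measure was used; this is an illustrative stand-in.
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[int]], str]


def accuracy(predict: Predictor, X: List[Sequence[int]], y: List[str]) -> float:
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Predictor, X: List[Sequence[int]],
                           y: List[str], n_repeats: int = 20,
                           seed: int = 0) -> List[float]:
    """Mean drop in accuracy when one gene's column of alleles is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by the shuffled values.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        scores.append(sum(drops) / n_repeats)
    return scores


# Hypothetical toy allelic matrix: rows are samples, columns are 3 genes.
# Gene 0 separates the two sequence clusters, gene 1 never varies, and
# gene 2 varies without any relation to the cluster.
X = [[1, 1, 3], [1, 1, 1], [1, 1, 2], [1, 1, 1],
     [2, 1, 1], [2, 1, 3], [2, 1, 2], [2, 1, 1]]
y = ["SC_A"] * 4 + ["SC_B"] * 4


def toy_model(row: Sequence[int]) -> str:
    # Stands in for a trained forest; it only looks at gene 0.
    return "SC_A" if row[0] == 1 else "SC_B"


print(permutation_importance(toy_model, X, y))
```

Genes the model ignores score exactly zero, while the gene it relies on scores above zero. With a real forest, the same loop ranks the 2135 genes by how much the classification of SC or Sero depends on them.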

SLIDE 13

CLASSIFICATION TREE - CLASSIC EXAMPLE

Training data:

outlook   humidity  windy  golf ok?
rainy     high      false  no
rainy     high      true   no
overcast  high      false  yes
sunny     high      false  yes
sunny     normal    false  yes
sunny     normal    true   no
overcast  normal    true   yes
rainy     high      false  no
rainy     normal    false  yes
sunny     normal    false  yes
rainy     normal    true   yes
overcast  high      true   yes
....      ....      ....   ....
(many more samples)

Classification tree (go play golf today?):

OUTLOOK
  sunny    → WINDY:    false → yes, true → no
  overcast → yes
  rainy    → HUMIDITY: high → no, normal → yes

Slide annotation: importance of predictor.
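Why does OUTLOOK end up at the root? At each node, a tree-growing algorithm picks the predictor whose split best separates the labels. The sketch below assumes Gini impurity as the split criterion. This is one common choice (e.g. in CART-style trees); the slide does not name one. It scores each predictor on the 12 rows visible on the slide, using one branch per category.

```python
# Gini-based choice of the root split on the 12 golf rows shown on the slide.
# Assumption: Gini impurity as the split criterion, one branch per category.
from collections import Counter
from typing import Dict, List, Tuple

# (outlook, humidity, windy, golf ok?)
ROWS: List[Tuple[str, str, str, str]] = [
    ("rainy", "high", "false", "no"), ("rainy", "high", "true", "no"),
    ("overcast", "high", "false", "yes"), ("sunny", "high", "false", "yes"),
    ("sunny", "normal", "false", "yes"), ("sunny", "normal", "true", "no"),
    ("overcast", "normal", "true", "yes"), ("rainy", "high", "false", "no"),
    ("rainy", "normal", "false", "yes"), ("sunny", "normal", "false", "yes"),
    ("rainy", "normal", "true", "yes"), ("overcast", "high", "true", "yes"),
]
PREDICTORS = ["outlook", "humidity", "windy"]


def gini(labels: List[str]) -> float:
    """1 minus the sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(rows: List[Tuple[str, ...]], j: int) -> float:
    """Parent Gini minus the size-weighted Gini of the children of a split on j."""
    groups: Dict[str, List[str]] = {}
    for r in rows:
        groups.setdefault(r[j], []).append(r[-1])
    children = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return gini([r[-1] for r in rows]) - children


gains = {name: impurity_decrease(ROWS, j) for j, name in enumerate(PREDICTORS)}
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<9} {gain:.3f}")
```

On these rows, OUTLOOK gives the largest impurity decrease (about 0.119), ahead of HUMIDITY (about 0.056) and WINDY (about 0.006). This is consistent with OUTLOOK sitting at the root of the tree. Summing such decreases over all nodes and trees is one way a forest scores predictor importance.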

SLIDE 14

RANDOM FOREST - TOY EXAMPLE

[Figure: the golf dataset from the previous slide is resampled into bootstrap samples (...); a classification tree is grown on each sample and the solutions are assembled]

Outputs
→ prediction accuracy
→ error rates (e.g. false positives)
→ predictor variable importance
→ etc.

RF increases prediction accuracy and allows robust error estimation: the ensemble of slightly different classification results adjusts for the instability of individual trees and limits overfitting to the data.
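The bootstrap-and-assemble idea can be sketched with the same 12 golf rows. This is a deliberately simplified forest, assumed for illustration rather than taken from the study. Each tree is a single split (a decision stump) on one randomly chosen predictor, fitted to a bootstrap resample, and predictions are a majority vote. Rows left out of a tree's bootstrap sample (out-of-bag rows) give the error estimate. Full Random Forests grow deep trees and draw a fresh random subset of predictors at every split.

```python
# Simplified random forest: bootstrap samples, a random predictor per tree,
# one-split trees (decision stumps), majority vote, and an out-of-bag (OOB)
# error estimate. Real Random Forests grow deep trees and resample
# predictors at every split; this is a teaching sketch.
import random
from collections import Counter
from typing import Callable, List, Set, Tuple

Row = Tuple[str, ...]  # predictor values, then the label in the last slot
Tree = Callable[[Row], str]

# The 12 golf rows visible on the slide: (outlook, humidity, windy, golf ok?)
ROWS: List[Row] = [
    ("rainy", "high", "false", "no"), ("rainy", "high", "true", "no"),
    ("overcast", "high", "false", "yes"), ("sunny", "high", "false", "yes"),
    ("sunny", "normal", "false", "yes"), ("sunny", "normal", "true", "no"),
    ("overcast", "normal", "true", "yes"), ("rainy", "high", "false", "no"),
    ("rainy", "normal", "false", "yes"), ("sunny", "normal", "false", "yes"),
    ("rainy", "normal", "true", "yes"), ("overcast", "high", "true", "yes"),
]


def gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: List[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows: List[Row], features: List[int]) -> Tree:
    """Split on the candidate predictor with the lowest weighted Gini."""
    best = None
    for j in features:
        groups: dict = {}
        for r in rows:
            groups.setdefault(r[j], []).append(r[-1])
        score = sum(len(g) / len(rows) * gini(g) for g in groups.values())
        if best is None or score < best[0]:
            best = (score, j, {v: majority(g) for v, g in groups.items()})
    _, feat, leaves = best
    fallback = majority([r[-1] for r in rows])  # category unseen in bootstrap
    return lambda x: leaves.get(x[feat], fallback)


def fit_forest(rows: List[Row], n_trees: int = 25, mtry: int = 1,
               seed: int = 0) -> Tuple[List[Tree], List[Set[int]]]:
    rng = random.Random(seed)
    n_features = len(rows[0]) - 1
    trees, oob_sets = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        features = rng.sample(range(n_features), mtry)  # random predictors
        trees.append(fit_stump([rows[i] for i in idx], features))
        oob_sets.append(set(range(len(rows))) - set(idx))  # rows left out
    return trees, oob_sets


def predict(trees: List[Tree], x: Row) -> str:
    return majority([t(x) for t in trees])


def oob_error(rows: List[Row], trees: List[Tree], oob_sets: List[Set[int]]) -> float:
    """Score each row using only the trees that never saw it in training."""
    wrong = scored = 0
    for i, r in enumerate(rows):
        votes = [t(r) for t, oob in zip(trees, oob_sets) if i in oob]
        if votes:
            scored += 1
            wrong += majority(votes) != r[-1]
    return wrong / scored if scored else float("nan")


trees, oob_sets = fit_forest(ROWS, n_trees=25, mtry=1, seed=1)
print("OOB error estimate:", oob_error(ROWS, trees, oob_sets))
print("overcast, high, true ->", predict(trees, ("overcast", "high", "true")))
```

With only 12 rows and 25 stumps the OOB estimate is noisy. The point is the mechanism: every row is scored only by trees that never saw it during training. This is why the error estimate does not need a separate test set.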

SLIDE 15

SEROTYPE PREDICTION SUCCESS

Good classification overall: lower success for some serotypes was related to small sample sizes (e.g. N=8 for 16F, N=5 for 17F, N=1 for 21).
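Why does small N matter? A success rate computed from one or a handful of samples is very uncertain. The sketch below tallies per-class prediction success and attaches a Wilson score interval. This is a standard binomial confidence interval, chosen here for illustration; the slide does not report intervals. The serotype labels and predictions are hypothetical.

```python
# Sketch: per-class (e.g. per-serotype) prediction success, plus a Wilson
# score interval showing how uncertain a success rate is when a class has
# few samples. The labels below are hypothetical, not study data.
import math
from collections import Counter
from typing import Dict, List, Tuple


def per_class_success(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[int, int]]:
    """Map each true class to (number of samples, number predicted correctly)."""
    n = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: (n[c], correct[c]) for c in n}


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for k successes out of n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


y_true = ["sero_X"] * 6 + ["sero_Y"] * 4 + ["sero_Z"]
y_pred = ["sero_X"] * 6 + ["sero_Y", "sero_Y", "sero_Y", "sero_X"] + ["sero_Y"]
for cls, (n, k) in per_class_success(y_true, y_pred).items():
    lo, hi = wilson_interval(k, n)
    print(f"{cls}: {k}/{n} correct, 95% interval {lo:.2f}-{hi:.2f}")
```

With these toy labels, a class with 6 of 6 correct has an interval of roughly 0.61 to 1.00. A class with a single sample spans roughly 0 to 0.79, so one miss on that sample says little about the classifier.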

SLIDE 16

GENES PREDICTING SEROTYPE

Top-scoring genes:
→ 38% were located within 10 genes upstream or downstream of the capsular locus
→ the remaining 62% had compelling support for a functional background that mediates competitive interactions or niche specialization
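The 38% figure is a positional count. The sketch below makes three assumptions: genes are indexed by their order along the reference genome, genes inside the capsular locus count as "near", and the circular genome can be treated as linear away from the origin. The locus boundaries and gene positions are hypothetical.

```python
# Sketch: share of top-scoring genes within a fixed number of genes of the
# capsular locus. Genes are identified by their order along the reference
# genome; positions and locus boundaries below are hypothetical, genes
# inside the locus count as "near", and wrap-around is ignored.
from typing import Iterable


def fraction_near_locus(positions: Iterable[int], locus_start: int,
                        locus_end: int, flank: int = 10) -> float:
    """Fraction of gene positions within `flank` genes of [locus_start, locus_end]."""
    pos = list(positions)
    near = [p for p in pos if locus_start - flank <= p <= locus_end + flank]
    return len(near) / len(pos)


top_genes = [295, 305, 328, 900, 1500, 2000, 50, 1200]  # hypothetical indices
print(fraction_near_locus(top_genes, locus_start=300, locus_end=317))  # 0.25
```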

SLIDE 17

SEQUENCE CLUSTER PREDICTION SUCCESS

Perfect classification.

SLIDE 18

GENES PREDICTING SEQUENCE CLUSTER

Top-scoring genes:
→ 75% were randomly distributed along the genome (as expected)
→ 10 genes were contiguous and within the groESL operon (p-value = 1.52×10⁻⁶)

Croucher et al. Nature Genetics 2013
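The slide reports a p-value for the 10 contiguous genes but not how it was obtained. One simple null model, assumed here for illustration only, asks this question: if n top-scoring genes were scattered uniformly among the 2135 genes of a linearised genome, how likely is it that at least k of them form a consecutive run? This can be computed exactly by counting arrangements. The model ignores genome circularity and does not require the run to fall at groESL. The value of n used below is hypothetical, so the output is not the study's p-value.

```python
# Sketch of one simple null model for "k contiguous top genes": if n of the
# G genes along a linearised genome were picked uniformly at random, how
# likely is a run of at least k consecutive picks? The slides do not say how
# their p-value was computed; n below is hypothetical.
from math import comb


def count_without_long_run(G: int, n: int, k: int) -> int:
    """Ways to choose n of G ordered genes with no k (or more) consecutive.

    The G - n unchosen genes leave G - n + 1 gaps; each gap may hold at most
    k - 1 chosen genes. Inclusion-exclusion over gaps holding >= k.
    """
    m = G - n + 1
    return sum((-1) ** j * comb(m, j) * comb(n - j * k + m - 1, m - 1)
               for j in range(n // k + 1))


def p_run_at_least(G: int, n: int, k: int) -> float:
    """P(longest run of consecutive chosen genes >= k); no wrap-around."""
    total = comb(G, n)
    return (total - count_without_long_run(G, n, k)) / total


print(p_run_at_least(5, 2, 2))        # 0.4: 4 of the 10 pairs are adjacent
print(p_run_at_least(2135, 40, 10))   # hypothetical n = 40 top genes
```

Exact integer counting avoids the resolution limit of a permutation test. Resolving probabilities near 10⁻⁶ by simulation would need millions of random placements.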

SLIDE 19

DISCORDANT GENES PREDICTING SERO & SC

SLIDE 20

IMPLICATIONS / CONCLUSIONS (SO FAR)

→ major lineages are determined by variation in the groESL operon (and were determined, or locked in, long ago)
→ minor lineages are determined by variation in and around the capsular locus (and were determined more recently and are known to be in constant flux)

Croucher et al. Nature Genetics 2013

SLIDE 21

GROESL?

SLIDE 22

GROESL OPERON

Operon
→ encodes groEL (chaperonin) and groES (cofactor)

groEL / groES protein complex
→ 'a nano-cage for protein folding'

Most of what we know comes from E. coli
→ groEL is a heat-shock protein (stress protein)
→ at least 50 essential proteins need groESL for folding
→ groESL is required for cell viability

Hayer-Hartl M et al. Trends in Biochemical Sciences 2016

SLIDE 23

GROEL IS UNIVERSAL

groEL is present across bacterial species
→ the only exceptions found are 2 species of mycoplasma (intracellular)

groEL is a homolog of HSP60
→ HSP60 is present across the kingdoms, including plants and vertebrates

groEL and HSP60 are functionally, structurally and genetically similar

Wong P et al. Journal of Structural Biology 2004

SLIDE 24

FUNCTIONS / LOCALISATION / RESPONSES

groEL on the surface of the cell
→ promotes attachment to epithelial cells of various tissues
→ various types of shock / stress enhance expression and surface localization
→ in animal models, increased surface localization can lead to more severe disease
→ (...)

Free-form groEL
→ intracellular bacteria overexport groEL upon macrophage infection
→ LOX-1 is the macrophage receptor (recognition induces overexpression of LOX-1)
→ groEL is a potent signal for pro-inflammatory responses
→ (...)

Responses (vertebrates)
→ groEL is highly immunogenic (as are other HSPs)
→ immune responses (cellular, humoral) to groEL variants can be highly specific
→ groEL has been proposed as a vaccine candidate for the pneumococcus

SLIDE 25

WRAPPING UP KEY TAKE-HOME MESSAGES

SLIDE 26

RECAPITULATE RESULTS

Capsular locus
→ determines 'serotype'
→ is under immune selection
→ is locally epistatically linked

groESL operon
→ determines 'lineage'
→ is under immune selection
→ is necessary for folding of essential proteins, which implies it is globally epistatically linked

SLIDE 27

IMPLICATIONS FOR VACCINE / CONTROL

PCV-like vaccine with 2 variants
→ targets particularly successful serotypes
→ allows less successful serotypes to expand in the post-vaccination era

groEL-based vaccine with 2 variants
→ targets major lineages
→ does not allow less successful serotypes to expand in the post-vaccination era

SLIDE 28

TAKE-HOME MESSAGES

What determines (universal) genetic population structure?

→ Selection acts on multiple loci, such as groESL, creating major lineages
→ (Semi-)universal patterns of major & minor lineages emerge in whole-genome data
→ Strong selection on groEL, and the epistatic dependency of the rest of the genome on it, suggest that a groEL-based vaccine would be a promising alternative to vaccines targeting the capsular polysaccharide (CPS)
→ Selection acts on multiple loci, such as the CPS, creating minor lineages
→ Some patterns (major lineages) locked in early, while others (minor lineages) are recent & in continuous flux

SLIDE 29

REFERENCES, PEOPLE, INSTITUTIONS

People involved
Sunetra Gupta, Uri Obolski, Martin C.J. Maiden, Eleanor R. Watkins, Samuel J. Peacock, Callum Morris, Neil French, Robert S Heyderman, Todd D. Swarthout, Andrea Gori

Special thanks
Angela Brueggemann, Andries van Tonder, Richard Moxon

Group contacts
EEID website: www.eeid.ox.ac.uk
EEID on Twitter: @EEID_oxford
JL on Twitter: @LourencoJML

Institutions & Programmes
Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Liverpool School of Tropical Medicine, University College London, University of Oxford, University of Liverpool

Funding agencies