Identifying Biologically Relevant Amino Acids in Immunogenetic Studies
Richard M. Single
Department of Mathematics and Statistics, University of Vermont
- HLA background and nomenclature
- Asymmetric Linkage Disequilibrium (ALD)
– Motivation, Definition & Example
- Amino acid level analyses of HLA disease associations
– SFVT Analysis & Pairwise allele level analyses
– Conditional Haplotype analyses & ALD
- Identifying units of selection
– ALD as a tool
Outline
[Figure: HLA class I and HLA class II molecules presenting a peptide fragment to the T-cell receptor. TCR = T-cell receptor; β2m = β2-microglobulin]
HLA molecules are cell-surface proteins that present peptide fragments to T-cells
- HLA molecules bind specific sets of peptides (based on structure)
- Any given HLA allele encodes a molecule that presents only a subset of the available peptides to T-cells
HLA-A*24:02:01:02L
- Locus: HLA-A
- Field 1 (“2-digit”): serological level (where possible)
- Field 2 (“4-digit”): peptide level (amino acid difference)
- Field 3 (“6-digit”): nucleotide level [silent] (synonymous substitutions)
- Field 4 (“8-digit”): intron level (3’ or 5’ polymorphism)
- Expression suffix: N = null, L = low, S = soluble, …
- For most analyses, we want to distinguish among unique peptide sequences,
i.e., 2 fields (“4-digit”) level
- This level of resolution treats alleles with the same peptide sequence for
exons 2 & 3 (class I) or exon 2 (class II) as being equivalent [“binning” alleles]
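The reduction to the two-field level can be sketched in a few lines of Python. This is only an illustration: the names `parse_allele` and `two_field` are hypothetical, the pattern handles only colon-delimited names, and truncation discards the expression suffix, which a real analysis may need to keep.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches colon-delimited names such as "HLA-A*24:02:01:02L" or "DRB1*08:01".
_ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<locus>[A-Z0-9]+)\*"
    r"(?P<fields>\d+(?::\d+){0,3})"
    r"(?P<suffix>[A-Z]?)$"
)


class HlaAllele(NamedTuple):
    locus: str
    fields: Tuple[str, ...]   # up to four fields, kept as strings ("02", not 2)
    suffix: Optional[str]     # expression suffix such as N, L or S, if present


def parse_allele(name: str) -> HlaAllele:
    """Split an allele name into locus, fields and expression suffix."""
    m = _ALLELE_RE.match(name.strip())
    if m is None:
        raise ValueError(f"not a colon-delimited HLA allele name: {name!r}")
    return HlaAllele(m["locus"], tuple(m["fields"].split(":")), m["suffix"] or None)


def two_field(name: str) -> str:
    """Truncate to the peptide ("4-digit") level; the suffix is dropped."""
    allele = parse_allele(name)
    return f"{allele.locus}*{':'.join(allele.fields[:2])}"


print(two_field("HLA-A*24:02:01:02L"))  # A*24:02
```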
HLA Allele Nomenclature
HLA Nomenclature and why it matters
- Challenges for HLA data management and analysis
– The HLA genes are very polymorphic
– HLA nomenclature is complicated
– There are multiple ways to generate HLA data
– All common typing systems generate ambiguous data
– There are multiple ways to report alleles and ambiguities
These issues make meta-analyses of HLA data from different sources very difficult.
Extending STREGA to Immunogenomic Studies
- The STrengthening the REporting of Genetic Association studies
(STREGA) statement provides community-based data reporting and analysis standards for genomic disease association studies
- The IDAWG (immunogenomics.org) has proposed an extension of
STREGA: STrengthening the REporting of Immunogenomic Studies (STREIS)
From STREGA to STREIS
Extensions to the STREGA guidelines for immunogenomic data include:
- Describing the system(s) used to store, manage, and validate genotype
and allele data
- Documenting all methods applied to resolve ambiguity
- Defining any codes used to represent ambiguities
- e.g., NMDP codes
- A*0201/0209/0266 = A*02AJEY
- A*0201/0209/0266/0275/0289 = A*02BSFJ
- Describing any binning or combining of alleles into common categories
- e.g., G-codes
- A*0201/0209/0243N/0266/0275/0283N/0289 = “A020101g”
- Avoiding the use of subjective terms (e.g., high-resolution typing) that
may change over time
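One way to document the codes used in a study is to ship an explicit table with the data and expand each code programmatically. The sketch below uses only the two NMDP examples listed above; the table `STUDY_CODES` and the function names are hypothetical, and this is not an NMDP tool.

```python
from typing import Dict, List

# Study-specific code table, limited to the two examples given on the slide.
STUDY_CODES: Dict[str, str] = {
    "A*02AJEY": "A*0201/0209/0266",
    "A*02BSFJ": "A*0201/0209/0266/0275/0289",
}


def expand_ambiguity(ambiguity: str) -> List[str]:
    """Turn 'A*0201/0209/0266' into ['A*0201', 'A*0209', 'A*0266']."""
    locus, sep, rest = ambiguity.partition("*")
    if not sep or not rest:
        raise ValueError(f"expected LOCUS*allele/allele/...: {ambiguity!r}")
    return [f"{locus.strip()}*{code.strip()}" for code in rest.split("/")]


def expand_code(code: str) -> List[str]:
    """Look a code up in the documented table and expand it."""
    return expand_ambiguity(STUDY_CODES[code])


print(expand_code("A*02AJEY"))
```

Keeping the table next to the data means a later reader can resolve every code without knowing which release of an external code list was in use.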
- Immunology Database and Analysis Portal (www.ImmPort.org)
Developed under the Bioinformatics Integration Support Contract (BISC) for NIH, NIAID, & DAIT (Division of Allergy, Immunology, and Transplantation)
– Data validation pipeline
– Analysis tools
– Standardized ambiguity reduction tools
– Data from a large number of immunogenomic studies
- ImmunoGenomics Data Analysis Working Group (www.immunogenomics.org, www.IgDAWG.org)
An international collaborative group working to
– facilitate the sharing of immunogenomic data (HLA, KIR, etc.) and
– foster consistent analysis and interpretation of immunogenomic data
Resources for HLA Data Validation & Analysis
Asymmetric Linkage Disequilibrium (ALD)
- Standard LD measures give an incomplete description of the correlation of genetic
variation at two loci when there are different numbers of alleles at the loci.
- We developed a pair of conditional asymmetric LD (ALD) measures that more
accurately capture this information.
- For disease association studies, the ALD can help to identify when stratification
analyses can be applied to detect primary disease predisposing genes.
- For evolutionary studies, the ALD can be informative for the study of forces such
as selection acting on individual amino acids, or other loci in high LD.
- For SNP studies, ALD measures can be used for analyses of LD between haplotype
blocks, for SNP–gene LD, and for haplotype block–gene LD.
The two most common measures of the strength of LD are:
(1) the normalized measure of the individual LD values, $D'_{ij} = D_{ij} / D_{max}$ (Lewontin 1964); and
(2) the correlation coefficient r for bi-allelic data, most often reported as $r^2 = D^2 / (p_{A_1} p_{A_2} p_{B_1} p_{B_2})$.
r = 1 only when the allelic variations at the two loci show 100% correlation.
Their multi-allelic extensions are:
$$D' = \sum_{i=1}^{I} \sum_{j=1}^{J} p_{A_i} p_{B_j} \, |D'_{ij}|$$
$$W_n = \left[ \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} D_{ij}^2 / (p_{A_i} p_{B_j})}{\min(I-1,\, J-1)} \right]^{1/2} = \left[ \frac{X^2_{LD}}{2N \min(I-1,\, J-1)} \right]^{1/2}$$
Linkage Disequilibrium (LD) Measures
- When there are different numbers of alleles at two loci, the direct correlation property
for the r measure is not retained.
- The asymmetric LD (ALD) measures more accurately reflect covariation at two loci.
- WA/B and WB/A describe variation observed at the 1st locus conditioned on the 2nd
- Example (two and three alleles at the A and B loci):
f(A1B1) = 0.3, f(A2B2) = 0.5, f(A2B3) = 0.2
Wn = 1, WA/B = 1, and WB/A = 0.73
There is variation at the B locus on haplotypes containing the A2 allele, so there is not 100% correlation.
- ALD measures indicate that, with appropriate sample size, stratification analyses
could be carried out for some comparisons.
- Wn = 1 could result in passing over these data for conditional analyses.
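The example values can be checked with a short Python sketch that computes Wn and both ALD measures from haplotype frequencies, following the definitions in Thomson and Single (2014). The function name `ld_summary` and the dictionary layout are illustrative choices, not the authors' software, and the sketch assumes both loci are polymorphic.

```python
import math
from collections import defaultdict


def ld_summary(hap_freqs):
    """hap_freqs maps (allele at A, allele at B) -> haplotype frequency."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), f in hap_freqs.items():
        p_a[a] += f
        p_b[b] += f

    # Single-locus homozygosity at each locus.
    f_a = sum(p * p for p in p_a.values())
    f_b = sum(q * q for q in p_b.values())

    # Overall weighted haplotype-specific homozygosity, F_A/B and F_B/A.
    f_a_given_b = sum(f * f / p_b[b] for (a, b), f in hap_freqs.items())
    f_b_given_a = sum(f * f / p_a[a] for (a, b), f in hap_freqs.items())

    w_a_given_b = math.sqrt(max(0.0, (f_a_given_b - f_a) / (1.0 - f_a)))
    w_b_given_a = math.sqrt(max(0.0, (f_b_given_a - f_b) / (1.0 - f_b)))

    # Symmetric Wn: sum over all allele pairs, including unobserved haplotypes.
    total = 0.0
    for a, pa in p_a.items():
        for b, pb in p_b.items():
            d = hap_freqs.get((a, b), 0.0) - pa * pb
            total += d * d / (pa * pb)
    w_n = math.sqrt(total / min(len(p_a) - 1, len(p_b) - 1))

    return {"Wn": w_n, "W_A/B": w_a_given_b, "W_B/A": w_b_given_a}


example = {("A1", "B1"): 0.3, ("A2", "B2"): 0.5, ("A2", "B3"): 0.2}
for name, value in ld_summary(example).items():
    print(name, round(value, 2))
```

For these frequencies the sketch gives Wn = 1, WA/B = 1 and WB/A ≈ 0.73, matching the slide: knowing the B allele fixes the A allele, but not the other way round.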
Asymmetric LD measures: WA/B and WB/A
Standard LD measures D’ and Wn
Standard LD measures (overall D’ & Wn) assume/force symmetry, even though with >2 alleles per locus that is not the case
Data Source: ImmPort Study #SDY26: Identifying polymorphisms associated with risk for the development of myopericarditis following smallpox vaccine
Asymmetric Linkage Disequilibrium (ALD)
Interpretation:
- ALD for HLA-DRB1 conditioning on HLA-DQA1: W_DRB1/DQA1 = 0.58
- ALD for HLA-DQA1 conditioning on HLA-DRB1: W_DQA1/DRB1 = 0.95
- The overall variation for DRB1 is relatively high given specific DQA1 alleles.
- The overall variation for DQA1 is relatively low given specific DRB1 alleles.
ALD row gene conditional on column gene
Asymmetric Linkage Disequilibrium (ALD)
Table 1. Linkage disequilibrium and genetic diversity measures (description: definition)
- 1. Single-locus homozygosity (F): $F_A = \sum_i p_{A_i}^2$
- 2. Haplotype-specific homozygosity (HSF): $F_{A/B_j} = \sum_i (f_{ij} / p_{B_j})^2$
- 3. Overall weighted HSF values, $F_{A/B}$ (and $F_{B/A}$): $F_{A/B} = \sum_j F_{A/B_j} \, p_{B_j} = F_A + \sum_i \sum_j D_{ij}^2 / p_{B_j}$
- 4. Multi-allelic ALD squared, $W_{A/B}$ (and $W_{B/A}$): $W_{A/B}^2 = (F_{A/B} - F_A) / (1 - F_A)$
Thomson and Single (2014) Genetics
Asymmetric Linkage Disequilibrium (ALD)
If both loci are bi-allelic:
$$W_{A/B}^2 = \frac{\sum_i \sum_j D_{ij}^2 / p_{B_j}}{1 - F_A} = \frac{D^2}{p_{A_1} p_{A_2} p_{B_1} p_{B_2}} = r^2,$$
since $D_{11} = -D_{12} = -D_{21} = D_{22} = D$.
Thomson and Single (2014) Genetics
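The bi-allelic identity is easy to confirm numerically. The sketch below uses made-up haplotype frequencies (0.4, 0.1, 0.2, 0.3) and a hypothetical function name; it computes r² directly and the squared ALD from the homozygosity definitions, so the two can be compared.

```python
def biallelic_r2_and_ald2(f11, f12, f21, f22):
    """Return (r^2, W_A/B^2) for a 2 x 2 table of haplotype frequencies."""
    p_a1, p_a2 = f11 + f12, f21 + f22
    p_b1, p_b2 = f11 + f21, f12 + f22

    # Classical bi-allelic LD coefficient and r^2.
    d = f11 - p_a1 * p_b1
    r2 = d * d / (p_a1 * p_a2 * p_b1 * p_b2)

    # Squared ALD from homozygosity (F_A) and weighted HSF (F_A/B).
    f_a = p_a1 ** 2 + p_a2 ** 2
    f_a_given_b = (f11 ** 2 + f21 ** 2) / p_b1 + (f12 ** 2 + f22 ** 2) / p_b2
    ald2 = (f_a_given_b - f_a) / (1.0 - f_a)
    return r2, ald2


r2, ald2 = biallelic_r2_and_ald2(0.4, 0.1, 0.2, 0.3)  # hypothetical frequencies
print(round(r2, 4), round(ald2, 4))
```

The algebra behind the agreement is that 1 − F_A = 2 p_A1 p_A2 and the numerator sums to 2D² / (p_B1 p_B2), so the factor of 2 cancels.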
Other Conditional Measures of LD
- Other conditional measures of LD have been proposed (Nei and Li, 1980;
Chakravarti et al., 1984; Hudson, 1985; Kaplan and Weir, 1992; Guo, 1997).
- They measure association between alleles at a marker locus (locus B) and alleles
at a disease locus (locus A).
- They were developed to account for study designs in which individuals are not
randomly sampled from a single population, but where sampling intensity varies within disease categories.
- They are equivalent to Somers' D statistic defined on the contingency table
relating two categorical variables.
- In contrast, our statistic is a population-based measure that does not depend on a
specific patient sampling scheme.
ALD & tag-SNPs in the HLA region
- de Bakker et al. (2006) identified tag-SNPs based on r2 between SNPs and recoded HLA
alleles (each HLA allele recoded as presence/absence)
de Bakker et al. (2006) Nature Genetics
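The presence/absence recoding can be illustrated with a toy Python sketch. The eight phased haplotypes below are invented for the example, and `r_squared` is a plain squared Pearson correlation; this is not the published tag-SNP pipeline.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 0/1 sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    if vx == 0 or vy == 0:
        return float("nan")  # a monomorphic column carries no LD information
    return cov * cov / (vx * vy)


# Hypothetical phased haplotypes: (SNP allele, HLA-DRB1 allele).
haplotypes = [
    ("T", "DRB1*15:01"), ("T", "DRB1*15:01"), ("C", "DRB1*03:01"),
    ("C", "DRB1*07:01"), ("T", "DRB1*15:01"), ("C", "DRB1*03:01"),
    ("C", "DRB1*04:01"), ("C", "DRB1*07:01"),
]

snp = [int(s == "T") for s, _ in haplotypes]
hla = [h for _, h in haplotypes]

# One presence/absence column per HLA allele, each correlated with the SNP.
tag_r2 = {allele: r_squared(snp, [int(h == allele) for h in hla])
          for allele in sorted(set(hla))}
for allele, value in tag_r2.items():
    print(allele, round(value, 3))
```

In this toy data the SNP tags DRB1*15:01 perfectly (r² = 1), but each recoded column describes one allele at a time, which is the limitation the multi-allelic ALD measures are meant to address.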
ALD & tag-SNPs in the HLA region
Thomson and Single (2014) Genetics
Risk Category | DRB1 | patients | controls | OR
I | *08:01 | 102 | 13 | 6.9
I | *11:04 | 57 | 11 | 4.3
II | *13:01 | 90 | 38 | 1.9
II | *11:01 | 60 | 36 | 1.3
II | *01:01 | 74 | 50 | 1.2
II | *03:01 | 89 | 61 | 1.1
II | *13:02 | 28 | 23 | 0.9
III | *04:04 | 7 | 16 | 0.3
III | *15:01 | 38 | 80 | 0.3
III | *07:01 | 30 | 65 | 0.3
III | *04:01 | 21 | 47 | 0.3
sum | | 596 | 440 |
total | | 708 | 546 |
Overall p-value < 2.6E-27
Juvenile Idiopathic Arthritis oligoarticular persistent (JIA-OP) Common HLA-DRB1 alleles
AA 86 implicated via pairwise within serogroup analysis
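The slide does not say how the odds ratios were calculated. Assuming each allele is compared with all other alleles pooled, using the totals of 708 patient and 546 control alleles, the tabulated ORs are reproduced to one decimal place. A minimal Python sketch under that assumption:

```python
# Allele counts from the JIA-OP table: allele -> (patients, controls).
COUNTS = {
    "*08:01": (102, 13), "*11:04": (57, 11), "*13:01": (90, 38),
    "*11:01": (60, 36), "*01:01": (74, 50), "*03:01": (89, 61),
    "*13:02": (28, 23), "*04:04": (7, 16), "*15:01": (38, 80),
    "*07:01": (30, 65), "*04:01": (21, 47),
}
TOTAL_PATIENTS, TOTAL_CONTROLS = 708, 546


def odds_ratio(patients, controls):
    """OR for one allele versus all other alleles pooled (an assumption)."""
    odds_patients = patients / (TOTAL_PATIENTS - patients)
    odds_controls = controls / (TOTAL_CONTROLS - controls)
    return odds_patients / odds_controls


ORS = {allele: odds_ratio(*counts) for allele, counts in COUNTS.items()}
for allele, value in ORS.items():
    print(f"DRB1{allele}: OR = {value:.1f}")
```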
Sequence Feature Variant Type (SFVT) Analysis - Overview
- An exploratory approach for genetic association studies that uses combinations of
amino acid (AA) residues as the unit of analysis.
- Goal:
– To identify biologically relevant amino acid (AA) residues that account for the major disease risk attributable to HLA
- Genes/proteins are sub-divided into biologically relevant units affecting gene
expression and/or protein function (i.e., Sequence Features)
– Polymorphic AAs (single AA sites)
– Structural features (e.g., beta 1 domain, alpha-helix 2, …)
– Functional features (e.g., peptide binding, T-cell interacting, …)
– Combinational (e.g., alpha-helix 2 & peptide binding, …)
www.immport.org
Summary of SFVT Analysis
- HLA typing (allele level)
- Group HLA alleles based on structural/functional sequence motifs (Sequence Features)
- Perform disease association tests based on sequence motifs (Sequence Feature level)
- Choose the top Sequence Features associated with disease risk for further study
- Identify individual AAs & combinations of AAs directly involved in disease risk
Supporting analyses: ORs & p-values, LD patterns, conditional/stratification analyses
Representative Sequence Features: HLA-DRB1
Table from Karp et al. (2010) Hum Molec Genet
Sequence Feature ID | Sequence Feature Name | Sequence Feature Type | Amino Acid Position(s) | # of Variant Types
HLA-DRB1_SF1 | allele | Standard Allele Designation | NA | 497
HLA-DRB1_SF4 | mature protein | Structural - Complete protein | 1..237 | 52
HLA-DRB1_SF5 | beta 1 domain | Structural - Domain | 1..95 | 69
HLA-DRB1_SF12 | loop between beta-strands 1 & 2 | Structural - Secondary structure motif | 19, 20, 21, 22 | 5
HLA-DRB1_SF13 | beta-strand 2 | Structural - Secondary structure motif | 23..32 | 28
HLA-DRB1_SF21 | alpha-helix 2 | Structural - Secondary structure motif | 65..72 | 29
HLA-DRB1_SF128 | T cell receptor binding | Functional | 60, 64, 65, 66, 67, 69, 70, 71, 73, 76, 77, 78, 80, 81, 82, 84, 85 | 81
HLA-DRB1_SF137 | peptide antigen binding pocket 7 | Functional | 28, 30, 47, 61, 67, 71 | 53
HLA-DRB1_SF163 | alpha-helix 2_peptide antigen binding | Structural_Functional Combination | 67, 70, 71 | 21
HLA-DRB1_SF164 | alpha-helix 2_T cell receptor binding | Structural_Functional Combination | 65, 66, 67, 69, 70, 71 | 24
Variant Types for HLA-DRB1_SF153 “beta-strand 2_peptide antigen binding”
… 5 of 11 Variant Types (VTs) for Sequence Feature 153 (SF153)
- DRB1_SF153_VT1 (LEC): DRB1*0101, 0102, 0103, 0104, 0105, …
- DRB1_SF153_VT2 (FEL): DRB1*0113, 0701, 0703, 0704, 0705, …
- DRB1_SF153_VT3 (YDY): DRB1*0301, 0304, 0305, 0306, 0308, …
Karp et al 2010 Hum Mol Gen
DRB1: AAs 13, 67, 37, 57, 74, 86 in binding pockets 6, 4, 7, and 9
Sequence Feature | DRB1 Amino Acids | p-value | ORmax | ORmin
AA position 13 | 13 | 2.00E-28 | 4.9 | 0.33
Pocket 6 | 11, 13, 30 | 4.00E-28 | 7.1 | 0.31
Pocket 4 | 13, 26, 28, 70, 71, 74, 78 | 6.00E-28 | 6.8 | 0.28
DRB1 allele | 9…86 | 1.00E-27 | 9.4 | 0.28
Pocket 7 | 28, 30, 47, 61, 67, 71 | 9.00E-27 | 9.4 | 0.28
AA positions X-LD | [11, 12, 10, 16] | 9.00E-25 | 3.2 | 0.33
AA position 67 | 67 | 3.00E-17 | 3.4 | 0.54
Pocket 9 | 9, 37, 57 | 4.00E-16 | 3.9 | 0.33
AA position 74 | 74 | 4.00E-16 | 6.8 | 0.33
AA position 37 | 37 | 4.00E-13 | 1.8 | 0.34
AA position 57 | 57 | 6.00E-13 | 3.9 | 0.44
… | … | … | … | …
AA position 86 | 86 | ns | 1.1 | 0.9
AAs underlined have a potential effect on disease risk; the effect of those in italics may be explained by LD with AA 13. Note that AA 86 is not significant (ns) by SFVT analysis.
SFVT analysis DRB1 summary for JIA-OP
SFVT Analysis - Summary
- An exploratory approach for identifying biologically relevant AAs in
HLA association studies
- Pros
– Utilizes information about the inter-relationships among HLA alleles
– Covers more extended protein regions than single amino acid-based analyses
- Cons
– Care is needed to address complex patterns of LD among AAs and SFs in order to identify AAs directly involved in disease
– Due to multiple comparisons with highly correlated SFs, appropriate p-value adjustments are necessary
– The effects of some amino acids (or combinations) may be missed, so complementary analyses are useful
DRB1 Amino Acids 13 and 67
13 - 67 | patients | controls | OR
G - F | 108 | 14 | 6.8
S - F | 130 | 49 | 2.3
S - I | 131 | 71 | 1.5
G - I | 13 | 8 | 1.3
S - L | 102 | 80 | 1.0
R - I | 44 | 91 | 0.2
others | 270 | 233 |
p < 8E-9: AA 13 involved, or an AA in LD
overall p < 2E-28
Conditional Haplotype Analysis of JIA-OP
From the same AA 13 - AA 67 haplotype counts:
- p < 0.002: AA 67 involved, or an AA in LD
- p < 0.001: AA 67 involved, or an AA in LD
An extensive set of CH analyses is required, as well as consideration of LD patterns.
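A conditional haplotype comparison holds one amino acid fixed and asks whether risk still varies with the other. The slides do not state which test or which haplotype pairs produced the quoted p-values, so the Python sketch below is an assumption: a Pearson chi-square (1 df, no continuity correction) on a pair of haplotypes from the AA 13 - AA 67 table. The names `chi2_2x2` and `compare` are illustrative.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail, chi-square with 1 df
    return stat, p_value


# AA 13 - AA 67 haplotype counts from the table: (patients, controls).
HAPLOTYPES = {
    "G-F": (108, 14), "S-F": (130, 49), "S-I": (131, 71),
    "G-I": (13, 8), "S-L": (102, 80), "R-I": (44, 91),
}


def compare(hap1, hap2):
    """Test whether two haplotypes differ in their patient/control split."""
    (a, b), (c, d) = HAPLOTYPES[hap1], HAPLOTYPES[hap2]
    return chi2_2x2(a, b, c, d)


# Same AA 67 (I), different AA 13: a difference here points to AA 13.
stat, p = compare("S-I", "R-I")
print(f"S-I vs R-I: chi2 = {stat:.2f}, p = {p:.1e}")

# Same AA 13 (G), different AA 67: a difference here points to AA 67.
stat, p = compare("G-F", "G-I")
print(f"G-F vs G-I: chi2 = {stat:.2f}, p = {p:.1e}")
```

Under these assumptions the first comparison falls below the slide's p < 8E-9 threshold for AA 13 and the second below p < 0.002 for AA 67, but whether these are the pairs the author tested is a guess. Some expected counts in the G-I row are small, so an exact test may be preferable there.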
Conditional Haplotype Analysis of JIA-OP
LD for DRB1 AAs
[Figure: JIA controls. Panels: Asymmetric LD (ALD), row gene conditional on column gene; Wn (symmetric)]
Conditional Haplotype Analysis of JIA-OP
11_13 | Cases | Controls | OR | p-value
S-G | 121 | 22 | 4.89 | p<3.6E-06
S-S | 363 | 200 | 1.81 |
D-F | 9 | 6 | 1.15 | ns
L-F | 87 | 66 | 1.01 |
V-H | 46 | 84 | 0.38 |
P-R | 50 | 99 | 0.34 |
G-Y | 30 | 65 | 0.33 |
Total | 708 | 546 | |

12_13 | Cases | Controls | OR | p-value
T-G | 121 | 22 | 4.91 | p<3.6E-06
T-S | 363 | 200 | 1.82 |
K-F | 98 | 76 | 0.994 |
K-H | 46 | 84 | 0.382 | p<1.2E-05
K-R | 50 | 99 | 0.343 |
K-Y | 30 | 65 | 0.327 |
Total | 708 | 546 | |
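OR | DRB1 allele | AA 13 | AA 67 | AA 74 | AA 86 | AA 37 | AA 57
6.9 | DRB1*0801 | G | F | L | G | Y | S
4.3 | DRB1*1104 | S | F | A | V | Y | D
1.9 | DRB1*1301 | S | I | A | V | N | D
1.3 | DRB1*1101 | S | F | A | G | Y | D
1.2 | DRB1*0101 | F | L | A | G | S | D
1.1 | DRB1*0301 | S | L | R | V | N | D
0.9 | DRB1*1302 | S | I | A | G | N | D
0.3 | DRB1*0404 | H | L | A | V | Y | D
0.3 | DRB1*1501 | R | I | A | V | S | D
0.3 | DRB1*0701 | Y | I | Q | G | F | V
0.3 | DRB1*0401 | H | L | A | G | Y | D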
- These alleles show the strongest evidence for direct involvement in JIA-OP
disease risk
- The 6 identified AA sites uniquely define each allele, preventing further
stratification analyses
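The claim that the six sites separate all eleven common alleles can be checked by grouping alleles by their residues at a chosen set of positions, which is the same idea as forming variant types for a sequence feature. The Python sketch below uses only the residues tabulated above; `variant_types` is a hypothetical helper, not the ImmPort SFVT implementation.

```python
from collections import defaultdict

POSITIONS = (13, 67, 74, 86, 37, 57)

# Residues at POSITIONS, in that order, for the 11 common DRB1 alleles.
ALLELE_RESIDUES = {
    "DRB1*0801": "GFLGYS", "DRB1*1104": "SFAVYD", "DRB1*1301": "SIAVND",
    "DRB1*1101": "SFAGYD", "DRB1*0101": "FLAGSD", "DRB1*0301": "SLRVND",
    "DRB1*1302": "SIAGND", "DRB1*0404": "HLAVYD", "DRB1*1501": "RIAVSD",
    "DRB1*0701": "YIQGFV", "DRB1*0401": "HLAGYD",
}


def variant_types(positions):
    """Group alleles by their residue motif at the given AA positions."""
    idx = [POSITIONS.index(p) for p in positions]
    groups = defaultdict(list)
    for allele, residues in ALLELE_RESIDUES.items():
        motif = "".join(residues[i] for i in idx)
        groups[motif].append(allele)
    return dict(groups)


all_six = variant_types(POSITIONS)
without_86 = variant_types((13, 67, 74, 37, 57))
print(len(all_six), "variant types with all six sites")
print(len(without_86), "variant types without AA 86")
for motif, alleles in without_86.items():
    if len(alleles) > 1:
        print(motif, alleles)
```

With all six sites every allele has its own motif, so no two alleles share a background on which a further site could be tested. Dropping AA 86 merges three pairs (*1104/*1101, *1301/*1302, *0404/*0401), which are the within-serogroup pairs that differ only at position 86.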
Common DRB1 Alleles & AAs in JIA-OP
- Balancing selection can result from:
- Overdominance/Heterozygote advantage
- Frequency-dependent selection
- Selective regimes that change over time/space
- For HLA, the common factor in these models is rare allele advantage,
which is consistent with a pathogen-directed frequency-dependent selection model.
- At the Amino Acid (AA) level we see
- High AA variability at antigen recognition sites (ARS)
- Relatively even AA frequencies at ARS sites
- Higher rates of non-synonymous vs. synonymous changes at ARS
Balancing Selection Operates at Most HLA Loci
Homozygosity (F) and the Normalized Deviate (Fnd)
[Figure: three allele-frequency distributions (x-axis: allele; y-axis: allele frequency)]
- Neutrality: F_OBS ≈ F_EQ, Fnd ≈ 0
- Directional selection: F_OBS > F_EQ, Fnd > 0
- Balancing selection: F_OBS < F_EQ, Fnd < 0
$$F = \sum_{i=1}^{k} p_i^2$$
$$F_{nd} = (F_{OBS} - F_{EQ}) / SD(F_{EQ})$$
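A minimal Python sketch of the two formulas follows. The neutral expectation F_EQ and its standard deviation depend on the sample size and the number of alleles and are not derived here; they are passed in as inputs, and the values used in the example (0.40 and 0.10) are hypothetical.

```python
def homozygosity(freqs):
    """Expected homozygosity F = sum of squared allele frequencies."""
    total = sum(freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"allele frequencies should sum to 1, got {total}")
    return sum(p * p for p in freqs)


def normalized_deviate(f_obs, f_eq, sd_eq):
    """Fnd = (F_OBS - F_EQ) / SD(F_EQ); F_EQ and SD come from a neutral model."""
    return (f_obs - f_eq) / sd_eq


# Four equally frequent alleles: the most even distribution possible for k = 4.
f_obs = homozygosity([0.25, 0.25, 0.25, 0.25])

# Hypothetical neutral expectation and SD for this k and sample size.
fnd = normalized_deviate(f_obs, f_eq=0.40, sd_eq=0.10)
print(f"F_OBS = {f_obs:.2f}, Fnd = {fnd:.2f}")
```

An even frequency distribution minimizes F for a given number of alleles, so it pushes Fnd below zero, the direction associated with balancing selection.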
Fnd for DRB1 AA sites in JIA Controls
- Fnd << 0 gives evidence of possible balancing selection.
- Fnd >> 0 gives evidence of possible directional selection.
Fnd for DRB1 AA sites (Meta-Analysis)
Fnd for all polymorphic sites in a meta-analysis of 57 populations
- Fnd << 0 gives evidence of possible balancing selection.
- Fnd >> 0 gives evidence of possible directional selection.
LD for DRB1 AAs
[Figure: JIA controls. Panels: Asymmetric LD (ALD), row gene conditional on column gene; Wn (symmetric)]
Acknowledgements
University of Sao Paulo: Diogo Meyer
University of Graz: Wolfgang Helmberg
Cincinnati Children’s Hospital: Susan Thompson, David Glass
University of Texas: Nishanth Marthandan, Paula Guidry, David Karp, Richard Scheuermann
Children's Hospital Oakland Research Inst.: Steven J. Mack, Jill A. Hollenbach
Harvard Medical School: Alex Lancaster
UC Berkeley: Glenys Thomson
UC San Francisco: Owen Solberg
Roche Molecular Systems: Henry A. Erlich
Anthony Nolan Research Inst.: Steven G.E. Marsh, Matthew Waller
NCBI/NIH: Mike Feolo
NGIT: Jeff Wiser, Patrick Dunn, Tom Smith
Distributions of Fnd values
Results from a meta-analysis of 497 HLA population studies in ten geographic regions
Solberg et al., 2008
- Cano & Fernandez-Vina (2009) described two sequence dimorphisms
that define the primary immunodominant serological epitopes for HLA-DPB1.
- All DPB1 alleles can be divided into four serologic categories (DP1,
DP2, DP3, and DP4):
Evidence of Balancing Selection at HLA-DPB1
Serological Category | AA 56 | AA 85 | AA 86 | AA 87
DP1 | A | E | A | V
DP2 | E | G | P | M
DP3 | E | E | A | V
DP4 | A | G | P | M
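The table translates directly into a lookup from the residue at position 56 and the 85-87 motif to a serological category, which can then be used to bin allele frequencies before computing homozygosity. In the Python sketch below the allele labels and frequencies are hypothetical placeholders, and the function names are illustrative.

```python
# (residue at AA 56, residues at AA 85-87) -> serological category.
DP_SEROLOGY = {
    ("A", "EAV"): "DP1",
    ("E", "GPM"): "DP2",
    ("E", "EAV"): "DP3",
    ("A", "GPM"): "DP4",
}


def dp_category(aa56, aa85_87):
    """Return DP1..DP4 for an allele, given its residues at 56 and 85-87."""
    try:
        return DP_SEROLOGY[(aa56, aa85_87)]
    except KeyError:
        raise ValueError(f"no category for AA 56 = {aa56}, AA 85-87 = {aa85_87}")


def bin_by_serology(allele_freqs, allele_motifs):
    """Sum allele frequencies within each serological category."""
    binned = {}
    for allele, freq in allele_freqs.items():
        category = dp_category(*allele_motifs[allele])
        binned[category] = binned.get(category, 0.0) + freq
    return binned


# Hypothetical alleles (labels, motifs and frequencies invented for illustration).
motifs = {"x1": ("A", "GPM"), "x2": ("A", "GPM"), "x3": ("E", "GPM"),
          "x4": ("A", "EAV"), "x5": ("E", "EAV")}
freqs = {"x1": 0.30, "x2": 0.10, "x3": 0.25, "x4": 0.15, "x5": 0.20}

binned = bin_by_serology(freqs, motifs)
f_alleles = sum(p * p for p in freqs.values())
f_serology = sum(p * p for p in binned.values())
print(binned)
print(f"F at allele level = {f_alleles:.3f}, at serological level = {f_serology:.3f}")
```

Binning always raises F because categories pool alleles, so any comparison of Fnd across the two levels has to use the neutral expectation for the matching number of variants.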
Global Distribution of DP serological categories
Fnd for DPB1 Alleles ( ) & DP Serological Categories ( )
Evidence of Balancing Selection at HLA-DPB1
- We constructed a randomization test (“random binning” to 4
categories) to ensure that the effect was not driven by differences in the observed number of variants at the allele-level vs. serotype-level.
- Randomization tests confirmed the result more often for European populations
than for other geographic regions
- A possible ascertainment bias? (many common alleles were first
identified in European populations)
- Could natural selection favoring DPB1 diversity at the serologic
level be greater in Europe?
Evidence of Balancing Selection at HLA-DPB1
Supplementary Figure S1. Mean Fnd values for trios of variant DPB1 exon 2 amino acid positions: mean Fnd values in variable sets of 3 amino-acid positions vs 36/56/85 paired trios
[Figure: x-axis: Amino-Acid Position Trio (ticks 50 to 350); y-axis: mean Fnd (−1.5 to 1)]