SLIDE 1 Gene Regulation, Epigenetics & Databases
Cindy G Boer
Genetic Laboratory Internal Medicine Erasmus MC
SLIDE 2
Congratulations!
A genome-wide significant GWAS hit! (and what to do now?)
SLIDE 3 Annotation of genetic loci
Where is your SNP? & What could it do?
1. In coding or non-coding DNA?
2. In a gene body or in an intergenic region?
3. In a regulatory region?
– Promoters, enhancers, inhibitors, insulators, transcription factor binding sites etc.
4. Causal gene & Mechanism?
Causal Variant, Causal Gene, Causal cell type
SLIDE 4 Linkage Disequilibrium (LD)
- Association between disease trait and (tag) SNP
– Array designed on LD structure, not on functional SNPs
- Identification of Causal variant?
- LD structure plotted
- SNPs in high LD
- (r2 > 0.8 or r2 > 0.6)
Castaño Betancourt, et al.,(2016), PLOS genetics
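The r2 thresholds above can be made concrete with a minimal sketch that computes r2 between two biallelic SNPs from phased haplotypes. The haplotypes below are made up for illustration, and the function name `ld_r2` is hypothetical; real analyses use reference panels and dedicated LD tools.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs, given phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


# Illustrative (made-up) haplotypes for eight chromosomes
tag_snp = [1, 1, 1, 1, 0, 0, 0, 0]
candidate = [1, 1, 1, 0, 0, 0, 0, 0]
unlinked = [1, 0, 1, 0, 1, 0, 1, 0]
```

With these toy data, `ld_r2(tag_snp, candidate)` is 0.6, so the candidate would count as a proxy under the looser threshold but not under r2 > 0.8, while `ld_r2(tag_snp, unlinked)` is 0.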
SLIDE 5 Genome-wide association signal
(Best case scenario)
Top SNP (+ SNPs in LD, r2 > 0.8) is located in the coding sequence of a gene
- Synonymous? Or Non-Synonymous?
- Gene? What is known, what does it do?
– Damaging effect of the hit? (first part of the practical)
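As a sketch of the synonymous versus non-synonymous question, the snippet below classifies a single-base change within one codon using the standard genetic code. The function name `classify_coding_snp` and its labels are illustrative assumptions; it ignores strand, splicing and transcript choice, which real annotation tools handle.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, position, alt_base):
    """Classify a substitution at 0-based `position` within `ref_codon`."""
    ref_codon, alt_base = ref_codon.upper(), alt_base.upper()
    if ref_codon not in CODON_TABLE or alt_base not in BASES:
        raise ValueError("expected a DNA codon and a single DNA base")
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "non-synonymous (stop gained)"
    if ref_aa == "*":
        return "non-synonymous (stop lost)"
    return "non-synonymous (missense)"
```

For example, GAG to GTG changes glutamate to valine and is reported as missense, whereas GAA to GAG still encodes glutamate and is reported as synonymous.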
SLIDE 6
Genome-wide association signal
(Realistic scenario)
Most GWAS findings are located in non-coding regions of the genome [M.T. Maurano et al., Science, 337, 1190 (2012)]
– Introns or intergenic
– ~98.5% of the human genome is non-coding
Difficult to link SNP to phenotype
SLIDE 7 Regulatory elements
GWAS SNPs are enriched in regulatory elements.
Regulatory regions: promoters, enhancers, inhibitors, insulators, transcription factor binding sites etc.
1. What is a regulatory region/how is a regulatory region defined?
2. How will you know if your hit is located in a regulatory region?
[M.T. Maurano et al., Science, 337, 1190 (2012)]
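One way to approach question 2 is a plain interval overlap between the SNP position and annotated regions. The sketch below assumes BED-style regions (0-based start, exclusive end) and a 1-based SNP position; the `Region` type, the coordinates and the labels are all made up for illustration.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED-style)
    end: int     # exclusive
    label: str


def annotate_position(chrom, pos, regions):
    """Return labels of all regions containing the 1-based position `pos`."""
    zero_based = pos - 1  # convert 1-based SNP position to BED coordinates
    return [
        r.label
        for r in regions
        if r.chrom == chrom and r.start <= zero_based < r.end
    ]


# Hypothetical annotation track
regions = [
    Region("chr1", 1000, 1500, "promoter"),
    Region("chr1", 1400, 2000, "TF binding site"),
    Region("chr2", 1000, 1500, "enhancer"),
]
```

An empty result only means the position overlaps none of the supplied regions; it says nothing about annotations in other cell types or tracks.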
SLIDE 8 Gene Expression
- Promoter: region of DNA that initiates transcription of a gene
- Enhancer: short region of DNA that increases/helps initiate the transcription of a gene
- Inhibitor: short region of DNA that decreases/inhibits the transcription of a gene
- The regulation/control of gene expression is essential for cell function, survival, differentiation etc.
Epigenetics = Changes/regulation of gene expression, caused by mechanisms other than DNA sequence variation
Enhancer
SLIDE 9
The Central Dogma (of molecular biology)
Epigenomics: All epigenetic modifications on the genetic material of a cell
SLIDE 10 The Central Dogma
Animals: ~100-150 different cell types, “Same Blueprint”
How are there different cell types? Epigenetics
SLIDE 11 Epigenetics
“The study of changes in gene expression or cellular phenotype, caused by mechanisms other than changes in the underlying DNA sequence”
“Epigenetic mechanisms can control the functions of noncoding sequences of DNA”
Epigenetics
SLIDE 12
Histones & Chromatin
SLIDE 13
Histones & histone modifications
SLIDE 14 DNA structure & Regulation
DNase hypersensitive regions: open chromatin configuration
SLIDE 15
DNA structure & Regulation
SLIDE 16 The Histone Code
Histone code: multiple histone modifications lead to specific, unique downstream functions
Specific proteins involved in gene control recognize and interrogate the patterns of histone modifications:
- E.g. RNA polymerase II, transcription factors & DNA-binding proteins
– Transcription factor recruitment
– Chromatin shape and function
SLIDE 17 Epigenetics: Histone Code
Inactive Promoter: H3K27me3, DNA methylation
Active Promoter: H3K4me3 [promoter specific], H2A.Z [histone variant]
Inactive Enhancer: H3K9me2, DNA methylation
Active Enhancer: H3K4me1 [enhancer specific], H2A.Z [histone variant]
Many, many (100+) different histone modifications known! Very complex!
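The marks on this slide can be read as a simple lookup, sketched below. This is a deliberate simplification that uses only the four element-specific marks named here; DNA methylation and H2A.Z are left out because the slide lists them for both promoters and enhancers. The names `SPECIFIC_MARKS` and `call_states` are hypothetical, and real chromatin-state models combine many marks.

```python
# Element-specific marks as listed on the slide (simplified)
SPECIFIC_MARKS = {
    "H3K4me3": "active promoter",
    "H3K27me3": "inactive promoter",
    "H3K4me1": "active enhancer",
    "H3K9me2": "inactive enhancer",
}


def call_states(observed_marks):
    """Return the sorted set of states suggested by the observed marks."""
    return sorted(
        {SPECIFIC_MARKS[m] for m in observed_marks if m in SPECIFIC_MARKS}
    )
```

Marks that are not in the table are ignored, so `call_states(["H2A.Z"])` returns an empty list rather than a guess.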
SLIDE 18 Regulatory regions: Chromatin States
ENCODE/ROADMAP
- “15-state model”
- Histone modifications
- DNase sites
- TF-binding Sites
Roadmap Epigenomics Consortium, et al., Nature 2015
SLIDE 19
Epigenetics: symphony No. 9
SLIDE 20
DNA binding proteins
DNA-binding proteins: transcription factors, nucleases, other DNA-binding proteins
Non-specific binding: polymerases, histones
Specific binding: transcription factors, nucleases
Specific binding: recognition of a consensus sequence
Change in consensus sequence → change in DNA binding affinity? → change in gene regulation/expression?
SLIDE 21 Consensus sequences
- DNA binding motif: “recognition sequence”
- Found in databases:
– JASPAR database
– Integrated in HaploReg (practical)
Can also be affected by methylation! (EWAS)
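To illustrate how a SNP inside a recognition sequence might change predicted binding, the sketch below scores the reference and alternative allele against a log-odds position weight matrix. The count matrix is made up and is not a JASPAR entry; the pseudocount, the uniform background and the function names are assumptions.

```python
import math

# Hypothetical count matrix for a 4-bp motif (10 sites per position)
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert base counts into per-position log2 odds versus background."""
    length = len(counts["A"])
    matrix = []
    for i in range(length):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        matrix.append(
            {b: math.log2((counts[b][i] + pseudocount) / total / background)
             for b in "ACGT"}
        )
    return matrix


def motif_score(sequence, matrix):
    """Sum the log-odds of each base in a sequence of motif length."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must equal motif length")
    return sum(column[base] for column, base in zip(matrix, sequence.upper()))


pwm = log_odds_matrix(COUNTS)
ref_score = motif_score("ACGT", pwm)   # reference allele G at position 3
alt_score = motif_score("ACTT", pwm)   # alternative allele T at position 3
```

A lower `alt_score` than `ref_score` suggests the alternative allele weakens the match to the motif. That is a hypothesis about binding affinity, not a measurement of it.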
SLIDE 22
CTCF methylation
CTCF binding is affected by methylation in its core sequence
Proper CTCF functioning is essential!
– “Severe dysregulation of CTCF in cancer cells”
– CTCF mouse mutants: embryonic lethal
SLIDE 23
SLIDE 24 So Far we have:
Annotation:
- Location (Chr/Bp)
- Coding/non-coding
- DNA regulatory elements
– (and open chromatin sites)
- Transcription factor binding sites
GWAS & EWAS goal
Identify novel targets/genes involved in phenotype X
So far only annotation, no (potential) causal gene
SLIDE 25 Gene Regulation
Adapted from: Alberts, Molecular Biology of the Cell 5th Edition, figure 7-44
Typical eukaryotic gene regulation
- Complex 3D looping (CTCF)
- Multiple regulatory regions
- Involvement of multiple transcription factors
- Can be cell type specific
Gene regulation is highly complex!
SLIDE 26 Gene Regulation
- ~1 Mb (1,000,000 base pairs) long-range regulation
– Sonic Hedgehog, essential developmental gene
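Because regulation can act over roughly 1 Mb, a candidate-gene list should not stop at the nearest gene. The sketch below lists every gene within a window around a SNP, sorted by distance. Gene names and coordinates are made up, and `genes_in_window` is a hypothetical helper.

```python
def genes_in_window(snp_pos, genes, window=1_000_000):
    """Return (gene, distance) pairs for genes within `window` bp of the SNP.

    `genes` maps a gene name to its (start, end) on the same chromosome.
    Distance is 0 when the SNP lies inside the gene body.
    """
    hits = []
    for name, (start, end) in genes.items():
        if start <= snp_pos <= end:
            distance = 0
        else:
            distance = min(abs(snp_pos - start), abs(snp_pos - end))
        if distance <= window:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical genes on one chromosome
genes = {
    "GENE_A": (4_990_000, 5_010_000),   # SNP lies in an intron of this gene
    "GENE_B": (5_900_000, 5_950_000),   # distant gene, still within 1 Mb
    "GENE_C": (7_000_000, 7_100_000),   # outside the window
}
candidates = genes_in_window(5_000_000, genes)
```

Here both GENE_A and GENE_B remain candidates: an enhancer inside one gene may regulate another gene hundreds of kilobases away.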
SLIDE 27
SLIDE 28 Circadian rhythm : Epigenetics
- Mammalian circadian clock
- Oscillation of ~ 24h
– Light-dark cycle (melatonin secretion), Feed cycle
- A conserved transcriptional–translational auto-regulatory loop generates molecular oscillations of ‘clock genes’ at the cellular level
PARP1- and CTCF-Mediated Interactions between Active and Repressed Chromatin at the Lamina Promote Oscillating Transcription, Zhao et al., 2015 Molecular Cell
SLIDE 29
Complex 3D structure
SLIDE 30 Finding [causal] Genes
Cell type specificity is useful & important:
- Gene expression levels (RNA-seq)
– Predicted promoter activity in cell type
– Predicted gene activity (e.g. active gene transcription mark: H3K36me3)
- Gene expression – Genotype
– eQTLs! (Thursday lecture/practical)
– Also cell type specific!
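The link between gene expression and genotype can be sketched as a simple linear regression of expression on allele dosage (0, 1 or 2 copies of the alternative allele). The data are made up, `fit_eqtl` is a hypothetical helper, and real eQTL analyses add covariates, normalisation and multiple-testing correction.

```python
def fit_eqtl(dosages, expression):
    """Least-squares fit of expression on genotype dosage.

    Returns (slope, intercept, r_squared); the slope is the estimated
    change in expression per copy of the alternative allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("all samples have the same genotype")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Made-up data: six samples from one cell type
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, intercept, r_squared = fit_eqtl(dosages, expression)
```

Running the same fit on expression data from a different cell type may give a very different slope, which is the cell type specificity noted above.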
SLIDE 31
Causal Genes: Example
Enhancer site (likely) to regulate gene 1 or gene 2?
SLIDE 32 Cell type selection:
- Selecting the target tissue is not always easy:
– Cell fate
– Cell state and cell type
– Complex diseases & phenotypes
Availability of material & data
Proxy tissues:
- Same lineage, similar functioning tissue
- (gene of interest) expression vs no expression
- Tools & databases to select target tissue
- GWAS SNPs are enriched in gene regulatory regions... in the target cell type!
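One hedged way to compare candidate tissues is to ask in which tissue the GWAS SNPs overlap regulatory regions more often than expected by chance. The sketch below uses a one-sided hypergeometric test with made-up counts. The function name and the choice of background set are assumptions; dedicated tools also account for LD and SNP properties.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from a
    population containing `successes` marked items."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)


# Made-up counts: 1000 background SNPs, 100 overlap enhancers in each tissue,
# 20 GWAS lead SNPs, of which 8 overlap brain enhancers and 2 heart enhancers
p_brain = hypergeom_sf(8, 1000, 100, 20)
p_heart = hypergeom_sf(2, 1000, 100, 20)
```

With these toy numbers the brain overlap is far less likely under chance than the heart overlap, which would point to brain as the more relevant tissue for this phenotype.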
SLIDE 33
Phenotype - Alzheimer
Enhancer Marks in Brain? Enhancer Marks in Heart?
SLIDE 34
SLIDE 35
- Central Dogma: DNA → RNA → Protein & gene regulation is everything!
- DNA regulatory elements: promoter, enhancer, inhibitor
- Epigenetics is cell type specific; think about which cell type is relevant to you
Go and Annotate your GWAS hit
Genome-wide association signal
SLIDE 36 ..How to Find?
- Where is your hit (SNP) located?
– Chromosome & position
– Near or in which genes
– Synonymous/non-synonymous
- Regulatory regions
- 3D structure of the genome
- Candidate gene
– gene function
SLIDE 37
- Online collection of (molecular) biological data
– Structured & searchable
– Publicly available
– Updated periodically & cross-referenced
- Literature
- Data from research
SLIDE 38 Biological databases
- Pubmed – Literature database
- Categorized databases: too many to name
– Genomic variation: dbSNP, HapMap...
– Sequence: NCBI RefSeq database, Entrez Nucleotide, miRBase...
– Proteins: RCSB Protein Data Bank, UniProt, SMART...
– Pathways: KEGG, Reactome, STRING...
– DNA annotation: ENCODE, Roadmap Epigenomics
- Genome Browsers: genomic databases integrating all data associated with genome annotation & function.
- Mining Tools: FUMA & HaploReg
SLIDE 39 Genome Browser
- Displaying, viewing and accessing genome annotation data
– DNA-variation information, epigenetic regulation, transcription, translation, disease information...
- Links to other specialized Databases
SLIDE 40 Difference?
- NCBI, UCSC and Ensembl use the same human genome assembly generated by NCBI
– Release timing and data availability can differ between sites
- NOTE: check the version of the genome assembly
– Annotation locations and availability differ between assemblies
- Which browser to use is a matter of personal preference
- Practical: mainly UCSC and some forays into other databases, including NCBI, Ensembl & ENCODE
SLIDE 41
Mining Tools
FUMA Functional Mapping and Annotation of Genome-Wide Association Studies
– Monday practical & today's practical
– Novel tool!
SLIDE 42 Mining Tools
HaploReg
HaploReg is a tool for exploring annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci.
- Mines ENCODE & ROADMAP data: be careful! Not always up to date, and does not always give clear information!
SLIDE 43
SLIDE 44 Your Research
Play with the tools
Lots of (useful) information
– Check the outcome
– Know the data
– References
– Hypothesis building only!
Go and get lost...
(and write down where you went)
Your research NEEDS biological databases!
SLIDE 45 The Practical
- UCSC genome browser links to other databases & data
– Ensembl, ENCODE, ROADMAP, HaploReg, FUMA, GTEx...
I. Beginner: databases and bioinformatics tools (FUMA, UCSC, HaploReg)
II. Advanced: adding regulatory data and gene expression data
III. More advanced: adding 3D chromatin structure to your annotation
Focus on “real life” examples
Use for your own research!
SLIDE 46
UCSC Genome Browser
SLIDE 47
UCSC Genome Browser
SLIDE 48
UCSC Genome Browser
SLIDE 49 Hints for the Practical
- Ask us anything (me, Linda & Joost)
- (related to the practical or genetics)
- DNA is LARGE and a 3D molecule
– So check your surroundings! (i.e. zoom out)
– More information! More track control!
- GIYF: Google is your friend
- Practical is in 3 parts
– Intro
– Standard
– Difficult
& Enjoy (or try to)
SLIDE 50
Questions?