Inference of human transcription regulatory networks using deep sequencing data
Erik van Nimwegen
Biozentrum, University of Basel, and Swiss Institute of Bioinformatics
developmental processes, etc.
mapping from TF binding configurations to effects on expression. Ultimately we would like to be able to predict the expression dynamics of all genes essentially just from their DNA sequences
Gene expression data (microarray)
clustering
Regulatory “modules”; Pathways / Functional categories; Regulatory motifs; TF expression profiles
Association; Over-representation; Correlation
Examples: Segal et al., Nat. Genet. 2003; Beer and Tavazoie, Cell 2004
Benefits:
i.e. cohorts of co-regulated genes in the process/condition under study.
with the modules.
Disadvantages:
are often unclear.
but just classified.
ChIP-chip / ChIP-seq: genome-wide binding targets
Examples: Boyer et al., Cell 2005; Jakobsen et al., Genes & Dev. 2007
Benefits:
Disadvantages:
TF knock-down (e.g. siRNA): downstream targets
Examples: Davidson et al., Science 2002; Imai et al., Science 2006
Benefits:
Disadvantages:
Develop a computational frame-work that:
(developmental time course, response to perturbations, disease versus healthy tissue).
and the response coefficients of each gene to each transcription factor:
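$$e_{gs} = \tilde{c}_s + c_g + \sum_f R_{gf} A_{fs}$$
$e_{gs}$: expression of gene g in sample s
$c_g$: basal gene expression
$R_{gf}$: response of gene g to factor f
$A_{fs}$: activity of factor f in sample s
$\tilde{c}_s$: sample-dependent constant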
are not too large.
Review: Bussemaker et al. Annu Rev Biophys Biomol Struct 2007
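A minimal sketch of the linear model on this slide, assuming the additive form e_gs = c_g + c_s + sum_f R_gf A_fs suggested by the term labels. The function name, the symbol R for the response coefficients, the sample offset c_s, and the toy numbers are all illustrative assumptions, not taken from the talk.

```python
def predict_expression(c_gene, c_sample, R, A):
    """Return e[g][s] = c_gene[g] + c_sample[s] + sum_f R[g][f] * A[f][s].

    c_gene   : basal expression per gene
    c_sample : per-sample offset (an assumption of this sketch)
    R        : response of gene g to factor f, indexed R[g][f]
    A        : activity of factor f in sample s, indexed A[f][s]
    """
    n_factors = len(A)
    return [
        [
            c_gene[g] + c_sample[s]
            + sum(R[g][f] * A[f][s] for f in range(n_factors))
            for s in range(len(c_sample))
        ]
        for g in range(len(c_gene))
    ]


# Toy example: 2 genes, 2 factors, 2 samples (made-up numbers).
c_gene = [1.0, 2.0]
c_sample = [0.0, 0.5]
R = [[1.0, 0.0], [2.0, 1.0]]
A = [[0.5, -0.5], [1.0, 0.0]]
e = predict_expression(c_gene, c_sample, R, A)
```

In practice the expression values are measured and the activities are the unknowns; this direction of the computation only makes the roles of the terms explicit.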
We use DNA sequence analysis to predict transcription factor binding sites and to estimate response coefficients genome-wide in human.
Challenge:
gene.
sites occurs near TSS. (Nature. 447:799-816 2007 )
However, we have a technology for mapping TSSs and their expression genome-wide.
Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Shiraki et al. PNAS 100, 15776-81 (2003)
Tag-based approaches for transcriptome research and genome annotation. Harbers M, Carninci P. Nat Methods 2, 495-502 (2005)
Tagging mammalian transcriptome complexity. Trends Genet 22, 501-10 (2006)
454/Solexa sequencing. Mapping to the genome.
Number of samples with > 10^5 tags: 56
Total number of mapped CAGE tags: 25,469,648
Number of unique TSS positions: 3,006,003
For any given sample the distribution of tags per TSS is a power law.
The vast majority of TSSs have very low expression: "background transcription".
The distribution can be used to normalize CAGE-tag counts across samples.
$$P(t \mid x, \sigma) = \frac{1}{t\sqrt{2\pi\left(\sigma^2 + \frac{1}{n}\right)}} \exp\left(-\frac{\left(x - \log(t)\right)^2}{2\left(\sigma^2 + \frac{1}{n}\right)}\right)$$
x = true log-expression (per million).
n = raw number of tags.
t = normalized number of tags.
σ² = variance of the multiplicative noise.
Measure distribution of observed z-values for replicates.
$$z = \frac{\log(t_1) - \log(t_2)}{\sqrt{2\sigma^2 + \frac{1}{n_1} + \frac{1}{n_2}}}$$
Expression noise can be modeled as multiplicative noise, followed by Poisson sampling.
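A small sketch of the replicate statistic, assuming it is the difference of the log normalized counts divided by the square root of 2σ² + 1/n1 + 1/n2 (two multiplicative-noise terms plus two Poisson sampling terms). The function name and the example numbers are illustrative only.

```python
import math


def replicate_z(n1, t1, n2, t2, sigma):
    """z-value for one TSS measured in two replicates.

    n1, n2 : raw tag counts in replicate 1 and 2
    t1, t2 : normalized tag counts in replicate 1 and 2
    sigma  : standard deviation of the multiplicative noise
    """
    # Variance of log(t1) - log(t2): each replicate contributes
    # sigma^2 (multiplicative noise) plus roughly 1/n (Poisson sampling).
    variance = 2.0 * sigma ** 2 + 1.0 / n1 + 1.0 / n2
    return (math.log(t1) - math.log(t2)) / math.sqrt(variance)


# Identical replicates give z = 0; a two-fold difference gives a large |z|.
z_same = replicate_z(100, 50.0, 100, 50.0, sigma=0.2)
z_twofold = replicate_z(100, 100.0, 100, 50.0, sigma=0.1)
```

If the noise model is adequate, z-values collected over many TSSs should look approximately standard normal, which is what a comparison of observed and predicted replicate noise would check.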
[Figure: observed and predicted replicate noise (distribution of z)]
Human promoterome
[Figure panels: time course; known transcripts]
What is a promoter? Answer: a set of neighboring TSSs whose expression profiles are indistinguishable up to noise. We also cluster nearby promoters into promoter regions.
Number of promoter regions: 43,164
Number of promoters: 74,273
Number of TSSs in promoters: 860,823
Total number of TSSs: 3,006,003
Input:
IRF7 E2F REST GATA2/4
Human          CATTCGCAGTGGCAAGGGACTGCCCTGGTCCCTGTGGAGC--GTCCCATTCGGTGACTTCCCACCAGCCCTTCCCCAGCGCCTCTGGAGGTCCAGACTGTCAGGTTGGAGCCTGGG
Rhesus macaque CATTCACAGTGGCAAGGGTCCGCCCTGGTCCCTGTGGAGG--GTCCCAGTCGGTGACTTCCCGCCAGCCCTTCCCCAGTGCCTCTGGAGGTC--GACTGTC-GGTTGGAGCCTGG
Cow            GAGGGGCGG---CTCGGGAGG---------CCTGCGGACC--GGGCGAG-CGGGGGCG-GCG----GGGCGGCGGGGGAGCCGGGCGGGGGCC------TGCGGTCGG-GCCTGG
Dog            GATTGGCCGCGGCCAAGGACCCC-----TCCCTGGGGAGC--GTCCGGGTCGGAGACT-CCCACTTGCCCTTCTCCAGCACCTCGTGAAGTCCGGACTGTACGGTTTG-GACTCG
Mouse          TATCTACAACAGCAAG-GA--------GTC--TG-GAAGCAAGTCCAAGT-GATGGA-TACAGCCATCACTTACC--GGGCCTCTGCTGGTCGTGACTT----------------
Scer AAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATC-GAAACATACATAA--GTTGATATTC-CTTTGATATCG-----ACGACTA
Spar AAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATC-GAAACATACATAA--ATTGATATTC-CTTTAGCTTTT----AAAGACTA
Smik GAAAAACGAAAAATTCATG-GAAAAGAGTCAACCGTC-GAAACATACATAA--ACCGATATTT-CTTTAGCTTTCGACAAAAATCTG
Sbay GAAAAATAAAAAGTGATTG-GAAAAGAGTCAGATCTCCAAAACATACATAATAACAGGTTTTTACATTAGCTTTT----GAAAACTA
Site of length l for motif w ending at position n: $F_{n-l}\, P(S_{[n-l,\,l]} \mid w, T)$
Background column at position n: $F_{n-1}\, P(S_n \mid b, T)$
Site of length l for an unknown motif, integrating over weight matrices w: $F_{n-l} \int P(S_{[n-l,\,l]} \mid w, T)\, P(w)\, dw$
MotEvo: van Nimwegen, E. BMC Bioinf 8 Suppl 6, S4 (2007)
MONKEY: Moses, A.M., Chiang, D.Y., Pollard, D.A., Iyer, V.N. & Eisen, M.B. Genome Biol 5, R98 (2004)
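The F terms on the preceding slides read like the pieces of a forward recursion over positions. The following is a heavily simplified sketch of such a recursion, not MotEvo itself: it assumes a single sequence instead of alignment columns scored on a tree T, one weight matrix, and fixed prior probabilities for "site" versus "background". All names and numbers are hypothetical.

```python
def forward_sums(seq, wm, background, prior_site):
    """F[n] sums the probability of every way to parse seq[:n] into
    background positions and non-overlapping sites of weight matrix wm.

    seq        : DNA string over A, C, G, T
    wm         : list of dicts, wm[i][base] = P(base at motif position i)
    background : dict, background[base] = P(base) under the background model
    prior_site : prior probability of starting a site (assumed constant)
    """
    l = len(wm)
    prior_bg = 1.0 - prior_site
    F = [1.0] + [0.0] * len(seq)
    for n in range(1, len(seq) + 1):
        # Case 1: position n is a background position, extending F[n-1].
        F[n] = F[n - 1] * prior_bg * background[seq[n - 1]]
        # Case 2: a site of length l ends at position n, extending F[n-l].
        if n >= l:
            p_site = 1.0
            for i in range(l):
                p_site *= wm[i][seq[n - l + i]]
            F[n] += F[n - l] * prior_site * p_site
    return F


# Toy example: a 2-column motif that matches "AC" perfectly.
wm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
F = forward_sums("AC", wm, background, prior_site=0.5)
```

Combining these forward sums with the analogous backward sums would give posterior probabilities for a site at each position; that step is omitted here to keep the sketch short.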
TBP NF-Y CAAT-box YY1 NRF1 SP1 RREB1 E2F Myb Sox17 Foxq1 FOXI1
Example: Predicted TFBSs in the proximal promoter of the SNAI3 TF.
http://www.swissregulon.unibas.ch
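For each promoter p and motif m calculate the predicted number of functional sites $N_{pm}$.
$$e_{ps} = \tilde{c}_s + c_p + \sum_m N_{pm} A_{ms}$$
$e_{ps}$: expression of promoter p in sample s
$c_p$: basal promoter expression
$N_{pm}$: number of functional sites in promoter p for motif m
$A_{ms}$: activity of motif m in sample s
$\tilde{c}_s$: sample-dependent constant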
Fitting activities (SVD), minimize:
$$\sum_p \left( e_{ps} - c_p - \tilde{c}_s - \sum_m N_{pm} A_{ms} \right)^2$$
The fitted activities are denoted $A^*_{ms}$.
Similar approach in yeast: Nguyen DH and D'haeseleer P, doi:10.1038/msb4100054
Application to human: Das, D., Nahle, Z. & Zhang, M.Q. Mol Syst Biol 2, 2006.0029 (2006)
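A sketch of the activity fit on a toy data set, assuming the model e_ps = c_p + c_s + sum_m N_pm A_ms with squared-error loss. Two simplifications are assumptions of this sketch: the offsets are removed by centering rows and columns, and the least-squares problem is solved through the normal equations with Gaussian elimination rather than the SVD named on the slide. The function names and the numbers are hypothetical.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting.
    Assumes M is square and non-singular."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def fit_activities(e, N):
    """Least-squares motif activities A[m][s] from expression e[p][s] and
    site counts N[p][m]. Activities are returned with zero mean over samples."""
    P, S, M = len(e), len(e[0]), len(N[0])
    # Centering each promoter over samples removes the basal term c_p.
    e1 = [[v - sum(row) / S for v in row] for row in e]
    # Centering each sample over promoters removes the sample offset c_s.
    col_mean = [sum(e1[p][s] for p in range(P)) / P for s in range(S)]
    e2 = [[e1[p][s] - col_mean[s] for s in range(S)] for p in range(P)]
    n_mean = [sum(N[p][m] for p in range(P)) / P for m in range(M)]
    Nc = [[N[p][m] - n_mean[m] for m in range(M)] for p in range(P)]
    # Normal equations: (Nc^T Nc) A_s = Nc^T e_s, solved per sample.
    NtN = [[sum(Nc[p][a] * Nc[p][b] for p in range(P)) for b in range(M)]
           for a in range(M)]
    A = [[0.0] * S for _ in range(M)]
    for s in range(S):
        rhs = [sum(Nc[p][m] * e2[p][s] for p in range(P)) for m in range(M)]
        solution = solve_linear(NtN, rhs)
        for m in range(M):
            A[m][s] = solution[m]
    return A


# Noise-free toy data: 4 promoters, 2 motifs, 3 samples.
N = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
A_true = [[1.0, 0.0, -1.0], [-0.5, 1.0, -0.5]]
c_p = [0.1, 0.2, 0.3, 0.4]
c_s = [1.0, 2.0, 3.0]
e = [[c_p[p] + c_s[s] + sum(N[p][m] * A_true[m][s] for m in range(2))
      for s in range(3)] for p in range(4)]
A_fit = fit_activities(e, N)
```

With hundreds of motifs whose site counts are correlated, the normal equations become ill-conditioned, which is one reason an SVD-based or regularized solver is the more robust choice in practice.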
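Significance of the motif:
$$z_m = \sqrt{\frac{1}{S} \sum_{s=1}^{S} \left( \frac{A_{ms}}{\delta A_{ms}} \right)^2}$$
Here S is the number of samples, $A_{ms}$ is the fitted activity and $\delta A_{ms}$ its uncertainty.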
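A short sketch of the motif significance score, assuming it is the root mean square over samples of the activity divided by its uncertainty. The function name and the example values are made up for illustration.

```python
import math


def motif_z(activities, errors):
    """Root-mean-square of activity / uncertainty over all samples.

    activities : fitted activity of one motif in each sample
    errors     : uncertainty of each of those activities (same length)
    """
    n_samples = len(activities)
    total = sum((a / d) ** 2 for a, d in zip(activities, errors))
    return math.sqrt(total / n_samples)


# A motif whose activity is two error bars from zero in every sample
# gets z = 2; a motif that never leaves its error bars scores below 1.
z_strong = motif_z([2.0, -2.0], [1.0, 1.0])
z_weak = motif_z([0.1, -0.2, 0.1], [0.5, 0.5, 0.5])
```

Sorting motifs by this score puts those whose activity varies most, relative to its uncertainty, at the top of the list.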
79 human tissues, Affymetrix micro-array
60 cancer cell lines, same Affymetrix micro-array
We associate probes with promoters and apply the same analysis to this data set.
Fetal liver Liver Kidney
A known liver-specific factor indeed shows highest activity in liver tissues.
[Plot: motif activity $A^*_{ms}$ versus sample s]
Immune tissues Testis samples leukemia
MYB is high in testis. It is also up-regulated in all NCI60 samples.
MYB
[Plot: motif activity $A^*_{ms}$ versus sample s]
TFs vary most in activity across these tissues.
[Plot: motif activity $A^*_{ms}$ versus sample s]
Fetal thyroid and thyroid
Lung and lung tumors
Monocytes before and after treatment with retinoic acid
Collaboration with Dirk Schubeler, FMI, Basel
epigenetic reprogramming during terminal neuronal differentiation of murine stem cells in vitro
Neuron-specific class III β-tubulin
activity profile of the motif.
Our final predictions of regulatory targets of each motif obey: $N_{pm} > \ldots$
DNA replication cell cycle cell cycle M phase neurological system process cell communication cell surface receptor linked signal transduction nervous system development neurite morphogenesis generation of neurons cell-cell signaling synaptic transmission neurological system process transmission of nerve impulse synaptic transmission neurological system process developmental process nervous system development DNA binding gene expression RNA processing ribosome biogenesis and assembly
http://www.swissregulon.unibas.ch
Genome browser example: predicted TFBSs in the proximal promoter of the SNAI3 TF. Z-values quantify the correlation between motif activity and target expression.
Piotr Balwierz (motif activity inference)
Phil Arnold (MotEvo, epigenetic signals)
Mikhail Pachkov (SwissRegulon)
Dirk Schübeler
Omics Science Center RIKEN Institute, Yokohama, Japan
Yoshihide Hayashizaki Harukazu Suzuki Piero Carninci Alistair Forrest Carsten Daub
Ippon jime
Gerhard Christofori
Biozentrum