Genome-wide Survey of Mixed MicroRNA / Transcription Factor Feed-Forward Regulatory Circuits in Human
Davide Corà, University of Torino and INFN, cora@to.infn.it
Transcription Factors and miRNAs
Regulation of gene expression is mainly mediated by:
- Transcription Factors (TFs): proteins binding to specific recognition motifs (TFBSs), usually short (5-10 bp) and located upstream of the coding region of the regulated gene. (Wassermann, Nat. Rev. Genetics)
- MicroRNAs (miRNAs): a family of small RNAs (typically 21-25 nucleotides long) that negatively regulate gene expression at the post-transcriptional level, usually through pairing of their “seed” region with sites in the 3’-UTR of target transcripts. (He L., Hannon GJ., Nat. Rev. Genetics)
Regulatory Networks 1
Key 1 --> TFs are themselves proteins produced by other genes, and they act in a combinatorial way, resulting in a complex network of interactions between genes and their products.
--> Transcriptional Network
miRNAs also act in a combinatorial and one-to-many way and, moreover, are transcribed from Pol II promoters.
--> Post-Transcriptional Network
(Figure: example network with Gene E, Gene F, miRNA X and Protein E.)
Regulatory Networks 2
Key 2 --> the whole regulatory network is difficult to understand.
Biological functions are performed by groups of genes which act in an interdependent and synergistic way. A complex network can be divided into simpler, distinct regulatory patterns called network motifs, typically composed of 3 or 4 interacting components, which are able to perform elementary signal processing functions.
(Figure: network motif linking TF, miRNA and target gene.)
Our Project
Several methods exist to elucidate TF-related and microRNA-related regulatory networks, but comparable information is lacking to explicitly connect them.
- We conducted an investigation aimed at the systematic integration of transcriptional and post-transcriptional regulatory interactions.
- We inferred and then combined the two networks, looking in particular for Mixed Feed-Forward Regulatory Loops
--> a network motif in which a master Transcription Factor (TF) regulates a miRNA and, together with it, a set of Joint Target coding genes.
(Figure: mixed FFL with TF, miR and Joint Target.)
Hornstein E, Shomron N, Nat Genet 38 Suppl:S20–4 (2006).
Mainstream
Infer the mixed FFL network motifs in the human genome, using only genome sequence and functional annotations:
1. TFs act on the promoter region of protein-coding genes.
2. The same TFs act on the promoter region of miRNA genes.
3. miRNAs act on the 3’-UTR region of protein-coding genes.
These interactions occur through cis-binding DNA/RNA sites, which we identify through an ab-initio genome-wide sequence analysis. We then investigate the properties of the resulting circuits.
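The way the three kinds of links close into a mixed FFL can be sketched in a few lines of Python. This is only an illustration: the function name `find_mixed_ffls` and the toy edges are invented here and are not part of the original pipeline.

```python
from collections import defaultdict


def find_mixed_ffls(tf_to_gene, tf_to_mirna, mirna_to_gene):
    """Return sorted (TF, miRNA, joint target) triples that close a mixed FFL."""
    gene_targets = defaultdict(set)   # TF -> protein-coding genes (promoter sites)
    for tf, gene in tf_to_gene:
        gene_targets[tf].add(gene)
    mirna_targets = defaultdict(set)  # miRNA -> protein-coding genes (3'-UTR sites)
    for mirna, gene in mirna_to_gene:
        mirna_targets[mirna].add(gene)

    circuits = set()
    for tf, mirna in tf_to_mirna:     # TF -> miRNA (miRNA promoter sites)
        # A joint target is regulated by both the TF and the miRNA.
        joint = gene_targets.get(tf, set()) & mirna_targets.get(mirna, set())
        for gene in joint:
            circuits.add((tf, mirna, gene))
    return sorted(circuits)


# Toy edges, invented for illustration only.
tf_to_gene = [("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_B", "gene3")]
tf_to_mirna = [("TF_A", "miR_x"), ("TF_B", "miR_y")]
mirna_to_gene = [("miR_x", "gene2"), ("miR_x", "gene3"), ("miR_y", "gene1")]

ffls = find_mixed_ffls(tf_to_gene, tf_to_mirna, mirna_to_gene)
print(ffls)
```

Each returned triple corresponds to what the Results call a "single target circuit".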
Pipeline
(Flowchart, two parallel branches.)
- Human core promoters: non-redundant set of human core promoters (-900 / +100 around TSS; protein-coding + miRNA genes) --> oligo analysis --> sets of human genes --> conserved overrepresentation against mouse promoters.
- Human 3’-UTR exons: non-redundant set of full-length 3’-UTRs (protein-coding genes) --> oligo analysis --> sets of human genes --> conserved overrepresentation against mouse 3’-UTRs.
- Both branches --> regulatory oligos in human promoters and 3’-UTRs --> Mixed Feed-Forward regulatory Loops --> external annotations (Gene Ontology, relevance to cancer).
Dataset
Promoter and 3’-UTR definition:
miRNAs:
- clustering of the pre-miRNAs into Transcriptional Units (TUs) by mirATLAS. For each TU, retain only the 5’-most pre-miRNA.
- pre-miRNA genome positions classified according to reference genes:
  Non_genic --> its own core promoter
  Genic --> Opp_strand: its own core promoter
  Genic --> Same_strand: host gene’s core promoter
- miRNA core promoter is -900 / +100 around nt 1 of the pre-miRNA.
protein-coding genes:
- core promoter: only KNOWN-KNOWN max_length transcript, -900 / +100 around TSS, default RepeatMasked.
- 3’-UTR: known full-length 3’-UTR regions, only KNOWN-KNOWN max_length transcript, default RepeatMasked.
Evolutionary constraint: retain only human / mouse conserved one2one miRNAs and protein-coding genes.
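The promoter rules above can be sketched as follows. The function names, the category labels and the coordinate convention (1-based, inclusive, so a window spans 1001 nt) are assumptions made for this illustration; the slides do not state them.

```python
def core_promoter(anchor, strand, upstream=900, downstream=100):
    """Return (start, end) of the -900/+100 window around `anchor`.

    `anchor` is the TSS of a protein-coding gene or nt 1 of a pre-miRNA.
    Coordinates are assumed 1-based and inclusive (an assumption, not from the slides).
    """
    if strand == "+":
        start, end = anchor - upstream, anchor + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher genomic coordinates.
        start, end = anchor - downstream, anchor + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, start), end  # never run past the chromosome start


def promoter_anchor(location, pre_mirna_start, host_gene_tss=None):
    """Pick the anchor position for a miRNA core promoter.

    `location` uses hypothetical labels mirroring the slide:
    'non_genic', 'genic_opp_strand', 'genic_same_strand'.
    """
    if location == "genic_same_strand":
        if host_gene_tss is None:
            raise ValueError("same-strand genic miRNAs need the host gene TSS")
        return host_gene_tss          # use the host gene's core promoter
    if location in ("non_genic", "genic_opp_strand"):
        return pre_mirna_start        # use the miRNA's own core promoter
    raise ValueError(f"unknown location: {location}")


# Toy coordinates, invented for illustration.
anchor = promoter_anchor("genic_same_strand", 52_000, host_gene_tss=50_000)
window = core_promoter(anchor, "+")
print(window)
```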
Algorithms
Genome-wide ab-initio oligos analysis:
- binomial probability for overrepresentation
- motif lengths from 5 to 9 nt
- CG rich/poor regions treatment
- discard overlapping matches
- non-redundant dataset as background vs. real sequences for signal
- count matches on both strands for promoters, on a single strand for 3’-UTRs
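A minimal sketch of the binomial overrepresentation score for one oligo in one sequence, using only the Python standard library. The background frequency, the toy sequence and the function names are assumptions for illustration; the actual scoring details are in the cited papers.

```python
from math import comb


def count_non_overlapping(sequence, oligo):
    """Count matches of `oligo`, discarding overlapping ones.

    str.count scans left to right and never counts overlapping matches.
    """
    return sequence.count(oligo)


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly.

    Fine for short toy sequences; long sequences would need a log-space
    or library implementation to avoid overflow.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy example: sequence and background frequency are invented.
sequence = "ACGTACGTTTACGT"
oligo = "ACGT"
background_freq = 0.05                     # assumed frequency in the background set

count = count_non_overlapping(sequence, oligo)
positions = len(sequence) - len(oligo) + 1  # number of possible start positions
p_value = binomial_tail(count, positions, background_freq)
print(count, p_value)
```

A small p-value means the oligo occurs in this sequence more often than the background frequency would predict.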
Conserved-overrepresentation:
- Human vs mouse
- Hypergeometric model
- Benjamini-Yekutieli FDR
- applied to both promoters and 3’-UTRs
- motif validation against known TFBSs (Transfac) and known miRNA seeds
Chan et al, PLoS Comput. Biol. 2005; Corà et al, BMC Bioinformatics 2005; Corà et al, BMC Bioinformatics 2007
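The conserved-overrepresentation step can be sketched as a hypergeometric tail followed by the Benjamini-Yekutieli step-up procedure. Function names and the toy numbers are assumptions; how gene sets are built per oligo is described in the cited papers, not here.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K are marked.

    Here: N orthologous gene pairs, K human genes with the oligo
    overrepresented, n mouse genes with it overrepresented, k shared.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_yekutieli(pvalues, q=0.1):
    """Return a list of booleans: True where the test is rejected at FDR q."""
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / (m * c_m):
            cutoff_rank = rank                    # keep the largest passing rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Toy numbers, invented for illustration.
p_conserved = hypergeom_tail(k=10, N=1000, K=50, n=40)
decisions = benjamini_yekutieli([0.001, 0.5, 0.9], q=0.1)
print(p_conserved, decisions)
```

The default `q=0.1` mirrors the FDR level quoted in the Results.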
Results
Human Transcriptional Network
--> Fixing the FDR level at 0.1, we obtained a catalogue of 2031 oligos that can be associated with known TFBSs, for a total of 115 different TFs.
--> These target a total of 21159 genes (20972 protein-coding and 187 miRNAs).
Human Post-Transcriptional Network
--> Fixing the FDR level at 0.1, we obtained a catalogue of 3989 oligos (7-mers). 182 of them match at least one seed present in 140 mature miRNAs.
--> These target a total of 17266 genes.
Human mixed FFLs catalogue
--> We obtained a list of 5030 different “single target circuits”, corresponding to 638 “merged circuits”.
--> These involve a total of 2625 joint target genes (JTs), 101 TFs and 133 miRNAs. The number of JTs per merged circuit ranged from 1 to 38.
(Figure: merged circuit in which one TF and one miR share the joint targets JT 1, JT 2, JT …)
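The relation between single target circuits and merged circuits can be illustrated by grouping on the (TF, miRNA) pair. The `TF|miRNA|JT` string format follows the circuit labels used later in these slides; treating it as an input format, and the function name, are assumptions.

```python
from collections import defaultdict


def merge_circuits(single_target_circuits):
    """Group 'TF|miRNA|JT' labels into merged circuits keyed by (TF, miRNA)."""
    merged = defaultdict(set)
    for label in single_target_circuits:
        tf, mirna, joint_target = label.split("|")
        merged[(tf, mirna)].add(joint_target)
    # Sort the joint targets so the result is deterministic.
    return {pair: sorted(targets) for pair, targets in merged.items()}


# A few single target circuits taken from the example slides.
singles = [
    "MYC|hsa-mir-17-5p|E2F1",
    "MYC|hsa-mir-17-5p|HIF1A",
    "HSF2|hsa-let-7f|MYCN",
]
merged = merge_circuits(singles)
for (tf, mirna), targets in merged.items():
    print(tf, mirna, len(targets), targets)
```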
Circuits assessment 1: functional analysis
We analyzed each of the 638 merged circuits, looking for enrichment of Gene Ontology categories in its set of joint targets. To assess this enrichment we used the standard Fisher exact test with a p-value threshold of p < 10^-4. We ended up with a list of 32 merged mixed Feed-Forward Loops (corresponding to 380 single-target FFLs). These circuits involve a total of 344 JT protein-coding genes, 24 TFs and 25 mature miRNAs.
--> The enriched categories relate to various aspects of organism differentiation and development.
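As an illustration of the enrichment test, here is a one-sided Fisher exact test on a 2x2 table for one merged circuit and one Gene Ontology category. The counts are invented, and the choice of a one-sided test is an assumption; the slides only say "standard exact Fisher test".

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P of seeing at least `a` annotated joint targets, given the margins.

    2x2 table:
        a = joint targets in the category    b = joint targets not in it
        c = other genes in the category      d = other genes not in it
    """
    n_targets = a + b
    n_category = a + c
    total = a + b + c + d
    upper = min(n_targets, n_category)
    favourable = sum(
        comb(n_category, i) * comb(total - n_category, n_targets - i)
        for i in range(a, upper + 1)
    )
    return favourable / comb(total, n_targets)


# Invented counts: 20 joint targets, 8 of them in a category covering 100 of 10000 genes.
p_enrichment = fisher_one_sided(a=8, b=12, c=92, d=9888)
is_enriched = p_enrichment < 1e-4   # threshold quoted on the slide
print(p_enrichment, is_enriched)
```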
Circuits assessment 2: comparison with external databases
We developed an annotation scheme based on the existence of additional computational evidence for each circuit link.
(Figure: sources of additional evidence for each link.)
- TF --> Joint Target: ECRbase
- miR --> Joint Target: miRBase4; PicTar; TargetScan4.2
- TF --> miR: ECRbase, PMID:17447837
Circuits assessment 3: looking for cancer related FFLs
In the last few years it has become increasingly clear that miRNAs play a central role in cancer development (e.g. Blattener, Mol Syst. Biol. 2008). We filtered our results looking for FFLs containing at least two cancer-related components (miRNAs or target genes).
Sources:
- oncomiRs reported in Esquela-Kerscher and FJ Slack, Nat Rev Cancer 2006, and in Zhang et al, Dev Biol, 2007
- cancer genes reported in the Cancer Gene Census database.
Example of an interesting circuit 1
MYC|hsa-mir-17-5p|E2F1
(Figure: TF MYC, miRNA hsa-mir-17-5p; joint targets shown: E2F1, EDD1, TAF5L, HIF1A, Q6ZR74, OSBPL10, ACP1, MYNN, CENTB5, GDA.)
The “merged” circuit contains 11 joint targets, among them E2F1. The FFL involving E2F1 is well known in the literature: it was first discussed in O’Donnell et al. (Nature 2005) and plays a role in the control of cell proliferation, growth and apoptosis. Another joint target is NFAT5, which is known to play a critical role in heart, vasculature, muscle and nervous tissue development. This circuit is experimentally validated!
Example of an interesting circuit 2
HSF2|hsa-let-7f|MYCN
Cancer related circuit
(Figure: TF HSF2, miRNA hsa-let-7f; joint targets shown: MYCN, ESPL1, PLSCR3, PDCD4, MTO1, FMO2.)
The role of HSF2 in cancer is being elucidated by the observation that it functions as a bookmarking factor for heat shock responsive genes and also for genes involved in the regulation of cell apoptosis and proliferation. The MYCN oncogene is crucial in neuronal development, and its amplification is currently one of the molecular markers adopted in the clinical treatment of neuroblastoma. The MYC family oncogenes are known to deregulate cell cycle progression and apoptosis and to promote genomic instability. let-7f belongs to the let-7 family of oncomiRs and, in particular, has been found to be involved in cell aging and various other aspects of cancer biology. In this case, the interplay in a mixed FFL is novel.
Analysis of the mixed FFLs in term of network motifs
Elementary regulatory circuits (the so-called “network motifs”) were shown to be over-represented in transcriptional networks (Milo et al., Science 2002; Shen-Orr et al., Nat Genetics 2002). In order to quantify the overrepresentation we performed various randomization tests.
- Random miRNA promoters and seeds, Z = 8.1
- Edge Switching, Z = 8.4
- Complete node replacement, Z = 9.2
Mixed FFLs are genuine Network motifs
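A randomization test of this kind can be sketched as follows, here for the edge-switching variant. Which layers were switched, how many swaps were used and how the Z score was normalized are not given on the slide, so those choices below (all three layers, population standard deviation, toy network) are assumptions.

```python
import random
from statistics import mean, pstdev


def count_ffls(tf_gene, tf_mirna, mirna_gene):
    """Count (TF, miRNA, gene) triples in which all three links are present."""
    tf_gene = set(tf_gene)
    return sum(
        1
        for tf, mirna in set(tf_mirna)
        for m, gene in set(mirna_gene)
        if m == mirna and (tf, gene) in tf_gene
    )


def edge_switch(edges, n_swaps, rng):
    """Degree-preserving shuffle: a->b, c->d becomes a->d, c->b."""
    edges = list(edges)
    if len(edges) < 2:
        return edges
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if (a, d) in present or (c, b) in present:
            continue  # swap would duplicate an edge (or change nothing)
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def z_score(observed, null_counts):
    """Z = (observed - mean of null) / standard deviation of null."""
    mu, sigma = mean(null_counts), pstdev(null_counts)
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma


# Toy network, invented for illustration.
tf_gene = [("T1", "g1"), ("T1", "g2"), ("T2", "g3"), ("T3", "g4"), ("T3", "g5")]
tf_mirna = [("T1", "m1"), ("T2", "m2"), ("T3", "m3")]
mirna_gene = [("m1", "g1"), ("m1", "g2"), ("m2", "g3"), ("m3", "g6"), ("m2", "g5")]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
observed = count_ffls(tf_gene, tf_mirna, mirna_gene)
null = [
    count_ffls(
        edge_switch(tf_gene, 100, rng),
        edge_switch(tf_mirna, 100, rng),
        edge_switch(mirna_gene, 100, rng),
    )
    for _ in range(200)
]
print(observed, z_score(observed, null))
```

Edge switching keeps every node's number of incoming and outgoing links fixed, so a high Z score cannot be explained by degree distribution alone.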
Discussion
The main purpose of this work was to:
--> systematically investigate connections between transcriptional and post-transcriptional network interactions in the human genome.
--> To this end, we designed a bioinformatic pipeline, mainly based on sequence analysis of human and mouse genomes, that is able to construct, in particular, a catalogue of mixed Feed-Forward Loops (FFLs).
The main outcomes of this work are available in a public database:
http://personalpages.to.infn.it/~cora/circuits/index.html
with all data and complete discussion (Web dynamic interface under development).
Several extensions of this work are possible:
- Include a more realistic gene prototype (e.g. Pesole, Gene 2008)
- Include pri-miRNAs where available and useful (e.g. Saini et al, BMC Genomics 2008)
- Include other types of mixed network motifs (e.g. Shalgi et al, PLoS Comput Biol 2007)
- In-depth study of the possible biological meaning of mixed network motifs (e.g. Brosh et al, Mol Syst Biol 2008; Aguda et al, PNAS 2008)
Possible biological role for mixed TF/miRNA network motifs:
(Figure: four TF / miR / Joint Target circuit diagrams, labelled type I circuits and type II circuits.)
Brosh et al, Mol Syst Bio 2008
“p53-Repressed miRNAs are involved with E2F in a feed-forward loop promoting proliferation.”
Blattener C, Mol Syst Bio 2008
“junk DNA meets the p53 world”
The transcription factor E2F1, a miRNA cluster (15 miRNAs) and their common target genes appear to form a regulatory feed-forward loop, enhancing cellular proliferation. This FFL is repressed by p53, possibly to promote senescence and suppress cancer progression.
People
- D.C. and M. Caselle, Dep. of Theoretical Physics, University of Torino
- A. Re, CIBIO, University of Trento
- D. Taverna, Dep. of Genetics, Biology and Biochemistry and M.B.C., University of Torino
Thanks to the whole Torino group: Grassi L., Osella M., Bosia C., Bertolino A.
References
- Corà D, Herrmann C, Dieterich C, Di Cunto F, Provero P, Caselle M. “Ab initio identification of putative human transcription factor binding sites by comparative genomics.” BMC Bioinformatics. 2005;6:110.
- Corà D, Caselle M, Di Cunto F, Provero P. “Identification of candidate regulatory sequences in mammalian 3’-UTRs by statistical analysis of oligonucleotide distributions.” BMC Bioinformatics. 2007;8:174.
- Hornstein E, Shomron N. “Canalization of development by microRNAs.” Nat Genet. 2006;38 Suppl:S20-4.
- Shalgi R, Lieber D, Oren M, Pilpel Y. “Global and local architecture of the mammalian microRNA-transcription factor regulatory network.” PLoS Comput Biol. 2007;3(7):e131.
- Brosh R, et al. “p53-Repressed miRNAs are involved with E2F in a feed-forward loop promoting proliferation.” Mol Syst Biol. 2008;4:229.