Earl Bellinger and Fabio Mendes What are microarrays again? A - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Hangauer MJ, Vaughn IW, McManus MT (2013). Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs. PLoS Genetics 9(6): 1-13

Earl Bellinger and Fabio Mendes

slide-2
SLIDE 2

What are microarrays again?

A microarray is a 2D array on a solid substrate (usually a glass slide or silicon thin-film cell) that assays large amounts of biological material using high-throughput screening methods. "DNA microarrays are a well-established technology for measuring gene expression levels. Microarrays designed for this purpose use relatively few probes for each gene and are biased toward known and predicted gene structures"

Mockler TC, Chan S, Sundaresan A, Chen H, Jacobsen SE, Ecker JR (2005). Applications of DNA tiling arrays for whole-genome analysis. Genomics 85(1): 1-15.

slide-3
SLIDE 3

What are microarrays again?

  • Tiling arrays:

"Recently, high-density oligonucleotide-based whole-genome microarrays have emerged as a preferred platform for genomic analysis beyond simple gene expression profiling. Potential uses for such whole-genome arrays include empirical annotation of the transcriptome (...)"

Mockler TC, Chan S, Sundaresan A, Chen H, Jacobsen SE, Ecker JR (2005). Applications of DNA tiling arrays for whole-genome analysis. Genomics 85(1): 1-15.

slide-4
SLIDE 4

Back to the paper...

1. Noncoding DNA (ncRNA, "Junk" DNA);
2. Intergenic DNA can be transcribed;
2.1. Over 80% of the human genome "serves some purpose, biochemically speaking"

The ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414): 57-74.

2.2. "Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten."

Graur D, Zheng Y, Price N, Azevedo RBR, Zufall RA, Elhaik E (2013). On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution.

3. What is the extent of intergenic transcription?

slide-5
SLIDE 5
  • What is the extent of noncoding transcription?

Previous studies:

  • Long intergenic ncRNAs (lincRNAs): intergenic transcripts longer than 200 nucleotides in length that lack protein coding capacity;
  • There is a limited set of annotated lincRNAs (GENCODE found only 5,000) compared to expectations from ENCODE;
  • Tiling arrays (problem: repetitive elements);
  • Sequencing-based approaches (problem: just a fraction of the genome; small number of tissues).

  • This study:

○ Puts together a "unique set of RNAseq data derived from both novel (6) and published (121) datasets that complements and significantly expands prior efforts".

slide-6
SLIDE 6
  • This study:

○ Filtered out:
■ transcripts overlapping protein coding genes and pseudogenes;
■ transcripts longer than 200 nucleotides with an ORF longer than 100 amino acids (and transcripts overlapping those);
■ ncRNA genes that were known to be non-lincRNAs;
■ transcripts connected to a protein coding gene by an RNA-seq read (removed transcripts overlapping "extended" gene structures);
■ transcripts with an FPKM < 1 (an FPKM of 1 being equivalent to one copy per cell);

  • FPKM: fragments per kilobase of transcript per million mapped fragments (# of fragments / length of transcript in kb / millions of mapped fragments); essentially a measure of transcript abundance.

○ Merged transcripts sharing an exon (to avoid redundancy).
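The FPKM definition and the expression cutoff can be sketched in a few lines of Python. This is a minimal illustration, not the paper's pipeline: the function names, the transcript tuples, and the library size are all hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    length_kb = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / length_kb / millions_mapped


def passes_expression_filter(value, threshold=1.0):
    """Keep a transcript only if its FPKM reaches the threshold (FPKM < 1 is removed)."""
    return value >= threshold


# Hypothetical transcripts: (name, fragments mapped, length in bp).
transcripts = [("tx_a", 500, 2_000), ("tx_b", 10, 4_000)]
total_mapped = 50_000_000  # hypothetical library size

kept = [
    name
    for name, frags, length in transcripts
    if passes_expression_filter(fpkm(frags, length, total_mapped))
]
# tx_a has FPKM 5.0 and is kept; tx_b has FPKM 0.05 and is removed.
```

Normalizing by both length and library size is what makes values comparable across transcripts of different lengths and across datasets of different depths.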

slide-7
SLIDE 7
slide-8
SLIDE 8

RESULTS

  • RNA-seq from the analyzed datasets mapped to 78.9% of the genome;
  • When information from known genes, ESTs and cDNAs was incorporated, 85.2% of the genome showed evidence of transcription;
  • >94% of the final set of merged lincRNAs consists of de novo assembled transcripts from RNA-seq data.

slide-9
SLIDE 9

RESULTS

  • Read depth = coverage; base calls = (# of positions at a specific depth) × (read depth); NM genes = annotated protein coding genes;
  • Protein coding gene exons have a larger fraction of base calls at high coverage (are transcribed more), as expected;

  • Intergenic regions "contain many highly expressed (transcribed) regions".
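
The "base calls" definition above can be illustrated with a short Python sketch. The per-base depth list and the function names are hypothetical and stand in for whatever coverage data the figure was built from.

```python
from collections import Counter


def base_calls_by_depth(per_base_depth):
    """Map each read depth to its base calls: (# of positions at that depth) * depth."""
    positions_at_depth = Counter(per_base_depth)
    return {depth: count * depth for depth, count in positions_at_depth.items()}


def fraction_at_or_above(per_base_depth, min_depth):
    """Fraction of all base calls that come from positions covered at >= min_depth."""
    calls = base_calls_by_depth(per_base_depth)
    total = sum(calls.values())
    if total == 0:
        return 0.0
    high = sum(value for depth, value in calls.items() if depth >= min_depth)
    return high / total


# Hypothetical read depth at six consecutive positions of one region.
region_depth = [0, 2, 2, 10, 10, 10]
calls = base_calls_by_depth(region_depth)  # {0: 0, 2: 4, 10: 30}
high_fraction = fraction_at_or_above(region_depth, 10)  # 30 of 34 base calls
```

Comparing `high_fraction` between region classes (exons, intergenic sequence) is one way to express "a larger fraction of base calls at high coverage".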
slide-10
SLIDE 10

RESULTS

  • NR genes = ncRNA genes that were previously annotated;
  • "(...) many regions of high expression do exist within intergenic regions, far more than are accounted for by current ncRNA gene annotation".

slide-11
SLIDE 11

What is Epigenetics again?

Epigenetics is the study of heritable changes in gene activity which are not caused by changes in the DNA sequence. (...) such changes are DNA methylation and histone modification.

"But recently, it has been shown that chromatin modifications can be regarded as indicators of the transcriptional regulatory function and activity of certain types of genomic loci. With recent technological advances making it routine to survey chromatin modifications on a large scale, the epigenetics field is rapidly expanding from examining individual genes to all genes to the entirety of the human genome."

Hon GC, Hawkins RD, Ren B (2009). Predictive chromatin signatures in the mammalian genome. Human Molecular Genetics 18(2): R195-201

slide-12
SLIDE 12

What is Epigenetics again?

Hon GC, Hawkins RD, Ren B (2009). Predictive chromatin signatures in the mammalian genome. Human Molecular Genetics 18(2): R195-201

  • H3K4me3 and H3K36me3 are canonical epigenetic marks for activation;
  • H3K27me3 is a canonical epigenetic mark for repression.
slide-13
SLIDE 13

What is Epigenetics again?

  • ChIP sequencing (ChIP-seq):

"ChIP is the most direct way to identify the binding sites of a single DNA-binding protein or the location of modified histones";

Furey TS (2012). ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nature Reviews Genetics 13: 840-52.

"Methods developed to study DNA methylation include the use of (...) affinity enrichment using antibodies specific to 5-methylcytosine".

Ku CS, Naidoo N, Wu M, et al. (2011). Studying the epigenome using next-generation sequencing. Journal of Medical Genetics 48: 721-30.

slide-14
SLIDE 14

What is Epigenetics again?

  • ChIP-chip / ChIP-sequencing (ChIP-seq):

Ku CS, Naidoo N, Wu M, et al. (2011). Studying the epigenome using next-generation sequencing. Journal of Medical Genetics 48: 721-30.

slide-15
SLIDE 15

Back to the paper… RESULTS

  • Chromatin signatures were obtained from ChIP-seq studies;
  • LincRNAs that were more expressed (FPKM > 5) were significantly enriched with marks for activation (H3K4me3 and H3K36me3) when compared to lincRNAs that were less expressed (FPKM < 1);
  • Conversely, lincRNAs that were less expressed were significantly enriched with repressive marks (H3K27me3) when compared to lincRNAs that were more expressed.

slide-16
SLIDE 16

Back to the paper… RESULTS

  • Using unsupervised hierarchical clustering, lincRNAs are shown to be differentially transcribed in a tissue-specific fashion;
  • "The lincRNAs we describe are specifically regulated (...), attributes inconsistent with transcriptional noise."

slide-17
SLIDE 17
  • Compared lincRNA FPKM* values in polyA+ specific and polyA− specific RNA-seq libraries in H9 ESCs and HeLa cells;
  • Analyzed transcripts with RNA-seq reads in all four datasets and with FPKM > 1 in at least one of the two fractions for each cell type:
○ 16,819 NM genes
○ 127 lincRNAs
  • Showed individual lincRNA and NM gene ratios of FPKMs in polyA+/polyA− fractions;
  • Pearson correlation:
○ lincRNAs = 0.622 (P = 5.5E-15)
○ NM genes = 0.702 (P < 2.2E-16)
  • Determined the maximally conserved 50 bp windows in each NM gene, lincRNA, and repetitive element (nonconserved control sequences);
  • The maximally conserved 50 bp windows of 12 functional human lincRNAs are indicated for comparison.

*Fragments Per Kilobase of exon per Million fragments mapped
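
As a rough illustration of the ratio comparison, the sketch below computes polyA+/polyA− log ratios and a Pearson correlation in plain Python. It assumes the correlation is taken between the ratios of the same transcripts in the two cell types, which the slide does not state explicitly. The FPKM pairs, the pseudocount, and the function names are hypothetical.

```python
import math


def log2_ratio(polya_plus_fpkm, polya_minus_fpkm, pseudocount=0.01):
    """log2 of the polyA+/polyA- FPKM ratio; the pseudocount (an assumption) avoids dividing by zero."""
    return math.log2((polya_plus_fpkm + pseudocount) / (polya_minus_fpkm + pseudocount))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (polyA+, polyA-) FPKM pairs for the same four transcripts in two cell types.
h9 = [(8.0, 1.0), (2.0, 2.0), (0.5, 4.0), (6.0, 3.0)]
hela = [(10.0, 2.0), (1.5, 2.5), (1.0, 5.0), (3.0, 3.0)]

h9_ratios = [log2_ratio(plus, minus) for plus, minus in h9]
hela_ratios = [log2_ratio(plus, minus) for plus, minus in hela]
r = pearson_r(h9_ratios, hela_ratios)
```

A high `r` would mean that a transcript's polyadenylation preference is similar in both cell types.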

slide-18
SLIDE 18

LincRNAs Are Enriched for Trait-Associated SNPs

*P = 0.0173, **P<2.2E-16

95% binomial proportion confidence interval

  • Roughly 50% of all trait-associated SNPs (TASs) identified in genome-wide association studies are located in intergenic sequence;
  • Only a small portion are in protein coding gene exons;
  • Supports the hypothesis of an abundance of functional elements in intergenic sequence.

slide-19
SLIDE 19

LincRNAs Are Enriched for Trait-Associated SNPs

*P = 0.0173, **P<2.2E-16

95% binomial proportion confidence interval

  • TASs have been identified within or proximal to noncoding RNAs including some lincRNAs;
  • If lincRNAs are functional, they should be enriched for TASs compared to nonexpressed intergenic regions;
  • The paper finds that lincRNAs are more than 5-fold enriched for TASs compared to nonexpressed intergenic regions;
  • Hence many trait-associated intergenic regions may function by encoding lincRNAs.
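
A fold enrichment and a 95% binomial proportion confidence interval can be sketched as follows. The slides do not say which interval method was used, so the Wilson score interval here is an assumption. Treating TAS density as TASs per base is also an assumption, and all counts are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


def fold_enrichment(hits_a, size_a, hits_b, size_b):
    """Ratio of hit densities (hits per unit of sequence) between region sets A and B."""
    return (hits_a / size_a) / (hits_b / size_b)


# Hypothetical counts: TASs falling in each region set, and each set's total size in bases.
linc_hits, linc_bases = 10, 1_000
control_hits, control_bases = 2, 1_000

enrichment = fold_enrichment(linc_hits, linc_bases, control_hits, control_bases)
linc_low, linc_high = wilson_interval(linc_hits, linc_bases)
control_low, control_high = wilson_interval(control_hits, control_bases)
```

Non-overlapping intervals for the two densities would be consistent with a real difference, though a formal test is what yields P values like those in the figure legend.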

slide-20
SLIDE 20

Discussion

  • There has been a recent debate about whether there is pervasive transcription of the human genome and what the number and abundance of intergenic transcripts are;
  • A key missing component to this debate has been an analysis of ultra deep RNA-seq data sampling a wide array of tissue types;
  • This paper analyzed a set of RNA-seq data that fulfills requirements of read depth and tissue breadth, covering both polyadenylated and nonpolyadenylated RNA fractions;
  • In strong agreement with the results from the ENCODE project, this paper observes that approximately 85% of the genome is transcribed.
slide-21
SLIDE 21

Thanks for listening!

Questions?