Epigenomic enrichment analysis using Bioconductor - EuroBioc 2019



SLIDE 1

Epigenomic enrichment analysis using Bioconductor

EuroBioc 2019, Brussels

Dario Righelli, PhD
Istituto per le Applicazioni del Calcolo «M. Picone», CNR, Napoli
d.righelli@na.iac.cnr.it || dario.righelli@gmail.com
drighelli

SLIDE 2

What’s the aim?

Compare methods and provide guidelines on epigenomic data analysis

SLIDE 3

ATAC-seq dataset

Yijing Su et al. 2017 - Nature Neuroscience - Neuronal activity modifies the chromatin accessibility landscape in the adult brain

  • Before Fear Induction Condition (E0): 4 biological replicates
  • After Fear Induction Condition (E1): 4 biological replicates

Goal: catching differences in open chromatin regions between the two conditions

SLIDE 4

ChIP-seq dataset (NULL dataset)

Home Cage Controls - Histone 3, Lysine 9 Acetylation (H3K9ac)

  • 9 biological replicates

How many random differences are we able to catch inside a control dataset?

SLIDE 5

BWA and Bowtie2 perform the same

  • Most used aligners for epigenomics data
  • Correlation computed on ChIP-seq data coverages
      • Used the deepTools plotCorrelation tool
  • Correlations between the coverages of the same samples aligned with BWA and with Bowtie2 (BAM files) have a value of 1.
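
A correlation of 1 between two coverage tracks means that the per-bin coverages are perfectly linearly related. The sketch below is not the deepTools implementation; it only illustrates a Pearson correlation between binned coverage vectors in plain Python. The coverage values and the names `bwa_cov`, `bowtie2_cov` and `other_cov` are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length coverage vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-bin coverages of one sample aligned with two aligners,
# plus an unrelated track for contrast.
bwa_cov = [0, 3, 12, 40, 38, 9, 1, 0]
bowtie2_cov = [0, 3, 12, 40, 38, 9, 1, 0]
other_cov = [5, 0, 2, 7, 30, 41, 3, 1]

r_same = pearson(bwa_cov, bowtie2_cov)   # identical tracks
r_other = pearson(bwa_cov, other_cov)    # different tracks
```

Identical tracks give a correlation of 1 (up to floating-point rounding), while the unrelated track gives a lower value.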

SLIDE 6

A Bioconductor Approach

  • MACS2 (not a Bioconductor package)
      • Most used peak caller
      • Broad and Narrow peaks option
  • DEScan2
      • Has a peak detector in R
      • Peak resolution -> bin size
      • Can work with external peaks
  • DiffBind
      • No peak detection
      • Fast on matrix construction
      • Uses external peaks
  • CSAW
      • Starts from BAM files
      • Computes matrix of bins x samples
  • edgeR
      • Widely used method
      • Very flexible in usage

Workflow stages:

  • Peak Callers: DEScan2, MACS2 (Narrow, Broad), CSAW
  • Peak Consensus & Matrices: DEScan2, DiffBind, CSAW
  • Differential Enrichment: edgeR
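
The "matrix of bins x samples" idea can be illustrated without any Bioconductor package. The following is a minimal sketch, not the CSAW implementation: it counts read start positions in fixed-width bins on a single chromosome. The function `bin_counts`, the bin width, and the toy read positions are all assumptions made for illustration.

```python
def bin_counts(read_starts_by_sample, chrom_length, bin_width):
    """Return (sample_names, matrix), where matrix[i][j] is the number of
    reads of sample j whose start position falls in bin i."""
    n_bins = -(-chrom_length // bin_width)  # ceiling division
    samples = sorted(read_starts_by_sample)
    matrix = [[0] * len(samples) for _ in range(n_bins)]
    for j, sample in enumerate(samples):
        for start in read_starts_by_sample[sample]:
            if 0 <= start < chrom_length:  # ignore reads outside the chromosome
                matrix[start // bin_width][j] += 1
    return samples, matrix

# Hypothetical 0-based read start positions for two samples.
reads = {
    "E0_1": [5, 12, 48, 51, 53, 120],
    "E1_1": [7, 49, 50, 52, 55, 58, 199],
}
samples, counts = bin_counts(reads, chrom_length=200, bin_width=50)
```

Each row of `counts` is one 50 bp bin and each column one sample; a matrix of this shape is the kind of input a differential enrichment step would consume.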

SLIDE 7

Counts Normalization Affects Differentially Accessible Regions (DARs)

  • Pay attention to the normalization process
  • One might try to apply a classic RNA-Seq normalization
  • Different normalizations do not always give the same results
  • A more specific normalization may be required for this kind of data

ATAC-seq dataset
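
One way two normalizations can disagree is when a few strongly changed regions inflate the library size of a sample. The sketch below compares total-count scaling with an upper-quartile scaling on a toy counts matrix. The slides do not say which normalizations were compared, so both choices, the function names, and the data are assumptions made for illustration, and neither function is taken from the packages named in this talk.

```python
import math

def total_count_factors(counts):
    """Per-sample factor: library size divided by the mean library size."""
    sizes = [sum(column) for column in zip(*counts)]
    mean_size = sum(sizes) / len(sizes)
    return [size / mean_size for size in sizes]

def upper_quartile_factors(counts):
    """Per-sample factor: upper-quartile count divided by the mean upper quartile."""
    quartiles = []
    for column in zip(*counts):
        ordered = sorted(column)
        quartiles.append(ordered[int(0.75 * (len(ordered) - 1))])
    mean_q = sum(quartiles) / len(quartiles)
    return [q / mean_q for q in quartiles]

def log2_fold_change(counts, factors, region):
    """log2(sample 2 / sample 1) for one region after dividing by the factors."""
    first = counts[region][0] / factors[0]
    second = counts[region][1] / factors[1]
    return math.log2(second / first)

# Toy matrix: rows are regions, columns are two samples.
# Only the fourth region truly changes (100 -> 1000).
counts = [
    [90, 90], [50, 50], [20, 20], [100, 1000], [30, 30],
    [40, 40], [60, 60], [70, 70], [80, 80],
]

# Fold change of the first region, which has identical raw counts.
lfc_total = log2_fold_change(counts, total_count_factors(counts), region=0)
lfc_upper = log2_fold_change(counts, upper_quartile_factors(counts), region=0)
```

With total-count scaling the unchanged first region appears to decrease (negative `lfc_total`), while the upper-quartile scaling leaves it at a fold change of zero, so the two normalizations would not call the same regions.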

SLIDE 8

Comparing DARs across methods

[Figure: UpSet plot "ATAC-seq DARs". Sets: DiffBindBroad, CSAW, DiffBindNarrow, DEScan_Z10_K4_DARs. Axes: Intersection Size, Set Size. Intersection sizes as plotted: 16652, 11982, 7956, 7505, 6597, 6530, 4491, 976, 976, 654, 599, 523, 514, 344, 282.]

  • The biggest overlap is on the peaks detected by all the methods
  • CSAW and DiffBind show a large number of non-overlapping regions
  • DEScan2 shows the lowest number of non-overlapping regions
  • The large number of non-overlapping regions from CSAW and DiffBind suggests a possibly high level of false positive regions.
  • Ad-hoc designed UpSet plot on GRanges
  • Based on findOverlaps method
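
Overlap-based intersections of this kind can be counted by merging the regions of all methods into union regions and recording, for each union region, which methods contributed an overlapping region. The sketch below is a plain-Python stand-in, not the GRanges `findOverlaps` API and not the plotting code behind the figure; it assumes a single chromosome and half-open `(start, end)` intervals, and the coordinates are toy values.

```python
from collections import Counter

def union_regions(region_sets):
    """Merge the intervals of all sets into non-overlapping union regions."""
    intervals = sorted(iv for regions in region_sets.values() for iv in regions)
    merged = []
    for start, end in intervals:
        if merged and start < merged[-1][1]:  # overlaps the previous union region
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def overlaps(a, b):
    """True when half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def upset_counts(region_sets):
    """Count union regions per exact combination of sets overlapping them."""
    counts = Counter()
    for region in union_regions(region_sets):
        members = frozenset(
            name for name, regions in region_sets.items()
            if any(overlaps(region, iv) for iv in regions)
        )
        counts[members] += 1
    return counts

# Hypothetical regions called by three methods.
dars = {
    "CSAW": [(100, 200), (500, 600), (900, 950)],
    "DiffBind": [(150, 250), (700, 800)],
    "DEScan2": [(120, 180), (720, 760)],
}
intersections = upset_counts(dars)
```

Each key of `intersections` is one bar of an UpSet plot (an exact combination of methods) and each value is the height of that bar.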
SLIDE 9

Peaks contrasts on NULL dataset show no results

H3K9ac ChIP-seq dataset

  • Compared performances on a null dataset of ChIP-seq H3K9ac samples
  • Performed 126 permutations of samples
      • Samples are divided into two groups
      • All the possible permutations on 9 samples (126)
  • All the methods find mostly 0 Differentially Enriched Peaks on the random conditions.
      • Sometimes some differences have been found
      • With and without normalization
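
The figure of 126 is consistent with splitting the 9 samples into one group of 4 and one group of 5, since C(9, 4) = 126. The slides do not state the group sizes, so the 4-versus-5 split in this sketch is an assumption, and the sample labels are hypothetical.

```python
from itertools import combinations

samples = [f"S{i}" for i in range(1, 10)]  # 9 hypothetical sample labels

# Each choice of 4 samples for group A leaves the remaining 5 in group B.
splits = [
    (group_a, tuple(s for s in samples if s not in group_a))
    for group_a in combinations(samples, 4)
]
n_splits = len(splits)
```

Running a differential enrichment test on every element of `splits` gives the null distribution of the number of peaks called when no real condition exists.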

[Figure: nElem per method on the permuted NULL dataset, colored by normalized (NO/YES). Methods on the axis: DEScan2_NoNorm, DEScan2_Norm, DES2_M2_Broa_NoNorm, DES2_M2_Broa_Norm, DES2_M2_Narr_NoNorm, DES2_M2_Narr_Norm, DiffBind_Narr_Norm, DiffBind_Broad_Norm. nElem axis ticks: 2, 4, 6, 8.]

SLIDE 10

What’s Next?

On-going and future works

SLIDE 11

Some comparisons are still needed

  • Compare CSAW on ChIP-seq
  • Compare normalization methods with all epigenomics methods
  • Explore in-silico biological functions of results
  • Testing on a single-cell ATAC-seq dataset
SLIDE 12

Acknowledgements

  • Dr. Claudia Angelini - Istituto per le Applicazioni del Calcolo-CNR
  • Dr. Davide Risso - University of Padua
  • Dr. Lucia Peixoto - Elson S. Floyd College of Medicine, Washington State University
  • Dr. Timothy Triche Jr. - Van Andel Research Institute
  • Dr. Ben Johnson - Van Andel Research Institute

Thank you for your attention!

SLIDE 13

Napoli R/Bioconductor Meetup

  • Since Nov 2018
  • R Consortium Array Group
  • At least 25 people at every event, with a good turnover of attendees
  • Eight meetups until now
      • R Package Creation
      • scRNA-seq Analysis
      • Differentially Methylated Regions Analysis
      • Microscope Image Processing
      • Chromosomal Copy Number Changes Detection
      • Bulk RNA-seq Differential Expression
      • Hi-C data analysis using HiCeekR
      • Metagenomics analysis workflow

Contacts:
http://lists.moo.gs/mailman/listinfo/biocmeetup.naples
napoli.r.bioc@gmail.com
https://www.facebook.com/pg/NapoliRBiocMeetup
SLIDE 14

Napoli R/Bioconductor Meetup

  • Part of a wider idea
  • Third city in the World
      • Boston (USA)
      • New York (USA)
      • Napoli (IT)
  • Useful to
      • share ideas and workflows
      • create new collaborations
      • extend the bioinfo community
SLIDE 15

Is there a best Aligner?

Bowtie2 vs BWA

SLIDE 16

Comparing DARs across methods (2)

ATAC-seq dataset

[Figure: UpSet plot "ATAC-seq Regions Nar/Broa & DEScan2". Sets: DEScanMACSNarrow, DEScanMACSBroad, DiffBindMACSBroad, DEScanZ10K4, DiffBindMACSNarrow. Axes: Intersection Size, Set Size. Intersection sizes as plotted: 73600, 60794, 24236, 16120, 13994, 5854, 4376, 2879, 2424, 2122, 1913, 1857, 1457, 1046, 722, 703, 466, 410, 351, 350, 327, 241, 137, 100, 93, 89, 9, 5, 5, 3, 2.]

  • Ad-hoc designed UpSet plot on GRanges
  • Based on findOverlaps method
  • Results description
SLIDE 17

Duplicates Removal doesn’t impact peak detection

  • Diagonal correlations on the counts matrices show no big differences between samples with and without duplicates
      • Duplicates removed with samtools (rmDup)
      • DEScan2 counts matrices
      • DiffBind counts matrices

[Figure: correlation heatmaps (color scale from -1 to 1) of the DEScan2 and DiffBind counts matrices. Samples: noDup_E0_1 to noDup_E0_4, noDup_E1_1 to noDup_E1_4, withDup_E0_1 to withDup_E0_4, withDup_E1_1 to withDup_E1_4.]

[Figure: "Final Peaks with/without Duplicates". Bars: Dup_DEScan2, noDup_DEScan2, Dup_DiffBind, noDup_DiffBind. Axis ticks: 10000, 20000, 30000, 40000.]
SLIDE 18

DEScan2 – Differential Enriched Scan 2

  • Filters out the peaks with a score lower than a user-defined threshold
  • Aligns the peaks over a user-defined number of samples
  • Different thresholds produce different trends in the number of final peaks detected
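
The two filtering steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the DEScan2 code: "aligning over a number of samples" is read here as keeping a peak only when peaks from at least `min_samples` samples overlap it, and the function name, parameter names, scores, and coordinates are all hypothetical. The label DEScan_Z10_K4 used elsewhere in these slides presumably encodes such a pair of thresholds, but that reading is also an assumption.

```python
def overlaps(a, b):
    """True when half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def filter_and_align(peaks_by_sample, min_score, min_samples):
    """Keep peaks scoring >= min_score that are overlapped by retained peaks
    from at least min_samples samples (the peak's own sample included)."""
    kept = {
        sample: [(start, end) for start, end, score in peaks if score >= min_score]
        for sample, peaks in peaks_by_sample.items()
    }
    final = []
    for sample, peaks in kept.items():
        for peak in peaks:
            support = sum(
                any(overlaps(peak, other) for other in other_peaks)
                for other_peaks in kept.values()
            )
            if support >= min_samples:
                final.append((sample, peak))
    return final

# Hypothetical (start, end, score) peaks for three samples.
peaks = {
    "S1": [(100, 200, 12.0), (500, 600, 4.0), (900, 1000, 15.0)],
    "S2": [(120, 210, 11.0), (520, 610, 9.0)],
    "S3": [(110, 190, 10.5), (905, 990, 3.0)],
}

strict = filter_and_align(peaks, min_score=10, min_samples=3)
loose = filter_and_align(peaks, min_score=3, min_samples=2)
```

The strict setting keeps fewer peaks than the loose one, which is the kind of threshold-dependent trend in the number of final peaks that the last bullet refers to.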