Genome-wide supervised ChIP-seq peak detection


SLIDE 1

Genome-wide supervised ChIP-seq peak detection

Toby Dylan Hocking, toby.hocking@mail.mcgill.ca
joint work with Guillem Rigaill, Paul Fearnhead, Guillaume Bourque
26 Jan 2017

SLIDE 2

Problem: optimizing ChIP-seq peak detection
Segment neighborhood model (constraint on number of peaks)
Results on benchmark data (labeled chromosome subsets)
Optimal partitioning model (penalize number of peaks)
Conclusions and future work

SLIDE 3

Chromatin immunoprecipitation sequencing (ChIP-seq)

Analysis of DNA-protein interactions. Source: “ChIP-sequencing,” Wikipedia.

SLIDE 4

Problem: find peaks in each of several samples

[UCSC genome browser view, chr11:118,095,000-118,125,000 (hg19, 10 kb scale): H3K4me3 coverage signal for four samples (McGill0002 monocyte, McGill0004 CD4-positive helper T cell, McGill0091 B cell, McGill0103 B cell), with UCSC gene annotations (AMICA1, AK289390, MPZL3, MPZL2) and the CRG 100mer alignability track from ENCODE.]

Grey profiles are normalized aligned read count signals. Black bars are “peaks” called by MACS2 (Zhang et al, 2008):

◮ many false positives.
◮ overlapping peaks have different start/end positions.

SLIDE 5

Previous work in genomic peak detection

◮ Model-based analysis of ChIP-Seq (MACS), Zhang et al, 2008.
◮ SICER, Zang et al, 2009.
◮ HOMER, Heinz et al, 2010.
◮ CCAT, Xu et al, 2010.
◮ RSEG, Song et al, 2011.
◮ Triform, Kornacker et al, 2012.
◮ Histone modifications in cancer (HMCan), Ashoor et al, 2013.
◮ PeakSeg, Hocking, Rigaill, Bourque, ICML 2015.
◮ PeakSegJoint, Hocking and Bourque, arXiv:1506.01286.
◮ ... dozens of others.

Two big questions: how to choose the best...

◮ ...algorithm? (testing)
◮ ...parameters? (training)

SLIDE 6

How to choose parameters of unsupervised peak detectors?

19 parameters for Model-based analysis of ChIP-Seq (MACS), Zhang et al, 2008:

[-g GSIZE] [-s TSIZE] [--bw BW] [-m MFOLD MFOLD] [--fix-bimodal]
[--nomodel] [--extsize EXTSIZE | --shiftsize SHIFTSIZE]
[-q QVALUE | -p PVALUE | -F FOLDENRICHMENT] [--to-large]
[--down-sample] [--seed SEED] [--nolambda] [--slocal SMALLLOCAL]
[--llocal LARGELOCAL] [--shift-control] [--half-ext] [--broad]
[--broad-cutoff BROADCUTOFF] [--call-summits]

10 parameters for Histone modifications in cancer (HMCan), Ashoor et al, 2013:

minLength 145
medLength 150
maxLength 155
smallBinLength 50
largeBinLength 100000
pvalueThreshold 0.01
mergeDistance 200
iterationThreshold 5
finalThreshold 0
maxIter 20

SLIDE 7

Which MACS parameter is best for these data?

SLIDE 8

Compute likelihood/loss of piecewise constant model

SLIDE 9

Idea: choose the parameter with a lower loss

SLIDE 10

PeakSeg: search for the peaks with lowest loss

Choose the number of peaks via standard penalties (AIC, BIC, ...) or learned penalties based on visual labels (more on this later).
SLIDE 11

Maximum likelihood Poisson segmentation models

◮ Previous work: unconstrained maximum likelihood mean for s segments (s − 1 changes), Cleynen et al 2014.
◮ Hocking et al, ICML 2015: PeakSeg constraint enforces up, down, up, down changes (and not up, up, down).
◮ Odd-numbered segments are background noise, even-numbered segments are peaks.
◮ Constrained Dynamic Programming Algorithm, O(N^2) time for N data points.

SLIDE 12

But quadratic time is not fast enough for genomic data!

[Figure: hours of computation time versus number of data points N to segment (up to 2×10^5), for models with 1, 2, 3 segments.]

◮ Genomic data is large, N ≥ 10^6.
◮ Split into subsets? What if we split a peak in half?
◮ Need a linear time algorithm for analyzing the whole data set.

SLIDE 13

Problem: optimizing ChIP-seq peak detection
Segment neighborhood model (constraint on number of peaks)
Results on benchmark data (labeled chromosome subsets)
Optimal partitioning model (penalize number of peaks)
Conclusions and future work

SLIDE 14

Statistical model is Poisson with change constraints

◮ We have N count data z_1, . . . , z_N ∈ Z_+.
◮ Fix the number of segments S ∈ {1, 2, . . . , N}.
◮ PeakSeg model: z_t ∼ Poisson(m_t) such that the mean vector m has S − 1 up-down changes.
◮ Want to find means m_t which maximize the Poisson likelihood: P(Z = z_t | m_t) = m_t^{z_t} e^{−m_t} / (z_t!).
◮ Equivalent to finding means m_t which minimize the Poisson loss: ℓ(m_t, z_t) = m_t − z_t log m_t.
◮ Naive computation is O(N^S), since there are O(N^{S−1}) possible positions for the S − 1 change-points, and it takes O(N) operations to compute the mean and loss for each.
◮ Comparison to Hidden Markov Model: same emission terms, but no transition terms; the constraint is on the number of changes rather than their values.
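To make the loss concrete, here is a minimal C++ sketch (an illustration added here, not from the original slides): it evaluates the Poisson loss ℓ(m, z) = m − z log m on the running example z = 2, 1, 0, 4 used in the following slides, under an arbitrary candidate fit.

#include <cmath>
#include <cstdio>
#include <vector>

// Poisson loss for one data point z with segment mean m,
// dropping the constant log(z!) term: l(m, z) = m - z*log(m).
double poisson_loss(double m, double z) {
  return m - z * std::log(m);
}

int main() {
  // Running example from the following slides: z = 2, 1, 0, 4.
  std::vector<double> z = {2, 1, 0, 4};
  // A candidate fit with 2 segments: mean 1 on z[0..2], mean 4 on z[3].
  std::vector<double> m = {1, 1, 1, 4};
  double total = 0;
  for (size_t t = 0; t < z.size(); t++) {
    total += poisson_loss(m[t], z[t]);
  }
  std::printf("total Poisson loss = %f\n", total);
}

The dynamic programming algorithms below search over all such fits that satisfy the up-down constraint, without enumerating them.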

SLIDE 15

Relation to previous dynamic programming algorithms

                     no pruning            functional pruning
unconstrained        Dynamic Programming   Pruned DP
                     exact O(SN^2)         exact O(SN log N)
                     R pkg: changepoint    R pkgs: cghseg, Segmentor
up-down constrained  constrained DP        this work
                     inexact O(SN^2)       exact O(SN log N)
                     R pkg: PeakSegDP      R pkg: coseg

◮ Auger and Lawrence, Algorithms for the optimal identification of segment neighborhoods (Bull Math Biol 1989).
◮ Independent discovery: Rigaill 2010, Johnson 2013. R package Segmentor: Cleynen et al 2014.
◮ Hocking, Rigaill, Bourque 2015.
◮ Contribution: a new algorithm that exactly computes the constrained optimal segmentation for N data points and 1, . . . , S segments in O(SN log N) time.


SLIDE 19

Dynamic programming and functional pruning

Classical dynamic programming (Auger and Lawrence 1989) computes the matrix of optimal loss values in s segments up to t data points, O(SN^2):

L_{1,1} · · · L_{1,N}
   ⋮             ⋮
L_{S,1} · · · L_{S,N}

Dynamic programming with functional pruning (Rigaill 2010, Johnson 2013) computes a matrix of loss functions, the optimal loss up to t data points if segment s has mean µ_s, O(SN log N):

L_{1,1}(µ_1) · · · L_{1,N}(µ_1)
     ⋮                 ⋮
L_{S,1}(µ_S) · · · L_{S,N}(µ_S)

Contribution of this work: a new algorithm that applies the functional pruning technique to the up-down constrained model.

SLIDE 20

First segment, first data point

◮ For data z_1, . . . , z_N ∈ Z_+ let γ_t(µ) = ℓ(µ, z_t) = µ − z_t log µ be the Poisson loss for each t ∈ {1, . . . , N}.
◮ For example z = 2, 1, 0, 4.
◮ Then γ_1(µ) = L_{1,1}(µ) = 1µ − 2 log µ + 0.
◮ Need to store 3 coefficients (linear, log, constant).

[Figure: cost L_{1,1} as a function of the mean µ; 1 segment, 1 data point.]

SLIDE 21

First segment, other data points

◮ The loss of the first segment up to data point t is L_{1,t}(µ) = Σ_{i=1}^{t} γ_i(µ).
◮ For example z = 2, 1, 0, 4.
◮ L_{1,2}(µ) = 2µ − 3 log µ + 0.
◮ L_{1,3}(µ) = 3µ − 3 log µ + 0.
◮ ...

[Figure: cost as a function of the mean; 1 segment, 2 data points, and 1 segment, 3 data points.]
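Each cost function is fully determined by its three coefficients, and adding a data point just adds coefficients. A minimal C++ sketch of this representation (illustrative only; the real implementation stores one such triple per interval of µ):

#include <cmath>
#include <cstdio>
#include <vector>

// A function of the form Linear*mu + Log*log(mu) + Constant.
struct PoissonCoefs {
  double Linear, Log, Constant;
  double eval(double mu) const {
    return Linear * mu + Log * std::log(mu) + Constant;
  }
  void add(const PoissonCoefs &o) {
    Linear += o.Linear; Log += o.Log; Constant += o.Constant;
  }
};

// gamma_t for one data point z_t: mu - z_t*log(mu).
PoissonCoefs gamma_t(double z) { return {1.0, -z, 0.0}; }

int main() {
  std::vector<double> z = {2, 1, 0, 4};
  PoissonCoefs L = {0, 0, 0};
  for (int t = 0; t < 3; t++) L.add(gamma_t(z[t])); // accumulate L_{1,3}
  // L is now 3*mu - 3*log(mu) + 0; at mu = 1 the value is 3.
  std::printf("L_{1,3}(1) = %f\n", L.eval(1.0));
}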

SLIDE 22

Second segment, up to data point 2

◮ The mean cost in 2 segments up to data point 2 is
  L_{2,2}(µ_2) = γ_2(µ_2) + min_{µ_1 ≤ µ_2} L_{1,1}(µ_1) = γ_2(µ_2) + L^≤_{1,1}(µ_2).
◮ The min-less operator is L^≤(µ) = min_{x ≤ µ} L(x).

[Figure: cost as a function of the mean; 2 segments, 2 data points.]

SLIDE 23

Comparison with unconstrained Pruned DPA

◮ For our constrained algorithm, the first segment mean must be less than the second, so the first segment cost term is a function of µ_2:
  L_{2,2}(µ_2) = γ_2(µ_2) + min_{µ_1 ≤ µ_2} L_{1,1}(µ_1) = γ_2(µ_2) + L^≤_{1,1}(µ_2).
◮ For the unconstrained algorithm, it is a constant that does not depend on µ_2:
  L_{2,2}(µ_2) = γ_2(µ_2) + min_{µ_1} L_{1,1}(µ_1).
◮ For example z = 2, 1, 0, 4.

[Figure: constrained min-less cost, previous cost, and unconstrained min, as functions of the mean; 2 segments, 2 data points.]

SLIDE 24

Storage as a piecewise function on intervals

◮ For example z = 2, 1, 0, 4.

[Figure: cost as a function of the mean; 2 segments, 2 data points.]

◮ Storage: L_{2,2}(µ) = γ_2(µ) + L^≤_{1,1}(µ), where
  L^≤_{1,1}(µ) = 1µ − 2 log µ + 0        if µ ∈ [0, 2],
  L^≤_{1,1}(µ) = 0µ − 0 log µ + 0.6137   if µ ∈ [2, 4].
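The constant 0.6137 is just the minimum of the first piece, which a quick calculation confirms:

\[
\frac{d}{d\mu}\left(\mu - 2\log\mu\right) = 1 - \frac{2}{\mu} = 0
\;\Rightarrow\; \mu = 2,
\qquad
\min_{x \le \mu} L_{1,1}(x) = 2 - 2\log 2 \approx 0.6137
\quad\text{for all } \mu \ge 2.
\]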

SLIDE 25

Second segment, up to data point 3

◮ For data point 3 we need to consider two change-points:
  L_{2,3}(µ) = γ_3(µ) + min{ L^≤_{1,2}(µ), L_{2,2}(µ) },
  where L^≤_{1,2}(µ) means a change up after data point 2, and L_{2,2}(µ) means a change up after data point 1.
◮ For z = 2, 1, 0, 4 the min operation prunes the change after data point 1.

[Figure: cost as a function of the mean, colored by previous segment end (1 or 2); 2 segments, 3 data points.]

SLIDE 26

Second segment, up to data point t

◮ The updates continue for every data point t ∈ {3, . . . , N}:
  L_{2,t}(µ) = γ_t(µ) + min{ L^≤_{1,t−1}(µ), L_{2,t−1}(µ) },
  where L^≤_{1,t−1}(µ) means a change up after t − 1, and L_{2,t−1}(µ) means a change up before t − 1.
◮ For example for z = 2, 1, 0, 4, at data point t = 4 we only need to consider changes after 2 and 3 (1 has been pruned).

[Figure: cost as a function of the mean, colored by previous segment end (2 or 3); 2 segments, 4 data points.]

SLIDE 27

Dynamic programming for segment neighborhood model

◮ Dynamic programming update rule: the constrained cost of mean µ for segment s, up to data point t, is
  L_{s,t}(µ) = γ_t(µ) + min{ L_{s,t−1}(µ), L*_{s−1,t−1}(µ) },
  where * is the min-less operator ≤ for a change up into a peak segment, and the min-more operator ≥ for a change down into a background segment.
◮ Time complexity of the min and min-less/more operators is linear in the number of intervals stored, which is empirically sub-linear, O(log N).

[Figure: median and inter-quartile range of the number of intervals stored (roughly 50 to 250) versus log10(data points to segment).]

◮ Total time complexity: O(SN log N).
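The same recursion can be demonstrated with a runnable toy: below, each cost function L_{s,t} is discretized on a grid of candidate means, so the min-less and min-more operators become prefix and suffix minima. The real algorithm instead represents each function exactly on intervals; this grid version is only an approximation for illustration.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  std::vector<double> z = {2, 1, 0, 4};
  const int N = (int)z.size(), S = 3; // 3 segments = 1 peak
  const int G = 1000;                 // grid resolution for the mean mu
  const double INF = 1e100;
  std::vector<double> mu(G);
  for (int g = 0; g < G; g++) mu[g] = 0.01 + 4.0 * g / (G - 1);
  auto gam = [](double m, double zt) { return m - zt * std::log(m); };
  // L[s][t][g] = cost in s segments up to data point t, last mean mu[g].
  std::vector<std::vector<std::vector<double>>> L(
      S + 1, std::vector<std::vector<double>>(N + 1, std::vector<double>(G, INF)));
  for (int g = 0; g < G; g++) L[1][1][g] = gam(mu[g], z[0]);
  for (int t = 2; t <= N; t++)
    for (int g = 0; g < G; g++)
      L[1][t][g] = L[1][t - 1][g] + gam(mu[g], z[t - 1]);
  for (int s = 2; s <= S; s++)
    for (int t = s; t <= N; t++) {
      std::vector<double> env(G);
      double best = INF;
      if (s % 2 == 0) // even segment = peak: change up, min-less = prefix min
        for (int g = 0; g < G; g++) env[g] = best = std::min(best, L[s - 1][t - 1][g]);
      else            // odd segment = background: change down, min-more = suffix min
        for (int g = G - 1; g >= 0; g--) env[g] = best = std::min(best, L[s - 1][t - 1][g]);
      for (int g = 0; g < G; g++)
        L[s][t][g] = std::min(env[g], L[s][t - 1][g]) + gam(mu[g], z[t - 1]);
    }
  std::printf("approx optimal loss in %d segments: %f\n", S,
              *std::min_element(L[S][N].begin(), L[S][N].end()));
}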

SLIDE 28

Problem: optimizing ChIP-seq peak detection
Segment neighborhood model (constraint on number of peaks)
Results on benchmark data (labeled chromosome subsets)
Optimal partitioning model (penalize number of peaks)
Conclusions and future work

SLIDE 29

Previous work in computer vision: look and add labels to...

          Photos           Cell images     Copy number profiles
Labels:   names            phenotypes      alterations
          CVPR 2013        CellProfiler    SegAnnDB
          246 papers       873 citations   Hocking et al, 2014

Sources: http://en.wikipedia.org/wiki/Face_detection; Jones et al PNAS 2009, Scoring diverse cellular morphologies in image-based screens with iterative feedback and machine learning.

SLIDE 30

Benchmark data sets, algorithms

http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/

◮ Hocking et al Bioinformatics (2016). Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning.
◮ 37 labeled H3K4me3 samples (sharp peak pattern).
◮ 29 labeled H3K36me3 samples (broad peak pattern).
◮ 12,826 labeled regions with and without peaks.
◮ 2,752 separate segmentation problems.

Algorithms for segmenting N data points:

package            constraint           exact?  complexity
coseg (this work)  µ1 ≤ µ2 ≥ µ3 . . .   yes     O(N log N)
PeakSegDP          µ1 < µ2 > µ3 . . .   no      O(N^2)
Segmentor          none                 yes     O(N log N)

Segmentor loss ≤ coseg loss ≤ PeakSegDP loss.

SLIDE 31

Linear time algorithms faster for larger data sets

[Figure: log10(seconds) versus log10(data points to segment), timings on 2752 histone mark ChIP-seq data sets, with reference lines at 1 second, 1 minute, 1 hour: coseg O(N log N), Segmentor O(N log N), PeakSegDP O(N^2).]

Total time to compute 10 models (0, . . . , 9 peaks) for all data sets:

◮ PeakSegDP: 156 hours, inexact.
◮ coseg: 6 hours, exact.

SLIDE 32

8 false negative labels for models with 0 peaks

SLIDE 33

Models with 1 peak are better (6 FN)

SLIDE 34

Models with 2 peaks are better still (4 FN)

SLIDE 35

Models with 3 peaks are the same (4 FN)

SLIDE 36

Models with 4 peaks are better (2 FN)

SLIDE 37

Models with 5 peaks have no incorrect labels

SLIDE 38

Models with 6 peaks are worse (1 false positive)

SLIDE 39

Constrained optimization better than MACS

SLIDE 40

0 errors for coseg/PeakSegDP, 6 errors for Segmentor

SLIDE 41

Minimum train error in all data sets

algorithm    errors  fp      fn     feasible models
PeakSegDP    677     116     561    27,469
coseg        789     94      695    21,278
macs         1,293   519     774
Segmentor    1,544   46      1,498  8,106
hmcan.broad  2,778   367     2,411
possible     12,826  11,037  7,225  27,520

◮ Segmentor, PeakSegDP, and coseg were used to compute up to 10 models for each problem (1, . . . , 19 segments = 0, . . . , 9 peaks).
◮ errors = fp + fn = total number of incorrect labels, after picking the best parameter for each of the 2,752 separate problems.
◮ models = number of computed models that obey the up-down constraint.
◮ The new coseg algorithm has minimum train error almost as good as the slower PeakSegDP algorithm.
◮ The other algorithms are much less accurate.

SLIDE 42

Max-margin penalty learning algorithm

[Figure: log(penalty) versus input feature log(max(coverage)) for samples McGill0002, McGill0004, McGill0091, McGill0322, each labeled with its selected number of peaks (1, 2, or 7): a large margin learned function makes 0 errors, a small margin function makes 0 errors, and a constant function makes 1 error.]

http://bl.ocks.org/tdhock/raw/9311ca39d643d127e04a088814c81ee1/
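As a rough illustration of the idea (a sketch of max-margin interval regression with assumed toy data, not the solver used in the papers): each labeled problem i yields a target interval [lo_i, hi_i] of log(penalty) values achieving minimal label error, and we fit an affine function of the feature x_i = log(max(coverage)) whose predictions land inside the intervals with a margin.

#include <cstdio>
#include <vector>

struct Problem { double x, lo, hi; }; // feature, target interval of log(penalty)

int main() {
  // Hypothetical training data: four labeled problems.
  std::vector<Problem> train = {
      {8.0, 3.2, 4.1}, {9.5, 3.9, 4.8}, {11.0, 4.6, 5.5}, {13.0, 5.2, 6.0}};
  double w = 0, b = 0;
  const double margin = 0.1, rate = 0.01;
  // Subgradient descent on a hinge loss that is zero when the
  // prediction w*x + b is inside [lo + margin, hi - margin].
  for (int it = 0; it < 10000; it++) {
    double gw = 0, gb = 0;
    for (const Problem &p : train) {
      double f = w * p.x + b;
      if (f < p.lo + margin) { gw -= p.x; gb -= 1; }
      if (f > p.hi - margin) { gw += p.x; gb += 1; }
    }
    w -= rate * gw;
    b -= rate * gb;
  }
  std::printf("log(penalty) = %.3f * log(max coverage) + %.3f\n", w, b);
}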

SLIDE 43

ROC curves for one test fold

http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/figure-roc-test/

SLIDE 44

Test AUC on 7 benchmark data sets

[Figure: test AUC (area under the Receiver Operating Characteristic curve) of each algorithm (MACS, HMCanBroad, Segmentor, PeakSegDP, coseg) on 7 benchmark data sets: H3K36me3 AM immune, H3K36me3 TDH immune, H3K36me3 TDH other, H3K4me3 PGP immune, H3K4me3 TDH immune, H3K4me3 TDH other, H3K4me3 XJ immune.]

◮ 4-fold cross-validation: train on 3/4 of labels, test on 1/4.
◮ HMCanBroad is accurate for broad H3K36me3 data but not sharp H3K4me3 data.
◮ MACS is accurate for sharp H3K4me3 but not broad H3K36me3 data.
◮ The unconstrained Segmentor algorithm is not as accurate as the up-down constrained algorithms (coseg, PeakSegDP).
◮ The proposed algorithm in the coseg R package yields state-of-the-art accuracy in all benchmark data sets.

http://bl.ocks.org/tdhock/raw/886575874144c3b172ce6b7d7d770b9f/

SLIDE 45

Segmenting whole chromosomes?

◮ 365 regions with no gaps in hg19.
◮ 272 regions with no gaps on chr1-22, X, Y.
◮ Smallest: 31,833 bases (chr6:157,609,467-157,641,300).
◮ Largest: 115,591,997 bases (chr4:75,452,279-191,044,276).

SLIDE 46

But the segment neighborhood algorithm is too slow for large genomic regions

◮ Time complexity O(PN log N) to compute models with 0, . . . , P peaks in N data.
◮ Since P = O(N), the time complexity is O(N^2 log N), which is too slow for large genome regions.

[Figure: position and extent of the gap-free regions on each hg19 chromosome (chr1-22, the chr6 and chr17 alternate haplotypes, chrX, chrY), with the number of regions per chromosome; positions in megabases.]

SLIDE 47

Problem: optimizing ChIP-seq peak detection
Segment neighborhood model (constraint on number of peaks)
Results on benchmark data (labeled chromosome subsets)
Optimal partitioning model (penalize number of peaks)
Conclusions and future work

SLIDE 48

Optimal partitioning model penalizes the number of changes

◮ We have N count data z_1, . . . , z_N ∈ Z_+.
◮ PeakSeg Segment Neighborhood model: find the means m_t which minimize the Poisson loss, Σ_{t=1}^{N} ℓ(m_t, z_t), such that there are exactly S − 1 up-down changes.
◮ PeakSeg Optimal Partitioning model: find the means m_t with up-down changes which minimize the penalized Poisson loss,
  λ Σ_{t=1}^{N−1} I(m_t ≠ m_{t+1}) + Σ_{t=1}^{N} ℓ(m_t, z_t).
◮ No constraint on the number of changes, but λ > 0 penalizes models with more changes.
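Evaluating this penalized objective for a given fit is straightforward; a small self-contained check (the example means and λ are chosen arbitrarily for illustration):

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  std::vector<double> z = {2, 1, 0, 4};
  std::vector<double> m = {1, 1, 1, 4}; // candidate piecewise constant means
  const double lambda = 1.0;            // arbitrary penalty, for illustration
  double loss = 0;
  int changes = 0;
  for (size_t t = 0; t < z.size(); t++) {
    loss += m[t] - z[t] * std::log(m[t]);                // l(m_t, z_t)
    if (t + 1 < z.size() && m[t] != m[t + 1]) changes++; // I(m_t != m_{t+1})
  }
  std::printf("penalized loss = %f (loss %f + lambda * %d changes)\n",
              loss + lambda * changes, loss, changes);
}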

SLIDE 49

Relation to previous dynamic programming algorithms

                     no pruning           functional pruning
unconstrained        Dynamic Programming  FPOP
                     exact O(N^2)         exact O(N log N)
                                          R package: fpop
up-down constrained                       this work
                                          exact O(N log N)
                                          R package: coseg

◮ Jackson et al 2005, An algorithm for optimal partitioning of data on an interval.
◮ Independent discovery: Johnson 2013, Maidstone et al 2016 (FPOP = Functional Pruning Optimal Partitioning).
◮ Contribution: a new dynamic programming algorithm that exactly computes the constrained optimal segmentation for N data points and a single penalty λ in O(N log N) time.
◮ All of these algorithms compute a model with S segments without having to compute the models with 1, . . . , S − 1 segments.


SLIDE 52

Dynamic programming and functional pruning for the optimal partitioning model

Classical dynamic programming (Jackson et al 2005) computes the optimal loss values up to each of the N data points, O(N^2):

L_1 · · · L_N

Dynamic programming with functional pruning (Johnson 2013, Maidstone et al 2016) computes loss functions, the optimal loss up to t data points as a function of the last segment mean µ, O(N log N):

L_1(µ) · · · L_N(µ)

Contribution of this work: a new algorithm that applies the functional pruning technique to the up-down constrained model.

SLIDE 53

Dynamic programming algorithm (PeakSegFPOP)

◮ Input: data z_1, . . . , z_N ∈ Z_+, penalty λ ≥ 0.
◮ Initialization t = 1, the optimal loss if the first data point has mean µ on a background/down segment:
  L^down_1(µ) = ℓ(µ, z_1).
◮ Initialization t = 2:
  L^up_2(µ) = ℓ(µ, z_2) + L^down,≤_1(µ) + λ,
  L^down_2(µ) = ℓ(µ, z_2) + L^down_1(µ).
◮ Updates: for all t > 2,
  L^up_t(µ) = ℓ(µ, z_t) + min{ L^up_{t−1}(µ), L^down,≤_{t−1}(µ) + λ },
  L^down_t(µ) = ℓ(µ, z_t) + min{ L^down_{t−1}(µ), L^up,≥_{t−1}(µ) + λ }.
◮ Output: a 2 × N matrix of optimal loss functions (up costs on peak segments, down costs on background segments):
          L^up_2(µ) · · · L^up_{N−1}(µ)
  L^down_1(µ) L^down_2(µ) · · · L^down_{N−1}(µ) L^down_N(µ)
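Using the same grid discretization as in the earlier segment neighborhood sketch, the optimal partitioning recursion needs only two cost vectors (up and down) and one penalty λ. Again, this is an illustrative approximation, not the exact piecewise-function implementation:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  std::vector<double> z = {2, 1, 0, 4, 4, 1};
  const double lambda = 1.0; // arbitrary penalty, for illustration
  const int G = 1000;
  const double INF = 1e100;
  std::vector<double> mu(G);
  for (int g = 0; g < G; g++) mu[g] = 0.01 + 4.0 * g / (G - 1);
  auto loss = [](double m, double zt) { return m - zt * std::log(m); };
  std::vector<double> up(G, INF), down(G);
  for (int g = 0; g < G; g++) down[g] = loss(mu[g], z[0]); // t = 1
  for (size_t t = 1; t < z.size(); t++) {
    // Min-less of down (prefix min) and min-more of up (suffix min).
    std::vector<double> down_le(G), up_ge(G);
    double best = INF;
    for (int g = 0; g < G; g++) down_le[g] = best = std::min(best, down[g]);
    best = INF;
    for (int g = G - 1; g >= 0; g--) up_ge[g] = best = std::min(best, up[g]);
    std::vector<double> nup(G), ndown(G);
    for (int g = 0; g < G; g++) {
      nup[g] = std::min(up[g], down_le[g] + lambda) + loss(mu[g], z[t]);
      ndown[g] = std::min(down[g], up_ge[g] + lambda) + loss(mu[g], z[t]);
    }
    up.swap(nup);
    down.swap(ndown);
  }
  // The model must end on a background/down segment.
  std::printf("approx optimal penalized loss = %f\n",
              *std::min_element(down.begin(), down.end()));
}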

SLIDE 54

Implementation details

◮ Weighted loss function for compressed bedGraph data:
  2,1,1,1,0,0,0,4,4 ⇒ counts z_t = 2, 1, 0, 4 with weights w_t = 1, 3, 3, 2;
  ℓ(m_t, z_t, w_t) = w_t(m_t − z_t log m_t).
◮ L_t stores the mean rather than the total loss: with cumulative weights W_{1:t} = Σ_{i=1}^{t} w_i,
  L_t(µ) = [ ℓ(µ, z_t, w_t) + W_{1:(t−1)} min{ L_{t−1}(µ), L^≤_{t−1}(µ) + λ } ] / W_{1:t}.
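A sketch of the run-length compression step (plain C++, illustration only); bedGraph files already store coverage in this run-length form, one line per run of equal counts:

#include <cstdio>
#include <utility>
#include <vector>

int main() {
  std::vector<int> raw = {2, 1, 1, 1, 0, 0, 0, 4, 4};
  std::vector<std::pair<int, int>> runs; // (count z_t, weight w_t)
  for (int v : raw) {
    if (!runs.empty() && runs.back().first == v)
      runs.back().second++;   // extend the current run
    else
      runs.push_back({v, 1}); // start a new run
  }
  for (const auto &r : runs)
    std::printf("count=%d weight=%d\n", r.first, r.second);
  // Prints: (2,1), (1,3), (0,3), (4,2).
}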

◮ Root finding: solve a log µ + bµ + c = 0 (linear as µ → ∞), or with x = log µ, ax + b e^x + c = 0 (linear as x → −∞), by Newton root finding.
◮ Since the update rule at t depends only on t − 1, we can store the other functions 1, . . . , t − 2 on disk.
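These roots locate the breakpoints where two cost pieces cross. A toy Newton solver for the first equation (not the package's code; the example coefficients are chosen arbitrarily so that a root exists):

#include <cmath>
#include <cstdio>

// Solve f(mu) = a*log(mu) + b*mu + c = 0 for mu > 0 by Newton's method.
double newton_root(double a, double b, double c, double mu) {
  for (int it = 0; it < 50; it++) {
    double f = a * std::log(mu) + b * mu + c;
    double fprime = a / mu + b;
    double step = f / fprime;
    mu -= step;
    if (std::fabs(step) < 1e-12) break;
  }
  return mu;
}

int main() {
  // Example: mu - 2*log(mu) - 1 = 0 has roots mu = 1 and mu ~ 3.51.
  std::printf("root near mu = 4: %f\n", newton_root(-2, 1, -1, 4.0));
}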
SLIDE 55

BerkeleyDB C++ Standard Template Library

In memory C++ STL:

#include <vector>
std::vector<PiecewisePoissonLoss> cost_model_mat;

On disk BerkeleyDB STL:

#include <dbstl_vector.h>
// Functions to serialize/unserialize PiecewisePoissonLoss.
dbstl::DbstlElemTraits<PiecewisePoissonLoss> *funTraits =
  dbstl::DbstlElemTraits<PiecewisePoissonLoss>::instance();
funTraits->set_size_function(PiecewiseFunSize);
funTraits->set_copy_function(PiecewiseFunCopy);
funTraits->set_restore_function(PiecewiseFunRestore);
// Tell BerkeleyDB a file to store the data in the vector.
DbEnv *env = NULL;
Db *db = dbstl::open_db(
  env, "/path/to/database.db", DB_RECNO, DB_CREATE, 0);
dbstl::db_vector<PiecewisePoissonLoss> cost_model_mat(db, env);
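The on-disk variant keeps the same vector indexing interface as std::vector, so the dynamic programming code itself is unchanged. DB_RECNO gives record-number keyed access, which matches the access pattern here: each update only reads position t − 1, so older cost functions can stay on disk.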

SLIDE 56

Two C++ implementations of PeakSegFPOP algorithm

◮ PeakSegFPOP function in the coseg R package (constrained optimal segmentation), limited by memory.
  https://github.com/tdhock/coseg
◮ PeakSegFPOP command line program for big data, N > 10^6.
  https://github.com/tdhock/PeakSegFPOP

implementation   time        memory      disk
R package coseg  O(N log N)  O(N log N)
command line     O(N log N)  O(log N)    O(N log N)

N          min(MB)  max(MB)  min(time)  max(time)
10,000     12       43       1 sec      2 sec
100,000    189      627      12 sec     25 sec
1,000,000  3462     7148     3 min      5 min
7,135,956  5042     41695    18 min     56 min
7,806,082  5270     33425    35 min     167 min

SLIDE 57

Reasonable time to segment biggest region on chr1

[Figure: minutes of computation (0.1 to 30) versus N = data points to segment (100,000 to 4,485,563), for the in-memory and on-disk implementations.]

◮ R package in memory: O(N log N) time.
◮ Command line program on disk: O(N log N) time.

SLIDE 58

Memory requirements reasonable for on-disk version

[Figure: memory usage in gigabytes (0.001 to 20) versus N = data points to segment, up to the biggest chr1 region with no gap (4,485,563 bedGraph lines) and the biggest region with no gap (115,591,997 bases), for the in-memory and on-disk implementations.]

◮ R package in memory: O(N log N) memory.
◮ Command line program on disk: O(log N) memory.

SLIDE 59

Disk usage reasonable

[Figure: disk usage in gigabytes (0.1 to 32.2) versus N = data points to segment, for the in-memory and on-disk implementations.]

◮ R package in memory: no disk usage.
◮ Command line program: O(N log N) disk space (temporary).

SLIDE 60

Finding the optimal interval of penalty values

Broad peak pattern (H3K36me3), chr11:96,437,584-134,946,516.

penalty        peaks   status      fp  fn
Inf                0   feasible         2
5265297.3831       3   feasible         2
3368752.0956       7   feasible         2
3005673.8348       8   feasible
2686562.9821      10   feasible
794862.0964       29   feasible
411324.5305       43   feasible
375144.4400       44   feasible
361535.5526       45   feasible     1
333932.0146       47   feasible     1
283667.3315       51   feasible     1
211252.2486       61   feasible     1
55303.9465       123   feasible     2
2715.5727       1986   infeasible   5
233.2671       54409   infeasible   5
0.0000        671283   infeasible   5

SLIDE 61

Finding the optimal interval of penalty values

Sharp peak pattern (H3K4me3), chr11:96,437,584-134,946,516.

penalty       peaks   status      fp  fn
Inf               0   feasible     0  12
1232627.283      45   feasible         9
599902.993       84   feasible         6
467450.211       96   infeasible       4
436473.003       99   infeasible       4
427664.944      100   infeasible       3
423816.446      101   infeasible       3
415415.071      102   infeasible       3
375629.581      108   infeasible       3
227473.010      138   infeasible       3
144688.178      173   infeasible       3
117251.695      205   infeasible       3
103355.006      219   infeasible       3
99984.891       227   infeasible       3
97920.192       229   infeasible       3
97532.045       230   infeasible   4   3
97119.082       231   infeasible   4   3
95124.208       233   infeasible   4   3
46623.871       385   infeasible   7
12159.960       905   infeasible  13
1039.473      21060   infeasible  19
0.000        260804   infeasible  20

SLIDE 62

More peaks in sharp H3K4me3 than in broad H3K36me3

SLIDE 63

Smaller peaks in sharp H3K4me3 than in broad H3K36me3

SLIDE 64

Only need 10–25 runs to find optimal penalty interval

[Figure: number of penalties for which PeakSegFPOP was run (between 10 and 25) versus log10(number of data to segment = lines in bedGraph file): the number of penalties needed to find the target interval is roughly constant.]
SLIDE 65

Target interval computation time

SLIDE 66

Pipeline

Input:

◮ S bigWig coverage data files,
◮ problems.bed file with P genomic regions to segment,
◮ labels.txt file(s): gold standard locations with and without peaks.

For example S = 37 samples, P = 374 problems in hg19.

Step                          Jobs  Time/job
Create separate problems      1     5 min
Compute separate targets      SP    1–10 hours
Train separate model          1     5 min
Separate peak prediction      P     1–10 hours
Train joint model             1     5 min
Joint peak prediction         P     1–10 hours
Combine and plot predictions  1     10–60 minutes

Output: matrix of joint peak predictions, samples × peaks.

SLIDE 67

Problem: optimizing ChIP-seq peak detection
Segment neighborhood model (constraint on number of peaks)
Results on benchmark data (labeled chromosome subsets)
Optimal partitioning model (penalize number of peaks)
Conclusions and future work

SLIDE 68

How to choose parameters of unsupervised peak detectors?

19 parameters for Model-based analysis of ChIP-Seq (MACS), Zhang et al, 2008:

[-g GSIZE] [-s TSIZE] [--bw BW] [-m MFOLD MFOLD] [--fix-bimodal]
[--nomodel] [--extsize EXTSIZE | --shiftsize SHIFTSIZE]
[-q QVALUE | -p PVALUE | -F FOLDENRICHMENT] [--to-large]
[--down-sample] [--seed SEED] [--nolambda] [--slocal SMALLLOCAL]
[--llocal LARGELOCAL] [--shift-control] [--half-ext] [--broad]
[--broad-cutoff BROADCUTOFF] [--call-summits]

10 parameters for Histone modifications in cancer (HMCan), Ashoor et al, 2013:

minLength 145
medLength 150
maxLength 155
smallBinLength 50
largeBinLength 100000
pvalueThreshold 0.01
mergeDistance 200
iterationThreshold 5
finalThreshold 0
maxIter 20

SLIDE 69

PeakSeg: search for the peaks with lowest loss

Simple model with only one parameter to train (maxPeaks).

SLIDE 70

Conclusions

                     no pruning           functional pruning
unconstrained        Dynamic Programming  PDPA, FPOP
                     exact O(N^2)         exact O(N log N)
                     R pkg: changepoint   R pkgs: cghseg, Segmentor
up-down constrained  constrained DP       this work
                     inexact O(N^2)       exact O(N log N)
                     R pkg: PeakSegDP     R pkg: coseg

◮ New algorithm that exactly computes the constrained optimal change-points/peaks for N data points.
◮ C++ code in the coseg R package, O(N log N) memory: https://github.com/tdhock/coseg
◮ PeakSegFPOP program for big N > 10^6 data, O(log N) memory and O(N log N) disk space.
◮ TODO: ICML paper.
◮ TODO: C3G-GSOC pipeline project.
◮ TODO: web app for interactive labeling and model learning.

SLIDE 71

Thanks for your attention!

Questions? toby.hocking@mail.mcgill.ca

◮ coseg R package: https://github.com/tdhock/coseg
◮ PeakSegFPOP command line program: https://github.com/tdhock/PeakSegFPOP
◮ Source code for these slides: https://github.com/tdhock/PeakSegFPOP-paper

SLIDE 72

Two annotators provide consistent labels, but different precision

◮ TDH peakStart/peakEnd more precise than AM peaks.
◮ AM noPeaks more precise than TDH no label.

SLIDE 73

Train on one person, test on another (same histone mark and samples)

[Figure: genomic positions (in megabases, chr1-chrY) of labeled regions in the benchmark data sets (H3K36me3 AM immune; H3K36me3 TDH immune; H3K36me3 TDH other; H3K4me3 PGP immune; H3K4me3 TDH immune; H3K4me3 TDH other; H3K4me3 XJ immune; max NTNU; k562 nrsf NTNU; k562 srf NTNU; Gm12878), showing the train/test/unused split when training on H3K4me3_PGP_immune or H3K4me3_TDH_immune.]

SLIDE 74

Train on one person, test on another (same histone mark and samples)

[Figure: percent incorrect test labels for a model trained on labels from the same person (x axis) versus a different person (y axis), on test data sets H3K4me3_PGP_immune and H3K4me3_TDH_immune; points above the diagonal mean more errors in the model trained on a different person, below the diagonal mean fewer. Models compared: ccat.histone, ccat.tf, sicer, homer, rseg, hmcan, hmcan.broad, triform, macs, macs.broad, PeakSeg.]

SLIDE 75

Train on some samples, test on others (same histone mark and person)

[Figure: genomic positions of labeled regions in the benchmark data sets, showing the train/test/unused split when training on H3K4me3_TDH_immune or H3K4me3_TDH_other.]

SLIDE 76

Train on some samples, test on others (same histone mark and person)

[Figure: percent incorrect test labels for a model trained on labels from the same cell types versus different cell types, on test data sets H3K4me3_TDH_immune and H3K4me3_TDH_other; points above the diagonal mean more errors in the model trained on different cell types. Models compared: ccat.histone, ccat.tf, sicer, homer, rseg, hmcan, hmcan.broad, triform, macs, macs.broad, PeakSeg.]

SLIDE 77

Train on one histone mark, test on another (same person and samples)

[Figure: genomic positions of labeled regions in the benchmark data sets, showing the train/test/unused split when training on H3K36me3_TDH_immune or H3K4me3_TDH_immune.]

SLIDE 78

Train on one histone mark, test on another (same person and samples)

[Figure: percent incorrect test labels for a model trained on labels from the same experiment type versus a different experiment type (histone mark), on test data sets H3K36me3_TDH_immune and H3K4me3_TDH_immune; points above the diagonal mean more errors in the model trained on a different experiment. Models compared: ccat.histone, ccat.tf, sicer, homer, rseg, hmcan, hmcan.broad, triform, macs, macs.broad, PeakSeg.]