Uni.lu HPC School 2019 PS12: Bioinformatics workflows with Snakemake and Conda

SLIDE 1

Uni.lu HPC School 2019

PS12: Bioinformatics workflows with Snakemake and Conda

Uni.lu High Performance Computing (HPC) Team

  • S. Peter

University of Luxembourg (UL), Luxembourg http://hpc.uni.lu


  • S. Peter & Uni.lu HPC Team (University of Luxembourg)

Uni.lu HPC School 2019/ PS12

SLIDE 2

Latest versions available on GitHub:

  • UL HPC tutorials: https://github.com/ULHPC/tutorials
  • UL HPC School: http://hpc.uni.lu/hpc-school/
  • PS12 tutorial sources: ulhpc-tutorials.rtfd.io/en/latest/bio/snakemake/

SLIDE 3

Introduction

Summary

1 Introduction
2 Workflow

SLIDE 4

Introduction

Main Objectives

In this tutorial you will learn how to run a ChIP-seq analysis with the conda package manager and the snakemake workflow engine on the cluster.

SLIDE 5

Introduction

ChIP-seq

ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. (Source: Wikipedia)

SLIDE 6

Introduction

ChIP-seq

SLIDE 7

Introduction

ChIP-seq

Figure credit: By Jkwchui. Cell diagram adapted from LadyOfHats’ Animal Cell diagram. Information based on Illumina data sheet, as well as ChIP and immunoprecipitation articles and references. CC BY-SA 3.0.

SLIDE 8

Introduction

Conda

  • open source package and environment management system
  • runs on Windows, macOS and Linux
  • quickly installs, runs and updates packages and their dependencies
  • easily creates, saves, loads and switches between environments on your local computer

SLIDE 9

Introduction

Snakemake

  • create reproducible and scalable data analysis workflows
  • workflows are described via a human-readable, Python-based language
  • seamless scaling to server, cluster, grid and cloud environments, without the need to modify the workflow definition
  • workflows can entail a description of required software, which will be automatically deployed to any execution environment
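As a rough illustration of the rule-based idea behind the points above, the following Python sketch models how a requested output file can be matched against a rule's output pattern, and how the matched wildcard value then determines the input files. It is a simplified model, not Snakemake's implementation. The `RULES` table, the directory layout and the helper names `pattern_to_regex` and `resolve` are hypothetical and chosen only for this example.

```python
import re

# Hypothetical rule table: each rule maps one output pattern to its input patterns.
# The paths are assumptions for illustration, not the tutorial's actual layout.
RULES = {
    "mapping": {
        "output": "bam/{sample}.bam",
        "input": ["fastq/{sample}.fastq.gz"],
    },
    "bigwig": {
        "output": "bigwig/{sample}.bigwig",
        "input": ["bedgraph/{sample}.bdg"],
    },
}


def pattern_to_regex(pattern):
    """Turn 'bam/{sample}.bam' into a regex with one named group per wildcard."""
    # re.split with a capturing group puts the wildcard names at odd indices.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2:
            regex += f"(?P<{part}>.+)"   # wildcard: capture any non-empty text
        else:
            regex += re.escape(part)     # literal text: match exactly
    return re.compile(f"^{regex}$")


def resolve(target):
    """Return (rule name, concrete input files) for the rule that can produce target."""
    for name, rule in RULES.items():
        match = pattern_to_regex(rule["output"]).match(target)
        if match:
            inputs = [p.format(**match.groupdict()) for p in rule["input"]]
            return name, inputs
    raise ValueError(f"no rule produces {target}")


if __name__ == "__main__":
    print(resolve("bam/H3K4-TC1-ST2-D0.12.bam"))
```

Requesting a target file is enough to work out which rule has to run and on which inputs. This is why the workflow definition itself does not need to change when it is moved to a different execution environment.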

SLIDE 10

Workflow

Summary

1 Introduction
2 Workflow

SLIDE 11

Workflow

Overview

1 Set up the environment
2 Create snakemake workflow
    (a) Mapping
    (b) Peak calling
    (c) Generate bigWig files for visualisation
    (d) Summary rule
3 Cluster configuration for snakemake
    (a) Adjust mapping step to run on multiple threads
    (b) Configure job parameters with cluster.yaml
    (c) Run snakemake with cluster configuration
4 Inspect results in IGV
5 (Optional) Immediately submit all jobs
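Step 3(b) in the overview configures job parameters with cluster.yaml. Snakemake's cluster configuration files use a `__default__` entry whose values apply to every rule unless a rule-specific entry overrides them. The sketch below mimics that lookup in plain Python, with the YAML content written as an already-parsed dictionary. The parameter names (`time`, `ncpus`, `partition`), their values and the helper functions are assumptions for illustration. The `sbatch` command line assumes a Slurm scheduler.

```python
# Hypothetical content of cluster.yaml after parsing; the values are illustrative only.
CLUSTER_CONFIG = {
    "__default__": {"time": "0-00:10:00", "ncpus": 1, "partition": "batch"},
    "mapping": {"ncpus": 4},  # the mapping step runs on multiple threads
}


def job_parameters(rule_name, config=CLUSTER_CONFIG):
    """Start from the defaults, then apply the overrides of the given rule."""
    params = dict(config.get("__default__", {}))
    params.update(config.get(rule_name, {}))
    return params


def submit_command(rule_name, config=CLUSTER_CONFIG):
    """Build a Slurm submission prefix from the merged parameters (assumes Slurm)."""
    p = job_parameters(rule_name, config)
    return f"sbatch -p {p['partition']} -t {p['time']} -c {p['ncpus']}"


if __name__ == "__main__":
    for rule in ("mapping", "peak_calling"):
        print(rule, "->", submit_command(rule))
```

Only the rules that differ from the defaults need their own entry, which keeps the configuration file short as the workflow grows.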

SLIDE 12

Workflow

Snakemake workflow

[Figure: job graph of the workflow, with the following nodes]

  • all
  • peak_calling (sample: TC1-ST2-D0.12)
  • bigwig (sample: TC1-ST2-D0.12_control_lambda)
  • bigwig (sample: TC1-ST2-D0.12_treat_pileup)
  • mapping (sample: INPUT-TC1-ST2-D0.12)
  • mapping (sample: H3K4-TC1-ST2-D0.12)
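The job graph on this slide can be read as a dependency graph in which a job may start once all jobs it depends on have finished. The Python sketch below encodes the six jobs listed above and groups them into batches that could run in parallel, using the standard library's `graphlib` (Python 3.9 or later). The edges are an assumption inferred from the step order in the overview (mapping, then peak calling, then bigWig generation, then the summary rule), since the slide text preserves only the node labels. `parallel_batches` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# job -> set of jobs it depends on (edges are assumed, see the text above)
DAG = {
    "mapping INPUT-TC1-ST2-D0.12": set(),
    "mapping H3K4-TC1-ST2-D0.12": set(),
    "peak_calling TC1-ST2-D0.12": {
        "mapping INPUT-TC1-ST2-D0.12",
        "mapping H3K4-TC1-ST2-D0.12",
    },
    "bigwig TC1-ST2-D0.12_control_lambda": {"peak_calling TC1-ST2-D0.12"},
    "bigwig TC1-ST2-D0.12_treat_pileup": {"peak_calling TC1-ST2-D0.12"},
    "all": {
        "peak_calling TC1-ST2-D0.12",
        "bigwig TC1-ST2-D0.12_control_lambda",
        "bigwig TC1-ST2-D0.12_treat_pileup",
    },
}


def parallel_batches(dag):
    """Group jobs into successive batches whose dependencies are already done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(DAG), start=1):
        print(number, batch)
```

Under these assumed edges the two mapping jobs are independent of each other, as are the two bigwig jobs, so each pair could be submitted to the cluster at the same time.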

SLIDE 13

Thank you for your attention...

Questions?

http://hpc.uni.lu

High Performance Computing @ uni.lu

  • Prof. Pascal Bouvry
  • Dr. Sebastien Varrette
  • Valentin Plugaru
  • Sarah Peter
  • Hyacinthe Cartiaux
  • Clement Parisot
  • Dr. Fréderic Pinel
  • Dr. Emmanuel Kieffer

University of Luxembourg, Belval Campus
Maison du Nombre, 4th floor
2, avenue de l’Université
L-4365 Esch-sur-Alzette
mail: hpc@uni.lu
