Bioinformatics pipeline for revealing tumour heterogeneity

slide-1
SLIDE 1

| | Department of Biosystems Science and Engineering

Bioinformatics pipeline for revealing tumour heterogeneity

Mustafa Anıl Tuncel

10.07.19 Bioinformatics pipeline for revealing tumour heterogeneity. |. Mustafa Anıl Tuncel 1

slide-2
SLIDE 2

anilbey /in/aniltuncel anilbey

Mustafa Anıl Tuncel

Software Engineer @ ETH Zürich

Research interests

§ Data analysis workflows
§ Bioinformatics
§ Machine learning
§ Recommender systems

slide-3
SLIDE 3

Outline

§ Background
  § Biology prior
  § Single cell sequencing technologies
  § Mutations on DNA
§ DNA mutation trees
  § Tree model
  § MCMC moves
§ Pipeline
  § Snakemake
  § HDF5

slide-4
SLIDE 4

What is a cell?

Figure 1. Representation of cell, tissue, organ, system and organism. Retrieved from https://www.colscol.com/body-system/

slide-5
SLIDE 5

DNA from single cells

Figure 2. DNA structure. Retrieved from https://www.interleucina.org/

slide-6
SLIDE 6

Structural mutations on DNA

§ Copy number variations
  § Deletion
  § Duplication
§ Mutations from DNA of single cells
  § Heterogeneous
  § Have ancestors, children, siblings

slide-7
SLIDE 7

Trees to represent structural mutations

[Figure: a DNA strand divided into Region 1 to Region 5, and a mutation tree. Tree nodes: root; R1:+1; R2:-1, R3:+1; R4:-1; R5:+2; R1:+1; R3:-1. Each label names a region and its copy number change.]
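One way to read such a tree, assumed here, is that a node's overall copy number change is the sum of the events on the path from the root to that node. The sketch below illustrates this in Python. The event labels come from the slide, but the node names (`A` to `F`) and the parent links are hypothetical, because the figure's topology is not recoverable from the text.

```python
from collections import defaultdict

# node -> (parent, events). Events are taken from the slide; the node names
# and parent links are ASSUMED for illustration only.
TREE = {
    "root": (None, {}),
    "A": ("root", {"R1": +1}),
    "B": ("root", {"R2": -1, "R3": +1}),
    "C": ("A", {"R4": -1}),
    "D": ("A", {"R5": +2}),
    "E": ("B", {"R1": +1}),
    "F": ("B", {"R3": -1}),
}


def copy_number_change(tree, node):
    """Sum the events on the path from `node` up to the root."""
    total = defaultdict(int)
    while node is not None:
        parent, events = tree[node]
        for region, delta in events.items():
            total[region] += delta
        node = parent
    # Drop regions whose gains and losses cancel out.
    return {region: delta for region, delta in total.items() if delta != 0}


profile_c = copy_number_change(TREE, "C")  # events of C and A
profile_f = copy_number_change(TREE, "F")  # R3:+1 in B is undone by R3:-1 in F
```

Under this reading, two cells attached to sibling nodes share every event of their common ancestors and differ only in the events below the split.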

slide-8
SLIDE 8

Learning the tree

[Figure: mutation tree with a root and nodes labelled R1:+1; R2:-1, R3:+1; R4:-1; R5:+2; R1:+1; R3:-1]

  • Dirichlet-multinomial model with overdispersion
  • We aim to maximise the tree posterior with an MCMC scheme, using the following moves:
  • Prune-reattach
  • Label swap
  • Add/remove events
  • Add/remove node
  • Condense/split node
  • Genotype preserving prune-reattach
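
As a rough illustration of the two ingredients named on this slide, the sketch below gives the log-probability of a Dirichlet-multinomial distribution and a Metropolis acceptance rule. It is not the author's implementation: the talk does not say how counts are defined or which acceptance rule is used, so per-region read counts and a symmetric proposal are assumptions, and the function names are hypothetical.

```python
import math
import random


def dirichlet_multinomial_logpmf(counts, alphas):
    """Log-probability of `counts` under a Dirichlet-multinomial.

    `alphas` are the concentration parameters; for fixed proportions, a
    smaller sum of alphas means stronger overdispersion.
    """
    n = sum(counts)
    a = sum(alphas)
    log_p = math.lgamma(a) + math.lgamma(n + 1) - math.lgamma(n + a)
    for x, alpha in zip(counts, alphas):
        log_p += math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
    return log_p


def accept_move(log_post_current, log_post_proposed, rng=random):
    """Metropolis rule, ASSUMING a symmetric proposal: accept with min(1, ratio)."""
    log_ratio = log_post_proposed - log_post_current
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


# With two categories and alphas (1, 1), every split of n = 3 reads is
# equally likely, so each of the 4 outcomes has probability 1/4.
uniform_case = math.exp(dirichlet_multinomial_logpmf([1, 2], [1.0, 1.0]))
```

In an MCMC search, each move would propose a modified tree, the posterior of both trees would be scored, and `accept_move` would decide whether to keep the proposal.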
slide-9
SLIDE 9

Prune-reattach

[Figure: the mutation tree before and after a prune-reattach move. Both trees contain the same nodes (root; R1:+1; R2:-1, R3:+1; R4:-1; R5:+2; R1:+1; R3:-1); only the attachment of one subtree differs.]
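
A minimal sketch of a prune-reattach proposal follows, assuming the tree is stored as a child-to-parent map. The node names and the uniform choice of subtree and attachment point are assumptions, not details given in the talk.

```python
import random


def descendants(parent_of, node):
    """Return `node` together with every node below it."""
    found = {node}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def prune_reattach(parent_of, rng):
    """Return a new parent map in which one random subtree has been moved."""
    movable = [n for n, p in parent_of.items() if p is not None]
    pruned = rng.choice(movable)
    subtree = descendants(parent_of, pruned)
    # A valid target lies outside the pruned subtree (no cycles) and is not
    # the current parent (otherwise nothing would change).
    targets = [n for n in parent_of if n not in subtree and n != parent_of[pruned]]
    proposal = dict(parent_of)
    if targets:
        proposal[pruned] = rng.choice(targets)
    return proposal


# Hypothetical topology; the figure's real layout is not recoverable.
parent_of = {"root": None, "A": "root", "B": "root",
             "C": "A", "D": "A", "E": "B", "F": "B"}
proposal = prune_reattach(parent_of, random.Random(0))
```

The node set and the events on each node stay the same; only one parent link changes.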

slide-10
SLIDE 10

Add / remove node

[Figure: the mutation tree before and after an add-node move. Before: root; R1:+1; R2:-1, R3:+1; R4:-1; R5:+2; R1:+1; R3:-1. After: the same nodes plus one new node labelled R1:+1.]
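
The pair of moves can be sketched as follows, assuming the tree is stored as `node -> (parent, events)`. Whether the original method adds only leaves, and how it handles the children of a removed node, is not stated in the slides; attaching a new leaf and re-parenting orphans to the grandparent are assumptions made here, and all names are hypothetical.

```python
def add_node(tree, new_id, parent, events):
    """Return a copy of `tree` with a new leaf `new_id` under `parent`."""
    if parent not in tree:
        raise KeyError(f"unknown parent {parent!r}")
    if new_id in tree:
        raise ValueError(f"node {new_id!r} already exists")
    proposal = dict(tree)
    proposal[new_id] = (parent, dict(events))
    return proposal


def remove_node(tree, node):
    """Return a copy of `tree` without `node`; its children move up one level."""
    parent, _ = tree[node]
    if parent is None:
        raise ValueError("the root cannot be removed")
    proposal = {}
    for other, (other_parent, events) in tree.items():
        if other == node:
            continue
        new_parent = parent if other_parent == node else other_parent
        proposal[other] = (new_parent, events)
    return proposal


tree = {
    "root": (None, {}),
    "A": ("root", {"R1": 1}),
    "C": ("A", {"R4": -1}),
}
grown = add_node(tree, "G", "C", {"R1": 1})  # new leaf carrying R1:+1
shrunk = remove_node(tree, "A")              # C is re-attached to the root
```

Removing a freshly added leaf restores the original tree, so the two moves undo each other in that case.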

slide-11
SLIDE 11

Condense / split node

[Figure: the mutation tree before and after a condense move. Before: root; R1:+1; R2:-1, R3:+1; R4:-1; R5:+2; R1:+1; R3:-1, with R5:+2 and R1:+1 as separate nodes. After: these two are merged into a single node labelled R5:+2, R1:+1.]
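
The figure suggests that condensing merges a node's events into a neighbouring node and that splitting is the reverse. A minimal sketch under that reading is given below. The tree layout `node -> (parent, events)`, the function names and the choice to merge a child into its parent are assumptions.

```python
def condense(tree, child):
    """Merge `child` into its parent, summing events region by region."""
    parent, child_events = tree[child]
    if parent is None:
        raise ValueError("the root has no parent to merge into")
    grandparent, parent_events = tree[parent]
    merged = dict(parent_events)
    for region, delta in child_events.items():
        merged[region] = merged.get(region, 0) + delta
    merged = {r: d for r, d in merged.items() if d != 0}
    proposal = {}
    for node, (node_parent, events) in tree.items():
        if node == child:
            continue
        if node == parent:
            proposal[node] = (grandparent, merged)
        else:
            # Children of the removed node are re-attached to its parent.
            new_parent = parent if node_parent == child else node_parent
            proposal[node] = (new_parent, events)
    return proposal


def split(tree, node, new_id, moved_regions):
    """Move the events on `moved_regions` from `node` into a new child leaf."""
    parent, events = tree[node]
    kept = {r: d for r, d in events.items() if r not in moved_regions}
    moved = {r: d for r, d in events.items() if r in moved_regions}
    proposal = dict(tree)
    proposal[node] = (parent, kept)
    proposal[new_id] = (node, moved)
    return proposal


before = {"root": (None, {}), "D": ("root", {"R5": 2}), "G": ("D", {"R1": 1})}
after = condense(before, "G")                # D now carries R5:+2 and R1:+1
restored = split(after, "D", "G", {"R1"})    # splitting undoes the merge
```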

slide-12
SLIDE 12

Tree learned from mouse data

slide-13
SLIDE 13

What else is required?

§ Reproducibility in research
§ Scalability
§ Support for multiple programming languages
§ Multiprocessing
§ Cluster execution
§ Resource management
§ Statistics about resource usage

slide-14
SLIDE 14

Workflow management system

slide-15
SLIDE 15

Snakemake

  • A Pythonic workflow management system
  • Extends the Python syntax
  • Follows the GNU make paradigm
  • Workflows are defined in terms of rules that define how to create output files from input files

  • Dependencies between the rules are determined automatically
  • Benefits from Python libraries
  • Automated logging of the status
  • Suspend/resume workflow
  • A general-purpose workflow management system for any discipline
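
The make paradigm mentioned above can be illustrated with a toy resolver in plain Python. This is not Snakemake's implementation or API. It only shows how an execution order can be derived from file names when each rule declares its inputs and its output. The rule and file names are hypothetical, and the rules are assumed to form an acyclic graph.

```python
# Toy rules: name -> (input files, output file). All names are hypothetical.
RULES = {
    "map_reads": (["genome.fa", "sample.fastq"], "mapped.bam"),
    "sort_bam": (["mapped.bam"], "sorted.bam"),
    "call_cnv": (["sorted.bam"], "cnv.h5"),
}


def plan(target, rules, existing):
    """Return the rule names to run, in order, to create `target`."""
    producers = {output: name for name, (_, output) in rules.items()}
    order = []

    def visit(path):
        if path in existing:
            return  # already on disk, nothing to do
        if path not in producers:
            raise FileNotFoundError(f"no rule produces {path!r}")
        name = producers[path]
        if name in order:
            return  # already scheduled
        for dependency in rules[name][0]:
            visit(dependency)
        order.append(name)

    visit(target)
    return order


# Fresh run: every rule is needed.
jobs = plan("cnv.h5", RULES, existing={"genome.fa", "sample.fastq"})
# Resumed run: intermediate files already exist, so only the last rule runs.
resumed = plan("cnv.h5", RULES, existing={"genome.fa", "sample.fastq", "sorted.bam"})
```

Requesting only the final file is enough; the intermediate steps follow from matching each rule's inputs to another rule's output, which is also why a workflow can resume from files that already exist.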
slide-16
SLIDE 16

Example: read mapping

slide-17
SLIDE 17

Example: read mapping (generalised)

slide-18
SLIDE 18

DAG of jobs

slide-19
SLIDE 19

Snakefile

slide-20
SLIDE 20

Config file

slide-21
SLIDE 21

Cluster execution

  • Configurable for LSF/BSUB scheduler
  • Allows scaling without changing the workflow
slide-22
SLIDE 22

HDF5

HDF = Hierarchical Data Format

§ Hierarchical data format v5
§ Binary files

  • Easy to manage multiple datasets
  • Keeps metadata with data
  • Fast I/O operations & storage space optimization (compressed binary files)
  • Platform/language independent
  • Self describing
  • No need to load whole data

[Figure: an HDF5 dataset consists of data plus metadata. Metadata: dataspace (rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7), datatype (integer), attributes (Time = 32.4, Pressure = 987, Temp = 56), storage info (chunked, compressed).]

slide-23
SLIDE 23

HDF5 wrappers in Python

h5py is a thin, Pythonic wrapper around the HDF5 library.
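
A short h5py sketch follows, assuming `h5py` and `numpy` are installed. It reuses the values from the HDF5 dataset figure in this talk (a rank-3 integer dataset of shape 4 x 5 x 7 with attributes Time, Pressure and Temp, stored chunked and compressed). The dataset name `measurements`, the chunk shape and the file name are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def write_and_read_slice(path):
    """Write a chunked, compressed dataset with metadata, then read one slice."""
    data = np.arange(4 * 5 * 7, dtype="int32").reshape(4, 5, 7)
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "measurements", data=data, chunks=(2, 5, 7), compression="gzip"
        )
        # Metadata is stored next to the data as attributes.
        dset.attrs["Time"] = 32.4
        dset.attrs["Pressure"] = 987
        dset.attrs["Temp"] = 56

    with h5py.File(path, "r") as f:
        dset = f["measurements"]
        # Only this 5 x 7 plane is read from disk, not the whole array.
        first_plane = dset[0, :, :]
        metadata = dict(dset.attrs.items())
        return dset.shape, first_plane, metadata


with tempfile.TemporaryDirectory() as tmp:
    shape, plane, meta = write_and_read_slice(os.path.join(tmp, "example.h5"))
```

Slicing the dataset object, rather than converting it to an array first, is what allows partial reads of files that do not fit in memory.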

slide-24
SLIDE 24

Outline

§ Background
  § Biology prior
  § Single cell sequencing technologies
  § Mutations on DNA
§ DNA mutation trees
  § Tree model
  § MCMC moves
§ Pipeline
  § Snakemake
  § HDF5

Future work

  • Publish the method
  • Compare to clustering methods
  • Evaluate on simulated data
  • Show results on real data
  • Wrap up the workflow as a Python package
  • Do the C++ bindings
  • Open source it
slide-25
SLIDE 25

Thank you!

Mustafa Anıl Tuncel
Software Engineer, ETH Zurich
mtuncel@ethz.ch
anilbey /in/aniltuncel anilbey