Computational Genomics. Francisco García García, BIER



slide-1
SLIDE 1

Computational Genomics

Francisco García García

BIER

fgardos@gmail.com

Máster en Biotecnología Biomédica. UPV

slide-2
SLIDE 2

Why are we interested in Computational Genomics?

The overall goal:

Apply computational methods to biomedical and biotechnological problems

Research interests:

The development and application of novel bioinformatics methods aimed at discovering new drugs

Identification of genes or proteins that may be considered therapeutic targets

Personalized medicine: tools for discovery and diagnosis

Why Computational Genomics? Introduction

slide-3
SLIDE 3

Computational Genomics

Omics sciences Introduction

Metabolomics Proteomics Genomics Transcriptomics Lipidomics Epigenomics

slide-4
SLIDE 4

How do these technologies work?

High throughput technologies: microarrays Introduction

Computational Genomics

slide-5
SLIDE 5

Reference genome

Introduction

High throughput technologies: Next Generation Sequencing

How do these technologies work?

Computational Genomics

slide-6
SLIDE 6

Clinical and biological databases Introduction

ClinVar HUMSAVAR HGMD COSMIC

Biological knowledge Clinical knowledge

Gene Ontology KEGG pathways Regulatory elements

MiRNA, CisRed Transcription Factor Binding Sites

Biocarta pathways InterPro Motifs

Bioentities from literature

Gene Expression in tissues

Computational Genomics

slide-7
SLIDE 7

Personalized Medicine Introduction

Computational Genomics

slide-8
SLIDE 8


Personalized Medicine Introduction

Computational Genomics

slide-9
SLIDE 9

Description of the sessions

Máster en Biotecnología Biomédica. UPV. 3 sessions (7 hours) on the use of web tools for the analysis and interpretation of sequencing data.

All the documentation (presentations + exercises) that we will need during these days will be available at this link: http://bioinfo.cipf.es/mbb/. It is also available on Poliformat.

Instructors: Marta Hidalgo and Paco García.

The sessions will take a practical approach, and we will introduce only the concepts needed for the exercises.

Introduction

slide-10
SLIDE 10

Program

Máster en Biotecnología Biomédica. UPV.

Session 1

  • Introduction to NGS technologies.
  • Studies for detecting genomic variation. Genomic data analysis pipeline.
  • How do we detect mutations of interest in whole-exome studies? Exercises with the BiERapp web tool.

Session 2

  • Genomic variation studies: targeted genomic sequencing.
  • How do we design a gene panel? How do we analyze and interpret gene panel data? Exercises with TEAM.
  • Spanish genetic variability. The CSVS database.
  • Transcriptomic studies with NGS data. Expression data analysis pipeline. How do we analyze RNA-Seq data with the Babelomics suite?

Session 3

  • Transcriptomic data analysis in the context of signaling pathways.
  • Exercises with the hipathia and PathAct web tools.

Introduction

slide-11
SLIDE 11

Web tools to analyze omic data

BIER

fgardos@gmail.com

Máster en Biotecnología Biomédica. UPV

slide-12
SLIDE 12

NGS Data Analysis Pipeline

Resequencing Data Analysis: Sequence preprocessing → Alignment → Variant calling → Variant annotation → Prioritization

File formats: Fastq → BAM → VCF (BAM is also used for visualization)

RNA-Seq Data Analysis: RNA-Seq processing → RNA-Seq data analysis → Functional analysis

File formats: Fastq → Count matrix

NGS data analysis: pipelines Introduction

slide-13
SLIDE 13

Fastq format

We could say “it is a fasta with qualities”:

  • 1. Header (like the fasta but starting with “@”)
  • 2. Sequence (string of nt)
  • 3. “+” and sequence ID (optional)
  • 4. Encoded quality of the sequence

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
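To make the four-line layout concrete, here is a minimal Python sketch that reads the record above and decodes its quality string. It assumes the common Phred+33 encoding (the slide does not say which encoding applies), and the names `FastqRecord` and `parse_fastq` are illustrative, not part of any library.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    seq_id: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(lines: List[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text given as a list of lines.

    Assumptions: every record spans exactly four lines, and quality
    characters use an ASCII offset of 33 (Phred+33).
    """
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(cleaned), 4):
        header, sequence, plus, quality = cleaned[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at non-blank line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes one score: ord(char) - offset.
        scores = [ord(ch) - offset for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


example = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]
record = next(parse_fastq(example))
print(record.seq_id, len(record.sequence), record.qualities[:5])
```

Real FASTQ files are usually large and compressed, so in practice the lines would be streamed from a file handle rather than held in a list.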

NGS data analysis: file formats Introduction

slide-14
SLIDE 14

BAM/SAM format

@PG ID:HPG-Aligner VN:1.0
@SQ SN:20 LN:63025520
HWI-ST700660_138:2:2105:7292:79900#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA GIJGJLGGFLILGGIEIFEKEDELIGLJIHJFIKKFELFIKLFFGLGHKKGJLFIIGKFFEFFEFGKCKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:2208:6911:12246#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA HHJFHLGFFLILEGIKIEEMGEDLIGLHIHJFIKKFELFIKLEFGKGHEKHJLFHIGKFFDFFEFGKDKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:1201:2973:62218#2@0/1 0 20 76655 254 76M * 0 0 AACCCCAAAAATGTTGGAAGAATAATGTAGGACATTGCAGAAGACGATGTTTAGATACTGAAAGGGACATACTTCT FEFFGHHHGGHFKCCJKFHIGIFFIFLDEJKGJGGFKIHLFIJGIEGFLDEDFLFGEIIMHHIKL$BBGFFJIEHE AS:i:254 NH:i:1 NM:i:1
HWI-ST700660_138:2:1203:21395:164917#2@0/1 256 20 68253 254 4M1D72M * 0 0 NCACCCATGATAGACCAGTAAAGGTGACCACTTAAATTCCTTGCTGTGCAGTGTTCTGTATTCCTCAGGACACAGA #4@ADEHFJFFEJDHJGKEFIHGHBGFHHFIICEIIFFKKIFHEGJEHHGLELEGKJMFGGGLEIKHLFGKIKHDG AS:i:254 NH:i:3 NM:i:1
HWI-ST700660_138:2:1105:16101:50526#6@0/1 16 20 126103 246 53M4D23M * 0 0 AAGAAGTGCAAACCTGAAGAGATGCATGTAAAGAATGGTTGGGCAATGTGCGGCAAAGGGACTGCTGTGTTCCAGC FEHIGGHIGIGJI6FCFHJIFFLJJCJGJHGFKKKKGIJKHFFKIFFFKHFLKHGKJLJGKILLEFFLIHJIEIIB AS:i:368 NH:i:1 NM:i:4

SAM Specification: http://samtools.sourceforge.net/SAM1.pdf
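As a rough illustration of how the alignment lines above are read, the following Python sketch splits one line into the eleven mandatory SAM columns, checks two of the FLAG bits that appear in the example (16 and 256), and expands the CIGAR string. The function names are illustrative. Splitting on any whitespace is an assumption that suits the slide text; real SAM files are tab-separated.

```python
import re
from typing import Dict, List, Tuple

# The eleven mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Two of the FLAG bits defined by the SAM specification.
FLAG_REVERSE = 0x10     # 16: read aligned to the reverse strand
FLAG_SECONDARY = 0x100  # 256: secondary alignment

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line: str) -> Dict[str, object]:
    """Split one alignment line into named fields plus optional tags."""
    parts = line.split()  # assumption: whitespace-separated, as on the slide
    if len(parts) < 11:
        raise ValueError("an alignment line needs at least 11 fields")
    record: Dict[str, object] = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(str(record[key]))
    record["TAGS"] = parts[11:]
    return record


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Turn '4M1D72M' into [(4, 'M'), (1, 'D'), (72, 'M')]."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


line = ("HWI-ST700660_138:2:1203:21395:164917#2@0/1 256 20 68253 254 4M1D72M "
        "* 0 0 "
        "NCACCCATGATAGACCAGTAAAGGTGACCACTTAAATTCCTTGCTGTGCAGTGTTCTGTATTCCTCAGGACACAGA "
        "#4@ADEHFJFFEJDHJGKEFIHGHBGFHHFIICEIIFFKKIFHEGJEHHGLELEGKJMFGGGLEIKHLFGKIKHDG "
        "AS:i:254 NH:i:3 NM:i:1")
rec = parse_sam_line(line)
print(rec["RNAME"], rec["POS"], parse_cigar(str(rec["CIGAR"])))
print("secondary:", bool(int(str(rec["FLAG"])) & FLAG_SECONDARY))
```

For real BAM or SAM files a dedicated library is the safer choice; this sketch is only meant to show what each column holds.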

NGS data analysis: file formats Introduction

slide-15
SLIDE 15

VCF format

http://www.1000genomes.org/
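The slide names the format without showing a record, so the following Python sketch may help fix ideas. It parses the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) from one tab-separated line. The record used here is purely illustrative and does not come from the course data; the function names are likewise hypothetical.

```python
from typing import Dict, Union

# The eight fixed columns of a VCF data line, in order.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info_field: str) -> Dict[str, Union[str, bool]]:
    """Split 'NS=3;DP=14;DB' into key/value pairs; bare keys become True."""
    info: Dict[str, Union[str, bool]] = {}
    if info_field == ".":
        return info
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line: str) -> Dict[str, object]:
    """Parse one data line (not a '#' header line) of a VCF file."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    record: Dict[str, object] = dict(zip(VCF_COLUMNS, parts[:8]))
    record["POS"] = int(parts[1])
    record["ALT"] = parts[4].split(",")  # several alternate alleles are allowed
    record["INFO"] = parse_info(parts[7])
    # When genotypes are present, column 9 is FORMAT and the rest are samples.
    record["SAMPLES"] = parts[9:]
    return record


# Illustrative record, not taken from the slides.
variant = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB")
print(variant["CHROM"], variant["POS"], variant["REF"], variant["ALT"], variant["INFO"])
```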

NGS data analysis: file formats Introduction

slide-16
SLIDE 16

Counts

Sample Gene

NGS data analysis: file formats Introduction

slide-17
SLIDE 17

Transcriptomic Studies

BIER

fgardos@gmail.com

Máster en Biotecnología Biomédica. UPV

slide-18
SLIDE 18
  • 1. Sequence preprocessing
  • 2. Mapping
  • 3. Quantification
  • 4. Normalization
  • 5. Differential expression
  • 6. Functional Profiling

RNA-Seq Data Analysis Pipeline

Primary Secondary

RNA-Seq Data Analysis
Babelomics 5

slide-19
SLIDE 19

Babelomics 5

Analyzing omics data + functional profiling Babelomics 5

http://babelomics.bioinfo.cipf.es/

slide-20
SLIDE 20

Differential Expression

UPLOAD DATA EDIT DATA NORMALIZATION + DIFFERENTIAL EXPRESSION FUNCTIONAL PROFILING

Analyzing omics data + functional profiling Babelomics 5

slide-21
SLIDE 21

Supervised and Unsupervised Classification

UPLOAD DATA CLUSTERING PREDICTORS NORMALIZE DATA EDIT DATA RPKM TMM
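RPKM, one of the normalization options listed above, rescales raw counts by gene length and sequencing depth: RPKM = reads × 10^9 / (gene length in bp × total mapped reads in the sample). The Python sketch below applies that standard definition to a small count matrix. The gene names, counts and lengths are made up for illustration, the total is taken as the sum of the counts in the matrix, and nothing here is meant to describe how Babelomics computes it internally.

```python
from typing import Dict


def rpkm(counts: Dict[str, Dict[str, int]],
         gene_lengths: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Convert raw counts to RPKM.

    counts: {gene: {sample: read count}}
    gene_lengths: {gene: length in base pairs}
    """
    # Total mapped reads per sample, approximated by the column sums.
    totals: Dict[str, int] = {}
    for per_sample in counts.values():
        for sample, count in per_sample.items():
            totals[sample] = totals.get(sample, 0) + count

    normalized: Dict[str, Dict[str, float]] = {}
    for gene, per_sample in counts.items():
        length = gene_lengths[gene]
        normalized[gene] = {
            sample: (count * 1e9 / (length * totals[sample])
                     if totals[sample] else 0.0)
            for sample, count in per_sample.items()
        }
    return normalized


# Hypothetical count matrix: two genes, two samples.
example_counts = {
    "geneA": {"sample1": 100, "sample2": 50},
    "geneB": {"sample1": 900, "sample2": 150},
}
example_lengths = {"geneA": 1000, "geneB": 2000}
example_rpkm = rpkm(example_counts, example_lengths)
for gene, values in example_rpkm.items():
    print(gene, values)
```

TMM, the other option listed, uses a different idea (a trimmed mean of log-ratios between samples) and is not covered by this sketch.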

Analyzing omics data + functional profiling Babelomics 5

slide-22
SLIDE 22

Signaling Pathways Analysis

hiPathia Signaling Pathways Analysis

http://hipathia.babelomics.org/

slide-23
SLIDE 23

Genomic Variation Studies

BIER

fgardos@gmail.com

Máster en Biotecnología Biomédica. UPV

slide-24
SLIDE 24
  • 1. Sequence preprocessing
  • 2. Mapping
  • 3. Variant calling
  • 4. Variant prioritization

Genomics Data Analysis Pipeline

Primary Analysis Secondary

Resequencing Data Analysis Pipeline

slide-25
SLIDE 25

http://courses.babelomics.org/bierapp/

How do we prioritize variants in whole exome studies?

BIER

Discovering variants BiERapp

slide-26
SLIDE 26

Introduction

Whole-exome sequencing has become a fundamental tool for discovering the genes behind familial diseases, but it is difficult to find the causal mutation among the enormous background

There are different scenarios, so we need different and immediate prioritization strategies

Vast amount of biological knowledge available in many databases

We need a tool to integrate this information and filter immediately to select candidate variants related to the disease

Discovering variants BiERapp

slide-27
SLIDE 27

How does BiERapp work?

Multisample VCF file BiERapp

CellBase VARIANT Filterings

Discovering variants BiERapp

slide-28
SLIDE 28

Input: VCF file

  • 1. Sequence preprocessing
  • 2. Mapping
  • 3. Variant calling
  • 4. Variant prioritization

Primary Analysis Secondary

VCF files BiERapp

Discovering variants BiERapp

slide-29
SLIDE 29

http://courses.babelomics.org/team/

Can I interpret sequencing data for diagnosis?

Targeted Enrichment Analysis and Management

BIER

TEAM

slide-30
SLIDE 30

Gene panel

Targeted Enrichment Analysis and Management

TEAM

Sequencing data

Diagnostic

ClinVar HUMSAVAR HGMD COSMIC

Biological knowledge

TEAM

slide-31
SLIDE 31

Gene panel

  • 1. VCF files

TEAM

  • 2. Gene panel

ClinVar HUMSAVAR HGMD COSMIC

Targeted Enrichment Analysis and Management TEAM

slide-32
SLIDE 32

CSVS:

CIBERER Spanish Variant Server

Repository of variant frequencies in the Spanish population

http://csvs.babelomics.org/

CIBERER Spanish Variant Server CSVS

slide-33
SLIDE 33

Local genetic variability

CIBERER Spanish Variant Server

CSVS

slide-34
SLIDE 34

Tool interface

CIBERER Spanish Variant Server CSVS

http://csvs.babelomics.org/

slide-35
SLIDE 35

A next-generation web-based genome browser

Genome Maps

Genome viewer that interacts with functional databases: http://genomemaps.org/

Genome Maps

slide-36
SLIDE 36

Tool interface

Genome Maps A next-generation web-based genome browser

slide-37
SLIDE 37

Cell Maps

Tool for modeling and visualizing biological networks: http://cellmaps.babelomics.org/

Visualizing and integrating biological networks Cell Maps

slide-38
SLIDE 38

Cell Maps

1) A tool for integrating, visualizing and analyzing biological networks.

2) The input is a file describing the relationships between the nodes of our network. Optionally, we can include a file with the attributes of each node.

3) The graphical output is a network showing the relationships among its nodes.

Tutorial: https://github.com/opencb/cell-maps/wiki
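The idea of one file with node relationships plus an optional file with node attributes can be sketched in a few lines of Python. The two-column, tab-separated layout used here is an assumption made for illustration and is not Cell Maps' documented input format (see the tutorial for that); the node names and the function `load_network` are hypothetical too.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def load_network(edge_text: str,
                 attribute_text: str = "") -> Tuple[Dict[str, Set[str]], Dict[str, str]]:
    """Build an undirected adjacency map and a node attribute map.

    edge_text: one 'source<TAB>target' pair per line (assumed layout).
    attribute_text: optional 'node<TAB>attribute' pairs (assumed layout).
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for line in edge_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected two columns, got: {line!r}")
        source, target = fields[0], fields[1]
        adjacency[source].add(target)
        adjacency[target].add(source)

    attributes: Dict[str, str] = {}
    for line in attribute_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected two columns, got: {line!r}")
        attributes[fields[0]] = fields[1]

    return dict(adjacency), attributes


# Hypothetical three-node network.
edges = "geneA\tgeneB\ngeneB\tgeneC\n"
node_attributes = "geneA\tkinase\ngeneC\ttranscription factor\n"
network, attrs = load_network(edges, node_attributes)
for node in sorted(network):
    print(node, sorted(network[node]), attrs.get(node, "no attribute"))
```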

Visualizing and integrating biological networks Cell Maps

slide-39
SLIDE 39

Tool interface

Visualizing and integrating biological networks Cell Maps

slide-40
SLIDE 40

Cell Maps: inputs

Visualizing and integrating biological networks Cell Maps

slide-41
SLIDE 41

Cell Maps: outputs

Visualizing and integrating biological networks Cell Maps

slide-42
SLIDE 42

Omics Data Integration from a Systems Biology perspective

BIER

Francisco García fgardos@gmail.com

Omics Data Integration

slide-43
SLIDE 43

Omics Data Integration

Omics Data Integration

Patient Technologies Data Analysis Integration and interpretation Molecular and clinical model

Introduction

slide-44
SLIDE 44

Omics Data Integration

Multidimensional Gene Set Analysis

Case Control

microRNA-Seq

Case Control

mRNA-Seq MicroRNA-Seq & mRNA-Seq

Gene1 0.01 Gene2 0.04 Gene3 0.09 Gene4 0.2 ... miRNA1 0.5 miRNA2 1.2 miRNA3 1.3 miRNA4 1.7 ...

Ranking Index Logistic Regression Patterns GOs InterPRO KEGGs Functional annotation

Strategies

slide-45
SLIDE 45

Omics Data Integration

Functional Meta-Analysis

N mRNA-Seq studies, each comparing Case vs. Control

Per study: Differential Expression, then Functional Profiling

Meta-Analysis of the per-study results

GOs InterPRO KEGGs

Strategies

slide-46
SLIDE 46

PATHiVAR estimates the functional impact that mutations have over the human signalling network.

PATHiVAR:

Analyses VCF files

Extracts the deleterious mutations

Locates them over the signalling pathways in the selected tissue (with the appropriate expression pattern)

Provides a comprehensive, graphic and interactive view of the predicted signal transduction probabilities across the different signalling pathways.
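The "extract the deleterious mutations" step can be pictured with a short Python sketch based on SIFT and PolyPhen scores, the two predictors named on the next slide. The cutoffs used here (SIFT at most 0.05, PolyPhen at least 0.85) are commonly quoted conventions adopted as assumptions; the slides do not state PATHiVAR's actual criteria. The variant dictionaries and field names are hypothetical as well.

```python
from typing import Dict, List, Optional


def is_deleterious(variant: Dict[str, Optional[float]],
                   sift_max: float = 0.05,
                   polyphen_min: float = 0.85) -> bool:
    """Flag a variant when either predictor calls it damaging.

    Low SIFT scores and high PolyPhen scores both indicate a likely
    damaging change. The thresholds are assumptions, not PATHiVAR's.
    """
    sift = variant.get("sift")
    polyphen = variant.get("polyphen")
    sift_hit = sift is not None and sift <= sift_max
    polyphen_hit = polyphen is not None and polyphen >= polyphen_min
    return sift_hit or polyphen_hit


def extract_deleterious(variants: List[Dict[str, Optional[float]]]
                        ) -> List[Dict[str, Optional[float]]]:
    """Keep only the variants flagged by is_deleterious."""
    return [v for v in variants if is_deleterious(v)]


# Made-up scores for illustration only.
candidates = [
    {"sift": 0.01, "polyphen": 0.20},   # flagged by SIFT
    {"sift": 0.60, "polyphen": 0.95},   # flagged by PolyPhen
    {"sift": 0.40, "polyphen": 0.10},   # not flagged
    {"sift": None, "polyphen": None},   # no prediction available
]
selected = extract_deleterious(candidates)
print(len(selected), "of", len(candidates), "variants kept")
```

Requiring either predictor (rather than both) is a permissive design choice; a stricter filter would combine the two tests with `and`.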

PATHiVAR

PATHiVAR: mutations and expression

http://pathivar.babelomics.org/

Strategies

slide-47
SLIDE 47

How does PATHiVAR work?

VCF file

PATHiVAR

SIFT

PATHiVAR

PolyPhen Pathways Tissues Inheritance pattern

Strategies

slide-48
SLIDE 48

PATHiVAR

PATHiVAR

CALCIUM SIGNALING PATHWAY

Strategies

slide-49
SLIDE 49

Other resources for Genomic Data Analysis

NGS data analysis More resources

  • 1. Sequence preprocessing
  • 2. Mapping
  • 3. Variant calling
  • 4. Variant prioritization

Primary Analysis BiERapp Secondary HPG Aligner HPG Variant CellBase HPG Pore

http://www.opencb.org/

slide-50
SLIDE 50

Any questions?

fgardos@gmail.com

Francisco García García