Computational Genomics
Francisco García García
BIER
fgardos@gmail.com
Máster en Biotecnología Biomédica. UPV
Why are we interested in Computational Genomics?
The overall goal:
Apply computational methods to biomedical and biotechnological problems
Research interests:
The development and application of novel bioinformatics methods aimed at discovering new drugs
Identification of genes or proteins that may be considered therapeutic targets
Personalized medicine: tools for discovery and diagnosis
Why Computational Genomics? Introduction
Omics sciences Introduction
Metabolomics Proteomics Genomics Transcriptomics Lipidomics Epigenomics
How do these technologies work?
High throughput technologies: microarrays Introduction
Reference genome
High throughput technologies: Next Generation Sequencing Introduction
How do these technologies work?
Clinical and biological databases Introduction
ClinVar HUMSAVAR HGMD COSMIC
Biological knowledge Clinical knowledge
Gene Ontology KEGG pathways Regulatory elements
MiRNA, CisRed Transcription Factor Binding Sites
Biocarta pathways InterPro Motifs
Bioentities from literature
Gene Expression in tissues
Personalized Medicine Introduction
Máster en Biotecnología Biomédica. UPV. 3 sessions (7 hours) on the use of web tools for the analysis and interpretation of sequencing data. The materials we will need over these days will be available at this link: http://bioinfo.cipf.es/mbb/. They are also on Poliformat. Instructors: Marta Hidalgo and Paco García. The sessions take a practical approach, and we will introduce only the concepts needed for the exercises. Introduction
Máster en Biotecnología Biomédica. UPV.
Session 1
... with the BiERapp web tool.
Session 2
Genomic variation studies: targeted genomic sequencing. How do we design a gene panel? How do we analyze and interpret gene panel data? Exercises with TEAM.
Spanish genetic variability. The CSVS database. Transcriptomic studies with NGS data. Pipeline for the analysis of expression data.
How do we analyze RNA-Seq data with the Babelomics suite?
Session 3
Analysis of transcriptomic data in the context of signaling pathways. Exercises with the hipathia and PathAct web tools.
Introduction
NGS data analysis: pipelines Introduction
Resequencing Data Analysis: Sequence preprocessing, Alignment, Variant calling, Variant annotation, Prioritization, Visualization. File formats: Fastq, BAM, VCF.
RNA-Seq Data Analysis: RNA-Seq processing, RNA-Seq data analysis, Functional analysis. File formats: Fastq, Count matrix.
We could say “it is a fasta with qualities”:
1. Header (like the fasta but starting with “@”)
2. Sequence (string of nt)
3. “+” and sequence ID (optional)
4. Encoded quality of the sequence
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
NGS data analysis: file formats Introduction
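The four-line layout above can be read with a few lines of Python. This is a minimal sketch, not the parser of any tool mentioned in these slides: `FastqRecord` and `parse_fastq` are hypothetical names, and the quality offset of 33 (Sanger / Illumina 1.8+ encoding) is an assumption, since the slide does not say which encoding the example uses.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    seq_id: str
    sequence: str
    qualities: list  # one Phred score (int) per base


def parse_fastq(lines, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ entry.

    offset=33 is an assumption (Sanger / Illumina 1.8+); older Illumina
    files use an offset of 64.
    """
    clean = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(clean) - 3, 4):
        header, seq, plus, qual = clean[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at entry line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Each quality character encodes a Phred score as ord(char) - offset.
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


# The record shown on the slide.
example = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]

record = next(parse_fastq(example))
mean_quality = sum(record.qualities) / len(record.qualities)
print(record.seq_id, len(record.sequence), round(mean_quality, 1))
```

Decoding the quality string this way is what makes "a fasta with qualities" usable in preprocessing, for example to trim low-quality read ends.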
@PG ID:HPG-Aligner VN:1.0
@SQ SN:20 LN:63025520
HWI-ST700660_138:2:2105:7292:79900#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA GIJGJLGGFLILGGIEIFEKEDELIGLJIHJFIKKFELFIKLFFGLGHKKGJLFIIGKFFEFFEFGKCKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:2208:6911:12246#2@0/1 16 20 76703 254 76= * 0 0 GTTTAGATACTGAAAGGTACATACTTCTTTGTAGGAACAAGCTATCATGCTGCATTTCTATAATATCACATGAATA HHJFHLGFFLILEGIKIEEMGEDLIGLHIHJFIKKFELFIKLEFGKGHEKHJLFHIGKFFDFFEFGKDKFHHCCCF AS:i:254 NH:i:1 NM:i:0
HWI-ST700660_138:2:1201:2973:62218#2@0/1 0 20 76655 254 76M * 0 0 AACCCCAAAAATGTTGGAAGAATAATGTAGGACATTGCAGAAGACGATGTTTAGATACTGAAAGGGACATACTTCT FEFFGHHHGGHFKCCJKFHIGIFFIFLDEJKGJGGFKIHLFIJGIEGFLDEDFLFGEIIMHHIKL$BBGFFJIEHE AS:i:254 NH:i:1 NM:i:1
HWI-ST700660_138:2:1203:21395:164917#2@0/1 256 20 68253 254 4M1D72M * 0 0 NCACCCATGATAGACCAGTAAAGGTGACCACTTAAATTCCTTGCTGTGCAGTGTTCTGTATTCCTCAGGACACAGA #4@ADEHFJFFEJDHJGKEFIHGHBGFHHFIICEIIFFKKIFHEGJEHHGLELEGKJMFGGGLEIKHLFGKIKHDG AS:i:254 NH:i:3 NM:i:1
HWI-ST700660_138:2:1105:16101:50526#6@0/1 16 20 126103 246 53M4D23M * 0 0 AAGAAGTGCAAACCTGAAGAGATGCATGTAAAGAATGGTTGGGCAATGTGCGGCAAAGGGACTGCTGTGTTCCAGC FEHIGGHIGIGJI6FCFHJIFFLJJCJGJHGFKKKKGIJKHFFKIFFFKHFLKHGKJLJGKILLEFFLIHJIEIIB AS:i:368 NH:i:1 NM:i:4
SAM Specification: http://samtools.sourceforge.net/SAM1.pdf
NGS data analysis: file formats Introduction
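To make the columns of the alignment lines above easier to read, here is a minimal Python sketch that splits one SAM record into its mandatory fields, decodes a few flag bits and sums up the CIGAR string. The function names are hypothetical, the example line is an abbreviated illustration (the read name is made up, and SEQ and QUAL are replaced by `*`, which SAM allows), and real files are tab-separated. In practice a library would normally be used instead.

```python
import re
from typing import NamedTuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")      # operations that advance along the read
CONSUMES_REFERENCE = set("MDN=X")  # operations that advance along the reference


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    seq: str
    qual: str
    tags: dict


def parse_sam_line(line: str) -> SamRecord:
    """Split one alignment line into its mandatory fields plus optional tags."""
    fields = line.split()  # real SAM uses tabs; split() also accepts spaces
    tags = {}
    for raw in fields[11:]:
        name, kind, value = raw.split(":", 2)
        tags[name] = int(value) if kind == "i" else value
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[9], fields[10], tags)


def cigar_lengths(cigar: str):
    """Return (bases consumed on the read, bases spanned on the reference)."""
    query = reference = 0
    for count, op in CIGAR_OP.findall(cigar):
        if op in CONSUMES_QUERY:
            query += int(count)
        if op in CONSUMES_REFERENCE:
            reference += int(count)
    return query, reference


def describe_flag(flag: int) -> dict:
    """Decode three of the flag bits defined in the SAM specification."""
    return {
        "unmapped": bool(flag & 0x4),
        "reverse_strand": bool(flag & 0x10),
        "secondary": bool(flag & 0x100),
    }


# Abbreviated, illustrative line: hypothetical read name, SEQ and QUAL as "*".
line = "read_4 256 20 68253 254 4M1D72M * 0 0 * * AS:i:254 NH:i:3 NM:i:1"
record = parse_sam_line(line)
print(record.rname, record.pos, cigar_lengths(record.cigar), describe_flag(record.flag))
```

Read this way, a flag of 16 marks a read aligned to the reverse strand and 256 marks a secondary alignment, while `4M1D72M` describes a 76-base read spanning 77 reference positions because of the one-base deletion.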
http://www.1000genomes.org/
NGS data analysis: file formats Introduction
[Figure: data table with Sample and Gene axes]
NGS data analysis: file formats Introduction
Primary Secondary
RNA-Seq Data Analysis
Babelomics 5
Analyzing omics data + functional profiling Babelomics 5
http://babelomics.bioinfo.cipf.es/
UPLOAD DATA EDIT DATA NORMALIZATION + DIFFERENTIAL EXPRESSION FUNCTIONAL PROFILING
UPLOAD DATA CLUSTERING PREDICTORS NORMALIZE DATA EDIT DATA RPKM TMM
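As an illustration of the RPKM option listed in the normalization menu, the sketch below applies the usual definition, reads per kilobase of transcript per million mapped reads, to a small count table. It is a minimal example with made-up gene names, counts and lengths, and it is not Babelomics code; TMM is a different, more involved method that is not shown here.

```python
def rpkm(counts: dict, lengths_bp: dict) -> dict:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = 1e9 * reads_in_gene / (total_mapped_reads * gene_length_in_bp)
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads in this sample")
    return {
        gene: 1e9 * reads / (total_reads * lengths_bp[gene])
        for gene, reads in counts.items()
    }


# Hypothetical single-sample counts and gene lengths, for illustration only.
sample_counts = {"geneA": 500, "geneB": 1500}
gene_lengths = {"geneA": 2000, "geneB": 1000}

for gene, value in rpkm(sample_counts, gene_lengths).items():
    print(gene, value)
```

Dividing by gene length and by library size is what makes values comparable across genes of different sizes and samples of different sequencing depth.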
hiPathia Signaling Pathways Analysis
http://hipathia.babelomics.org/
Primary Analysis Secondary
Resequencing Data Analysis Pipeline
http://courses.babelomics.org/bierapp/
Discovering variants BiERapp
Whole-exome sequencing has become a fundamental tool for discovering the genes behind familial diseases, but it is difficult to find the causal mutation among the enormous background
There are different scenarios, so we need different and immediate prioritization strategies
A vast amount of biological knowledge is available in many databases
We need a tool that integrates this information and filters it immediately to select candidate variants related to the disease
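One prioritization strategy for a family study can be sketched in a few lines of Python: keep only the variants carried by every affected member and by no unaffected member. This is a minimal illustration under stated assumptions, not how BiERapp is implemented; the function names, the sample names and the two variant lines are hypothetical, and the filter corresponds to a simple dominant model.

```python
def carries_alt(genotype: str) -> bool:
    """True if a GT value such as '0/1' or '1|1' contains a non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def segregating_variants(vcf_lines, affected, unaffected):
    """Keep variants present in all affected and absent from all unaffected samples."""
    samples = []
    kept = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        gt_index = cols[8].split(":").index("GT")
        genotypes = {
            name: cols[9 + i].split(":")[gt_index]
            for i, name in enumerate(samples)
        }
        in_all_affected = all(carries_alt(genotypes[s]) for s in affected)
        in_any_unaffected = any(carries_alt(genotypes[s]) for s in unaffected)
        if in_all_affected and not in_any_unaffected:
            kept.append((cols[0], int(cols[1]), cols[3], cols[4]))
    return kept


# Hypothetical multisample VCF content, for illustration only.
vcf = [
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
               "FORMAT", "father", "mother", "child"]),
    "\t".join(["20", "76703", ".", "G", "A", "50", "PASS", ".", "GT", "0/0", "0/1", "0/1"]),
    "\t".join(["20", "126103", ".", "C", "T", "50", "PASS", ".", "GT", "0/1", "0/1", "0/1"]),
]

candidates = segregating_variants(vcf, affected=["mother", "child"], unaffected=["father"])
print(candidates)
```

Other scenarios (recessive inheritance, unrelated cases, de novo mutations) would need a different condition in place of the two `all`/`any` checks, which is why several strategies are required.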
Multisample VCF file BiERapp
CellBase VARIANT Filterings
Primary Analysis Secondary
VCF files BiERapp
TEAM: Targeted Enrichment Analysis and Management
Sequencing data
Diagnostic
Biological knowledge
ClinVar HUMSAVAR HGMD COSMIC
CIBERER Spanish Variant Server CSVS
Local genetic variability
A next-generation web-based genome browser
Genome Maps
Visualizing and integrating biological networks Cell Maps
1) It is a tool for the integration, visualization and analysis of biological networks.
2) The input is a file listing the relationships between the nodes of our network. Optionally, we can include a file with the attributes of each node.
3) The graphical output is a network showing the relationships among the nodes that make it up.
Tutorial: https://github.com/opencb/cell-maps/wiki
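The two inputs described above, a relations file and an optional node attributes file, map naturally onto an adjacency structure plus a lookup table. The sketch below assumes a hypothetical tab-separated layout ("source, target" per line for relations and "node, attribute" per line for attributes); the actual formats accepted by Cell Maps are described in its tutorial and may differ.

```python
from collections import defaultdict


def load_network(relation_lines, attribute_lines=()):
    """Build an undirected adjacency map and a node attribute table.

    Assumed layout (hypothetical): tab-separated 'source<TAB>target' relations
    and tab-separated 'node<TAB>attribute' lines.
    """
    neighbours = defaultdict(set)
    for line in relation_lines:
        if not line.strip():
            continue
        source, target = line.rstrip("\n").split("\t")[:2]
        neighbours[source].add(target)
        neighbours[target].add(source)

    attributes = {}
    for line in attribute_lines:
        if not line.strip():
            continue
        node, value = line.rstrip("\n").split("\t", 1)
        attributes[node] = value
    return dict(neighbours), attributes


# Hypothetical node names and attributes, for illustration only.
relations = ["nodeA\tnodeB", "nodeA\tnodeC", "nodeC\tnodeD"]
node_attributes = ["nodeA\tkinase", "nodeD\treceptor"]

network, attributes = load_network(relations, node_attributes)
for node in sorted(network):
    print(node, sorted(network[node]), attributes.get(node, "-"))
```

A graphical tool then draws one node per key and one edge per relation, using the attributes to style the nodes.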
Omics Data Integration
Patient, Technologies, Data Analysis, Integration and interpretation, Molecular and clinical model
Introduction
Omics Data Integration
Case Control
microRNA-Seq
Case Control
Gene1 0.01, Gene2 0.04, Gene3 0.09, Gene4 0.2, ...
miRNA1 0.5, miRNA2 1.2, miRNA3 1.3, miRNA4 1.7, ...
Ranking Index Logistic Regression Patterns GOs InterPRO KEGGs Functional annotation
Strategies
Omics Data Integration
[Figure: several mRNA-Seq Case/Control studies, each followed by Differential Expression and Functional Profiling, combined through Meta-Analysis; annotations: GOs, InterPRO, KEGGs]
Strategies
PATHiVAR estimates the functional impact that mutations have on the human signalling network.
PATHiVAR:
Analyzes VCF files
Extracts the deleterious mutations
Locates them on the signalling pathways in the selected tissue (with the appropriate expression pattern)
Provides a comprehensive, graphic and interactive view of the predicted signal transduction probabilities across the different signalling pathways.
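The first steps listed above (select deleterious mutations, then place them on pathways for a given tissue) can be sketched as follows. This is only an illustration under loud assumptions: the score cut-offs are commonly used conventions for SIFT and PolyPhen rather than PATHiVAR's documented settings, all gene names, scores and gene sets are hypothetical, and the sketch does not estimate signal transduction probabilities.

```python
from collections import defaultdict

SIFT_MAX = 0.05      # assumed cut-off: lower SIFT scores are predicted deleterious
POLYPHEN_MIN = 0.85  # assumed cut-off: higher PolyPhen scores are predicted damaging


def is_deleterious(variant: dict) -> bool:
    """Flag a variant as deleterious if either predictor passes its cut-off."""
    return variant["sift"] <= SIFT_MAX or variant["polyphen"] >= POLYPHEN_MIN


def affected_pathways(variants, pathway_genes, expressed_genes):
    """Map deleterious variants in tissue-expressed genes onto pathways."""
    hits = defaultdict(set)
    for variant in variants:
        gene = variant["gene"]
        if not is_deleterious(variant) or gene not in expressed_genes:
            continue
        for pathway, genes in pathway_genes.items():
            if gene in genes:
                hits[pathway].add(gene)
    return {pathway: sorted(genes) for pathway, genes in hits.items()}


# Hypothetical annotated variants, pathways and tissue expression.
variants = [
    {"gene": "GENE_A", "sift": 0.01, "polyphen": 0.99},
    {"gene": "GENE_B", "sift": 0.40, "polyphen": 0.10},
    {"gene": "GENE_C", "sift": 0.02, "polyphen": 0.90},
]
pathway_genes = {
    "pathway_1": {"GENE_A", "GENE_B"},
    "pathway_2": {"GENE_C"},
}
expressed_in_tissue = {"GENE_A", "GENE_B"}

print(affected_pathways(variants, pathway_genes, expressed_in_tissue))
```

The tissue filter is what drops a deleterious variant in a gene that is not expressed in the selected tissue, mirroring the "appropriate expression pattern" condition in the list above.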
PATHiVAR
http://pathivar.babelomics.org/
PATHiVAR: SIFT, PolyPhen, Pathways, Tissues, Inheritance pattern
CALCIUM SIGNALING PATHWAY
Strategies
NGS data analysis More resources
Primary Analysis BiERapp Secondary HPG Aligner HPG Variant CellBase HPG Pore
http://www.opencb.org/
fgardos@gmail.com
Francisco García García