Program an analysis workflow Day 1. Basic functionality of - PowerPoint PPT Presentation

Microarray data analysis with Chipster
16.-17.4.2008
Jarno Tuimala
Eija Korpelainen

Program – an analysis workflow
Day 1.
• Basic functionality of Chipster (Eija)
• Data import (Eija)
• Quality control (Jarno)
• Normalization (Jarno)
  • Describing the experiment
• Filtering and missing value considerations (Jarno)
Day 2.
• Statistical testing (Jarno)
• Clustering and visualization (Jarno)
• Annotation (Eija)
• Promoter analysis (Eija)
• Experimental design (Jarno) – if time allows

Demo data
• Affymetrix
  • Kidney cancer
  • 8 controls, 9 cancer patients
• Agilent
  • Acute leukemia
  • 7 controls, 7 FLT mutated
• Illumina
  • Teratozoospermia
  • 5 controls, 8 affected

Introduction to microarrays

Research using microarrays
• Plan!
  • Experimental design
• Laboratory work
  • Extract, label, hybridize
• Computer work
  • Scanning, image analysis
  • Bioinformatics
• Laboratory work
  • Confirmation
• Publish
  • Submit data to public databases

Introduction to Chipster

Chipster
• Goal: Easy access to leading analysis tools such as those developed in the R/Bioconductor project
• Features
  • Easy to use graphical user interface
  • Comprehensive selection of tools
  • Support for different array types (Affymetrix, Agilent, Illumina, cDNA)
  • Compatible with Windows, Linux and Mac OS X
  • Easy to install and update automatically
  • Wizards and workflows
  • Interactive graphics
  • Transparency (as opposed to “black box”)
  • Alternative annotations for Affymetrix arrays
  • Automatic tracking of performed analyses

How does it work?
[Architecture diagram. Labels: CSC, internet, desktop, client, Java Web Start installs and updates client, SSL security, front end, SOAP, analyser, Corona/Murska, international Web Services, VISUALISATION, ANALYSIS]
• http://www.csc.fi/english/customers/university/useraccounts/scientificservices.pdf
• http://chipster.csc.fi

Acknowledgements
• Aleksi Kallio
• Jarno Tuimala
• Taavi Hupponen
• Mika Rissanen, Janne Käki, Mikko Koski, Petri Klemelä
• All the pilot users
• Department of computer science (HY)
• Dario Greco (HY)
• Prof. Olli Yli-Harja’s group (TUT)
• GeneCruiser team (MIT Broad Institute)
• Tekes/SA SYSBIO-program

[Screenshot labels: Tools, Data, Visualization]

Phenodata – describing your experiment
• Phenodata file is created during normalization
• Fill in the group column with numbers describing your experimental setup
  • e.g. 1 = healthy control, 2 = cancer sample
  • necessary for the statistical tests to work
• If you bring in previously created normalized data and phenodata:
  • Choose “import directly” in the import tool
  • Right click on normalized data, choose “Link to” phenodata and link type “Annotation”
• If you brought in normalized data and need to create phenodata for it:
  • Utilities/ Generate phenodata (fill in the chiptype parameter!)
  • Right click on normalized data, choose “Link to” phenodata and link type “Annotation”
  • Fill in the group column
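To make the role of the group column concrete, here is a minimal Python sketch that reads a small phenodata-like table and counts samples per group. Only the "group" column and the chiptype parameter are mentioned on the slide; the other column name, the sample names, the chiptype value and the tab-delimited layout are illustrative assumptions, not Chipster's actual file format.

```python
import csv
import io

# Hypothetical phenodata content. Only the "group" column name comes from the
# slides; "sample", the file names and the chiptype value are assumptions.
PHENODATA_TSV = (
    "sample\tchiptype\tgroup\n"
    "control_1.cel\thgu133a\t1\n"
    "control_2.cel\thgu133a\t1\n"
    "cancer_1.cel\thgu133a\t2\n"
    "cancer_2.cel\thgu133a\t2\n"
)


def read_groups(tsv_text):
    """Return {sample: group number}; fail if a group cell was left empty."""
    groups = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if not row["group"].strip():
            raise ValueError(f"group not filled in for {row['sample']}")
        groups[row["sample"]] = int(row["group"])
    return groups


def samples_per_group(groups):
    """Count how many samples carry each group number."""
    counts = {}
    for group in groups.values():
        counts[group] = counts.get(group, 0) + 1
    return counts


groups = read_groups(PHENODATA_TSV)
counts = samples_per_group(groups)
print(counts)  # 1 = healthy control, 2 = cancer sample
```

A two-group statistical test needs at least two distinct group numbers, which is why an unfilled group column stops the analysis.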

Visualizing the data
• Data visualization panel
  • Maximize and redraw for better viewing
• Two types of visualizations
  1. Interactive visualizations produced by the client program
    • Select the visualization method from the pulldown menu of the data visualization panel
    • Save by right clicking on the image
  2. Static images produced by R/Bioconductor, Weeder, etc
    • Select from Analysis tools/ Visualisation
    • View by double clicking on the image file
    • Save by right clicking on the file name and choosing “Export”

Interactive visualizations by the client
• Spreadsheet
• Histogram
• Scatterplot
• 3D scatterplot
• Expression profiles
• Clustered profiles
• Hierarchical clustering
• SOM clustering
• Array pseudo-image
Available actions:
• Change titles, colors etc
• Zoom in/out
• Select and annotate genes using the MIT GeneCruiser

Static images produced by R/Bioconductor
• Volcano plot
• Box plot
• Histogram
• Heatmap
• Venn diagram
• Idiogram
• Chromosomal position
• Correlogram
• Dendrogram
• QC stats plot
• RNA degradation plot
• K-means clustering
• SOM-clustering

Automatic tracking of analysis history

Running many analyses simultaneously
• You can have max 5 analysis jobs running at the same time
• Use Task manager to
  • view parameters, status,…
  • cancel jobs

Workflow – reusing your analysis pipeline
• Creates a “macro” that can be applied to another normalized dataset and phenodata
  • File/ Save workflow
  • File/ Load workflow
• Choose a dataset, and the workflow records the analysis steps that lead to that dataset
• You can give the workflow a meaningful name (ending .bsh), but it has to be located in the chipster-scripts folder
• You can run a workflow on another computer by making it visible to Chipster with “Reload workflows from disk”
• You can change parameters directly in the workflow file

Workspace – continue later/elsewhere
• Saving your workspace allows you to continue later
• Currently it is possible to have only one workspace saved at a time
• If you would like to continue your work on another computer, you need to transfer the workspace-snapshot folder to the corresponding location under nami-work-files
  • C:\Documents and Settings\ekorpela\nami-work-files\workspace-snapshot

Wizard – autopilot for analysis
• Ready-made workflow to find differentially expressed genes
  • Normalization
  • Phenodata creation
  • Statistical test
  • Hierarchical clustering

Wizard for Affymetrix data

Importing files
• Affymetrix CEL-files are imported to Chipster automatically
• Other files are imported using the Import tool

Import tool, step 1
• Define
  • Header
  • Footer
  • Title row
  • Delimiter

Import tool, step 2
• Define columns
• Modify flags

Importing Agilent files
• Sample (rMeanSignal)
• Sample background (rBGMedianSignal)
• Control (gMeanSignal)
• Control background (gBGMedianSignal)
• Identifier (ProbeUID)
• Annotation (ControlType)
• https://extras.csc.fi/biosciences/chipster-manual/data-formats.html
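As a rough illustration of the column roles listed for Agilent files, the Python sketch below reads a tab-delimited table and maps each Agilent column name to its role. The column names are the ones on the slide; the role keys, the sample rows and the assumption that header and footer lines have already been stripped are hypothetical. It is not a description of how the Import tool itself works.

```python
import csv
import io

# Agilent column names as listed on the slide, keyed by (hypothetical) role names.
COLUMN_ROLES = {
    "identifier": "ProbeUID",
    "annotation": "ControlType",
    "sample": "rMeanSignal",
    "sample_background": "rBGMedianSignal",
    "control": "gMeanSignal",
    "control_background": "gBGMedianSignal",
}

# Invented example rows; a real file also has header lines before the
# title row, which is what step 1 of the Import tool is for.
EXAMPLE_FILE = (
    "ProbeUID\tControlType\trMeanSignal\trBGMedianSignal\tgMeanSignal\tgBGMedianSignal\n"
    "0\t1\t5000.0\t60.0\t4800.0\t55.0\n"
    "1\t0\t820.5\t58.0\t410.0\t52.0\n"
)


def read_spots(text, delimiter="\t"):
    """Return one dict per spot, keyed by role instead of Agilent column name."""
    spots = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        spots.append({
            "identifier": row[COLUMN_ROLES["identifier"]],
            "annotation": int(row[COLUMN_ROLES["annotation"]]),
            "sample": float(row[COLUMN_ROLES["sample"]]),
            "sample_background": float(row[COLUMN_ROLES["sample_background"]]),
            "control": float(row[COLUMN_ROLES["control"]]),
            "control_background": float(row[COLUMN_ROLES["control_background"]]),
        })
    return spots


spots = read_spots(EXAMPLE_FILE)
for spot in spots:
    print(spot["identifier"], spot["sample"], spot["control"])
```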

Exercise

Exercise I
1. Import the demo data of your favorite type in Chipster
  • Affymetrix
  • Agilent
2. Save the workspace
3. Have lunch (back at 13.00)

Quality control

Quality control tools
• Quality control -tools
  • Affymetrix basic
    RNA degradation + Affy QC
  • Affymetrix RLE & NUSE (might take a long time to run)
    Fits a model to expression values
  • Agilent
    MA-plot + density plot + boxplot
• Visualization – dendrogram
• Statistics - NMDS
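For orientation, relative log expression (RLE) compares each array with a per-probeset median taken across all arrays; an array whose RLE values sit far from zero stands out as suspect. The Python sketch below computes RLE directly from an invented log2 expression matrix. This is a simplification: the tool on the slide fits a model first, and the numbers here are made up for illustration.

```python
from statistics import median

# Invented log2 expression values: rows are probesets, columns are arrays.
# The fourth array is deliberately shifted upwards.
log2_expr = [
    [8.0, 8.2, 7.9, 9.5],
    [5.0, 5.1, 4.9, 6.4],
    [10.0, 10.1, 9.8, 11.6],
]


def rle(matrix):
    """Subtract each probeset's median across arrays from its values."""
    result = []
    for row in matrix:
        row_median = median(row)
        result.append([value - row_median for value in row])
    return result


def column_medians(matrix):
    """Median RLE per array; values far from zero flag a suspect array."""
    return [median(column) for column in zip(*matrix)]


rle_values = rle(log2_expr)
array_medians = column_medians(rle_values)
for index, value in enumerate(array_medians, start=1):
    print(f"array {index}: median RLE {value:+.2f}")
```

In a boxplot of RLE values per array, the shifted fourth array would appear well above the others.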

Affymetrix I

Affymetrix II
• Quality control tools are run on raw data (CEL files).
  • Dendrogram and NMDS on normalized data

Agilent

General QC – dendrogram and NMDS

Scatterplots

Heatmaps (this took an hour to calculate)

QC-tools in Chipster
• Quality control
  • Affymetrix basic
  • Affymetrix RLE and NUSE
  • Agilent 2-color
• Visualization
  • Dendrogram
  • Heatmap
  • Correlogram
• Statistics
  • NMDS

Normalization

What is normalization?
• Normalization is the process of removing systematic variation from the data.
• Typically you would normalize your data so that all the chips become comparable.

Methods
• Affymetrix
  • Background correction + expression estimation + summarization
  • RMA (default) uses only PM probes, fits a model to them, and gives out expression values after quantile normalization and median polishing
• Agilent
  • Background correction + averaging duplicate spots + normalization
• After normalization the expression values are always expressed on log2-scale

Affymetrix
• Methods: MAS5, Plier, RMA, GCRMA, Li-Wong
  • MAS5 is the older Affymetrix method, Plier is a newer one
  • RMA is the default, and works rather nicely if you have more than a few chips
  • GCRMA is similar to RMA, but also takes GC% content into account
  • Li-Wong is the method implemented in dChip
• Variance stabilization makes the variance over all the chips similar
  • Works only with MAS5 and Plier, since all others output log2-transformed data by default (and thus corrected for the same phenomenon)
• Custom chiptype
  • If you want to use reannotated probes (they are really assigned to the genes where they belong), select one from this menu.

Agilent I
• Background correction
  • Background treatment
    None, Subtract, Edwards, Normexp
  • Background offset
    0 or 50
• Normalize chips
  • None, median, loess
• Normalize genes (not typically used)
  • None, scale (to median), quantile
• Chiptype
  • A must setting!
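Quantile normalization, named above as one step of RMA and as an Agilent gene normalization option, forces every chip to share the same distribution of values. The Python sketch below shows only that step on three invented chips: it is not RMA, it skips background correction and summarization, and it makes no special provision for ties.

```python
def quantile_normalize(chips):
    """Give every chip the same value distribution.

    chips: list of equally long lists, one list of intensities per chip.
    Each value is replaced by the mean, across chips, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [
        sum(chip[rank] for chip in sorted_chips) / len(chips)
        for rank in range(n_probes)
    ]
    normalized = []
    for chip in chips:
        # Probe indices ordered from the lowest to the highest intensity.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Invented intensities for three chips and four probes.
chips = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 6.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(chips)
for chip in normalized:
    print([round(value, 2) for value in chip])
```

After this step every chip contains exactly the same set of values, only in a different probe order, which is what makes the chips comparable.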

Checking normalization

Agilent II
• Background treatment typically generates many negative values that are coded as missing values after log2-transformation.
  • Usual subtract option does this
  • Using normexp + offset 50 will generate no negative values, and gives rather good estimates (best method reported)
• Loess removes curvature from the data (suggested)

Exercise

Exercise II
• Normalize your dataset
  • Use two different normalization schemes
• Describe the experiment (fill in phenodata)
• Check the quality of your dataset
  • Is there difference between the normalization schemes
  • If there is, select the better one, and continue with it
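The Agilent background note can be checked with a few invented numbers. The Python sketch below applies plain background subtraction and shows how non-positive results become missing values on the log2 scale. The second half is not normexp, which is a model-based correction and is not implemented here; it uses a crude stand-in (clamp at zero, then add the offset of 50) purely to show why a positive offset keeps log2 defined for every spot.

```python
import math


def log2_or_missing(value):
    """log2 of a positive value; None (missing) for zero or negative input."""
    return math.log2(value) if value > 0 else None


# Invented foreground and background intensities for four spots.
foreground = [120.0, 55.0, 48.0, 900.0]
background = [50.0, 60.0, 48.0, 70.0]

# "Subtract" option: weak spots end up at or below zero.
subtracted = [f - b for f, b in zip(foreground, background)]
log_subtract = [log2_or_missing(value) for value in subtracted]
missing = sum(value is None for value in log_subtract)

# Stand-in for a correction that never goes negative (NOT normexp),
# followed by the offset of 50 mentioned on the slide.
OFFSET = 50.0
with_offset = [max(value, 0.0) + OFFSET for value in subtracted]
log_offset = [log2_or_missing(value) for value in with_offset]

print("subtract:", log_subtract, "missing:", missing)
print("offset:  ", [round(value, 2) for value in log_offset])
```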
