program an analysis workflow



  1. Microarray data analysis with Chipster
     16.-17.4.2008, Jarno Tuimala and Eija Korpelainen

     Program: an analysis workflow
     Day 1
     - Basic functionality of Chipster (Eija)
     - Data import (Eija)
     - Quality control (Jarno)
     - Normalization (Jarno)
       • Describing the experiment
     - Filtering and missing value considerations (Jarno)
     Day 2
     - Statistical testing (Jarno)
     - Clustering and visualization (Jarno)
     - Annotation (Eija)
     - Promoter analysis (Eija)
     - Experimental design (Jarno), if time allows

     Demo data
     - Affymetrix
       • Kidney cancer
       • 8 controls, 9 cancer patients
     - Agilent
       • Acute leukemia
       • 7 controls, 7 FLT mutated
     - Illumina
       • Teratozoospermia
       • 5 controls, 8 affected

     Introduction to microarrays

  2. Research using microarrays
     - Plan!
       • Experimental design
     - Laboratory work
       • Extract, label, hybridize
     - Computer work
       • Scanning, image analysis
       • Bioinformatics
     - Laboratory work
       • Confirmation
     - Publish
       • Submit data to public databases

     Introduction to Chipster

     Chipster
     - Goal: Easy access to leading analysis tools such as those developed in the R/Bioconductor project
     - Features
       • Easy to use graphical user interface
       • Comprehensive selection of tools
       • Support for different array types (Affymetrix, Agilent, Illumina, cDNA)
       • Compatible with Windows, Linux and Mac OS X
       • Easy to install and update automatically
       • Wizards and workflows
       • Interactive graphics
       • Transparency (as opposed to "black box")
       • Alternative annotations for Affymetrix arrays
       • Automatic tracking of performed analyses
     - http://www.csc.fi/english/customers/university/useraccounts/scientificservices.pdf
     - http://chipster.csc.fi

     How does it work?
     [Diagram labels: CSC, internet, desktop, SSL security, front end, Java Web Start installs and updates client, client, SOAP, analyser, Corona/Murska, international Web Services, visualisation, analysis]

  3. Acknowledgements
     - Aleksi Kallio
     - Jarno Tuimala
     - Taavi Hupponen
     - Mika Rissanen, Janne Käki, Mikko Koski, Petri Klemelä
     - All the pilot users
     - Department of computer science (HY)
     - Dario Greco (HY)
     - Prof. Olli Yli-Harja's group (TUT)
     - GeneCruiser team (MIT Broad Institute)
     - Tekes/SA SYSBIO-program

     [Screenshot labels: Tools, Data, Visualization]

     Phenodata: describing your experiment
     - Phenodata file is created during normalization
     - Fill in the group column with numbers describing your experimental setup
       • e.g. 1 = healthy control, 2 = cancer sample
       • necessary for the statistical tests to work
     - If you bring in previously created normalized data and phenodata:
       • Choose "import directly" in the import tool
       • Right click on normalized data, choose "Link to" phenodata and link type "Annotation"
     - If you brought in normalized data and need to create phenodata for it:
       • Utilities/ Generate phenodata (fill in the chiptype parameter!)
       • Right click on normalized data, choose "Link to" phenodata and link type "Annotation"
       • Fill in the group column
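The role of the group column can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample names, the `hgu133a` chiptype value, and the three-column layout are hypothetical, and the slides name only the group column and the chiptype parameter. The sketch builds a tab-separated phenodata table and checks that every sample has a group code before any statistical test would use it.

```python
import csv
import io

# Hypothetical samples: (file name, group code), with 1 = healthy control
# and 2 = cancer sample as in the slide's example coding.
samples = [
    ("control_1.cel", 1),
    ("control_2.cel", 1),
    ("cancer_1.cel", 2),
    ("cancer_2.cel", 2),
]


def build_phenodata(samples, chiptype):
    """Return tab-separated phenodata text with one row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumed column layout; only "group" and chiptype come from the slides.
    writer.writerow(["sample", "chiptype", "group"])
    for name, group in samples:
        writer.writerow([name, chiptype, group])
    return buffer.getvalue()


def check_groups(phenodata_text):
    """Return the set of group codes, raising if any group cell is empty."""
    rows = list(csv.DictReader(io.StringIO(phenodata_text), delimiter="\t"))
    missing = [row["sample"] for row in rows if not row["group"].strip()]
    if missing:
        raise ValueError(f"group column is empty for: {missing}")
    return {row["group"] for row in rows}


phenodata = build_phenodata(samples, chiptype="hgu133a")  # hypothetical chiptype
groups = check_groups(phenodata)
print(phenodata)
print("groups:", sorted(groups))
```

A two-group comparison needs at least two distinct codes in the column, which is why the check returns the set of codes rather than only validating that cells are filled.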

  4. Visualizing the data
     - Data visualization panel
       • Maximize and redraw for better viewing
     - Two types of visualizations
       1. Interactive visualizations produced by the client program
          • Select the visualization method from the pulldown menu of the data visualization panel
          • Save by right clicking on the image
       2. Static images produced by R/Bioconductor, Weeder, etc.
          • Select from Analysis tools/ Visualisation
          • View by double clicking on the image file
          • Save by right clicking on the file name and choosing "Export"

     Interactive visualizations by the client
     - Spreadsheet
     - Histogram
     - Scatterplot
     - 3D scatterplot
     - Expression profiles
     - Clustered profiles
     - Hierarchical clustering
     - SOM clustering
     - Array pseudo-image
     Available actions:
     - Change titles, colors etc.
     - Zoom in/out
     - Select and annotate genes using the MIT GeneCruiser

  5. Static images produced by R/Bioconductor
     - Volcano plot
     - Box plot
     - Histogram
     - Heatmap
     - Venn diagram
     - Idiogram
     - Chromosomal position
     - Correlogram
     - Dendrogram
     - QC stats plot
     - RNA degradation plot
     - K-means clustering
     - SOM-clustering

     Automatic tracking of analysis history

     Running many analyses simultaneously
     - You can have at most 5 analysis jobs running at the same time
     - Use Task manager to
       • view parameters, status, …
       • cancel jobs

  6. Workflow: reusing your analysis pipeline
     - Creates a "macro" that can be applied to another normalized dataset and phenodata
       • File/ Save workflow
       • File/ Load workflow
     - Choose a dataset, and workflow records the analysis steps that lead to that dataset
     - You can give the workflow a meaningful name (ending .bsh), but it has to be located in the chipster-scripts folder
     - You can run a workflow on another computer by making it visible to Chipster with "Reload workflows from disk"
     - You can change parameters directly in the workflow file

     Workspace: continue later/elsewhere
     - Saving your workspace allows you to continue later
     - Currently it is possible to have only one workspace saved at a time
     - If you would like to continue your work on another computer, you need to transfer the workspace-snapshot folder to the corresponding location under nami-work-files
       • C:\Documents and Settings\ekorpela\nami-work-files\workspace-snapshot

     Wizard: autopilot for analysis
     Wizard for Affymetrix data
     - Ready-made workflow to find differentially expressed genes
       • Normalization
       • Phenodata creation
       • Statistical test
       • Hierarchical clustering

  7. Importing files
     - Affymetrix CEL-files are imported to Chipster automatically
     - Other files are imported using the Import tool

     Import tool, step 1
     - Define
       • Header
       • Footer
       • Title row
       • Delimiter

     Import tool, step 2
     - Define columns
     - Modify flags

     Importing Agilent files
     - Sample (rMeanSignal)
     - Sample background (rBGMedianSignal)
     - Control (gMeanSignal)
     - Control background (gBGMedianSignal)
     - Identifier (ProbeUID)
     - Annotation (ControlType)
     - https://extras.csc.fi/biosciences/chipster-manual/data-formats.html
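The four Agilent signal columns are the inputs to the MA-plot used for Agilent quality control. A minimal sketch of the standard arithmetic follows, with M as the log2 ratio of sample to control and A as the mean log2 intensity. The row values are made up, and plain background subtraction is assumed here for illustration only; it is not a claim about what Chipster does internally.

```python
import math

# Made-up rows keyed by the Agilent column names listed in the slide.
rows = [
    {"ProbeUID": 1, "rMeanSignal": 1074.0, "rBGMedianSignal": 50.0,
     "gMeanSignal": 306.0, "gBGMedianSignal": 50.0},
    {"ProbeUID": 2, "rMeanSignal": 40.0, "rBGMedianSignal": 50.0,
     "gMeanSignal": 306.0, "gBGMedianSignal": 50.0},
]


def ma_values(row):
    """Return (M, A) for one spot, or None if a corrected signal is not positive."""
    sample = row["rMeanSignal"] - row["rBGMedianSignal"]
    control = row["gMeanSignal"] - row["gBGMedianSignal"]
    if sample <= 0 or control <= 0:
        return None  # log2 is undefined, so the spot becomes a missing value
    log_sample = math.log2(sample)
    log_control = math.log2(control)
    m = log_sample - log_control          # log2 ratio, sample vs control
    a = 0.5 * (log_sample + log_control)  # average log2 intensity
    return m, a


results = {row["ProbeUID"]: ma_values(row) for row in rows}
print(results)
```

The second probe has a sample signal below its background, so it yields no M and A values.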

  8. Exercise I
     1. Import the demo data of your favorite type in Chipster
        - Affymetrix
        - Agilent
     2. Save the workspace
     3. Have lunch (back at 13.00)

     Quality control

     Quality control tools
     - Quality control tools
       • Affymetrix basic: RNA degradation + Affy QC
       • Affymetrix RLE & NUSE (might take a long time to run): fits a model to expression values
       • Agilent: MA-plot + density plot + boxplot
     - Visualization: dendrogram
     - Statistics: NMDS

  9. Affymetrix I, Affymetrix II
     - Quality control tools are run on raw data (CEL files).
       • Dendrogram and NMDS are run on normalized data

     Agilent

     General QC: dendrogram and NMDS

  10. Scatterplots

      Heatmaps (this took an hour to calculate)

      QC-tools in Chipster
      - Quality control
        • Affymetrix basic
        • Affymetrix RLE and NUSE
        • Agilent 2-color
      - Visualization
        • Dendrogram
        • Heatmap
        • Correlogram
      - Statistics
        • NMDS

      Normalization

  11. What is normalization?
      - Normalization is the process of removing systematic variation from the data.
      - Typically you would normalize your data so that all the chips become comparable.

      Methods
      - Affymetrix
        • Background correction + expression estimation + summarization
        • RMA (default) uses only PM probes, fits a model to them, and gives out expression values after quantile normalization and median polishing
      - Agilent
        • Background correction + averaging duplicate spots + normalization
      - After normalization the expression values are always expressed on log2-scale

      Affymetrix
      - Methods: MAS5, Plier, RMA, GCRMA, Li-Wong
        • MAS5 is the older Affymetrix method, Plier is a newer one
        • RMA is the default, and works rather nicely if you have more than a few chips
        • GCRMA is similar to RMA, but also takes GC% content into account
        • Li-Wong is the method implemented in dChip
      - Variance stabilization makes the variance over all the chips similar
        • Works only with MAS5 and Plier, since all others output log2-transformed data by default (and are thus already corrected for the same phenomenon)
      - Custom chiptype
        • If you want to use reannotated probes (they are really assigned to the genes where they belong), select one from this menu.

      Agilent I
      - Background correction
        • Background treatment: None, Subtract, Edwards, Normexp
        • Background offset: 0 or 50
      - Normalize chips
        • None, median, loess
      - Normalize genes (not typically used)
        • None, scale (to median), quantile
      - Chiptype
        • A must setting!
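Quantile normalization, one of the steps RMA applies, can be sketched in plain Python to show what "making the chips comparable" means in practice. This is a simplified illustration, not the Bioconductor implementation: it ignores ties and missing values, and the two three-probe chips are made-up numbers. Each chip's values are replaced by the mean of the values holding the same rank across all chips, so every chip ends up with an identical distribution.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of chips, each a list of equal length.

    Simplified sketch: ties are broken by position, missing values are not handled.
    """
    n_probes = len(chips[0])
    # orders[c][k] is the probe index of the k-th smallest value on chip c.
    orders = [sorted(range(n_probes), key=chip.__getitem__) for chip in chips]
    # Reference distribution: mean of the k-th smallest values across chips.
    reference = [
        sum(chip[order[k]] for chip, order in zip(chips, orders)) / len(chips)
        for k in range(n_probes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


chip_a = [2.0, 4.0, 6.0]     # made-up intensities, three probes
chip_b = [30.0, 10.0, 20.0]  # same probes on a much brighter chip
normalized = quantile_normalize([chip_a, chip_b])
print(normalized)  # [[6.0, 12.0, 18.0], [18.0, 6.0, 12.0]]
```

The rank order of probes within each chip is preserved; only the scale is forced to agree, which removes chip-wide intensity differences.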

  12. Agilent II
      - Background treatment typically generates many negative values that are coded as missing values after log2-transformation.
        • The usual subtract option does this
        • Using normexp + offset 50 will generate no negative values, and gives rather good estimates (best method reported)
      - Loess removes curvature from the data (suggested)

      Checking normalization

      Exercise II
      - Normalize your dataset
        • Use two different normalization schemes
      - Describe the experiment (fill in phenodata)
      - Check the quality of your dataset
        • Is there a difference between the normalization schemes?
        • If there is, select the better one, and continue with it
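The missing-value effect of background subtraction is simple arithmetic and can be sketched as follows. The spot intensities are made up. Note the assumption: the offset is added here to plain subtraction only to show how a positive offset keeps values above zero. Normexp itself is a model-based correction and is not implemented in this sketch, and plain subtraction plus an offset can still go negative when the background exceeds the signal by more than the offset.

```python
import math

# Made-up (signal, background) pairs for four spots.
spots = [(120.0, 100.0), (95.0, 100.0), (100.0, 100.0), (1124.0, 100.0)]


def corrected_log2(signal, background, offset=0.0):
    """Subtract background, add an offset, and return log2 or None if not positive."""
    value = signal - background + offset
    return math.log2(value) if value > 0 else None


plain = [corrected_log2(s, b) for s, b in spots]
with_offset = [corrected_log2(s, b, offset=50.0) for s, b in spots]

missing_plain = sum(v is None for v in plain)
missing_offset = sum(v is None for v in with_offset)
print("missing after subtract:", missing_plain)
print("missing with offset 50:", missing_offset)
```

Spots at or below background are lost under plain subtraction, and these are typically the low-intensity spots, so the missing values are not spread evenly over the intensity range.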
