SLIDE 1

CAMDA 2004 Yin Liu Nov. 12, 2004


Identifying Stage-Specific Genes by Combining Information from Two Types of Oligonucleotide Arrays

Yin Liu, Ning Sun, Junfeng Liu, Liang Chen, Michael McIntosh, Liangbiao Zheng and Hongyu Zhao

Yale University

SLIDE 2

Objective

Identify genes differentially expressed in the sporozoite and gametocyte stages as potential candidates for transmission-blocking vaccine development

SLIDE 3

Outline

  • Description of the datasets
  • Our approaches
  • Sporozoite/gametocyte stages specifically expressed genes (not expressed in blood stages)
  • Sporozoite/gametocyte stages up-regulated genes (constantly expressed in blood stages)
  • Results
  • Gene Ontology analysis
  • Protein-protein interactions
  • Conclusions

SLIDE 4

Two Microarray Datasets

The DeRisi Data

  • 7,462 long (70mer) oligonucleotides
  • 4,488 genes
  • 46 time points across the complete asexual blood stages at 1-hour time scale resolution

The Winzeler Data

  • 260,596 25mer probes
  • 5,159 genes
  • six blood stages synchronized by two methods, as well as merozoite, gametocyte and sporozoite stages

SLIDE 5

Negative Controls

  • 281 “EMPTY” spots on DeRisi’s array
  • The intensities of these spots were standardized
  • The standardized intensities were summarized across all the time points for each “EMPTY” spot
  • The red and green channel intensities have very similar distributions

SLIDE 6

Genes Not Expressed in Blood Stages

Standardize all the spots by empMean_t and empVar_t: the mean and variance, respectively, of the intensities of the true “EMPTY” spots

Expression cutoff: 95th percentile of the summarized standardized intensities of “EMPTY” spots

721 genes not expressed across the complete blood stages
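
The negative-control cutoff on this slide could be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes the subscript t indexes time points, that "summarize" means averaging the standardized values across time points (the slides do not name the statistic), and that a gene counts as not expressed when its summary falls below the cutoff. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def standardize(rows, emp_mean, emp_sd):
    """Standardize each row (one spot, one value per time point) column by column."""
    return [[(x - m) / s for x, m, s in zip(row, emp_mean, emp_sd)] for row in rows]


def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def blood_stage_cutoff(empty_rows, q=95.0):
    """Return (empMean_t, empSD_t, cutoff) estimated from the EMPTY spots."""
    columns = list(zip(*empty_rows))  # one column per time point
    emp_mean = [mean(c) for c in columns]
    emp_sd = [math.sqrt(variance(c)) for c in columns]  # assumes nonzero variance
    z = standardize(empty_rows, emp_mean, emp_sd)
    summaries = [mean(row) for row in z]  # assumption: summarize by the mean
    return emp_mean, emp_sd, percentile(summaries, q)


def not_expressed(gene_rows, emp_mean, emp_sd, cutoff):
    """Flag genes whose summarized standardized intensity is below the cutoff."""
    z = standardize(gene_rows, emp_mean, emp_sd)
    return [mean(row) < cutoff for row in z]
```

With real data, `empty_rows` would hold the 281 EMPTY spots and `gene_rows` the gene spots, each with one intensity per time point.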

SLIDE 7

Identify Genes Specifically Expressed in Sporozoite/Gametocyte Stage

  • Determine an expression cutoff in the Winzeler data

Assume the number of genes identified as not expressed in blood stages based on Winzeler’s data is the same as that identified based on DeRisi’s data

17% (721/4250) of the genes are identified as not expressed in blood stages based on DeRisi’s data

Genes specifically expressed in sporozoite/gametocyte stage

  • Intensity values in sporozoite/gametocyte stages above the cutoff
  • Not expressed in blood stages
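
One way to read the cutoff rule above is to pick the Winzeler intensity below which the same fraction of genes (721/4250, about 17%) falls, then keep genes that are below it in blood stages and above it in the stage of interest. The sketch below assumes that reading, and also assumes each gene's blood-stage intensity is summarized by its maximum across blood stages; neither detail is stated on the slide, and the function names are hypothetical.

```python
import math


def quantile_cutoff(values, fraction):
    """Value below which roughly `fraction` of the values fall (linear interpolation)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * fraction
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def stage_specific(blood_max, stage_value, fraction=721 / 4250):
    """Genes not expressed in blood stages but expressed in the given stage.

    blood_max:   dict gene -> summarized blood-stage intensity (assumed: the maximum)
    stage_value: dict gene -> intensity in the sporozoite or gametocyte stage
    """
    cutoff = quantile_cutoff(blood_max.values(), fraction)
    return sorted(
        gene
        for gene in blood_max
        if blood_max[gene] < cutoff and stage_value[gene] > cutoff
    )
```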

SLIDE 8

Representative Genes Specifically Expressed in Sporozoite/Gametocyte Stage

Sporozoite

  • Sporozoite surface protein 2
  • Pbs36-related protein
  • rifin-encoded proteins

Gametocyte

  • 25 kDa ookinete surface antigen
  • Gametocyte antigen 377

SLIDE 9

Identify Genes Upregulated in Sporozoite/Gametocyte Stages

  • Goal

Predict the gene expression values at the sporozoite and gametocyte stages in DeRisi’s dataset using Winzeler’s dataset

  • Nonparametric regression

Local linear regression, specified by a kernel function and a smoothing parameter

  • Problem

Gene expression values are measured at different time scale resolutions in the two datasets

Choose an “invariant” gene set: constantly expressed in blood stages

Average the expression values of the “invariant” genes across blood stages
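
The local linear regression named on this slide can be illustrated with a short sketch. It assumes a Gaussian kernel and a fixed smoothing parameter `h`, neither of which the slide specifies. It also assumes one plausible setup: `xs` holds the blood-stage averages of the invariant genes in Winzeler's data, `ys` holds the matching averages in DeRisi's data, and `x0` is a Winzeler sporozoite or gametocyte intensity to be mapped onto the DeRisi scale.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the estimator."""
    return math.exp(-0.5 * u * u)


def local_linear(x0, xs, ys, h, kernel=gaussian_kernel):
    """Local linear estimate of E[y | x = x0].

    Fits a straight line by weighted least squares, with weights
    kernel((x - x0) / h), and returns the fitted intercept at x0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = kernel(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    denom = s0 * s2 - s1 * s1
    if denom == 0.0:
        raise ValueError("not enough distinct points near x0 for a local line")
    return (s2 * t0 - s1 * t1) / denom
```

A larger `h` gives a smoother curve and a smaller `h` follows the data more closely. On exactly linear data the estimator reproduces the line for any `h`.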

SLIDE 10

Identify Genes Upregulated in Sporozoite/Gametocyte Stages

  • Predict the gene expression values at sporozoite and gametocyte stages
  • Compare the gene expression values at blood stages with the values at sporozoite or gametocyte stage
  • Upregulated genes

Expression values increase at least 1.5-fold
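
A minimal sketch of the 1.5-fold filter, under two assumptions the slide does not state: expression values are on a linear (not log) scale, and the blood-stage baseline is the mean across blood-stage measurements. The function name and inputs are hypothetical.

```python
from statistics import mean


def upregulated(blood_values, predicted_stage, min_fold=1.5):
    """Return {gene: fold change} for genes whose predicted stage value is at
    least `min_fold` times their mean blood-stage value.

    blood_values:    dict gene -> list of blood-stage expression values
    predicted_stage: dict gene -> predicted sporozoite or gametocyte value
    """
    hits = {}
    for gene, values in blood_values.items():
        baseline = mean(values)
        if baseline > 0 and gene in predicted_stage:
            fold = predicted_stage[gene] / baseline
            if fold >= min_fold:
                hits[gene] = fold
    return hits
```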

SLIDE 11

Results Summary

  • Compare with the results from Winzeler’s study

We obtain a larger number of sporozoite- or gametocyte-stage-specific genes

Concordance rate: 78% and 69%

Novel genes

  • MAL13P1.304 (malaria surface antigen)
  • MAL13P1.148 (P. falciparum myosin)

SLIDE 12

Gene Ontology Analysis

Gene Ontology database

  • Describes the roles of genes and gene products in organisms
  • 40% of gene products in P. falciparum were assigned GO terms
  • Molecular function
  • Biological process
  • Cellular component

Investigate the enrichment of GO categories among the stage-specific genes
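
The slides do not say which statistical test was used for GO enrichment. A common choice is the one-sided hypergeometric test, sketched below purely as an assumption. The function name is hypothetical, and the gene counts would come from the annotated gene set.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, observed):
    """P(X >= observed) for X ~ Hypergeometric.

    population: number of genes with any GO annotation
    annotated:  number of those genes carrying the GO term of interest
    drawn:      number of stage-specific genes with any GO annotation
    observed:   number of stage-specific genes carrying the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return tail / total
```

A small p-value suggests the term is over-represented among the stage-specific genes; testing many GO terms would also call for a multiple-testing correction.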

SLIDE 13

Molecular Function

  • Cell Adhesion
  • Defense/Immunity Protein

SLIDE 14

Biological Process

  • Cell Communication
  • Metabolism

SLIDE 15

Cellular Component

Extracellular

SLIDE 16

Gametocyte Stage-Specific Genes

Note: The genes identified in Winzeler’s study do not show GO term enrichment that differs from that of the overall gene set

SLIDE 17

Relate Protein Interaction with Gene Expression Pattern

Purpose

  • An important component of functional annotation
  • Investigate the relationship between gene expression pattern and protein interactions
  • It is reasonable to believe that there should be a relationship between the gene expression pattern and protein interactions

SLIDE 18

Identify Potential Protein Interaction Pairs

“All-against-all” BLASTP comparisons of sequences of the S. cerevisiae and P. falciparum proteomes

Apply the program INPARANOID to identify ortholog groups (orthologs and paralogs)

Concept of “interolog”: if proteins A and B interact in one species, and A’ and B’ are their respective orthologs in another species, then (A’,B’) is a predicted interacting pair

Transfer the protein interaction information between species to predict the protein interaction pairs in P. falciparum
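
The interolog transfer described on this slide might look like the following sketch. The input shapes, the function name and the protein identifiers in the usage note are hypothetical. The choice to drop self-pairs and treat pairs as unordered is an assumption, not something the slide states.

```python
def predict_interologs(yeast_interactions, orthologs):
    """Map known yeast interactions onto P. falciparum through ortholog groups.

    yeast_interactions: iterable of (A, B) pairs of interacting yeast proteins
    orthologs:          dict yeast protein -> set of P. falciparum orthologs
    Returns a set of unordered predicted pairs (A', B'), stored as sorted tuples.
    """
    predicted = set()
    for a, b in yeast_interactions:
        for a2 in orthologs.get(a, ()):
            for b2 in orthologs.get(b, ()):
                if a2 != b2:  # assumption: ignore self-interactions
                    predicted.add(tuple(sorted((a2, b2))))
    return predicted
```

For example, with a yeast interaction ("YA", "YB") and orthologs {"YA": {"PF1"}, "YB": {"PF2", "PF3"}}, the predicted pairs are ("PF1", "PF2") and ("PF1", "PF3"). A yeast interaction where either partner has no ortholog contributes nothing.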

SLIDE 19

Protein Interaction Pairs in Sporozoites and Gametocytes


SLIDE 20

Conclusions

  • Identification of sporozoite stage- and gametocyte stage-specific genes

Well-known stage-specific genes

High degree of overlap between our identified genes and those of Winzeler’s study

Significant enrichment for certain GO categories

Related to the number of predicted protein interaction pairs

  • Combine information from different sources

A dataset with higher time scale resolution

Discover novel stage-specific genes

Depends on the data quality of the datasets used

SLIDE 21

References

  • Bozdech Z, et al. The Transcriptome of the Intraerythrocytic Developmental Cycle of Plasmodium falciparum. PLoS Biol. 1(1):E5, 2003.
  • Le Roch KG, et al. Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 301: 1503-8, 2003.
  • http://biosun01.biostat.jhsph.edu/Eririzarr/Raffy/
  • http://bioinf.wehi.edu.au/limma
  • Bowman A. and Azzalini A. Applied Smoothing Techniques for Data Analysis, Clarendon Press, Oxford, 1997.
  • Remm M, Storm CE, Sonnhammer EL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 314(5): 1041-52, 2001.
  • Gardner M, et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419(6906):498-511, 2002.

SLIDE 22

Acknowledgements

  • Dr. Hongyu Zhao’s Group
  • Dr. Ning Sun
  • Dr. Junfeng Liu
  • Liang Chen
  • Dr. Liangbiao Zheng’s Group
  • Dr. Michael McIntosh

This work was supported by NSF grant DMS-0241160 and NIH Institutional Training Grants for Informatics Research.

SLIDE 23

Thank You!