Cancer Classification Using Informative Gene Profiles - PowerPoint PPT Presentation



SLIDE 1

Cancer Classification Using Informative Gene Profiles

Xue-wen Chen
Bioinformatics and Computational Life Sciences Laboratory
The University of Kansas

Interface 2004, Baltimore

SLIDE 2

OUTLINE

  • Introduction
  • Microarray Data Analyses
  • Bootstrapped GA/Margin Methods
  • Experiment Results
  • Discussions

SLIDE 3

INTRODUCTION

  • Traditional biology: one (or a few) genes per experiment; hard to capture the "whole picture" of gene function
  • Microarray: monitors thousands of genes on a single chip simultaneously; provides a better understanding of the interactions among genes; helps explore the underlying genetic causes of many human diseases

SLIDE 4

MICROARRAY: CANCER CLASSIFICATION

  • Microarray has been successfully applied to cancer classification problems
  • According to Dudoit, Fridlyand, and Speed, there are three main problems in microarray-based cancer classification:
    – Cancer discovery (clustering)
    – Cancer classification into known classes (supervised learning)
    – Identification of gene "markers" (gene selection)

SLIDE 5

OUTLINE

  • Introduction
  • Microarray Data Analyses
  • Bootstrapped GA/Margin Methods
  • Experiment Results
  • Discussions

SLIDE 6

UNSUPERVISED METHODS: CLUSTERING

Partition genes (or samples) into homogeneous groups in order to explore the similarity among genes

  • Hierarchical Clustering (Eisen et al., Proc. Natl. Acad. Sci., 1998)
  • SOMs (Tamayo et al., Proc. Natl. Acad. Sci., 1999)
  • K-means (Tavazoie et al., Nature Genetics, 1999)
  • More
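As a concrete illustration of the partitioning idea, here is a minimal K-means sketch in plain Python. This is not the cited tools' implementation, and the expression values are invented; it only shows how profiles get grouped by similarity.

```python
import math
import random

def kmeans(rows, k, iters=50, seed=0):
    """Minimal K-means: partition expression profiles into k homogeneous groups."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    assign = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each profile joins its nearest center (Euclidean).
        for i, r in enumerate(rows):
            assign[i] = min(range(k), key=lambda c: math.dist(r, centers[c]))
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [rows[i] for i in range(len(rows)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Toy expression matrix: rows = genes, columns = samples (values invented).
genes = [[0.1, 0.2, 0.1], [0.0, 0.1, 0.2],   # low-expression pair
         [5.0, 5.2, 4.9], [5.1, 4.8, 5.0]]   # high-expression pair
labels = kmeans(genes, k=2)
print(labels)  # the two low genes share one label, the two high genes the other
```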

SLIDE 7

SUPERVISED LEARNING

  • Learning (Training) Task
    – Given: expression profiles of cells and their class labels
    – Learn: models distinguishing cells of one class from cells in other classes (genes are features)
  • Classification (Test) Task
    – Given: expression profile of a cell whose class is unknown
    – Test: predict the class to which this cell belongs

SLIDE 8

SUPERVISED LEARNING METHODS

  • Neural Networks (Mateos et al. 2002)
  • K-nearest Neighbors (Theilhaber et al. 2002)
  • Support Vector Machines (Brown et al. 2000)
  • Fisher Discriminant Analysis (Dudoit et al. 2002)
  • Decision Trees (Dubitzky et al. 2000)
  • And more

SLIDE 9

CHALLENGES IN LEARNING MICROARRAY DATA

  • High dimensionality: in microarray data analysis, the number of features (genes) is normally much larger than the number of training samples
  • Often noisy and not normally distributed (Hunter et al. 2001, Bioinformatics)
  • Too many features are not desirable in learning: poor generalization (overfitting) is expected
  • Essential to reduce the number of genes used

SLIDE 10

GENE SELECTION (MARKER IDENTIFICATION)

  • Feature selection is essential to reduce the test errors in microarray data classification
  • Given such a huge amount of data, we need to remove genes irrelevant to the learning problem
  • For diagnostics or identification of therapeutic targets, a small subset of discriminant genes is needed

SLIDE 11

GENE SELECTION

  • Golub et al. (1999): [mean(+) - mean(-)] / [std(+) + std(-)]
  • Xing et al. (2001): information gain to rank genes
  • Long et al. (2001): t-test with a Gaussian model
  • Furey et al. (2000): the Fisher score
  • Newton et al. (2001): a Gamma-Gamma-Bernoulli model
  • Kerr et al. (2000): ANOVA F-statistics
  • Dudoit et al. (2002): a nonparametric t-test
  • Bo and Jonassen (2002), Inza et al. (2002): forward selection
  • Khan et al. (2001): PCA
  • Li et al. (2001): GA/knn
  • more ...

Univariate vs. multivariate; filter vs. wrapper
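The first criterion on this list is simple enough to spell out per gene. A sketch of the Golub et al. (1999) signal-to-noise score follows; population standard deviation is used here (the original's std convention may differ), and the expression values are invented:

```python
import statistics

def signal_to_noise(pos, neg):
    """Golub et al. (1999) score: [mean(+) - mean(-)] / [std(+) + std(-)].
    pos/neg are one gene's expression values in the two classes."""
    return ((statistics.mean(pos) - statistics.mean(neg))
            / (statistics.pstdev(pos) + statistics.pstdev(neg)))

# A gene strongly up-regulated in the '+' class scores far from 0;
# an uninformative gene scores near 0.
informative = signal_to_noise([5.0, 5.5, 6.0], [1.0, 1.5, 2.0])
uninformative = signal_to_noise([3.0, 3.5, 4.0], [3.1, 3.4, 4.1])
print(informative, uninformative)
```

Genes are then ranked by the absolute value of this score, which is a univariate filter in the taxonomy above.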

SLIDE 12

IN THIS PAPER

  • A method for cancer classification and gene identification, simultaneously
  • Wrapper methods

SLIDE 13

OUTLINE

  • Introduction
  • Microarray Data Analyses
  • Bootstrapped GA/Margin Methods
  • Experiment Results
  • Discussions

SLIDE 14

Gene Selection: General Idea

Feature Selection = Criterion Function + Search Algorithm

Criterion function: should generalize (predict) well (wrapper); particularly important in microarray data classification, since very limited training samples are available.

Search algorithm: must be efficient for very high-dimensional data (e.g., # genes ~ 2000) in terms of both computation time and solution quality.

  • Margin: ability to generalize; used as the criterion function
  • GAs: better performance than SFS, much faster than exhaustive search; used as the search algorithm
  • Bootstrapping: because of limited training samples

SLIDE 15

MAXIMUM MARGIN

[Figure: two classes, y = -1 and y = +1, separated by a hyperplane]

Maximize the margin: the minimum distance between a hyperplane that separates the two classes and the training samples closest to the decision surface.

Motivation: obtains the tightest possible bounds for generalization; is capable of avoiding overfitting.
SLIDE 16

MARGIN

[Figure: separating hyperplane H with margins d+ and d-, bounding hyperplanes H1 and H2]

Define the hyperplane H such that:

  xi · w + b ≥ +1  when yi = +1
  xi · w + b ≤ -1  when yi = -1

SLIDE 17

MARGIN

In order to maximize the margin, we need to minimize ||w||, with the constraint that no data points lie between H1 and H2:

  yi (xi · w + b) - 1 ≥ 0

Equivalently (a dual problem), maximize

  L_D = Σi αi - (1/2) Σi,j αi αj yi yj (xi · xj)

with respect to the αi's, subject to αi ≥ 0 and Σi αi yi = 0 (i = 1, ..., m).

Margin:

  d = 2 / ||w||,  where  ||w||² = Σi Σj αi αj yi yj (xi · xj)
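These dual quantities can be checked by hand on a toy two-point problem. The points and the hand-solved multipliers αi = 0.5 below are illustrative choices, not from the slides:

```python
import math

# Toy problem: x1 = (0, 0) with y1 = -1, x2 = (2, 0) with y2 = +1.
# Solving the dual by hand gives alpha1 = alpha2 = 0.5.
xs = [(0.0, 0.0), (2.0, 0.0)]
ys = [-1, +1]
alphas = [0.5, 0.5]

# w = sum_i alpha_i y_i x_i  (from the KKT conditions)
w = [sum(a * y * x[k] for a, y, x in zip(alphas, ys, xs)) for k in range(2)]

# ||w||^2 = sum_ij alpha_i alpha_j y_i y_j (x_i . x_j), as on the slide
w_sq = sum(alphas[i] * alphas[j] * ys[i] * ys[j]
           * sum(xs[i][k] * xs[j][k] for k in range(2))
           for i in range(2) for j in range(2))

margin = 2 / math.sqrt(w_sq)
print(w, margin)  # w = [1.0, 0.0], margin = 2.0
```

The two points sit at distance 1 on either side of the hyperplane x · w + b = 0 with b = -1, so the gap between H1 and H2 is indeed 2 = 2/||w||.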

SLIDE 18

GENETIC SEARCH ALGORITHMS

Goal: identify the best subsets of genes, evaluated by margin

  • Random generation (candidate solutions)
  • Evaluation (fitness function)
  • Selection (candidate solutions with larger fitness values have a larger chance of being included)
  • Crossover + Mutation (change some selected candidate solutions to converge toward the optimal solution and to avoid local extremes)
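A minimal sketch of this loop for gene-subset search. The fitness function here is a made-up stand-in (the slides use the SVM margin), and all function and parameter names are hypothetical:

```python
import random

def ga_select(n_genes, fitness, pop=20, gens=40, subset_bias=0.1, seed=1):
    """Minimal genetic search over gene subsets (bit-mask chromosomes).
    'fitness' scores a subset; the slides plug the SVM margin in here."""
    rng = random.Random(seed)
    # Random generation: sparse random bit masks as candidate solutions.
    population = [[rng.random() < subset_bias for _ in range(n_genes)]
                  for _ in range(pop)]
    for _ in range(gens):
        # Selection: the fitter half survives and breeds.
        parents = sorted(population, key=fitness, reverse=True)[:pop // 2]
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)      # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_genes)           # point mutation
            child[i] = not child[i]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Toy fitness (invented): reward genes 0-4, penalize large subsets.
target = set(range(5))
fit = lambda mask: (sum(1 for i, on in enumerate(mask) if on and i in target)
                    - 0.2 * sum(mask))
best = ga_select(30, fit)
print([i for i, on in enumerate(best) if on])
```

Keeping the parents intact each generation makes the search elitist, so the best subset found so far is never lost.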

SLIDE 19

BOOTSTRAPPED GA/MARGIN

[Flowchart: generate a random population of M subsets of genes → bootstrap the data → fitness (margin) evaluation → selection → crossover → mutation → done? If no, loop; if yes, gene ranking, then classification on test data]

SLIDE 20

OUTLINE

  • Introduction
  • Microarray Data Analyses
  • Bootstrapped GA/Margin Methods
  • Experiment Results
  • Discussions

SLIDE 21

Dataset 1: Colon Cancer

Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., and Levine, A. (1999) Broad patterns of gene expression revealed by clustering of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Natl. Acad. Sci. USA, 96, 6745-6750.

Cancer | # samples                  | # genes | task
Colon  | 62 (22 normal + 40 cancer) | 2000    | cancer/normal

SLIDE 22

GENE SELECTION

  • 3000 bootstrapping datasets
  • Each dataset contains 18 normal + 36 cancer samples
  • Genes are ordered by their number of occurrences
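
The counting scheme above can be sketched as follows. The selector is a deliberately crude stand-in for the GA/margin search, and all data values are invented:

```python
import random
from collections import Counter

def bootstrap_occurrences(normal, cancer, select, runs=100, seed=2):
    """Tally how often each gene is chosen across bootstrap datasets.
    'select(samples)' returns a set of gene indices; the slides run the
    GA/margin search here."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        # Stratified bootstrap: resample each class with replacement
        # (18 normal + 36 cancer on the slide; toy-sized here).
        boot = ([rng.choice(normal) for _ in range(len(normal))]
                + [rng.choice(cancer) for _ in range(len(cancer))])
        counts.update(select(boot))
    return counts

# Toy stand-in selector: pick the 2 genes with the largest mean expression.
def select_top2(samples):
    n_genes = len(samples[0])
    means = [sum(s[g] for s in samples) / len(samples) for g in range(n_genes)]
    return set(sorted(range(n_genes), key=means.__getitem__, reverse=True)[:2])

normal = [[0.1, 2.0, 0.2], [0.3, 2.1, 0.1]]
cancer = [[0.2, 2.2, 3.0], [0.1, 1.9, 3.2]]
occ = bootstrap_occurrences(normal, cancer, select_top2)
print(occ.most_common())  # genes ranked by occurrence count, most frequent first
```

Genes that survive selection across many resampled datasets are the stable "markers"; the slides rank genes by exactly this occurrence count.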
SLIDE 23

GENE SELECTION

[Figure: number of occurrences vs. gene index]

The interferon-induced gene occurred > 1321 times, while the SPARC precursor occurred 0 times.

SLIDE 24

CANCER CLASSIFICATION

  • The top 50 genes are used for cancer classification
  • Classifier: linear SVMs
  • 300 bootstrapping tests (12 normal + 25 cancer)
  • Compared to GA/3-NN (Li et al. 2001) with top 50 genes

              | GA/Margin | GA/knn
Training data | 0         | 0
Test data     | 950       | 1622

SLIDE 25

LEUKEMIA DATASET

Golub, Slonim, Tamayo, Huard, Gaasenbeek, Mesirov, Coller, Loh, Downing, Caligiuri, Bloomfield, Lander, "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring," Science, Vol. 286, 1999.

Cancer   | # samples            | # genes | task
Leukemia | 72 (47 ALL + 25 AML) | 7129    | AML/ALL

Training and test sets were prepared under different expression conditions.

SLIDE 26

GENE SELECTION

  • 4500 bootstrapping datasets
  • Each dataset contains 17 AML + 35 ALL samples
  • Genes are ordered by their number of occurrences
SLIDE 27

GENE SELECTION

[Figure: number of occurrences vs. gene index]

SLIDE 28

CANCER CLASSIFICATION

  • The top 50 genes are used for cancer classification
  • Classifier: linear SVMs
  • 500 bootstrapping tests (35 ALL + 17 AML)
  • Compared to GA/3-NN (Li et al. 2001) with top 50 genes

              | GA/Margin | GA/knn
Training data | 0         | 0
Test data     | 259       | 722

SLIDE 29

COMPUTATIONAL CONSIDERATIONS

  • Individual ranking: about 1 second
  • Forward selection: about 10 seconds
  • GA/SVM selection: about 5 hours
  • Exhaustive search: about 5 months? (Selecting five features out of 86 took ~2 weeks at ~35M combinations; out of 2000 genes there are ~10^14 combinations.)
  • Data collection and preparation may take several months or years; it is reasonable for the data analysis to take a few hours
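The combination counts quoted above check out with a quick calculation:

```python
import math

# Choosing five features out of 86 -- the slide's ~35M combinations:
print(math.comb(86, 5))  # 34826302, i.e. about 35 million

# Choosing five genes out of 2000 -- the slide's ~10^14 combinations:
print(f"{math.comb(2000, 5):.2e}")  # roughly 2.65e+14
```

At two weeks for ~35M candidate subsets, ~10^14 subsets would take on the order of 10^6 times longer, which is why exhaustive search is ruled out.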

SLIDE 30

CONCLUSIONS

  • A multivariate wrapper method is proposed for both gene identification and cancer classification
  • Generalizes well
  • Needs to be tested on more datasets