SLIDE 1

Consistent Biclustering via Fractional 0–1 Programming

Panos Pardalos, Stanislav Busygin and Oleg Prokopyev

Center for Applied Optimization, Department of Industrial & Systems Engineering, University of Florida

SLIDE 2

Massive Datasets

The proliferation of massive datasets brings with it a series of special computational challenges. This data avalanche arises in a wide range of scientific and commercial applications. In particular, microarray technology makes it possible to measure thousands of gene expressions across the entire genome simultaneously. Extracting useful information from such datasets requires sophisticated data mining algorithms.

SLIDE 3

Massive Datasets

Abello, J.; Pardalos, P.M.; Resende, M.G. (Eds.), Handbook of Massive Data Sets, Series: Massive Computing, Vol. 4, Kluwer, 2002.

SLIDE 4

Data Representation

A dataset (e.g., from microarray experiments) is normally given as a rectangular m × n matrix A, where each column represents a data sample (e.g., a patient) and each row represents a feature (e.g., a gene): A = (aij)m×n, where the value aij is the expression of the i-th feature in the j-th sample.
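As a concrete illustration, here is a minimal NumPy sketch of this representation (the toy values are an assumption, purely for illustration):

```python
import numpy as np

# Toy data matrix A = (a_ij): m = 3 features (rows), n = 4 samples (columns).
# a_ij is the expression of feature i in sample j.
A = np.array([
    [5.1, 4.9, 0.3, 0.2],   # feature 0 (e.g., a gene)
    [0.4, 0.6, 6.2, 5.8],   # feature 1
    [1.0, 1.1, 0.9, 1.2],   # feature 2
])
m, n = A.shape              # m = 3, n = 4
```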

SLIDE 5

Major Data Mining Problems

Clustering (Unsupervised): Given a set of samples, partition them into groups of similar samples according to some similarity criterion.

Classification (Supervised Clustering): Determine the classes of the test samples using a known classification of the training data set.

Feature Selection: For each of the classes, select a subset of features responsible for creating the condition corresponding to the class (this is also a specific type of dimensionality reduction).

Outlier Detection: Some of the samples are not good representatives of any of the classes, so it is better to disregard them while performing data mining.

SLIDE 6

Major challenges in Data Mining

The noisiness typical of data in many data mining applications complicates the solution of data mining problems. The high dimensionality of the data makes complete search computationally infeasible for most data mining problems. Some data values may be inaccurate or missing. The available data may not be sufficient to obtain statistically significant conclusions.

SLIDE 7

Biclustering

Biclustering is a methodology allowing for simultaneous clustering (supervised or unsupervised) of the feature set and the sample set. It finds clusters of samples possessing similar characteristics together with the features creating these similarities. The required consistency of sample and feature classification gives biclustering an advantage over other methodologies, which treat the samples and features of a dataset separately from each other.

SLIDE 8

Biclustering

Figure: Partitioning of samples and features into 2 classes.

SLIDE 9

Survey on Biclustering Methodologies

“Direct Clustering” (Hartigan): The algorithm begins with the entire data set as a single block and then iteratively finds the row and column split of every block into two pieces. The splits are made so that the total variance within the blocks is minimized. The whole partitioning procedure can be represented hierarchically by trees. Drawback: this method does NOT optimize a global objective function.

SLIDE 10

Survey on Biclustering Methodologies

Cheng & Church’s algorithm: The algorithm constructs one bicluster at a time using a statistical criterion – a low mean squared residue (the variance of the set of all elements in the bicluster, plus the mean row variance and the mean column variance). Once a bicluster is created, its entries are replaced by random numbers, and the procedure is repeated iteratively.

SLIDE 11

Survey on Biclustering Methodologies

Graph Bipartitioning: Define a bipartite graph G(F, S, E), where F is the set of data set features, S is the set of data set samples, and E is the set of weighted edges, with weight Eij = aij for the edge connecting i ∈ F with j ∈ S. Biclustering then corresponds to partitioning the graph into bicliques.

SLIDE 12

Survey on Biclustering Methodologies

Given vertex subsets V1 and V2, define

$$\mathrm{cut}(V_1, V_2) = \sum_{i \in V_1} \sum_{j \in V_2} a_{ij},$$

and for k vertex subsets V1, V2, . . . , Vk,

$$\mathrm{cut}(V_1, V_2, \dots, V_k) = \sum_{i < j} \mathrm{cut}(V_i, V_j).$$

SLIDE 13

Survey on Biclustering Methodologies

Biclustering may be performed as

$$\min_{V_1, V_2, \dots, V_k} \mathrm{cut}(V_1, V_2, \dots, V_k)$$

on G, or with some modification of the definition of cut to favor balanced clusters. This problem is NP-hard, but spectral heuristics show good performance [Dhillon].

SLIDE 14

Biclustering: Applications

Biological and Medical:

• Microarray data analysis
• Analysis of drug activity, Liu and Wang (2003)
• Analysis of nutritional data, Lazzeroni et al. (2000)

SLIDE 15

Biclustering: Applications

• Text Mining: Dhillon (2001, 2003)
• Marketing: Gaul and Schader (1996)
• Dimensionality Reduction in Databases: Agrawal et al. (1998)
• Others: electoral data - Hartigan (1972); currency exchange - Lazzeroni et al. (2000)

SLIDE 16

Biclustering: Surveys

• S. Madeira, A.L. Oliveira, Biclustering Algorithms for Biological Data Analysis: A Survey, 2004.
• A. Tanay, R. Sharan, R. Shamir, Biclustering Algorithms: A Survey, 2004.
• D. Jiang, C. Tang, A. Zhang, Cluster Analysis for Gene Expression Data: A Survey, 2004.

SLIDE 17

Definitions

A data set of n samples and m features is a matrix A = (aij)m×n, where the value aij is the expression of the i-th feature in the j-th sample. We consider a classification of the samples into classes S1, S2, . . . , Sr:

$$S_k \subseteq \{1 \dots n\},\ k = 1 \dots r, \qquad S_1 \cup S_2 \cup \dots \cup S_r = \{1 \dots n\}, \qquad S_k \cap S_\ell = \emptyset,\ k \neq \ell.$$

SLIDE 18

Definitions

This classification should be done so that samples from the same class share certain common properties. Correspondingly, a feature i may be assigned to one of the feature classes F1, F2, . . . , Fr:

$$F_k \subseteq \{1 \dots m\},\ k = 1 \dots r, \qquad F_1 \cup F_2 \cup \dots \cup F_r = \{1 \dots m\}, \qquad F_k \cap F_\ell = \emptyset,\ k \neq \ell,$$

in such a way that features of the class Fk are “responsible” for creating the class of samples Sk.

SLIDE 19

Definitions

This may mean for microarray data, for example, strong up-regulation of certain genes under a cancer condition of a particular type (whose samples constitute one class of the data set). Such a simultaneous classification of samples and features is called biclustering (or co-clustering).

SLIDE 20

Definitions

Definition A biclustering of a data set is a collection of pairs of sample and feature subsets B = ((S1, F1), (S2, F2), . . . , (Sr, Fr)) such that the collection (S1, S2, . . . , Sr) forms a partition of the set of samples, and the collection (F1, F2, . . . , Fr) forms a partition of the set of features.

SLIDE 21

Our Approach: Intuition

Let us distribute the features among the classes of the training set so that each feature belongs to the class where its average expression among the training samples is highest. Now, if we transpose the matrix, take the feature classification as given, and re-classify the training samples according to the highest average expression values in the feature classes, will we obtain the same training set classification? If yes, we say that we have obtained a consistent biclustering.

SLIDE 22

Consistent Biclustering

Let each sample be already assigned somehow to one of the classes S1, S2, . . . , Sr. Introduce a 0–1 matrix S = (sjk)n×r such that sjk = 1 if j ∈ Sk, and sjk = 0 otherwise. The sample class centroids can be computed as the matrix C = (cik)m×r:

$$C = A S (S^T S)^{-1},$$

whose k-th column represents the centroid of the class Sk.
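For concreteness, here is a minimal NumPy sketch of this centroid computation (the toy data and class labels are assumptions for illustration):

```python
import numpy as np

A = np.random.rand(4, 6)                 # toy data: m = 4 features, n = 6 samples
labels = np.array([0, 0, 0, 1, 1, 1])    # assumed sample classes, r = 2

n, r = A.shape[1], 2
S = np.zeros((n, r))
S[np.arange(n), labels] = 1              # 0-1 membership matrix S = (s_jk)

# C = A S (S^T S)^{-1}; since S^T S is diagonal with the class sizes,
# column k of C is just the mean of the sample columns in class k.
C = A @ S @ np.linalg.inv(S.T @ S)
```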

SLIDE 23

Consistent Biclustering

Consider a row i of the matrix C. Each value in it gives us the average expression of the i-th feature in one of the sample classes. As we want to identify the checkerboard pattern in the data, we have to assign the feature to the class where it is most expressed. So, let us classify the i-th feature to the class $\hat{k}$ with the maximal value $c_{i\hat{k}}$:

$$i \in F_{\hat{k}} \;\Rightarrow\; \forall k = 1 \dots r,\ k \neq \hat{k}:\ c_{i\hat{k}} > c_{ik}.$$

SLIDE 24

Consistent Biclustering

Using the classification of all features into classes F1, F2, . . . , Fr, let us construct a classification of samples using the same principle of maximal average expression. We construct a 0–1 matrix F = (fik)m×r such that fik = 1 if i ∈ Fk and fik = 0 otherwise. Then, the feature class centroids can be computed as the matrix D = (djk)n×r:

$$D = A^T F (F^T F)^{-1},$$

whose k-th column represents the centroid of the class Fk.

SLIDE 25

Consistent Biclustering

The condition on sample classification we need to verify is

$$j \in S_{\hat{k}} \;\Rightarrow\; \forall k = 1 \dots r,\ k \neq \hat{k}:\ d_{j\hat{k}} > d_{jk}.$$

SLIDE 26

Consistent Biclustering

Definition: A biclustering B will be called consistent if the following relations hold:

$$i \in F_{\hat{k}} \;\Rightarrow\; \forall k = 1 \dots r,\ k \neq \hat{k}:\ c_{i\hat{k}} > c_{ik},$$

$$j \in S_{\hat{k}} \;\Rightarrow\; \forall k = 1 \dots r,\ k \neq \hat{k}:\ d_{j\hat{k}} > d_{jk}.$$
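The two conditions are straightforward to test numerically. Below is a minimal sketch of such a check (strict inequalities assumed; ties between classes are not treated specially):

```python
import numpy as np

def is_consistent(A, S, F):
    """Check both consistency relations for 0-1 membership matrices S (n x r)
    and F (m x r): each feature must be most expressed in its own class's
    sample centroid, and symmetrically for samples."""
    C = A @ S @ np.linalg.inv(S.T @ S)     # sample-class centroids (m x r)
    D = A.T @ F @ np.linalg.inv(F.T @ F)   # feature-class centroids (n x r)
    features_ok = np.all(C.argmax(axis=1) == F.argmax(axis=1))
    samples_ok = np.all(D.argmax(axis=1) == S.argmax(axis=1))
    return bool(features_ok and samples_ok)
```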

SLIDE 27

Consistent Biclustering

Definition: A data set is biclustering-admitting if some consistent biclustering exists for it.

Definition: A data set will be called conditionally biclustering-admitting with respect to a given (partial) classification of some samples and/or features if there exists a consistent biclustering preserving the given (partial) classification.

SLIDE 28

Consistent Biclustering

A consistent biclustering implies separability of the classes by convex cones.

Theorem: Let B be a consistent biclustering. Then there exist convex cones P1, P2, . . . , Pr ⊆ R^m such that all samples from Sk belong to the cone Pk and no other sample belongs to it, k = 1 . . . r. Similarly, there exist convex cones Q1, Q2, . . . , Qr ⊆ R^n such that all features from Fk belong to the cone Qk and no other feature belongs to it, k = 1 . . . r.

SLIDE 29

Conic Separability

Proof. Let Pk be the conic hull of the samples of Sk. Suppose $\hat{j} \in S_\ell$, $\ell \neq k$, belongs to Pk. Then

$$a_{\cdot \hat{j}} = \sum_{j \in S_k} \gamma_j a_{\cdot j}, \qquad \gamma_j \ge 0.$$

Biclustering consistency implies that $d_{\hat{j}\ell} > d_{\hat{j}k}$, that is,

$$\frac{\sum_{i \in F_\ell} a_{i\hat{j}}}{|F_\ell|} > \frac{\sum_{i \in F_k} a_{i\hat{j}}}{|F_k|}.$$

SLIDE 30

Conic Separability

Proof (cont’d). Plugging in the conic representation of $a_{i\hat{j}}$, we obtain

$$\sum_{j \in S_k} \gamma_j d_{j\ell} > \sum_{j \in S_k} \gamma_j d_{jk},$$

which contradicts $d_{j\ell} < d_{jk}$ (also implied by biclustering consistency). Similarly, we can show that the formulated conic separability holds for the feature classes.

SLIDE 31

Biclustering

• Supervised Biclustering
• Unsupervised Biclustering

SLIDE 32

Supervised Biclustering

One of the most important problems in real-life data mining applications is the supervised classification of test samples on the basis of information provided by training data. A supervised classification method consists of two routines: the first derives classification criteria while processing the training samples, and the second applies these criteria to the test samples.

SLIDE 33

Supervised Biclustering

In genomic and proteomic data analysis, as well as in other data mining applications where only a small subset of features is expected to be relevant to the classification of interest, the classification criteria should involve dimensionality reduction and feature selection. We handle such a task utilizing the notion of consistent biclustering. Namely, we select a subset of features of the original data set in such a way that the obtained subset of data becomes conditionally biclustering-admitting with respect to the given classification of training samples.

SLIDE 34

Fractional 0–1 Programming Formulation

Formally, let us introduce a vector of 0–1 variables x = (xi)i=1...m and consider the i-th feature selected if xi = 1. The condition of biclustering consistency, when only the selected features are used, becomes

$$\frac{\sum_{i=1}^m a_{ij} f_{i\hat{k}} x_i}{\sum_{i=1}^m f_{i\hat{k}} x_i} \;>\; \frac{\sum_{i=1}^m a_{ij} f_{ik} x_i}{\sum_{i=1}^m f_{ik} x_i}, \qquad \forall j \in S_{\hat{k}},\ \hat{k}, k = 1 \dots r,\ \hat{k} \neq k.$$

SLIDE 35

Fractional 0–1 Programming Formulation

We will use the fractional relations as constraints of an optimization problem selecting the feature set. It may incorporate various objective functions over x, depending on the desirable properties of the selected features, but one general choice is to select the maximal possible number of features in order to lose the minimal amount of information provided by the training set. In this case, the objective function is

$$\max \sum_{i=1}^m x_i.$$

SLIDE 36

Fractional 0–1 Programming Formulation

One possible fractional 0–1 formulation based on the biclustering criterion:

$$\max_{x \in \mathbb{B}^m} \sum_{i=1}^m x_i$$

subject to

$$\frac{\sum_{i=1}^m a_{ij} f_{i\hat{k}} x_i}{\sum_{i=1}^m f_{i\hat{k}} x_i} \;\ge\; (1+t)\, \frac{\sum_{i=1}^m a_{ij} f_{ik} x_i}{\sum_{i=1}^m f_{ik} x_i}, \qquad \forall j \in S_{\hat{k}},\ \hat{k}, k = 1 \dots r,\ \hat{k} \neq k,$$

where t is a class separation parameter.
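On toy instances this program is small enough to enumerate directly. The following hedged sketch (the function name and the encoding of S and F as label arrays are assumptions) solves it by brute force; it is only meant to make the constraints concrete, not to scale:

```python
import itertools
import numpy as np

def solve_bruteforce(A, sample_class, feature_class, t=0.1):
    """Enumerate x in {0,1}^m and keep the feasible x with the most features.
    Only viable for very small m; real instances need the linearization
    discussed next."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    fc = np.asarray(feature_class)
    r = fc.max() + 1
    best = None
    for bits in itertools.product((0, 1), repeat=m):
        x = np.asarray(bits)
        counts = [(x * (fc == k)).sum() for k in range(r)]
        if min(counts) == 0:                    # every class needs a selected feature
            continue
        avg = lambda j, k: (A[:, j] * x * (fc == k)).sum() / counts[k]
        feasible = all(
            avg(j, sample_class[j]) >= (1 + t) * avg(j, k)
            for j in range(n) for k in range(r) if k != sample_class[j]
        )
        if feasible and (best is None or x.sum() > best.sum()):
            best = x
    return best
```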

SLIDE 37

Fractional 0–1 Programming Formulation

Generally, in the framework of fractional 0–1 programming we consider problems where we optimize a multiple-ratio fractional 0–1 function subject to a set of linear constraints. Here we have a new class of fractional 0–1 programming problems, where the fractional terms are not in the objective function but in the constraints, i.e., we optimize a linear objective function subject to fractional constraints. How can such a fractionally constrained 0–1 programming problem be solved?

SLIDE 38

Linear Mixed 0–1 Formulation

We can reduce our problem to a linear mixed 0–1 programming problem by applying an approach similar to the one used to linearize problems with a fractional 0–1 objective function:

T.-H. Wu, A note on a global approach for general 0–1 fractional programming, European J. Oper. Res. 101 (1997) 220–223.

SLIDE 39

Linear Mixed 0–1 Formulation

Theorem: A polynomial mixed 0–1 term z = xy, where x is a 0–1 variable and y is a continuous variable, can be represented by the following linear inequalities:

(1) z ≤ Ux;  (2) z ≤ y + L(x − 1);  (3) z ≥ y + U(x − 1);  (4) z ≥ Lx,

where U and L are upper and lower bounds on the variable y, i.e. L ≤ y ≤ U.
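As a quick sanity check of the theorem, the following sketch (the toy bounds L = −2, U = 3 are an arbitrary assumption) verifies numerically that z = xy always satisfies (1)–(4), while nearby values do not:

```python
# Numeric sanity check of the linearization with assumed toy bounds.
L, U = -2.0, 3.0

def feasible(z, x, y):
    """Inequalities (1)-(4) from the theorem."""
    return (z <= U * x) and (z <= y + L * (x - 1)) \
        and (z >= y + U * (x - 1)) and (z >= L * x)

for x in (0, 1):
    for y in (-2.0, -0.5, 0.0, 1.5, 3.0):
        assert feasible(x * y, x, y)            # the intended value is feasible
        for bad in (x * y + 0.1, x * y - 0.1):
            assert not feasible(bad, x, y)      # nothing else is
```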

SLIDE 40

Linear Mixed 0–1 Formulation

To linearize the fractional 0–1 program we need to introduce new variables yk:

$$y_k = \frac{1}{\sum_{\ell=1}^m f_{\ell k} x_\ell}, \qquad k = 1, \dots, r.$$

SLIDE 41

Linear Mixed 0–1 Formulation

In terms of the new variables, the fractional constraints are replaced by

$$\sum_{i=1}^m a_{ij} f_{i\hat{k}} x_i y_{\hat{k}} \;\ge\; (1+t) \sum_{i=1}^m a_{ij} f_{ik} x_i y_k.$$

SLIDE 42

Linear Mixed 0–1 Formulation

Next, observe that the term xi yk is present if and only if fik = 1, i.e., i ∈ Fk. So, there are in total only m such products, and hence we can introduce m variables zi = xi yk, i ∈ Fk:

$$z_i = \frac{x_i}{\sum_{\ell=1}^m f_{\ell k} x_\ell}, \qquad i \in F_k.$$

SLIDE 43

Linear Mixed 0–1 Formulation

In terms of zi we have the following constraints:

$$\sum_{i=1}^m f_{ik} z_i = 1, \qquad k = 1 \dots r,$$

$$\sum_{i=1}^m a_{ij} f_{i\hat{k}} z_i \;\ge\; (1+t) \sum_{i=1}^m a_{ij} f_{ik} z_i \qquad \forall j \in S_{\hat{k}},\ \hat{k}, k = 1 \dots r,\ \hat{k} \neq k,$$

$$y_k - z_i \le 1 - x_i, \quad z_i \le y_k, \quad z_i \le x_i, \quad z_i \ge 0, \qquad i \in F_k.$$
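Putting the pieces together, here is a hedged sketch of the resulting linear mixed 0–1 model using the PuLP modeling library (PuLP availability and the label-array encoding of S and F are assumptions; this mirrors the constraints above and is not the authors' code):

```python
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

def build_milp(A, sample_class, feature_class, t=0.1):
    """Linear mixed 0-1 model: maximize the number of selected features
    subject to the linearized consistency constraints."""
    m, n = len(A), len(A[0])
    r = max(feature_class) + 1
    prob = LpProblem("feature_selection", LpMaximize)
    x = [LpVariable(f"x_{i}", cat=LpBinary) for i in range(m)]
    y = [LpVariable(f"y_{k}", lowBound=0, upBound=1) for k in range(r)]
    z = [LpVariable(f"z_{i}", lowBound=0) for i in range(m)]
    prob += lpSum(x)                                  # max sum_i x_i
    for k in range(r):                                # sum_{i in F_k} z_i = 1
        prob += lpSum(z[i] for i in range(m) if feature_class[i] == k) == 1
    for i in range(m):                                # linearize z_i = x_i * y_k
        k = feature_class[i]
        prob += z[i] <= x[i]
        prob += z[i] <= y[k]
        prob += y[k] - z[i] <= 1 - x[i]
    for j in range(n):                                # separation constraints
        kh = sample_class[j]
        for k in range(r):
            if k != kh:
                lhs = lpSum(A[i][j] * z[i] for i in range(m) if feature_class[i] == kh)
                rhs = lpSum(A[i][j] * z[i] for i in range(m) if feature_class[i] == k)
                prob += lhs >= (1 + t) * rhs
    return prob, x
```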

SLIDE 44

Supervised Biclustering

Unfortunately, while the linearization works nicely for small problems, on larger problems it often creates instances where the gap between the integer programming optimum and the linear programming relaxation optimum is very large. As a consequence, such instances cannot be solved in reasonable time even with the best techniques implemented in modern integer programming solvers.

• HuGE Index data set: about 7000 features
• ALL vs. AML data set: about 7000 features
• GBM vs. AO data set: about 12000 features

SLIDE 45

Heuristic

If we know that no more than mk features can be selected for class Fk, then we can impose

$$x_i \le m_k z_i, \qquad x_i \ge z_i, \qquad i \in F_k.$$

SLIDE 46

Heuristic

Algorithm 1

1. Assign mk := |Fk|, k = 1 . . . r.
2. Solve the mixed 0–1 programming formulation using the inequalities x_i ≤ m_k z_i, x_i ≥ z_i, i ∈ F_k, instead of y_k − z_i ≤ 1 − x_i, z_i ≤ y_k, z_i ≤ x_i, z_i ≥ 0, i ∈ F_k.
3. If $m_k = \sum_{i=1}^m f_{ik} x_i$ for all k = 1 . . . r, go to 6.
4. Assign $m_k := \sum_{i=1}^m f_{ik} x_i$ for all k = 1 . . . r.
5. Go to 2.
6. STOP.
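A compact Python sketch of this loop (the solve_relaxed helper is hypothetical; it stands for solving the relaxed model of step 2 and returning the 0–1 vector x):

```python
import numpy as np

def algorithm_1(A, sample_class, feature_class, t, solve_relaxed):
    """Iterate the relaxed model until the class-wise feature counts m_k
    reach a fixed point (steps 1-6 above)."""
    fc = np.asarray(feature_class)
    r = fc.max() + 1
    mks = [int((fc == k).sum()) for k in range(r)]        # step 1: m_k = |F_k|
    while True:
        x = solve_relaxed(A, sample_class, fc, mks, t)    # step 2 (hypothetical helper)
        new_mks = [int(x[fc == k].sum()) for k in range(r)]
        if new_mks == mks:                                # step 3: counts unchanged
            return x                                      # step 6: STOP
        mks = new_mks                                     # steps 4-5: update, repeat
```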

SLIDE 47

Supervised Biclustering

After the feature selection is done, we perform classification of test samples according to the following procedure. If b = (bi)i=1...m is a test sample, we assign it to the class $\hat{k}$ satisfying

$$\frac{\sum_{i=1}^m b_i f_{i\hat{k}} x_i}{\sum_{i=1}^m f_{i\hat{k}} x_i} \;>\; \frac{\sum_{i=1}^m b_i f_{ik} x_i}{\sum_{i=1}^m f_{ik} x_i}, \qquad k = 1 \dots r,\ k \neq \hat{k}.$$
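In code, the rule reduces to comparing per-class averages over the selected features; a minimal sketch (label-array encoding assumed as before, and each class is assumed to retain at least one selected feature):

```python
import numpy as np

def classify(b, x, feature_class, r):
    """Assign test sample b to the class whose selected features have the
    highest average expression in b."""
    b, x = np.asarray(b, dtype=float), np.asarray(x)
    fc = np.asarray(feature_class)
    scores = [b[(fc == k) & (x == 1)].mean() for k in range(r)]
    return int(np.argmax(scores))
```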

SLIDE 48

HuGE index data set: Feature Selection

A computational experiment that we conducted was on feature selection for consistent biclustering of the Human Gene Expression (HuGE) Index data set. The purpose of the HuGE project is to provide a comprehensive database of gene expressions in normal tissues of different parts of the human body and to highlight similarities and differences among the organ systems.

The number of selected features (genes) is 6889 (out of 7070).

SLIDE 49

HuGE index data set: Feature Selection

Figure: HuGE Index heatmap.

SLIDE 50

ALL vs. AML data set

T. Golub et al. (1999) considered a dataset containing 47 samples from ALL patients and 25 samples from AML patients. The dataset was obtained with Affymetrix GeneChips.

Our biclustering algorithm selected 3439 features for class ALL and 3242 features for class AML. The subsequent classification contained only one error: AML sample 66 was classified into the ALL class. The SVM approach delivers up to 5 classification errors depending on how the parameters of the method are tuned; a perfect classification was obtained only with one specific set of parameter values.

SLIDE 51

ALL vs. AML data set

Figure: ALL vs. AML heatmap.

SLIDE 52

GBM vs. AO data set

The algorithm selected 3875 features for the class GBM and 2398 features for the class AO. The obtained classification contained only 4 errors: two GBM samples (Brain NG 1 and Brain NG 2) were classified into the AO class, and two AO samples (Brain NO 14 and Brain NO 8) were classified into the GBM class.

SLIDE 53

GBM vs. AO data set

Figure: GBM vs. AO heatmap.

SLIDE 54

References

• S. Busygin, P. Pardalos, O. Prokopyev, Feature selection for consistent biclustering via fractional 0–1 programming, Journal of Combinatorial Optimization, Vol. 10/1 (2005), pp. 7–21.

• P.M. Pardalos, S. Busygin, O.A. Prokopyev, “On Biclustering with Feature Selection for Microarray Data Sets,” BIOMAT 2005 – International Symposium on Mathematical and Computational Biology, R. Mondaini (ed.), World Scientific (2006), pp. 367–378.

SLIDE 55

Unsupervised Biclustering

Suppose we want to assign each sample to one of the classes S1, S2, . . . , Sr. We introduce a 0–1 matrix S = (sjk)n×r such that sjk = 1 if j ∈ Sk, and sjk = 0 otherwise. We also want to classify all features into classes F1, F2, . . . , Fr. Let us introduce a 0–1 matrix F = (fik)m×r such that fik = 1 if i ∈ Fk and fik = 0 otherwise.

SLIDE 56

Unsupervised Biclustering

We have the following constraints on biclustering consistency:

$$s_{j\hat{k}} \left( \frac{\sum_{i=1}^m a_{ij} f_{i\hat{k}}}{\sum_{i=1}^m f_{i\hat{k}}} \;-\; (1+t)\, \frac{\sum_{i=1}^m a_{ij} f_{ik}}{\sum_{i=1}^m f_{ik}} \right) \ge 0 \qquad \forall j,\ \hat{k}, k = 1 \dots r,\ \hat{k} \neq k,$$

$$f_{i\hat{k}} \left( \frac{\sum_{j=1}^n a_{ij} s_{j\hat{k}}}{\sum_{j=1}^n s_{j\hat{k}}} \;-\; (1+t)\, \frac{\sum_{j=1}^n a_{ij} s_{jk}}{\sum_{j=1}^n s_{jk}} \right) \ge 0 \qquad \forall i,\ \hat{k}, k = 1 \dots r,\ \hat{k} \neq k.$$

SLIDE 57

Unsupervised Biclustering

These constraints are equivalent to

$$\frac{\sum_{i=1}^m a_{ij} f_{i\hat{k}}}{\sum_{i=1}^m f_{i\hat{k}}} \;-\; (1+t)\, \frac{\sum_{i=1}^m a_{ij} f_{ik}}{\sum_{i=1}^m f_{ik}} \;\ge\; -L^s_j (1 - s_{j\hat{k}}),$$

$$\frac{\sum_{j=1}^n a_{ij} s_{j\hat{k}}}{\sum_{j=1}^n s_{j\hat{k}}} \;-\; (1+t)\, \frac{\sum_{j=1}^n a_{ij} s_{jk}}{\sum_{j=1}^n s_{jk}} \;\ge\; -L^f_i (1 - f_{i\hat{k}}).$$

SLIDE 58

Unsupervised Biclustering

$L^f_i$ and $L^s_j$ are large enough constants, which can be chosen as

$$L^s_j = \max_i a_{ij} - \min_i a_{ij}, \qquad L^f_i = \max_j a_{ij} - \min_j a_{ij}.$$
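Computing these constants from the data is a one-liner per bound; a minimal NumPy sketch (the toy matrix A is an assumption):

```python
import numpy as np

A = np.random.rand(5, 4)             # toy data (m = 5 features, n = 4 samples)
Ls = A.max(axis=0) - A.min(axis=0)   # L^s_j = max_i a_ij - min_i a_ij, per sample j
Lf = A.max(axis=1) - A.min(axis=1)   # L^f_i = max_j a_ij - min_j a_ij, per feature i
```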

SLIDE 59

Linear Mixed 0–1 Reformulation

Let us introduce new variables

$$u_k = \frac{1}{\sum_{i=1}^m f_{ik}}, \qquad v_k = \frac{1}{\sum_{j=1}^n s_{jk}}, \qquad k = 1 \dots r,$$

$$z_{ik} = \frac{f_{ik}}{\sum_{\ell=1}^m f_{\ell k}}, \quad i = 1 \dots m, \qquad y_{jk} = \frac{s_{jk}}{\sum_{\ell=1}^n s_{\ell k}}, \quad j = 1 \dots n, \qquad k = 1 \dots r.$$

SLIDE 60

Linear Mixed 0–1 Reformulation

$$\sum_{i=1}^m a_{ij} z_{i\hat{k}} - (1+t) \sum_{i=1}^m a_{ij} z_{ik} \;\ge\; -L^s_j (1 - s_{j\hat{k}}) \qquad \forall j,\ \hat{k}, k = 1 \dots r,\ \hat{k} \neq k,$$

$$\sum_{j=1}^n a_{ij} y_{j\hat{k}} - (1+t) \sum_{j=1}^n a_{ij} y_{jk} \;\ge\; -L^f_i (1 - f_{i\hat{k}}) \qquad \forall i,\ \hat{k}, k = 1 \dots r,\ \hat{k} \neq k,$$

$$\sum_{i=1}^m z_{ik} = 1, \qquad \sum_{j=1}^n y_{jk} = 1, \qquad k = 1 \dots r,$$

$$u_k - z_{ik} \le 1 - f_{ik}, \quad z_{ik} \le u_k, \quad z_{ik} \le f_{ik}, \quad z_{ik} \ge 0, \qquad \forall i,\ k = 1 \dots r,$$

$$v_k - y_{jk} \le 1 - s_{jk}, \quad y_{jk} \le v_k, \quad y_{jk} \le s_{jk}, \quad y_{jk} \ge 0, \qquad \forall j,\ k = 1 \dots r.$$

SLIDE 61

Linear Mixed 0–1 Reformulation

The linearization introduces 2r new continuous variables (the uk and vk) and (m + n)r new variables (the zik and yjk), for a total of 2r + (m + n)r new variables.

SLIDE 62

Additional Constraints

Each feature can be assigned to at most one class:

$$\sum_{k=1}^r f_{ik} \le 1 \qquad \forall i.$$

Each sample can be assigned to at most one class:

$$\sum_{k=1}^r s_{jk} \le 1 \qquad \forall j.$$

(Features and samples left unassigned are exactly those dropped by feature selection and outlier detection.)

SLIDE 63

Additional Constraints

Each class must contain at least one feature:

$$\sum_{i=1}^m f_{ik} \ge 1 \qquad \forall k.$$

Each class must contain at least one sample:

$$\sum_{j=1}^n s_{jk} \ge 1 \qquad \forall k.$$

SLIDE 64

Objective Function

We formulate the biclustering problem with feature selection and outlier detection as an optimization task and use the objective function to minimize the information loss. In other words, the goal is to select as many features and samples as possible while satisfying the constraints on biclustering consistency. The objective function may be expressed as

$$\max\; m \cdot \sum_{k=1}^r \sum_{j=1}^n s_{jk} \;+\; n \cdot \sum_{k=1}^r \sum_{i=1}^m f_{ik}.$$

SLIDE 65

Random Data Simulation Results

We studied the existence of large biclustering patterns in random data sets (n = 30 and m = 30). One would expect such patterns to be extremely rare, since the consistent biclustering criterion is rather strong.

SLIDE 66

Random Data Simulation Results

Surprisingly, the numerical experiments showed that for a small number of classes (r ≤ 3) the checkerboard pattern can be obtained on the basis of almost the entire data set (in the case of r = 2), or at least on the basis of half of the data set (r = 3).

SLIDE 67

Random Data Simulation Results

This result calls into question the general value of unsupervised biclustering techniques with a small number of classes. Unless some specific, strongly expressed pattern exists in the data, unsupervised biclustering with a small number of classes can find any partitioning of the data set, with no relevance to the phenomenon of interest.

SLIDE 68

Challenges

This formulation is currently computationally intractable for data sets with more than a few hundred samples/features. New methods for solving fractionally constrained 0–1 optimization problems are needed!

SLIDE 69

Alternative Computational Approach

Similarly to such clustering algorithms as k-means and SOM, we can try to achieve consistent biclustering by an iterative process (see the sketch after this list):

1. Start from a random partition of the samples into k groups.
2. Put each feature into the class where its average expression value is largest with respect to the partition of samples.
3. Put each sample into the class where its average expression value is largest with respect to the partition of features.
4. If at least one sample or feature was moved, go to 2.
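A minimal NumPy sketch of this procedure (it assumes every class stays nonempty during the iterations, which real data may violate):

```python
import numpy as np

def iterative_biclustering(A, r, max_iter=100, seed=0):
    """k-means-like alternation between feature and sample reassignment.
    Convergence is not guaranteed (see the next slide)."""
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    samp = rng.integers(r, size=A.shape[1])          # step 1: random partition
    feat = None
    for _ in range(max_iter):
        # step 2: each feature goes to the class with the largest average
        # expression over that class's samples
        C = np.stack([A[:, samp == k].mean(axis=1) for k in range(r)], axis=1)
        new_feat = C.argmax(axis=1)
        # step 3: each sample goes to the class with the largest average
        # expression over that class's features
        D = np.stack([A[new_feat == k, :].mean(axis=0) for k in range(r)], axis=1)
        new_samp = D.argmax(axis=1)
        if feat is not None and np.array_equal(new_feat, feat) \
                and np.array_equal(new_samp, samp):  # step 4: nothing moved
            break
        samp, feat = new_samp, new_feat
    return samp, feat
```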

SLIDE 70

Alternative Computational Approach

The convergence of the procedure is not guaranteed, but in some instances it delivers plausible results. The procedure cannot perform feature selection and outlier detection explicitly, but some of the created clusters may be easily recognized as “junk” if their separation is weak. On the HuGE dataset, the procedure clearly designates the classes BRA, LI, and MU.

SLIDE 71

HuGE index data set: Unsupervised Result

Figure: HuGE Index heatmap.

SLIDE 72

Conclusions-I

We proposed a data mining methodology that utilizes both sample and feature patterns and is able to perform feature selection, classification, and unsupervised learning. In contrast to other biclustering schemes, consistent biclustering is justified by the conic separation property.

SLIDE 73

Conclusions-II

The obtained fractional 0–1 programming problem for supervised biclustering is tractable via a relaxation-based heuristic. The method requires the user to provide only one parameter (t, the class separation parameter), which is particularly attractive for biomedical researchers who are not experts in data mining. The consistent biclustering framework is also viable for unsupervised learning, though the fractional 0–1 programming formulation becomes intractable for real-life datasets; alternative approaches are possible.

SLIDE 74

Conclusions-III

A general challenge for data mining research is not to be “fooled by randomness”: revealed patterns should have a negligible probability of appearing in random data. Unfortunately, this is not the case for unsupervised biclustering into a small number of classes.
