Optimization Challenges in Cell Identification, Stefan Wild, Argonne National Laboratory (PowerPoint presentation)



SLIDE 1

Optimization Challenges in Cell Identification

Stefan Wild

Argonne National Laboratory Mathematics and Computer Science Division Joint work with Sven Leyffer, Thanh Ngo, and Siwei Wang

August 1, 2012


SLIDE 3

Disconnect and OPT(f, c) = min_{x ∈ R^n} { f(x) : c(x) ≤ 0 }

Gap between science, formulated problem, and algorithmic solution

⋄ “Solving OPT(f, c) results in overfitting.”
⋄ “Solution to OPT(f, c) must be post-processed.”
⋄ “What is OPT(f, c)? I just have an algorithm that gives me the solution.”
⋄ “I can’t solve the science, but I can solve OPT(f, c).”
⋄ “I don’t know how to solve OPT(f, c) on a (large) cluster.”

I will not close this gap!

⋄ Initial examples of (nonlinear) continuous/discrete/mixed numerical optimization for data analysis (many [, better] others exist)

⋄ Experimental data

CScADS 12 1

SLIDE 4

Part 1: Elemental Maps

SLIDE 5

Multi-Dim. Imaging in X-ray Fluorescence Microscopy

Science challenges in nano-medicine and theranostics:
⋄ Design new treatments and drugs for targeted drug delivery
⋄ Combine therapy and diagnostics by targeting nanoparticles at cancer
⋄ Extract an efficiency score from multiple sources of data (instruments)

X-ray, fluorescent, and visible light images

SLIDE 6

Manually Finding Cells is Difficult*

SLIDE 7

Manually Finding Cells is Difficult*

red blood cells

SLIDE 8

Manually Finding Cells is Difficult*

algae cells

SLIDE 9

Manually Finding Cells is Difficult*

yeast cells

SLIDE 10

Challenges and Goals

Accurate statistics/recognition of hundreds of cells and elemental distributions within regions of interest

1. Lack of manual annotations
2. Nonuniformity of cells/noise/background

A first task: Data reduction

⋄ Raw energy channel maps → elemental maps
⋄ People only look at a handful of “elements” rather than 2000 channels

X_{e,p}: number of photons arriving at location p, within a range of energies around e
X: non-negative energy channel × pixel matrix (think: 10^3 × 10^7)


SLIDE 12

2D (Channel-Pixel) Optimization Approaches (I)

Unconstrained low-rank approximation

min { ‖X − W H^T‖_F^2 : W ∈ R^{m×k}, H ∈ R^{n×k} }

⋄ k ≪ min(m, n), known
⋄ X̃ = Σ_{i=1}^{k} W_i H_i^T
⋄ W = channel basis
⋄ H = pixel basis
⋄ Solved by SVD (unknown W and H)

W_1, H_1 non-negative; W_i, H_i have mixed signs for i > 1

[Figure: recovered channel spectra on a log scale (10^−4 to 10^−1), channels 200–1000; legend: Avg, W1, W2, −W2]
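Since the unconstrained problem is solved by the SVD, the factorization can be sketched in a few lines. This is a minimal sketch; `best_rank_k` is an illustrative name, and folding the singular values into W is one conventional choice.

```python
import numpy as np

def best_rank_k(X, k):
    """Best rank-k approximation of X in the Frobenius norm via the SVD
    (Eckart-Young): X ~= W @ H.T with W = U_k * s_k and H = V_k."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = U[:, :k] * s[:k]   # channel basis, scaled by singular values
    H = Vt[:k].T           # pixel basis
    return W, H
```

For a non-negative X, the leading singular vector pair can be chosen entrywise non-negative (Perron-Frobenius), which matches the slide's observation that only W1, H1 are sign-consistent.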


SLIDE 14

2D (Channel-Pixel) Optimization Approaches (II)

Constrained approximation

min { ‖X − W H^T‖_F^2 : W ∈ R^{m×k}, H ∈ R^{n×k}, W ≥ 0, H ≥ 0 }

Non-negative matrix factorization (NMF)
⋄ W = channel basis
⋄ H = pixel basis
⋄ Preserves structure and approximation
⋄ Multiplicative update algorithms:

W_{i,j} ← W_{i,j} (X H)_{i,j} / (W (H^T H))_{i,j}
H_{j,i} ← H_{j,i} (W^T X)_{i,j} / ((W^T W) H^T)_{i,j}

⋄ Other formulations (e.g., nnz(W) ≤ θ)

[Figure: X ≈ W × H^T, illustrated with P, Cu, Zn elemental maps]
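The multiplicative updates quoted above translate directly to NumPy. A minimal sketch, where the function name, random initialization, fixed iteration count, and the small `eps` guard against division by zero are my assumptions:

```python
import numpy as np

def nmf_multiplicative(X, k, iters=200, eps=1e-10, seed=0):
    """Sketch of NMF via multiplicative updates for X ~= W @ H.T.

    X : (m, n) non-negative array; W : (m, k); H : (n, k).
    The updates are the classic Lee-Seung rules quoted on the slide;
    they keep W and H non-negative and do not increase the objective.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((n, k))
    for _ in range(iters):
        W *= (X @ H) / (W @ (H.T @ H) + eps)    # W update
        H *= (X.T @ W) / (H @ (W.T @ W) + eps)  # H update (transposed form)
    return W, H
```

Because the objective is nonconvex, the result depends on the initialization, which is exactly the drawback the later slides address.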

SLIDE 15

Revealing Latent Structure Through NMF

⋄ Non-negative output compatible with intuitive psychological and physiological evidence
⋄ Reconstruction through additive combination of non-negative W_{i,j} yields* a sparse, parts-based representation

Applications

Natural language processing
⋄ Sparsity helps! Bag-of-words
⋄ Latent Dirichlet allocation, semantic role labeling, K-L divergence, ...

Face recognition/image clustering
⋄ Reveal noses, lips, eyes, ...
⋄ [Lee & Seung, Nature 1999]

DNA microarray

SLIDE 16

No Silver Bullet

Challenges/Drawbacks of NMF

⋄ Unique parts-based representation only under specific conditions (e.g., separable complete factorial family [Donoho et al. 2003])
⋄ Initialization directly impacts the quality of the output
⋄ Challenging objective functions (nonlinear, nonconvex, ...)
⋄ Many local minima
⋄ Expert/modeler needs to specify goals

Sparse features? Accurate approximation? Labeled/semi-supervised data? Features corresponding to elements?

SLIDE 17

Incorporating The Science: Basis Initialization

⋄ Gaussian distributions describing reference elements via an “element signature”
⋄ Gaussians at Kα1, Kα2, Kβ1 for elements of interest
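A basis column built from such an element signature might look like the following sketch. The Gaussian form follows the slide; `gaussian_signature`, the detector width `sigma`, and the approximate Zn K-line energies and intensities are assumptions for illustration:

```python
import numpy as np

def gaussian_signature(energies, lines, sigma=0.1):
    """Initial channel-basis column for one element: a sum of Gaussians
    centered at its emission lines (e.g., K-alpha1, K-alpha2, K-beta1).

    energies : (m,) energy of each channel in keV
    lines    : list of (center_keV, relative_intensity) pairs
    sigma    : assumed detector-resolution width in keV
    """
    sig = np.zeros_like(energies)
    for center, weight in lines:
        sig += weight * np.exp(-0.5 * ((energies - center) / sigma) ** 2)
    return sig / sig.sum()  # normalize so columns are comparable

# Hypothetical example: Zn K-lines (approximate energies in keV)
energies = np.linspace(1.0, 12.0, 1000)
w_zn = gaussian_signature(energies, [(8.639, 1.0), (8.616, 0.51), (9.572, 0.17)])
```

Columns like `w_zn` would seed W so the multiplicative updates start near physically meaningful spectra instead of random noise.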


SLIDE 19

Weight Image HS Associated With S Basis

Previous fitting: 1 hour
Square initialization (iter=1000): 1.5 minutes
Gaussian initialization (iter=100): 10 seconds

SLIDE 20

Multi-Channel Images Corresponding to Chemical Elements

[Figure: elemental maps for Ca, Cl, Cu, Fe, K, P, S, TFY, Zn]

+ Sufficient for many users/groups
− Initial step toward the ultimate cell identification/classification goals
− Neglects spatial attributes of pixels

SLIDE 21

Part 2: Finding Cells

SLIDE 22

Identifying Cells in Images

⋄ Cells have different sizes and shapes
⋄ Images are noisy, potentially large (O(10^7) pixels)

[Figure: Zn map with more than 500 cells]

SLIDE 23

Graph Partitioning Approaches

⋄ Build an undirected graph G = (V, E) from the image:
  v ∈ V corresponds to a pixel or a small region
  e_{uv} ∈ E connects u and v with weight w_{uv}
⋄ Connectivity: connect local pixels (k-nearest neighbors or r-neighborhood)
⋄ w_{uv} large for pixels within a group, small for pixels in different groups

Goal: Partition the graph into disjoint partitions
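A pixel-similarity graph of the kind described above can be sketched as follows; the Gaussian intensity weight and `sigma_i` are common choices, not taken from the slides:

```python
import numpy as np

def pixel_graph(img, r=1, sigma_i=0.1):
    """Sparse similarity graph for a 2D intensity image (dict-of-edges sketch).

    Connects each pixel to neighbors within an r-neighborhood; the weight
    w_uv = exp(-(I_u - I_v)^2 / sigma_i^2) is large for similar intensities
    and small across intensity boundaries.
    """
    h, w = img.shape
    edges = {}
    for y in range(h):
        for x in range(w):
            u = y * w + x
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    ny, nx = y + dy, x + dx
                    v = ny * w + nx
                    # u < v stores each undirected edge exactly once
                    if (dy, dx) != (0, 0) and 0 <= ny < h and 0 <= nx < w and u < v:
                        d = float(img[y, x]) - float(img[ny, nx])
                        edges[(u, v)] = np.exp(-d * d / sigma_i ** 2)
    return edges
```

Real implementations use sparse matrices and vectorized neighbor offsets; the loops here only make the construction explicit.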



SLIDE 26

Discrete Optimization and 2-way Graph Partitioning

Minimum weight cut

min { Cut(A, Ā) = Σ_{u∈A, v∈Ā} w_{uv} : A ∪ Ā = V, A ∩ Ā = ∅, A ≠ ∅, Ā ≠ ∅ }

+ Efficient combinatorial algorithms exist
− Often favors unbalanced cuts

To obtain balanced cuts:

RatioCut(A, Ā) = Cut(A, Ā)/|A| + Cut(A, Ā)/|Ā|

NormalizedCut(A, Ā) = Cut(A, Ā)/vol(A) + Cut(A, Ā)/vol(Ā)

− Minimizing these objectives is hard
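For a given partition, all three objectives are cheap to evaluate (the hard part is minimizing over partitions). A sketch, with A as a boolean vertex mask over a symmetric adjacency matrix:

```python
import numpy as np

def cut_value(W, A):
    """Cut(A, A-bar): total weight of edges crossing the partition.
    W is a symmetric adjacency matrix; A is a boolean mask over vertices."""
    return float(W[np.ix_(A, ~A)].sum())

def ratio_cut(W, A):
    """Cut normalized by the number of vertices on each side."""
    c = cut_value(W, A)
    return c / A.sum() + c / (~A).sum()

def normalized_cut(W, A):
    """Cut normalized by the volume (degree sum) of each side."""
    c = cut_value(W, A)
    deg = W.sum(axis=1)
    return c / deg[A].sum() + c / deg[~A].sum()
```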


SLIDE 28

Spectral Relaxations

Cut(A, Ā) = (1/2) z^T L z, where z_i = 1 if i ∈ A, 0 otherwise.

RatioCut(A, Ā) = z^T L z / (z^T z), where z_i = √(|Ā|/|A|) if i ∈ A, −√(|A|/|Ā|) otherwise.

NormalizedCut(A, Ā) = z^T L z / (z^T D z), where z_i = √(vol(Ā)/vol(A)) if i ∈ A, −√(vol(A)/vol(Ā)) otherwise.

L = D − W; W = adjacency matrix; D_{ii} = Σ_j w_{ij}

Relax the discrete z to take real values:
⋄ Solve for the eigenvector associated with the 2nd-smallest eigenvalue:
  RatioCut: L z = λ z
  NormalizedCut (generalized eigenproblem): L z = λ D z,
  or compute the eigenvector y of the normalized graph Laplacian L = I − D^{−1/2} W D^{−1/2} and take z = D^{−1/2} y

[Luxburg, “A tutorial on spectral clustering,” 2007]
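The normalized-cut relaxation amounts to one eigenvector computation plus a rounding step. A dense-matrix sketch (thresholding z at 0 is one common rounding of the relaxed solution; sparse eigensolvers are used in practice):

```python
import numpy as np

def spectral_bisection(W):
    """Two-way normalized-cut relaxation: take the eigenvector for the
    2nd-smallest eigenvalue of L = I - D^{-1/2} W D^{-1/2}, map it back
    via z = D^{-1/2} y, and threshold at 0 to split the vertices."""
    d = W.sum(axis=1)                  # degrees D_ii
    d_isqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    z = d_isqrt * vecs[:, 1]           # relaxed indicator vector
    return z > 0                       # boolean partition mask
```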

SLIDE 29

Recursive (k-Way) Segmentation Results

Small images: [Figure: original image vs. k-means vs. normalized cut]

SLIDE 30

Multi-level Graph Partitioning

For big images (10^6+ pixels), solve an approximation of spectral graph partitioning:
⋄ Coarsen the graph to the desired level, then partition it
⋄ Iteratively refine the cuts at finer levels

Coarse step: use a big Laplacian-of-Gaussian filter

SLIDE 31

Multi-level Graph Partitioning

For big images (10^6+ pixels), solve an approximation of spectral graph partitioning:
⋄ Coarsen the graph to the desired level, then partition it
⋄ Iteratively refine the cuts at finer levels

Fine step: use a small Laplacian-of-Gaussian filter
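The coarse and fine steps both apply a Laplacian-of-Gaussian filter at different widths. A NumPy-only sketch (separable Gaussian blur followed by a 5-point discrete Laplacian; the wrap-around `np.roll` boundary handling is a simplification):

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalized 1-D Gaussian kernel; support of about 3 sigma by default."""
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def log_filter(img, sigma):
    """Laplacian-of-Gaussian response: blur with a separable Gaussian of
    width sigma, then apply the 5-point discrete Laplacian. A large sigma
    highlights coarse regions; a small sigma recovers fine boundaries."""
    k = gaussian_kernel(sigma)
    sm = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, img)
    sm = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, sm)
    lap = (np.roll(sm, 1, 0) + np.roll(sm, -1, 0) +
           np.roll(sm, 1, 1) + np.roll(sm, -1, 1) - 4 * sm)
    return lap
```

The LoG response is near zero in flat regions and large in magnitude at region boundaries, which is what the coarsening and refinement steps exploit.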

SLIDE 32

Merging Oversegmented Regions

Merge small/disconnected regions into larger regions

1. Based on edges/boundaries between two regions, using:
   ⋄ Gradient map or Canny edge detector
   ⋄ Image space instead of graph weights
   ⋄ Heuristics (greedy, max-matching, ...)
2. Using content-based measures

SLIDE 34

Part 3: Delineating Cells

SLIDE 35

Cell Content-Based Optimization

(Mixed-Integer?) Nonlinear Optimization

⋄ Allow for overlapping cells:
  Nonuniform sizes and shapes
  Relatively consistent content
⋄ Identify cell numbers/types/boundaries:

min_θ { Σ_{c,t} f_{c,t,shape}(θ) + λ f_{c,t,content}(θ) : f_{c,t,content}(θ) ∈ C_t }

θ: parameterizes cell curves (e.g., wavelets)
λ: balances the objectives (optional)
C_t: hard bounds on content for type t

SLIDE 38

Steps Toward Cell Delineation

⋄ Nonuniform background/noise
⋄ Background estimation is local
⋄ A hierarchical statistical test identifies the number of cells of each type within relaxed regions
⋄ Cells overlap (additive contributions)
⋄ Cellular content preserved

SLIDE 39

Part 4: Automatic Performance (I/O?) Tuning

SLIDE 40

Automating Performance Tuning

Given semantically equivalent codes C1, C2, ..., minimize “run time” subject to constraints on “energy consumption”

min { f(x) : x = (x_C, x_I, x_B) ∈ Ω_C × Ω_I × Ω_B }

x: multidimensional parameterization (compiler type, compiler flags, unroll/tiling factors, internal tolerances, ...)
Ω: search domain (feasible transformations, no errors)
f: quantifiable performance objective (requires a run/model)
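As a baseline for searching such a space, even random sampling fits the formulation. The parameter names and the surrogate `run_time` below are hypothetical stand-ins for the real transform-compile-run-check loop:

```python
import random

random.seed(0)

# Hypothetical discrete autotuning space
space = {
    "compiler": ["gcc", "icc"],
    "unroll_i": [1, 2, 4, 8],
    "unroll_j": [1, 2, 4, 8],
    "tile": [16, 32, 64],
}

def run_time(cfg):
    # Stand-in objective; in practice this step transforms the source,
    # compiles, executes, and checks correctness before timing.
    return abs(cfg["unroll_i"] - 4) + abs(cfg["unroll_j"] - 2) + cfg["tile"] / 64

def random_search(space, budget=20):
    """Evaluate `budget` random configurations; return (best time, config)."""
    best = None
    for _ in range(budget):
        cfg = {k: random.choice(v) for k, v in space.items()}
        t = run_time(cfg)
        if best is None or t < best[0]:
            best = (t, cfg)
    return best

best_time, best_cfg = random_search(space)
```

With ~10^19 feasible points and noisy, derivative-free objectives (next slide), model-based search replaces this brute-force sampling in practice.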

SLIDE 41

Optimization for Automatic Tuning of HPC Codes

Evaluation of f requires: transforming source, compilation, (repeated?) execution, checking for correctness

[Figure: run-time surface (CPU ms, ×10^7) over unroll factors i and j]

Challenges:

⋄ Evaluating f over all of Ω is prohibitively expensive (~10^19 configurations)
⋄ f is noisy
⋄ Discrete x unrelaxable
⋄ ∇_x f unavailable/nonexistent
⋄ Many distinct/local solutions

→ Same problems for I/O tuning? ←

SLIDE 42

Goal: Fast Optimizations in Short Search Times

gemver; |D| = 1.41 × 10^23; 100 evaluations

[Balaprakash et al. VECPAR ’12]

SLIDE 43

Closing Thoughts & Acknowledgments

Lingering Gaps (Science, Algorithms, Visualization, Data Stack)

⋄ Problem formulation is crucial
⋄ The algorithm-data-storage interface is crucial
⋄ Resource allocation (viz cluster, in situ, ...) drives the selection of optimization tools

C. Jacobsen, S. Leyffer, S. Vogt, S. Wang, J. Ward, and others; T. Ngo

AUTOTUNING
P. Balaprakash, P. Hovland, B. Norris, and others

Always collecting problems: → www.mcs.anl.gov/~wild
