A new smart-pooling strategy for high-throughput screening: the Shifted Transversal Design
Nicolas Thierry-Mieg
CNRS / LSR-IMAG laboratory, Grenoble, France
DIMACS CGT Workshop, 17/05/2006

Context: systems biology
• Many high-throughput projects share the same features:
– a basic yes-or-no test applied to a large collection of “objects”
– low-frequency positives
– experimental noise
• A natural solution: smart-pooling, provided that
– objects are individually available
– the basic assay works on a pool of objects (the pool readout is an OR; XOR is not available)
• Advantages:
– the number of pools is small
– pools are redundant → error-correction
• Main difficulty: designing the pools
– non-adaptive designs
– specific constraints (e.g. pool size)

Example of smart-pooling: rows and columns
(Figure from: Thierry-Mieg N. Pooling in systems biology becomes smart. Nat Methods. 2006 Mar;3(3):161-2.)

Layout of the talk
• Biological context
• Definition of STD
• Properties
• Behavior and efficiency
• Application: protein-protein interaction mapping

STD: preliminary definitions
• Pooling problem (n, t, E):
– $A_n = \{A_0, \dots, A_{n-1}\}$: set of Boolean variables ($n \approx 10^3$ to $10^6$)
– t = number of positives (≈ 1-10)
– E = number of errors (≈ 1-40% of tests)
• Pool: subset of $A_n$; its value is the OR of its variables
• Goal: build a set of v pools such that
– v is small
– correction of errors and identification of positives are guaranteed

Matrix representation
v × n Boolean matrix M: M(i,j) is true ⇔ pool i contains variable j.
Example: n=9, $A_9 = \{0, 1, \dots, 8\}$:
$$M = \begin{pmatrix} 1&0&0&1&0&0&1&0&0 \\ 0&1&0&0&1&0&0&1&0 \\ 0&0&1&0&0&1&0&0&1 \end{pmatrix} \qquad \text{pools: } \{0,3,6\},\ \{1,4,7\},\ \{2,5,8\}$$
“Layer” = a set of pools that forms a partition of $A_n$.
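As a minimal sketch of the matrix representation and of the OR readout of a pool (Python; the helper names `pools_to_matrix` and `observe` are illustrative and do not come from the talk):

```python
def pools_to_matrix(pools, n):
    """Return the v x n Boolean matrix M with M[i][j] = 1 iff pool i contains variable j."""
    return [[1 if j in pool else 0 for j in range(n)] for pool in pools]


def observe(pools, positives):
    """Noise-free readout: a pool is positive iff it contains a positive variable (OR)."""
    return [any(j in positives for j in pool) for pool in pools]


layer = [{0, 3, 6}, {1, 4, 7}, {2, 5, 8}]  # the example layer for n = 9
M = pools_to_matrix(layer, 9)

# A layer is a partition: every column of M holds exactly one 1.
is_partition = all(sum(row[j] for row in M) == 1 for j in range(9))

# If variable 8 is the only positive, only the pool {2,5,8} reads positive.
outcomes = observe(layer, positives={8})
```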

Shifted Transversal Design: idea
“Transversal” construction: pools are organized in layers, and variables are “shifted” from layer to layer, in order to
• limit the co-occurrence of variables
• obtain constant-sized intersections between pools
STD(n;q;k): n variables, q prime, q < n, k number of layers (k ≤ q+1)
• First q layers: symmetric construction, q pools of size ⌊n/q⌋ or ⌊n/q⌋+1
• If k = q+1: additional singular layer, up to q pools of heterogeneous sizes
Let:
• $\Gamma(q,n) = \min\{\gamma \mid q^{\gamma+1} \ge n\}$
• $\sigma_q$ be the circular permutation on $\{0,1\}^q$: $\sigma_q\,(x_1, x_2, \dots, x_q)^T = (x_q, x_1, \dots, x_{q-1})^T$

STD construction
∀ j ∈ {0,…,q}: $M_j$ is a q × n Boolean matrix representing layer L(j), with columns $C_{j,0}, \dots, C_{j,n-1}$, where
$$C_{0,0} = (1, 0, \dots, 0)^T \quad\text{and}\quad \forall i \in \{0,\dots,n-1\}:\ C_{j,i} = \sigma_q^{\,s(i,j)}(C_{0,0}),$$
with:
• if j < q: $s(i,j) = \sum_{c=0}^{\Gamma} j^c \left\lfloor i/q^c \right\rfloor$
• if j = q (singular layer): $s(i,q) = \left\lfloor i/q^{\Gamma} \right\rfloor$
For k ∈ {1,2,…,q+1}: $\mathrm{STD}(n;q;k) = \bigcup_{j=0}^{k-1} L(j)$.

STD example: n=9, q=3
$$M_0 = \begin{pmatrix} 1&0&0&1&0&0&1&0&0 \\ 0&1&0&0&1&0&0&1&0 \\ 0&0&1&0&0&1&0&0&1 \end{pmatrix} \qquad L(0) = \{\{0,3,6\}, \{1,4,7\}, \{2,5,8\}\}$$
$$M_1 = \begin{pmatrix} 1&0&0&0&0&1&0&1&0 \\ 0&1&0&1&0&0&0&0&1 \\ 0&0&1&0&1&0&1&0&0 \end{pmatrix} \qquad L(1) = \{\{0,5,7\}, \{1,3,8\}, \{2,4,6\}\}$$
$$M_2 = \begin{pmatrix} 1&0&0&0&1&0&0&0&1 \\ 0&1&0&0&0&1&1&0&0 \\ 0&0&1&1&0&0&0&1&0 \end{pmatrix} \qquad L(2) = \{\{0,4,8\}, \{1,5,6\}, \{2,3,7\}\}$$
$$M_3 = \begin{pmatrix} 1&1&1&0&0&0&0&0&0 \\ 0&0&0&1&1&1&0&0&0 \\ 0&0&0&0&0&0&1&1&1 \end{pmatrix} \qquad L(3) = \{\{0,1,2\}, \{3,4,5\}, \{6,7,8\}\}$$
STD(n=9;q=3;k=2) = L(0) ∪ L(1).
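The construction rule can be sketched in a few lines of Python. This is an illustrative transcription of the definitions of Γ and s(i,j), not the author's implementation; the function names `gamma`, `std_layer` and `std` are assumptions. Applying σ_q s times to the first unit vector puts the 1 in row s mod q, so pool number (s mod q) of layer j receives variable i.

```python
def gamma(q, n):
    """Gamma(q, n) = min{g | q**(g+1) >= n}."""
    g = 0
    while q ** (g + 1) < n:
        g += 1
    return g


def std_layer(n, q, j):
    """Pools of layer L(j), 0 <= j <= q, as a list of sets of variable indices."""
    g = gamma(q, n)
    pools = [set() for _ in range(q)]
    for i in range(n):
        if j < q:
            # s(i,j) = sum over c of j^c * floor(i / q^c)
            s = sum(j ** c * (i // q ** c) for c in range(g + 1))
        else:
            # singular layer: s(i,q) = floor(i / q^Gamma)
            s = i // q ** g
        pools[s % q].add(i)
    return [p for p in pools if p]  # the singular layer may leave some pools empty


def std(n, q, k):
    """STD(n;q;k): union of the first k layers, as a flat list of pools."""
    assert 1 <= k <= q + 1
    return [pool for j in range(k) for pool in std_layer(n, q, j)]


# Reproduce the four layers of the n=9, q=3 example.
for j in range(4):
    print(j, [sorted(p) for p in std_layer(9, 3, j)])
```

Printing the layers for n=9, q=3 gives the same pools as the matrices M_0 to M_3 in the example.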

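STD example: n=9 to 27, q=3
n=9, q=3, third layer (j=2):
$$M_2 = \begin{pmatrix} 1&0&0&0&1&0&0&0&1 \\ 0&1&0&0&0&1&1&0&0 \\ 0&0&1&1&0&0&0&1&0 \end{pmatrix} \qquad L(2) = \{\{0,4,8\}, \{1,5,6\}, \{2,3,7\}\}$$
n=27, q=3, j=2:
$$M_2 = \left(\begin{array}{*{9}{c}|*{9}{c}|*{9}{c}} 1&0&0&0&1&0&0&0&1&0&0&1&1&0&0&0&1&0&0&1&0&0&0&1&1&0&0 \\ 0&1&0&0&0&1&1&0&0&1&0&0&0&1&0&0&0&1&0&0&1&1&0&0&0&1&0 \\ 0&0&1&1&0&0&0&1&0&0&1&0&0&0&1&1&0&0&1&0&0&0&1&0&0&0&1 \end{array}\right)$$
The annotations on the slide label the shift between consecutive columns: +1 inside a block of q columns, +(1+j) when the column index crosses a multiple of q, and +(1+j+j²) when it crosses a multiple of q².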

Layout of the talk
• Biological context
• Definition of STD
• Properties: a solution to the pooling problem
• Behavior and efficiency
• Application: protein-protein interaction mapping

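Co-occurrence of variables
∀ k ∈ {1,…,q+1}, ∀ i ∈ {0,…,n-1}: $\mathrm{pools}_k(i) = \{p \in \mathrm{STD}(n;q;k) \mid A_i \in p\}$
Theorem (q prime): ∀ $i_1, i_2$ ∈ {0,…,n-1}, $[i_1 \ne i_2] \Rightarrow [\mathrm{Card}(\mathrm{pools}_{q+1}(i_1) \cap \mathrm{pools}_{q+1}(i_2)) \le \Gamma(q,n)]$.
(Idea of) proof:
$$\mathrm{Card}(\mathrm{pools}_{q+1}(i_1) \cap \mathrm{pools}_{q+1}(i_2)) = \mathrm{Card}\{\, j \in \{0,\dots,q\} \mid C_{j,i_1} = C_{j,i_2} \,\}.$$
However, for j < q:
$$C_{j,i_1} = C_{j,i_2} \iff s(i_1,j) \equiv s(i_2,j) \pmod q \iff \sum_{c=0}^{\Gamma} j^c \left( \left\lfloor i_1/q^c \right\rfloor - \left\lfloor i_2/q^c \right\rfloor \right) \equiv 0 \pmod q.$$
Since q is prime, $\mathbb{Z}_q$ is the field GF(q). And since $i_1 \ne i_2$, there exists at least one c ≤ Γ such that $\left\lfloor i_1/q^c \right\rfloor - \left\lfloor i_2/q^c \right\rfloor \not\equiv 0 \pmod q$. We therefore have a non-zero polynomial (in j) of degree at most Γ over GF(q), which has at most Γ roots.
• If $C_{q,i_1} \ne C_{q,i_2}$: OK, the two variables share at most Γ pools, all in the first q layers.
• If $C_{q,i_1} = C_{q,i_2}$: the coefficient of $j^{\Gamma}$ in the polynomial is zero by definition of s(i,q), so the polynomial has degree at most Γ-1 and the variables share at most Γ-1 pools in the first q layers, plus one in the singular layer: OK.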

Example: n=9, q=3 (hence Γ=1)
L(0) = {{0,3,6}, {1,4,7}, {2,5,8}},
L(1) = {{0,5,7}, {1,3,8}, {2,4,6}},
L(2) = {{0,4,8}, {1,5,6}, {2,3,7}},
L(3) = {{0,1,2}, {3,4,5}, {6,7,8}}.
$\mathrm{pools}_4(0)$ = {{0,3,6}, {0,5,7}, {0,4,8}, {0,1,2}}.
Variable 0 appears exactly once (Γ=1) with each other variable.
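The co-occurrence bound can be checked by brute force on small designs. The sketch below rebuilds the design from the definitions of Γ and s(i,j) and counts, for every pair of variables, the pools they share. The names `gamma`, `std` and `max_cooccurrence` are illustrative, and the check is only a numerical sanity test, not a substitute for the proof.

```python
from itertools import combinations


def gamma(q, n):
    """Gamma(q, n) = min{g | q**(g+1) >= n}."""
    g = 0
    while q ** (g + 1) < n:
        g += 1
    return g


def std(n, q, k):
    """STD(n;q;k) as a flat list of pools (sets of variable indices)."""
    g = gamma(q, n)
    pools = []
    for j in range(k):
        layer = [set() for _ in range(q)]
        for i in range(n):
            if j < q:
                s = sum(j ** c * (i // q ** c) for c in range(g + 1))
            else:
                s = i // q ** g  # singular layer
            layer[s % q].add(i)
        pools.extend(p for p in layer if p)
    return pools


def max_cooccurrence(n, q):
    """Largest number of pools shared by two distinct variables in STD(n;q;q+1)."""
    pools = std(n, q, q + 1)
    member = {i: {p for p, pool in enumerate(pools) if i in pool} for i in range(n)}
    return max(len(member[a] & member[b]) for a, b in combinations(range(n), 2))


# The theorem predicts max_cooccurrence(n, q) <= gamma(q, n) for q prime.
for n, q in [(9, 3), (27, 3), (20, 5)]:
    print(n, q, max_cooccurrence(n, q), gamma(q, n))
```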

A solution in the absence of noise
Corollary 1: If there are at most t positive variables in $A_n$ and in the absence of noise: STD(n;q;k) is a solution, when choosing q prime such that t⋅Γ(q,n) ≤ q, and k = t⋅Γ+1.
(Idea of) proof: algorithm 1 correctly tags all variables.
Algorithm 1:
1. all the variables present in at least one negative pool are tagged negative
2. any variable present in at least one positive pool where all other variables have been tagged negative, is tagged positive

Example with n=9, q=3
Let t=1: by corollary 1, k = t⋅Γ+1 = 2 layers are sufficient.
Single positive variable: 8
Pools: {{0,3,6}, {1,4,7}, {2,5,8}, {0,5,7}, {1,3,8}, {2,4,6}}
Algorithm 1:
1. the 4 negative pools show that 0, 1, …, 7 are negative;
2. the 2 positive pools, {2,5,8} and {1,3,8}, each show that 8 is positive (since 2, 5, 1 and 3 are negative).
Note: if more than t variables are positive, all tags are still correct but some variables may not be tagged: they are “unresolved” (“ambiguous”).
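Algorithm 1 and the example above can be sketched as follows (Python; the function name `decode_noiseless` and the helper `readout` are illustrative, and pool outcomes are assumed to be given as a list of Booleans aligned with the list of pools):

```python
def decode_noiseless(pools, outcomes, n):
    """Algorithm 1: tag variables from noise-free outcomes (True = positive pool)."""
    # Step 1: every variable present in at least one negative pool is negative.
    negatives = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            negatives |= pool
    # Step 2: a variable is positive if some positive pool contains it
    # and all the other variables of that pool have been tagged negative.
    positives = set()
    for pool, positive in zip(pools, outcomes):
        if positive:
            for i in pool:
                if pool - {i} <= negatives:
                    positives.add(i)
    unresolved = set(range(n)) - negatives - positives
    return positives, negatives, unresolved


# STD(9;3;2) = L(0) ∪ L(1)
pools = [{0, 3, 6}, {1, 4, 7}, {2, 5, 8}, {0, 5, 7}, {1, 3, 8}, {2, 4, 6}]


def readout(truth):
    """Noise-free OR readout of every pool for a given set of positive variables."""
    return [bool(pool & truth) for pool in pools]


one_positive = decode_noiseless(pools, readout({8}), 9)
two_positives = decode_noiseless(pools, readout({7, 8}), 9)  # more than t = 1 positives
```

With 8 as the single positive, every variable is tagged. With the two positives 7 and 8 (more than t=1), no variable is mis-tagged, but 1, 5, 7 and 8 remain unresolved.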

Error-correction
Corollary 2: If there are at most t positive variables in $A_n$ and at most E observation errors: STD(n;q;k) is a solution, when choosing q prime such that t⋅Γ(q,n)+2⋅E ≤ q, and k = t⋅Γ+2⋅E+1.
(Idea of) proof: algorithm 2 correctly tags all variables. Any contradictory observation is erroneous.
Algorithm 2:
1. all the variables present in at least E+1 negative pools are tagged negative
2. any variable present in at least E+1 positive pools where all other variables have been tagged negative, is tagged positive
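A minimal sketch of algorithm 2, run on the n=9, q=3 design with t=1 and E=1 (so t⋅Γ+2⋅E = 3 ≤ q and k = 4, i.e. all four layers). The function name `decode_with_errors` is illustrative, and the single flipped observation is an arbitrary choice made for the demonstration.

```python
from collections import Counter


def decode_with_errors(pools, outcomes, n, E):
    """Algorithm 2: tag variables when at most E pool outcomes are wrong."""
    # Step 1: negative if present in at least E+1 negative pools.
    neg_count = Counter()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            neg_count.update(pool)
    negatives = {i for i in range(n) if neg_count[i] >= E + 1}
    # Step 2: positive if present in at least E+1 positive pools
    # whose other variables are all tagged negative.
    pos_count = Counter()
    for pool, positive in zip(pools, outcomes):
        if positive:
            for i in pool:
                if pool - {i} <= negatives:
                    pos_count[i] += 1
    positives = {i for i in range(n) if pos_count[i] >= E + 1}
    return positives, negatives


# STD(9;3;4) = L(0) ∪ L(1) ∪ L(2) ∪ L(3)
pools = [
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
    {0, 5, 7}, {1, 3, 8}, {2, 4, 6},
    {0, 4, 8}, {1, 5, 6}, {2, 3, 7},
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
]
truth = {8}
clean = [bool(pool & truth) for pool in pools]

noisy = list(clean)
noisy[2] = not noisy[2]  # one false negative, on pool {2,5,8}
result = decode_with_errors(pools, noisy, 9, E=1)
```

Despite the erroneous observation, variable 8 is still tagged positive and all others negative.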

Error-correction (2)
Errors can be false positives or false negatives.
Corollary 3: If there are at most t positive variables in $A_n$ and at most E false positive and E false negative observations: STD(n;q;k) is a solution, when choosing q prime such that t⋅Γ(q,n)+2⋅E ≤ q, and k = t⋅Γ+2⋅E+1.
(Idea of) proof: same algorithm as corollary 2.
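The corollaries leave the choice of the prime q open. One possible way to pick it, sketched below, is to scan the primes q < n that satisfy t⋅Γ(q,n)+2⋅E ≤ q and keep the one with the smallest q⋅k, which is an upper bound on the number of pools. This selection rule and the names `is_prime` and `choose_parameters` are assumptions made for illustration; the talk does not prescribe them.

```python
def gamma(q, n):
    """Gamma(q, n) = min{g | q**(g+1) >= n}."""
    g = 0
    while q ** (g + 1) < n:
        g += 1
    return g


def is_prime(m):
    """Trial-division primality test, adequate for the small q considered here."""
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))


def choose_parameters(n, t, E=0):
    """Return (q, k, q*k) with the smallest q*k among admissible primes q < n, or None."""
    best = None
    for q in range(2, n):
        if not is_prime(q):
            continue
        g = gamma(q, n)
        if t * g + 2 * E <= q:
            k = t * g + 2 * E + 1
            if best is None or q * k < best[2]:
                best = (q, k, q * k)
            if g == 1:
                # Gamma cannot drop below 1 while q < n, so k is now fixed
                # and q*k only grows with larger q.
                break
    return best


print(choose_parameters(9, t=1))        # the noise-free example: q=3, k=2
print(choose_parameters(9, t=1, E=1))   # one error tolerated: q=3, k=4
print(choose_parameters(10000, t=5))
```

Note that the smallest admissible prime is not always the best choice: for n=10000 and t=5, q=17 is admissible with k=16 (272 pools at most), while q=23 needs only k=11 (253 pools at most).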

Error-detection
If there are more than E errors, they are detected if
• some variables are tagged twice or not at all
• more than t variables are tagged positive
• more than E observations are identified as erroneous
Question: how many errors are necessary to avoid detection?
Answer:
• at least E+Γ+1 false negatives, or
• at least E+Γ+1 false positives, or
• if E < 2⋅Γ-1: at least 3⋅E+2 errors, including at least E+1 errors of each type.

Error detection and correction

Even redistribution of variables
Theorem: Let m ≤ k ≤ q and consider $\{P_1, \dots, P_m\} \subset \mathrm{STD}(n;q;k)$, each belonging to a different layer. Then:
$$\lambda_m \le \left| \bigcap_{h=1}^{m} P_h \right| \le \lambda_m + 1, \quad\text{where}\quad \lambda_m = \sum_{c=m}^{\Gamma} q^{\,c-m} \cdot \left( \left\lfloor n/q^c \right\rfloor \,\%\, q \right).$$
Proof: see BMC Bioinformatics 2006, 7:28.
Notes:
• $\lambda_m$ depends only on m, not on the choice of the pools $P_1, \dots, P_m$. Hence the theorem expresses that every pool, and every intersection between 2 or more pools, is redistributed evenly in each remaining layer.
• The theorem does not hold for the singular layer L(q), hence the condition k ≤ q.
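The redistribution property can be observed numerically without evaluating λ_m: since the intersection size always lies between λ_m and λ_m+1, the sizes obtained for a given m should take at most two consecutive values, whatever pools are chosen. The sketch below enumerates all such intersections on small designs; the names `gamma`, `std_layer` and `intersection_sizes` are illustrative, and the design is rebuilt from the definitions of Γ and s(i,j).

```python
from itertools import combinations, product


def gamma(q, n):
    """Gamma(q, n) = min{g | q**(g+1) >= n}."""
    g = 0
    while q ** (g + 1) < n:
        g += 1
    return g


def std_layer(n, q, j):
    """Pools of a non-singular layer L(j), j < q."""
    g = gamma(q, n)
    pools = [set() for _ in range(q)]
    for i in range(n):
        s = sum(j ** c * (i // q ** c) for c in range(g + 1))
        pools[s % q].add(i)
    return pools


def intersection_sizes(n, q, k, m):
    """Set of sizes of all intersections of m pools taken from m distinct layers (k <= q)."""
    assert m <= k <= q
    layers = [std_layer(n, q, j) for j in range(k)]
    sizes = set()
    for chosen_layers in combinations(layers, m):
        for chosen_pools in product(*chosen_layers):
            sizes.add(len(set.intersection(*chosen_pools)))
    return sizes


for n, q, k in [(9, 3, 3), (20, 5, 5)]:
    for m in range(1, 4):
        print(n, q, m, sorted(intersection_sizes(n, q, k, m)))
```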

Layout of the talk
• Biological context
• Definition of STD
• Properties
• Behavior and efficiency
• Application: protein-protein interaction mapping
