Screen Mining with KNIME
A user-friendly framework for high throughput / content data analysis
Martin Stöter, HT - Technology Development Studio (TDS), the HC-Screening Unit at the MPI-CBG, stoeter@mpi-cbg.de
KNIME workshop February 27th 2016, Berlin
Martin Stöter, MPI-CBG, Dresden, Germany
MPI-CBG, Dresden, Germany
Screening facility for academic laboratories
Provide full service for automation and cell-based screens, RNAi and chemical screens
Equipment: liquid handling robots, drop dispensers, plate washers, plate readers, High Content Screening platforms
Data analyst
Complex experiments
Lots of data (too much for Excel)
Fancy data analysis / mining
Many scientists, but few data analysts
Sometimes different languages
Data analysis is often a bottleneck!
Scientists
Data generation
Tasks/problems
SQL database, XML, Excel, various .csv …
Plate layout (columns 1-24, rows A-P), annotated wells per row:
A:
B: DMSO, DMSO, DMSO
C: 0.001, DMSO, DMSO, 0.001
D: 10, DMSO, DMSO, 10
E: 10, DMSO, DMSO, 10
F: 3, DMSO, DMSO, 3
G: 3, DMSO, DMSO, 3
H: 1, DMSO, DMSO, 1
I: 1, DMSO, DMSO, 1
J: 0.3, DMSO, DMSO, 0.3
K: 0.3, DMSO, DMSO, 0.3
L: 0.1, DMSO, DMSO, 0.1
M: no AB, no AB, 0.1, DMSO, DMSO, 0.1
N: no AB, no AB, 0.1, DMSO, DMSO, 0.1
O: DMSO, DMSO
P:
Data Import
Image Analysis Readers (Opera, Operetta, MotionTracking)
Plate Readers (Envision, GeniusPro, MSD SectorImager)
Other (Example Data, Generic XML)
Normalization
Percent-of-control (POC), normalized percent inhibition (NPI)
Z-score, B-score
Vector Length Normalization (clustering)
Optional: robust statistics (median + MAD)
Select wells to normalize (controls, samples)
Quality Control
Z-prime factor (Z'), multivariate Z', SSMD
CV (coefficient of variation)
Optional: robust statistics (median + MAD)
Select wells to normalize (controls, samples)
Utilities
Handle barcodes, wells and row letters
Join Layout from Excel (well annotation, meta data)
Create Well Position (NEW)
Visualization
Plate Heatmap Viewer
Dose Response (dependent on R!)
Advanced Statistics
Binning Analysis
Data Manipulation / Pre-Processing
Split / Combine Columns (by header)
Number Formatter (NEW)
Range Filter, Splitter
Outlier Removal
“barcode”, “plateRow”, “plateColumn”, param1, param2, …
Standardization of the well coordinates:
NEW NODE
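As an illustration of what standardizing well coordinates involves, here is a minimal Python sketch that turns a well name such as "B03" into numeric "plateRow" and "plateColumn" values. The function name `parse_well` and the accepted input formats are assumptions made for this example; this is not the implementation of the KNIME node.

```python
import re
from typing import Tuple


def parse_well(well: str) -> Tuple[int, int]:
    """Convert a well name ('B3', 'b03', 'AF48') to 1-based (plateRow, plateColumn)."""
    match = re.fullmatch(r"\s*([A-Za-z]{1,2})\s*(\d+)\s*", well)
    if match is None:
        raise ValueError(f"unrecognized well name: {well!r}")
    letters = match.group(1).upper()
    column = int(match.group(2))  # int() drops leading zeros
    if column < 1:
        raise ValueError(f"column must be >= 1 in well name: {well!r}")
    # Row letters count like spreadsheet columns: A=1 ... Z=26, AA=27 ...
    row = 0
    for letter in letters:
        row = row * 26 + (ord(letter) - ord("A") + 1)
    return row, column


if __name__ == "__main__":
    for name in ("A1", "b03", "P24", "AF48"):
        print(name, parse_well(name))
```

Two-letter rows are accepted so that 1536-well plates (rows A to AF) are covered as well as 96- and 384-well plates.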
Regular expression for interpretation of barcode:
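The slides do not show the actual barcode format or pattern, so the following Python sketch uses an invented barcode scheme (project, library plate number, replicate, date) purely to show how named groups in a regular expression can split a barcode into meta data columns. The pattern, the group names and the example barcode are all hypothetical.

```python
import re

# Hypothetical barcode scheme: <project>_P<libraryPlate>_R<replicate>_<yyyymmdd>
BARCODE_PATTERN = re.compile(
    r"(?P<project>[A-Za-z0-9]+)"
    r"_P(?P<libraryPlate>\d+)"
    r"_R(?P<replicate>\d+)"
    r"_(?P<date>\d{8})"
)


def parse_barcode(barcode):
    """Return the barcode fields as a dict, or None if the barcode does not match."""
    match = BARCODE_PATTERN.fullmatch(barcode)
    if match is None:
        return None
    fields = match.groupdict()
    # Numeric fields become integers so they sort and join correctly.
    fields["libraryPlate"] = int(fields["libraryPlate"])
    fields["replicate"] = int(fields["replicate"])
    return fields


if __name__ == "__main__":
    print(parse_barcode("SCR01_P0012_R2_20160227"))
```

Each named group becomes one meta data column, which is the same idea as letting a regular expression interpret the barcode inside a workflow.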
Excel is the tool for experiment documentation and assay development
The Join Layout node is an Excel reader for a defined spreadsheet
Plate format with multiple well attributes (1 plate layout -> 1 column in KNIME)
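The step from a plate-shaped layout to one column per attribute can be sketched in plain Python. The grid below stands in for a spreadsheet region (header row with column numbers, first column with row letters). The function name `layout_to_wells` and the attribute name "treatment" are assumptions for this example, and the sketch does not read Excel files itself.

```python
def layout_to_wells(grid, attribute):
    """Unfold a plate-shaped grid into one record per annotated well.

    grid[0] holds the column numbers (first cell is ignored);
    every following line starts with a single row letter.
    """
    column_numbers = grid[0][1:]
    records = []
    for line in grid[1:]:
        row_letter, values = line[0], line[1:]
        plate_row = ord(row_letter.upper()) - ord("A") + 1
        for column, value in zip(column_numbers, values):
            if value is None or value == "":
                continue  # empty cell: well carries no annotation
            records.append(
                {"plateRow": plate_row, "plateColumn": int(column), attribute: value}
            )
    return records


if __name__ == "__main__":
    layout = [
        ["", 1, 2, 3],
        ["A", "DMSO", 10, 10],
        ["B", "DMSO", 3, ""],
    ]
    for record in layout_to_wells(layout, "treatment"):
        print(record)
```

The resulting records can then be joined to the measurement table on "plateRow" and "plateColumn" (plus the barcode when several plates share a layout).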
To compare data from different plates, days or runs, data must be normalized per plate
Selectable reference well population per plate
Percent-of-control (POC), normalized percent inhibition (NPI), Z-score
Robust statistics (median & MAD instead of mean & SD), with statistics table as second output
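The normalization methods named above can be written down compactly. The Python sketch below uses common textbook definitions and is applied to the wells of a single plate. The function names, the NPI sign convention (0 % at the no-inhibition control, 100 % at the full-inhibition control) and the MAD scaling factor 1.4826 are assumptions of this example, not a description of the node internals.

```python
from statistics import mean, median, stdev


def center_and_spread(values, robust=False):
    """Return (mean, SD), or (median, scaled MAD) when robust=True."""
    values = list(values)
    if robust:
        center = median(values)
        mad = median([abs(v - center) for v in values])
        return center, 1.4826 * mad  # scaling makes MAD comparable to SD
    return mean(values), stdev(values)


def poc(x, controls, robust=False):
    """Percent of control: 100 * x / center(controls)."""
    center, _ = center_and_spread(controls, robust)
    return 100.0 * x / center


def npi(x, no_inhibition, full_inhibition, robust=False):
    """Normalized percent inhibition between two control populations."""
    low, _ = center_and_spread(no_inhibition, robust)
    high, _ = center_and_spread(full_inhibition, robust)
    return 100.0 * (low - x) / (low - high)


def z_score(x, reference, robust=False):
    """Distance of x from the reference center in units of the reference spread."""
    center, spread = center_and_spread(reference, robust)
    return (x - center) / spread


if __name__ == "__main__":
    dmso = [100, 110, 90, 100]   # no-inhibition control wells of one plate
    killer = [10, 12, 8, 10]     # full-inhibition control wells of the same plate
    print(poc(50, dmso), npi(55, dmso, killer), z_score(110, dmso, robust=True))
```

Calling these functions once per plate, with that plate's own reference wells, is what makes values from different plates, days or runs comparable.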
Quality control statistics measure the assay performance
Selectable (multiple) reference well populations per plate
Z-prime factor (Z'), multivariate Z', strictly standardized mean difference (SSMD), coefficient of variation (CV)
Robust statistics (median & MAD instead of mean & SD)
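For reference, the single-parameter quality measures can be sketched in a few lines of Python using their standard definitions: Z' = 1 - 3 (SD_pos + SD_neg) / |mean_pos - mean_neg|, SSMD = (mean_pos - mean_neg) / sqrt(SD_pos^2 + SD_neg^2), and CV = SD / mean. The function names and the MAD scaling factor are assumptions of this example; the multivariate Z' is not covered.

```python
from math import sqrt
from statistics import mean, median, stdev


def center_and_spread(values, robust=False):
    """Return (mean, SD), or (median, scaled MAD) when robust=True."""
    values = list(values)
    if robust:
        center = median(values)
        mad = median([abs(v - center) for v in values])
        return center, 1.4826 * mad
    return mean(values), stdev(values)


def z_prime(positive, negative, robust=False):
    """Z-prime factor: separation of the two control populations of one plate."""
    c_pos, s_pos = center_and_spread(positive, robust)
    c_neg, s_neg = center_and_spread(negative, robust)
    return 1.0 - 3.0 * (s_pos + s_neg) / abs(c_pos - c_neg)


def ssmd(positive, negative, robust=False):
    """Strictly standardized mean difference (independent populations)."""
    c_pos, s_pos = center_and_spread(positive, robust)
    c_neg, s_neg = center_and_spread(negative, robust)
    return (c_pos - c_neg) / sqrt(s_pos ** 2 + s_neg ** 2)


def cv_percent(values, robust=False):
    """Coefficient of variation in percent."""
    center, spread = center_and_spread(values, robust)
    return 100.0 * spread / center


if __name__ == "__main__":
    positive = [10, 12, 8, 10]
    negative = [100, 110, 90, 100]
    print(z_prime(positive, negative), ssmd(positive, negative), cv_percent(negative))
```

A Z' above roughly 0.5 is commonly read as a well-separated assay; the robust variants are less sensitive to single outlier wells among the controls.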
Binning analysis describes changes in distributions
Great tool for moving from cell to well data (instead of just taking the mean per well)
"CellProfiler and KNIME: open source tools for high content screening." Methods in Molecular Biology (Clifton, N.J.) 2013; 986: 105-22
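One way to picture the binning idea is the following Python sketch: bin boundaries are taken from percentiles of a reference cell population (for example all cells in control wells), and each well is then described by the fraction of its cells falling into each bin. The function names, the use of equally spaced percentiles and the example numbers are assumptions of this sketch, not the exact procedure of the Binning Analysis node.

```python
from bisect import bisect_right
from statistics import quantiles


def bin_edges(reference_cells, n_bins=4):
    """Cut points that split the reference population into n_bins equal parts."""
    return quantiles(reference_cells, n=n_bins, method="inclusive")


def bin_fractions(cells, edges):
    """Fraction of a well's cells in each bin defined by the edges."""
    counts = [0] * (len(edges) + 1)
    for value in cells:
        counts[bisect_right(edges, value)] += 1
    total = len(cells)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


if __name__ == "__main__":
    reference = list(range(1, 101))              # per-cell values from control wells
    treated = [80, 90, 95, 60, 99, 85, 30, 77]   # per-cell values from one treated well
    edges = bin_edges(reference)
    print(edges)
    print(bin_fractions(reference, edges))
    print(bin_fractions(treated, edges))
```

A well whose cells pile up in the top bin shows a shifted distribution even when only a subpopulation responds, which a per-well mean can hide.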
179 plates x 384 wells = ~70,000 data points, times x parameters
Visualization of screening campaigns with meta data
Easy to spot patterns, drifts, errors… visually
New features:
New nodes
Enhancements
Plate Viewer was discontinued
Binning Analysis: work in progress
Transforms numbers to defined string
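A typical use for such a conversion is zero-padding plate or column numbers so that the resulting strings sort correctly. The short Python sketch below illustrates the idea; the function name `format_number` and its parameters are hypothetical and do not reflect the node's configuration options.

```python
def format_number(value, width=3, decimals=0):
    """Render a number as a fixed-width, zero-padded string."""
    return f"{value:0{width}.{decimals}f}"


if __name__ == "__main__":
    print(format_number(7))                               # plate 7 as '007'
    print(format_number(3.14159, width=6, decimals=2))    # '003.14'
    print(sorted(format_number(n) for n in (10, 2, 1)))   # sorts in numeric order
```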
Ok… now let's go to the workflow and see the nodes…
The data set: CellProfiler image data (pre-cleaned and saved as a .table file for technical reasons)
Antje Janosch, Tim Nicolaisen, Magdalena Rucinsk, Felix Meyerhofer (past), Holger Brandl (past)
Michael Berthold and the KNIME team