Lhasa trusted community KNIME nodes Data processing and metabolism - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Lhasa trusted community KNIME nodes

Data processing and metabolism prediction

Dr Samuel Webb samuel.webb@lhasalimited.org

slide-2
SLIDE 2

Who am I?

  • Working within the Research Group at Lhasa Limited
  • Activities include:
  • Software tool development
  • Data mining
  • Algorithm development
  • Managing Lhasa’s internal KNIME nodes and build
  • Managing Lhasa’s open source KNIME contribution
slide-3
SLIDE 3

What is KNIME?

  • Analytics platform
  • Core software open source
  • Software development kit (SDK) makes it easy to develop your own nodes

“Our KNIME Analytics Platform is the leading open solution for data-driven innovation, designed for discovering the potential hidden in data, mining for fresh insights, or predicting new futures. Organizations can take their collaboration, productivity and performance to the next level with a robust range of commercial extensions to our open source platform.” – www.knime.org/about

slide-4
SLIDE 4

KNIME and cheminformatics

  • Large number of downloads for the community plugins
  • Large number of community developers
  • Some examples of node types:
  • Chemical engines: ChemAxon, RDKit, CDK and Indigo
  • General purpose and algorithms: Vernalis, Enalos and Lhasa
  • Data searches: CIR and EMBL-EBI
slide-5
SLIDE 5

What does Lhasa use KNIME for?

  • Data processing:
  • Combining datasets: find overlap, compare activities when overlap exists, join in data where no overlap exists…
  • Monitoring:
  • Extracting data from the database which has been altered, identifying review work content
  • (Q)SAR
  • Model building, clustering, algorithm development, applicability domains, chemical space investigation…
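
The dataset-combining step can be sketched outside KNIME in a few lines of Python. This is only an illustration under assumptions: the compound identifiers, the activity labels and the `combine_datasets` helper are hypothetical, and the rule that the first dataset wins on conflict is a choice made for the example, not Lhasa's workflow.

```python
def combine_datasets(left, right):
    """Compare two {compound_id: activity} mappings (hypothetical layout)."""
    shared = sorted(left.keys() & right.keys())
    # Overlap exists: keep both activities so they can be compared.
    overlap = {cid: (left[cid], right[cid]) for cid in shared}
    # Overlapping compounds whose activity calls disagree.
    conflicts = [cid for cid in shared if left[cid] != right[cid]]
    # No overlap: join in the records present in only one dataset.
    # Assumption for this sketch: the left dataset wins where both exist.
    merged = {**right, **left}
    return overlap, conflicts, merged


# Made-up example data.
in_house = {"CPD-1": "active", "CPD-2": "inactive", "CPD-3": "active"}
public = {"CPD-2": "active", "CPD-3": "active", "CPD-4": "inactive"}

overlap, conflicts, merged = combine_datasets(in_house, public)
print(conflicts)
print(sorted(merged))
```

In a KNIME workflow the same steps would normally be done with joiner and comparison nodes rather than code.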

slide-6
SLIDE 6

LHASA CONTRIBUTION TO KNIME

Free, open source plugins released

slide-7
SLIDE 7

What have we released?

General nodes

  • Data manipulation
  • Discretise
  • Model scoring
  • Binary Scorer
  • Binned performance
  • Result
  • Table manipulation
  • Dumb Joiner (to be deprecated)
  • Row Splitter (col+)
  • Table to HTML

Metabolism nodes

  • SMARTCyp 2.4.2
  • Cytochrome P450 site of metabolism predictor
  • Integration of Patrik Rydberg’s open source tool
  • WhichCyp 1.2
  • Prediction of binding to Cytochrome P450 isoform(s)
  • Integration of Patrik Rydberg’s open source tool
slide-8
SLIDE 8

Disclaimer

  • These nodes / plugins are not Lhasa Limited products
  • Help / support for these nodes is provided via:
  • The KNIME forum: https://tech.knime.org/forum/lhasa-nodes
  • knime@lhasalimited.org (preferable to use the KNIME forum)

slide-9
SLIDE 9

More information

  • https://tech.knime.org/lhasa-nodes-for-knime
slide-10
SLIDE 10

Why would you use these nodes?

  • Here we calculate the performance of the Random Forest with Morgan and MACCS fingerprints
  • Convert the performance table to HTML and email
  • Filter out rows where either model predicts active
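
The filtering step can be pictured as a row split on several prediction columns at once. The sketch below is a plain Python illustration, not the node's code: the column names, the "active" label and the `split_rows` helper are assumptions made for the example.

```python
def split_rows(rows, prediction_columns, active_label="active"):
    """Split rows by whether any listed prediction column is active."""
    matched, unmatched = [], []
    for row in rows:
        is_active = any(row[col] == active_label for col in prediction_columns)
        (matched if is_active else unmatched).append(row)
    return matched, unmatched


# Made-up predictions from two hypothetical models.
rows = [
    {"id": "CPD-1", "RF Morgan": "active", "RF MACCS": "inactive"},
    {"id": "CPD-2", "RF Morgan": "inactive", "RF MACCS": "inactive"},
    {"id": "CPD-3", "RF Morgan": "active", "RF MACCS": "active"},
]

flagged, remaining = split_rows(rows, ["RF Morgan", "RF MACCS"])
print([row["id"] for row in flagged])
print([row["id"] for row in remaining])
```

Returning both halves keeps the example neutral on whether the flagged rows are the ones to keep or the ones to discard.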

slide-11
SLIDE 11

Why would you use these nodes?


slide-12
SLIDE 12

Generic nodes: model performance

  • Similar functionality to the Scorer node
  • Calculates various performance metrics for binary classification models
  • Can choose multiple prediction columns
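
As a rough picture of what scoring several prediction columns involves, here is a minimal Python sketch. It is not the node's implementation: the metric selection, the column names and the helper functions are assumptions chosen for illustration.

```python
def binary_metrics(actual, predicted, positive="active"):
    """Confusion-matrix counts and a few common metrics for one column."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios are reported as NaN instead of raising.
        return numerator / denominator if denominator else float("nan")

    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


def score_columns(rows, actual_column, prediction_columns, positive="active"):
    """Score every selected prediction column against the same actual column."""
    actual = [row[actual_column] for row in rows]
    return {
        col: binary_metrics(actual, [row[col] for row in rows], positive)
        for col in prediction_columns
    }


# Made-up data: one experimental column, two hypothetical model columns.
rows = [
    {"Actual": "active", "RF Morgan": "active", "RF MACCS": "active"},
    {"Actual": "active", "RF Morgan": "inactive", "RF MACCS": "active"},
    {"Actual": "inactive", "RF Morgan": "inactive", "RF MACCS": "active"},
    {"Actual": "inactive", "RF Morgan": "inactive", "RF MACCS": "inactive"},
]

results = score_columns(rows, "Actual", ["RF Morgan", "RF MACCS"])
for column, metrics in results.items():
    print(column, metrics)
```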

slide-13
SLIDE 13

Generic nodes: table to HTML

  • Convert a table to a single HTML cell
  • The String renderer will render HTML tags
  • Select which columns to include
  • StringValue, IntValue, DoubleValue
  • Creates a single cell output
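
The idea of collapsing a table into one HTML string can be sketched in Python with the standard library. This is an assumption-laden illustration rather than the node's code: the `table_to_html` helper, the markup layout and the example columns are all hypothetical.

```python
from html import escape


def table_to_html(rows, columns):
    """Render the selected columns of a list-of-dicts table as one HTML string."""
    header = "".join(f"<th>{escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{escape(str(row[col]))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


# Made-up performance table; only two of the three columns are included.
rows = [
    {"Model": "RF Morgan", "Accuracy": 0.75, "Notes": "internal"},
    {"Model": "RF MACCS", "Accuracy": 0.75, "Notes": "internal"},
]

html_cell = table_to_html(rows, ["Model", "Accuracy"])
print(html_cell)
```

Cell values are escaped so that characters such as `<` in the data do not break the generated markup.
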
slide-14
SLIDE 14

SMARTCyp 2.4.2

  • SMARTCyp is a method for predicting which sites in a molecule are most liable to metabolism by Cytochrome P450.
  • It has been shown to be applicable to metabolism by the isoforms 1A2, 2A6, 2B6, 2C8, 2C19, 2E1, and 3A4, and specific models for the isoform 2C9 and isoform 2D6 are included in the SMARTCyp 2.4.2 KNIME node
  • SMARTCyp is developed by the Department of Drug Design and Pharmacology at the University of Copenhagen and is funded by Lhasa Limited. More details can be found at: http://www.farma.ku.dk/smartcyp/about.php

slide-15
SLIDE 15

SMARTCyp 2.4.2 usage

  • Let’s recreate the results table from
  • http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4055970/
  • SMARTCyp: A 2D Method for Prediction of Cytochrome P450-Mediated Drug Metabolism
  • Patrik Rydberg, David E. Gloriam, Jed Zaretzki, Curt Breneman, and Lars Olsen
  • Metabolic position = any site listed as primary, secondary or tertiary
  • Use the top 3 predicted sites. Accuracy increases as you increase the rank limit
  • When considering only the top ranked site there is a 65% accuracy in identifying an experimentally seen site of metabolism (SOM) vs 81% using the top 3 sites
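
The rank-limit calculation can be sketched as follows. The data here is invented purely to show the arithmetic and does not reproduce the paper's results. The `top_k_accuracy` helper and the use of atom indices as site identifiers are assumptions for this example.

```python
def top_k_accuracy(ranked_predictions, experimental_soms, k):
    """Fraction of compounds with an experimental SOM among the top k ranked sites."""
    if not ranked_predictions:
        raise ValueError("no compounds to score")
    hits = 0
    for compound, ranked_atoms in ranked_predictions.items():
        # A compound counts as a hit if any top-k atom is an observed site.
        if set(ranked_atoms[:k]) & experimental_soms[compound]:
            hits += 1
    return hits / len(ranked_predictions)


# Made-up example: atom indices ordered from best to worst predicted rank.
ranked = {
    "cpd-A": [7, 2, 11],
    "cpd-B": [4, 9, 1],
    "cpd-C": [3, 8, 5],
    "cpd-D": [6, 1, 2],
}
# Made-up observed sites (primary, secondary or tertiary all count).
observed = {
    "cpd-A": {7},
    "cpd-B": {9, 12},
    "cpd-C": {5},
    "cpd-D": {10},
}

top1 = top_k_accuracy(ranked, observed, 1)
top3 = top_k_accuracy(ranked, observed, 3)
print(top1, top3)
```

Raising the rank limit can only add hits, which is why the accuracy never decreases as k grows.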

slide-16
SLIDE 16

SMARTCyp 2.4.2 usage

slide-17
SLIDE 17

SMARTCyp 2.4.2 usage

slide-18
SLIDE 18

SMARTCyp 2.4.2 usage

slide-19
SLIDE 19

SMARTCyp 2.4.2 usage

slide-20
SLIDE 20

SMARTCyp 2.4.2 usage

  • Here we’ve incorporated multiple chemical engines from the same platform

  • RDKit
  • Rendering
  • CDK
  • Rendering
  • SMARTCyp processing
slide-21
SLIDE 21

WhichCyp

  • Predicts binding to Cyp isoforms: 1A2, 2C9, 2C19, 2D6 and 3A4.
  • Further reading:
  • Michal Rostkowski, Ola Spjuth and Patrik Rydberg. WhichCyp: Prediction of Cytochromes P450 Inhibition, Bioinformatics, 2013, 29, 2051-2052

slide-22
SLIDE 22

WhichCyp usage

  • Renders images of the predictions as a PNG
  • May be updated to SVG in the future
  • Input: a structure column that is compatible with a CDK Value such as:
  • Mol
  • SDF
  • Smiles
  • CDK
  • Outputs the values you would get in the CSV file when running manually:
  • Binding, Missing Signatures and sensitivity warnings

slide-23
SLIDE 23

WHERE CAN I GET THEM?

slide-24
SLIDE 24

Getting our nodes:

  • Download KNIME: https://www.knime.org/downloads/overview
  • Select the “+ all free extensions” download and Lhasa’s nodes will be included
slide-25
SLIDE 25

Getting our nodes:

  • Alternatively they can be added to an existing KNIME installation
  • Trusted Community Contributions update site: http://update.knime.org/community-contributions/trusted/3.1

slide-26
SLIDE 26

Thank you

Support: https://tech.knime.org/forum