
Chemical Insights from a Random Forest Prediction of Molecular Quantum Properties

Chemical Insights from a Random Forest Prediction of Molecular Quantum Properties. Beomchang Kang, Seoul National University. 2019.11.8, 1st XAIENCE Conference.


  1. Chemical Insights from a Random Forest Prediction of Molecular Quantum Properties Beomchang Kang Seoul National University 2019.11.8, 1st XAIENCE Conference

  2. Fluorescent molecule • Bio-imaging • Specification • Cell organelles • Proteins • Observation • Structure • Dynamics

  3. Good fluorescent molecule? • High quantum yield in the visible region • Distinctive color • Low toxicity • High synthetic accessibility

  4. Towards discovery of novel and effective fluorescent molecules • Prediction of quantum properties for a given molecule • High quantum yield • Distinctive color • Searching the chemical space for molecules of desired properties

  5. Today, I focus on… • Prediction of • Oscillator strength to get high quantum yield • Excitation energy • Gaining chemical insight from Random Forest results

  6. Excitation Energy • Energy difference between two states • Electronic transition • Determines color
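
Because the excitation energy sets the photon energy of the transition, it maps onto a wavelength through λ = hc/E, which is how it "determines color". A minimal sketch of that conversion follows; the function name and the sample energies are illustrative and do not come from the talk.

```python
# hc in eV*nm (rounded CODATA value), so wavelength_nm = HC_EV_NM / energy_eV
HC_EV_NM = 1239.841984


def excitation_energy_to_wavelength_nm(energy_ev: float) -> float:
    """Convert an excitation energy in eV to the matching photon wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("excitation energy must be positive")
    return HC_EV_NM / energy_ev


# Illustrative energies only; roughly 1.6 to 3.3 eV covers the visible range.
for energy in (1.8, 2.5, 3.1):
    wavelength = excitation_energy_to_wavelength_nm(energy)
    print(f"{energy:.1f} eV -> {wavelength:.0f} nm")
```

Lower excitation energies land toward the red end of the spectrum and higher ones toward the violet end.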

  7. Oscillator strength (OS) • Dimensionless quantity • Probability of absorption or emission of electromagnetic radiation in transitions between energy levels • To have high OS • Orbital shapes of the two states must be different

  8. Methods

  9. Prediction of molecular properties Molecule Predictor Property

  10. PubChemQC Database • Molecular quantum calculations • DFT • TD-DFT • From PubChem • Molecules that have actually been synthesized • Molecular orbitals • Quantum properties • Classical properties

  11. Data set for RF • From PubChemQC • Only H, B, C, N, O, F, P, S, Cl • Only neutral molecules • Randomly selected 0.5 M compounds • Training:Test = 9:1
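
The selection rules on this slide can be sketched as a filter followed by a shuffled 9:1 split. This is only an illustration under assumptions: the record layout (name, element symbols, total charge), the helper names, and the toy molecules are hypothetical, and the talk does not say how the filtering was actually implemented.

```python
import random

# Element whitelist taken from the slide.
ALLOWED_ELEMENTS = {"H", "B", "C", "N", "O", "F", "P", "S", "Cl"}


def keep_molecule(elements, charge):
    """Keep neutral molecules built only from the allowed elements."""
    return charge == 0 and set(elements) <= ALLOWED_ELEMENTS


def train_test_split_9_to_1(items, seed=0):
    """Shuffle reproducibly and split into 90% training / 10% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 10
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (name, element symbols present, total charge).
records = [
    ("ethanol", ["C", "H", "O"], 0),
    ("bromobenzene", ["C", "H", "Br"], 0),  # rejected: Br is not allowed
    ("acetate", ["C", "H", "O"], -1),       # rejected: not neutral
    ("thiophene", ["C", "H", "S"], 0),
]
selected = [name for name, elems, charge in records if keep_molecule(elems, charge)]

# Stand-in for the selected compounds: 100 integer IDs.
train, test = train_test_split_9_to_1(range(100))
print(selected, len(train), len(test))
```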

  12. Random Forest • Advantages • Simple • White-box • Feature importance • From feature importance • Chemical insight • To be compared with deep learning methods
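
To illustrate how a random forest exposes feature importance, here is a small sketch on synthetic bit vectors. It assumes scikit-learn's `RandomForestRegressor`; the talk does not name the library or hyperparameters used, and the data, the sizes, and the "planted" bit are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy sizes, far smaller than the 0.5 M molecules and 16384 bits in the talk.
n_molecules, n_bits = 500, 64
X = rng.integers(0, 2, size=(n_molecules, n_bits))

# Hypothetical setup: one bit drives the target, plus a little noise.
planted_bit = 7
y = 0.4 * X[:, planted_bit] + 0.01 * rng.normal(size=n_molecules)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Impurity-based importances, one value per input bit, normalized to sum to 1.
importances = model.feature_importances_
top_bit = int(np.argmax(importances))
print("most important bit:", top_bit)
```

In this toy setting the forest assigns nearly all importance to the planted bit, which is the kind of signal that can then be traced back to fragments.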

  13. Extended-Connectivity Fingerprint [ECFP] • 2D Molecule -> Identifiers • Parameter - Radius • Bit vector of ECFP • Hashing • One-hot encoding (binary) • Parameter - # of bits
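
The "identifiers -> hashing -> bit vector" step can be sketched as below. This is a toy illustration of folding identifiers into a fixed-length binary vector, not the ECFP algorithm itself: a real generator derives integer identifiers from atom environments up to the chosen radius, whereas here the identifiers are hypothetical strings and SHA-256 is an arbitrary stand-in hash.

```python
import hashlib


def fold_identifiers(identifiers, n_bits=16384):
    """Set one bit per identifier at position hash(identifier) mod n_bits."""
    bits = [0] * n_bits
    for ident in identifiers:
        digest = hashlib.sha256(ident.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_bits
        bits[index] = 1  # binary: presence only, no counts
    return bits


# Hypothetical substructure identifiers for one molecule.
fragments = ["C=C", "c1ccoc1", "C(=O)O"]
vector = fold_identifiers(fragments)
print(len(vector), sum(vector))
```

Different identifiers can fall on the same bit, and a smaller number of bits makes such collisions more frequent; this is why a single bit can correspond to many fragments.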

  14. Results & Discussion

  15. RF result - Excitation Energy • RMSE 0.4500 eV • Pearson r 0.8689

  16. RF result - Oscillator strength • RMSE 0.066 • Pearson r 0.7300
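
For reference, the two metrics reported for both properties can be computed as in the sketch below. The function names and the sample values are illustrative; they are not the talk's data.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up reference and predicted oscillator strengths.
y_true = [0.10, 0.02, 0.45, 0.00]
y_pred = [0.12, 0.05, 0.40, 0.01]
print(rmse(y_true, y_pred), pearson_r(y_true, y_pred))
```

RMSE carries the unit of the target (eV for excitation energy, none for oscillator strength), while Pearson r is unitless.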

  17. 0.5 M set • Mean 0.042 • Median 0.009 • Std 0.096

  18. Feature importance to Fragments • Bit number: 1 … 6128, 6129, 6130 … 16384 • Feature importance: 0.xxx, 0.xxx, 0.022, 0.xxx, 0.xxx • Many fragments…

  19. Random Forest - Feature importance • Oscillator strength; bit number 6129 • ECFP6; Cc1=cc=c(o1)c=C; oscillator strength 0.4690 • n_bit = 16384 • Feature importance > 0.02 • Bit number | Feature importance | # of fragments: 9352 | 0.0330 | 115; 8017 | 0.0251 | 107; 6192 | 0.0218 | 129

  20. Important Fragments • # of molecules which have tag fragment > 3 • Feature importance > 0.02 • ECFP6, 16384 vector • Average of OS > 0.1 • Fragment radius | Mean OS | # of molecules: 1 | 0.175 | 10590; 3 | 0.175 | 4; 2 | 0.342 | 9; 3 | 0.211 | 11; 1 | 0.207 | 6263; 3 | 0.101 | 4
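
The three selection criteria on this slide (more than 3 molecules, feature importance above 0.02, mean OS above 0.1) can be sketched as a simple filter. Everything here is assumed for illustration: the function name, the data layout, the fragment labels, and the numbers are hypothetical, and each fragment's importance is taken to be that of the bit it maps to.

```python
from statistics import mean


def select_fragments(fragment_os, fragment_importance,
                     min_molecules=3, min_importance=0.02, min_mean_os=0.1):
    """Return {fragment: (mean OS, molecule count)} for fragments passing all cuts."""
    selected = {}
    for fragment, os_values in fragment_os.items():
        if len(os_values) <= min_molecules:
            continue  # too few molecules contain this fragment
        if fragment_importance.get(fragment, 0.0) <= min_importance:
            continue  # its bit is not important enough
        mean_os = mean(os_values)
        if mean_os > min_mean_os:
            selected[fragment] = (mean_os, len(os_values))
    return selected


# Hypothetical fragments with the OS values of the molecules that contain them.
fragment_os = {
    "frag_A": [0.30, 0.35, 0.40, 0.32],        # passes all three cuts
    "frag_B": [0.50, 0.60],                    # fails: only 2 molecules
    "frag_C": [0.01, 0.02, 0.03, 0.02, 0.01],  # fails: mean OS too low
    "frag_D": [0.20, 0.30, 0.25, 0.22],        # fails: low importance
}
fragment_importance = {"frag_A": 0.022, "frag_B": 0.033,
                       "frag_C": 0.025, "frag_D": 0.001}

selected = select_fragments(fragment_os, fragment_importance)
print(selected)
```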

  21. Fragment of high OS • C(=C)c(c)o • Radius = 2 • 9 molecules • Mean of OS = 0.342

  22. Ethyl 5-ethenylfuran-2-carboxylate OS = 0.5230

  23. 5-Ethenyl-3H-1,3-oxazole-2-thione OS = 0.4790

  24. Ethyl 2-(5-ethenylfuran-2-yl)propanoate OS = 0.4730

  25. Thank You!
