  1. Deep Learning Approaches for Prediction of Liquid Properties Hyuntae Lim and YounJoon Jung Computational Nano-Bio Chemistry Laboratory Seoul National University November 8, 2019

  2. Explicit Solvation Model C. Caleman, J. S. Hub, P. J. van Maaren, D. van der Spoel, Proc. Natl. Acad. Sci., 2011, 108, 6838-6842 ◮ Involves detailed structural information of the solvation shell. ◮ But slow! Generally impractical to combine with QM calculations.

  3. Implicit Solvation Model APBS-PDB2PQR manual, https://apbs-pdb2pqr.readthedocs.io ◮ The solvent enters the Poisson-Boltzmann (PB) equation as a continuum. ◮ Polarizable continuum models (D-PCM, C-PCM (COSMO), ...) ◮ Generalized Born approximations (SMD, SM8, SM12, ...)

  4. QSPR Solvation Model ◮ Quantitative structure-property relationship https://chem.libretexts.org ◮ For example, we can roughly guess the water solubility of various alcohols from the length of their hydrocarbon chains. ◮ A more detailed description of the molecular structure and a non-linear mathematical model (especially machine learning) would give us more accurate predictions. ◮ Lacks a concrete theoretical background and detailed information on the solvation mechanism, but is fast and transferable.

  5. Basic Concepts of QSPR https://chem.libretexts.org The encoding function extracts structural features of a given molecule. ex) Substructure fingerprints, topological fingerprints, basic experimental measures, molecular graphs, ... The mapping function predicts the target property using regression analysis. ex) Multiple linear regression, kernel ridge regression, support vector machines, decision trees, neural networks, ...
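A minimal sketch of this encode-then-map pipeline, assuming RDKit and scikit-learn: a Morgan fingerprint plays the encoding function and kernel ridge regression the mapping function. The molecules and property values below are placeholders for illustration, not data from this work.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.kernel_ridge import KernelRidge

def encode(smiles, radius=2, n_bits=1024):
    """Encoding function: Morgan substructure fingerprint as a bit vector."""
    fp = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), radius, nBits=n_bits)
    arr = np.zeros(n_bits)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

# Hypothetical alcohols with made-up solubility values, for illustration only.
smiles = ["CO", "CCO", "CCCO", "CCCCO", "CCCCCO"]
y = np.array([1.0, 0.6, 0.2, -0.2, -0.6])

X = np.stack([encode(s) for s in smiles])
mapper = KernelRidge(alpha=1e-3).fit(X, y)        # mapping function
print(mapper.predict(encode("CCCCCCO")[None, :]))  # predict an unseen alcohol
```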

  6. Word Embedding ◮ Word embedding generates vector representations of words from their distribution across sentences of natural language. ◮ Their geometric positions encode linguistic features! ◮ Words with similar semantic meanings occupy adjacent positions. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, arXiv:1310.4546
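A toy sketch of how such embeddings are trained, assuming the gensim library; the corpus here is an artificial placeholder, so the resulting neighborhoods are only illustrative.

```python
from gensim.models import Word2Vec

# Tiny artificial corpus; real embeddings require millions of sentences.
sentences = [
    ["water", "is", "a", "polar", "solvent"],
    ["methanol", "is", "a", "polar", "solvent"],
    ["hexane", "is", "a", "nonpolar", "solvent"],
] * 100  # repeated so the toy model has enough co-occurrence statistics

# sg=1 selects the skip-gram objective used by Mikolov et al.
model = Word2Vec(sentences, vector_size=32, window=3, min_count=1, sg=1, epochs=20)

# Words used in similar contexts end up with nearby vectors:
print(model.wv.most_similar("water", topn=2))
```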

  7. Word Embedding ◮ Can we use word embedding to encode structural features of compounds? ◮ An application in computational chemistry & cheminformatics: Mol2vec S. Jaeger, S. Fulle and S. Turk, J. Chem. Inf. Model. , 2018, 58 , 27-35

  8. DELFOS DEep Learning model for prediction of solvation Free energies in generic Organic Solvents

  9. Mathematical Workflow ◮ We start with the vector sequences of the given solute and solvent:
      {x_α}: solute sequence, {x_γ}: solvent sequence (1)
  ◮ First, a recurrent layer encodes each molecular structure:
      {h_α} = RNN({x_α}), {h_γ} = RNN({x_γ}) (2)
  ◮ Then the attention mechanism generates a context vector for each site:
      c_α = Σ_γ Softmax(h_α · h_γ) h_γ, c_γ = Σ_α Softmax(h_γ · h_α) h_α (3)

  10. Mathematical Workflow ◮ At the final stage of each encoder, a max-pooling layer reduces the dimension of the intermediate data:
      u = MaxPooling({h_α ; c_α}), v = MaxPooling({h_γ ; c_γ}) (4)
  ◮ Their concatenation is the input of the predictor (mapping function):
      h^(0) = [u ; v] (5)
  ◮ The mapping function is a simple stack of dense layers:
      h^(1) = W^(1) h^(0) + b^(1), ..., h^(n) = W^(n) h^(n-1) + b^(n) (6)
  ◮ We obtain the target property from the last dense layer.
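A minimal sketch of the workflow in equations (1)-(6), written here in PyTorch; the original implementation may differ, and the embedding size, hidden width, and MLP depth (and the ReLU) are illustrative assumptions, not the published hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """One branch: a BiGRU over substructure embeddings, eq. (2)."""
    def __init__(self, emb_dim=300, hidden=150):
        super().__init__()
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):
        h, _ = self.rnn(x)   # {h} = RNN({x}), shape (B, L, 2*hidden)
        return h

def cross_attention(h_a, h_g):
    """Eq. (3): each solute site's context is a softmax-weighted sum
    over solvent hidden states, and vice versa."""
    scores = torch.bmm(h_a, h_g.transpose(1, 2))      # dot products (B, La, Lg)
    c_a = torch.bmm(F.softmax(scores, dim=-1), h_g)   # contexts for solute sites
    c_g = torch.bmm(F.softmax(scores.transpose(1, 2), dim=-1), h_a)
    return c_a, c_g

class DelfosLike(nn.Module):
    """Sketch of the full workflow: encode, attend, pool, concatenate, map."""
    def __init__(self, emb_dim=300, hidden=150, mlp_hidden=256):
        super().__init__()
        self.enc_solute = Encoder(emb_dim, hidden)
        self.enc_solvent = Encoder(emb_dim, hidden)
        d = 4 * hidden  # width of [h ; c] per branch
        self.mlp = nn.Sequential(
            nn.Linear(2 * d, mlp_hidden), nn.ReLU(),
            nn.Linear(mlp_hidden, 1))  # last dense layer -> target property

    def forward(self, x_solute, x_solvent):
        h_a = self.enc_solute(x_solute)
        h_g = self.enc_solvent(x_solvent)
        c_a, c_g = cross_attention(h_a, h_g)
        u = torch.cat([h_a, c_a], dim=-1).max(dim=1).values  # eq. (4)
        v = torch.cat([h_g, c_g], dim=-1).max(dim=1).values
        return self.mlp(torch.cat([u, v], dim=-1))           # eqs. (5)-(6)

# Toy shapes: batch of 2, solute length 12, solvent length 8, 300-dim embeddings.
out = DelfosLike()(torch.randn(2, 12, 300), torch.randn(2, 8, 300))
print(out.shape)  # torch.Size([2, 1])
```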

  11. Substructure Embedding ◮ Embedding procedure of Mol2vec ◮ The Morgan algorithm assigns identical identifiers to substructures in identical environments. ◮ The environment is determined by the neighboring atoms within a given topological distance: r = 0 (the atom itself), r = 1 (nearest neighbors), ... S. Jaeger, S. Fulle and S. Turk, J. Chem. Inf. Model., 2018, 58, 27-35
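A sketch of how per-atom Morgan identifiers can be read out as a Mol2vec-style "sentence" of substructure words, assuming RDKit; this is a simplified rendering of the idea, not the exact Mol2vec implementation.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def morgan_sentence(smiles, radius=1):
    """List the Morgan identifiers of every atom at radii 0..radius,
    so each molecule becomes a 'sentence' of substructure 'words'."""
    mol = Chem.MolFromSmiles(smiles)
    info = {}
    AllChem.GetMorganFingerprint(mol, radius, bitInfo=info)
    # info maps identifier -> tuple of (atom_index, radius) occurrences
    words = {}
    for identifier, occurrences in info.items():
        for atom_idx, r in occurrences:
            words[(atom_idx, r)] = str(identifier)
    n = mol.GetNumAtoms()
    return [words[(a, r)] for a in range(n) for r in range(radius + 1)
            if (a, r) in words]

# Ethanol: 3 heavy atoms x 2 radii -> up to 6 substructure words.
print(morgan_sentence("CCO"))
```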

  12. Recurrent Neural Network ◮ Appropriate for handling sequential inputs, since each cell also considers the forward or backward inputs. ◮ In the proposed model, two modified RNN variants (LSTM, GRU) are adopted to learn structural information from the embedding layer.
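A brief illustration of the bidirectional reading, assuming PyTorch (as in the encoder sketch above); the forward and backward hidden states are concatenated at every position, so the output width is twice the hidden size. The sizes are arbitrary.

```python
import torch
import torch.nn as nn

# A bidirectional LSTM reads the substructure sequence in both directions.
rnn = nn.LSTM(input_size=300, hidden_size=150, batch_first=True, bidirectional=True)
x = torch.randn(1, 10, 300)   # one molecule, 10 embedded substructures
h, _ = rnn(x)
print(h.shape)                # torch.Size([1, 10, 300]); 300 = 2 * 150
```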

  13. Attention Mechanism ◮ Which parts of the given image and which words in the caption are related? ◮ Chemical application: which substructures are strongly related to each other? K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, arXiv:1502.03044

  14. Machine Learning Results ◮ Minnesota solvation database (MNSOL), 2012 ◮ The database contains 2,495 solvation free energies for 418 solutes and 91 solvents. ◮ Random K-fold cross-validation was performed to obtain predictions for the full dataset. https://scikit-learn.org/stable/modules/cross_validation.html
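A sketch of the random K-fold protocol with scikit-learn, so that every entry is predicted exactly once by a model that never saw it in training. The arrays are random placeholders for the encoded MNSOL entries, and Ridge stands in for the actual network.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Placeholders for encoded (solute, solvent) pairs and their solvation energies.
rng = np.random.default_rng(0)
X, y = rng.random((2495, 64)), rng.random(2495)

preds = np.empty_like(y)
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    preds[test_idx] = model.predict(X[test_idx])  # each sample predicted once
print(np.abs(preds - y).mean())                   # MAE over the full database
```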

  15. Machine Learning Results ◮ Substructural learning with the recurrent network exhibits better results.

  16. Machine Learning Results ◮ Substructural learning with the recurrent network exhibits better results.

  17. Machine Learning Results ◮ Comparison with ab initio implicit solvation models (MAE in kcal/mol):

  Solvent      Method                 N_data  MAE
  Aqueous      SM12CM5/B3LYP/MG3S        374  0.77
               SM8/M06-2X/6-31G(d)       366  0.89
               SMD/M05-2X/6-31G(d)       366  0.88
               COSMO-RS/BP86/TZVP        274  0.52
               D-COSMO-RS/BP86/TZVP      274  0.94
               Delfos/BiLSTM             374  0.64
               Delfos/BiGRU              374  0.68
               Delfos w/o RNNs           374  0.90
  Non-aqueous  SM12CM5/B3LYP/MG3S       2129  0.54
               SM8/M06-2X/6-31G(d)      2129  0.61
               SMD/M05-2X/6-31G(d)      2129  0.67
               COSMO-RS/BP86/TZVP       2072  0.41
               D-COSMO-RS/BP86/TZVP     2072  0.62
               Delfos/BiLSTM            2121  0.24
               Delfos/BiGRU             2121  0.24
               Delfos w/o RNNs          2121  0.36

  A. V. Marenich, C. J. Cramer and D. G. Truhlar, J. Chem. Theory Comput., 2013, 9, 609-620
  A. Klamt and M. Diedenhofen, J. Phys. Chem. A, 2015, 119, 5439-5445

  18. Extrapolation Robustness ◮ For structurally new compounds, the model has to perform an extrapolation. https://scikit-learn.org/stable/modules/cross_validation.html ◮ A CV task based on the K-means clustering algorithm can give us a generalized measure of extrapolation performance. ◮ In the cluster CV task, each fold contains structurally similar compounds.
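One way to realize such a cluster CV task, assuming scikit-learn: K-means clusters over structural fingerprints serve as CV groups, so each held-out fold contains compounds whose close analogues were never seen in training. All arrays below are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X, y = rng.random((2495, 64)), rng.random(2495)  # placeholder fingerprints/energies

# Group structurally similar compounds, then keep whole clusters out of
# the training set so every test fold forces an extrapolation.
groups = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)

errors = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=groups):
    model = Ridge().fit(X[train_idx], y[train_idx])
    errors.append(np.abs(model.predict(X[test_idx]) - y[test_idx]).mean())
print(np.mean(errors))  # cluster-CV MAE, typically worse than random CV
```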

  19. Extrapolation Robustness (MAE and RMSE in kcal/mol)

  Solvent       Method                     N_data  MAE   RMSE
  All           COSMO/BP86/TZVP              2346  2.15  2.57
                COSMO-RS/BP86/TZVP           2346  0.42  0.75
                SMD/PM6                      2500  -     3.6
                Delfos/Random CV             2495  0.30  0.57
                Delfos/Solvent Clustering    2495  0.82  1.45
                Delfos/Solute Clustering     2495  0.99  1.61
  Toluene       MD/GAFF                        21  0.48  0.63
                MD/AMOEBA                      21  0.92  1.18
                COSMO/BP86/TZVP                21  2.17  2.71
                COSMO-RS/BP86/TZVP             21  0.27  0.34
                Delfos/Random CV               21  0.16  0.37
                Delfos/Solvent Clustering      21  0.66  1.10
                Delfos/Solute Clustering       21  0.93  1.46
  Chloroform    MD/GAFF                        21  0.92  1.11
                MD/AMOEBA                      21  1.68  1.97
                COSMO/BP86/TZVP                21  1.76  2.12
                COSMO-RS/BP86/TZVP             21  0.50  0.66
                Delfos/Random CV               21  0.35  0.56
                Delfos/Solvent Clustering      21  0.78  0.87
                Delfos/Solute Clustering       21  1.14  1.62
  Acetonitrile  MD/GAFF                         6  0.43  0.52
                MD/AMOEBA                       6  0.73  0.77
                COSMO/BP86/TZVP                 6  1.42  1.58
                COSMO-RS/BP86/TZVP              6  0.33  0.38
                Delfos/Random CV                6  0.29  0.39
                Delfos/Solvent Clustering       6  0.74  0.82
                Delfos/Solute Clustering        6  0.80  0.94
  DMSO          MD/GAFF                         6  0.61  0.75
                MD/AMOEBA                       6  1.12  1.21
                COSMO/BP86/TZVP                 6  1.31  1.42
                COSMO-RS/BP86/TZVP              6  0.56  0.73
                Delfos/Random CV                6  0.41  0.44
                Delfos/Solvent Clustering       6  0.93  1.19
                Delfos/Solute Clustering        6  0.91  1.11

  ◮ Although extrapolation causes a significant decline in prediction accuracy, the results are still within the chemical accuracy of 1.0 kcal/mol.
  J. C. Kromann, C. Steinmann and J. H. Jensen, J. Chem. Phys., 2018, 149, 104102

  20. Molecular Attention Map ◮ Nitromethane in four solvents; the polarity increases towards the right. ◮ Which substructures are chemically most similar to the given solvent?

  21. Conclusions ◮ We introduced a QSPR regression neural network for solvation energy estimation that is inspired by NLP. ◮ The proposed model exhibits excellent prediction accuracy, comparable with that of several well-known QM solvation models, when the neural network is trained with sufficiently varied chemical structures. ◮ The cluster CV task underlines the importance of careful preparation of ML databases, even though Delfos still delivers predictions comparable with some theoretical approaches. ◮ Our model not only provides a simple estimate of the target property but also offers important information about which substructures play a dominant role. ◮ We are currently working on constructing a more extrapolation-robust model and on finding connections between the attention mechanism and physical quantities.

  22. Prediction of Liquid Viscosity ◮ Viscosity prediction for hydrocarbons, alcohols and esters. ◮ MD (molecular descriptors), FGCD (functional group count descriptors) D. A. Saldana, L. Starck, P. Mougin, B. Rousseau, N. Ferrando, and B. Creton, Energy Fuels, 2012, 26, 2416-2426

  23. Prediction of Liquid Viscosity ◮ Although the encoding method is rather old-fashioned, the model showed good accuracy on 407 molecules. D. A. Saldana, L. Starck, P. Mougin, B. Rousseau, N. Ferrando, and B. Creton, Energy Fuels, 2012, 26, 2416-2426

  24. Prediction of Liquid Viscosity ◮ DIPPR 801 database (from AIChE) ◮ Contains 34 constant properties and 15 T-dependent properties ◮ Including liquid viscosity and dielectric constant
