
Contents Trend in Computer-Aided Materials Discovery - PowerPoint PPT Presentation



  1. Contents
     - Trend in Computer-Aided Materials Discovery
     - High-Throughput Computational Screening & Exhaustive Enumeration
     - Deep-Learning-based Evolutionary Design
     - Deep-Learning-based Inverse Design
     - Efficacy of Computer-Aided Materials Discovery

  2. Trend in Computer-Aided Materials Discovery
     - For accelerated materials discovery
     - [Figure: evolution of discovery approaches across 1st Gen., 2nd Gen., and 3rd Gen.
       Approaches: Trial-and-Error (high cost), Simulation (low throughput), Virtual screening (low hit-rate), Targeted design (high hit-rate).
       Enabling technologies: First-principles Quantum Chemistry, High-performance Computing, Machine Learning.
       Aims: Iterative experiments, Pre-validation, High throughput, Right solutions with minimum effort.
       Stage labels: Conventional, Rationalization, Efficiency, Intelligence.]

  3. Trend in Computer-Aided Materials Discovery
     - Prediction of materials property based on machine learning
       - Build-up of Materials vs. Property DB → Materials Informatics
     - [Figure: timeline.
       Introduction stage of cheminformatics: QSAR* ('62, Hansch & Fujita), ANN** in Chemistry ('71), SMILES*** ('87, Weininger).
       Development stage of machine learning: Kernel methods, Bayesian approaches, Deep Learning; entries include Bayesian Modeling and Graph Kernels, with date labels ('05 @ UC Irvine), ('09 @ MIT), ('16 @ Stanford), ('18 @ Harvard).]
     - * QSAR: Quantitative Structure-Activity Relationship
     - ** ANN: Artificial Neural Network
     - *** SMILES: Simplified Molecular-Input Line-Entry System
     - Process of machine learning in materials research: Descriptor → Training → Analysis
       - Descriptor types: vector, graphs, images
       - SMILES: CC(C)NCC(O)COC1=CC(CC2=CC=CC=C2)=C(CC(N)=O)C=C1
       - Fingerprint: 011100011111101010010100100000101010001001010…
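The descriptor step above turns a SMILES string into a fixed-length bit vector. The sketch below is a toy illustration only: it hashes character substrings of the SMILES text into bits, which is not ECFP (real circular fingerprints hash atom neighbourhoods of the molecular graph, usually through a cheminformatics toolkit). The function name `toy_fingerprint` and the parameters `n_bits` and `max_len` are assumptions made for this example.

```python
import zlib


def toy_fingerprint(smiles: str, n_bits: int = 64, max_len: int = 3) -> list[int]:
    """Hash every substring of length 1..max_len into a fixed-length bit vector.

    Toy stand-in for a structural fingerprint; NOT an ECFP implementation.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # crc32 is deterministic across runs, unlike Python's built-in hash()
            index = zlib.crc32(fragment.encode("ascii")) % n_bits
            bits[index] = 1
    return bits


# SMILES string taken from the slide
smiles = "CC(C)NCC(O)COC1=CC(CC2=CC=CC=C2)=C(CC(N)=O)C=C1"
fp = toy_fingerprint(smiles)
print("".join(map(str, fp)))
```

The same input always maps to the same bit string, which is the property a learning model needs from a descriptor; collisions between different fragments are expected at this small vector size.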

  4. Trend in Computer-Aided Materials Discovery
     - Materials design based on machine learning
       - Inverse QSAR → Inverse Design
     - [Figure: timeline of approaches (Combinatorial, Evolutionary, Autoencoder; Deep Learning / Generative Models): Inverse QSAR (late '80s~), Genetic Algorithms ('92 @ Purdue), Exhaustive Generation ('12 @ Tokyo), Inverse Design ('16 @ SAIT), SMILES Autoencoder ('16 @ Harvard), GAN* for molecules ('17 @ Harvard).]
     - Focus on autonomous molecular generation
     - * GAN: Generative Adversarial Network

  5. Trend in Computer-Aided Materials Discovery
     - In-silico technologies for materials discovery
       - Elemental technologies: Machine Learning, Materials Informatics, Molecules DB, Molecular Enumeration, Automated Simulation
       - Materials discovery methodologies: Inverse Design, Evolutionary Design, HTCS (High-Throughput Computational Screening)
       - [In] Targets → [Out] Target molecules

  6. High-Throughput Computational Screening & Exhaustive Enumeration
     - "Landscape of phosphorescent light-emitting energies of homoleptic Ir(III)-complexes predicted by a graph-based enumeration and deep learning", GI01.02.02, 2018 MRS Fall Meeting

  7. High-Throughput Computational Screening
     - Property prediction with high-performance computing for large-scale exploration of materials candidates
     - [Figure: workflow. Seed Fragments → Combination → Candidate Pool (database of large amounts of candidates) → Simulation → Verification → Target Materials]

  8. High-Throughput Computational Screening
     - ML (Machine Learning)-assisted HTCS for higher efficiency
     - [Figure: workflow. Seed Fragments → Combination → Candidate Pool (database of large amounts of candidates) → (1) Simulation + ML → Verification → Target Materials, with (2) prioritizing calculation based on active learning]
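Prioritizing calculations by active learning can be sketched as a loop that repeatedly simulates the candidate the current model knows least about. The slide does not say which surrogate model or acquisition rule was used, so everything below is an assumption: a one-dimensional toy candidate pool, a nearest-neighbour surrogate, distance to the nearest simulated candidate as the uncertainty proxy, and a made-up `expensive_simulation` function standing in for the real calculation.

```python
import math


def expensive_simulation(x: float) -> float:
    """Hypothetical stand-in for a costly property simulation."""
    return math.sin(3.0 * x) + 0.5 * x


def nearest_labeled(x, labeled):
    """Return (distance, value) of the simulated candidate closest to x."""
    return min((abs(x - lx), ly) for lx, ly in labeled.items())


def predict(x, labeled):
    """Nearest-neighbour surrogate: reuse the value of the closest simulated point."""
    return nearest_labeled(x, labeled)[1]


def active_learning(pool, n_rounds=8):
    # Start from the two extreme candidates of the pool
    labeled = {x: expensive_simulation(x) for x in (pool[0], pool[-1])}
    for _ in range(n_rounds):
        unlabeled = [x for x in pool if x not in labeled]
        if not unlabeled:
            break
        # Uncertainty proxy: distance to the nearest already-simulated candidate
        target = max(unlabeled, key=lambda x: nearest_labeled(x, labeled)[0])
        labeled[target] = expensive_simulation(target)
    return labeled


pool = [i / 50 for i in range(101)]  # 101 toy candidates in [0, 2]
labeled = active_learning(pool)
mae = sum(abs(predict(x, labeled) - expensive_simulation(x)) for x in pool) / len(pool)
print(f"simulated {len(labeled)} of {len(pool)} candidates, surrogate MAE = {mae:.3f}")
```

Only the selected candidates are simulated; the surrogate fills in the rest of the pool, which is the efficiency gain the slide attributes to ML-assisted HTCS.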

  9. High-Throughput Computational Screening
     - Exhaustive enumeration based on graph theory
       - "Graphs"
         - Mathematical structures used to model pairwise relations between objects.
         - Made up of nodes and edges.
         - In chemistry, graphs are used to model molecules, where nodes represent atoms and edges represent bonds.
     - ※ Exhaustive enumeration: systematic enumeration of all possible molecules in search of the optimal solution

  10. High-Throughput Computational Screening
     - Complete list of non-isomorphic graphs
     - [Figure: table of graphs with columns ID, No. of edges, No. of edges at each node. Source: http://www.cadaeic.net/graphpics.htm]

  11. High-Throughput Computational Screening
     - Landscape of phosphorescent light-emitting energies of homoleptic Ir(III)-complex core structures
       - Ir(III)-complexes
         - Widely used as phosphorescent OLED dopants.
         - Figuring out the full landscape of emission color is important for discovering high-performing molecules in target color regions.
     - References: New J. Chem., 39, 246 (2015); ACS Appl. Mater. Interfaces, 10, 1888-1896 (2018); Organic Electronics, 63, 244-249 (2018)

  12. High-Throughput Computational Screening
     - Approach
       - Treat the nodes of a graph as rings and the edges as ring connections.
       - Limit the total number of rings to between 3 and 5.
       - Exclude the non-planar graph (5-21) and structures that are invalid as dopants.
     - → Only 11 of the 29 graphs in total are valid.
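The total of 29 graphs can be reproduced by brute force if the graphs are read as connected, simple, non-isomorphic graphs on 3, 4, and 5 nodes (that reading is an assumption, though the counts 2 + 6 + 21 = 29 match the slide). The sketch below enumerates every edge subset, keeps the connected ones, and removes duplicates by a canonical relabeling; the function names are made up for this example and the dopant-validity filtering that reduces 29 to 11 is not modeled.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from node 0; connected if every node is reached."""
    adjacency = {i: set() for i in range(n)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return len(seen) == n


def canonical_form(n, edges):
    """Smallest sorted edge list over all node relabelings (fine for n <= 5)."""
    return min(
        tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
        for perm in permutations(range(n))
    )


def connected_graphs(n):
    """All connected simple graphs on n nodes, one per isomorphism class."""
    all_edges = list(combinations(range(n), 2))
    found = set()
    for k in range(n - 1, len(all_edges) + 1):  # a connected graph needs >= n-1 edges
        for edges in combinations(all_edges, k):
            if is_connected(n, edges):
                found.add(canonical_form(n, edges))
    return sorted(found, key=lambda g: (len(g), g))


counts = {n: len(connected_graphs(n)) for n in (3, 4, 5)}
print(counts, "total:", sum(counts.values()))
```

Sorted by edge count, the last five-node graph is the complete graph with 10 edges, which is the only non-planar graph in this range and plausibly corresponds to the excluded ID 5-21.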

  13. High-Throughput Computational Screening
     - Enumeration
       - For 5- and 6-membered rings.
       - Substitute some carbons of each molecule with nitrogen atoms (max. five).
     - Steps
       - 1. Graphs
       - 2. Skeletons (405 in total)
       - 3. Set iridium positions
       - 4. Substitute some carbon atoms with nitrogen atoms
     - → Total 9,919,469 (~10M) core structures
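The nitrogen-substitution step is a combinatorial choice of which carbon positions to replace. A minimal sketch, assuming a hypothetical skeleton with 10 substitutable carbon positions, counting the unsubstituted skeleton as one pattern, and ignoring molecular symmetry (so symmetry-equivalent patterns are counted separately, unlike a real enumeration that would have to deduplicate them):

```python
from itertools import combinations
from math import comb


def nitrogen_patterns(carbon_positions, max_nitrogens=5):
    """Yield every set of positions where carbon is replaced by nitrogen."""
    for k in range(max_nitrogens + 1):  # k = 0 keeps the all-carbon skeleton
        yield from combinations(carbon_positions, k)


positions = list(range(10))  # hypothetical skeleton, positions labeled 0..9
patterns = list(nitrogen_patterns(positions))

# Closed-form check: sum of binomial coefficients C(10, k) for k = 0..5
expected = sum(comb(len(positions), k) for k in range(6))
print(len(patterns), expected)
```

Repeating this over every skeleton and iridium placement is what makes the number of core structures grow to the order of millions.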

  14. High-Throughput Computational Screening
     - Property prediction
       - Trained a deep-neural-network model with simulated T1 data
         - Input: ECFP (Extended Connectivity FingerPrints) of molecular structures
         - Output: T1 energy (phosphorescent light-emitting wavelength)
     - [Figure: mean absolute error of T1 of the DNN (eV, axis 0 to 0.2) versus size of the training dataset (10K to 80K)]
     - With 80k training data, the average prediction error was less than 0.1 eV.
     - 80k / 10M = 0.8%
     - By simulating the properties of only 0.8% of the molecules, we can fully scan the chemical space of 10M!
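The 0.8% figure follows from the training-set size and the number of enumerated core structures quoted in the enumeration slide. A quick check of the arithmetic, using the exact count rather than the rounded 10M:

```python
total_structures = 9_919_469  # enumerated core structures (from the enumeration slide)
training_size = 80_000        # simulated molecules used to train the DNN

fraction = training_size / total_structures
print(f"{fraction:.2%} of the enumerated structures were simulated")
```

With the exact count the fraction comes to about 0.81%, consistent with the rounded 0.8% on the slide.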

  15. High-Throughput Computational Screening
     - Results
       - Distribution of T1 values
       - Blue-emitting materials are rare compared with red and green
     - [Figure: histogram of predicted T1 (eV, 0.05 to 2.95) versus number of molecules (×100,000). Labeled regions: Red (18.4%), Green (4.3%), Blue (0.4%)]

  16. Conclusions
     - In materials discovery, deep-learning-based HTCS is a good alternative to the conventional trial-and-error approach.
     - Moreover, exhaustive enumeration makes it possible to systematically explore the whole chemical space.
     - With the proposed exhaustive enumeration method based on graph theory and deep learning, the whole landscape of 10M phosphorescent Ir-dopants could be scanned at just 0.8% of the computational cost of a purely simulation-based approach.

  17. Deep-Learning-based Evolutionary Design
     - "Evolutionary design of organic molecules based on deep learning and genetic algorithm", COMP, ACS Fall 2018 National Meeting

  18. Evolutionary Design
     - A generic population-based metaheuristic optimization technique
     - Uses bio-inspired operators to reach near-optimal solutions: mutation, crossover, and selection in the case of a genetic algorithm
     - [Figure: fitness landscape, https://en.wikipedia.org/wiki/Fitness_landscape]
     - [Figure: flowchart. Initial population → Calculate fitness → Satisfy constraints? Yes: Done. No: Selection → Crossover + Mutation → New population → Calculate fitness. Plot: average fitness versus generation]
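The flowchart above can be sketched as a small genetic algorithm over bit strings. This is a generic textbook illustration, not the presenters' implementation: the fitness function is the toy "count the ones" problem, selection is simple truncation that keeps the better half, and all names and parameter values are assumptions.

```python
import random


def fitness(bits):
    """Toy fitness: number of 1-bits (the 'OneMax' problem)."""
    return sum(bits)


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def crossover(a, b, rng):
    """Single-point crossover of two parents of equal length."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(n_bits=32, pop_size=30, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if history[-1] == n_bits:  # "satisfy constraints?" -> done
            break
        parents = population[: pop_size // 2]  # selection: keep the better half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children  # new population
    return max(population, key=fitness), history


best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the parents survive into the next generation, the best fitness never decreases from one generation to the next.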

  19. Deep-Learning-Based Evolutionary Design
     - Proposed approach (conventional vs. proposed)
       - Molecular descriptor: graph or ASCII string vs. bit string (ECFP)
       - Molecular evolution: heuristic vs. random
       - Fitness evaluation: simple assessment vs. DNN
       - An RNN decodes ECFP bit strings to SMILES
     - Expectations
       - Prevent heuristic bias
       - Secure chemical validity
       - Versatile evaluation is possible
     - [Figure: workflow. DB → Seed molecule (ECFP) → Mutation (n=50) → Decoding to SMILES (RNN) → Inspection of chemical validity → Fitness evaluation (DNN) → Selection → Parents → Evolution (Crossover → Mutation) → Decoding to SMILES (RNN) → Inspection of chemical validity → Fitness evaluation (DNN) → Selection; iterate until the best-fit molecule is obtained]
     - *ECFP (Extended Connectivity FingerPrint), DNN (Deep Neural Network), RNN (Recurrent Neural Network), SMILES (Simplified Molecular-Input Line-Entry System)
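The workflow on this slide can be outlined as a loop in which every candidate bit string is decoded, checked for validity, and scored before selection. In the sketch below the three model-dependent steps are hypothetical stand-ins: `decode_to_smiles` is not an RNN and does not produce real SMILES, `is_valid` is an arbitrary rule rather than a chemistry check, and `predict_fitness` is not a trained DNN. Keeping the seed in the first candidate pool is also an assumption of this sketch, and the seed is assumed to pass the validity check.

```python
import random


def decode_to_smiles(bits):
    """Hypothetical stand-in for the RNN decoder (returns a fake string)."""
    return "".join("C" if b else "N" for b in bits)


def is_valid(smiles):
    """Hypothetical stand-in for the chemical-validity inspection."""
    return "NNNN" not in smiles


def predict_fitness(bits):
    """Hypothetical stand-in for the DNN property predictor."""
    return sum(bits) / len(bits)


def flip_bits(bits, n_flips, rng):
    """Random mutation: flip n_flips randomly chosen bits."""
    child = list(bits)
    for i in rng.sample(range(len(child)), n_flips):
        child[i] ^= 1
    return child


def evaluate(candidates):
    """Decode, discard invalid candidates, and rank by predicted fitness."""
    valid = [c for c in candidates if is_valid(decode_to_smiles(c))]
    return sorted(valid, key=predict_fitness, reverse=True)


def evolve_from_seed(seed_bits, n_mutants=50, n_parents=10, iterations=20, seed=0):
    rng = random.Random(seed)
    # Seed molecule -> 50 mutants -> decode, inspect, score, select
    mutants = [flip_bits(seed_bits, 1, rng) for _ in range(n_mutants)]
    parents = evaluate(mutants + [list(seed_bits)])[:n_parents]
    for _ in range(iterations):
        children = []
        for _ in range(n_mutants):
            a, b = rng.choice(parents), rng.choice(parents)
            point = rng.randrange(1, len(a))
            children.append(flip_bits(a[:point] + b[point:], 1, rng))  # crossover -> mutation
        parents = evaluate(parents + children)[:n_parents]
    return parents[0]  # best-fit candidate


seed_bits = [1, 1, 1, 0, 0, 1, 1, 0] * 4
best = evolve_from_seed(seed_bits)
print(decode_to_smiles(best), predict_fitness(best))
```

Filtering on validity before scoring mirrors the order of the boxes in the workflow: candidates that cannot be decoded to a valid structure never reach selection.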

  20. Deep-Learning-Based Evolutionary Design
     - Deep learning models
       - [DNN] 3 hidden layers, 500 hidden units in each layer
       - [RNN] 3 hidden layers, 500 long short-term memory units
     - DNN model: Input (ECFP*) → Output (Properties)
     - RNN model: Input (ECFP*) → Output (SMILES)
       - Inputs over time steps t=1 … T+1: <start>, y1='CCC', y2='CCC', …, yT=')=O'
       - Outputs over time steps: y1='CCC', y2='CCC', y3='CC(', …, <end>
       - y = ('CCC', 'CCC', 'CC(', …, ')=O') → 'CCCC(N)=O'
     - *ECFP (dimension=5,000, neighbor size=6)
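The output example on this slide is consistent with the RNN emitting overlapping three-character windows of the SMILES string, which are then merged back into one string. That reading is an assumption based on the example alone; the sketch below shows only the string bookkeeping, with made-up function names and no neural network involved.

```python
def to_trigrams(smiles):
    """Split a string into overlapping three-character windows."""
    return [smiles[i:i + 3] for i in range(len(smiles) - 2)]


def from_trigrams(grams):
    """Merge overlapping windows: first window plus the last character of each later one."""
    return grams[0] + "".join(gram[-1] for gram in grams[1:])


target = "CCCC(N)=O"  # SMILES string from the slide
grams = to_trigrams(target)
print(grams)
print(from_trigrams(grams))
```

The first three windows and the last one match the values shown on the slide (y1='CCC', y2='CCC', y3='CC(', yT=')=O'), and merging them recovers the original string.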
