 
DATA ANALYTICS IN NANOMATERIALS DISCOVERY
Michael Fernandez | OCE-Postdoctoral Fellow
September 2016
www.data61.csiro.au

Materials Discovery Process
Materials Genome Project: integrating computational methods and information with sophisticated computational and analytical tools to shorten the duration of materials development from 10-20 years to 2 or 3 years.
3 | Data Analytics for Nanomaterials| Michael Fernandez

Materials and Molecular Modeling
Team leader: Amanda Barnard
Sun Baichuan
Collaborators
• Big data and HPC integration: Piotr Szul, Yulia Arzhaeva
• Deep learning and GPU computations: Chris Watkins

Outline
• Material Discovery Process
• Methods for atomistic simulations of materials
• Experimental confirmation
• Data-driven Computational Nanomaterials Discovery
• Hypothetical material space sampling (structure generation and statistics)
• In silico high-throughput characterization (atomistic simulations and machine learning)
• Data storage, analytics, exploitation and integration

Atomistic Simulations of Materials
Theory can predict properties of materials
• Quantum chemistry methods can cover any chemistry
• Empirical potentials exist for a large number of elements
• Computation is scalable and generically deployable
$$H = E_{\mathrm{nuclei}}(\{R_I\}) - \sum_{i=1}^{N_e} \frac{\nabla_i^2}{2} + \sum_{i=1}^{N_e} V_{\mathrm{nuclei}}(r_i) + \frac{1}{2} \sum_{i \neq j} \frac{1}{|r_i - r_j|}$$

Atomistic Simulations of Materials
Computational predictions later confirmed by experiments
• Self-assembly mechanism of nanodiamonds
[Figure: DFTB simulations of the surface electrostatic potential of nanodiamonds of a) 2.2 nm and b) 2.5 nm]
[Figure: the arrow indicates the (111)|(111) interface between two 4 nm sized dodecahedral diamond nanoparticles]
Barnard, A. S. et al. Nanoscale 3, 958–62 (2011)

Modern materials discovery cycle
Nano-
• Polydispersive systems
• Exponential increase in complexity and diversity
• Nearly infinite combinatorial problem
[Figure: discovery cycle with the stages materials libraries, hypothetical materials, experiment design, performance measurement, database, theory, modeling and informatics, in-silico screening, lead materials scale up, banks of materials, knowledge for rational design]
Potyrailo, R. et al. ACS Comb. Sci. 13, 579–633 (2011).

Nanomaterials Screening
Departing from the Edisonian approach
We can accurately predict a property, so it can be computed for entire materials spaces
In silico structure generation vs. combinatorial in silico design

Nanomaterials Screening
• Systematic and extensive materials performance ranking.
• “Big data” discovery of structure-property relationships in unknown materials domains.
• Accelerated identification of high potential candidates and rational design principles.

Data Analytics Challenges
Information representation
• Fingerprints
Information extraction
• Multivariate statistical analysis
Knowledge discovery
• Data mining and machine learning
Knowledge representation
• Visualization

Polydispersity Challenge in Nanomaterials
[Figure: polydispersive sample vs. quasi-monodisperse sample, linked by controlled synthesis and purification, both at high cost ($$$)]
• Polydispersity can be detrimental for high-performing applications
• Purification of polydispersive nanoparticle samples is expensive

Data Analytics of Nanocarbons
Virtual structures relaxed using TB-DFT
[Figure panels: Nanodiamonds, Graphenes]

Archetypal Analysis (AA)
Finds a k × m matrix Z that corresponds to the archetypal or "pure patterns" in the data in such a way that each data point can be represented as a mixture of those archetypes. In other words, archetypal analysis yields the two n × k coefficient matrices α and β, which minimize the residual sum of squares:
$$\mathrm{RSS} = \sum_{i=1}^{n} \Big\| X_i - \sum_{j=1}^{k} \alpha_{ij} Z_j \Big\|^2, \qquad Z_j = \sum_{l=1}^{n} \beta_{lj} X_l$$
subject to α_ij ≥ 0, Σ_j α_ij = 1 and β_lj ≥ 0, Σ_l β_lj = 1.
The predictors of X_i are finite mixtures of archetypes Z_j, which are convex combinations of the observations.
Cutler, A. & Breiman, L. Archetypal Analysis. Technometrics 36, 338–347 (1994).

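To make the residual sum of squares concrete, here is a minimal sketch that only evaluates the objective for given coefficient matrices; it does not perform the constrained optimisation itself. The function names and the toy data are illustrative assumptions, not taken from the slides or from the cited paper.

```python
def archetypes(X, beta):
    """Z = beta^T X: each archetype is a convex combination of the observations."""
    n, m, k = len(X), len(X[0]), len(beta[0])
    return [[sum(beta[l][j] * X[l][d] for l in range(n)) for d in range(m)]
            for j in range(k)]


def residual_sum_of_squares(X, alpha, beta):
    """RSS between each observation and its mixture of archetypes."""
    Z = archetypes(X, beta)
    k = len(Z)
    rss = 0.0
    for i, x in enumerate(X):
        for d in range(len(x)):
            approx = sum(alpha[i][j] * Z[j][d] for j in range(k))
            rss += (x[d] - approx) ** 2
    return rss


# Toy data (made up for illustration): three points on a line in 2-D.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
# n x k matrices; every column of beta and every row of alpha sums to 1.
beta = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]   # archetypes = the two end points
alpha = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # middle point = equal mixture
Z = archetypes(X, beta)
rss = residual_sum_of_squares(X, alpha, beta)
print(Z, rss)
```

In this toy case the two end points serve as archetypes and the middle point is reproduced exactly as their equal mixture, so the residual vanishes. A real fit would alternate constrained least-squares updates of α and β.
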
Archetypal Analysis of Nanocarbons
[Figure panels: Nanodiamonds, Graphene nanoflakes]
Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980–11992 (2015).

Nanocarbons Prototypes
[Figure panels: Nanodiamonds prototypes, Graphene prototypes]
Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980–11992 (2015).

Estimation of Nanodiamonds Properties
Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980–11992 (2015).

Structural Diversity Challenge
Graphene nanoflakes: trigonal, rectangular, hexagonal
Defects, oxidation and edge passivation yield large structural diversity

Structural Diversity Challenge
Silicon qubits
A single Si substitution by P yields a large structural diversity

Structural Diversity Challenge
Metal-Organic Framework (MOF)
ZIF-68 for CO₂ capture and sequestration (Science, 2008)
[Figure labels: Zn²⁺, benzimidazole, nitroimidazole]

Structural Diversity Challenge
Metal-Organic Framework (MOF)
In-silico combinatorial design

Structural Diversity Challenge
Metal-Organic Framework (MOF)
MOF ZBP: modification of the organic linkers with 35 functional groups gives a total of ~1.5 million (35^4) unique combinations

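The size of this space follows directly from the 35^4 count. A minimal sketch, assuming four substitution sites that each independently take one of the 35 functional groups (the variable names and the integer labels for the groups are illustrative):

```python
from itertools import islice, product

n_functional_groups = 35
n_sites = 4  # assumption: four independently substituted linker positions

n_combinations = n_functional_groups ** n_sites
print(f"{n_combinations:,}")  # 1,500,625

# Enumerate candidates lazily rather than storing all of them in memory.
candidates = product(range(n_functional_groups), repeat=n_sites)
first_three = list(islice(candidates, 3))
print(first_three)
```

Lazy enumeration matters here because a screening workflow would typically stream candidates to a structure generator instead of materialising the full list.
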
Machine Learning Approach
Machine learning prediction of functional properties
[Figure labels: Feature fingerprints, Machine learning]

Data Analytics Challenge
Binary decision tree of the band gap of graphene
Accuracy: 80%
Features:
• Surface area
• Number of atoms
• Shape aspect ratio
Fernandez, M., Shi, H. & Barnard, A. S. Carbon (2016). doi:10.1016/j.carbon.2016.03.005

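As a rough illustration of how a binary decision tree picks a split on features like these, the sketch below finds the single best threshold by Gini impurity. It is not the model from the cited paper: the feature values and class labels are made-up toy numbers, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (feature index, threshold, weighted Gini) of the best single split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive feature values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Toy rows: (surface area, number of atoms, shape aspect ratio); values are invented.
rows = [(100.0, 40.0, 1.0), (150.0, 60.0, 1.2), (400.0, 160.0, 1.1), (500.0, 200.0, 3.0)]
labels = [1, 1, 0, 0]  # 1 = band gap above some chosen cutoff (hypothetical classes)
split = best_split(rows, labels)
print(split)
```

A full tree would apply `best_split` recursively to the left and right subsets until a depth or purity criterion is met.
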
Machine Learning vs. Atomistic Simulations
Estimation of the graphene band gap from topological features
P_i and P_j are the values of a bond order of the carbon atoms in graphene, L is the topological distance, and δ(L, L_ij) is a delta function.
Fernandez, M. et al. ACS Comb. Sci. (2016) doi:10.1021/acscombsci.6b00094

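The symbols defined here match the usual topological autocorrelation form, in which the products P_i·P_j are summed over atom pairs whose topological distance L_ij equals L. The sketch below assumes that form and assumes the sum runs over unordered pairs; the exact definition and normalisation in the cited paper may differ, and the function names and the uniform property values are illustrative.

```python
from collections import deque


def topological_distances(adjacency, source):
    """Breadth-first search: number of bonds on the shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topological_autocorrelation(adjacency, P, L):
    """Sum of P_i * P_j over unordered atom pairs with topological distance L."""
    total = 0.0
    atoms = sorted(adjacency)
    for i in atoms:
        dist = topological_distances(adjacency, i)
        for j in atoms:
            # The delta function keeps only pairs whose distance equals L.
            if j > i and dist.get(j) == L:
                total += P[i] * P[j]
    return total


# A single six-membered carbon ring as the bond graph; P = 1 for every atom.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
P = {i: 1.0 for i in range(6)}
scores = [topological_autocorrelation(ring, P, L) for L in (1, 2, 3)]
print(scores)
```

With all property values set to one, each score simply counts the atom pairs at that topological distance, which is a convenient sanity check before plugging in real atomic properties.
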
Machine Learning of Graphene
Radial Distribution Function (RDF) scores for graphene
The summation is over the N atom pairs in the graphene structure, r_ij is the distance between the atoms of each pair, and B is a smoothing parameter set to 10.
Fernandez, M.; Shi, H.; Barnard, A. et al. J. Chem. Inf. Model. (2015), 55, 2500-2506

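A minimal sketch of such a score, assuming the common Gaussian-smoothed form RDF(R) = Σ exp(−B (R − r_ij)²) summed over atom pairs. Any atomic weighting or normalisation used in the cited paper is left out, and the function name and toy coordinates are illustrative.

```python
import math
from itertools import combinations


def rdf_score(coords, R, B=10.0):
    """Gaussian-smoothed count of atom pairs separated by roughly R."""
    total = 0.0
    for a, b in combinations(coords, 2):
        r_ij = math.dist(a, b)  # Euclidean distance of the pair
        total += math.exp(-B * (R - r_ij) ** 2)
    return total


# Toy geometry: three atoms on a line, 1.42 angstrom apart.
coords = [(0.0, 0.0, 0.0), (1.42, 0.0, 0.0), (2.84, 0.0, 0.0)]
fingerprint = [rdf_score(coords, R) for R in (1.42, 2.84, 5.0)]
print(fingerprint)
```

Evaluating the score on a grid of R values gives a fixed-length fingerprint that can be passed to a regression model regardless of the number of atoms in the structure.
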
Machine Learning vs. Atomistic Simulations
Energy of the Fermi level and ionization potential from RDF scores
Fernandez, M.; Shi, H.; Barnard, A. et al. J. Chem. Inf. Model. (2015), 55, 2500-2506

Machine Learning vs. Atomistic Simulations
Machine learning prediction of gas adsorption in MOF
Fernandez, M., et al. J. Phys. Chem. C 117, 14095–14105 (2013).

Accuracy and Coverage Challenges
Accuracy of electronic calculation methods vs. system size
• TB-DFT/semiempirical: 5000 atoms
• Density functional: 500 atoms
• Quantum Monte Carlo: 100 atoms
• Coupled cluster: 20 atoms
[Figure axes: system size vs. accuracy]

Data-driven Challenge
Machine learning for large material spaces
[Figure labels: Full Database, Partial Database, Machine Learning, ∆, Predictions, Screening]
Machine learning predictions as a function f of the structure:
- Accuracy of different quantum-chemistry methods
- Functional property value or threshold
Ramakrishnan, R. et al. J. Chem. Theory Comput. 11, 2087–2096 (2015).
Fernandez, M. et al. J. Chem. Inf. Model. (2015), 55, 2500-2506

Challenges and Limitations
Accuracy Gap Between Different Levels of Theory
6,095 isomers of C7H10O2
[Figure: energy difference ΔE between B3LYP and QMC, annotated "big gap"]

Accuracy Gap Predictions
[Figure labels: Predictions, Machine learning calibration]

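One way to read "machine learning calibration" is as learning the gap between a cheap and an expensive level of theory on a partial database, then adding the predicted gap to cheap calculations elsewhere. The sketch below assumes that reading. A plain one-variable least-squares line stands in for the kernel models normally used for this, and the descriptor, energies, and function names are all made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_delta_model(descriptors, e_low, e_high):
    """Learn the correction (high level minus low level) from a descriptor."""
    deltas = [h - l for h, l in zip(e_high, e_low)]
    return fit_line(descriptors, deltas)


def calibrated_energy(model, descriptor, e_low):
    """Low-level energy plus the predicted correction."""
    slope, intercept = model
    return e_low + slope * descriptor + intercept


# Partial database: both levels of theory are available (toy numbers).
descriptors = [1.0, 2.0, 3.0]
e_low = [10.0, 20.0, 30.0]
e_high = [10.5, 21.0, 31.5]
model = fit_delta_model(descriptors, e_low, e_high)

# New structure: only the cheap calculation is available.
estimate = calibrated_energy(model, 4.0, 40.0)
print(model, estimate)
```

The design choice is that the model only has to capture the usually smoother difference between methods, not the full property, which tends to need fewer expensive reference calculations.
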
Data-driven High-throughput Screening
[Figure: workflow with the stages input jobs, queue management, computational resources, resubmit or kill failed runs, finished runs outputs, data storage, accuracy refinement]

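The "resubmit or kill failed runs" step of such a workflow can be sketched as a simple retry queue. This is an in-process illustration only, not a real HPC scheduler interface: `run_queue`, `fake_run`, the job names, and the retry limit are all hypothetical.

```python
from collections import deque


def run_queue(jobs, run_job, max_attempts=3):
    """Run jobs in FIFO order; resubmit failures up to max_attempts, then kill."""
    queue = deque((job, 1) for job in jobs)
    finished, killed = {}, []
    while queue:
        job, attempt = queue.popleft()
        try:
            finished[job] = run_job(job)  # store the output of a finished run
        except RuntimeError:
            if attempt < max_attempts:
                queue.append((job, attempt + 1))  # resubmit
            else:
                killed.append(job)  # give up on this job
    return finished, killed


# Stand-in for a simulation launcher: "flaky" fails once, "broken" always fails.
attempts = {}


def fake_run(job):
    attempts[job] = attempts.get(job, 0) + 1
    if job == "broken" or (job == "flaky" and attempts[job] == 1):
        raise RuntimeError(f"{job} failed")
    return f"{job}.out"


finished, killed = run_queue(["ok", "flaky", "broken"], fake_run)
print(finished, killed)
```

In a production setting the outputs of finished runs would go to data storage, and the killed list would feed the accuracy refinement or a manual review step.
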