SLIDE 1

NGDM, October 10, 2007

Machine Learning for the Materials Scientist

Chris Fischer*, Kevin Tibbetts, Gerbrand Ceder

Massachusetts Institute of Technology, Cambridge, MA

Dane Morgan

University of Wisconsin, Madison, WI

SLIDE 2

Motivation: materials design through calculation

Run-time: polynomial scaling with number of atoms; computing power: exponential scaling with time

[Figure: run-time scaling, $O(N^3)$ vs. $O(N)$]

Moore, G. ISSCC 2003 slides (http://www.intel.com); Skylaris, C. et al. J. Chem. Phys. 122, 084119 (2005)
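The two scaling laws on this slide can be compared with a little arithmetic. The sketch below assumes, purely for illustration, that available compute doubles every two years (a Moore's-law-style assumption, not a number from the slide) and that a fixed wall-clock budget goes to one calculation whose run-time scales as $O(N^p)$. The function name `tractable_size_growth` is hypothetical.

```python
def tractable_size_growth(years: float, doubling_period: float, exponent: float) -> float:
    """Factor by which the largest tractable system size N grows after `years`.

    Illustrative assumptions: compute doubles every `doubling_period` years,
    and run-time scales as O(N**exponent) at a fixed wall-clock budget.
    """
    compute_gain = 2.0 ** (years / doubling_period)
    # N**exponent may grow by compute_gain, so N grows by compute_gain**(1/exponent).
    return compute_gain ** (1.0 / exponent)


if __name__ == "__main__":
    for p in (3, 1):  # cubic scaling vs. linear scaling
        growth = tractable_size_growth(years=10, doubling_period=2, exponent=p)
        print(f"O(N^{p}): tractable N grows by a factor of {growth:.1f} over 10 years")
```

Under these assumptions, a 32-fold gain in compute buys only about 3.2 times larger systems at $O(N^3)$, but 32 times larger systems at $O(N)$.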

SLIDE 3

DFT as a predictive tool

  • Burkett, T. et al. Phys. Rev. Lett. 93 (2004)
  • Norskov, J. et al. MRS Bulletin 31 (2006)
  • Marzari, N. MRS Bulletin 31 (2006), courtesy of M. Lazzeri, Paris VI Jussieu
  • Marzari, N. MRS Bulletin 31 (2006), courtesy of D. Scherlis, MIT

SLIDE 4

Computational materials design strategies

Galli, G., University of California, Davis; Lee, Y. S. et al. PRL 95, 076804 (2005)

Calculating properties of realistic nanostructures ab initio

SLIDE 5

Computational materials design strategies

Which combinations yield the optimal material?

SLIDE 6

Outline

  • Machine learning in computational materials design
  • Searching for structure: combining historical information with density functional theory
  • Data mining the high-throughput engine
  • Wrap-up

SLIDE 7

Computational materials design strategies

Which combinations yield the optimal material?

SLIDE 8

Motivation: searching for new materials

for i in (relevant chemistries) {
    ...
    getStablePhases(i);
    ...
    calculateProperty(i);
    i = nextChemistry();
}

The property depends on which phases are stable and on their structure.
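The pseudocode loop can be turned into a runnable skeleton. This is a minimal Python sketch, not the authors' code: `get_stable_phases`, `calculate_property`, and `screen_chemistries` are hypothetical stand-ins for getStablePhases, calculateProperty, and the outer loop, and the chemistries and phase labels are placeholders rather than real data.

```python
from typing import Dict, Iterable, List


def get_stable_phases(chemistry: str) -> List[str]:
    """Hypothetical stand-in for getStablePhases(i).

    A real workflow would predict or compute the low-temperature phases;
    here a toy lookup with placeholder labels is used instead.
    """
    toy_phases = {"A-B": ["phase-1", "phase-2"], "C-D": ["phase-3"]}
    return toy_phases.get(chemistry, [])


def calculate_property(chemistry: str, phases: List[str]) -> float:
    """Hypothetical stand-in for calculateProperty(i).

    The placeholder 'property' is just the number of stable phases; a real
    workflow would run DFT on each phase's structure.
    """
    return float(len(phases))


def screen_chemistries(chemistries: Iterable[str]) -> Dict[str, float]:
    results: Dict[str, float] = {}
    # The for-loop itself advances to the next chemistry, so no explicit
    # i = nextChemistry() call is needed.
    for chemistry in chemistries:
        phases = get_stable_phases(chemistry)  # the property depends on these phases
        results[chemistry] = calculate_property(chemistry, phases)
    return results
```

For example, `screen_chemistries(["A-B", "C-D"])` returns `{"A-B": 2.0, "C-D": 1.0}`. The structure makes the bottleneck explicit: every property calculation waits on `get_stable_phases`.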

SLIDE 9

Motivation: materials by design

for i in (relevant chemistries) {
    ...
    getStablePhases(i);
    ...
    calculateProperty(i);
    i = nextChemistry();
}

The property depends on which phases are stable and on their structure. Machine learning needed here!

SLIDE 10

The need for machine learning

[Diagram: DFT code, material property predictions]

The DFT code doesn't know what to calculate next.

SLIDE 11

The need for machine learning

[Diagram: DFT code; material property predictions; database of computed and experimental results; machine learning framework]

SLIDE 12

Computational materials design poised for impact

  • 'Commodity' computational resources
  • Open-source electronic structure software
  • ~$200-250k capital investment
  • Computing budget: ~50k compounds/year

SLIDE 13

Computational materials design poised for impact

Computing budget: ~50k compounds/year

ICSD: the world's largest database of inorganic crystal structures
  • First entry: 1913
  • Number of entries: 100,243
  • Number of usable compounds: 29,962
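Putting the two numbers on this slide side by side: at the stated budget, every usable ICSD compound could in principle be computed in well under a year. This back-of-the-envelope sketch assumes one budget unit per compound, which ignores that a compound may need several calculations.

```python
usable_icsd_compounds = 29_962      # from the slide
budget_compounds_per_year = 50_000  # "~50k compounds/year", an approximate budget

# Assumption: one budget unit covers one compound.
years_needed = usable_icsd_compounds / budget_compounds_per_year
print(f"{years_needed:.2f} years")  # prints "0.60 years"
```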

SLIDE 14

The structure search problem

for i in (relevant chemistries) {
    ...
    getStablePhases(i);
    ...
    calculateProperty(i);
    i = nextChemistry();
}

The property depends on which phases are stable and on their structure. Where do we put the atoms if no experimental structure is known?

SLIDE 15

Strategies to search for structure

  • Coordinate search: optimize energy (or free energy) directly in the space of atomic coordinates
  • Heuristic rules or chemical intuition

SLIDE 16

Methods to search for structure

Coordinate search: optimize energy (or free energy) directly in the space of atomic coordinates

Number of dimensions $= 3N - 3 + \dim(a, b, c, \alpha, \beta, \gamma)$

$$\text{GroundState} \equiv \arg\min_{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N} E(\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N)$$

[Figure: complex energy landscape (Doye, J. PRL 88, 238701 (2002))]
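The dimension count on this slide shows why a naive coordinate search is hopeless. The sketch below evaluates $3N - 3 + \dim(a, b, c, \alpha, \beta, \gamma)$ and, as an illustrative assumption not taken from the slide, the number of energy evaluations a brute-force grid with a fixed number of points per dimension would need. Both function names are hypothetical.

```python
def search_dimensions(n_atoms: int, n_cell_params: int = 6) -> int:
    """Dimension of the coordinate-search space, 3N - 3 + dim(a, b, c, alpha, beta, gamma).

    The -3 removes a rigid translation of all atoms; n_cell_params is 6 for a
    general (triclinic) cell and smaller when symmetry fixes some lattice parameters.
    """
    return 3 * n_atoms - 3 + n_cell_params


def grid_evaluations(n_atoms: int, points_per_dim: int = 10) -> int:
    """Energy evaluations for a naive grid search (illustrative assumption)."""
    return points_per_dim ** search_dimensions(n_atoms)


if __name__ == "__main__":
    for n in (2, 10, 20):
        print(n, search_dimensions(n), f"{grid_evaluations(n):.0e}")
```

Even ten atoms in a general cell give a 33-dimensional space, so ten grid points per axis would already mean $10^{33}$ energy evaluations.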

SLIDE 17

Methods to search for structure

Coordinate search: optimize energy (or free energy) directly in the space of atomic coordinates

Number of dimensions $= 3N - 3 + \dim(a, b, c, \alpha, \beta, \gamma)$

$$\text{GroundState} \equiv \arg\min_{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N} E(\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N)$$

[Figure: Doye, J. PRL 88, 238701 (2002)]

Proposed solutions:
  • Calculate the energy of a finite set of structure prototypes

SLIDE 18

Methods to search for structure

Coordinate search: optimize energy (or free energy) directly in the space of atomic coordinates

Number of dimensions $= 3N - 3 + \dim(a, b, c, \alpha, \beta, \gamma)$

$$\text{GroundState} \equiv \arg\min_{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N} E(\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N)$$

[Figure: Doye, J. PRL 88, 238701 (2002)]

Proposed solutions:
  • Use a stochastic optimization procedure (hop from basin to basin), e.g., simulated annealing or genetic algorithms
  • Calculate the energy of a finite set of structure prototypes
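One of the stochastic procedures named on this slide, simulated annealing with a Metropolis acceptance rule, can be sketched in a few lines. This is a minimal sketch under stated assumptions: `rugged_energy` is a hypothetical one-dimensional toy landscape standing in for a DFT energy surface, and the cooling schedule, step size, and temperatures are arbitrary illustrative choices.

```python
import math
import random
from typing import Callable, Tuple


def rugged_energy(x: float) -> float:
    """Toy 1-D energy landscape with many local minima (not a physical model)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def simulated_annealing(
    energy: Callable[[float], float],
    x0: float,
    steps: int = 5000,
    t_start: float = 2.0,
    t_end: float = 1e-3,
    step_size: float = 0.5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (x, E) of the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Metropolis rule: always accept downhill moves; accept uphill moves with
        # probability exp(-dE/T), which lets the walker hop between basins.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    x_best, e_best = simulated_annealing(rugged_energy, x0=8.0)
    print(f"lowest energy found: E({x_best:.3f}) = {e_best:.3f}")
```

The high early temperature allows uphill moves over barriers; as it falls, the walker settles into one basin. Note that the search starts from scratch for every new system, which is the limitation raised on the next slide.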
SLIDE 19

Methods to search for structure

Coordinate search: optimize energy (or free energy) directly in the space of atomic coordinates

Number of dimensions $= 3N - 3 + \dim(a, b, c, \alpha, \beta, \gamma)$

$$\text{GroundState} \equiv \arg\min_{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N} E(\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N)$$

[Figure: Doye, J. PRL 88, 238701 (2002)]

Proposed solutions:
  • Use a stochastic optimization procedure (hop from basin to basin), e.g., simulated annealing or genetic algorithms
  • Calculate the energy of a finite set of structure prototypes

Knowledge is not transferred across chemistries.

SLIDE 20

Methods to search for structure

Heuristic rules: use previous experiments to suggest what to calculate.

How? Identify a set of simple parameters based on the alloy constituents:
  • 1932: Pauling, electronegativity
  • 1935: Laves & Witte, $r_{A,B}$
  • 1926, 1936-7: Hume-Rothery, Mott & Jones, $n^{e}_{at}$
  • 1976: Miedema, $n^{e}_{ws}$

SLIDE 21

Methods to search for structure

Heuristic rules: plot stable structures in the space of parameters
  • 1983: Villars
  • 1986: Pettifor

[Figure: structure maps; parameters shown: $r_{A,B}$, $n^{e}_{at}$]

SLIDE 22

Methods to search for structure

Heuristic rules: plot stable structures in the space of parameters
  • 1983: Villars
  • 1986: Pettifor

[Figure: structure maps; parameters shown: $r_{A,B}$, $n^{e}_{at}$]

Heuristic rules:
  • efficiently code historical knowledge
  • provide transfer of knowledge

Can we leverage historical knowledge to search intelligently for structure?

SLIDE 23

Knowledge base: experimental data

Description of knowledge base: Pauling File, Binaries Edition (Villars, P. et al., J. of Alloys and Compounds (2004))
  • 1335 binary alloys
  • 3975 non-unique compounds
  • 4263 compounds total
  • Alloys not containing the elements He, B, C, N, O, F, Ne, Si, P, S, Cl, Ar, As, Se, Br, Kr, Te, I, Xe, At, Rn

SLIDE 24

Machine learning framework: concepts

Low-temperature state of an alloy:
$$\mathbf{x} = (x_A, x_0, \ldots, x_{1/2}, \ldots, x_B)$$

Database of $N$ binary alloys:
$$\text{Data} \equiv \{\mathbf{x}^1, \ldots, \mathbf{x}^N\}$$

SLIDE 25

Machine learning framework: concepts

Low-temperature state of an alloy:
$$\mathbf{x} = (x_A, x_0, \ldots, x_{1/2}, \ldots, x_B)$$

  • $p(\mathbf{x})$: probability of the low-temperature state (fitted to data)
  • $p(\mathbf{x} \mid e)$: probability of the low-temperature state conditioned on evidence $e$

Database of $N$ binary alloys:
$$\text{Data} \equiv \{\mathbf{x}^1, \ldots, \mathbf{x}^N\}$$
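To make $p(\mathbf{x} \mid e)$ concrete, here is a minimal sketch that estimates it by plain frequency counting over a toy database. This is a swapped-in simplification, not the fitted model the slide refers to; the representation (a dict from composition to structure label, with "none" for no compound), the function name, and the placeholder structures S1 to S4 are all hypothetical.

```python
from collections import Counter
from typing import Dict, List

# One alloy system: composition (fraction of B, as a string) -> low-temperature
# structure label; compositions with no compound map to "none".
Alloy = Dict[str, str]


def conditional_distribution(data: List[Alloy], target: str, evidence: Alloy) -> Dict[str, float]:
    """Naive frequency estimate of p(x_target | e).

    Keeps only the systems consistent with every (composition -> structure)
    pair in `evidence`, then counts the structures seen at composition `target`.
    """
    matches = [x for x in data if all(x.get(c, "none") == s for c, s in evidence.items())]
    counts = Counter(x.get(target, "none") for x in matches)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}


if __name__ == "__main__":
    toy_data = [  # placeholder structures S1..S4, not real alloys
        {"1/2": "S1", "3/4": "S2"},
        {"1/2": "S1", "3/4": "S2"},
        {"1/2": "S3", "3/4": "S2"},
        {"1/2": "S3", "3/4": "S4"},
    ]
    print(conditional_distribution(toy_data, "1/2", {}))             # p(x_1/2)
    print(conditional_distribution(toy_data, "1/2", {"3/4": "S2"}))  # p(x_1/2 | x_3/4 = S2)
```

Conditioning on the structure at $c = 3/4$ shifts the distribution at $c = 1/2$ from an even split toward S1. Raw counts like these become sparse quickly as evidence accumulates, which is one reason to fit a smoother model instead.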

SLIDE 26

How to use the machine learning framework

[Diagram: DFT code; material property predictions; database of computed and experimental results; machine learning framework $p(\mathbf{x} \mid e)$; set of likely structure candidates]

SLIDE 27

Preliminaries and open questions

  • Are probabilities consistent with physical intuition?
  • Do probabilities encode the physics of structure stability?

SLIDE 28

Quantifying correlation in the probabilistic framework

Pair cumulant:
$$g_{ij}(x_i, x_j) = \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)}$$

  • $p(x_i, x_j)$: probability that both structures occur in the same system, estimated from the database
  • $p(x_i)$: probability that $x_i$ occurs
  • $g_{ij} > 1$: correlated; $g_{ij} = 1$: uncorrelated; $g_{ij} < 1$: anti-correlated
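The pair cumulant can be estimated directly from counts in a database. A minimal sketch, assuming the same hypothetical toy representation as before (each system maps a composition string to a structure label) and plain frequency estimates for all three probabilities; `pair_cumulant` is an illustrative name.

```python
from typing import Dict, List

Alloy = Dict[str, str]  # composition -> low-temperature structure label


def pair_cumulant(data: List[Alloy], ci: str, si: str, cj: str, sj: str) -> float:
    """Frequency estimate of g_ij(x_i, x_j) = p(x_i, x_j) / (p(x_i) p(x_j)),
    where x_i is structure si at composition ci and x_j is sj at composition cj."""
    n = len(data)
    p_i = sum(x.get(ci) == si for x in data) / n
    p_j = sum(x.get(cj) == sj for x in data) / n
    p_ij = sum(x.get(ci) == si and x.get(cj) == sj for x in data) / n
    if p_i == 0 or p_j == 0:
        raise ValueError("a structure never occurs in the data; g_ij is undefined")
    return p_ij / (p_i * p_j)


if __name__ == "__main__":
    toy_data = [  # placeholder structures, not real alloys
        {"1/2": "S1", "3/4": "S2"},
        {"1/2": "S1", "3/4": "S2"},
        {"1/2": "S3", "3/4": "S2"},
        {"1/2": "S3", "3/4": "S4"},
    ]
    print(pair_cumulant(toy_data, "1/2", "S1", "3/4", "S2"))  # > 1: correlated
    print(pair_cumulant(toy_data, "1/2", "S1", "3/4", "S4"))  # 0: anti-correlated
```

A value above 1 means the two structures co-occur more often than chance; a value of 0 means they were never seen together.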

SLIDE 29

How probabilities represent the physics of mixing

Do probabilities embody real physical effects?

$$g_{ij}(x_i, x_j) = \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)}$$

Compounds stabilized by the “size” effect: Fe3C, MgCu2

[Figure: axis $c_B$; value shown: 8.48. Data from Pauling File, Binaries Edition]

SLIDE 30

How probabilities represent the physics of mixing

Do probabilities embody real physical effects?

$$g_{ij}(x_i, x_j) = \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)}$$

Compounds stabilized by the “size” effect: Fe3C, MgCu2

[Figure: axis $c_B$; values shown: 8.48 and ~0. Data from Pauling File, Binaries Edition]

~0: places 'small' atoms on 'large' atom sites (G. Ceder)

SLIDE 31

How probabilities represent the physics of mixing: more interesting correlations

Gd2Co7 and PuNi3: both structures share the same local environments

[Figure: AABAAB... stacking (B A A B A A) and ABAB... stacking (B A B A)]

$$g_{ij}(x_i, x_j) = 54$$

SLIDE 32

Structure correlation observations

  • Correlation factors are the probabilistic analogue of heuristic rules
  • No explicit reference to physics: the physics is embedded in the experimental data

SLIDE 33

Information theory for structure stability

Suppose I know Fe3C forms at $c = 3/4$; how does this change the prediction at $c = 1/2$?

Mutual information:
$$I_{i,j} = \sum_{x_i, x_j} p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)}, \qquad I_{i,j} = \langle \log g_{ij}(x_i, x_j) \rangle$$

How much information is carried by knowledge of the structure?
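The mutual information can be computed from observed structure pairs, one pair per alloy system. This minimal sketch uses frequency estimates and natural logarithms (so the result is in nats, an arbitrary choice); it sums $p(x_i, x_j)\log g_{ij}$, which is exactly the average of $\log g_{ij}$ in the second form of the formula. The function name and the toy labels are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def mutual_information(pairs: List[Tuple[str, str]]) -> float:
    """I_ij = sum over (x_i, x_j) of p(x_i, x_j) * log g_ij(x_i, x_j), in nats.

    `pairs` holds the observed (structure at composition i, structure at
    composition j) for each alloy system; probabilities are frequency estimates.
    """
    n = len(pairs)
    joint = Counter(pairs)
    marg_i = Counter(a for a, _ in pairs)
    marg_j = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), n_ab in joint.items():  # unobserved pairs contribute 0
        p_ab = n_ab / n
        g = p_ab / ((marg_i[a] / n) * (marg_j[b] / n))  # pair cumulant g_ij
        mi += p_ab * math.log(g)
    return mi


if __name__ == "__main__":
    # Fully dependent toy data: knowing x_i fixes x_j, so I = log 2.
    print(mutual_information([("A", "X"), ("B", "Y")]))
    # Independent toy data: every g_ij = 1, so I = 0.
    print(mutual_information([("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y")]))
```

A large $I_{i,j}$ means knowing the structure at one composition sharply narrows the prediction at the other, as in the Fe3C question above.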

SLIDE 34

Information theory for structure stability

[Figure: matrix; color scale: degree of correlation]

Each element of the matrix is the correlation between $X_i$ and $X_j$, e.g., $X_i$ = “AB prototype” and $X_j$ = “A2B prototype”:
$$I_{i,j} = \sum_{x_i, x_j} p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)}$$

SLIDE 35

Prediction and validation in Li-Pt

SLIDE 36

Predicting structures in Li-Pt

[Figure: AlB2, LiRh ??, MgCu2, CuPt7 (a.k.a. MgPt7)]

Use these as conditioning evidence for $p(\mathbf{x} \mid e)$.

SLIDE 37

Predicting structures in Li-Pt

[Figure: suggested phases and known phases]

SLIDE 38

Cross validation to evaluate performance

[Diagram: DFT code; material property predictions; database of computed and experimental results; machine learning framework $p(\mathbf{x} \mid e)$; set of likely structure candidates]

The success of the method depends on how short this list of candidates is.
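How short the candidate list is can be measured per held-out system during cross validation. A minimal sketch under stated assumptions: candidates are ranked by their $p(\mathbf{x} \mid e)$ values, ties are broken alphabetically, and the toy probabilities are placeholders. `list_length_to_hit`, `success_rate`, and `average_list_length` are hypothetical names, not the authors' code.

```python
from typing import Dict, List, Tuple

Case = Tuple[Dict[str, float], str]  # (p(x|e) over candidates, true ground state)


def list_length_to_hit(probabilities: Dict[str, float], true_structure: str) -> int:
    """Number of top-ranked candidates (ordered by p(x|e)) that must be
    calculated before the true ground state appears; ties broken alphabetically."""
    ranked = sorted(probabilities, key=lambda s: (-probabilities[s], s))
    return ranked.index(true_structure) + 1  # ValueError if it is not a candidate


def success_rate(cases: List[Case], list_length: int) -> float:
    """Fraction of held-out cases whose ground state is within the top `list_length`."""
    hits = sum(list_length_to_hit(probs, truth) <= list_length for probs, truth in cases)
    return hits / len(cases)


def average_list_length(cases: List[Case]) -> float:
    """Mean list length needed over held-out cases (an average 'loss')."""
    return sum(list_length_to_hit(probs, truth) for probs, truth in cases) / len(cases)


if __name__ == "__main__":
    probs = {"S1": 0.6, "S2": 0.3, "S3": 0.1}  # placeholder p(x|e)
    cases = [(probs, "S1"), (probs, "S3")]
    print(success_rate(cases, 2), average_list_length(cases))
```

Sweeping `list_length` and plotting `success_rate` gives the kind of curve reported in the cross-validation results.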

SLIDE 39

Cross-validation results

[Figure: "independent variables" vs. "including structure correlation"; length of list = average 'loss']

  • 10 candidates: 95% chance of seeing the ground state (GS)!
  • ~28 candidates required for freq.

Nature Materials 5, 641-646 (2006)

SLIDE 40

Some open questions

ICSD: the world's largest database of inorganic crystal structures
  • First entry: 1913
  • Number of entries: 100,243
  • Number of usable compounds: 29,962
  • Number of structure prototypes: 2,485

What is the information content of a chemical database? How many 'independent' crystal structures exist in nature?

SLIDE 41

Structure prediction: wrap-up

for i in (relevant chemistries) {
    ...
    getStablePhases(i);
    ...
    calculateProperty(i);
    i = nextChemistry();
}

Now have an efficient tool for this. Much more needed here.

SLIDE 42

Directions for future work/collaboration

[Diagram: DFT code; material property predictions; database of computed results; machine learning framework]

SLIDE 43

Directions for future work/collaboration

[Diagram: DFT code, set of features]

  • Charge Density
  • Total energy
  • Bulk moduli
  • Coordination
  • Bond strength
  • Bond character
  • Magnetic moments
  • Polarization
  • ...
SLIDE 44

Directions for future collaboration

[Diagram: database of computed results; DFT code; machine learning framework (functional mapping)]

Material properties:

  • catalytic activity
  • conductivity
  • plasticity
  • voltage/energy density
SLIDE 45

The End

ITR grant (DMR-031253)

http://datamine.mit.edu: data from the high-throughput alloy study; online structure predictor


SLIDE 47

DATASET NOTES

  • 1335 alloys
  • 3975 non-unique compounds
  • 4263 compounds total
  • Alloys not containing the elements He, B, C, N, O, F, Ne, Si, P, S, Cl, Ar, As, Se, Br, Kr, Te, I, Xe, At, Rn