Exploring the Use of TensorFlow to Predict Connection Table - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Exploring the Use of TensorFlow to Predict Connection Table Information within Chemical Structures

Brodie Schroeder

slide-2
SLIDE 2

Machine Learning Basics

  • Gives "computers the ability to learn without being explicitly programmed." (Arthur Samuel)
  • Goal is to solve problems with “generalized” algorithms that apply to many different problems
  • Unsupervised and supervised learning
  • Artificial Neural Networks and Deep Learning
slide-3
SLIDE 3

= ?

slide-4
SLIDE 4
slide-5
SLIDE 5

Basic Artificial Neural Network

slide-6
SLIDE 6
slide-7
SLIDE 7
slide-8
SLIDE 8
slide-9
SLIDE 9

y = <activation func>(Wx + b)

Softmax, Sigmoid, ReLU...
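As a rough illustration of the formula above, a single layer can be sketched in NumPy. The shapes, the random weights and the helper names (`dense`, `sigmoid`, `relu`, `softmax`) are assumptions chosen for this example and do not come from the presentation.

```python
import numpy as np

def sigmoid(z):
    # Squashes each value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Keeps positive values, replaces negative values with 0
    return np.maximum(z, 0.0)

def softmax(z):
    # Turns scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

def dense(x, W, b, activation):
    # y = <activation func>(Wx + b)
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # 4 input values (hypothetical size)
W = rng.normal(size=(3, 4))   # 3 output units
b = np.zeros(3)

y_relu = dense(x, W, b, relu)
y_sigmoid = dense(x, W, b, sigmoid)
y_softmax = dense(x, W, b, softmax)
```

Only the activation function changes between the three calls; the weighted sum Wx + b is the same.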

slide-10
SLIDE 10

= ?

slide-11
SLIDE 11

y = [0,0,1,0,0,0,0,0,0,0]

The image appears to be a ‘2’
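Reading the prediction off such an output vector amounts to finding the position of the largest entry. A minimal sketch, where the variable name `predicted_digit` is only illustrative:

```python
# Output vector from the slide: one entry per digit 0-9
y = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# The index of the largest entry is the predicted digit
predicted_digit = y.index(max(y))
```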

slide-12
SLIDE 12

Goal for this Project

Given the XYZ coordinates and atom types of the atoms in a chemical structure, predict a bonding table for all atoms. Known bonding information is used to train the model.

slide-13
SLIDE 13

Example Dataset

benzene
  ACD/Labs0812062058
  6  6  0  0  0  0  0  0  0  0  1 V2000
    1.9050   -0.7932    0.0000 C
    1.9050   -2.1232    0.0000 C
    0.7531   -0.1282    0.0000 C
    0.7531   -2.7882    0.0000 C
   -0.3987   -0.7932    0.0000 C
   -0.3987   -2.1232    0.0000 C
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0
  4  2  2  0  0  0  0
  5  3  1  0  0  0  0
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
M  END
$$$$

Atom block: XYZ coordinates and the atom type. This will be the ‘x’ input in our model.

Bond block: connection information for bonding between atoms. We will use this to train our model.
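The two parts of the record can be pulled apart with a few lines of Python. This is a minimal sketch, not the parser from the project: the function name `parse_record` is hypothetical, fields are split on whitespace (a simplification of the fixed-width V2000 layout that can fail for large molecules), and atoms 5 and 6 are taken to have x = -0.3987, which is assumed from the ring geometry.

```python
# Benzene record as shown on the slide (whitespace-separated)
record = """benzene
ACD/Labs0812062058
6 6 0 0 0 0 0 0 0 0 1 V2000
1.9050 -0.7932 0.0000 C
1.9050 -2.1232 0.0000 C
0.7531 -0.1282 0.0000 C
0.7531 -2.7882 0.0000 C
-0.3987 -0.7932 0.0000 C
-0.3987 -2.1232 0.0000 C
2 1 1 0 0 0 0
3 1 2 0 0 0 0
4 2 2 0 0 0 0
5 3 1 0 0 0 0
6 4 1 0 0 0 0
6 5 2 0 0 0 0
M END
$$$$
"""

def parse_record(text):
    lines = text.splitlines()
    # The counts line is the one that ends with the V2000 version tag
    counts_idx = next(
        i for i, line in enumerate(lines) if line.rstrip().endswith("V2000")
    )
    fields = lines[counts_idx].split()
    n_atoms, n_bonds = int(fields[0]), int(fields[1])

    # Atom block: x, y, z, element symbol
    atoms = []
    for line in lines[counts_idx + 1 : counts_idx + 1 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((float(x), float(y), float(z), symbol))

    # Bond block: first atom, second atom, bond order (1-based atom indices)
    bonds = []
    start = counts_idx + 1 + n_atoms
    for line in lines[start : start + n_bonds]:
        first, second, order = (int(v) for v in line.split()[:3])
        bonds.append((first, second, order))
    return atoms, bonds

atoms, bonds = parse_record(record)
```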
slide-14
SLIDE 14

Parsing SDF Files

2.345, 1.652, 4.791, C
4.562, 8.345, 2.221, C
9.821, 3.323, 4.124, C
8.421, 5.341, 9.981, O
7.623, 3.253, 7.456, C
4.221, 6.213, 4.343, O

6.0, 0.0, 4.312, 6.223, 7.321, 3.221, 9.023,
6.0, 1.542, 0.0, 4.222, 8.231, 6.321, 1.999,
6.0, 2.221, 5.012, 0.0, 4.223, 6.723, 7.232,
8.0, 7.010, 3.011, 7.221, 0.0, 5.434, 7.777,
6.0, 4.312, 3.221, 3.563, 7.212, 0.0, 6.521,
8.0, 2.333, 5.321, 6.872, 6.454, 8.991, 0.0,

First column: Atomic Number of Atom
Remaining columns: Euclidean Distance Between Atoms
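One way to build such a matrix from the coordinate list is sketched below with NumPy broadcasting. The function name `build_features` and the small `ATOMIC_NUMBERS` lookup are assumptions made for this example. The numbers printed on the slide appear to be illustrative, so the computed distances are not expected to match them.

```python
import numpy as np

# Lookup for the two elements used in this example only
ATOMIC_NUMBERS = {"C": 6, "O": 8}

# Coordinates and atom types as listed on the slide
coords = np.array([
    [2.345, 1.652, 4.791],
    [4.562, 8.345, 2.221],
    [9.821, 3.323, 4.124],
    [8.421, 5.341, 9.981],
    [7.623, 3.253, 7.456],
    [4.221, 6.213, 4.343],
])
symbols = ["C", "C", "C", "O", "C", "O"]

def build_features(coords, symbols):
    # Pairwise differences via broadcasting: shape (n, n, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    # Euclidean distance between every pair of atoms: shape (n, n)
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    numbers = np.array([ATOMIC_NUMBERS[s] for s in symbols], dtype=float)
    # First column is the atomic number, the rest are distances: shape (n, n + 1)
    return np.column_stack([numbers, distances])

features = build_features(coords, symbols)
```

For six atoms this gives a 6 x 7 matrix with zeros where an atom is compared with itself.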
slide-15
SLIDE 15

Parsing SDF Files

2, 1, 1,
3, 1, 2,
4, 2, 2,
5, 3, 1,
6, 4, 1,
6, 5, 2,

0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
1.0, 0.0, 0.0, 0.0, 1.0, 0.0,
0.0, 1.0, 0.0, 0.0, 0.0, 1.0,
0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
0.0, 0.0, 0.0, 1.0, 1.0, 0.0,

Boolean Array of Connections
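The step from the bond list to the boolean array can be sketched as follows. The function name `bond_matrix` is hypothetical. The sketch assumes, as the slide's array suggests, that the bond order (third number) is ignored and only the presence of a bond is recorded.

```python
import numpy as np

# (first atom, second atom, bond order), 1-based indices as on the slide
bonds = [(2, 1, 1), (3, 1, 2), (4, 2, 2), (5, 3, 1), (6, 4, 1), (6, 5, 2)]

def bond_matrix(bonds, n_atoms):
    matrix = np.zeros((n_atoms, n_atoms), dtype=np.float32)
    for first, second, _order in bonds:
        # Mark the bond in both directions so the matrix is symmetric
        matrix[first - 1, second - 1] = 1.0
        matrix[second - 1, first - 1] = 1.0
    return matrix

connections = bond_matrix(bonds, 6)
```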
slide-16
SLIDE 16

Input and Training Data

x = (6 x 7 = 42 values)
6.0, 0.0, 4.312, 6.223, 7.321, 3.221, 9.023,
6.0, 1.542, 0.0, 4.222, 8.231, 6.321, 1.999,
6.0, 2.221, 5.012, 0.0, 4.223, 6.723, 7.232,
8.0, 7.010, 3.011, 7.221, 0.0, 5.434, 7.777,
6.0, 4.312, 3.221, 3.563, 7.212, 0.0, 6.521,
8.0, 2.333, 5.321, 6.872, 6.454, 8.991, 0.0,

y_ = (6 x 6 = 36 values)
0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
1.0, 0.0, 0.0, 0.0, 1.0, 0.0,
0.0, 1.0, 0.0, 0.0, 0.0, 1.0,
0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
0.0, 0.0, 0.0, 1.0, 1.0, 0.0,

slide-17
SLIDE 17

Input and Training Data

  • Build two Python lists
  • List ‘a’ is a list of flattened 2D NumPy matrices containing Euclidean distances and atom type
  • List ‘b’ is a list of flattened 2D NumPy matrices containing bonding information for all atoms
  • Matrix size is capped at 28 x 29 and 28 x 28 respectively (only molecules with 28 or fewer atoms are included)
  • If a molecule has fewer than 28 atoms, the matrix is padded with zeros
  • a[n] corresponds to b[n]
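The padding and flattening described on this slide can be sketched as below. The helper name `pad_and_flatten` and the all-ones stand-in matrices are assumptions for illustration. The resulting lengths, 28 x 29 = 812 and 28 x 28 = 784, are the sizes the model's placeholders expect.

```python
import numpy as np

MAX_ATOMS = 28  # size cap stated on the slide

def pad_and_flatten(matrix, n_cols):
    n_atoms = matrix.shape[0]
    if n_atoms > MAX_ATOMS:
        raise ValueError("molecule has more than %d atoms" % MAX_ATOMS)
    # Start from zeros so unused rows and columns act as padding
    padded = np.zeros((MAX_ATOMS, n_cols), dtype=np.float32)
    padded[:matrix.shape[0], :matrix.shape[1]] = matrix
    # Flatten row by row into a 1D vector
    return padded.ravel()

# Stand-ins for one molecule with 6 atoms (real values come from the SDF file)
features = np.ones((6, 7), dtype=np.float32)      # atom type + distances
connections = np.ones((6, 6), dtype=np.float32)   # bonding information

x_row = pad_and_flatten(features, MAX_ATOMS + 1)  # one entry of list 'a'
y_row = pad_and_flatten(connections, MAX_ATOMS)   # one entry of list 'b'
```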

slide-18
SLIDE 18

What is TensorFlow?

  • “Open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.”
  • Provides an API that makes it easy to set up, design and train deep learning models.

slide-19
SLIDE 19

Building the Model

import tensorflow as tf  # TensorFlow 1.x graph API

# Input: one flattened 28 x 29 matrix per molecule (812 values)
x = tf.placeholder(tf.float32, [None, 812])

# Three hidden layers of 784 units each
W1 = tf.Variable(tf.truncated_normal([812, 784], stddev=0.1))
b1 = tf.Variable(tf.truncated_normal([784], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([784, 784], stddev=0.1))
b2 = tf.Variable(tf.truncated_normal([784], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([784, 784], stddev=0.1))
b3 = tf.Variable(tf.truncated_normal([784], stddev=0.1))

# Output layer: 784 = 28 x 28 connection entries
W = tf.Variable(tf.truncated_normal([784, 784], stddev=0.1))
b = tf.Variable(tf.truncated_normal([784], stddev=0.1))

layer1 = tf.add(tf.matmul(x, W1), b1)
layer1 = tf.nn.relu(layer1)
layer2 = tf.add(tf.matmul(layer1, W2), b2)
layer2 = tf.nn.relu(layer2)
layer3 = tf.add(tf.matmul(layer2, W3), b3)
layer3 = tf.nn.relu(layer3)

# Keep the raw scores (logits) for the loss; y holds the sigmoid outputs
logits = tf.add(tf.matmul(layer3, W), b)
y = tf.nn.sigmoid(logits)

# Target: one flattened 28 x 28 connection matrix per molecule
y_ = tf.placeholder(tf.float32, [None, 784])

slide-20
SLIDE 20

Building the Model

# sigmoid_cross_entropy_with_logits applies the sigmoid itself, so pass the logits
cross_entropy = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# get_batch() is not shown on the slides.
# This rebinds the Python name b; the graph keeps its own reference to the bias.
a, b = get_batch()
train_len = len(a)

# Round the sigmoid outputs to 0 or 1 before comparing them with the labels
correct_prediction = tf.equal(y_, tf.round(y))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Training
for i in range(train_len):
    batch_xs = a[i]
    batch_ys = b[i]
    _, loss, acc = sess.run([train_step, cross_entropy, accuracy], feed_dict={x: batch_xs, y_: batch_ys})
    print("Loss= " + "{:.6f}".format(loss) + " Accuracy= " + "{:.5f}".format(acc))
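The loss function used in the training code, tf.nn.sigmoid_cross_entropy_with_logits, expects raw scores (logits) rather than values that have already passed through a sigmoid. The NumPy sketch below reproduces the formula given in the TensorFlow documentation, max(x, 0) - x * z + log(1 + exp(-|x|)), and checks it against the textbook cross-entropy. The function names and sample values are assumptions for illustration.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(labels, logits):
    # Numerically stable form: max(x, 0) - x * z + log(1 + exp(-|x|))
    return (
        np.maximum(logits, 0.0)
        - logits * labels
        + np.log1p(np.exp(-np.abs(logits)))
    )

def naive_cross_entropy(labels, logits):
    # Textbook form: apply the sigmoid first, then take the logs
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

labels = np.array([1.0, 0.0, 1.0, 0.0])   # 0/1 targets
logits = np.array([2.0, -1.0, -0.5, 3.0]) # raw scores before any sigmoid

stable = sigmoid_cross_entropy_with_logits(labels, logits)
reference = naive_cross_entropy(labels, logits)
```

Both forms agree for moderate scores; the stable form avoids taking the log of values that round to 0 when scores are large.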

slide-21
SLIDE 21

Building the Model

# Test trained model (note: this loops over the same a and b used for training)
cumulative_accuracy = 0.0
for i in range(train_len):
    acc_batch_xs = a[i]
    acc_batch_ys = b[i]
    cumulative_accuracy += accuracy.eval(feed_dict={x: acc_batch_xs, y_: acc_batch_ys})
print("Test Accuracy= {}".format(cumulative_accuracy / train_len))

slide-22
SLIDE 22

Results thus far...

slide-23
SLIDE 23

Test Accuracy = 0.865

  • Approx. 10,000 training sets
slide-24
SLIDE 24

Future Improvements and Optimization

  • Cache results of parsing SDF file
  • Improve code for calculating distances
  • Improve initial values of weights
  • Overtraining or undertraining?
  • TensorBoard visualization
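
The first item, caching the parsed data, could look roughly like the sketch below. It uses NumPy's .npz format. The function name `load_or_build`, the file name and the stand-in `fake_parse` function are all hypothetical and not taken from the project.

```python
import os
import tempfile

import numpy as np

def load_or_build(cache_path, build):
    """Return (x, y) from cache_path if it exists, else build and cache them."""
    if os.path.exists(cache_path):
        with np.load(cache_path) as data:
            return data["x"], data["y"]
    x, y = build()
    np.savez(cache_path, x=x, y=y)
    return x, y

calls = []

def fake_parse():
    # Stand-in for the slow SDF parsing step; records each call
    calls.append(1)
    x = np.zeros((3, 812), dtype=np.float32)
    y = np.zeros((3, 784), dtype=np.float32)
    return x, y

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "parsed.npz")
    x_first, y_first = load_or_build(path, fake_parse)    # parses and saves
    x_second, y_second = load_or_build(path, fake_parse)  # loads from cache
```

The second call reads the arrays from disk, so the parsing step runs only once.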
slide-25
SLIDE 25

Questions?

View the code: https://github.com/Allvitende/chemical-modeling/