ML for Scent Alex Wiltschko, Benjamin Sanchez-Lengeling, Brian Lee, - - PowerPoint PPT Presentation



slide-1
SLIDE 1

ML for Scent

Alex Wiltschko, Benjamin Sanchez-Lengeling, Brian Lee, Carey Radebaugh, Emily Reif, Jennifer Wei

slide-2
SLIDE 2

Hi!

I’m Alex Wiltschko, a scientist at Google Research. I lead a research group within Google Brain that focuses on machine learning for olfaction.

slide-3
SLIDE 3

Google Research

3500 Researchers & Engineers. 18 offices, 11 countries.
Make machines intelligent. Improve people’s lives.

slide-4
SLIDE 4
  • Foundational research
  • Building tools to enable research & democratize AI/ML
  • AI-enabling Google products

Our Approach

slide-5
SLIDE 5
slide-6
SLIDE 6

Do for olfaction what machine learning has already done for vision and hearing. To digitize the sense of smell, and make the world’s smells and flavors searchable. Every flower patch, every natural gas leak, every item on every menu in every restaurant. We’re starting at the very beginning, with the simplest problem… but first, some olfaction facts!

What’s our goal?

slide-7
SLIDE 7

Most airflow is not smelled. It passes right on through the lower turbinates to your lungs.

The OSNs are one of two parts of your brain that are exposed to the world (the other is the pituitary gland, and that’s in blood, so it only half-counts).

Taste lives on your tongue. Flavor is both taste and retronasal olfaction, from a “chimney effect”.

slide-8
SLIDE 8

GPCR: G-protein coupled receptor
OR: GPCR Olfactory Receptor
OSN: Olfactory sensory neuron

~400 ORs expressed in humans (as opposed to 3 types of cones). ~1000 in mice. ~2000 in elephants!
One OR per OSN.
ORs comprise 2% of your genome, but many are pseudogenes.
OR structure is unknown; they are uncrystallized. Further, only ~40 are expressed in cell lines.
Their ligand responses are broadly tuned, but many ORs (22/400) are still orphans, with no known ligand.
slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11

People do smell different things!

Mainland et al 2015: SNPs in single ORs result in sensory dimorphisms. The most famous ones are:

  • OR7D4 T113M: normally funky beta-androstenone (boar taint) is rendered pleasant.
  • OR5A1 N183D: nearly completely Mendelian. Carriers of the mutation can detect beta-ionone at two orders of magnitude lower concentration.
  • Olfactory sensory dimorphisms are likely common: humans differ functionally at 30% of OR alleles.
  • ~4.5% of the world is colorblind (CBA)
  • 13% of people in the US have selective hearing loss (NIDCD)
  • All this to argue: smell is not de facto finicky or illogical.

Right now, we’re starting with the simplest problem

slide-12
SLIDE 12

Predict

“Smells sweet, with a hint of vanilla, some notes of creamy and back note of chocolate.”

Odor descriptors

slide-13
SLIDE 13

And why is this hard?

slide-14
SLIDE 14

We built a benchmark from perfumery raw materials

slide-15
SLIDE 15

Vanillin

1: sweet, vanilla, creamy, chocolate
2: sweet, vanilla, creamy, phenolic

General agreement between repeated ratings. All ratings by perfume experts.

We built a benchmark from perfumery raw materials

slide-16
SLIDE 16

Odor descriptors (figure labels): ... solvent, orangeflower, bready, black currant, radish

fruity, green, sweet, floral, woody

We built a benchmark from perfumery raw materials

slide-17
SLIDE 17

We built a benchmark from perfumery raw materials

slide-18
SLIDE 18

Historical SOR approaches

Pen & Paper

Rule-based principles for predicting odor. There are as many exceptions as there are rules.

  • Kraft’s vetiver rule: (-)-khusimone; 1,7-cyclogermacra-1(10),4-dien-15-al; 4,7,7-trimethyl-1-methylidenespiro[4.5]decan-2-one (Fig 3.22, Scent and Chemistry; Ohloff, Pickenhagen, Kraft)
  • Ohloff’s rule
  • Bajgrowicz and Broger’s ambergris osmophore model
  • Buchbauer’s santalols
  • Boelens’ synthetic muguet

slide-19
SLIDE 19

Traditional Computational Approaches

Predict

  • Toxicity
  • Solubility
  • Photovoltaic efficiency (solar cell)
  • Chemical potential (batteries)
  • ...

“bag of sub-graphs” representation AKA molecular fingerprints
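
The "bag of sub-graphs" idea can be sketched in a few lines. This is a toy illustration, not the fingerprint used in the talk and not any library's implementation: each atom's neighborhood is written out as a string, hashed, and the hash sets one bit in a fixed-length vector. The function names, the 64-bit length, and the heavy-atom-only molecule encoding are all assumptions made for the example.

```python
import zlib

def toy_fingerprint(atoms, bonds, radius=1, n_bits=64):
    """Hash every atom-centered sub-graph (up to `radius` bonds) into a bit vector."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    labels = list(atoms)  # radius-0 sub-graphs are the atoms themselves
    bits = [0] * n_bits
    for _ in range(radius + 1):
        for label in labels:
            # crc32 is deterministic across runs, unlike Python's built-in hash()
            bits[zlib.crc32(label.encode()) % n_bits] = 1
        # Grow each sub-graph by one bond: own label plus sorted neighbor labels.
        labels = [
            labels[i] + "(" + ",".join(sorted(labels[j] for j in neighbors[i])) + ")"
            for i in range(len(atoms))
        ]
    return bits

def tanimoto(fp_a, fp_b):
    """Shared 'on' bits divided by total 'on' bits: a common fingerprint similarity."""
    both = sum(1 for x, y in zip(fp_a, fp_b) if x and y)
    either = sum(1 for x, y in zip(fp_a, fp_b) if x or y)
    return both / either if either else 1.0

# Heavy-atom skeletons only (hydrogens and bond orders omitted for brevity).
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
fp_ethanol = toy_fingerprint(*ethanol)
fp_propanol = toy_fingerprint(*propanol)
similarity = tanimoto(fp_ethanol, fp_propanol)
```

The resulting bit vector is what a traditional model (for example a random forest) would take as input. The representation is fixed in advance rather than learned from the task.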

slide-20
SLIDE 20

Labeled Photos

“cat” “dog” “car” “apple” “flower”

slide-21
SLIDE 21

Unlabeled Photo

slide-22
SLIDE 22

Input → Output

PIXELS → “lion”
AUDIO → “How cold is it outside?”
TEXT (“Hello, how are you?”) → “你好,你好吗?”
PIXELS → “A blue and yellow train travelling down the tracks”

slide-23
SLIDE 23
slide-24
SLIDE 24

Graphs as input to neural networks: not just images, sounds or words

slide-25
SLIDE 25

Inside a GNN Converting a molecule to a graph
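
As a rough sketch of this conversion, atoms become nodes with a feature vector and bonds become edges in an adjacency matrix. The example below encodes the heavy atoms of vanillin (4-hydroxy-3-methoxybenzaldehyde). The element vocabulary, the one-hot features, and the function name are assumptions for illustration; a real featurization would typically also carry bond order, charge, aromaticity and so on.

```python
ELEMENTS = ["C", "N", "O", "S"]  # hypothetical element vocabulary for one-hot features

def molecule_to_graph(atoms, bonds):
    """Return (node_features, adjacency) for a molecule given as atoms and bond pairs."""
    node_features = [[1.0 if el == atom else 0.0 for el in ELEMENTS] for atom in atoms]
    n = len(atoms)
    adjacency = [[0] * n for _ in range(n)]
    for a, b in bonds:
        adjacency[a][b] = 1  # bonds are undirected, so fill both directions
        adjacency[b][a] = 1
    return node_features, adjacency

# Vanillin heavy atoms: ring carbons 0-5, aldehyde C6=O7, methoxy O8-C9, hydroxyl O10.
vanillin_atoms = ["C", "C", "C", "C", "C", "C", "C", "O", "O", "C", "O"]
vanillin_bonds = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),  # benzene ring
    (0, 6), (6, 7),                                  # aldehyde group on ring atom 0
    (2, 8), (8, 9),                                  # methoxy group on ring atom 2
    (3, 10),                                         # hydroxyl group on ring atom 3
]
features, adjacency = molecule_to_graph(vanillin_atoms, vanillin_bonds)
degrees = [sum(row) for row in adjacency]
```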

slide-26
SLIDE 26

Inside a GNN Propagating information & transforming a graph
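
One propagation round can be sketched as "aggregate, then transform": every atom sums its own features with those of its bonded neighbors, then applies a shared linear map and a nonlinearity. This is a generic message-passing step written under assumptions (sum aggregation, ReLU, a hand-set weight matrix), not the specific layer used in the talk.

```python
def message_passing_step(node_features, adjacency, weight):
    """One round: sum self + neighbor features, then apply a linear map and ReLU."""
    n = len(node_features)
    d_in = len(node_features[0])
    d_out = len(weight[0])
    updated = []
    for i in range(n):
        # Aggregate: start from the atom's own features, add each bonded neighbor's.
        agg = list(node_features[i])
        for j in range(n):
            if adjacency[i][j]:
                agg = [a + x for a, x in zip(agg, node_features[j])]
        # Transform: shared weights for every atom, followed by ReLU.
        updated.append(
            [max(0.0, sum(agg[k] * weight[k][c] for k in range(d_in))) for c in range(d_out)]
        )
    return updated

# Toy C-C-O chain with one-hot [is_carbon, is_oxygen] features.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity weights make the arithmetic easy to follow

after_one = message_passing_step(features, adjacency, identity)
after_two = message_passing_step(after_one, adjacency, identity)
```

After one round each atom knows about its direct neighbors; after two rounds the first carbon already carries a trace of the oxygen two bonds away. Stacking rounds widens each atom's view of the molecule.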

slide-27
SLIDE 27

A GNN to predict odor descriptors
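
A minimal sketch of the prediction head, assuming a sum readout over atoms followed by one independent sigmoid per descriptor (a molecule can be "sweet" and "vanilla" at once, so the labels are not mutually exclusive). The descriptor list, weights and node states below are made-up values for illustration; the talk does not specify these details.

```python
import math

DESCRIPTORS = ["sweet", "vanilla", "creamy"]  # small illustrative subset

def predict_descriptors(node_states, weight, bias):
    """Pool atom states into one molecule vector, then score each descriptor."""
    # Sum readout: the result does not depend on the order of the atoms.
    pooled = [sum(column) for column in zip(*node_states)]
    probabilities = {}
    for name, w_row, b in zip(DESCRIPTORS, weight, bias):
        logit = sum(w * x for w, x in zip(w_row, pooled)) + b
        probabilities[name] = 1.0 / (1.0 + math.exp(-logit))  # independent sigmoid
    return probabilities

# Hypothetical per-atom states after message passing, and hand-set head weights.
node_states = [[4.0, 1.0], [5.0, 2.0], [3.0, 2.0]]
weight = [[0.1, 0.2], [-0.1, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0]

probs = predict_descriptors(node_states, weight, bias)
probs_shuffled = predict_descriptors(list(reversed(node_states)), weight, bias)
```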

slide-28
SLIDE 28

And how well can we predict?

slide-29
SLIDE 29

A representation optimized for odor

Last-layer embeddings: a 63-dimensional vector per molecule

slide-30
SLIDE 30

Exploring the geometric space of odor

slide-31
SLIDE 31

Exploring the geometric space of odor

slide-32
SLIDE 32

What do nearby molecules look like?

Inspired by word embeddings. Are there “molecular synonyms”?

First, what do “nearest neighbors” look like if you use just structure, and ignore our neural network? Then, what do nearest neighbors look like to our GCN?
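
A nearest-neighbor lookup in embedding space can be sketched as below, assuming cosine similarity as the distance measure (the talk does not say which metric was used). The molecule names and 3-dimensional vectors are placeholders, not real embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(query, embeddings, k=2):
    """Names of the k molecules whose embeddings are most similar to the query's."""
    scored = [
        (cosine(embeddings[query], vector), name)
        for name, vector in embeddings.items()
        if name != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Placeholder embeddings; a real table would hold one learned vector per molecule.
toy_embeddings = {
    "mol_a": [1.0, 0.0, 0.0],
    "mol_b": [0.9, 0.1, 0.0],
    "mol_c": [0.0, 1.0, 0.0],
    "mol_d": [0.5, 0.5, 0.0],
}
neighbors_of_a = nearest_neighbors("mol_a", toy_embeddings, k=2)
```

Swapping the embedding table for structural fingerprints (with a fingerprint similarity in place of cosine) gives the structure-only baseline that the comparison starts from.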

slide-33
SLIDE 33

herbal, nutty, coconut, coumarinic, cinnamon, sweet, hay, tobacco

dihydrocoumarin

Molecular neighbors: using structure

Acetyl thymol Tolyl decanoate berry, medicinal, fruity, phenolic medicinal, sweet, fruity, floral smoky, spicy, balsamic sweet, phenolic, floral spicy

ortho-cresyl isobutyrate
ortho-cresyl acetate

ethyl 3-(2-hydroxyphenyl) propionate

slide-34
SLIDE 34

2-benzofuran carboxaldehyde coumarin green, coumarinic phenolic, hay, lactonic, coconut, coumarinic, almond, sweet, powdery sweet, nutty, almond sweet, coumarinic, hay green, vanilla, nutty, coumarinic, spicy 1,4-benzodioxin-2(3H)-one coumane phthalide

Molecular neighbors: using GCN features

herbal, nutty, coconut, coumarinic, cinnamon, sweet, hay, tobacco

dihydrocoumarin

slide-35
SLIDE 35

You might hear ‘fine-tuning’ referred to as a strategy for ‘transfer learning’. Transfer learning in chemistry, today, rarely works. Do our embeddings transfer learn to other tasks?

Do these representations generalize?

Using a learned model to make predictions on a new task is ‘transfer learning’
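
One common way to test whether a representation transfers is to freeze the embeddings and train only a small model on top for the new task. The sketch below assumes a logistic-regression head trained by plain gradient descent; the embedding table, molecule names and labels are invented for illustration, and the talk does not state which downstream model was used.

```python
import math

# Hypothetical frozen embeddings from a previously trained model (never updated here).
FROZEN_EMBEDDINGS = {
    "mol_a": [1.0, 0.0],
    "mol_b": [0.9, 0.2],
    "mol_c": [0.0, 1.0],
    "mol_d": [0.1, 0.8],
}
NEW_TASK_LABELS = {"mol_a": 1, "mol_b": 1, "mol_c": 0, "mol_d": 0}  # made-up new task

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(features, labels, lr=0.5, epochs=200):
    """Fit logistic-regression weights on fixed features with batch gradient descent."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

names = sorted(FROZEN_EMBEDDINGS)
X = [FROZEN_EMBEDDINGS[name] for name in names]
y = [NEW_TASK_LABELS[name] for name in names]
w, b = train_head(X, y)
predictions = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5 for x in X]
```

Only `w` and `b` are learned, so the new task needs far fewer labeled molecules than training a full network from scratch.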

slide-36
SLIDE 36

Do these representations generalize?

slide-37
SLIDE 37

DREAM Olfactory Challenge Dravnieks

Transfer-learned to achieve state-of-the-art on the two major olfactory benchmark tasks

slide-38
SLIDE 38

But why is the neural network making these predictions?

Toy test example: classify whether a molecule has benzene. Which atoms contribute to predictions?

Benzene?

This is just one task of potentially hundreds, of varying complexity.
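
One simple way to ask "which atoms contribute?" is occlusion: hide one atom at a time and record how much the model's score drops. The sketch below uses that technique as an assumption (the talk does not name its attribution method), and the "model" is a crude hand-written stand-in that counts carbons bonded to at least two other carbons, not a trained network.

```python
def ring_like_score(atoms, bonds, masked=None):
    """Stand-in 'model': number of carbons bonded to at least two other carbons."""
    visible = [i for i in range(len(atoms)) if i != masked]
    carbon_neighbors = {i: 0 for i in visible}
    for a, b in bonds:
        # Ignore any bond that touches the masked atom.
        if a in carbon_neighbors and b in carbon_neighbors:
            if atoms[a] == "C" and atoms[b] == "C":
                carbon_neighbors[a] += 1
                carbon_neighbors[b] += 1
    return sum(1 for i in visible if atoms[i] == "C" and carbon_neighbors[i] >= 2)

def occlusion_attribution(atoms, bonds, score_fn):
    """Per-atom attribution: baseline score minus the score with that atom hidden."""
    baseline = score_fn(atoms, bonds)
    return [baseline - score_fn(atoms, bonds, masked=i) for i in range(len(atoms))]

# Toluene heavy atoms: ring carbons 0-5 plus a methyl carbon 6 attached to ring atom 0.
toluene_atoms = ["C"] * 7
toluene_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
attributions = occlusion_attribution(toluene_atoms, toluene_bonds, ring_like_score)
```

In this toy case every ring atom receives a positive attribution and the methyl carbon receives zero, which is the pattern one would hope to see from a real network on the benzene task.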

slide-39
SLIDE 39

But why is the neural network making these predictions?

Toy test example: classify whether a molecule has benzene. Which atoms contribute to predictions?

slide-40
SLIDE 40

But why is the neural network making these predictions?

Toy test example: classify whether a molecule has benzene. Which atoms contribute to predictions? Positive examples Negative examples

slide-41
SLIDE 41

But why is the neural network making these predictions?

Odor percept — “garlic”

Positive examples Negative examples

slide-42
SLIDE 42

But why is the neural network making these predictions?

Odor percept — “fatty”

Positive examples Negative examples

slide-43
SLIDE 43

But why is the neural network making these predictions?

Odor percept — “vanilla”

Positive examples Negative examples

slide-44
SLIDE 44

But why is the neural network making these predictions?

Odor percept — “winey”

Positive examples Negative examples

slide-45
SLIDE 45


slide-46
SLIDE 46
  • Test ML-driven molecular design for humans in a safe context.
  • Build bedrock understanding in single molecules before working on odor mixtures.
  • Build a foundational dataset for the ML on molecules community.

Collecting interest & those interested in collaborating.

Future Directions

Benjamin Sanchez-Lengeling Brian Lee Carey Radebaugh Emily Reif Jennifer Wei Alex Wiltschko