US-EPA CompTox Chemicals Dashboard to support mass spectrometry - PowerPoint PPT Presentation

SLIDE 1

US-EPA CompTox Chemicals Dashboard to support mass spectrometry targeted and non-targeted analysis

Antony Williams1, Alex Chao2, Tom Transue3, Tommy Cathey3, Elin Ulrich4 and Jon Sobus4

1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
2) Oak Ridge Institute for Science and Education (ORISE) Research Participant, RTP, NC
3) General Dynamics Information Technology, RTP, NC
4) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC

August 2019 ACS Fall Meeting, San Diego
http://www.orcid.org/0000-0002-2668-4821

The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA

SLIDE 2

CompTox Chemicals Dashboard

https://comptox.epa.gov/dashboard

875k Chemical Substances

SLIDE 3

Detailed Chemical Pages

SLIDE 4

Sources of Exposure to Chemicals

SLIDE 5

Physicochemical properties and environmental fate and transport

SLIDE 6

CompTox Chemicals Dashboard

  • Can provide access to toxicity, environmental fate and transport, and metabolism data
  • Individual chemicals can map to degradation products and metabolites
  • Advanced searches support mass and formula searches

SLIDE 7

Link farm to public resources

SLIDE 8

MassBank of North America https://mona.fiehnlab.ucdavis.edu

SLIDE 9

Toxicity Estimation Software Tool (TEST) Real Time Predictions

SLIDE 10

Mass & Formula Searching

SLIDE 11

Advanced Searches Mass Search

SLIDE 12

Advanced Searches Mass Search

SLIDE 13

MS-Ready Structures for Formula Search

SLIDE 14

“MS-Ready Structures”

https://doi.org/10.1186/s13321-018-0299-2

SLIDE 15

SLIDE 16

MS-Ready Mappings

SLIDE 17

MS-Ready Mappings Set

SLIDE 18

MS-Ready Mappings

  • EXACT Formula: C10H16N2O8: 3 Hits

SLIDE 19

MS-Ready Mappings

  • Same Input Formula: C10H16N2O8
  • MS Ready Formula Search: 125 Chemicals

SLIDE 20

MS-Ready Mappings

  • 125 chemicals returned in total

– 8 of the 125 are single-component chemicals
– 3 of the 8 are isotope-labeled
– 3 are neutral compounds and 2 are charged
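The difference between the exact formula search (3 hits) and the MS-Ready formula search (125 chemicals) on the preceding slides can be illustrated with a minimal sketch. It assumes each registered substance carries both its own formula and the formula of its MS-Ready form; the `Record` class, the toy records and both function names are hypothetical and are not Dashboard data or Dashboard code. C10H16N2O8 is the formula of EDTA, so EDTA salts are used as the toy examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    formula: str            # formula of the registered substance
    ms_ready_formula: str   # formula of its desalted, neutral MS-Ready form


# Toy records for illustration only (not Dashboard content).
RECORDS = [
    Record("EDTA", "C10H16N2O8", "C10H16N2O8"),
    Record("EDTA disodium salt", "C10H14N2Na2O8", "C10H16N2O8"),
    Record("EDTA tetrasodium salt", "C10H12N2Na4O8", "C10H16N2O8"),
    Record("Caffeine", "C8H10N4O2", "C8H10N4O2"),
]


def exact_formula_search(records, formula):
    """Return substances whose own formula equals the query."""
    return [r.name for r in records if r.formula == formula]


def ms_ready_formula_search(records, formula):
    """Return substances whose MS-Ready form has the query formula."""
    return [r.name for r in records if r.ms_ready_formula == formula]


exact_hits = exact_formula_search(RECORDS, "C10H16N2O8")
ms_ready_hits = ms_ready_formula_search(RECORDS, "C10H16N2O8")
print("exact:", exact_hits)
print("MS-Ready:", ms_ready_hits)
```

The MS-Ready search returns the salts as well as the parent acid, which is why it yields many more candidates for the same input formula.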

SLIDE 21

Candidate ranking

SLIDE 22

Data Source Ranking of “known unknowns”

  • Mass and/or formula is for an unknown chemical but contained within a reference database
  • Most likely candidate chemicals have the most associated data sources, most associated literature articles, or both

Figure: the formula C14H22N2O3 (mass 266.16304) is searched against a chemical reference database, which returns sorted candidate structures
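The formula and mass shown on this slide are related by simple arithmetic: the monoisotopic mass is the sum of the masses of the most abundant isotope of each atom. The sketch below is an illustration under stated assumptions, not Dashboard code: it handles only C, H, N and O, ignores charges and brackets, and the function names are hypothetical.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotopes.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}


def parse_formula(formula):
    """Split a simple formula such as 'C14H22N2O3' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


mass = monoisotopic_mass("C14H22N2O3")
print(f"{mass:.5f}")  # 266.16304
```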

SLIDE 23

Is a bigger database better?

  • ChemSpider contained 26 million chemicals at the time
  • It is much BIGGER today
  • Is bigger better?

SLIDE 24

Using Metadata for Ranking

  • Use available metadata to rank candidates

– Associated data sources
  • Associated lists in the underlying database
  • Associated data sources in PubChem
  • Specific types (e.g. water, surfactants, pesticides, etc.)
– Number of associated literature articles (PubMed)
– Chemicals in the environment: the number of products/categories containing the chemical is a very important source of data
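Metadata-based ranking of this kind can be sketched as a sort over candidate records. The example below is a simplified illustration, assuming only two of the metadata streams listed on this slide (data source count and PubMed article count) and a plain lexicographic ordering; the `Candidate` class, the counts and the tie-breaking rule are hypothetical and do not represent the Dashboard's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    data_sources: int      # number of associated data sources
    pubmed_articles: int   # number of associated PubMed articles


def rank_candidates(candidates):
    """Order candidates by data source count, then by PubMed article count."""
    return sorted(
        candidates,
        key=lambda c: (c.data_sources, c.pubmed_articles),
        reverse=True,
    )


# Hypothetical candidates that share one formula.
candidates = [
    Candidate("candidate A", data_sources=3, pubmed_articles=0),
    Candidate("candidate B", data_sources=41, pubmed_articles=120),
    Candidate("candidate C", data_sources=41, pubmed_articles=15),
]

ranked = rank_candidates(candidates)
for position, candidate in enumerate(ranked, start=1):
    print(position, candidate.name, candidate.data_sources, candidate.pubmed_articles)
```

A weighted combination of several streams would be a natural extension, but the weights would need to be tuned against a set of chemicals with known identities.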

SLIDE 25

Identification ranks for 1783 chemicals using multiple data streams

Legend: DS: Data Sources; PC: PubChem; PM: PubMed; STOFF: DB; KEMI: DB

SLIDE 26

Comparing Search Performance

  • The Dashboard contained 720k chemicals at the time
  • That is only about 3% of the size of ChemSpider
  • How did performance compare?

SLIDE 27

SAME dataset for comparison

SLIDE 28

How did performance compare?

For the same 162 chemicals, Dashboard outperforms ChemSpider

SLIDE 29

How did performance compare?

SLIDE 30

Will the correct Microcystin LR Stand Up? ChemSpider Skeleton Search

SLIDE 31

Comparing ChemSpider Structures

SLIDE 32

Comparing ChemSpider Structures

SLIDE 33

Other Searches

SLIDE 34

Batch Searching

  • Singleton searches are useful, but we work with thousands of masses and formulae!
  • Typical questions

– What is the list of chemicals for the formula CxHyOz?
– What is the list of chemicals for a mass +/- error?
– Can I get chemical lists in Excel files? In SDF files?
– Can I include properties in the download file?
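The "mass +/- error" question above can be sketched as a batch lookup with a ppm tolerance window. This is a minimal illustration, assuming a small in-memory reference table and CSV output in place of the Excel or SDF downloads mentioned on the slide; the table contents, the function names and the 5 ppm tolerance are hypothetical choices, not Dashboard behaviour.

```python
import csv
import io

# Toy reference table: (name, formula, monoisotopic mass). Illustrative only.
REFERENCE = [
    ("isomer 1", "C14H22N2O3", 266.16304),
    ("isomer 2", "C14H22N2O3", 266.16304),
    ("caffeine", "C8H10N4O2", 194.08038),
]


def mass_window(mass, ppm):
    """Return the (low, high) mass limits for a tolerance given in ppm."""
    delta = mass * ppm / 1_000_000
    return mass - delta, mass + delta


def batch_mass_search(query_masses, ppm, reference=REFERENCE):
    """Collect every reference entry that falls inside each query window."""
    rows = []
    for query in query_masses:
        low, high = mass_window(query, ppm)
        for name, formula, mono_mass in reference:
            if low <= mono_mass <= high:
                rows.append((query, name, formula, mono_mass))
    return rows


def to_csv(rows):
    """Serialise the hits, with their properties, as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["query_mass", "name", "formula", "monoisotopic_mass"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = batch_mass_search([266.1632, 194.0900], ppm=5)
print(to_csv(rows))
```

In this toy run the first query matches both isomers, while the second query lies about 50 ppm away from caffeine and returns nothing.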

SLIDE 35

Batch Searching Formula/Mass

SLIDE 36

Searching batches using MS-Ready Formula (or mass) searching

SLIDE 37

Related Searches to Support Mass Spectrometry

SLIDE 38

Find me “related structures” Formula-Based Search

SLIDE 39

Select Chemicals of Interest

SLIDE 40

Find me “related structures” Based on Structure Similarity

SLIDE 41

Find me “related structures” Based on Structure Similarity

SLIDE 42

Find me “related structures” Structure Similarity – sort on mass

SLIDE 43

Chemical lists

SLIDE 44

Chemical Lists

SLIDE 45

EPAHFR: Hydraulic Fracturing

SLIDE 46

List of Opioids – Presence in Lists?

SLIDE 47

Batch Search Names

Excel Download

SLIDE 48

Batch Search in specific lists

SLIDE 49

API services and Open Data

  • Available API and web services
  • Open Data available for download

SLIDE 50

Web Services https://actorws.epa.gov/actorws/

  • Dozens of web services to provide access to data
  • Data in UI, JSON and XML format

SLIDE 51

Example: InChIKey to DTXCIDs

https://actorws.epa.gov/actorws/dsstox/v02/msready?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
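The request on this slide is an ordinary GET URL with a single `identifier` query parameter, so it can be assembled programmatically. The sketch below only builds the URL and splits the InChIKey into its three hyphen-separated blocks; it sends no request, and it assumes nothing about the response format or about whether the endpoint shown on the slide is still available. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint exactly as shown on the slide; its current availability is not assumed.
BASE_URL = "https://actorws.epa.gov/actorws/dsstox/v02/msready"


def msready_url(inchikey):
    """Build the MS-Ready lookup URL for one InChIKey."""
    return f"{BASE_URL}?{urlencode({'identifier': inchikey})}"


def split_inchikey(inchikey):
    """Split a standard InChIKey into its three hyphen-separated blocks."""
    skeleton, second_block, protonation = inchikey.split("-")
    return {
        "skeleton": skeleton,          # 14 characters hashed from connectivity
        "second_block": second_block,  # remaining layers plus flag characters
        "protonation": protonation,    # single protonation character
    }


key = "UVOFGKIRTCCNKG-UHFFFAOYSA-N"
url = msready_url(key)
parts = split_inchikey(key)
print(url)
print(parts)
```

The first block is the part that structures sharing the same connectivity have in common, which is what makes InChIKey-based lookups convenient for linking related forms of a chemical.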

SLIDE 52

MassBank mapping to Dashboard

SLIDE 53

Benefits of Open Data

SLIDE 54

NORMAN Suspect List Exchange

https://www.norman-network.com/?q=node/236

SLIDE 55

Integration to MetFrag in place

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2

SLIDE 56

In Progress

SLIDE 57

Work in Progress

  • Predicted Spectra for candidate ranking

– Viewing and downloading pre-predicted spectra
– Search spectra against the database

SLIDE 58

Predicted Mass Spectra

http://cfmid.wishartlab.com/

  • MS/MS spectra prediction for ESI+, ESI-, and EI
  • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard

SLIDE 59

Search Expt. vs. Predicted Spectra

SLIDE 60

CFM-ID Predicted Library Available

  • Predictions generated and stored for >700,000 structures
  • Python code to score experimental vs predicted spectra
  • Cosine dot product match score calculation

August 26, 2019, Nontargeted screening of wastewater for water reuse using mass spectrometry, Current Advances in Water Analysis
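A cosine dot product match score between an experimental and a predicted spectrum, as mentioned on this slide, can be sketched in a few lines. This is not the authors' Python code: it is a minimal illustration that assumes peaks are given as (m/z, intensity) pairs, aligns them by fixed-width binning, and applies no intensity or mass weighting. The bin width of 0.01 and the peak lists are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=0.01):
    """Sum peak intensities into fixed-width m/z bins keyed by bin index."""
    binned = {}
    for mz, intensity in peaks:
        index = round(mz / bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def cosine_score(experimental, predicted, bin_width=0.01):
    """Cosine similarity of two binned spectra: 0 = no overlap, 1 = identical shape."""
    a = bin_spectrum(experimental, bin_width)
    b = bin_spectrum(predicted, bin_width)
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists: (m/z, relative intensity).
experimental = [(72.08, 100.0), (116.11, 40.0), (190.09, 15.0)]
predicted = [(72.08, 80.0), (116.11, 55.0), (145.05, 10.0)]

score = cosine_score(experimental, predicted)
print(f"{score:.3f}")
```

Fixed binning is the simplest alignment choice; peaks that fall on either side of a bin edge will not match, so a tolerance-based peak pairing may be preferable for high-resolution data.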

SLIDE 61

Prototype Development Structure/substructure search

SLIDE 62

Conclusion

  • Dashboard access to data for ~875,000 chemicals
  • MS-Ready data facilitates structure identification
  • Related metadata facilitates candidate ranking
  • Relationship mappings and chemical lists of great utility
  • Dashboard and contents are one part of the solution
  • New API and Web Services are in development

SLIDE 63

Acknowledgements

  • NCCT IT development team
  • Tommy Cathey, ACTOR Web Services
  • Nancy Baker, Abstract Sifter
  • Todd Martin & Valery Tkachenko, WebTEST
  • Kathie Dionisio & Kristin Isaacs, CPDat
  • Thanks to Emma Schymanski, University of Luxembourg, for coordinating all efforts with the NORMAN Network for curation of lists on the Suspect Exchange

SLIDE 64

Contact

Antony Williams

US EPA Office of Research and Development
National Center for Computational Toxicology
EMAIL: Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821