Non-targeted analysis supported by data and cheminformatics

SLIDE 1

Non-targeted analysis supported by data and cheminformatics delivered via the US EPA CompTox Chemicals Dashboard

Antony Williams, Alex Chao, Tom Transue, Tommy Cathey, Elin Ulrich and Jon Sobus

1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
2) Oak Ridge Institute for Science and Education (ORISE) Research Participant, RTP, NC
3) GDIT, Research Triangle Park, North Carolina, United States
4) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC

August 2019 ACS Fall Meeting, San Diego http://www.orcid.org/0000-0002-2668-4821

The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA

SLIDE 2

An intro to the Dashboard

  • Freely available web-based database from the National Center for Computational Toxicology
  • Providing data for 875,000 substances including
    – Experimental and predicted physicochemical properties
    – In vivo toxicity data harvested from dozens of public resources
    – In vitro bioactivity data for thousands of chemicals and assays
    – Exposure data including chemicals in consumer products
    – Real time predictions for >20 physchem and toxicological endpoints
  • Dashboard is used by mass spectrometrists for chemical identification
  • A quick view of general capabilities…

SLIDE 3

CompTox Chemicals Dashboard

https://comptox.epa.gov/dashboard

875k Chemical Substances

SLIDE 4

Detailed Chemical Pages

SLIDE 5

Access to Chemical Hazard Data

SLIDE 6

Sources of Exposure to Chemicals

SLIDE 7

Link Access

Links based on chemical identifiers to dozens of online resources, including analytical data

SLIDE 8

MassBank of North America https://mona.fiehnlab.ucdavis.edu

SLIDE 9

“MS-ready” structures

SLIDE 10

Overview of MS-Ready Structures

  • All structure-based chemical substances are algorithmically processed to
    – Split multicomponent chemicals into individual structures
    – Desalt and neutralize individual structures
    – Remove stereochemical bonds from all chemicals
  • MS-Ready structures are then mapped to original substances to provide a path from chemicals detected by mass spectrometry to the original substances
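The processing steps on this slide can be illustrated with a deliberately simplified, string-level sketch. It is not the Dashboard's actual workflow, which these slides do not show. The counterion list, the substance IDs and the function names are all hypothetical, neutralization is not attempted, and a real implementation would rely on a cheminformatics toolkit to standardize and canonicalize structures.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny set of counterion/solvent fragments to discard.
COUNTERIONS = {"[Na+]", "[K+]", "[Cl-]", "[Br-]", "Cl", "Br", "O"}


def strip_stereo(smiles: str) -> str:
    """Remove SMILES stereo markers: '@' on atoms and '/' or '\\' on bonds."""
    return re.sub(r"[@/\\]", "", smiles)


def ms_ready_components(smiles: str) -> List[str]:
    """Split a multicomponent SMILES on '.', drop listed counterions and
    strip stereochemistry. Charged components are NOT neutralized here."""
    components: List[str] = []
    for part in smiles.split("."):
        if part in COUNTERIONS:
            continue
        flat = strip_stereo(part)
        if flat not in components:
            components.append(flat)
    return components


def map_to_substances(substances: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each MS-ready string back to the IDs of the substances containing it."""
    mapping: Dict[str, List[str]] = {}
    for substance_id, smiles in substances.items():
        for component in ms_ready_components(smiles):
            mapping.setdefault(component, []).append(substance_id)
    return mapping


# Illustrative input: one alanine enantiomer, its hydrochloride salt,
# and the other enantiomer (IDs are made up).
substances = {
    "SUB-1": "C[C@H](N)C(=O)O",
    "SUB-2": "C[C@H](N)C(=O)O.Cl",
    "SUB-3": "C[C@@H](N)C(=O)O",
}
mapping = map_to_substances(substances)
print(mapping)
```

In this toy input all three substances collapse onto a single stereo-free, salt-free string, which is the kind of one-to-many mapping the slide describes. Plain string comparison only works here because the inputs were written consistently.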

SLIDE 11

SLIDE 12

MS-Ready Mappings from Details Page

SLIDE 13

Two MS-Ready Mappings Set

SLIDE 14

MS-Ready Mappings Set

All substances containing component

SLIDE 15

Mass/Formula Searching and Metadata Ranking

SLIDE 16

Advanced Searches Mass Search

SLIDE 17

Advanced Searches Mass Search

SLIDE 18

MS-Ready Structures for Formula Search

SLIDE 19

MS-Ready Mappings

  • EXACT Formula: C10H16N2O8: 3 Hits

SLIDE 20

MS-Ready Mappings

  • Same Input Formula: C10H16N2O8
  • MS Ready Formula Search: 125 Chemicals

SLIDE 21

MS-Ready Mappings

  • Exact Formula – 3 hits
  • MS-Ready Formula – 125 hits!!

    – ONLY 8 of the 125 are single component chemicals
    – 3 are neutral compounds and 2 are charged

  • How can we rank the candidate list?
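The gap between the two hit counts comes from what the formula is matched against. The sketch below contrasts the two search modes on a few illustrative records. C10H16N2O8 is the formula of EDTA, but the record layout, the field names and the function names are assumptions for illustration and do not reproduce the Dashboard's data model or its results.

```python
from typing import Dict, List

# Illustrative records. "ms_ready_formulas" holds the formula of each
# desalted, neutralized component (hypothetical field name).
RECORDS: List[Dict] = [
    {"name": "EDTA", "formula": "C10H16N2O8",
     "ms_ready_formulas": ["C10H16N2O8"]},
    {"name": "EDTA disodium salt", "formula": "C10H14N2Na2O8",
     "ms_ready_formulas": ["C10H16N2O8"]},
    {"name": "Ethanol", "formula": "C2H6O",
     "ms_ready_formulas": ["C2H6O"]},
]


def exact_formula_search(formula: str) -> List[str]:
    """Match only substances whose own formula equals the query."""
    return [r["name"] for r in RECORDS if r["formula"] == formula]


def ms_ready_formula_search(formula: str) -> List[str]:
    """Match any substance that contains an MS-ready component with the formula."""
    return [r["name"] for r in RECORDS if formula in r["ms_ready_formulas"]]


exact_hits = exact_formula_search("C10H16N2O8")
ms_ready_hits = ms_ready_formula_search("C10H16N2O8")
print(exact_hits)
print(ms_ready_hits)
```

The MS-ready search also returns the salt, because its desalted component carries the query formula even though the salt's own formula differs.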

SLIDE 22

Candidate ranking using metadata

SLIDE 23

Data Source Ranking of “known unknowns”

  • A mass and/or formula search is for an unknown chemical, but it is a known chemical contained within a reference database
  • Most likely candidate chemicals have the most associated data sources, most associated literature articles or both

Workflow figure: C14H22N2O3 (266.16304) → Chemical Reference Database → Sorted candidate structures
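The formula and mass shown on this slide are consistent with each other, which can be checked with a small monoisotopic mass calculation. This is a minimal sketch: the parser handles only simple formulas without brackets or charges, the element table covers just C, H, N and O, and the `within_ppm` helper and its 5 ppm default are assumptions for illustration.

```python
import re
from typing import Dict

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def parse_formula(formula: str) -> Dict[str, int]:
    """Count atoms in a simple formula such as 'C14H22N2O3'."""
    counts: Dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def within_ppm(observed: float, theoretical: float, ppm: float = 5.0) -> bool:
    """True if the observed mass lies within +/- ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


mass = monoisotopic_mass("C14H22N2O3")
print(round(mass, 5))  # 266.16304
```

A mass search then amounts to keeping every database entry for which `within_ppm` holds for the observed mass.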

SLIDE 24

The original ChemSpider work

SLIDE 25

Is a bigger database better?

  • ChemSpider contained 26 million chemicals for the original work
  • Much BIGGER today
  • Is bigger better??
  • Are there other metadata to use for ranking?

SLIDE 26

Using Metadata for Ranking

  • Chosen dashboard metadata to rank candidates
    – Associated data sources
        • Lists in the underlying database (more about lists later)
        • Associated data sources in PubChem
        • Specific source types (e.g. water, surfactants, pesticides)
    – Number of associated literature articles (PubMed)
    – Chemicals in the environment – the number of products/categories containing the chemical is an important source of data (from CPDat database)
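Ranking on these metadata can be sketched as a multi-key sort. The slides do not state how the Dashboard weights or combines the individual data streams, so the ordering used here (data sources first, then PubMed articles, then product categories) is an assumption. The candidate names, the field names and every count are made up.

```python
from typing import Dict, List

# Hypothetical candidates that share one formula; all counts are invented.
candidates: List[Dict] = [
    {"name": "Candidate A", "data_sources": 12,
     "pubmed_articles": 40, "product_categories": 3},
    {"name": "Candidate B", "data_sources": 57,
     "pubmed_articles": 900, "product_categories": 15},
    {"name": "Candidate C", "data_sources": 12,
     "pubmed_articles": 75, "product_categories": 0},
]


def rank_candidates(items: List[Dict]) -> List[Dict]:
    """Sort candidates so that the best-supported chemical comes first.

    Ties on data sources are broken by literature count, then by the
    number of product categories.
    """
    return sorted(
        items,
        key=lambda c: (c["data_sources"],
                       c["pubmed_articles"],
                       c["product_categories"]),
        reverse=True,
    )


ranked_names = [c["name"] for c in rank_candidates(candidates)]
print(ranked_names)  # ['Candidate B', 'Candidate C', 'Candidate A']
```

A weighted score would be an equally plausible design. The tuple sort is used here only because it keeps each tie-breaking rule visible.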

SLIDE 27

Identification ranks for 1783 chemicals using multiple data streams

Legend: DS: Data Sources; PC: PubChem; PM: PubMed; STOFF: DB; KEMI: DB

Data Sources alone rank ~75% of the chemicals as Top Hit

SLIDE 28

Comparing Search Performance

  • When dashboard contained 720k chemicals
  • Only 3% of ChemSpider size
  • What was the comparison in performance?

SLIDE 29

SAME dataset for comparison

SLIDE 30

How did performance compare?

For the same 162 chemicals, Dashboard outperforms ChemSpider for both Mass and Formula Ranking

SLIDE 31

How did performance compare?

SLIDE 32

Data Quality is important

  • Data quality in free web-based databases!

SLIDE 33

Public Databases require curation

  • There is significant bloating in the public databases because of lack of curation
  • The number of hits retrieved based on mass or formula searching can explode based on poorly represented chemicals – especially stereochemistry issues
  • MS-Ready structures will map back to multiple versions of “the same chemical”.

SLIDE 34

Will the correct Microcystin LR Stand Up?

ChemSpider Skeleton Search

SLIDE 35

Comparing ChemSpider Structures

SLIDE 36

Comparing ChemSpider Structures

SLIDE 37

Other Searches

SLIDE 38

Batch Searching mass and formula

SLIDE 39

Batch Searching

  • Singleton searches are useful but we work with thousands of masses and formulae!
  • Typical questions
    – What is the list of chemicals for the formula CxHyOz?
    – What is the list of chemicals for a mass +/- error?
    – Can I get chemical lists in Excel files? In SDF files?
    – Can I include properties in the download file?
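The "mass +/- error" and download questions can be illustrated with a small batch search over a local table. This is a sketch under stated assumptions and not the Dashboard's batch search: the three-row table stands in for a reference database, the function names and the 5 ppm tolerance are hypothetical, and CSV is used as a stand-in for a spreadsheet-readable export.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical local table standing in for a chemical reference database.
CHEMICALS = [
    {"name": "Caffeine", "formula": "C8H10N4O2", "monoisotopic_mass": 194.08038},
    {"name": "Theophylline", "formula": "C7H8N4O2", "monoisotopic_mass": 180.06473},
    {"name": "Glucose", "formula": "C6H12O6", "monoisotopic_mass": 180.06339},
]

Row = Tuple[float, str, str, float]


def batch_mass_search(masses: List[float], ppm: float = 5.0) -> List[Row]:
    """Return one row per (query mass, matching chemical) within +/- ppm."""
    rows: List[Row] = []
    for query in masses:
        for chem in CHEMICALS:
            target = chem["monoisotopic_mass"]
            if abs(query - target) / target * 1e6 <= ppm:
                rows.append((query, chem["name"], chem["formula"], target))
    return rows


def to_csv(rows: List[Row]) -> str:
    """Serialize the hits, including properties, as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["query_mass", "name", "formula", "monoisotopic_mass"])
    writer.writerows(rows)
    return buffer.getvalue()


hits = batch_mass_search([180.0647, 194.0804])
print(to_csv(hits))
```

With a 5 ppm window the first query matches theophylline but not glucose, which lies roughly 7 ppm away. Widening the tolerance to 10 ppm would return both, which shows how quickly candidate lists grow with mass error.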

SLIDE 40

Batch Searching Formula/Mass

SLIDE 41

Searching batches using MS-Ready Formula (or mass) searching

SLIDE 42

Mass Spectrometry Related Searches

SLIDE 43

Find me “related structures” Formula-Based Search

SLIDE 44

Select Chemicals of Interest

SLIDE 45

Find me “related structures” Based on Structure Similarity

SLIDE 46

Find me “related structures” Based on Structure Similarity

SLIDE 47

Find me “related structures” Structure Similarity – sort on mass

SLIDE 48

Chemical Lists

SLIDE 49

Chemical Lists

SLIDE 50

EPAHFR: Hydraulic Fracturing

SLIDE 51

PFAS lists of Chemicals

SLIDE 52

Research in Progress

SLIDE 53

Predicted Mass Spectra

http://cfmid.wishartlab.com/

  • MS/MS spectra prediction for ESI+, ESI-, and EI
  • Predictions generated and stored for >800,000 structures, to be accessible via Dashboard

SLIDE 54

Search Expt. vs. Predicted Spectra

SLIDE 55

Search Expt. vs. Predicted Spectra

SLIDE 56

Spectral Viewer Comparison

SLIDE 57

Prototype Development

SLIDE 58

Prototype Development

SLIDE 59

API services and Open Data

  • Present API and web services available at https://actorws.epa.gov/actorws/ but major redevelopment is underway
  • Downloadable data available via the downloads page

SLIDE 60

Web Services https://actorws.epa.gov/actorws/

  • Data in UI, JSON and XML format

SLIDE 61

InChIKey to DTXCIDs

https://actorws.epa.gov/actorws/dsstox/v02/msready?identifier=UVOFGKIRTCCNKG-UHFFFAOYSA-N
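A lookup URL of this form can be assembled programmatically. The sketch below only builds the request URL from the endpoint and the `identifier` parameter shown on this slide and makes no network call: the response format is not described in the slides, and the service may have changed or been retired since the 2019 presentation. The function name and the InChIKey format check are assumptions added for illustration.

```python
import re
from urllib.parse import urlencode

# Endpoint as shown on the slide; availability today is not guaranteed.
MSREADY_ENDPOINT = "https://actorws.epa.gov/actorws/dsstox/v02/msready"

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def msready_url(inchikey: str) -> str:
    """Build the MS-ready lookup URL for one InChIKey (hypothetical helper)."""
    if not INCHIKEY_PATTERN.match(inchikey):
        raise ValueError(f"not a valid InChIKey: {inchikey!r}")
    return f"{MSREADY_ENDPOINT}?{urlencode({'identifier': inchikey})}"


url = msready_url("UVOFGKIRTCCNKG-UHFFFAOYSA-N")
print(url)
```

Validating the key before building the URL keeps malformed identifiers, such as keys with stray whitespace, out of batch lookups.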

SLIDE 62

Data and Services used by the Community

SLIDE 63

NORMAN Suspect List Exchange

https://www.norman-network.com/?q=node/236

SLIDE 64

Integration to MetFrag in place

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0299-2

SLIDE 65

MassBank mapping to Dashboard Based on Web Service lookup

SLIDE 66

Conclusion

  • Dashboard access to data for ~875,000 chemicals
  • MS-Ready data facilitates structure identification
  • Related metadata facilitates candidate ranking
  • Relationship mappings and chemical lists of great utility
  • Dashboard and contents are one part of the solution
  • New developments in progress, especially API development, will be very enabling…

SLIDE 67

Acknowledgements

  • IT Development team – especially Jeff Edwards and Jeremy Dunne
  • Chris Grulke for the ChemReg system
  • NERL colleagues – Jon Sobus, Elin Ulrich, Mark Strynar, Seth Newton, Alex Chao
  • Emma Schymanski, LCSB, Luxembourg
  • NORMAN Network and all contributors

SLIDE 68

Contact

Antony Williams

US EPA Office of Research and Development
National Center for Computational Toxicology
EMAIL: Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821
