Data Mining of Chemical Compounds Using Functional Groups




slide-1
SLIDE 1

Data Mining of Chemical Compounds Using Functional Groups

Ali Rathore
Chabot College
Electrical Engineering & Computer Science

Mentor: Sayan Ranu
Advisor: Dr. Ambuj Singh
Department: Computer Science, Database & Bioinformatics Lab (DBL)
Funding: National Science Foundation, Division of Information & Intelligent Systems

slide-2
SLIDE 2

Data Mining of Chemical Compounds
slide-3
SLIDE 3

Research Goals

Diagram: Database, Datamine (Parameters), Significant Substructures, Pattern Set

Example: Aniline, Phenol, Benzene

Neighborhood of Each Atom

Molecule Characterization

Functional Groups

Alkenyl (Ethylene), Hydroxyl (Methanol)

slide-4
SLIDE 4

Research Method

Database, Significant Substructures, Pattern Set

Original Method: Load Data into Computer, Mine the Data (GraphSig), Get Results

New Method: Load Data into Computer, Preprocess Data, Mine the Data (GraphSig), Get Results

slide-5
SLIDE 5

Runtime Comparison of OA, LEAP and GraphSig

Time vs. Database Size

~55 mins, ~42 mins, ~9 mins

  • S. Ranu, A. Singh. “GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases”

GraphSig Results

Comparison of “Accuracy” (Score out of 100)

Database    Optimal Assignment Kernel    Scalable Leap Search    GraphSig
MCF-7       68                           76                      77
MOLT-4      65                           72                      74
NCI-H23     79                           79                      80
OVCAR-8     67                           78                      79
P388        79                           84                      84
PC-3        66                           76                      76
SF-295      75                           77                      80
SN12C       75                           80                      80
SW-620      70                           76                      77
UACC-257    65                           75                      81
Yeast       64                           71                      73
Average     70.2                         76.7                    78.2
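As a quick arithmetic check on the accuracy scores above, the following sketch recomputes the per-method averages. The pairing of each row's three scores with Optimal Assignment Kernel, Scalable Leap Search and GraphSig (lowest to highest) is an assumption about the slide's table layout, and the names `scores`, `methods` and `column_means` are illustrative only.

```python
# Accuracy scores (out of 100) per database, copied from the slide.
# Assumed column order: (Optimal Assignment Kernel, Scalable Leap Search, GraphSig).
scores = {
    "MCF-7": (68, 76, 77),
    "MOLT-4": (65, 72, 74),
    "NCI-H23": (79, 79, 80),
    "OVCAR-8": (67, 78, 79),
    "P388": (79, 84, 84),
    "PC-3": (66, 76, 76),
    "SF-295": (75, 77, 80),
    "SN12C": (75, 80, 80),
    "SW-620": (70, 76, 77),
    "UACC-257": (65, 75, 81),
    "Yeast": (64, 71, 73),
}
methods = ("Optimal Assignment Kernel", "Scalable Leap Search", "GraphSig")


def column_means(table):
    """Return the mean score of each column across all databases."""
    n = len(table)
    return [sum(row[i] for row in table.values()) / n for i in range(len(methods))]


means = column_means(scores)
for name, mean in zip(methods, means):
    print(f"{name}: {mean:.2f}")
```

Under that assumed column order, the recomputed means agree with the reported averages (70.2, 76.7, 78.2) to within 0.1, which suggests the slide truncated rather than rounded them.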

slide-6
SLIDE 6

GraphSig Results

AIDS Database, GraphSig Parameters

3-azido-thymidine (AZT): most used medicine for controlling HIV.

Leukemia Database, GraphSig Parameters

  • Only difference is presence of Antimony (Sb) and Bismuth (Bi)
  • May lead chemists to try other metals from same group
  • Sb & Bi cannot be mined using other techniques.

slide-7
SLIDE 7

Preprocessing Method

Original Molecules → Find Functional Groups → Replace with “Atoms” → New Molecules

New Molecules → GraphSig → Significant Substructures → “Better” Pattern Set
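The "find functional groups, then replace with atoms" step can be sketched as follows. This is a minimal illustration, not the presenters' implementation: it assumes a molecule is stored as a dictionary of atom ids to element symbols plus a set of bonds, it detects only hydroxyl groups, and the names `find_hydroxyls`, `collapse_group` and the pseudo-atom label "OH" are hypothetical.

```python
def find_hydroxyls(atoms, bonds):
    """Return (oxygen_id, hydrogen_id) pairs for each O bonded to exactly one H."""
    neighbors = {a: set() for a in atoms}
    for u, v in bonds:
        neighbors[u].add(v)
        neighbors[v].add(u)
    groups = []
    for a, element in atoms.items():
        if element == "O":
            hydrogens = sorted(n for n in neighbors[a] if atoms[n] == "H")
            if len(hydrogens) == 1:  # exactly one H, so water is not matched
                groups.append((a, hydrogens[0]))
    return groups


def collapse_group(atoms, bonds, group, label):
    """Replace the atoms in `group` with a single pseudo-atom named `label`."""
    group = set(group)
    keep_id = min(group)  # the pseudo-atom reuses the smallest id in the group
    new_atoms = {a: e for a, e in atoms.items() if a not in group}
    new_atoms[keep_id] = label
    new_bonds = set()
    for u, v in bonds:
        u2 = keep_id if u in group else u
        v2 = keep_id if v in group else v
        if u2 != v2:  # bonds inside the group disappear
            new_bonds.add((min(u2, v2), max(u2, v2)))
    return new_atoms, new_bonds


# Methanol (CH3OH): carbon 0, hydrogens 1-3, oxygen 4, hydroxyl hydrogen 5.
atoms = {0: "C", 1: "H", 2: "H", 3: "H", 4: "O", 5: "H"}
bonds = {(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)}

for group in find_hydroxyls(atoms, bonds):
    atoms, bonds = collapse_group(atoms, bonds, group, "OH")

print(atoms)
print(sorted(bonds))
```

In this toy case methanol shrinks from six atoms to five, with the hydroxyl group represented by one node. A real pipeline would need a catalogue of functional-group patterns and a proper substructure matcher.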

slide-8
SLIDE 8

Summary

Data Mining

  • Automated extraction of implicit information.
  • Discovery of previously unknown patterns.

Of Chemical Compounds

  • Analysis of databases of chemical compounds.
  • Allows chemists to:
      • Predict behavior of new compounds.
      • Identify compounds with wanted properties.
  • Allows pharmacists to:
      • Create drugs using significant substructures.
      • Classify compounds as active or inactive.

slide-9
SLIDE 9

Acknowledgements

  • Liu-Yen Kramer, CNSI Education Programs Development Analyst
  • Dr. Evelyn Hu, CNSI Scientific Director
  • Jens-Uwe Kuhn, INSET Program Coordinator
  • Dr. Nick Arnold, INSET Faculty Coordinator
  • Sayan Ranu, Graduate Student Mentor
  • Dr. Ambuj Singh, Computer Science Faculty Advisor
  • Everyone at Database & Bioinformatics Lab

slide-10
SLIDE 10

Thank You

Questions?


slide-12
SLIDE 12

Preprocessing Results

Chart: Preprocessing Timings. Seconds vs. Number of Molecules (Thousand), x-axis 10 to 40. Series: Find FGs, Replace Functional Groups, Total. Data labels: 31, 58.97, 90.757, 122.67.

Chart: Average Molecule Size (Number of Atoms). Original Molecules: 25.6. New "Molecules": 12.76.
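The figures on this slide support two simple calculations, sketched below: the preprocessing time per thousand molecules and the relative reduction in average molecule size. The sketch assumes that the four data labels (31, 58.97, 90.757, 122.67 seconds) belong to database sizes of 10, 20, 30 and 40 thousand molecules in that order; the slide text does not say which series they label. All variable names are illustrative.

```python
# Assumed pairing of the chart's data labels with database sizes (in thousands).
sizes_k = [10, 20, 30, 40]
seconds = [31, 58.97, 90.757, 122.67]

# Seconds spent per thousand molecules at each database size.
per_thousand = [t / n for t, n in zip(seconds, sizes_k)]

# Average molecule size before and after functional-group replacement.
original_atoms, new_atoms = 25.6, 12.76
reduction = 1 - new_atoms / original_atoms

print([round(x, 2) for x in per_thousand])
print(f"Average size reduction: {reduction:.1%}")
```

Under that pairing the cost stays near 3 seconds per thousand molecules, consistent with roughly linear scaling, and the average molecule shrinks by about half.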
