[PPT] - Programming in Python Lecture 8: Python Online Michael Schroeder PowerPoint Presentation

SLIDE 1

Programming in Python

Michael Schroeder, Melissa Adasme

Lecture 8: Python Online

SLIDE 2

Motivation: Access to Web Resources

Are wildcards possible? Can I filter somewhere? Can I combine two different searches?

In most cases NO, since Web GUIs are simplified access points to the data!

SLIDE 3

Solution: Programmatic Access

(Use programmatic access via power user gateways)

1 Wildcards / search for substrings
2 Filtering by selected properties
3 Combination of different criteria

Example query (URL):
https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib

Schema of ChEMBL data https://www.ebi.ac.uk/chembl/api/data/molecule/schema
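To see how the three features show up in the example query, the URL can be taken apart with Python's standard library. This is only a sketch: it assumes that the last `__`-separated part of each parameter name is the filter operator (such as `lte` or `iendswith`), which is inferred from the example URL and should be checked against the ChEMBL schema and documentation. The helper name `split_filters` is made up for this illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Example query from the slide.
URL = ("https://www.ebi.ac.uk/chembl/api/data/molecule"
       "?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib")


def split_filters(url):
    """Return (field, operator, value) triples, one per query parameter.

    Assumption: the text after the last '__' in a parameter name is the
    filter operator; a name without '__' gets an empty operator.
    """
    filters = []
    for name, value in parse_qsl(urlsplit(url).query):
        if "__" in name:
            field, operator = name.rsplit("__", 1)
        else:
            field, operator = name, ""
        filters.append((field, operator, value))
    return filters


for field, operator, value in split_filters(URL):
    print(f"{field:35} {operator:10} {value}")
```

The two criteria are joined by `&`, which is how the query combines the property filter with the substring search.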

SLIDE 4

HTTP/REST

HTTP (Hypertext Transfer Protocol) is a protocol/architecture for the internet
specifies how data can be transferred between machines in a network
defines several methods, e.g. GET, POST and DELETE
REST (Representational State Transfer) describes how the architecture of HTTP can/should be used as a uniform interface
REST or REST-like structures are available in many web service APIs
Usually defined by a URL (web address) and an HTTP method (action on that address)

http://biowebsitexyz.com/pug/proteins
    GET     List all proteins
    POST    Create new protein entry (with data sent to server)

http://biowebsitexyz.com/pug/proteins/p21
    GET     Get the data for protein 21
    DELETE  Delete entry for protein 21 on the server

POST: data is sent separately here, and the server creates the new URL
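The pairing of URL and HTTP method can be sketched in Python with `urllib.request.Request` objects. The example below only builds the requests and never sends them, because `biowebsitexyz.com` is the placeholder service from the slide; the JSON payload for the POST request is likewise an invented example.

```python
from urllib.request import Request

# Placeholder service from the slide; these requests are built, not sent.
BASE = "http://biowebsitexyz.com/pug/proteins"

list_all = Request(BASE, method="GET")
create = Request(
    BASE,
    data=b'{"name": "example protein"}',  # hypothetical payload
    headers={"Content-Type": "application/json"},
    method="POST",
)
get_one = Request(f"{BASE}/p21", method="GET")
delete_one = Request(f"{BASE}/p21", method="DELETE")

for req in (list_all, create, get_one, delete_one):
    print(f"{req.get_method():7} {req.full_url}")
```

The same URL appears twice with different methods, which is the point of the table: the address names the resource and the method names the action.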

SLIDE 5

Where can I use it?

Biological databases and services
UniProt (Sequences)
ENRICHR (Ontology Enrichment)
PubMed (Literature)
PubChem, ChEMBL (chemical structures)
PDB (Structures)
etc.

Non-biological databases and services
etc.

SLIDE 6

Constructing Queries

Just the base URL for the service:
http://biowebsitexyz.com/pug/proteins
GET: List all proteins

Simple filter:
http://biowebsitexyz.com/pug/proteins?num_aa_gte=100
GET: List all proteins with at least 100 amino acids

Multiple criteria:
http://biowebsitexyz.com/pug/proteins?num_aa_gte=100&organism=homo_sapiens
GET: List all human proteins with at least 100 amino acids

We will focus on GET queries, since you will mostly just need to read data from servers
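The three query forms can be generated from a base URL and a set of filters. This is a minimal sketch using `urllib.parse.urlencode`; the helper name `build_query_url` is made up for this example, and the service is the placeholder from the slide, so the URLs are built but not requested.

```python
from urllib.parse import urlencode

# Placeholder service from the slide.
BASE_URL = "http://biowebsitexyz.com/pug/proteins"


def build_query_url(base_url, **filters):
    """Append the given filters to base_url as a query string."""
    if not filters:
        return base_url  # just the base URL for the service
    return f"{base_url}?{urlencode(filters)}"


print(build_query_url(BASE_URL))
print(build_query_url(BASE_URL, num_aa_gte=100))
print(build_query_url(BASE_URL, num_aa_gte=100, organism="homo_sapiens"))
```

Letting `urlencode` assemble the query string also takes care of escaping characters such as spaces or brackets in the values.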

SLIDE 7

Revision: XML files

■ We can store any data in XML, the eXtensible Mark-up Language, e.g. Medline
■ Logical data organisation: yes, XML schema, which is enforced
■ Physical data organisation: none, we cannot optimise retrieval for common queries
■ Hierarchical organisation
■ Commonly used as an exchange format for data

<Article>
  <Journal>
    <ISSN>0270-7306</ISSN>
    <JournalIssue>
      <Volume>19</Volume>
      <Issue>11</Issue>
      <PubDate>
        <Year>1999</Year>
        <Month>Nov</Month>
      </PubDate>
    </JournalIssue>
  </Journal>
  <ArticleTitle>Differential regulation of the cell wall integrity mitogen-activated protein kinase pathway in budding yeast by the protein tyrosine phosphatases Ptp2 and Ptp3.</ArticleTitle>
  <Pagination>
    <MedlinePgn>7651-60</MedlinePgn>
  </Pagination>
  <Abstract>
    <AbstractText>Mitogen-activated protein kinases (MAPKs) are inactivated by dual-specificity and protein tyrosine phosphatases (PTPs) in yeasts. In Saccharomyces cerevisiae, two PTPs, Ptp2 and Ptp3, inactivate the MAPKs, Hog1 and Fus3, with different specificities...</AbstractText>
  </Abstract>
  <Affiliation>Department of Chemistry, University of Colorado, Boulder, Colorado 80309-0215, USA.</Affiliation>
  …
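The hierarchical organisation is what makes such records easy to read from Python. The sketch below parses a shortened, closed-off copy of the Medline record above with `xml.etree.ElementTree` from the standard library; the excerpt was trimmed by hand for this example so that it is well-formed.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed excerpt of the record shown on the slide.
RECORD = """
<Article>
  <Journal>
    <ISSN>0270-7306</ISSN>
    <JournalIssue>
      <Volume>19</Volume>
      <Issue>11</Issue>
      <PubDate><Year>1999</Year><Month>Nov</Month></PubDate>
    </JournalIssue>
  </Journal>
  <Pagination><MedlinePgn>7651-60</MedlinePgn></Pagination>
</Article>
"""

article = ET.fromstring(RECORD)

# Paths follow the nesting of the elements, relative to <Article>.
issn = article.findtext("Journal/ISSN")
year = article.findtext("Journal/JournalIssue/PubDate/Year")
pages = article.findtext("Pagination/MedlinePgn")

print(issn, year, pages)
```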

SLIDE 8

Application I:

What's the most recent article from the Schroeder group?

https://www.ncbi.nlm.nih.gov/pubmed
https://www.ncbi.nlm.nih.gov/home/develop/api/

SLIDE 9

Application I:

What's the most recent article from the Schroeder group?

1 First we run the main query to obtain all articles from the group (with the author name Michael Schroeder)

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Michael+Schroeder%5Bauthor%5D

Documentation at https://www.ncbi.nlm.nih.gov/pmc/tools/developers/

SLIDE 10

Application I:

What's the most recent article from the Schroeder group?

1 First we run the main query to obtain all articles from the group (with the author name Michael Schroeder)

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Michael+Schroeder%5Bauthor%5D

Documentation at https://www.ncbi.nlm.nih.gov/pmc/tools/developers/

ID of the last article published!
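Step 1 could be scripted roughly as follows. This is a sketch under a few assumptions: the function names are made up for this example, the response is assumed to list the article IDs as `<Id>` elements inside an `<IdList>`, and the sample response is hand-written (its second ID is a placeholder), not real server output. Following the slide, the first ID returned is treated as the most recent article.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed"):
    """Build the esearch URL; urlencode escapes spaces and brackets."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term})}"


def parse_ids(xml_text):
    """Return the article IDs found under <IdList> in an esearch result."""
    root = ET.fromstring(xml_text)
    return [id_elem.text for id_elem in root.findall("./IdList/Id")]


def search_pubmed(term):
    """Send the query to the server (network access needed; not called here)."""
    with urlopen(esearch_url(term), timeout=30) as response:
        return parse_ids(response.read())


# Hand-written sample in the assumed response shape; 00000000 is a placeholder.
SAMPLE = """<eSearchResult>
  <Count>2</Count>
  <IdList><Id>31811259</Id><Id>00000000</Id></IdList>
</eSearchResult>"""

print(esearch_url("Michael Schroeder[author]"))
print(parse_ids(SAMPLE)[0])
```

To run the real query, call `search_pubmed("Michael Schroeder[author]")` and take the first element of the returned list.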

SLIDE 11

Application I:

What's the most recent article from the Schroeder group?

2 Then, using the article ID, we get the details for it

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=31811259&format=xml

SLIDE 12

Application I:

What's the most recent article from the Schroeder group?

2 Then, using the article ID, we get the details for it

Title

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=31811259&format=xml
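Step 2 can be sketched in the same way. The code assumes that the efetch result contains an `<ArticleTitle>` element somewhere in the record, as in the Medline example from the XML slide; the function names are made up, and the sample is a hand-written fragment modelled on that slide rather than the real record for ID 31811259.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid, db="pubmed"):
    """Build the efetch URL with the same parameters as on the slide."""
    return f"{EFETCH}?{urlencode({'db': db, 'id': pmid, 'format': 'xml'})}"


def extract_title(xml_text):
    """Return the text of the first <ArticleTitle>, or None if there is none."""
    root = ET.fromstring(xml_text)
    title = root.find(".//ArticleTitle")
    if title is None:
        return None
    # itertext() also collects text inside nested markup such as <i>...</i>.
    return "".join(title.itertext()).strip()


def fetch_title(pmid):
    """Download the record and extract its title (network access; not called here)."""
    with urlopen(efetch_url(pmid), timeout=30) as response:
        return extract_title(response.read())


# Hand-written fragment for illustration only.
SAMPLE = """<Article>
  <Journal><ISSN>0270-7306</ISSN></Journal>
  <ArticleTitle>Differential regulation of the cell wall integrity
  pathway in <i>budding yeast</i>.</ArticleTitle>
</Article>"""

print(efetch_url(31811259))
print(extract_title(SAMPLE))
```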

SLIDE 13

Application II: ChEMBL

Find compounds with desired properties

https://www.ebi.ac.uk/chembl
https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services

Not the same for all web services!!

SLIDE 14

Application II: ChEMBL

Find compounds with desired properties

1 Let's find compounds whose names end in "rin" and that have a molecular weight (MW) between 150 and 200

SLIDE 15

Application II: ChEMBL

Find compounds with desired properties

1 Let's find compounds whose names end in "rin" and that have a molecular weight (MW) between 150 and 200

https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__mw_freebase__gte=150&molecule_properties__mw_freebase__lte=200&pref_name__iendswith=rin

Aspirin!!

SLIDE 16

Application II: ChEMBL

Find compounds with desired properties

1 Let's find compounds whose names end in "rin" and that have a molecular weight (MW) between 150 and 200:

https://www.ebi.ac.uk/chembl/api/data/molecule?molecule_properties__mw_freebase__gte=150&molecule_properties__mw_freebase__lte=200&pref_name__iendswith=rin

Canonical SMILES: CC(=O)Oc1ccccc1C(=O)O
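A script for this search might look like the sketch below. Several points are assumptions to verify against the ChEMBL documentation: that adding `format=json` returns JSON instead of XML, that the hits are listed under a `molecules` key, and that each hit carries `pref_name` and `molecule_structures` / `canonical_smiles` fields. The function names and the sample payload are invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CHEMBL_MOLECULE = "https://www.ebi.ac.uk/chembl/api/data/molecule"


def molecule_query_url(name_suffix, mw_min, mw_max, fmt=None):
    """Build the molecule query from the slide; fmt is an optional output format."""
    params = {
        "molecule_properties__mw_freebase__gte": mw_min,
        "molecule_properties__mw_freebase__lte": mw_max,
        "pref_name__iendswith": name_suffix,
    }
    if fmt:
        params["format"] = fmt  # assumption: 'json' switches the output to JSON
    return f"{CHEMBL_MOLECULE}?{urlencode(params)}"


def names_and_smiles(payload):
    """Extract (name, canonical SMILES) pairs from a decoded JSON payload."""
    pairs = []
    for molecule in payload.get("molecules", []):
        structures = molecule.get("molecule_structures") or {}
        pairs.append((molecule.get("pref_name"), structures.get("canonical_smiles")))
    return pairs


def find_molecules(name_suffix, mw_min, mw_max):
    """Run the query against the server (network access; not called here)."""
    url = molecule_query_url(name_suffix, mw_min, mw_max, fmt="json")
    with urlopen(url, timeout=30) as response:
        return names_and_smiles(json.load(response))


# Hand-written payload in the assumed shape, using the SMILES from the slide.
SAMPLE = {"molecules": [{"pref_name": "ASPIRIN",
                         "molecule_structures": {"canonical_smiles": "CC(=O)Oc1ccccc1C(=O)O"}}]}

print(molecule_query_url("rin", 150, 200))
print(names_and_smiles(SAMPLE))
```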

SLIDE 17

Application II: ChEMBL

Find compounds with desired properties

2 Let's find another molecule with aspirin as a substructure:

Aspirin: CC(=O)Oc1ccccc1C(=O)O

https://www.ebi.ac.uk/chembl/api/data/substructure/CC(=O)Oc1ccccc1C(=O)O

(XML result data not shown)

Documentation at https://www.ebi.ac.uk/chembl/ws

SLIDE 18

Application II: ChEMBL

Find compounds with desired properties

2 Let's find another molecule with aspirin as a substructure:

Aspirin: CC(=O)Oc1ccccc1C(=O)O

https://www.ebi.ac.uk/chembl/api/data/substructure/CC(=O)Oc1ccccc1C(=O)O

(XML result data not shown)

Second hit (CHEMBL7666)

Documentation at https://www.ebi.ac.uk/chembl/ws
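When the substructure URL is built in a script, the SMILES string becomes part of the URL path. SMILES can contain characters with a special meaning in URLs (for example `/`, `#` or `=`), so a cautious approach is to percent-encode it first. The sketch below does this with `urllib.parse.quote`; the helper name is made up, and it is an assumption that the service decodes the encoded path as expected.

```python
from urllib.parse import quote, unquote

SUBSTRUCTURE = "https://www.ebi.ac.uk/chembl/api/data/substructure"


def substructure_url(smiles):
    """Build a substructure search URL with the SMILES percent-encoded."""
    # safe="" also encodes '/', which would otherwise split the path.
    return f"{SUBSTRUCTURE}/{quote(smiles, safe='')}"


ASPIRIN = "CC(=O)Oc1ccccc1C(=O)O"  # SMILES from the slide
url = substructure_url(ASPIRIN)
print(url)

# Decoding the last path segment gives back the original SMILES.
print(unquote(url.rsplit("/", 1)[1]))
```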

SLIDE 19

Important Information

Read the documentation of each service you are using
Sometimes you will need keys to have access
Don't send too many requests to the server (you could crash it or be blocked)
Some services don't allow parallel requests

https://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html

With great power comes great responsibility!

USAGE POLICY: Please note that PUG REST is not designed for very large volumes (millions) of requests. We ask that any script or application not make more than 5 requests per second, in order to avoid overloading the PubChem servers. If you have a large data set that you need to compute with, please contact us for help on optimizing your task, as there are likely more efficient ways to approach such bulk queries.
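One simple way to respect such a limit in a script is to wait between consecutive requests. The class below is a minimal sketch of this idea, not an official client: the name `Throttle` and its parameters are invented for this example, and it only spaces out sequential requests from a single script.

```python
import time


class Throttle:
    """Keep consecutive calls to wait() at least 1/max_per_second apart."""

    def __init__(self, max_per_second=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable, so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call wait() before every request.
throttle = Throttle(max_per_second=5)
for compound_id in (1, 2, 3):  # placeholder IDs for illustration
    throttle.wait()
    # ... send the request for compound_id here ...
```

With `max_per_second=5` the script pauses so that requests are at least 0.2 seconds apart, which stays within the limit quoted above.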

SLIDE 20