
Machine Learning for Biomedical Engineering
Enrico Grisan
enrico.grisan@dei.unipd.it

HIV life cycle and mechanism

Antiretroviral therapy

HIV-protease cleavage site
Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. An accurate, robust, and rapid method for predicting the cleavage sites in proteins is therefore crucial when searching for possible HIV inhibitors.
The goal is to predict whether a sequence of amino acids constitutes a cleavage site.
References:
- Rögnvaldsson, You and Garwicz (2015) "State of the art prediction of HIV-1 protease cleavage sites". Bioinformatics, 31 (8), 1204-1210.
- Kontijevskis, Wikberg and Komorowski (2007) "Computational Proteomics Analysis of HIV-1 Protease Interactome". Proteins: Structure, Function, and Bioinformatics, 68, 305-312.
- You, Garwicz and Rögnvaldsson (2005) "Comprehensive Bioinformatic Analysis of the Specificity of Human Immunodeficiency Virus Type 1 Protease". Journal of Virology, 79, 12477-12486.

Learning patterns in cleavage sites
- Accurate prediction of known cleavage and non-cleavage sites
- Identifying unknown sites

Candidate sites
Possible candidate sites are represented by an octamer within a protein sequence.
An octamer is a sequence of 8 consecutive amino acids.
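
To make the idea of candidate octamers concrete, here is a minimal sketch that slides a window of length 8 along a sequence and collects every octamer. The sequence `seq` is a made-up fragment used only for illustration, and the variable names are assumptions, not part of the course material.

```matlab
% Hypothetical protein fragment (made up for illustration only)
seq = 'ARNDCQEGHILK';
k = 8;                                  % octamer length
n_oct = length(seq) - k + 1;            % number of candidate windows
octamers = repmat(' ', n_oct, k);       % preallocate a char matrix
for i = 1:n_oct
    octamers(i, :) = seq(i:i+k-1);      % window starting at position i
end
disp(octamers);
```

A sequence of length n yields n - 7 candidate octamers, each of which can then be passed to the classifier.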

Data
There are 2 datasets available:
- 746
- 1625
Possible sites are represented as sequences of 8 letters from a 20-letter alphabet ('ARNDCQEGHILKMFPSTWYV', one letter per amino acid).
The known cleavage sites have label 1.
The known non-cleavage sites have label -1.

Problem 1: load the data
Octamers are stored as letters, so they cannot be loaded directly into a numeric matrix in MATLAB.
1) Scan each line in the file
2) Extract the character sequence
3) Provide a numerical code for each amino acid
4) Extract the cleavage label

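Problem 1: load the data
% Use MATLAB's C-like I/O routines
% Open the input file stream
datafile = '725Data.txt';
F = fopen(datafile);
% Read one line at a time until end of file
data = [];
count = 0;
while ~feof(F)
    s = fgets(F);
    if ~ischar(s), break; end          % fgets returns -1 at end of file
    % 8 characters (converted to ASCII codes), a comma, then the integer label
    v = sscanf(s, '%c%c%c%c%c%c%c%c,%i');
    if numel(v) ~= 9, continue; end    % skip blank or malformed lines
    count = count + 1;
    data(count, :) = v';
end
fclose(F);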

Code the sequences
Now you have loaded all the data into a 725x9 matrix:
- The first 8 numbers of each row are the ASCII codes of the letters representing the amino acids
- The last number in each row is the label
Think of other possible numerical codings for the 20 different amino acids that you can use.
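
One possible alternative coding is a one-hot (orthogonal) code, in which each of the 8 positions becomes a block of 20 indicator variables. The sketch below is only one option among many. The two rows of `data` are hypothetical stand-ins for the loaded matrix, and their labels are placeholders rather than values taken from the datasets.

```matlab
alphabet = 'ARNDCQEGHILKMFPSTWYV';      % the 20 amino-acid letters
% Hypothetical stand-in for two rows of the loaded matrix (ASCII codes + label)
data = [double('AAAKFERQ'), -1;
        double('SQNYPIVQ'),  1];
% Map each ASCII code to its index (1..20) in the alphabet
[found, idx] = ismember(char(data(:, 1:8)), alphabet);
assert(all(found(:)));                  % every letter must belong to the alphabet
N = size(data, 1);
X = zeros(N, 8 * 20);                   % one block of 20 indicators per position
for p = 1:8
    cols = (p - 1) * 20 + idx(:, p);    % column of the active indicator
    X(sub2ind(size(X), (1:N)', cols)) = 1;
end
```

Unlike raw ASCII codes, this coding does not impose an arbitrary ordering on the amino acids, at the cost of a 160-dimensional input.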

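Problem 2: train a linear classifier
Design a linear classifier to predict the cleavage sites, and evaluate the training error.
1. Extract the octamer code $\mathbf{x}_j$
2. Extract the label $l(j)$
3. Create the design matrix $\mathbf{D}$ (adding the bias to each data point) and the label vector $\mathbf{L}$
4. Estimate the weight vector $\mathbf{w} = \mathbf{D} \backslash \mathbf{L}$ (MATLAB's backslash operator, which returns the least-squares solution)
5. Classify each data point, with the bias appended to $\mathbf{x}_j$: $l(j) = \mathbf{w}^T [\mathbf{x}_j^T \; 1]^T$; the sign of this value gives the predicted class (1 or -1)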
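
The least-squares fit described above can be sketched as follows. The small two-dimensional data set is synthetic and only stands in for the coded octamers, and the variable names (`design`, `labels`, `weights`) are assumptions chosen for readability.

```matlab
% Synthetic stand-in for the coded octamers (assumption: any N-by-d numeric matrix)
X = [1 2; 2 1; 2 3; 6 5; 7 7; 6 8];
labels = [-1; -1; -1; 1; 1; 1];
N = size(X, 1);
design = [X, ones(N, 1)];               % append the bias term to each data point
weights = design \ labels;              % least-squares estimate of the weights
pred = sign(design * weights);          % predicted class: +1 or -1
train_err = mean(pred ~= labels);       % fraction of misclassified training points
fprintf('Training error: %.3f\n', train_err);
```

The training error measured this way is optimistic, since the same points are used for fitting and for evaluation.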

Problem 3: estimate $Err_{CV}$
Run a 10-fold cross-validation for the classification.
1) Divide the dataset into 10 folds
2) At each cross-validation iteration:
   a) Use the current fold for test
   b) Use the other 9 folds for train
   c) Evaluate the classification error on the test fold
   d) Store the test error
3) Evaluate mean and standard deviation of the test error

Problem 3: randomize the folds
% Shuffle the data
r = rand(size(data,1), 1);
[dummy, ind] = sort(r);
data_shuffle = data(ind, 1:8);
label_shuffle = data(ind, 9);
% Evaluate the number of data points per fold
N_fold = 10;
fold_data = fix(size(data,1) / N_fold);

Problem 3: cross validate
% Cross validation
for cv = 1:N_fold
    % Find indexes of test data (fold_data points per fold)
    ntest = (cv-1)*fold_data+1 : cv*fold_data;
    data_test = data_shuffle(ntest, :);
    label_test = label_shuffle(ntest);
    % Find indexes of train data
    ind = ones(size(data,1), 1);
    ind(ntest) = 0;
    ntrain = find(ind);
    data_train = data_shuffle(ntrain, :);
    label_train = label_shuffle(ntrain);
    % Learn the classifier on the training data
    % Evaluate the error on the test data
    classifier = ...
    train_err(cv) = ...
    test_err(cv) = ...
end
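
As one possible way of filling in the placeholders, the sketch below runs the whole cross-validation loop with a least-squares linear classifier. It is not the reference solution of the exercise. The data are synthetic (two well-separated clusters standing in for the coded octamers), and the names `D_train`, `D_test` and `w` are assumptions.

```matlab
% Synthetic two-class data (assumption: stands in for the coded octamers)
[g1, g2] = meshgrid(0:4, 0:3);
neg = [g1(:), g2(:)];                   % 20 points labelled -1
pos = neg + 20;                         % 20 points labelled +1, shifted away
data = [neg, -ones(20, 1); pos, ones(20, 1)];

% Shuffle the rows and compute the fold size
[dummy, ind] = sort(rand(size(data, 1), 1));
data_shuffle = data(ind, 1:2);
label_shuffle = data(ind, 3);
N_fold = 10;
fold_data = fix(size(data, 1) / N_fold);

train_err = zeros(1, N_fold);
test_err = zeros(1, N_fold);
for cv = 1:N_fold
    ntest = (cv-1)*fold_data+1 : cv*fold_data;
    ntrain = setdiff(1:size(data, 1), ntest);
    % Least-squares linear classifier with a bias column
    D_train = [data_shuffle(ntrain, :), ones(numel(ntrain), 1)];
    D_test = [data_shuffle(ntest, :), ones(numel(ntest), 1)];
    w = D_train \ label_shuffle(ntrain);
    train_err(cv) = mean(sign(D_train * w) ~= label_shuffle(ntrain));
    test_err(cv) = mean(sign(D_test * w) ~= label_shuffle(ntest));
end
fprintf('CV error: %.3f +/- %.3f\n', mean(test_err), std(test_err));
```

When the number of rows is not a multiple of the number of folds, `fix` leaves a few rows that are never tested; they are still used for training in every iteration.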

Problem 4: change dataset
1) Run the same cross-validation procedure on the 1625 dataset (1625Data.txt)
2) Run the learning on the 725 dataset and the test on the 1625 dataset
3) Run the learning on the 1625 dataset and the test on the 725 dataset
4) Evaluate and compare the different errors (cross-validation within the same dataset, validation using the other dataset)
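
The cross-dataset comparison in steps 2 and 3 can be sketched as below, again assuming a least-squares linear classifier. The two small matrices are synthetic stand-ins for the coded datasets, and `fit_ls` and `err_ls` are hypothetical helper handles introduced only for this example.

```matlab
% Hypothetical helpers: fit and evaluate a least-squares linear classifier
fit_ls = @(X, y) [X, ones(size(X, 1), 1)] \ y;
err_ls = @(w, X, y) mean(sign([X, ones(size(X, 1), 1)] * w) ~= y);

% Synthetic stand-ins for the two coded datasets (assumption)
XA = [0 0; 1 0; 0 1; 5 5; 6 5; 5 6];  yA = [-1; -1; -1; 1; 1; 1];
XB = [1 1; 0 2; 6 6; 7 5];            yB = [-1; -1; 1; 1];

wA = fit_ls(XA, yA);
wB = fit_ls(XB, yB);
err_A_on_B = err_ls(wA, XB, yB);        % train on A, test on B
err_B_on_A = err_ls(wB, XA, yA);        % train on B, test on A
fprintf('A->B: %.3f, B->A: %.3f\n', err_A_on_B, err_B_on_A);
```

Comparing these two numbers with the within-dataset cross-validation error indicates how well a classifier learned on one dataset transfers to the other.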
