Computing Server 2008 joint project between Nile University, - PowerPoint PPT Presentation

WinBioinfTools: Bioinformatics Tools for Windows High Performance Computing Server 2008
A joint project between Nile University, Microsoft Egypt, and Cairo Microsoft Innovation Center
Mohamed Abouelhoda, Nile University

Nile University
• Established in 2006 as a first non-profit research university
• Specialized in Information and Communication Technology and related fields and their applications
• Research centers
  - Center for Informatics Sciences (CIS)
  - Center for Wireless Intelligent Networks (WINC)
  - Center for Innovation & Competitiveness (CIC)
• Modern Master programs
  - 9 Master programs in IT, microelectronics, management, business, transportation systems, and construction management
• Recent undergraduate program
  - Engineering and management programs

Research Groups
• Established in June 2008
• 9 senior scientists, 36 junior scientists
• Mission: address information-rich problems of importance to the region and Egypt

State of the art
[Diagram: scientists and knowledge workers pursue scientific discovery and business insights through applications (bioinformatics, medical imaging, data mining), supported by data analysis, decision making and collaboration tools, data management, and integration tools. Local resources: local computing, local data and information, HPC resources, software tools. Distributed resources, reached through ubiquitous networking: distributed computing, distributed scientific information, distributed data sources (SQL, web sources, images, text), sensors and devices, and remote software access.]

Infrastructure of CIS
• Local CIS resources (first phase):
  - 21 servers with 160 AMD/Intel cores and 1 TB RAM in total
  - 24 TB total storage
• Extensible resources via partners: Microsoft, Imperial College, Bridge Project
• Shared middleware: standardized SOA interfaces, service composition, utility-based computing, …
[Diagram: bioinformatics applications at Nile University sit on the shared middleware, which connects Nile University with Imperial College London, Microsoft CMIC, Bibliotheca Alexandrina, the Bridge Project, and other resources.]

Group Leader: Mohamed Abouelhoda
Co-Workers: 7 RAs
http://www.bioinf.nileu.edu.eg
Projects and Research:
• NUBIOS: Nile University Bioinformatics Server
• Plant, animal, bacterial, and virus computational genomics
• Cancer bioinformatics
• High performance computing for bioinformatics applications
Collaborators:
• Academic
  - Imperial College, Prof. Hani Gabra
  - National Cancer Institute, Egypt
  - Bielefeld University, Prof. Robert Giegerich
  - Agriculture Research Institute
• Industry
  - Cairo Microsoft Innovation Center (CMIC), Egypt
  - IBM

WinBioinfTools: Bioinformatics Tools for Windows High Performance Computing Server 2008

Motivation
• Bioinformatics tools are essential for current molecular biology research
• Obstacles:
  - Open-source bioinformatics tools are usually written for Unix/Linux, which is not widely used in the life science community
  - Data sizes have become prohibitively large to analyze on an ordinary PC

Project Objectives
• Provide WinBioinfTools to the biological community, a toolset that
  - runs under MS Windows
  - runs on a computer cluster (Windows HPC Server 2008)
• Primary focus on sequence analysis and comparative genomics
  - Distributed Sequence Alignment
  - Distributed BLAST (Basic Local Alignment Search Tool)
  - CoCoNUT (Computational Comparative GeNomics Utilities Toolkit)
• Compare the performance of the Windows-based versions of these tools with the corresponding Linux-based versions

Resources
• Human resources
  - Mohamed Abouelhoda, Hisham Mohamed (Nile University)
  - Mohamed Zahran (collaborator, New York City University)
  - Tamer Shaalan (CMIC)
• CMIC Lab
  - Cluster of 4 nodes (2 quad-core 2.6 GHz processors, 16 GB RAM, 250 GB HD)
  - 1 Gigabit Ethernet network
  - Windows HPC Server 2008 with HPC Pack 2008

Why Sequence Analysis First?
• We focused on sequence analysis tools:
  1. Comparing short sequences → Parallel Sequence Alignment
  2. Comparing large genomic sequences → Parallel CoCoNUT
  3. Database search → Parallel BLAST
• Sequence analysis helps in elucidating the function and structure of genomic regions
• An example pipeline used in practice is HAVANA (Human And Vertebrate Analysis aNd Annotation)
[Diagram labels: database search, genome comparison, sequence alignment]

Cluster Modes of Operation
1. Load balancing: task-level parallelism
   - Most bioinformatics problems can be solved well in this mode because the data is decomposable
2. (High performance) compute cluster: instruction-level parallelism
   - Problems in this category are very critical and form a bottleneck

Basic features of Windows HPC Server 2008
• High performance: 64-bit version with access to large memory (16, 32, 64, 128 GB RAM)
• Cluster and multi-core support
• Cluster management and monitoring tools
• Load balancing: job scheduler
• Parallel computing: MS MPI
• Interoperability: SUA (Subsystem for UNIX-based Applications); Cygwin also works
• Virtualization: Hyper-V for virtual machine support

Sequence Alignment

Sequence Alignment
$S_1$ = TACAATCAA
$S_2$ = TCACTCAC

Sequence alignment:
$S_1$: T_ACAATCAA
$S_2$: TCAC__TCAC
(the "_" columns are insertions/deletions; the last column, A against C, is a mismatch)

• Dynamic programming algorithms take $O(n^k)$ time ($k$ = number of genomes, $n$ = average genome length), i.e. $O(n^2)$ for two sequences
• Needleman and Wunsch, 1970

Dynamic Programming Algorithm
• Sequence alignment aims at maximizing the similarities between sequences.
• Optimal sequence alignment can be computed using dynamic programming.
• For two sequences, the best alignment is computed by filling a 2D matrix, where the score at cell $(i,j)$ is computed as follows:

$$
\mathrm{score}(i,j) = \min
\begin{cases}
\mathrm{score}(i-1,j-1) + 1, & \text{if } S_1[i] \neq S_2[j] \\
\mathrm{score}(i-1,j-1), & \text{if } S_1[i] = S_2[j] \\
\mathrm{score}(i-1,j) + 1 & \text{(character deletion cost)} \\
\mathrm{score}(i,j-1) + 1 & \text{(character deletion cost)}
\end{cases}
$$
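The recurrence on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the function name `alignment_cost` is hypothetical, and the boundary values (aligning a prefix against the empty string costs one gap per character) are an assumption, since the slide does not state them.

```python
def alignment_cost(s1: str, s2: str) -> int:
    """Cost of the best alignment of s1 and s2 (mismatch = 1, gap = 1)."""
    n, m = len(s1), len(s2)
    # score[i][j] = cost of the best alignment of s1[:i] with s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary: a prefix aligned against nothing costs one gap per character.
    for i in range(1, n + 1):
        score[i][0] = i
    for j in range(1, m + 1):
        score[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if s1[i - 1] == s2[j - 1] else 1
            score[i][j] = min(
                score[i - 1][j - 1] + mismatch,  # match or mismatch
                score[i - 1][j] + 1,             # gap in s2
                score[i][j - 1] + 1,             # gap in s1
            )
    return score[n][m]


if __name__ == "__main__":
    print(alignment_cost("TACAATCAA", "TCACTCAC"))
```

The alignment of TACAATCAA and TCACTCAC shown on the earlier slide uses three gap columns and one mismatch, so the cost returned for that pair can be at most 4.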

Parallelization of the DP Algorithm
• The cluster nodes cooperate in filling the matrix (compute cluster model)
• The filling proceeds diagonal-wise, and the master node synchronizes the filling
• The complexity reduces to $O(n^2/k + t k')$, where $t$ is the communication time, $k$ is the number of cores, and $k'$ is the number of cluster nodes
[Diagram: the matrix is divided among node 1 to node 4; every cell is filled with the recurrence for $\mathrm{score}(i,j)$; a synchronizing line along the diagonal is synchronized by the master node.]
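Why diagonal-wise filling parallelizes can be seen in a small sequential sketch. Each cell with $i + j = d$ depends only on cells of diagonals $d-1$ and $d-2$, so all cells of one diagonal could be computed concurrently, with one synchronization step per diagonal. The code below only simulates that order on a single machine; the function name `wavefront_cost` and the boundary values are assumptions, and no MPI calls or node assignment from the project are shown.

```python
def wavefront_cost(s1: str, s2: str):
    """Fill the alignment matrix one anti-diagonal at a time.

    Returns the final cost and the number of cells in each diagonal,
    i.e. how much independent work each synchronization step offers.
    """
    n, m = len(s1), len(s2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        score[i][0] = i  # assumed boundary: gaps only
    for j in range(m + 1):
        score[0][j] = j
    wave_sizes = []
    for d in range(2, n + m + 1):  # anti-diagonal i + j = d
        cells = [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]
        # The cells of one diagonal do not read each other, so a cluster
        # could split this loop among its nodes.
        for i, j in cells:
            mismatch = 0 if s1[i - 1] == s2[j - 1] else 1
            score[i][j] = min(
                score[i - 1][j - 1] + mismatch,
                score[i - 1][j] + 1,
                score[i][j - 1] + 1,
            )
        # End of diagonal: this is where a master node would synchronize.
        if cells:
            wave_sizes.append(len(cells))
    return score[n][m], wave_sizes


if __name__ == "__main__":
    cost, waves = wavefront_cost("kitten", "sitting")
    print(cost, waves)
```

For sequences of lengths n and m there are n + m - 1 diagonals, and the widest one holds min(n, m) cells, which bounds how many workers can be busy at once.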

Experimental Results
• The running times (in seconds) for pairwise sequence alignment on one node and on 4 nodes.

                    Time on 4 nodes                          Time on
Sequence length     Communication  Processing  Total         one node
100 x 100           0.03623        0.000665    0.001765      0.0034
1000 x 1000         0.152653       0.005       0.014         0.04
5000 x 5000         0.142311       0.3         1             3.9
10000 x 10000       1.19           1.1         2.6           8.4
20000 x 20000       3.679          2           8             18
30000 x 30000       4              11          15            40

- In the first column, we list the sequence sizes, where 100x100, for example, means that we aligned two sequences, each 100 characters long.

Experimental Results
- On the x-axis, we list the sequence sizes, where 100x100, for example, means that we aligned two sequences, each 100 characters long.

Database Search

Querying Biological Databases using BLAST
[Diagram: biological database formatting and querying: (1) formatting, (2) query, (3) results]

Example alignment of RBP and lactoglobulin:

  1 MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG  50 RBP
  1 ...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD.  44 lactoglobulin

 51 LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE  97 RBP
 45 ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK  93 lactoglobulin

 98 DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV...........QYSC 136 RBP
 94 IPAVFKIDALNENKVL........VLDTDYKKYLLFCMENSAEPEQSLAC 135 lactoglobulin

Large Scale Application of BLAST
• BLAST (Basic Local Alignment Search Tool): given a biological sequence, it searches for similar (sub)regions in the database (Altschul et al. 1997)
• The database size is extremely large
• The search time is proportional to the database length
• A computer cluster provides an ideal solution for speeding up BLAST searches
[Diagram labels: Internet, queries, institution, enterprise]
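Because the search time grows with database length, one common way to use a cluster is to split the database into chunks of similar size, search each chunk independently, and merge the hits. The sketch below illustrates that idea only. It is not the WinBioinfTools implementation and it does not run BLAST: `shared_kmers` is a toy similarity measure, threads stand in for cluster nodes, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_database(records, parts):
    """Greedily assign (name, sequence) records to chunks of similar total length."""
    chunks = [[] for _ in range(parts)]
    sizes = [0] * parts
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        k = sizes.index(min(sizes))  # currently smallest chunk
        chunks[k].append((name, seq))
        sizes[k] += len(seq)
    return chunks


def shared_kmers(query, seq, k=3):
    """Toy similarity: distinct k-mers shared by query and seq (not a BLAST score)."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    s = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(q & s)


def search_chunk(query, chunk):
    """Score every record of one chunk; on a cluster this would run on one node."""
    return [(shared_kmers(query, seq), name) for name, seq in chunk]


def distributed_search(query, records, parts=4, top=3):
    """Search all chunks concurrently and merge the partial hit lists."""
    chunks = split_database(records, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial = list(pool.map(lambda c: search_chunk(query, c), chunks))
    hits = [hit for part in partial for hit in part]
    hits.sort(key=lambda h: (-h[0], h[1]))  # best score first, then by name
    return hits[:top]


if __name__ == "__main__":
    db = [("a", "ACGTACGT"), ("b", "TTTTTTTT"), ("c", "ACGTTTTT"),
          ("d", "GGGGGGGG"), ("e", "ACG")]
    print(distributed_search("ACGTAC", db))
```

Balancing chunks by total sequence length rather than by record count follows from the slide's point that search time is proportional to database length.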
