Parallel and Hybrid Evolutionary Algorithm in Python, E. Kieffer, UL



slide-1
SLIDE 1

University of Luxembourg

Parallel and Hybrid Evolutionary Algorithm in Python

UL HPC Users' session -- UL HPC school 2017

  • E. Kieffer

Parallel Computing & Optimization Group

slide-2
SLIDE 2

Contents

  • Context and motivation
      • Clustering of the Parkinson Disease Map
      • Bi-level Clustering approach
  • Python tools on the UL HPC Platform
      • CPLEX solver
      • SCOOP library
      • DEAP library
  • Experiments & Validation
      • Experiments on the Parkinson Disease Map
      • Comparison with Hierarchical Clustering

slide-3
SLIDE 3

CONTEXT & MOTIVATION

slide-4
SLIDE 4

Parkinson Disease Map

  • Large (hyper-)Graph
  • Extract Knowledge
  • First experiments with a standard clustering approach
  • Hierarchical Clustering
  • Several metrics (e.g. GO, NET, EU)
  • Hard to combine

slide-5
SLIDE 5

Bi-level Clustering

  • Clustering is often based on a two-phase algorithm:
      • Find cluster representatives
      • Assign data to clusters
  • Generally, the same metric is used for both steps
  • Consider these two steps as two nested optimization problems with different metrics
  • Metrics:
      • Euclidean distance
      • Network distance
      • Distance based on Gene/Disease Ontology
  • Use an Evolutionary Algorithm (EA) to solve the bi-level clustering problem
  • Use a multi-objective EA (MOEA) to detect the number of clusters
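The two-phase view on this slide can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the model used in the talk: the Manhattan distance is a hypothetical stand-in for a network or ontology distance, the choice of which metric drives which step is arbitrary, and the names `assign` and `representative_cost` are made up for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.dist(a, b)


def manhattan(a, b):
    """Hypothetical stand-in for a second metric (e.g. a network distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, representatives, metric):
    """Assignment step: attach each point to its closest representative."""
    return [
        min(range(len(representatives)), key=lambda j: metric(p, representatives[j]))
        for p in points
    ]


def representative_cost(points, representatives, labels, metric):
    """Score a set of representatives, given the assignment, with another metric."""
    return sum(metric(p, representatives[lab]) for p, lab in zip(points, labels))


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
representatives = [(0, 0), (5, 5)]

# Assignment uses one metric, the quality of the representatives another.
labels = assign(points, representatives, manhattan)
cost = representative_cost(points, representatives, labels, euclidean)
print(labels, cost)
```

An optimizer over the representatives would call `assign` inside every evaluation of `representative_cost`, which is what makes the two steps nested.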

slide-6
SLIDE 6

Bi-level Optimization

  • Bi-level ↔ nested problems
  • One problem constrains another → NP-hard even for convex levels

[Figure labels: Upper-level, Lower-level]
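For reference, a generic bi-level program is commonly written as below. The symbols (F, G for the upper level, f, g for the lower level, x and y for the respective decision variables) are the usual textbook notation and are an assumption here; the slide's own formulation was not preserved.

```latex
\begin{aligned}
\min_{x \in X} \quad & F(x, y) && \text{(upper level)} \\
\text{s.t.} \quad & G(x, y) \le 0, \\
& y \in \arg\min_{y' \in Y} \{\, f(x, y') : g(x, y') \le 0 \,\} && \text{(lower level)}
\end{aligned}
```

The upper-level decision x is only feasible together with a y that is optimal for the lower-level problem parameterized by x, which is the sense in which one problem constrains the other.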

slide-7
SLIDE 7

Bi-level Clustering

slide-8
SLIDE 8

Parallel and hybrid EA

HPC

slide-9
SLIDE 9

PYTHON TOOLS ON THE UL HPC PLATFORM

slide-10
SLIDE 10

Using CPLEX on the UL HPC

  • IBM ILOG CPLEX Optimizer's mathematical programming technology
  • One of the most efficient solvers on the market
  • CPLEX is available to HPC users with an IBM Academic Initiative membership
      • First register with the IBM Academic Initiative: https://developer.ibm.com/academic/
      • Forward the membership confirmation mail to the HPC admins
  • To use CPLEX on the cluster:

        $ module use $PROJECTWORK/cplex/soft/modules
        $ module load CPLEX

slide-11
SLIDE 11

Parallel Evaluations with SCOOP

  • SCOOP: Scalable COncurrent Operations in Python
      • A distributed task module
      • Concurrent parallel programming
      • Runs on various environments, from heterogeneous grids to supercomputers
  • Command to execute a Python script using SCOOP:

        python -m scoop --hostfile $OAR_NODEFILE -n 16 --ssh-executable "oarsh" hello.py

  • Parameters:
      • --hostfile: path to the file containing all hostnames
      • --ssh-executable: the command used to access the nodes (here oarsh)
      • -n: the number of workers

hello.py

    from __future__ import print_function
    from scoop import futures
    import socket

    def helloWorld(value):
        # The argument is unused; each call reports the host it ran on.
        return "Hello World from {0}".format(socket.gethostname())

    if __name__ == "__main__":
        # futures.map distributes the 16 calls over the SCOOP workers.
        returnValues = list(futures.map(helloWorld, range(16)))
        print("\n".join(returnValues))

slide-12
SLIDE 12

Example

slide-13
SLIDE 13

DEAP library for Evolutionary Computation in Python

  • https://github.com/DEAP/deap
  • Rapid prototyping and testing of ideas
  • Parallelization mechanism based on SCOOP
  • CMA-ES algorithm
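The idea behind the parallelization mechanism is that all fitness evaluations go through a replaceable `map` function. DEAP documents this as registering a map in the toolbox, e.g. `toolbox.register("map", futures.map)` with SCOOP. The sketch below shows the same pattern without DEAP or SCOOP, so it runs anywhere: it uses the standard library's `concurrent.futures` thread pool as a stand-in for SCOOP workers, a toy sphere function as a hypothetical objective, and a simple elitist loop rather than CMA-ES.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sphere(individual):
    """Toy fitness to minimise: sum of squares (hypothetical objective)."""
    return sum(x * x for x in individual)


def mutate(individual, rng, sigma=0.3):
    """Gaussian mutation applied to every gene."""
    return [x + rng.gauss(0.0, sigma) for x in individual]


def evolve(map_fn, generations=30, pop_size=20, dim=3, seed=0):
    """Elitist evolutionary loop; every fitness evaluation goes through map_fn."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    fitness = list(map_fn(sphere, population))
    initial_best = min(fitness)
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        offspring_fitness = list(map_fn(sphere, offspring))
        # Keep the pop_size best among parents and offspring.
        ranked = sorted(
            zip(fitness + offspring_fitness, population + offspring),
            key=lambda pair: pair[0],
        )[:pop_size]
        fitness = [f for f, _ in ranked]
        population = [ind for _, ind in ranked]
    return initial_best, fitness[0], population[0]


# The thread pool stands in for SCOOP workers; only the map function changes.
with ThreadPoolExecutor(max_workers=4) as pool:
    initial_best, final_best, best_individual = evolve(pool.map)
print(initial_best, final_best)
```

Passing the built-in `map` instead of `pool.map` gives the serial version of the same run, since random numbers are drawn only in the main loop and both map functions preserve order.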

slide-14
SLIDE 14

EXPERIMENTS & VALIDATION

slide-15
SLIDE 15

Clustering results

slide-16
SLIDE 16

Bi-level Clustering

Enrichment analysis: hypergeometric test

  • N: genes altogether (background)
  • n: genes in a cluster
  • m: genes in a GO term
  • k: genes in a cluster and in a GO term

\[
P(X = k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}
\]

Adapted from: Florian Markowetz Network Biology Lent 2010

A cluster represents a sample of n genes from a total population of N genes. It is known that the considered GO term contains m genes. What is the probability that exactly k genes are shared between our cluster and the considered GO term?
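The probability above can be computed directly from binomial coefficients. The sketch below is a minimal illustration using only the standard library (`math.comb`, Python 3.8+); the function names and the small numbers in the usage lines are made up for the example. The upper tail P(X >= k) is included because it is the quantity commonly reported as an enrichment p-value, which is an assumption about how the test was applied here.

```python
from math import comb


def hypergeom_pmf(k, N, n, m):
    """P(X = k): exactly k of the n cluster genes belong to a term of m genes."""
    if k < max(0, n + m - N) or k > min(n, m):
        return 0.0
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def enrichment_p_value(k, N, n, m):
    """Upper tail P(X >= k): at least k shared genes by chance."""
    return sum(hypergeom_pmf(i, N, n, m) for i in range(k, min(n, m) + 1))


# Toy numbers: 10 background genes, a cluster of 3, a term of 4, 2 shared.
p_exact = hypergeom_pmf(2, N=10, n=3, m=4)
p_tail = enrichment_p_value(2, N=10, n=3, m=4)
print(p_exact, p_tail)
```

A small tail probability means the overlap between the cluster and the term is unlikely to arise from a random sample of n genes.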

slide-17
SLIDE 17

Bi-level Clustering

[Figure: Enrichment of Disease Ontology terms, p-value cutoff 0.001. Plotted quantities: clusters, unique_terms, distance. Series: 01_net_go_ward, 02_eu_go_ward, 03_eu_net_ward, 04_clusteringNETEU, 05_clusteringEUNET, 06_clusteringGOEU, 07_clusteringEUGO, 08_clusteringGONET, 09_clusteringNETGO, 10_expert]

slide-18
SLIDE 18

Conclusions

  • Knowledge extraction on the Parkinson Disease Map
  • Bi-level clustering model
  • Model solved with a hybrid and parallel EA
  • Experiments required a lot of resources → UL HPC Platform
      • Hybrid → CPLEX solver
      • Parallel → SCOOP library for parallel evaluations
      • Evolutionary Computation → DEAP library

slide-19
SLIDE 19

Questions ?

PS9 (13h30 – 15h30): Advanced Prototyping with python presented by Clement Parisot

Thank you for your attention