Linux for Biology
DEDAN GITHAE, BIOINFORMATICIAN BECA-ILRI HUB
Importance of computers to biology
- Availability of vast research data shared online
- Automated analysis leading to the generation of massive data
- Interaction with other research communities and shared databases
- Speed and efficiency in processing, storage and data mining
Big Data: Volume, Variety, Velocity & Veracity
Volume:
Velocity:
Variety:
Veracity:
Other computational tasks: Analysis and interpretation
Biology activities:
a family
Ubuntu? Fedora? Mint? Debian? openSUSE?
Free software: anyone is freely licensed to use, copy, study, and change the software in any way.
Open source: the source code is openly shared so that people are encouraged to voluntarily improve the design of the software.
Operating system: system software that manages computer hardware and software resources and provides common services for computer programs.
Kernel: the part of the operating system that mediates access to system resources, e.g. input/output requests from software, translating them into data-processing instructions for the central processing unit.
- Repetitive tasks: processing several sequences
- Automating analysis processes: scripts / piping to programs
- Text processing: regex, grep, sed
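As a rough illustration of the text-processing tools listed above, the sketch below runs grep, sed and a pipe on a tiny FASTA file. The sequences, headers and variable names are made up for the example and do not come from any real dataset.

```shell
# Build a tiny FASTA file to work on (made-up sequences, illustration only).
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>seq1 sample A
ATGCGTAC
>seq2 sample B
GGCTTAAC
>seq3 sample A
TTGACCGA
EOF

# grep with a regex: count the records (header lines start with ">").
n_seqs=$(grep -c '^>' "$fasta")

# sed: print only the ID of each header (drop ">" and everything after
# the first space).
ids=$(sed -n 's/^>\([^ ]*\).*/\1/p' "$fasta")

# Piping: the output of one program becomes the input of the next.
# Here: keep the header lines, then count those mentioning "sample A".
n_sample_a=$(grep '^>' "$fasta" | grep -c 'sample A')

echo "sequences: $n_seqs"
echo "sample A headers: $n_sample_a"
echo "$ids"
```

The same commands work unchanged on a file with thousands of sequences, which is where the command line starts to pay off for repetitive tasks.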
The ILRI High Performance Computing (HPC) Cluster
Users log into HPC (the master). To log in, type:
    ssh userX@hpc.ilri.cgiar.org
Then "jump" to the rest of the cluster (the computing servers). To do this, type:
    interactive
To find out whether the software (and version) you need is installed, type:
    module avail
To use a piece of software, e.g. BLAST, type:
    module load blast
To see which software is ready for use (loaded), type:
    module list
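In a script it can help to check that the module command exists before calling it, since it is only defined on systems that use environment modules. The helper below is a minimal sketch; the function name load_module is hypothetical and not part of the cluster's tooling.

```shell
# Load a module only if the `module` command is available in this shell.
# load_module is a hypothetical helper name used for illustration.
load_module() {
    if command -v module >/dev/null 2>&1; then
        module load "$1"
    else
        echo "module command not found; cannot load $1" >&2
        return 1
    fi
}
```

On the cluster, load_module blast would behave like module load blast; on a laptop without environment modules it prints a message and returns a non-zero status instead of failing with "command not found".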
SLURM: Simple Linux Utility for Resource Management
Interactive jobs have a time limit of 8 hours. If you are running a longer job, write a batch script to schedule it.
How do we write scripts?
Writing a Slurm script
sbatch -u (see man sbatch for a detailed explanation of usage)
Example of a batch script
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.6.0+

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4
To run the script, type:
    sbatch [scriptname.sbatch]
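The batch script above can also be written to a file and checked from the command line before it is submitted. The sketch below assumes a hypothetical file name (blastn.sbatch) in a temporary directory, adds -num_threads 4 so that the command matches the comment about 4 threads, and only calls sbatch where Slurm is actually installed.

```shell
# Hypothetical file name and location; on the cluster you would normally
# write the script in your own working directory instead.
script_dir=$(mktemp -d)
script="$script_dir/blastn.sbatch"

# The quoted 'EOF' keeps ~ and other special characters literal in the file.
cat > "$script" <<'EOF'
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.6.0+

# run the blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4
EOF

# Check the script for shell syntax errors without running it.
if bash -n "$script"; then
    syntax_ok=yes
else
    syntax_ok=no
fi

# Submit only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$script"
else
    echo "sbatch not found; wrote $script (syntax ok: $syntax_ok)"
fi
```

The lines starting with #SBATCH are comments to bash but are read by sbatch as options: here the partition (-p), the job name (-J) and the number of tasks (-n).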
Run the job on a computing node:
    interactive
Make a directory in the scratch space and "go" there:
    mkdir -p /var/scratch/userX ; cd $_
Create the script.
Run the script:
    sbatch [scriptname.sbatch]
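The scratch-directory step can be sketched as follows. In bash, $_ expands to the last argument of the previous command, so cd $_ moves into the directory that mkdir has just created; the version below spells the path out in a variable instead, which does the same thing. The fallback to a temporary directory is an assumption added so that the snippet also runs on a machine without /var/scratch.

```shell
# Use /var/scratch on the cluster; elsewhere fall back to a throwaway
# temporary directory (assumption, for trying this out off the cluster).
if [ -d /var/scratch ] && [ -w /var/scratch ]; then
    scratch_base=/var/scratch
else
    scratch_base=$(mktemp -d)
fi

# One working directory per user; userX is the placeholder from the slides.
workdir="$scratch_base/${USER:-userX}"

# -p creates missing parent directories and does not complain if the
# directory already exists, so this is safe to run more than once.
mkdir -p "$workdir" && cd "$workdir"

echo "working in: $PWD"
```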