Linux for Biology, Dedan Githae, Bioinformatician, BecA-ILRI Hub (PowerPoint presentation)



SLIDE 1

Linux for Biology

DEDAN GITHAE, BIOINFORMATICIAN BECA-ILRI HUB

SLIDE 2

Importance of computers to biology

  • Availability of vast research data shared online
  • Automated analysis, leading to the generation of massive amounts of data
  • Interaction with other research communities and shared databases
  • Speed and efficiency in processing, storage and data mining

SLIDE 3

Big Data: Volume, Variety, Velocity & Veracity

Volume:

  • More content has already been generated and is available through open access
  • More content is being generated per run as a result of technological advances
  • Costs are falling over time

SLIDE 4

Velocity:

  • Technology is making data generation faster and more efficient

Variety:

  • Sequences, annotation, structures, image processing

Veracity:

  • Ambiguities, inconsistencies, incomplete data and model approximations

SLIDE 5

Other computational tasks: analysis and interpretation

Biology activities:

  • Prediction – functional and structural
  • Pattern recognition: domains, homology
  • Sequence alignments
  • Statistical analysis
  • Structural modelling
  • Genetic diversity, and interactions between organisms and between populations

SLIDE 6

Linux

SLIDE 7

What is Linux?

A family of free and open-source software operating system distributions built around the Linux kernel.

SLIDE 8

What is Linux?

A family (e.g. Ubuntu, Fedora, Mint, Debian, openSUSE)

  • of free: anyone is freely licensed to use, copy, study, and change the software in any way
  • and open-source software: the source code is openly shared, so people are encouraged to voluntarily improve the design of the software
  • operating system: system software that manages computer hardware and software resources and provides common services for computer programs
  • distributions built around the Linux kernel: the kernel is the part of the operating system that mediates access to system resources, e.g. taking input/output requests from software and translating them into data-processing instructions for the central processing unit (CPU)

SLIDE 9

Kernel

SLIDE 10

Some applications to biological tasks

  • Repetitive tasks: processing several sequences
  • Automating analysis processes: scripts, piping the output of one program into another
  • Text processing: regular expressions (regex) with grep and sed
  • Extracting fields using cut or awk

We'll see more of this in the tutorial.
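A minimal, runnable sketch of these text-processing tasks, assuming bash with standard grep, sed, cut and awk. The input files (example.fasta, example_hits.tsv) and their contents are hypothetical and are created by the script itself. In BLAST tabular output (-outfmt 6), column 3 is percent identity. The example hits file keeps only the first three columns for brevity.

```shell
#!/usr/bin/env bash
# Sketch of common text-processing tasks on hypothetical example files.
set -euo pipefail

# Hypothetical FASTA file: header lines start with '>'
cat > example.fasta <<'EOF'
>seq1 Drosophila melanogaster
ATGCGTACGT
>seq2 Drosophila simulans
ATGCCGTA
EOF

# Hypothetical BLAST tabular hits, tab-separated, first three -outfmt 6
# columns only: query ID, subject ID, percent identity
printf 'seq1\tsubjA\t98.5\nseq2\tsubjB\t87.0\n' > example_hits.tsv

# grep + regex: count sequences by counting header lines
grep -c '^>' example.fasta

# sed + regex: keep only the ID in each header, dropping the description
sed 's/^>\([^ ]*\).*/>\1/' example.fasta > ids_only.fasta

# cut: extract fields 1 and 2 (query and subject IDs)
cut -f1,2 example_hits.tsv

# awk: keep hits whose percent identity (field 3) is at least 90
awk -F'\t' '$3 >= 90' example_hits.tsv > good_hits.tsv

# Repetitive task + piping: for every FASTA file in this directory,
# print the file name and its number of sequences
for f in *.fasta; do
  printf '%s\t%s\n' "$f" "$(grep '^>' "$f" | wc -l)"
done
```

The loop is the pattern to reuse when the same step has to run over many sequence files. Swap the body for any command that reads one file at a time.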
SLIDE 11

The ILRI High Performance Computing (HPC) Cluster

SLIDE 12

The ILRI High Performance Computing (HPC) Cluster

Users first log in to HPC (the master node). To log in, type: ssh userX@hpc.ilri.cgiar.org

Then “jump” to the rest of the cluster (the compute servers). To do this, type: interactive

SLIDE 13

Software:

  • To check whether the software (and version) you need is installed, type: module avail
  • To use a piece of software, e.g. BLAST, type: module load blast
  • To see which software is loaded and ready for use, type: module list

SLIDE 14

SLURM: Simple Linux Utility for Resource Management

Interactive jobs have a time limit of 8 hours. If you are running a longer job, write a batch script and submit it to the scheduler. How do we write such scripts?

SLIDE 15

Writing a Slurm script

  • To list the available options, type: sbatch -u (see man sbatch for a detailed explanation of usage)

SLIDE 16

Example of a batch script

#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4

# load the blast module
module load blast/2.6.0+

# run blast with 4 CPU threads (cores)
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4

To run the script, type: sbatch [ scriptname.sbatch ]
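Interactive jobs are limited to 8 hours, so longer runs go through sbatch. Below is a sketch of the same BLAST job with a few extra, commonly used sbatch directives:

  • -N 1 keeps all four cores on one node, because blastn threads cannot span nodes.
  • -t sets a wall-time limit. The 1-day value is an assumption: it must fit within your partition's limit, which sinfo reports.
  • -o names the log file, where %j expands to the job ID.

The thread count is read from SLURM_NTASKS, which SLURM sets from -n, with a fallback of 4. The output file name drosoph_vs_nt.tsv is hypothetical.

```shell
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J blastn
#SBATCH -n 4
# Keep all 4 tasks (cores) on a single node so blastn can use them as threads
#SBATCH -N 1
# Wall-time limit in D-HH:MM:SS; 1 day is an assumed value and must fit
# within the partition's time limit
#SBATCH -t 1-00:00:00
# Write the job's stdout/stderr to blastn.<jobID>.out
#SBATCH -o blastn.%j.out

# Stop at the first failing command
set -e

# load the blast module (version as in the example above)
module load blast/2.6.0+

# Use the number of tasks requested with -n as the thread count,
# falling back to 4 if SLURM_NTASKS is not set
threads="${SLURM_NTASKS:-4}"

# Tabular output (-outfmt 6) is easy to filter later with cut/awk
blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt \
  -num_threads "$threads" -outfmt 6 -out drosoph_vs_nt.tsv
```

Reading the thread count from the allocation means that changing -n in one place is enough. Hard-coding the value in two places (the directive and -num_threads) risks the two drifting apart.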

SLIDE 17

Best practice: overview

  • Move to a compute node: interactive
  • Make a directory in the scratch space and “go” there: mkdir -p /var/scratch/userX ; cd $_
  • Create the script
  • Run the script: sbatch [scriptname.sbatch]
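In the scratch-directory step, cd $_ works because $_ is a bash special parameter. It expands to the last argument of the previous command, which here is the new directory's path.

The sketch below repeats that step so it can be tried on any machine. It makes two assumptions:

  • A throwaway directory created with mktemp stands in for /var/scratch.
  • The placeholder name userX is used if $USER is unset.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for /var/scratch (assumption: a temporary base directory)
base="$(mktemp -d)"

# Create a per-user scratch directory, then enter it; "$_" expands to the
# last argument of the previous command, i.e. the directory just created
mkdir -p "$base/scratch/${USER:-userX}" ; cd "$_"

# Show where we ended up
pwd
```

Quoting "$_" keeps the command safe if the path ever contains spaces.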

SLIDE 18

Enjoy!