
Linux for Biology: Dedan Githae, Bioinformatician, BecA-ILRI Hub (PowerPoint presentation)



  1. Linux for Biology: Dedan Githae, Bioinformatician, BecA-ILRI Hub

  2. Importance of computers to biology
     ◦ Availability of vast research data shared online
     ◦ Automated analysis, leading to the generation of massive data sets
     ◦ Interaction with other research communities and shared databases
     ◦ Speed and efficiency in processing, storage and data mining

  3. Big Data: Volume, Variety, Velocity & Veracity
     Volume:
     ◦ More content has already been generated and is available through open access
     ◦ More content is being generated per run as technology advances
     ◦ Costs are falling over time

  4. Velocity:
     ◦ Technology is making data generation faster and more efficient
     Variety:
     ◦ Sequences, annotation, structures, image processing
     Veracity:
     ◦ Ambiguities, inconsistencies, incomplete data, model approximations

  5. Other computational tasks: analysis and interpretation
     Biology activities:
     ◦ Prediction: functional and structural
     ◦ Pattern recognition: domains, homology
     ◦ Sequence alignments
     ◦ Statistical analysis
     ◦ Structural modelling
     ◦ Genetic diversity and interactions between organisms and between populations

  6. Linux

  7. What is Linux? A family of free and open-source software operating system distributions built around the Linux kernel.

  8. What is Linux?
     ◦ A family: e.g. Ubuntu, Fedora, Mint, Debian, openSUSE
     ◦ Free: anyone is freely licensed to use, copy, study, and change the software in any way
     ◦ Open-source software: the source code is openly shared so that people are encouraged to voluntarily improve the design of the software
     ◦ Operating system: system software that manages computer hardware and software resources and provides common services for computer programs
     ◦ Distributions built around the Linux kernel: the kernel is the part of the operating system that mediates access to system resources, e.g. taking input/output requests from software and translating them into data-processing instructions for the central processing unit
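To see the difference between a distribution and the kernel it is built around on a running machine, a small sketch like the one below may help. It assumes a Linux system with the standard uname command and, as on most current distributions, an /etc/os-release file; the function names kernel_release and distro_name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: distinguish the kernel from the distribution built around it.
# kernel_release and distro_name are hypothetical helper names.

# The kernel: uname -s gives the kernel name, uname -r its release.
kernel_release() {
  uname -sr
}

# The distribution: most current distributions describe themselves in /etc/os-release.
distro_name() {
  if [ -r /etc/os-release ]; then
    # Read the file in a subshell so its variables do not leak into this shell.
    # The os-release convention falls back to "Linux" when PRETTY_NAME is unset.
    ( . /etc/os-release && echo "${PRETTY_NAME:-Linux}" )
  else
    echo "unknown (no /etc/os-release)"
  fi
}
```

On an Ubuntu machine and on a Fedora machine, kernel_release reports a Linux kernel in both cases, while distro_name reports two different distributions.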

  9. Kernel

  10. Some applications to biological tasks
     ◦ Repetitive tasks: processing several sequences
     ◦ Automating analysis processes: scripts, piping output to programs
     ◦ Text processing: regex, grep, sed; extracting fields using cut / awk
     ◦ We'll see more of this in the tutorial
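The text-processing and repetitive tasks listed on this slide can be sketched as a few shell functions over FASTA files. This is a minimal illustration, not the tutorial's own material: the function names are hypothetical, and it assumes FASTA input in which each record starts with a '>' header line whose first word is the sequence ID.

```shell
#!/usr/bin/env bash
# Sketch of common sequence-file chores with grep, sed, cut and awk.
# Assumes FASTA input: each record starts with a '>' header line whose
# first word is the sequence ID. All function names are hypothetical.

# Count sequences: one '>' header line per record.
count_seqs() {
  grep -c '^>' "$1"
}

# List IDs: keep header lines, strip the '>' with sed, keep the first word with cut.
list_ids() {
  grep '^>' "$1" | sed 's/^>//' | cut -d ' ' -f 1
}

# Print "ID<TAB>length" per record; awk adds up wrapped sequence lines.
seq_lengths() {
  awk '
    /^>/ {
      if (id != "") print id "\t" len
      id = substr($1, 2); len = 0; next
    }
    { len += length($0) }
    END { if (id != "") print id "\t" len }
  ' "$1"
}

# Repetitive task: run the same count over every .fasta file in a directory.
summarise_dir() {
  local f
  for f in "$1"/*.fasta; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf '%s\t%s\n' "$(basename "$f")" "$(count_seqs "$f")"
  done
}
```

For example, seq_lengths seqs.fasta | sort -k2,2nr | head -n 5 would list the five longest sequences in a hypothetical file seqs.fasta, piping one program's output into the next.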

  11. The ILRI High Performance Computing (HPC) Cluster

  12. The ILRI High Performance Computing (HPC) Cluster
     ◦ Users log into HPC (the master node). To log in, type:
         ssh userX@hpc.ilri.cgiar.org
     ◦ Then "jump" to the rest of the cluster (the computing servers). To do this, type:
         interactive

  13. Software
     ◦ To check whether the software (and version) you need is installed, type: module avail
     ◦ To use a software package, e.g. BLAST, type: module load blast
     ◦ To see which software is loaded (ready for use), type: module list
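Inside a script it can help to check for a module before loading it. The sketch below wraps the commands from this slide in one function; load_if_available is a hypothetical name, and the sketch assumes the cluster provides module through Environment Modules or Lmod, which typically print the module avail listing on standard error rather than standard output (hence the 2>&1 before grep).

```shell
#!/usr/bin/env bash
# Sketch: load a software module only if a matching one is installed.
# load_if_available is a hypothetical helper, not part of the module system.
load_if_available() {
  local name="$1"
  # `module` is normally a shell function set up by Environment Modules or Lmod.
  if ! command -v module >/dev/null 2>&1; then
    echo "module command not found; are you on the cluster?" >&2
    return 1
  fi
  # module avail usually writes to stderr, so merge it into the pipe;
  # grep -F treats the name as a fixed string, -i ignores case.
  if module avail "$name" 2>&1 | grep -qiF -- "$name"; then
    module load "$name"
  else
    echo "no module matching '$name' is installed" >&2
    return 1
  fi
}
```

On the cluster this might be used as load_if_available blast && module list, which loads BLAST and then shows what is ready for use.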

  14. SLURM: Simple Linux Utility for Resource Management
     ◦ Interactive jobs have a time limit of 8 hours.
     ◦ If you are running a longer job, write a batch script to schedule it.
     ◦ How do we write scripts?

  15. Writing a Slurm script
     ◦ For the available options, type sbatch -u [ man sbatch gives a detailed explanation of usage ]
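A batch script for a job that is expected to outlast the 8-hour interactive limit would normally request its wall time explicitly. The header below is a sketch: -t (--time), -o (--output) and -e (--error) are standard sbatch options and %j is Slurm's job-ID filename pattern, but the job name and the 2-day limit are assumptions, and the maximum time the ILRI partitions allow is not stated in these slides.

```shell
#!/usr/bin/env bash
# Sketch of a batch script header for a long-running job.
# Assumptions: partition "batch", job name "long_job" and a 2-day limit;
# check the cluster's actual partition limits before using these values.
#SBATCH -p batch
#SBATCH -J long_job
#SBATCH -n 1
# Requested wall time in days-hours:minutes:seconds format.
#SBATCH -t 2-00:00:00
# Standard output and error files; %j is replaced by the job ID.
#SBATCH -o long_job.%j.out
#SBATCH -e long_job.%j.err

# SLURM_JOB_ID is set by Slurm inside a job; "none" means we are not in one.
start_msg="Job ${SLURM_JOB_ID:-none} started on $(uname -n) at $(date)"
echo "$start_msg"

# The long-running analysis commands would go here.
```

Comment lines between the #SBATCH directives are fine; sbatch only stops reading directives at the first executable command.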

  16. Example of a batch script
         #!/usr/bin/env bash
         #SBATCH -p batch
         #SBATCH -J blastn
         #SBATCH -n 4

         # load the blast module
         module load blast/2.6.0+

         # run blast with 4 CPU threads (cores), matching the 4 requested above
         blastn -query ~/data/sequences/drosoph_14_sequences.seq -db nt -num_threads 4
     To run the script, type: sbatch [scriptname.sbatch]

  17. Best practice: overview
     ◦ Move to a computing node: interactive
     ◦ Make a directory in the scratch space and "go" there: mkdir -p /var/scratch/userX ; cd $_
     ◦ Create the script
     ◦ Run the script: sbatch [scriptname.sbatch]
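The best-practice steps on this slide can be collected into one function, run after typing interactive. This is a hypothetical helper, prepare_and_submit, not an ILRI-provided tool: it assumes the /var/scratch/<user> layout from the slide (overridable through a SCRATCH_ROOT variable introduced here only for illustration), writes a placeholder batch script with a here-document, and submits it only when sbatch is available.

```shell
#!/usr/bin/env bash
# Sketch of the best-practice workflow as one function.
# prepare_and_submit and SCRATCH_ROOT are hypothetical names for illustration.
prepare_and_submit() {
  local script_name="$1"
  local user="${USER:-$(id -un)}"
  local workdir="${SCRATCH_ROOT:-/var/scratch}/$user"

  # Create the personal scratch directory (no error if it exists) and go there.
  mkdir -p "$workdir" && cd "$workdir" || return 1

  # Write a placeholder batch script; the quoted 'EOF' keeps $(pwd) literal in the file.
  cat > "$script_name" <<'EOF'
#!/usr/bin/env bash
#SBATCH -p batch
#SBATCH -J example
#SBATCH -n 1
echo "running in $(pwd)"
EOF

  # Submit only if sbatch exists, i.e. we are on the cluster.
  if command -v sbatch >/dev/null 2>&1; then
    sbatch "$script_name"
  else
    echo "sbatch not found; script written to $workdir/$script_name" >&2
  fi
}
```

Because the cd happens inside a function called from your shell, you stay in the scratch directory afterwards, just as with cd $_ on the slide. In practice you would edit the placeholder script before submitting it.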

  18. Enjoy!
