Bioinformatics Outline What is bioinformatics? Who are - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Bioinformatics Outline

  • What is bioinformatics?

– Who are bioinformaticians?

  • Hardware
  • Software
slide-2
SLIDE 2

What is bioinformatics?

slide-3
SLIDE 3

What is bioinformatics?

  • Someone to analyze my data
  • Someone to help me think about my data
  • A person who writes complex algorithms
  • A person who knows what an HMM is
  • That bloke who fixes my computer
  • Someone who builds websites
  • People sitting in a dark room analyzing data
  • The boring stuff I do between experiments
  • perl, python, R, linux, java, C++, bash, ruby, HTML

slide-4
SLIDE 4

Who are bioinformaticians?

  • Scientists trying to get tenure, get grants, publish papers, train students
  • Scientists trying to help others analyze their data

slide-5
SLIDE 5

Who are bioinformaticians?

YOU!

slide-6
SLIDE 6

Hardware

slide-7
SLIDE 7

Torrent Server Recommended

  • Torrent Server

– Processors: two six-core processors
– RAM: 48 GB
– HDD capacity: eight 2 TB hard drives in RAID 5 with 12 TB usable
– Network: quad-port gigabit NIC
– GPU: NVIDIA Graphic Processor Unit
– Chassis: Dell Precision T7500 tower. No rack mount available.
– Monitor/keyboard: not included; file access available via SSH or web service

$12,500
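As a rough check on the storage figure above, here is a minimal sketch of the usual usable-capacity rule for parity RAID: RAID 5 gives up one drive's worth of space to parity, RAID 6 two. The function name `raid_usable_tb` and the `hot_spares` parameter are hypothetical, introduced only for illustration. The rule gives 14 TB for eight 2 TB drives in RAID 5. Setting one drive aside as a hot spare would give the 12 TB quoted on the slide, but the slide does not say how its figure was reached, so that is only an assumption.

```python
def raid_usable_tb(drives, drive_tb, level, hot_spares=0):
    """Nominal usable capacity in TB for RAID 5 or RAID 6.

    Ignores filesystem overhead and TB/TiB differences.
    """
    parity_drives = {5: 1, 6: 2}
    min_drives = {5: 3, 6: 4}
    if level not in parity_drives:
        raise ValueError("only RAID 5 and RAID 6 are handled in this sketch")
    active = drives - hot_spares  # drives that actually take part in the array
    if active < min_drives[level]:
        raise ValueError("not enough active drives for this RAID level")
    return (active - parity_drives[level]) * drive_tb


# Eight 2 TB drives, as listed on the slide.
print(raid_usable_tb(8, 2, level=5))                # nominal RAID 5
print(raid_usable_tb(8, 2, level=5, hot_spares=1))  # assumption: one hot spare
```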

slide-8
SLIDE 8

Computers

  • My cluster

– 51 node cluster
– most nodes: 16 CPUs, 8 cores each, 132 GB RAM, 1 TB local storage (/usr/data), InfiniBand interconnects
– (6,528 cores; 6,732 GB RAM; 50 TB scratch storage)

  • 192 TB Lustre FS

– connected to most nodes via InfiniBand
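The aggregate figures in parentheses follow from the per-node numbers. The sketch below multiplies them out, assuming for simplicity that all 51 nodes are identical (the slide only says "most nodes"); the function and constant names are hypothetical.

```python
# Per-node figures as listed on the slide for "most nodes".
NODES = 51
CPUS_PER_NODE = 16
CORES_PER_CPU = 8
RAM_GB_PER_NODE = 132
LOCAL_TB_PER_NODE = 1


def cluster_totals(nodes, cpus, cores_per_cpu, ram_gb, local_tb):
    """Aggregate cores, RAM (GB) and local storage (TB), assuming identical nodes."""
    return {
        "cores": nodes * cpus * cores_per_cpu,
        "ram_gb": nodes * ram_gb,
        "local_tb": nodes * local_tb,
    }


totals = cluster_totals(
    NODES, CPUS_PER_NODE, CORES_PER_CPU, RAM_GB_PER_NODE, LOCAL_TB_PER_NODE
)
print(totals)  # {'cores': 6528, 'ram_gb': 6732, 'local_tb': 51}
```

The core and RAM totals match the slide exactly. The local storage comes out at 51 TB rather than the quoted 50 TB, which is consistent with not every node carrying the full 1 TB.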

slide-9
SLIDE 9

Computers

  • rambox

– 24 processors with 6 cores each
– 198 MB RAM

  • edwards.sdsu.edu

– lab web server
– 24 processors, 6 cores each
– 50M RAM
– 19TB RAID 6 storage
– 18TB USED

slide-10
SLIDE 10

Computers

  • File servers and backup servers

– 4 secret servers!
– 48TB backups and archival storage

slide-11
SLIDE 11

Software

slide-12
SLIDE 12

Software

  • Locally installed software
  • Remote (web) software
slide-13
SLIDE 13

Local Software

  • groopm
  • idba_ud
  • jellyfish
  • last
  • masurca
  • mauve
  • metabat
  • metagenemark
  • mira
  • MUMmer
  • bioperl
  • biopython
  • bowtie2
  • cdhit
  • crass
  • diamond
  • fastQC
  • focus
  • FOCUS
  • FragGeneScan
  • genemark
  • Muscle
  • PEAR
  • phylip
  • prinseq
  • qiime
  • qudaich
  • rapsearch
  • scaffold_builder
  • seed-servers
  • spades
  • tagcleaner
  • tRNAscan-SE
  • velvet
slide-14
SLIDE 14

Metagenomics Processing

  • Binning reads
  • Contamination removal
  • Contig clustering
  • Functional assignments
  • Gene prediction
  • Merge paired-end reads
  • Preprocessing
  • Taxonomic assignments

slide-15
SLIDE 15

Metagenomics

  • Quality control

– Prinseq
– Deconseq

  • Annotation

– FOCUS
– Real time metagenomics
– mg-rast
– Super FOCUS

  • Statistics

– STAMP

  • Population genomes

– crAss
– metabat
– ContigClustering
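To give a sense of what the quality control step does, here is a minimal sketch that keeps only reads whose mean Phred score meets a threshold. It is not how Prinseq or Deconseq are implemented. It assumes plain four-line FASTQ records with Phred+33 quality encoding, and the function names and the threshold of 20 are hypothetical choices made for illustration.

```python
import io


def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality_line) / len(quality_line)


def filter_fastq(handle, min_mean_quality=20.0):
    """Yield (header, sequence, quality) for reads passing the threshold.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        if quality and mean_phred(quality) >= min_mean_quality:
            yield header, sequence, quality


# Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"
kept = list(filter_fastq(io.StringIO(EXAMPLE)))
print([header for header, _, _ in kept])  # ['@read1']
```

For real data, the same generator can be fed an open file handle instead of the in-memory example.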

slide-16
SLIDE 16

Metagenomics Processing

  • Contig clustering

– AbundanceBin, CompostBin, concoct, crAss, tetra

  • Gene prediction

– FragGeneScan, GlimmerMG, MetaGeneAnnotator, MetaGeneMark, MetaGun, Orphelia, Prodigal

  • Preprocessing

– FASTQC, FastX Toolkit, fitGCP, NGS QC Toolkit, Non-pareil, Prinseq, QC-Chain, Streaming Trim

  • Taxonomic assignment

– CARMA, myTaxa, FOCUS, PhylopythiaS, KRAKEN, phymmbl, LMAT, RAIphy, MEGAN, TACOA, Metaplan, Taxy

  • Functional assignment

– CLAMS, Sequedex, DiScRIBinATE, SORT-ITEMS, genometa, SPANNER, GSMer, SPHINX, PPLACER, TaxSOM, RTMg, Treephyler