1 9"* - - PDF document


SLIDE 1

  • April 11, 2007

Rutgers !"!!# " $ # %&&' !&(!))!&'

  • *

" ##+ ,# ##"

*$-.$-/0

*12"

3"3

124 $-3 5$* 4

  • *6

56

  • *
  • "

7#2 ##""

  • ,#88#

812

SLIDE 2

  • 9"*

[Figure: Users per day, 1998-2005. Vertical axis: 100,000 to 600,000 users per day. Annotation: Christmas and New Year’s Day.]

  • (.com, .net, .org, .gov, .us) 40%
  • Japan 6%
  • Italy 4%
  • Canada 3%
  • Germany 3%
  • United Kingdom 3%
  • Netherlands 2%
  • Spain 2%
  • Brazil 2%
  • Sweden 1%
  • Switzerland 1%
  • Belgium 1%
  • Other 14%

  • 9":; :78<%&&'
  • 9":; 678<%&&'
  • 63

all[filter]

SLIDE 3

  • &) %) %&&'

&) %) %&&' = =) &' = =) &'

  • !"#!

$$

%#&'

()!*!("#!+!#&! "!$

  • ftp://ftp.ncbi.nih.gov/genbank/
  • ftp://genbank.sdsc.edu/pub
  • ftp://bio-mirror.net/biomirror/genbank

Release 158, February 2007

  • 87 x 10^6 records
  • 157 x 10^9 nucleotides
  • 263 Gb (non-WGS)
  • 1115 files

  • full release every two months
  • incremental and cumulative updates daily
  • available only via ftp
  • release notes: gbrel.txt

[Figure: GenBank growth in bases (billions), August 1997 to August 2006, on a scale of 20 to 160. Release 157: non-WGS 69.0 billion bases, WGS 81.6 billion bases. Doubling time 12-14 months.]
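As a rough worked example of what a 12-14 month doubling time implies, the sketch below projects database size forward under the assumption that growth stays purely exponential. The starting figure of 150.6 billion bases is simply the sum of the two Release 157 numbers quoted above; the function names are illustrative, not part of any NCBI tool.

```python
import math


def projected_bases(current_bases: float, months_ahead: float,
                    doubling_months: float) -> float:
    """Project size after months_ahead, assuming a constant doubling time."""
    return current_bases * 2 ** (months_ahead / doubling_months)


def monthly_growth_rate(doubling_months: float) -> float:
    """Fractional growth per month implied by a given doubling time."""
    return 2 ** (1 / doubling_months) - 1


if __name__ == "__main__":
    # Assumption: Release 157 total = non-WGS + WGS, in billions of bases.
    release_157_total = 69.0 + 81.6
    for doubling in (12, 14):
        in_two_years = projected_bases(release_157_total, 24, doubling)
        rate = monthly_growth_rate(doubling)
        print(f"doubling every {doubling} months: "
              f"{in_two_years:.0f} billion bases after 24 months, "
              f"{rate:.1%} growth per month")
```

The projection is only as good as the constant-doubling assumption; it is meant to show the arithmetic, not to forecast actual release sizes.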

SLIDE 4

  • , -
  • "

#

6 #"#+."0#/

7

7"./ ".1$*8$$8$*$/ ./

*""

77"<.77</ 15".15/7"

  • 7#

“Organismal” (Traditional)

PRI (28) Primate
ROD (15) Rodent
PLN (20) Plant and Fungal
BCT (18) Bacterial/Archaeal
INV (7) Invertebrate
VRT (7) Other Vertebrate
VRL (4) Viral
MAM (2) Mammalian
PHG (1) Phage
SYN (1) Synthetic
ENV (4) Environmental samples
UNA (1) Unannotated

  • Organized by taxonomy (sort of)
  • Direct submissions (Sequin/BankIt)
  • Accurate (~1 error per 10,000 bp)
  • Well characterized

“Functional” (Bulk)

EST (570) Expressed Sequence Tag
GSS (197) Genome Survey Sequence
HTG (88) High Throughput Genomic
PAT (27) Patent
STS (9) Sequence Tagged Site
CON (1) Contigs, virtual

  • Organized by sequence type
  • Batch submissions (ftp/email)
  • Less accurate
  • Poorly characterized
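The split between traditional and bulk divisions can be captured in a small lookup table. The sketch below is one possible encoding of the three-letter division codes listed above; the dictionary and function names are hypothetical, and the group labels "traditional" and "bulk" are taken from the slide headings.

```python
# Division code -> (group, description), as listed on the slide.
GENBANK_DIVISIONS = {
    "PRI": ("traditional", "Primate"),
    "ROD": ("traditional", "Rodent"),
    "PLN": ("traditional", "Plant and Fungal"),
    "BCT": ("traditional", "Bacterial/Archaeal"),
    "INV": ("traditional", "Invertebrate"),
    "VRT": ("traditional", "Other Vertebrate"),
    "VRL": ("traditional", "Viral"),
    "MAM": ("traditional", "Mammalian"),
    "PHG": ("traditional", "Phage"),
    "SYN": ("traditional", "Synthetic"),
    "ENV": ("traditional", "Environmental samples"),
    "UNA": ("traditional", "Unannotated"),
    "EST": ("bulk", "Expressed Sequence Tag"),
    "GSS": ("bulk", "Genome Survey Sequence"),
    "HTG": ("bulk", "High Throughput Genomic"),
    "PAT": ("bulk", "Patent"),
    "STS": ("bulk", "Sequence Tagged Site"),
    "CON": ("bulk", "Contigs, virtual"),
}


def division_group(code: str) -> str:
    """Return 'traditional' or 'bulk' for a division code (case-insensitive)."""
    group, _description = GENBANK_DIVISIONS[code.upper()]
    return group


def divisions_in_group(group: str) -> list[str]:
    """List the division codes that belong to one group, in slide order."""
    return [code for code, (g, _d) in GENBANK_DIVISIONS.items() if g == group]


if __name__ == "__main__":
    print(division_group("est"))
    print(divisions_in_group("bulk"))
```

A lookup like this is handy when deciding how much to trust a record: entries from bulk divisions are single-pass or batch data and deserve more caution than entries from the traditional divisions.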
  • ./7#
  • 14$-*

14$-* 14$-* 14$-*

)7

  • $#$-

$#$- $#$- $#$-

)7

  • 6*

6* 6* 6*

  • $-*$

$-*$ $-*$ $-*$

,3"

9$ 9$ 9$ 9$

  • 1$*7#

1$*7# 1$*7# 1$*7#1 1 1 14 4 4 4$ $ $ $-

  • *

* * *

  • RNA gene products
  • 80,000-100,000 unique cDNA clones in library
  • isolate unique clones
  • sequence once from each end

make cDNA library

5’ 3’

>IMAGE:275615 3', mRNA sequence
NNTCAAGTTTTATGATTTATTTAACTTGTGGAACAAAAATAAACCAGATTAACCACAACCATGCCTTA
TTATCAAATGTATAAGANGTAAATATGAATCTTATATGACAAAATGTTTCATTCATTATAACAAATTT
AATAATCCTGTCAATNATATTTCTAAATTTTCCCCCAAATTCTAAGCAGAGTATGTAAATTGGAAGTT
CTTATGCACGCTTAACTATCTTAACAAGCTTTGAGTGCAAGAGATTGANGAGTTCAAATCTGACCAAG
GTTGATGTTGGATAAGAGAATTCTCTGCTCCCCACCTCTANGTTGCCAGCCCTC
>IMAGE:275615 5' mRNA sequence
GACAGCATTCGGGCCGAGATGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTCTCTTTCTGG
TGGAGGTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATCCAGCAGAGAATGGAAAGTCAA
TTCCTGAATTGCTATGTGTCTGGGTTTCATCCATCCGACATTGAAGTTGACTTACTGAAGAATGGAGA
GAATTGAAAAAGTGGAGCATTCAGACTTGTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTAC
TGAATTCACCCCCACTGAAAAAGATGAGTATGCCTGCCGTGTTGAACCATGTNGACTTTGTCACAGNC
AAGTTNAGTTTAAGTGGGNATCGAGACATGTAAGGCAGGCATCATGGGAGGTTTTGAAGNATGCCGCN
TTGGATTGGGATGAATTCCAAATTTCTGGTTTGCTTGNTTTTTTAATATTGGATATGCTTTTG
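Because each EST is sequenced only once from each end, uncalled bases (N) are common in records like the two above. The following sketch shows one way to read FASTA text and measure the fraction of N calls per read. The function names and the toy records are made up for illustration; they are not the IMAGE clone data.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: dict[str, list[str]] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def ambiguous_fraction(sequence: str) -> float:
    """Fraction of positions called as N (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


if __name__ == "__main__":
    # Toy records (hypothetical), one read from each end of a single clone.
    toy = """>clone1 3'
NNTCAAGT
TTTA
>clone1 5'
GACAGNAT
"""
    for name, seq in parse_fasta(toy).items():
        print(name, len(seq), f"{ambiguous_fraction(seq):.3f}")
```

Running the same functions over the two reads shown above would give a quick, if crude, indication of single-pass read quality at each end of the clone.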

SLIDE 5

  • $-1$*
  • .
  • .
  • $$86*89$
  • Whole BAC insert (or genome)
  • $$#

#

7-.6*#/

"

+ ".+0/

  • /.

> > -.,=/# # > > -.,=/# #

  • ,?@7"#?,@

0$01

SLIDE 6

  • 9$,0

ABC0

.=%&/ 1#-.)(/ .B/ 1.)(&/8

8887.%/8286 , .%/ 6"88.=/8$+ .%/ D.B/8 .%/ .%/

  • 9$.9$/,0

wgs master[properties]

ftp://ftp.ncbi.nih.gov/genbank/wgs/
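An Entrez query such as wgs master[properties] can also be sent programmatically. The sketch below only builds an E-utilities ESearch URL and does not contact the server. The database name "nucleotide" and the retmax value are assumptions chosen for illustration, since the slide does not say which database the query targets.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL; the query term is percent-encoded for the URL."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumption: the query is run against the nucleotide database.
    print(esearch_url("nucleotide", "wgs master[properties]"))
```

The same helper works for other field-qualified terms, for example all[filter], because the square brackets and spaces are encoded by urlencode.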

  • *

" ##+ ,# ##"

*$-.$-/0

*12"

3"3

124 $-3 5$* 4

  • 7## 7"
  • !!""#

$%

"&'$( %)%*

  • +, & +$'
  • ,'. . + .$

ATT GA ATT C GA C GA C C C ATT TA A C T

" %)',

$-

SLIDE 7

  • 9$-E

12- ?@7?@

  • ?@7?@7?@

=F%'" ()C&" =F%'" %=%=" %A)" ?@7?@7?@

  • > 3G"#

>- " >8"

SLIDE 8

  • %#0234567'
  • %8#02345678#,0'
  • %#90234567:;<'
  • %#90234567:;<'

%=90234567:;<' %=0234567:;<'

  • $-$-
  • $-*

  • NM_123456789
  • NP_123456789
  • NR_123456
  • XM_123456
  • XP_123456
  • XR_123456
  • ZP_12345678
  • NC_123456
  • NG_123455
  • NT_123456
  • NW_123456
  • NZ_ABCD12345678

[Figure: GenBank sequences and RefSeq, with a "Scanning...." label. Genomic DNA (NC, NT, NW); Model mRNA (XM) and (XR); Curated mRNA (NM) and (NR); Model protein (XP); Curated Protein (NP).]
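The two-letter prefix of a RefSeq accession indicates the kind of record, so a small parser can sort accessions into the categories shown on this slide. This is a minimal sketch: the function name and return format are hypothetical, the NR and XR descriptions follow general RefSeq usage rather than wording on the slide, and prefixes the figure does not classify (such as NG, ZP, and NZ) are deliberately left unlabeled.

```python
import re

# Prefix -> category, following the slide's figure.
# Assumption: NR/XR are described as non-coding RNA per general RefSeq usage.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA",
    "NT": "genomic DNA",
    "NW": "genomic DNA",
    "NM": "curated mRNA",
    "NR": "curated non-coding RNA",
    "XM": "model mRNA",
    "XR": "model non-coding RNA",
    "NP": "curated protein",
    "XP": "model protein",
}

# Two letters, underscore, optional four-letter project code, digits,
# and an optional ".version" suffix.
ACCESSION_RE = re.compile(r"^([A-Z]{2})_([A-Z]{4})?(\d+)(?:\.(\d+))?$")


def classify_refseq(accession: str):
    """Return (prefix, category) or None if the text is not RefSeq-shaped.

    category is None for well-formed prefixes not covered by the table.
    """
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    prefix = match.group(1)
    return prefix, REFSEQ_PREFIXES.get(prefix)


if __name__ == "__main__":
    for acc in ("NM_123456789", "XP_123456", "NZ_ABCD12345678", "U12345"):
        print(acc, classify_refseq(acc))
```

Keeping the table separate from the pattern makes it easy to add further prefixes once their meaning has been checked against the RefSeq documentation.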

  • H

# 7

  • 5

#) #% #=

SLIDE 9

  • *

" ##+ ,# ##"

*$-.$-/0

*12"

3"3

124 $-3 5$* 4

  • 2
  • GENSAT

*12$

Entrez

Nucleotide, PubMed, Protein, Structure, Domains, 3D Domains, Taxonomy, Journals, PMC, OMIM, Books, PopSet, SNP, UniGene, UniSTS, Genome, Gene, GEO, MeSH, CancerChromosomes, Homologene, PubChem, Probe

  • 127"

7"2

".*47"/

.*47"/ .*47"/ .*47"/!

1:7!

J-K

14"!

?@8?@8?@8

1#7$!

.7$/

1

":7 "!

SLIDE 10

  • 125

Links

  • L+"$,
  • 127"

:

1$*8

"$,

$, M

77

#77"

.N ON/ .,8$*87/

  • Gene-oriented clusters of expressed sequences

>

> 1- > " > > :#

  • :
SLIDE 11

  • 1$*

[Figure: mRNA query with 5’ EST hits and 3’ EST hits]

  • :
  • :

$ : 1

  • : 6")F(.+P%&)/
SLIDE 12

  • : 6!ACAFB&

583#.5,1/

  • : 6!ACAFB&
  • : 6!FC=C)4
  • : 6!FC=C)4
SLIDE 13

  • : 6!FC=C)-
  • $-

web page

!"!!# : 6H

  • 127"

:

1$*8

"$,

$, M

77

#77"

.N ON/ .,8$*87/

  • $,7"

,##.$,/

$ 3

N#=&$,.QQQQQQQ/

SLIDE 14

  • $"$,
  • $,

$"$,

  • $,

$"$,

  • $,
SLIDE 15

  • 127"

:

1$*8

"$,

$, M

77

#77"

.N ON/ .,8$*87/

  • #77"

,3.,$$/ $$*8,8N8ON8

.3 /

  • 77
  • ".

R

SLIDE 16

  • 77

ST(CC(F()BT"T$A'A=(!)T*,'?$ 4@ LDU,65*L11OOUOL,OOU,OD5O5715O,LO$$1$UU$,$ $*7$OL*5*76$L$1$5$*56DL$$LL$5UO$OD$L*,15 OO1$,UDL$*$1L1$*$$,$$$$UO,5LL$U,5*UL*L*$ LU$1L$OOLO$UL$5*L1D7,55*$,151

">

  • 77

"- 7 "- 7

  • 77

">*

  • 77

Pfam COG CD

?(

SLIDE 17

  • 77

*

  • 77

7*#7#*

  • $+$

77

SLIDE 18

  • $; =7
  • *

" ##+ ,# ##"

*$-.$-/0

$12"

3"3

124 $ $-3 5$* 4

  • 55
  • NNO$7"
SLIDE 19

  • @"/
  • NNO$7"
  • NNO$7"
  • M
SLIDE 20