Environmental Genomics Thematic Programme Data Centre
http://envgen.nox.ac.uk
EGTDC Database Course 2004
Biological Databases Online (practical)
Tim Booth : tbooth@ceh.ac.uk
Introduction To EnsEMBL
EnsEMBL is a project to develop a software
...you should now be logged into EnsEMBL...
Sequence and assembly tables (column name, then type):
assembly: asm_seq_region_id (int), cmp_seq_region_id (int), asm_start (int), asm_end (int), cmp_start (int), cmp_end (int), one further int column (name not shown)
seq_region: seq_region_id (int), name (varchar), length (int), coord_system_id (int)
seq_region_attrib: seq_region_id (int), attrib_type_id (int), value (varchar)
attrib_type: attrib_type_id (int), code (varchar), name (varchar), description (text)
dna: seq_region_id (int), sequence (mediumtext)
dnac: seq_region_id (int), n_line (text), sequence (mediumblob)
coord_system: coord_system_id (int), name (varchar), rank (int), version (varchar), attrib ("default_version", "sequence_level")
[Schema diagram: the tables are linked by relationships with cardinalities 1, 0..1, 0..n and 1..n.]
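As a rough illustration of how the assembly table ties seq_region rows together, here is a minimal sketch using Python's built-in sqlite3 module rather than the course's MySQL server. It assumes that each assembly row maps a slice of a component region (cmp_*) onto a slice of an assembled region (asm_*), which is a reading of the column names rather than something the slide states. The rows are invented toy data.

```python
import sqlite3

# In-memory toy database; only the columns needed for the join are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seq_region (
        seq_region_id INTEGER PRIMARY KEY,
        name TEXT, length INTEGER, coord_system_id INTEGER);
    CREATE TABLE assembly (
        asm_seq_region_id INTEGER, cmp_seq_region_id INTEGER,
        asm_start INTEGER, asm_end INTEGER,
        cmp_start INTEGER, cmp_end INTEGER);

    -- Toy rows (hypothetical): one chromosome assembled from two contigs.
    INSERT INTO seq_region VALUES (10, '1', 12, 1);
    INSERT INTO seq_region VALUES (20, 'contig_a', 8, 2);
    INSERT INTO seq_region VALUES (21, 'contig_b', 4, 2);
    INSERT INTO assembly VALUES (10, 20, 1, 8, 1, 8);
    INSERT INTO assembly VALUES (10, 21, 9, 12, 1, 4);
""")

# seq_region is joined twice: once for the assembled region, once for the component.
assembly_rows = conn.execute("""
    SELECT asm_sr.name, a.asm_start, a.asm_end,
           cmp_sr.name, a.cmp_start, a.cmp_end
    FROM assembly AS a
    INNER JOIN seq_region AS asm_sr ON a.asm_seq_region_id = asm_sr.seq_region_id
    INNER JOIN seq_region AS cmp_sr ON a.cmp_seq_region_id = cmp_sr.seq_region_id
    ORDER BY a.asm_start
""").fetchall()

for row in assembly_rows:
    print(row)
```

Each printed row reads as "this stretch of the assembled region comes from this stretch of that component".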
Gene, transcript and exon tables (column name, then type):
exon: exon_id (int), seq_region_id (int), seq_region_start (int), seq_region_end (int), seq_region_strand (tinyint), phase (tinyint), end_phase (tinyint)
exon_stable_id: exon_id (int), version (int), stable_id (varchar)
exon_transcript: exon_id (int), rank (int), transcript_id (int)
gene: gene_id (int), seq_region_id (int), seq_region_start (int), seq_region_end (int), seq_region_strand (tinyint), display_xref_id (int), analysis_id (int), type (varchar)
gene_stable_id: gene_id (int), version (int), stable_id (varchar)
transcript: transcript_id (int), gene_id (int), display_xref_id (int), seq_region_id (int), seq_region_start (int), seq_region_end (int), seq_region_strand (tinyint)
transcript_stable_id: transcript_id (int), version (int), stable_id (varchar)
translation: translation_id (int), transcript_id (int), seq_start (int), start_exon_id (int), seq_end (int), end_exon_id (int)
translation_stable_id: translation_id (int), version (int), stable_id (varchar)
gene_description: gene_id (int), description (text)
[Schema diagram: the tables are linked by relationships with cardinalities 1, 0..1, 0..n and 1..n.]
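The tables above chain together as gene, transcript, exon_transcript, exon. The following is a minimal sketch of walking that chain, again using Python's sqlite3 module with invented toy rows. It assumes that exon_transcript.rank gives the order of exons within a transcript, and it creates only the columns the join needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, seq_region_id INTEGER);
    CREATE TABLE gene_stable_id (gene_id INTEGER, stable_id TEXT, version INTEGER);
    CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
    CREATE TABLE exon (exon_id INTEGER PRIMARY KEY,
                       seq_region_start INTEGER, seq_region_end INTEGER);
    CREATE TABLE exon_transcript (exon_id INTEGER, rank INTEGER, transcript_id INTEGER);

    -- Toy rows (hypothetical identifiers and coordinates).
    INSERT INTO gene VALUES (1, 10);
    INSERT INTO gene_stable_id VALUES (1, 'TOYG0001', 1);
    INSERT INTO transcript VALUES (5, 1);
    INSERT INTO transcript VALUES (6, 1);
    INSERT INTO exon VALUES (50, 100, 200);
    INSERT INTO exon VALUES (51, 400, 500);
    INSERT INTO exon VALUES (52, 800, 900);
    -- Transcript 5 uses all three exons; transcript 6 skips the middle one.
    INSERT INTO exon_transcript VALUES (50, 1, 5);
    INSERT INTO exon_transcript VALUES (51, 2, 5);
    INSERT INTO exon_transcript VALUES (52, 3, 5);
    INSERT INTO exon_transcript VALUES (50, 1, 6);
    INSERT INTO exon_transcript VALUES (52, 2, 6);
""")

# List every exon of every transcript of each gene, in transcript order.
exon_rows = conn.execute("""
    SELECT gsi.stable_id, t.transcript_id, et.rank,
           e.seq_region_start, e.seq_region_end
    FROM gene AS g
    INNER JOIN gene_stable_id AS gsi ON gsi.gene_id = g.gene_id
    INNER JOIN transcript AS t ON t.gene_id = g.gene_id
    INNER JOIN exon_transcript AS et ON et.transcript_id = t.transcript_id
    INNER JOIN exon AS e ON e.exon_id = et.exon_id
    ORDER BY t.transcript_id, et.rank
""").fetchall()

for row in exon_rows:
    print(row)
```

Because exon_transcript is a link table, the same exon row (exon 50 here) can appear in more than one transcript without being duplicated in the exon table.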
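External reference tables (column name, then type):
object_xref: ensembl_id (int), ensembl_object_type ("Translation", "Gene", "Transcript"), xref_id (int), one further int column (name not shown)
identity_xref: query_identity (int), target_identity (int), hit_start (int), hit_end (int), translation_start (int), translation_end (int), cigar_line (text), score (double), evalue (double), analysis_id (int), one further int column (name not shown)
xref: xref_id (int), external_db_id (int), dbprimary_acc (varchar), display_label (varchar), version (varchar), description (varchar)
external_db: external_db_id (int), dbname (varchar), release (varchar), status ("KNOWN", "PRED", "ORTH", …)
go_xref: linkage_type ("IC", "IDA", "IEA", "IEP", "IGI", "IMP", "IPI", …), one further int column (name not shown)
external_synonym: xref_id (int), synonym (varchar)
[Schema diagram: the tables are linked by relationships with cardinalities 1, 0..1, 0..n and 1..n.]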
select gene.seq_region_id,
       gene_stable_id.stable_id,
       transcript_stable_id.stable_id,
       xref.display_label
from external_db
  inner join xref on external_db.external_db_id = xref.external_db_id
  inner join object_xref on xref.xref_id = object_xref.xref_id
  inner join translation on object_xref.ensembl_id = translation.translation_id
  inner join transcript on translation.transcript_id = transcript.transcript_id
  inner join gene on gene.gene_id = transcript.gene_id
  inner join seq_region on gene.seq_region_id = seq_region.seq_region_id
  inner join coord_system on seq_region.coord_system_id = coord_system.coord_system_id
  left outer join gene_stable_id on gene_stable_id.gene_id = gene.gene_id
  left outer join transcript_stable_id on transcript.transcript_id = transcript_stable_id.transcript_id
where coord_system.name = 'chromosome'
  and seq_region.name = '1'
  and gene.seq_region_end < 30000000
  and external_db.db_name = 'SWISSPROT';
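Since object_xref.ensembl_id can point at a gene, a transcript or a translation, a join on translation_id alone could in principle pick up xrefs attached to other object types that happen to share the same numeric id. The sketch below runs a cut-down version of the query against invented toy rows in sqlite3 to show the effect of an extra filter on ensembl_object_type. The filter is a suggestion, not part of the course query, and whether it changes anything depends on the data actually loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE external_db (external_db_id INTEGER PRIMARY KEY, db_name TEXT);
    CREATE TABLE xref (xref_id INTEGER PRIMARY KEY,
                       external_db_id INTEGER, display_label TEXT);
    CREATE TABLE object_xref (ensembl_id INTEGER,
                              ensembl_object_type TEXT, xref_id INTEGER);
    CREATE TABLE translation (translation_id INTEGER PRIMARY KEY, transcript_id INTEGER);
    CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
    CREATE TABLE gene_stable_id (gene_id INTEGER, stable_id TEXT);

    -- Toy rows (hypothetical). Two xrefs share ensembl_id 7 but differ in type.
    INSERT INTO external_db VALUES (1, 'SWISSPROT');
    INSERT INTO xref VALUES (100, 1, 'TOY_PROT');
    INSERT INTO xref VALUES (101, 1, 'TOY_OTHER');
    INSERT INTO translation VALUES (7, 3);
    INSERT INTO transcript VALUES (3, 2);
    INSERT INTO gene_stable_id VALUES (2, 'TOYG0002');
    INSERT INTO object_xref VALUES (7, 'Translation', 100);
    INSERT INTO object_xref VALUES (7, 'Gene', 101);
""")

# Cut-down form of the course query: same join path, fewer tables.
BASE_QUERY = """
    SELECT gene_stable_id.stable_id, xref.display_label
    FROM external_db
    INNER JOIN xref ON external_db.external_db_id = xref.external_db_id
    INNER JOIN object_xref ON xref.xref_id = object_xref.xref_id
    INNER JOIN translation ON object_xref.ensembl_id = translation.translation_id
    INNER JOIN transcript ON translation.transcript_id = transcript.transcript_id
    LEFT OUTER JOIN gene_stable_id ON gene_stable_id.gene_id = transcript.gene_id
    WHERE external_db.db_name = 'SWISSPROT'
"""

unfiltered = conn.execute(BASE_QUERY + " ORDER BY xref.xref_id").fetchall()
filtered = conn.execute(
    BASE_QUERY
    + " AND object_xref.ensembl_object_type = 'Translation' ORDER BY xref.xref_id"
).fetchall()

print("without type filter:", unfiltered)
print("with type filter:   ", filtered)
```

In the toy data the gene-level xref is wrongly attached to translation 7 unless the type filter is present.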
– show tables
– select * from baculo_genomes
– describe repeats
– select repeat_name, motif, genome from repeats where repeat_type = 'penta'
– select organism, genome, count(*) as count from repeats where repeat_type = 'penta' group by organism, genome
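A per-genome count such as the last query is normally written with a GROUP BY clause, so that count(*) is evaluated once per organism and genome rather than once for the whole table. Here is a minimal sketch in sqlite3. The repeats table layout is assumed from the column names used in the queries above, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: only the columns named in the practical's queries.
    CREATE TABLE repeats (repeat_name TEXT, motif TEXT, organism TEXT,
                          genome TEXT, repeat_type TEXT);

    -- Toy rows (hypothetical organisms and motifs).
    INSERT INTO repeats VALUES ('r1', 'ATATA', 'Toy virus A', 'genome_a', 'penta');
    INSERT INTO repeats VALUES ('r2', 'GCGCG', 'Toy virus A', 'genome_a', 'penta');
    INSERT INTO repeats VALUES ('r3', 'AT',    'Toy virus A', 'genome_a', 'di');
    INSERT INTO repeats VALUES ('r4', 'TTTAA', 'Toy virus B', 'genome_b', 'penta');
""")

# One output row per (organism, genome) pair, counting only 'penta' repeats.
counts = conn.execute("""
    SELECT organism, genome, COUNT(*) AS penta_count
    FROM repeats
    WHERE repeat_type = 'penta'
    GROUP BY organism, genome
    ORDER BY genome
""").fetchall()

for row in counts:
    print(row)
```

The WHERE clause filters rows before grouping, so the 'di' repeat in genome_a does not contribute to its count.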
http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/EnsemblCore.html