SLIDE 1 Analysis of DNA Sequence Data
Using Freeware Programs: Sequencher 4.9 and Sequence Scanner 1.0
Daniel Williams willida@shelterisland.k12.ny.us
SLIDE 2 It's possible that you have received your sequence data directly from the sequencing lab as a WordPad file in FastA format. FastA is a standard sequence format that allows gene analysis programs to recognize the gene name and the gene sequence.
Everything that follows the ‘>’ on the first line is treated as the gene or sequence name. The lines after it are read as the DNA sequence.
You may also wish to receive the trace files for your own analysis.
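The FastA layout described above can be sketched in a few lines of Python. This is only an illustration of the format, not part of Sequencher or Sequence Scanner; the function name `parse_fasta` and the example bases are made up for the demonstration.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FastA-formatted text."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()  # everything after '>' is the name
            chunks = []
        elif name is not None:
            # Sequence lines may be wrapped; join them and normalize the case.
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Hypothetical example record (the bases are invented for illustration).
example = """>Dragonfly-Sample-12S_Ai_01
ACGTTGCA
ggcatta
"""
records = parse_fasta(example)
print(records)
```

The sequence may be split over several lines in the file; the sketch joins them back into a single string.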
SLIDE 3
Cold Spring Harbor
If you send your DNA to CSHL to be sequenced, the results are published on their server.
SLIDE 4 Cold Spring Harbor
Your sequence data is published in their database. You can either highlight the text of the sequence, copy it into Word or WordPad, and reformat it into FastA, or download a copy of the sequence trace file for your own analysis.
Use the dropdown menu to select different samples.
Right-click and choose “Save As” to save a copy of your trace file.
SLIDE 5
Sequencher 4.9 Demo
http://www.genecodes.com/
You can download an unlimited freeware demo version that has all the functionality of the software used in the lab, except that you cannot save or print.
SLIDE 6
Open Sequencher
Open the Sequencher 4.9 Demo software.
A dialog box will open alerting you that this is a “Demo Version”. Press “OK”.
You will now have an empty project window.
SLIDE 7
Import Sequence Data
From the File menu choose “Import”, and from the list choose “Folder of Sequences”. Select the folder that contains your DNA data. A dialog box will open prompting you to import ## files. Select the “Import All Files in Folder” command button.
SLIDE 8 First Check Your Data
The imported DNA sequences are listed by name. Important Fields:
At Brookhaven, Mike Blewitt names all sequences so that they can be easily combined and examined.
Indicates the number of bases in your sequence.
IMPORTANT FIELD indicates how reliable your data is. You want to have sequences as close to 90% as possible.
If you have received analyzed sequence files, comments are often inserted to highlight irregularities.
SLIDE 9
Data Clean Up…
Delete any sequences that have extremely low quality numbers (e.g. 5%) or whose sequenced fragment is too small for analysis.
Sort data by clicking on a column heading.
Select sequences with a left click (hold the Shift key to select multiple sequences).
To delete sequences, right-click to obtain the drop-down menu and choose “Remove From Project”.
Sequencher will ask whether you are sure you wish to delete. Select “Throw Them Away”.
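The clean-up rule above (drop reads with very low quality or too few bases) can be written as a small filter. This is a sketch under stated assumptions: the cut-off values, the field names, and the example reads are all hypothetical, and the slides do not prescribe exact thresholds.

```python
# Hypothetical cut-offs; choose values that suit your own data.
MIN_QUALITY_PERCENT = 50
MIN_LENGTH_BASES = 100


def keep_read(read):
    """Return True if a read passes both the quality and the length cut-off."""
    return (read["quality"] >= MIN_QUALITY_PERCENT
            and read["length"] >= MIN_LENGTH_BASES)


# Invented example reads: name, number of bases, quality percentage.
reads = [
    {"name": "Sample_A_01", "length": 620, "quality": 88},
    {"name": "Sample_A_02", "length": 540, "quality": 5},   # very low quality
    {"name": "Sample_A_03", "length": 40, "quality": 91},   # too short
]

# Sort by quality (like clicking the column heading), then drop poor reads.
kept = sorted((r for r in reads if keep_read(r)),
              key=lambda r: r["quality"], reverse=True)
kept_names = [r["name"] for r in kept]
print(kept_names)
```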
SLIDE 10 What Data Should Look Like.
A chromatogram is a graphical representation of the results of the sequence reaction. You should see evenly spaced peaks, each with only one color. Peak heights may vary 3-fold, which is normal.
"Noise" (baseline) peaks may be present, but with good template and primer they will be quite minimal.
SLIDE 11
Trim Sequences
Usually the sequence reaction does not work well (poor quality) at either extreme end of your DNA sequence. Therefore you should “Trim” the ends to exclude them from your analysis.
SLIDE 12 Trimming and Aligning Raw DNA
1. Select all of the imported sequences. From the Select menu choose “Select All”.
2. Select the “Sequence” menu. From the menu list choose “Trim”.
3. A window will open showing all of your sequences, with a blue line indicating “good bases” and red lines indicating “poor bases”.
4. Press the “Trim Checked Items” box on the icon bar above if you agree with the trimming. If not, you must change the trimming parameters.
5. The window will show the trimmed sequences in blue.
6. After reviewing the sequences, you can close the window by pressing the ‘X’ in the upper right corner.
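To make the idea of end trimming concrete, here is a simplified sketch. It is not Sequencher's trimming algorithm or its parameters: it assumes each base has a numeric quality score, and it removes bases from each end until it finds a short window in which every base meets a threshold. The function name, threshold, and window size are hypothetical.

```python
def trim_ends(seq, quals, min_q=20, window=5):
    """Trim low-quality ends; return the kept part of seq ("" if none).

    seq    -- string of bases
    quals  -- list of per-base quality scores, same length as seq
    min_q  -- assumed minimum acceptable score (hypothetical value)
    window -- number of consecutive good bases that ends the trimming
    """
    n = len(seq)
    start = 0
    # Slide in from the left until a full window of good bases is found.
    while start + window <= n and min(quals[start:start + window]) < min_q:
        start += 1
    if start + window > n:
        return ""  # no window of good bases anywhere
    end = n
    # Slide in from the right in the same way.
    while end - window >= start and min(quals[end - window:end]) < min_q:
        end -= 1
    return seq[start:end]


# Invented example: two poor bases at each end of an otherwise good read.
seq = "NNACGTACGTNN"
quals = [5, 5, 30, 30, 30, 30, 30, 30, 30, 30, 5, 5]
trimmed = trim_ends(seq, quals, min_q=20, window=3)
print(trimmed)
```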
SLIDE 13
Establish a Reference Sequence
A reference sequence is like a tie breaker. When you are analyzing a chromatogram, a base sometimes appears ambiguous. A well-known reference sequence can help determine what the ambiguous base was supposed to be.
Select the sequence you wish to establish as a reference sequence. Right-click and choose “Reference Sequence” from the drop-down menu.
SLIDE 14
Assemble Sequences for Analysis
As long as you use a consistent naming scheme with your sequencing reactions, you should be able to use the Assemble by Name function to assemble your fragments.
SLIDE 15 Naming
An example of a naming scheme is “Dragonfly-Sample-12S_Ai_01”.
“Sample” indicates the organism that the DNA came from.
“12S_Ai” indicates the forward primer, 12sai.
“01” indicates the well number.
There should be a corresponding 12S_Bi_01 for the same DNA with a reverse primer. Therefore we want all the “_01” names to be combined, forward sequence and reverse.
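The pairing logic behind this naming scheme can be sketched as follows. This is an illustration only and not how Sequencher implements Assemble by Name: it assumes the well number is whatever follows the last underscore in the name, and the helper names are hypothetical.

```python
from collections import defaultdict


def well_of(name):
    """Return the well number, assumed to follow the last underscore."""
    return name.rsplit("_", 1)[1]


def group_by_well(names):
    """Map each well number to the list of sequence names that carry it."""
    groups = defaultdict(list)
    for name in names:
        groups[well_of(name)].append(name)
    return dict(groups)


names = [
    "Dragonfly-Sample-12S_Ai_01",  # forward primer, well 01
    "Dragonfly-Sample-12S_Bi_01",  # reverse primer, well 01
    "Dragonfly-Sample-12S_Ai_02",  # forward only: no partner to pair with
]
groups = group_by_well(names)
unpaired = [well for well, members in groups.items() if len(members) < 2]
print(groups)
print(unpaired)
```

Wells listed in `unpaired` are the ones to check by hand before assembling.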
SLIDE 16 Assemble By Name
1. Select all of your sequences: from the Select menu choose “Select All”.
2. Press the “ABN” (Assemble By Name) icon on the toolbar. The icon toolbar will change.
3. Press Auto Assemble By Name. A preview dialog box will appear. Scroll through the expected ‘contigs’ to ensure everything paired up correctly.
4. Press the Assemble button.
If your sequences are named correctly and they have pairs, they will be assembled into “contigs”.
SLIDE 17
View Contig Assembly
Double-click on the Contig[0001] icon to open the Contig Overview window.
To view restriction enzyme sites, select the View menu, choose Bases Map Overview, and select Restriction Map.
SLIDE 18
Contig Overview window
The Overview contains three sections. The top section displays a schematic of how the fragments are assembled in this contig. The arrows indicate the direction of the fragment in relation to the assembly.
SLIDE 19
Contig Overview window
The next section provides coverage information.
SLIDE 20
Contig Overview window
Below the coverage bar is the open reading frame map: three bars marked with green flags and red lines, representing start and stop codons respectively.
Press the Bases icon to edit or view the base sequence.
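The open reading frame map can be approximated with a short script that records where start and stop codons fall in each of the three forward reading frames. This is a sketch under assumptions: it treats ATG as the only start codon, uses the standard stop codons TAA, TAG, and TGA, looks at the forward strand only, and the example sequence is invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def codon_map(seq):
    """Return start and stop codon positions (0-based) for frames 1 to 3."""
    seq = seq.upper()
    frames = {}
    for frame in range(3):
        starts, stops = [], []
        # Step through the sequence one codon (three bases) at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START_CODON:
                starts.append(i)
            elif codon in STOP_CODONS:
                stops.append(i)
        frames[frame + 1] = {"starts": starts, "stops": stops}
    return frames


# Invented example sequence.
orf_map = codon_map("ATGAAATAGCATGCTGA")
for frame, marks in orf_map.items():
    print(frame, marks)
```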
SLIDE 21
Edit Assembled Chromatograms
The Contig Editor provides the tools for checking and editing sequences. It is divided into four quadrants. To begin the editing process, move your selection to base one in the consensus.
SLIDE 22
Edit Chromatograms
Press the Show Chromatogram icon. Chromatograms will appear. If you see only one chromatogram, that is because you have not selected a region of the contig that is covered by the other chromatograms. To examine a different base YOU MUST SELECT A BASE IN THE CONSENSUS SEQUENCE.
SLIDE 23
Errors in Base Calling
Mis-spaced peaks:
One good way to detect artifacts or errors in a sequencing chromatogram is to scan through it, looking for mis-spaced peaks. At the same time, watch for mis-spaced letters in the text sequence along the top. Sometimes, however, those spaces are misinterpreted as missing nucleotides and an ‘N’ is inserted.
SLIDE 24
Heterozygous (double) peaks
A single peak position within a trace may have two peaks of different colors instead of just one. A rule of thumb: if the second peak is 35% or less of the height of the major peak, you can call the base for the major peak.
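The 35% rule of thumb can be expressed as a small function. This is a sketch, with two assumptions beyond the slide: peak size is measured as peak height, and when the second peak is above the cut-off the position is reported with the standard IUPAC two-base ambiguity code (the slide does not say what to do in that case). The function name and the peak heights are hypothetical.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
MINOR_PEAK_CUTOFF = 0.35  # the 35% rule of thumb


def call_base(peaks):
    """Call a base from a dict of peak heights, e.g. {"A": 1000, "G": 300}."""
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    major_base, major_height = ranked[0]
    if len(ranked) == 1:
        return major_base  # a clean single peak
    minor_base, minor_height = ranked[1]
    if minor_height <= MINOR_PEAK_CUTOFF * major_height:
        return major_base  # second peak is small enough to ignore
    # Otherwise report the position as ambiguous (an assumption, see above).
    return IUPAC[frozenset((major_base, minor_base))]


print(call_base({"A": 1000, "G": 300}))  # second peak is 30% of the major peak
print(call_base({"A": 1000, "G": 600}))  # second peak is 60% of the major peak
```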
SLIDE 25 https://www2.appliedbiosystems.com/support/software_community/free_ab_s
Sequence Scanner
You can SAVE and PRINT
SLIDE 26 To Copy Gene Sequence
Go to the Trace menu.
Select Copy Basecalls.
Paste into WordPad.
Edit to FastA.
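The "Edit to FastA" step amounts to putting a ‘>’ name line above the pasted basecalls. A minimal sketch of that edit follows; the function name, the 60-base line width, and the example name and bases are assumptions made for illustration.

```python
def to_fasta(name, basecalls, width=60):
    """Return a FastA record built from a name and pasted basecalls."""
    # Remove spaces and line breaks that may come along with the paste.
    seq = "".join(basecalls.split()).upper()
    lines = [">" + name]
    # Wrap the sequence into lines of a fixed width.
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# Invented example: basecalls pasted with stray whitespace.
record = to_fasta("Sample_01", "acgt acgt\nac", width=4)
print(record)
```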
SLIDE 27 Have a Blast!
Select Nucleotide Blast to examine your sequence
SLIDE 28 Enter your FastA Sequence
Press the Blast icon at the bottom when you wish to execute your search.
SLIDE 29
Compare Organisms
Scroll down page to make a phylogenetic tree
SLIDE 30