Information on Pairwise Sequence Alignment Programs
NOTE: If the computation is likely to take more than a few minutes, we
suggest choosing the email option and providing an email address so that
the results are delivered by email. Otherwise, the connection may time
out and an error may occur.
FASTA Format:
The first line begins with the symbol '>' followed by the name of
the sequence. The sequence is on the remaining lines.
The sequence must not contain blanks.
The sequence may be in upper or lower case.
Below is an example sequence in FASTA format:
>DNA sequence
GCCCCCGGCCCCGCCCCGGCCCCGCCCCCGGCCCCGCCCCGCAAGGGTC
ACAGGTCACGGGGCGGGGCCGAGGCGGAAGCGCCCGCAGCCCGGTACCGG
CTCCTCCTGGGCTCCCTCTAGCGCCTTCCCCCCGGCCCGACTCCGCTGGT
CAGCGCCAAGTGACTTACGCCCCCGACCTCTGAGCCCGGACCGCTAGGCGA
GGAGGATCAGATCTCGCTCGAGAATCTGAAGGTGCCCTGGTCCTGGAGG
AGTTCCGTCCCAGCCCGCGGTCTCCCGGTACTGTCGGGCCCCGGCCCTCT
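The format rules above can be checked before submitting a sequence.
Below is a minimal Python sketch, not part of the server. The function
name parse_fasta is hypothetical, and converting the sequence to upper
case is an assumption made only for convenience, since either case is
accepted.

```python
def parse_fasta(text):
    """Return (name, sequence) for a single-record FASTA string."""
    lines = text.strip().splitlines()
    # The first line must begin with '>' followed by the sequence name.
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first line must begin with '>'")
    name = lines[0][1:].strip()
    # The sequence is on the remaining lines and must not contain blanks.
    for line in lines[1:]:
        if any(ch.isspace() for ch in line):
            raise ValueError("sequence must not contain blanks")
    sequence = "".join(lines[1:]).upper()  # assumption: normalize the case
    if not sequence:
        raise ValueError("no sequence found after the name line")
    return name, sequence


name, seq = parse_fasta(">DNA sequence\nGCCCCCGG\nacaggtca\n")
```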
Loading a large sequence into the server:
The server allows the user to load a sequence into the server
by providing the name of the sequence file.
The server requires that the sequence file and every directory above it,
from its parent directory up to and including the home directory,
all be readable by the world.
One simple way to meet this requirement is to move the sequence file
into the home directory and make both the file and the home directory
readable by the world.
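The permission requirement can be checked locally before loading the
file. Below is a minimal Python sketch for a Unix-like system. The
function name not_world_readable is hypothetical, and the sketch tests
only the world-read permission bit. Whether the server also needs the
search (execute) bit on directories is not stated here, so that is not
checked.

```python
import os
import stat


def not_world_readable(file_path, home_dir):
    """List the paths from file_path up to home_dir that lack the world-read bit."""
    path = os.path.abspath(file_path)
    home = os.path.abspath(home_dir)
    if os.path.commonpath([path, home]) != home:
        raise ValueError("file is not inside the home directory")
    missing = []
    while True:
        # S_IROTH is the "readable by others" permission bit.
        if not os.stat(path).st_mode & stat.S_IROTH:
            missing.append(path)
        if path == home:
            break
        path = os.path.dirname(path)  # step up to the parent directory
    return missing
```

An empty list means the file and every directory up to the home
directory are readable by the world.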
To load the sequence file into the server,
start Netscape from the home directory, click the "Browse" button,
and provide the name of your file.
SIM:
The SIM program finds the k best non-intersecting local alignments
between two sequences. The two sequences must be of the same type,
that is, both are DNA sequences or both are protein sequences.
Using dynamic programming techniques, SIM is guaranteed to find optimal
alignments. The alignments are reported in order of similarity score,
with the highest scoring alignment first. The k best alignments share no
aligned pairs. SIM requires space proportional to the sum of the
input sequence lengths and the output alignment lengths. Thus
SIM can handle sequences of tens of thousands of base pairs.
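To illustrate local alignment by dynamic programming, below is a minimal
Python sketch that computes only the single best local alignment score
while keeping one row of the table at a time. It is not the SIM
algorithm: it does not recover alignments or find k non-intersecting
ones, and the function name local_score and the match, mismatch and gap
values are assumptions chosen for illustration.

```python
def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, using one table row at a time."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may start anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because only two rows are held in memory, the space used grows with the
length of the second sequence rather than with the product of the two
lengths.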
SIM is described in the following papers:
Huang, X. and Miller, W. (1991)
A Time-Efficient, Linear-Space Local Similarity Algorithm.
Advances in Applied Mathematics 12, 337-357.
Huang, X., Hardison, R. C. and Miller, W. (1990)
A Space-Efficient Algorithm for Local Similarities.
Computer Applications in the Biosciences 6, 373-381.
GAP:
The GAP program computes an optimal global alignment of two sequences
without penalizing terminal gaps. A long gap in the shorter sequence
is given a constant penalty. The two sequences must be of the same type,
that is, both are DNA sequences or both are protein sequences.
GAP delivers the alignment in linear space, so long sequences can be aligned.
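To show what "without penalizing terminal gaps" means, below is a
minimal Python sketch of a global alignment score in which gaps at
either end of either sequence are free. It is not the GAP program: it
uses a simple per-position gap penalty rather than the penalty scheme
described above, it returns only the score, and the function name
ends_free_score and the scoring values are assumptions.

```python
def ends_free_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b where terminal gaps cost nothing."""
    prev = [0] * (len(b) + 1)       # row 0 is all zeros: leading gaps are free
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)   # column 0 stays zero: leading gaps are free
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        # b is used up here; whatever remains of a is a free trailing gap.
        best = max(best, curr[-1])
        prev = curr
    # a is used up in the last row; whatever remains of b is a free trailing gap.
    return max(best, max(prev))
```

For example, ends_free_score("ACGT", "TTACGTTT") scores the four matches
only, because the unaligned ends of the longer sequence are not
penalized.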
GAP is described in the following paper:
Huang, X. (1994)
On Global Sequence Alignment.
Computer Applications in the Biosciences 10, 227-235.
NAP:
The NAP program computes a global alignment of a DNA sequence
and a protein sequence without penalizing terminal gaps.
NAP handles frameshifts and long introns in the DNA sequence.
The program delivers the alignment in linear space, so long sequences can be aligned.
It makes use of splice site consensuses in alignment computation.
Both strands of the DNA sequence are compared with the protein sequence,
and the alignment with the larger score is reported.
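The comparison of both strands can be sketched as follows. This is a
minimal Python illustration, not NAP itself. The names
reverse_complement and best_strand are hypothetical, and the score
argument stands for any function that returns an alignment score for a
DNA sequence and a protein sequence.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return dna.translate(_COMPLEMENT)[::-1]


def best_strand(dna, protein, score):
    """Score both strands with `score` and report the higher-scoring one."""
    forward = score(dna, protein)
    reverse = score(reverse_complement(dna), protein)
    if forward >= reverse:
        return "forward", forward
    return "reverse", reverse
```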
NAP is described in the following paper:
Huang, X. and Zhang, J. (1996)
Methods for Comparing a DNA Sequence with a Protein Sequence.
Computer Applications in the Biosciences 12(6), 497-506.
LAP2:
The LAP2 program finds the k best non-intersecting local alignments
between a DNA sequence and a protein sequence.
LAP2 handles frameshifts and long introns in the DNA sequence.
It delivers alignments in linear space, so long sequences can be aligned.
It makes use of splice site consensuses in alignment computation.
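As an illustration of a splice site check, below is a minimal Python
sketch that tests whether a candidate intron begins with GT and ends
with AG, the most common intron boundary dinucleotides. The consensus
patterns and scores actually used by LAP2 are not given here, so this
rule, the function name has_canonical_splice_sites and the minimum
length of 4 are all assumptions.

```python
def has_canonical_splice_sites(dna, start, end):
    """True if the candidate intron dna[start:end] begins with GT and ends with AG."""
    intron = dna[start:end].upper()
    # Assumption: at least 4 bases, so the GT and AG do not overlap.
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")
```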
The reference information for LAP2:
Zhou, H., Joshi, C. P. and Huang, X. (1997)
A Local Alignment Algorithm for Comparing a DNA Sequence with
a Protein Sequence, in preparation.
GAP2:
The GAP2 program computes an optimal global alignment of a genomic sequence
and a cDNA sequence without penalizing terminal gaps. A long gap in the cDNA
sequence is given a constant penalty. The GAP2 program makes use of
splice site consensuses in alignment computation.
GAP2 delivers the alignment in linear space, so long sequences can be aligned.
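The idea of a constant penalty for long gaps can be sketched as a capped
penalty function. This is a minimal Python illustration under stated
assumptions: the function name gap_penalty, the open-plus-extend form
for short gaps, and all three numeric values are hypothetical and are
not taken from GAP2. The exact penalty scheme is given in the reference
below.

```python
def gap_penalty(length, gap_open=10, gap_extend=2, long_gap_penalty=50):
    """Penalty for a gap of `length` positions, never more than long_gap_penalty."""
    if length <= 0:
        return 0
    # Short gaps grow with length; long gaps all cost the same constant.
    return min(gap_open + gap_extend * length, long_gap_penalty)
```

With these assumed values, a gap of 5 positions costs 20, while gaps of
1000 and 10000 positions both cost 50, so a long gap in the cDNA
sequence, such as one spanning an intron, is not penalized in proportion
to its length.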
The reference information for GAP2:
Huang, X. (1994)
On Global Sequence Alignment.
Computer Applications in the Biosciences 10, 227-235.
Suggestions/Comments
Please contact Xiaoqiu Huang at xqhuang@cs.iastate.edu