Finding CpG Islands using MAVG
MAVG program

MAVG is a software tool for finding K non-overlapping maximum-average segments of length at least L in a given sequence of numbers, for any K > 0 and L > 0 (Lin et al., 2002). CpG islands of a genomic sequence (Gardiner-Garden and Frommer, 1987) are computed with the MAVG program as follows. The input genomic sequence is converted into a sequence of numbers using a dinucleotide table (Durbin et al., 1998). For each of the 16 possible dinucleotides, the table gives the log-likelihood ratio between the frequency of that dinucleotide in CpG islands and its frequency in non-CpG regions. The average score of a segment of the number sequence is the sum of the numbers in the segment divided by the length of the segment. The MAVG program is then run on the number sequence.

Input to MAVG

Input sequences file format

MAVG takes as input a file of sequences in FASTA format.

FASTA format:
The first line begins with the symbol '>' followed by the name of the sequence. The sequence is on the remaining lines. The sequence must not contain blanks. It may be in upper or lower case. Below is an example sequence in FASTA format:

>DNA sequence
GCCCCCGGCCCCGCCCCGGCCCCGCCCCCGGCCCCGCCCCGCAAGGGTC
ACAGGTCACGGGGCGGGGCCGAGGCGGAAGCGCCCGCAGCCCGGTACCG
CTCCTCCTGGGCTCCCTCTAGCGCCTTCCCCCCGGCCCGACTCCGCTGG
CAGCGCCAAGTGACTTACGCCCCCGACCTCTGAGCCCGGACCGCTAGGC
GGAGGATCAGATCTCGCTCGAGAATCTGAAGGTGCCCTGGTCCTGGAGG
AGTTCCGTCCCAGCCCGCGGTCTCCCGGTACTGTCGGGCCCCGGCCCTC

Parameters of MAVG

The parameter K should be set large enough that the K best regions reported by MAVG include at least one region whose average score is below the cutoff (the average score a region must exceed to be called a CpG island). This guarantees that no region with an average score above the cutoff is missed. The parameter L should be set to the minimum length of CpG islands.

This MAVG web server was constructed by Liang Ye.

References

Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998)
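The FASTA rules listed under "Input sequences file format" can be checked with a short reader. The following is a minimal Python sketch, not code from the MAVG server; the function name `parse_fasta` is hypothetical, and the choices to skip empty lines, accept several records per file, and convert the sequence to upper case are assumptions.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs.

    Hypothetical helper: a '>' line starts a record and carries the
    sequence name; the following lines hold the sequence, which must
    not contain blanks. Letters are upper-cased so that upper- and
    lower-case input are treated alike.
    """
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        if not line:
            continue  # assumption: empty lines between records are ignored
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data found before the first '>' line")
            if any(c.isspace() for c in line):
                raise ValueError(f"blank inside sequence line: {line!r}")
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

For the example above, this would return a single record named "DNA sequence" whose sequence is the six lines joined together.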
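The conversion step described under "MAVG program" can be sketched as follows, assuming one score per pair of adjacent bases, so that a sequence of n bases yields n - 1 numbers. `TOY_TABLE` is a hypothetical stand-in (CG scores +1.0, every other dinucleotide -0.25); the actual log-likelihood ratios of Durbin et al. (1998) are not reproduced here. Prefix sums then give the average score of any segment in constant time.

```python
from itertools import accumulate

BASES = "ACGT"
# Hypothetical toy table, NOT the Durbin et al. (1998) values:
# it only rewards CG so the example is easy to follow.
TOY_TABLE = {a + b: (1.0 if a + b == "CG" else -0.25) for a in BASES for b in BASES}


def to_scores(seq, table):
    """Map each overlapping dinucleotide seq[i:i+2] to its table score.

    Raises KeyError for dinucleotides missing from the table
    (for example, pairs containing N).
    """
    seq = seq.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def prefix_sums(scores):
    """prefix[k] is the sum of scores[0:k]; prefix[0] is 0."""
    return [0.0, *accumulate(scores)]


def segment_average(prefix, i, j):
    """Average score of the half-open segment scores[i:j]."""
    if j <= i:
        raise ValueError("segment must be non-empty")
    return (prefix[j] - prefix[i]) / (j - i)


scores = to_scores("acgt", TOY_TABLE)  # pairs AC, CG, GT
prefix = prefix_sums(scores)
print(scores, segment_average(prefix, 1, 2))
```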
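A naive greedy brute-force search illustrates what "K non-overlapping maximum-average segments of length at least L" means. This is not the algorithm of Lin et al. (2002), which is far more efficient. It assumes the K segments are chosen one at a time, each being the highest-average segment that does not overlap earlier choices; MAVG's precise definition may differ. The name `best_segments` is hypothetical.

```python
from itertools import accumulate


def best_segments(scores, k, min_len):
    """Greedily pick up to k non-overlapping segments of length >= min_len.

    Each round takes the highest-average segment that avoids all
    previously chosen positions. Returns (start, end, average) triples
    with end exclusive.

    Only lengths min_len .. 2*min_len - 1 are tried: a longer segment can
    be split into two parts of length >= min_len, and one part has an
    average at least as high as the whole, so the maximum is always
    attained by a segment shorter than 2*min_len.
    """
    if k <= 0 or min_len <= 0:
        raise ValueError("k and min_len must be positive")
    n = len(scores)
    prefix = [0.0, *accumulate(scores)]
    used = [False] * n
    chosen = []
    for _ in range(k):
        best = None
        for i in range(n):
            for length in range(min_len, 2 * min_len):
                j = i + length
                if j > n or any(used[i:j]):
                    break  # longer segments from i would also run off the end or overlap
                avg = (prefix[j] - prefix[i]) / length
                if best is None or avg > best[2]:
                    best = (i, j, avg)
        if best is None:
            break  # no free stretch of length >= min_len remains
        start, end, _ = best
        for pos in range(start, end):
            used[pos] = True
        chosen.append(best)
    return chosen


print(best_segments([-1, 2, 2, -1, -1, 3, 3, -1], k=2, min_len=2))
```

Each round costs about O(n L^2) here because of the overlap test, which is acceptable for illustration but not for genome-scale input.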
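The rule for choosing K under "Parameters of MAVG" can be expressed as a check on the reported regions. In this sketch, `call_islands` is a hypothetical helper, regions are assumed to be (start, end, average) triples, and a region counts as an island only when its average is strictly above the cutoff.

```python
def call_islands(regions, cutoff):
    """Return (islands, k_sufficient) for the K best reported regions.

    islands: regions whose average score is strictly above the cutoff.
    k_sufficient: True when at least one reported region has an average
    below the cutoff, i.e. K was large enough that no region above the
    cutoff can have been left out.
    """
    islands = [r for r in regions if r[2] > cutoff]
    k_sufficient = any(r[2] < cutoff for r in regions)
    return islands, k_sufficient


regions = [(5, 7, 3.0), (1, 3, 2.0), (9, 12, 0.1)]
islands, ok = call_islands(regions, cutoff=0.5)
if not ok:
    print("K too small: rerun MAVG with a larger K")
print(islands)
```

When `k_sufficient` is False, every reported region cleared the cutoff, so more regions above the cutoff may exist and K should be increased.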