Copy from web.archive.org.
CigarFormat
CigarFormat is one of the output formats that can be generated by Exonerate.
It is also used in the feature tables in the Ensembl database, but in an altered form.
It is designed to contain the minimal information necessary for the reconstruction of an alignment. One alignment is described per line, to allow easy manipulation with UNIX tools.
Cigar is an acronym for Concise Idiosyncratic Gapped Alignment Report. There are also the related formats Sugar (Simple Ungapped Alignment Report) and Vulgar (Verbose Ugly Labelled Gapped Alignment Report).
Cigar format looks like this:
cigar: hs989235.cds 5 468 + hsnfg9.embl 25689 27450 + 1916 M 13 I 1 M 35 I 1 M 4 I 1 M 13 D 1 M 4 I 1 M 115 D 404 M 37 D 1 M 164 I 1 M 12 D 898 M 16 I 1 M 12 I 1 M 21 D 1 M 10
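The fields are as follows:
query_id        Query identifier
query_start     Query position at alignment start
query_end       Query position at alignment end
query_strand    Strand of query matched
target_id       Target identifier
target_start    Target position at alignment start
target_end      Target position at alignment end
target_strand   Strand of target matched
score           The raw alignment score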
The remaining fields are in pairs, describing the edit path through the alignment. Each pair contains an M, I, D or N, corresponding to a Match, Insert, Delete or iNtron, followed by the length of that operation.
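The field layout above can be sketched as a small parser. This is a minimal illustration, not part of Exonerate: the names CigarLine, parse_cigar_line and consumed are hypothetical. It assumes that M and I consume query positions and that M, D and N consume target positions, which is consistent with the example line (463 query and 1761 target positions).

```python
from dataclasses import dataclass
from typing import List, Tuple

VALID_OPS = {"M", "I", "D", "N"}


@dataclass
class CigarLine:
    """Hypothetical container for one Exonerate cigar line."""
    query_id: str
    query_start: int
    query_end: int
    query_strand: str
    target_id: str
    target_start: int
    target_end: int
    target_strand: str
    score: int
    ops: List[Tuple[str, int]]  # (operation, length) pairs


def parse_cigar_line(line: str) -> CigarLine:
    """Split a 'cigar:' line into nine fixed fields plus the edit path."""
    tokens = line.split()
    if not tokens or tokens[0] != "cigar:":
        raise ValueError("line does not start with 'cigar:'")
    if len(tokens) < 10 or (len(tokens) - 10) % 2 != 0:
        raise ValueError("unexpected number of fields")
    path = tokens[10:]
    ops = [(path[i], int(path[i + 1])) for i in range(0, len(path), 2)]
    for op, _ in ops:
        if op not in VALID_OPS:
            raise ValueError(f"unknown operation {op!r}")
    return CigarLine(
        tokens[1], int(tokens[2]), int(tokens[3]), tokens[4],
        tokens[5], int(tokens[6]), int(tokens[7]), tokens[8],
        int(tokens[9]), ops,
    )


def consumed(ops: List[Tuple[str, int]]) -> Tuple[int, int]:
    """Return (query, target) positions covered by the edit path.

    Assumption: M and I advance the query; M, D and N advance the target.
    """
    query = sum(n for op, n in ops if op in {"M", "I"})
    target = sum(n for op, n in ops if op in {"M", "D", "N"})
    return query, target


EXAMPLE = (
    "cigar: hs989235.cds 5 468 + hsnfg9.embl 25689 27450 + 1916 "
    "M 13 I 1 M 35 I 1 M 4 I 1 M 13 D 1 M 4 I 1 M 115 D 404 M 37 D 1 "
    "M 164 I 1 M 12 D 898 M 16 I 1 M 12 I 1 M 21 D 1 M 10"
)

rec = parse_cigar_line(EXAMPLE)
print(consumed(rec.ops))  # (463, 1761)
```

For the example line, the totals agree with the coordinate fields: 468 - 5 = 463 for the query and 27450 - 25689 = 1761 for the target.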
Below is the alignment corresponding to the cigar line shown above:
C4 Alignment display:
Model: est2genome
Raw score: 1916
Aligned positions 5->468 of query
Aligned positions 25689->27450 of target
Query: hs989235.cds
Target: hsnfg9.embl

     6 : AAGCTCANCTTGGACCACCGACTCTCGANTGNNTCGCCGCGGGAGCCGGNTGGANAACCT :    64
         ||||||| ||||| |||||||||||||  ||   ||||||||||||||| |||| |||||
 25690 : AAGCTCATCTTGG-CCACCGACTCTCGCTTGCGCCGCCGCGGGAGCCGG-TGGA-AACCT : 25745

    65 : GAGCGGGA-CTGGNAGAAGGAGCAGAGGGAGGCAGCACCCGGCGTGACGGNAGTGTGTGG :   123
         |||||||| |||| |||||||||||||||||||||||||||||||||||| |||||||||
 25746 : GAGCGGGAGCTGG-AGAAGGAGCAGAGGGAGGCAGCACCCGGCGTGACGGGAGTGTGTGG : 25804

   124 : GGCACTCAGGCCTTCCGCAGTGTCATCTGCCACACGGAAGGCACGGCCACGGGCAGGGGG :   183
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||  ||||
 25805 : GGCACTCAGGCCTTCCGCAGTGTCATCTGCCACACGGAAGGCACGGCCACGGGCCAGGGG : 25864

   184 : GTCTATGAT  <<<< Intron 1 <<<<  CTTCTGCATGCCCAGCTGGCATGGCCCCA :   221
         |||||||||        404 bp        |||||||||||||||||||||||||||||
 25865 : GTCTATGATct..................acCTTCTGCATGCCCAGCTGGCATGGCCCCA : 26306

   222 : CGTAGAGT-GGNNTGGCGTCTCGGTGCTGGTCAGCGACACGTTGTCCTGGCTGGGCAGGT :   280
         |||||||| ||  |||||||||||||||||||||||||||||||||||||||||||||||
 26307 : CGTAGAGTGGGGGTGGCGTCTCGGTGCTGGTCAGCGACACGTTGTCCTGGCTGGGCAGGT : 26366

   281 : CCAGCTCCCGGAGGACCTGGGGCTTCAGCTTCCCGTAGCGCTGGCTGCAGTGACGGATGC :   340
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 26367 : CCAGCTCCCGGAGGACCTGGGGCTTCAGCTTCCCGTAGCGCTGGCTGCAGTGACGGATGC : 26426

   341 : TCTTGCGCTGCCATTTCTGGGTGCTGTCACTGTCCTTGCTCACTCCAAACCAGTTCGGCG :   400
         ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
 26427 : TCTTGCGCTGCCATTTCTGGGTGCTGTCACTGTCCTTGCTCACTCCAAACCAG-TCGGCG : 26485

   401 : GTCCCC  <<<< Intron 2 <<<<  CTGCGGATGGTCTGTGTTGATGGACGTTTGGG :   438
         ||||||        898 bp        |||||||||||||||| |||||||||| | ||
 26486 : GTCCCCct..................acCTGCGGATGGTCTGTG-TGATGGACGTCT-GG : 27419

   439 : CTTTGCAGCACCGGCCGCC-GAGTTCATGG :   468
         | ||||||||||||||||| ||| ||||||
 27420 : CGTTGCAGCACCGGCCGCCGGAGCTCATGG : 27450
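In the Ensembl CIGAR format, each length precedes its operation letter and the string contains no spaces. The introns (D 404 and D 898 in the cigar line) are not stored in the string; each exon gets its own row instead. So the above example in Ensembl would appear in a feature table in three rows with these CIGAR strings: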
13M1I35M1I4M1I13M1D4M1I115M
37M1D164M1I12M
16M1I12M1I21M1D10M
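The conversion to the three Ensembl strings can be sketched as follows. This is an illustrative helper, not Ensembl or Exonerate code: the function name to_ensembl_cigars and the min_intron threshold are assumptions. The cigar line above reports its introns as long D operations, so the sketch treats any D at or above the threshold (or any N) as an intron and starts a new row there. A real pipeline would need a more reliable way to tell introns from long deletions.

```python
from typing import List


def to_ensembl_cigars(path: str, min_intron: int = 20) -> List[str]:
    """Convert an Exonerate edit path ('M 13 I 1 ...') to Ensembl-style strings.

    Assumption: an N operation, or a D of at least min_intron positions,
    marks an intron. The intron is dropped and a new string is started.
    """
    tokens = path.split()
    if len(tokens) % 2 != 0:
        raise ValueError("edit path must be operation/length pairs")
    pairs = [(tokens[i], int(tokens[i + 1])) for i in range(0, len(tokens), 2)]

    rows: List[str] = []
    current: List[str] = []
    for op, length in pairs:
        if op == "N" or (op == "D" and length >= min_intron):
            # Intron: close the current exon row, if any.
            if current:
                rows.append("".join(current))
            current = []
        else:
            # Ensembl order: length first, then the operation letter.
            current.append(f"{length}{op}")
    if current:
        rows.append("".join(current))
    return rows


EDIT_PATH = (
    "M 13 I 1 M 35 I 1 M 4 I 1 M 13 D 1 M 4 I 1 M 115 D 404 M 37 D 1 "
    "M 164 I 1 M 12 D 898 M 16 I 1 M 12 I 1 M 21 D 1 M 10"
)

for row in to_ensembl_cigars(EDIT_PATH):
    print(row)
```

With the default threshold, the single-base D operations stay inside their exon strings while D 404 and D 898 split the path, reproducing the three rows listed above.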