Copy from web.archive.org.
CigarFormat
CigarFormat is one of the output formats which can be generated by Exonerate.
It is also used in the feature tables in the Ensembl database, but in an altered form.
It is designed to contain the minimal information necessary for the reconstruction of an alignment. One alignment is described per line, to allow easy manipulation with UNIX tools.
Cigar is an acronym for Concise Idiosyncratic Gapped Alignment Report. There are also the related formats Sugar (Simple Ungapped Alignment Report) and Vulgar (Verbose Ugly Labelled Gapped Alignment Report).
Cigar format looks like this:
cigar: hs989235.cds 5 468 + hsnfg9.embl 25689 27450 + 1916 M 13 I 1 M 35 I 1 M 4 I 1 M 13 D 1 M 4 I 1 M 115 D 404 M 37 D 1 M 164 I 1 M 12 D 898 M 16 I 1 M 12 I 1 M 21 D 1 M 10
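The fields are as follows:
After the leading "cigar:" label, the first nine fields are the query identifier, query start, query end and query strand, then the target identifier, target start, target end and target strand, and finally the alignment score.
The remaining fields are in pairs, describing the edit path through the alignment. Each pair contains an M, I, D or N, corresponding to a Match, Insert, Delete or iNtron, followed by the length of that operation.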
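As an illustration of the layout just described, here is a minimal Python sketch that splits a cigar line into its fixed fields and its (operation, length) pairs. The names Cigar and parse_cigar_line, and the field names, are hypothetical labels read off the example line; they are not part of Exonerate itself.

```python
from typing import List, NamedTuple, Tuple


class Cigar(NamedTuple):
    # Field names are assumptions chosen for this sketch.
    query_id: str
    query_start: int
    query_end: int
    query_strand: str
    target_id: str
    target_start: int
    target_end: int
    target_strand: str
    score: int
    ops: List[Tuple[str, int]]  # (operation letter, length) pairs


def parse_cigar_line(line: str) -> Cigar:
    """Parse one whitespace-separated 'cigar:' line into a Cigar record."""
    tokens = line.split()
    if not tokens or tokens[0] != "cigar:":
        raise ValueError("line does not start with 'cigar:'")
    # One label, nine fixed fields, then an even number of tokens for the pairs.
    if len(tokens) < 10 or (len(tokens) - 10) % 2 != 0:
        raise ValueError("unexpected number of fields")
    q_id, q_start, q_end, q_strand, t_id, t_start, t_end, t_strand, score = tokens[1:10]
    rest = tokens[10:]
    ops = [(rest[i], int(rest[i + 1])) for i in range(0, len(rest), 2)]
    for op, _ in ops:
        if op not in ("M", "I", "D", "N"):
            raise ValueError(f"unknown operation {op!r}")
    return Cigar(q_id, int(q_start), int(q_end), q_strand,
                 t_id, int(t_start), int(t_end), t_strand, int(score), ops)
```

Because each alignment occupies a single line, a function like this can be applied line by line to Exonerate output that has been filtered for lines beginning with "cigar:".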
Below is the alignment corresponding to the cigar line shown above:
C4 Alignment display:
Model: est2genome
Raw score: 1916
Aligned positions 5->468 of query
Aligned positions 25689->27450 of target
Query: hs989235.cds
Target: hsnfg9.embl
    6 : AAGCTCANCTTGGACCACCGACTCTCGANTGNNTCGCCGCGGGAGCCGGNTGGANAACCT :    64
        ||||||| ||||| |||||||||||||  ||   ||||||||||||||| |||| |||||
25690 : AAGCTCATCTTGG-CCACCGACTCTCGCTTGCGCCGCCGCGGGAGCCGG-TGGA-AACCT : 25745

   65 : GAGCGGGA-CTGGNAGAAGGAGCAGAGGGAGGCAGCACCCGGCGTGACGGNAGTGTGTGG :   123
        |||||||| |||| |||||||||||||||||||||||||||||||||||| |||||||||
25746 : GAGCGGGAGCTGG-AGAAGGAGCAGAGGGAGGCAGCACCCGGCGTGACGGGAGTGTGTGG : 25804

  124 : GGCACTCAGGCCTTCCGCAGTGTCATCTGCCACACGGAAGGCACGGCCACGGGCAGGGGG :   183
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||  ||||
25805 : GGCACTCAGGCCTTCCGCAGTGTCATCTGCCACACGGAAGGCACGGCCACGGGCCAGGGG : 25864

  184 : GTCTATGAT  <<<< Intron 1 <<<<  CTTCTGCATGCCCAGCTGGCATGGCCCCA :   221
        |||||||||        404 bp        |||||||||||||||||||||||||||||
25865 : GTCTATGATct..................acCTTCTGCATGCCCAGCTGGCATGGCCCCA : 26306

  222 : CGTAGAGT-GGNNTGGCGTCTCGGTGCTGGTCAGCGACACGTTGTCCTGGCTGGGCAGGT :   280
        |||||||| ||  |||||||||||||||||||||||||||||||||||||||||||||||
26307 : CGTAGAGTGGGGGTGGCGTCTCGGTGCTGGTCAGCGACACGTTGTCCTGGCTGGGCAGGT : 26366

  281 : CCAGCTCCCGGAGGACCTGGGGCTTCAGCTTCCCGTAGCGCTGGCTGCAGTGACGGATGC :   340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
26367 : CCAGCTCCCGGAGGACCTGGGGCTTCAGCTTCCCGTAGCGCTGGCTGCAGTGACGGATGC : 26426

  341 : TCTTGCGCTGCCATTTCTGGGTGCTGTCACTGTCCTTGCTCACTCCAAACCAGTTCGGCG :   400
        ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
26427 : TCTTGCGCTGCCATTTCTGGGTGCTGTCACTGTCCTTGCTCACTCCAAACCAG-TCGGCG : 26485

  401 : GTCCCC  <<<< Intron 2 <<<<  CTGCGGATGGTCTGTGTTGATGGACGTTTGGG :   438
        ||||||        898 bp        |||||||||||||||| |||||||||| | ||
26486 : GTCCCCct..................acCTGCGGATGGTCTGTG-TGATGGACGTCT-GG : 27419

  439 : CTTTGCAGCACCGGCCGCC-GAGTTCATGG :   468
        | ||||||||||||||||| ||| ||||||
27420 : CGTTGCAGCACCGGCCGCCGGAGCTCATGG : 27450
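The alignment above can be cross-checked against the cigar line by adding up the operation lengths. The following Python sketch assumes that M advances both sequences, I advances only the query, and D (or N) advances only the target, which is how the gaps fall in the display above. The function name cigar_spans is hypothetical.

```python
def cigar_spans(ops):
    """Return (query_span, target_span) covered by a list of (op, length) pairs."""
    query_span = 0
    target_span = 0
    for op, length in ops:
        if op == "M":
            # Match or mismatch: one base consumed on each sequence.
            query_span += length
            target_span += length
        elif op == "I":
            # Extra bases present in the query only.
            query_span += length
        elif op in ("D", "N"):
            # Bases (or an intron) present in the target only.
            target_span += length
        else:
            raise ValueError(f"unknown operation {op!r}")
    return query_span, target_span


def ops_from_line(line):
    """Extract the (op, length) pairs that follow the nine fixed fields."""
    tokens = line.split()[10:]
    return [(tokens[i], int(tokens[i + 1])) for i in range(0, len(tokens), 2)]
```

For the example line, the spans come to 463 on the query and 1761 on the target, which agree with 468 - 5 and 27450 - 25689 from the start and end fields.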
In the Ensembl CIGAR format the order of the numbers and letters is switched, so each length precedes its operation letter, and there are no spaces in the string. The above example would therefore appear in an Ensembl feature table as three rows, split at the two introns, with these CIGAR strings:
13M1I35M1I4M1I13M1D4M1I115M
37M1D164M1I12M
16M1I12M1I21M1D10M
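The conversion can be sketched in Python as follows. The text does not state the rule for where a new row starts; this sketch assumes that an N operation, or a D at least as long as a hypothetical intron_min threshold, marks an intron and is dropped as a row boundary. In the example line the two introns are written as D 404 and D 898. Both the function name and the threshold value are assumptions for illustration, not Ensembl's own logic.

```python
def to_ensembl_cigars(ops, intron_min=20):
    """Convert (op, length) pairs into compact per-row strings such as '13M1I35M'.

    intron_min is an assumed cut-off: a D of at least this length is treated
    as an intron and starts a new row instead of being written out.
    """
    rows = []
    current = []
    for op, length in ops:
        is_intron = op == "N" or (op == "D" and length >= intron_min)
        if is_intron:
            # Close the current row; the intron itself is not written.
            if current:
                rows.append("".join(current))
                current = []
        else:
            # Length first, then the letter, with no separating spaces.
            current.append(f"{length}{op}")
    if current:
        rows.append("".join(current))
    return rows


def ops_from_line(line):
    """Extract the (op, length) pairs that follow the nine fixed fields."""
    tokens = line.split()[10:]
    return [(tokens[i], int(tokens[i + 1])) for i in range(0, len(tokens), 2)]
```

Applied to the operations of the example cigar line, this returns the three strings listed above. A fixed length threshold is only a heuristic; where the aligner labels introns explicitly, that label would be a more reliable place to split.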