Copy from web.archive.org.

CigarFormat

CigarFormat is one of the output formats which can be generated by Exonerate.

It is also used in the feature tables in the Ensembl database, but in an altered form.

It is designed to contain the minimal information necessary for the reconstruction of an alignment. One alignment is described per line, to allow easy manipulation with UNIX tools.

Cigar is an acronym for Concise Idiosyncratic Gapped Alignment Report. (There is also the related Sugar - Simple Ungapped Alignment Report, and Vulgar - Verbose Ugly Labelled Gapped Alignment Report :)


Format

Cigar format looks like this:

cigar: hs989235.cds 5 468 + hsnfg9.embl 25689 27450 + 1916 M 13 I 1 M 35 I 1 M 4 I 1 M 13 D 1 M 4 I 1 M 115 D 404 M 37 D 1 M 164 I 1 M 12 D 898 M 16 I 1 M 12 I 1 M 21 D 1 M 10

After the leading "cigar:" label, the fields are as follows:

  1. query identifier
  2. query start position
  3. query stop position
  4. query strand
  5. target identifier
  6. target start position
  7. target stop position
  8. target strand
  9. score

The remaining fields are in pairs, describing the edit path through the alignment. Each pair contains an M, I, D or N, corresponding to a Match, Insert, Delete or iNtron, followed by the length.
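
The field layout above can be read with a few lines of Python. This is only a minimal sketch, not part of Exonerate: the names CigarLine and parse_cigar_line are hypothetical, and it assumes whitespace-separated tokens, a leading "cigar:" label, and an integer score as in the example line.

```python
from typing import List, NamedTuple, Tuple


class CigarLine(NamedTuple):
    """Hypothetical container for the nine fixed fields plus the edit path."""
    query_id: str
    query_start: int
    query_end: int
    query_strand: str
    target_id: str
    target_start: int
    target_end: int
    target_strand: str
    score: int
    ops: List[Tuple[str, int]]  # (operation letter, length) pairs


def parse_cigar_line(line: str) -> CigarLine:
    """Split one cigar line into its fixed fields and (letter, length) pairs."""
    tokens = line.split()
    if not tokens or tokens[0] != "cigar:":
        raise ValueError("line does not start with 'cigar:'")
    fixed, path = tokens[1:10], tokens[10:]
    if len(fixed) != 9 or len(path) % 2 != 0:
        raise ValueError("malformed cigar line")
    ops = []
    for letter, length in zip(path[0::2], path[1::2]):
        if letter not in ("M", "I", "D", "N"):
            raise ValueError("unknown operation: " + letter)
        ops.append((letter, int(length)))
    return CigarLine(fixed[0], int(fixed[1]), int(fixed[2]), fixed[3],
                     fixed[4], int(fixed[5]), int(fixed[6]), fixed[7],
                     int(fixed[8]), ops)


example = ("cigar: hs989235.cds 5 468 + hsnfg9.embl 25689 27450 + 1916 "
           "M 13 I 1 M 35 I 1 M 4 I 1 M 13 D 1 M 4 I 1 M 115 D 404 "
           "M 37 D 1 M 164 I 1 M 12 D 898 M 16 I 1 M 12 I 1 M 21 D 1 M 10")
record = parse_cigar_line(example)
print(record.query_id, record.score, record.ops[:3])
```

Keeping the edit path as a list of pairs, rather than a single string, makes it straightforward to sum lengths or to re-emit the path in another notation.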


Example

Below is the alignment corresponding to the cigar line shown above:


C4 Alignment display:
  Model: est2genome
  Raw score: 1916
  Aligned positions 5->468 of query
  Aligned positions 25689->27450 of target

Query: hs989235.cds
Target: hsnfg9.embl

     6 : AAGCTCANCTTGGACCACCGACTCTCGANTGNNTCGCCGCGGGAGCCGGNTGGANAACCT :    64
         ||||||| ||||| |||||||||||||  ||   ||||||||||||||| |||| |||||
 25690 : AAGCTCATCTTGG-CCACCGACTCTCGCTTGCGCCGCCGCGGGAGCCGG-TGGA-AACCT : 25745

    65 : GAGCGGGA-CTGGNAGAAGGAGCAGAGGGAGGCAGCACCCGGCGTGACGGNAGTGTGTGG :   123
         |||||||| |||| |||||||||||||||||||||||||||||||||||| |||||||||
 25746 : GAGCGGGAGCTGG-AGAAGGAGCAGAGGGAGGCAGCACCCGGCGTGACGGGAGTGTGTGG : 25804

   124 : GGCACTCAGGCCTTCCGCAGTGTCATCTGCCACACGGAAGGCACGGCCACGGGCAGGGGG :   183
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||  ||||
 25805 : GGCACTCAGGCCTTCCGCAGTGTCATCTGCCACACGGAAGGCACGGCCACGGGCCAGGGG : 25864

   184 : GTCTATGAT  <<<< Intron 1 <<<<  CTTCTGCATGCCCAGCTGGCATGGCCCCA :   221
         |||||||||        404 bp        |||||||||||||||||||||||||||||
 25865 : GTCTATGATct..................acCTTCTGCATGCCCAGCTGGCATGGCCCCA : 26306

   222 : CGTAGAGT-GGNNTGGCGTCTCGGTGCTGGTCAGCGACACGTTGTCCTGGCTGGGCAGGT :   280
         |||||||| ||  |||||||||||||||||||||||||||||||||||||||||||||||
 26307 : CGTAGAGTGGGGGTGGCGTCTCGGTGCTGGTCAGCGACACGTTGTCCTGGCTGGGCAGGT : 26366

   281 : CCAGCTCCCGGAGGACCTGGGGCTTCAGCTTCCCGTAGCGCTGGCTGCAGTGACGGATGC :   340
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 26367 : CCAGCTCCCGGAGGACCTGGGGCTTCAGCTTCCCGTAGCGCTGGCTGCAGTGACGGATGC : 26426

   341 : TCTTGCGCTGCCATTTCTGGGTGCTGTCACTGTCCTTGCTCACTCCAAACCAGTTCGGCG :   400
         ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
 26427 : TCTTGCGCTGCCATTTCTGGGTGCTGTCACTGTCCTTGCTCACTCCAAACCAG-TCGGCG : 26485

   401 : GTCCCC  <<<< Intron 2 <<<<  CTGCGGATGGTCTGTGTTGATGGACGTTTGGG :   438
         ||||||        898 bp        |||||||||||||||| |||||||||| | ||
 26486 : GTCCCCct..................acCTGCGGATGGTCTGTG-TGATGGACGTCT-GG : 27419

   439 : CTTTGCAGCACCGGCCGCC-GAGTTCATGG :   468
         | ||||||||||||||||| ||| ||||||
 27420 : CGTTGCAGCACCGGCCGCCGGAGCTCATGG : 27450
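
As a sanity check on this example, the operation lengths can be summed and compared with the coordinate spans. The sketch below rests on assumptions inferred from the example rather than stated in the text: M and I are taken to consume query bases, M, D and N to consume target bases, and stop minus start is taken to be the aligned length (abs() is used in case the start exceeds the stop on a reverse strand).

```python
from collections import Counter

line = ("cigar: hs989235.cds 5 468 + hsnfg9.embl 25689 27450 + 1916 "
        "M 13 I 1 M 35 I 1 M 4 I 1 M 13 D 1 M 4 I 1 M 115 D 404 "
        "M 37 D 1 M 164 I 1 M 12 D 898 M 16 I 1 M 12 I 1 M 21 D 1 M 10")

tokens = line.split()
query_start, query_end = int(tokens[2]), int(tokens[3])
target_start, target_end = int(tokens[6]), int(tokens[7])
path = tokens[10:]  # everything after the score is the edit path

# Total length per operation letter.
totals = Counter()
for letter, length in zip(path[0::2], path[1::2]):
    totals[letter] += int(length)

# Assumption: M and I advance along the query; M, D and N along the target.
query_span = totals["M"] + totals["I"]
target_span = totals["M"] + totals["D"] + totals["N"]

print(totals["M"], totals["I"], totals["D"])         # 456 7 1305
print(query_span, abs(query_end - query_start))      # 463 463
print(target_span, abs(target_end - target_start))   # 1761 1761
```

Both sums agree with the coordinates on the cigar line, which is consistent with the two long deletions (404 and 898) being the introns shown in the display.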




In the Ensembl CIGAR format the numbers and letters are switched, and there are no spaces in the string. The two introns (D 404 and D 898 in the cigar line above) are not encoded; they split the alignment, so the above example would appear in an Ensembl feature table as three rows with these CIGAR strings:

13M1I35M1I4M1I13M1D4M1I115M
37M1D164M1I12M
16M1I12M1I21M1D10M
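
The conversion can be sketched as follows. This is an illustration, not Ensembl's own code: the function name to_ensembl_cigars and the min_intron threshold are hypothetical, and the rule for deciding where a new row starts (an N operation, or a D at least min_intron long) is an assumption chosen to reproduce the three rows above.

```python
def to_ensembl_cigars(path_tokens, min_intron=20):
    """Turn an Exonerate edit path into Ensembl-style strings, one per row.

    Assumption: an N operation, or a D of at least min_intron bases, is an
    intron; it is dropped and a new row is started after it.
    """
    rows, current = [], []
    for letter, length in zip(path_tokens[0::2], path_tokens[1::2]):
        n = int(length)
        if letter == "N" or (letter == "D" and n >= min_intron):
            if current:
                rows.append("".join(current))
                current = []
        else:
            # Ensembl style: length first, then the letter, no spaces.
            current.append(f"{n}{letter}")
    if current:
        rows.append("".join(current))
    return rows


path = ("M 13 I 1 M 35 I 1 M 4 I 1 M 13 D 1 M 4 I 1 M 115 D 404 "
        "M 37 D 1 M 164 I 1 M 12 D 898 M 16 I 1 M 12 I 1 M 21 D 1 M 10").split()
rows = to_ensembl_cigars(path)
for row in rows:
    print(row)
```

A length threshold is a crude way to tell introns from ordinary deletions; when N operations are present in the edit path they mark introns explicitly and no threshold is needed.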


Related pages: Exonerate