Used in: gor only

CIGARSEGS

The CIGARSEGS command takes the sequence read from a BAM-like stream and splits them into multiple reads based on the CIGAR string. As such, the input must have a column named CIGAR. The -gc option can be used to annotate the reads with other columns from the input.

CIGAR is a string which describes how an individual read aligns with the larger reference sequence. A CIGAR may consist of one or many components, with each component having an operator and a number of bases that the operator applies to. Operators can be DHIMNPSX or =. These are explained in the following table:

Operator

Description

D

Deletion, i.e. the nucleotide is _not_ present in the read, but is present in the reference.

H

Hard Clipping; the clipped nucleotides are not present in the read.

I

Insertion, i.e. the nucleotide is present in the read, but is _not_ present in the reference.

M

Match, i.e. the nucleotide is present in both the read and the reference.

N

Skipped region, where a whole region of nucleotides is not present in the read.

P

Padding, where there exists a padded area in the read but not in the reference.

S

Soft Clipping; the clipped nucleotides are present in the read.

X

Read mismatch, where the nucleotide is present in the reference.

=

Read match, where the nucleotide is present in the reference.

Usage

gor *.bam ... | CIGARSEGS [-seq] [-gc Cols -readlength size (def. 1000bp)]

Options

-gc cols

Annotate the reads with the specified columns from the reads.

-seq

Output the sequence of the segment.

-readlength s

The max read length.

Examples

Following is an example that finds the distribution of how RNA reads map to 0, 1, 2, …, N exons:

gor file.bam | ROWNUM | RENAME rownum readID | CIGARSEGS -gc pos,readID | SORT 10000 | JOIN -segseg #exons# -l
| CALC overlap IF(genes != '',1,0) | SELECT 1,pos,readid,overlap | SORT 10000 | GROUP 1 -gc readID -sum -ic overlap
| GROUP genome -gc sum_overlap -count

See also BASES and VARIANTS, but variants is equivalent to the deprecated -ref option in CIGARSEGS.