GRANNO
The GRANNO command performs aggregation and annotation in a single pass. Rather than collapsing the input into one row per group, GRANNO keeps every row that falls into a given bin and appends to it the aggregate columns that the GROUP command would compute for that bin.
In a gor query, GRANNO takes a binsize parameter, which divides the entire range of values into a series of intervals (bins). Each row is then annotated with the aggregates of the interval it falls into.
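The binning idea can be sketched outside GOR. The following Python fragment is only an illustration of the semantics described above, not GOR's implementation: the function name `annotate_bin_counts`, the tuple row layout, and the zero-based bin boundaries are all assumptions, and the rows are buffered in memory for simplicity.

```python
from collections import Counter

def annotate_bin_counts(rows, binsize):
    """Attach to each (chrom, pos) row the number of rows sharing its bin."""
    rows = list(rows)

    # Bin key: chromosome plus the index of the fixed-width interval.
    # Zero-based bin boundaries are an assumption made for this sketch.
    def bin_key(row):
        chrom, pos = row[0], row[1]
        return (chrom, pos // binsize)

    counts = Counter(bin_key(r) for r in rows)
    # Every input row is kept; the aggregate is appended as an extra column.
    return [r + (counts[bin_key(r)],) for r in rows]

# Hypothetical input rows: (chrom, pos)
rows = [("chr1", 10), ("chr1", 90), ("chr1", 150), ("chr2", 20)]
annotated = annotate_bin_counts(rows, binsize=100)
```

Unlike a plain grouping step, the output has as many rows as the input; only the extra count column is new.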
Note
Using GRANNO in a nor query omits the binsize input parameter.
Usage
gor ... | GRANNO binsize [ attributes ]
nor ... | GRANNO [ attributes ]
Options
| -count | Return the count for each bin. |
|  | Return the number of distinct rows for each bin. |
| -gc | Grouping columns (other than bin). |
|  | String columns (-ac has been deprecated). |
|  | Integer columns. |
|  | Floating valued columns. |
|  | Calculate the min for any type of column. |
|  | Calculate the median for any type of column. |
|  | Calculate the max for any type of column. |
|  | Calculate the number of distinct values for any type of column. |
|  | Return a comma separated set with the distinct values in the column. |
|  | Return a comma separated list with the values in the column. |
|  | Specify the maximum column length of a set. Defaults to 200 chars. |
|  | Calculate the avg of all numeric columns. |
|  | Calculate the std of all numeric columns. |
|  | Calculate the sum of all numeric columns. |
|  | The number of sliding steps per group window. |
|  | The separator for elements in lists and sets. |
| -range | Interpret the binsize as the maximum range or span over which a group (as specified with the -gc option) extends. Groups that extend beyond the specified range will not be properly aggregated, and annotations of rows belonging to those groups may be incorrect. A special binsize value, “gene”, can be used to denote 3 Mbp in conjunction with the range option. |
|  | Assume the grouping columns are ordered. |
When using GRANNO in a NOR context, the ordered flag can both speed up the operation and reduce memory usage significantly. Note that GRANNO does not check whether the order is correct, so only use this option if the input stream is correctly ordered.
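Why ordered input can save memory, and why incorrect ordering goes unnoticed, can be illustrated with a small Python sketch. This is an assumption about the general streaming technique, not a description of GOR internals; the function name `annotate_ordered` and the row layout are hypothetical.

```python
from itertools import groupby

def annotate_ordered(rows, key):
    """Yield each row with its group's row count, assuming rows arrive grouped by key."""
    # groupby only merges adjacent rows, so the result is correct
    # only when all rows of a group are contiguous in the input.
    for _, group in groupby(rows, key=key):
        buffered = list(group)  # only one group is held in memory at a time
        n = len(buffered)
        for row in buffered:
            yield row + (n,)

# Hypothetical rows: (group, value)
ordered_rows = [("A", 1), ("A", 2), ("B", 3)]
result = list(annotate_ordered(ordered_rows, key=lambda r: r[0]))

# Unordered input raises no error but yields wrong counts.
unordered_rows = [("A", 1), ("B", 3), ("A", 2)]
wrong = list(annotate_ordered(unordered_rows, key=lambda r: r[0]))
```

In the second call, group A is split into two runs, so each row is annotated with a count of 1 instead of 2, and nothing signals the problem.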
Examples
gor #dbSNP# | JOIN -snpseg #genes#
| GRANNO chrom -gc genestart,genestop,genename -count
is equivalent to the more verbose
gor #dbSNP# | join -snpseg <(gor #dbSNP# | JOIN -snpseg #genes#
| GROUP chrom -gc genestart,genestop,genename
| SELECT chrom,genestart,genestop,allCount | SORT chrom)
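The equivalence between annotating directly and grouping followed by a join back onto the original rows can be mimicked in Python. The sketch below uses made-up rows and gene names and only models the counting logic; it makes no claim about how either GOR query is executed.

```python
from collections import Counter

# Hypothetical SNP rows already joined to genes:
# (chrom, pos, genestart, genestop, genename)
rows = [
    ("chr1", 105, 100, 200, "GENE_A"),
    ("chr1", 150, 100, 200, "GENE_A"),
    ("chr1", 320, 300, 400, "GENE_B"),
]

def group_key(row):
    """Chromosome plus the grouping columns genestart, genestop, genename."""
    return (row[0], row[2], row[3], row[4])

# Route 1 (GRANNO-like): count per group, appended to every row.
counts = Counter(group_key(r) for r in rows)
annotated = [r + (counts[group_key(r)],) for r in rows]

# Route 2 (GROUP then JOIN-like): build one summary row per group,
# then look each original row up in that summary.
summary = sorted(key + (n,) for key, n in counts.items())
lookup = {s[:-1]: s[-1] for s in summary}
joined = [r + (lookup[group_key(r)],) for r in rows]
```

Both routes give every row the same trailing count; the second route simply materializes the intermediate summary table first.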
The range option can be used like this:
gor #dbSNP# | JOIN -snpseg #genes# | GRANNO gene -range -gc gene_symbol -count
This adds a column (allCount) to each SNP row, giving the number of rows (SNPs) that belong to the same gene.
See also the RANK command, which is similar in nature to the GRANNO (group anno) command.