GROUP¶
The GROUP command allows you to group the output of your query according to a number of different criteria. The GROUP command is very similar to the “group by” function in SQL.
This command groups a set of rows into a set of summary row by values of columns. It returns one row for each group. We often use the GROUP command with aggregate functions such as count
, cdist
, min
, med
, max
, dis
, set
, lis
, avg
, std
, and sum
.
The aggregate functions allow us to perform a calculation of a set of rows while grouping.
Using GROUP in a nor query will give an error if binsize
input parameter is used.
Usage¶
gor ...  GROUP binsize [ attributes ]
nor ...  GROUP [ attributes ]
Options¶

Return the count for each bin. 

Return the number of distinct rows for each bin. 

Grouping columns (other than bin). 

String columns (ac has been deprecated). 

Integer columns. 

Floating valued columns. 

Calculate the min for any type of column. 

Calculate the median for any type of column. 

Calculate the max for any type of column. 

Calculate the number of distinct values for any type of column. 

Return a comma separated set with the distinct values in the column. 

Return a comma separated list with the values in the column. 

Specify the maximum column length of a set and a list. Defaults to 10000 chars. 

Throw an exception rather than truncating the list or set if it exceeds the maximum length. 

Calculate the avg of all numeric columns. 

Calculate the std of all numeric columns. 

Calculate the sum of all numeric columns. 

The number of sliding steps per group window. 

The separator for elements in lists and sets. 

Assume the grouping columns are ordered. 
Attributes ic
and fc
explicitly define columns of either integer type or floating point type, while the sc
attribute defines it as a string type. Any column defined with the gc
option and is a part of the following aggregation
options sc
, ic
or fc
will result in an error

Works with max, min and med. 

Works with max, min, med, avg, std and sum. 

Works with max, min, med, avg, std and sum. 
Ensure that there are no spaces between the numbers. Use SELECT to pick a subset of columns from the output.
Use binsize = chrom
to aggregate for a whole chromosome and binsize = genome
to aggregate for the entire genome.
When using GROUP in a NOR context, the ordered flag can both speed up the operation and reduce the memory usage significantly. Note that there are no checks to see if the order is correct  only use this option if the input stream is correctly ordered.
Examples¶
Group columns a and b on each chromosome and counting the number of groups.
gor ...  group chrom gc a,b count
Group column a for the whole genome and calculate the sum and average for columns b and c (which are defined as floating point columns).
gor ...  group genome gc a fc b,c ave sum
Group #dnsnp#
on reference for the whole genome and return it as a set with set attribute. This query should return a set
of A,C,G,T
(in undefined order)
gor #dbsnp#  where len(reference) = 1  group genome sc reference set
In some cases, we may wish to know which individual rows have been grouped together. In that case, the lis
argument comes in handy:
gor source/var/wgs_varcalls.gord s PN
 group 1 gc reference,call lis sc PN
 top 100
The query above will output a list on each variant showing which PN carried that variant.