Used in: gor/nor


The GROUP command allows you to group the output of your query according to a number of different criteria. The GROUP command is very similar to the “group by” function in SQL.

This command groups a set of rows into a set of summary row by values of columns. It returns one row for each group. We often use the GROUP command with aggregate functions such as -count, -cdist, -min, -med, -max, -dis, -set, -lis, -avg, -std, and -sum.

The aggregate functions allow us to perform a calculation of a set of rows while grouping.

Using GROUP in a nor query will give an error if binsize input parameter is used.


gor ... | GROUP binsize [ attributes ]
nor ... | GROUP [ attributes ]



Return the count for each bin.


Return the number of distinct rows for each bin.

-gc cols

Grouping columns (other than bin).

-sc cols

String columns (-ac has been deprecated).

-ic cols

Integer columns.

-fc cols

Floating valued columns.


Calculate the min for any type of column.


Calculate the median for any type of column.


Calculate the max for any type of column.


Calculate the number of distinct values for any type of column.


Return a comma separated set with the distinct values in the column.


Return a comma separated list with the values in the column.

-len number

Specify the maximum column length of a set and a list. Defaults to 10000 chars.


Throw an exception rather than truncating the list or set if it exceeds the maximum length.


Calculate the avg of all numeric columns.


Calculate the std of all numeric columns.


Calculate the sum of all numeric columns.

-steps number

The number of sliding steps per group window.

-s 'sep'

The separator for elements in lists and sets.


Assume the grouping columns are ordered.

Attributes -ic and -fc explicitly define columns of either integer type or floating point type, while the -sc attribute defines it as a string type. Any column defined with the -gc option and is a part of the following aggregation options -sc, -ic or -fc will result in an error


Works with -max, -min and -med.


Works with -max, -min, -med, -avg, -std and -sum.


Works with -max, -min, -med, -avg, -std and -sum.

Ensure that there are no spaces between the numbers. Use SELECT to pick a subset of columns from the output.

Use binsize = chrom to aggregate for a whole chromosome and binsize = genome to aggregate for the entire genome.

When using GROUP in a NOR context, the ordered flag can both speed up the operation and reduce the memory usage significantly. Note that there are no checks to see if the order is correct - only use this option if the input stream is correctly ordered.


Group columns a and b on each chromosome and counting the number of groups.

gor ... | group chrom -gc a,b -count

Group column a for the whole genome and calculate the sum and average for columns b and c (which are defined as floating point columns).

gor ... | group genome -gc a -fc b,c -ave -sum

Group #dnsnp# on reference for the whole genome and return it as a set with -set attribute. This query should return a set of A,C,G,T (in undefined order)

gor #dbsnp# | where len(reference) = 1 | group genome -sc reference -set

In some cases, we may wish to know which individual rows have been grouped together. In that case, the -lis argument comes in handy:

gor source/var/wgs_varcalls.gord -s PN
    | group 1 -gc reference,call -lis -sc PN
    | top 100

The query above will output a list on each variant showing which PN carried that variant.