Functions

The following tables contain all of the functions that can be used in WHERE and CALC expressions in GOR queries.

EVAL(string) : string

Returns the result of a GOR/NOR query

Boolean Functions

Certain functions in GOR can be used to return a Boolean result (i.e. 0 for false, 1 for true) or a conditional result. By their nature, Boolean-type functions cannot be used with all CALC and WHERE commands (as other functions can). For example, the IF function returns one of two values based on a condition set in the function. As a result, IF can only be used with CALC commands.

Boolean functions in GOR query language

Function

Description

More info

IF(cond,any,any) : any

Returns the second parameter if the condition in the first parameter is true, otherwise the third parameter.

IF

ISINT(col) : bool

Test if a column has integer value.

ISINT

ISFLOAT(col) : bool

Test if a column has floating point value.

ISFLOAT

LISTHASANY(str,lit-list) : bool

Returns true if the comma separated list has an element in the lit-list.

CONTAINSANY

CSLISTHASANY(str,lit-list) : bool

Case-sensitive version of CONTAINSANY

CONTAINSANY

CONTAINS(str,lit-list) : bool

Returns true if the string contains all the string literals in the list. Not case-sensitive.

CONTAINS, CONTAINSALL

CONTAINSALL(str,lit-list) : bool

Returns true if the string contains all the string literals in the list. Not case-sensitive. Same as CONTAINS, CONTAINSALL.

CONTAINS, CONTAINSALL

CONTAINSANY(str,lit-list) : bool

Returns true if the string contains any of the string literals in the list. Not case-sensitive. Same as CONTAINS, CONTAINSALL.

CONTAINS, CONTAINSALL

CONTAINSCOUNT(str,lit-list) : int

Returns the number of elements in the comma separated list that are in the lit-list.

CONTAINSCOUNT

CSCONTAINS(str,lit-list) : bool

Case-sensitive version of CONTAINS, CONTAINSALL.

CONTAINS, CONTAINSALL

CSCONTAINSALL(str,lit-list) : bool

Returns true if the string contains all the string literals in the list. Not case-sensitive. Same as CONTAINS, CONTAINSALL.

CONTAINS, CONTAINSALL

CSCONTAINSANY(str,lit-list) : bool

Returns true if the string contains any of the string literals in the list. Not case-sensitive. Same as CONTAINS, CONTAINSALL.

CONTAINS, CONTAINSALL

Type and String Conversion

Type and string conversion in GOR query language

Function

Description

More info

STR(num) : str

Convert number to string.

STR

STRING(num) : str

Same as STR.

STR

FLOAT(str) : float

Convert string to float.

FLOAT

FLOAT(str, float) : float

Convert string to float with default value.

FLOAT

NUMBER(str) : float

Same as FLOAT.

FLOAT

INT(str) : int

Convert string to integer.

INT

BASE26(num) : str

Converts an integer number to base 26 representation.

BASE26

BASEPN(num) : str

Converts an integer number to PN-like ID, e.g. AAAAAAA

BASEPN

LEN(str) : int

The length of the string.

LEN

REVERSE(str) : str

The string reversed.

REVERSE

TRIM(str) : str

Trims whitespace from the beginning and end of a string.

TRIM

MD5(str) : str

The MD5 message-digest of the string.

MD5

IOOA(str) : int

In order of appearance, i.e. returns an integer number based on the order in which str value shows up.

IOOA

UPPER(str) : str

Upper-case a string.

UPPER

LOWER(str) : str

Lower-case a string.

LOWER

LEFT(str,int) : str

Return the specified left-most characters in a string.

LEFT

RIGHT(str,int) : str

Return the specified right-most characters in a string.

RIGHT

SUBSTR(str,int,int) : str

A substring. A zero-based range, not including the last position.

SUBSTR

MID(str,int,int) : str

A substring. A zero-based starting position and size.

MID

REPLACE(str,str,str) : str

Replace from a string a pattern with the new string.

REPLACE

POSOF(str,str) : int

The position within a string of a string. -1 if not found.

POSOF

BRACKETS(string) : string

Adds brackets around the string, e.g. LISTMAP(LISTZIP(col1,col2),’BRACKETS(x)’).

BRACKETS

UNBRACKET(string) : string

Removes surrounding brackets from a string.

UNBRACKET

SQUOTE(string) : string

Adds single quotes around the string, e.g. LISTMAP(LISTZIP(col1,col2),’SQUOTE(x)’).

SQUOTE

SUNQUOTE(string) : string

Removes surrounding single quotes from a string.

SUNQUOTE

DQUOTE(string) : string

Adds double quotes around the string, e.g. LISTMAP(LISTZIP(col1,col2),’DQUOTE(x)’).

DQUOTE

DUNQUOTE(string) : string

Removes surrounding double quotes from a string.

DUNQUOTE

FORM(num,int,int) : str

Format a number, specifying the space and decimals (C-style %x.yf format style).

FORM

EFORM(num,int,int) : str

Format a number, specifying the space and decimals (C-style %x.ye format style).

EFORM

GFORM(num,int,int) : str

Format a number, specifying the space and decimals (C-style %x.yg format style).

GFORM

REGSEL(str,str-const) : str

Extract a single substring based on regular expression binding pattern, e.g. with single brackets.

REGSEL

DECODE(str,str) : str

Decode (map) the string, given key-value mapping pairs.

DECODE

Algebraic Functions

Algebraic functions in GOR query language

Function

Description

More info

FLOOR() : int

Floor value, e.g. the largest integer that is smaller or equal.

FLOOR

CEIL(num) : int

Ceiling value, e.g. the smallest integer that is larger or equal.

CEIL

ROUND(num) : int

The closest integer.

ROUND

DIV(int,int) : int

The quotient in integer division.

DIV

MOD(int,int) : int

The remainder in integer division.

MOD

POW(num,num) : float

Raise the first parameter to the power of the second.

POW

MIN(num,num) : num

Minimum.

MIN

MAX(num,num) : num

Maximum.

MAX

SQRT(num) : float

Square root.

SQRT

SQR(num) : float

Square.

SQR

ABS(num) : num

The absolute value, e.g. abs(x) = |x|.

ABS

LOG(num) : float

10-based logarithm.

LOG

LN(num) : float

Natural logarithm.

LN

EXP(num) : float

The natural exponent, e.g. exp(x) = e^x.

EXP

RANDOM() : num

A random floating point number.

RANDOM

SEGDIST(x,y,a,b) : num

The distance between two segments. Returns 0 if segments overlap.

SEGDIST

SEGOVERLAP(x,y,a,b) : num

Determines if two segments overlap.

SEGOVERLAP

Trigonometric Functions

Trigonometric functions in GOR query language

Function

Description

More info

SIN(num) : float

Sine based on radians.

SIN

COS(num) : float

Cosine based on radians.

COS

ASIN(num) : float

Arc-sine or inverse sine.

ASIN

ACOS(num) : float

Arc-cosine.

ACOS

TAN(num) : float

Tangents based on radians.

TAN

ATAN(num) : float

Arc-tangents.

ATAN

Statistical Functions

Algebraic functions in GOR query language

Function

Description

More info

CHI(int,int,int,int) : float

The chi-squared value in a 2x2 table.

chi_2x2

CHI2(float,float) : float

Chi-square complement.

chi2

CHISQUARE(float,float) : float

Chi-square

CHISQUARE

CHISQUARECOMPL(float,float)

Chi-square complement. Same as CHI2.

chi2

PVAL(int,int,int,int) : float

The 2-sided Fisher-exact or Chi-square based p-val for a 2x2 table.

pval

PVALONE(int,..,int) : float

The one-sided Fisher-exact or Chi-square based p-val for a 2x2 table.

pvalone

STUDENT(int,float) : float

Student T distribution. (degrees of freedom, integration end point)

student

INVSTUDENT(float,int) : float

Inverse Student T distribution (alpha, size).

invstudent

NORMAL(float) : float

Normal distribution

normal

INVNORMAL(float) : float

Inverse Normal distribution.

invnormal

POISSON(int,float) : float

Poisson distribution.

poission

POISSONC(int,float) : float

Poisson complement. See http://acs.lbl.gov/software/colt/api/cern/jet/stat/Probability.html

poissonc

List Functions

The table below shows a complete list of functions for dealing with lists in the GOR query language.

The first input value for any list function is the source of the list. This can be explicitly set by either by listing elements in quotation marks or by referencing a column in the input stream. The other input values vary depending on the function. String inputs must be input

With functions that allow a sep or separator value, note that the separator value in the list is assumed to be a comma and the separator value is optional. If a separator other than a comma is used in the list, the separator value should be defined in quotation marks.

Click on “More info” for any of the functions below to get more information about how to use the function in a GOR query.

List functions in GOR query language

Function

Description

More info

LISTHASANY(str,lit-list) : bool

Returns true if the comma separated list has an element in the literal list.

LISTHASANY

LISTHASCOUNT(str,lit-list) : int

Returns a number indicating how many items in the literal list are contained in the list defined in the first parameter.

LISTHASCOUNT

CONTAINS(str,lit-list) : bool

Returns true if the string contains all the string literals in the list. Not case-sensitive.

CONTAINS, CONTAINSALL

CSCONTAINS(str,lit-list) : bool

Returns true if the string contains all the string literals in the list. Case-sensitive.

CONTAINS, CONTAINSALL

CONTAINSCOUNT(str,lit-list) : int

Returns a number indicating how many items in the literal list are contained in the string. Also try CONTAINSANY, CSCONTAINSANY, CSCONTAINSCOUNT, LISTHASCOUNT, and CSLISTHASCOUNT.

CONTAINSCOUNT

LISTFIRST(str[,sep]) : str

Retrieves the first element (i.e. the head) in a comma-separated list, with an optional separator argument.

listfirst

LISTSECOND(str[,sep]) : str

Second element in a comma separated list, with an optional separator argument.

LISTSECOND

LISTNTH(str,int[,sep]) : str

Nth element in a comma separated list, with an optional separator argument.

listnth

LISTLAST(str[,sep]) : str

Last element in a comma separated list, with an optional separator argument.

LISTLAST

LISTTAIL(str[,sep]) : str

The tail (the list minus the first element), with an optional separator argument.

listtail

LISTREVERSE(str[,sep]) : str

The list reversed, with an optional separator argument.

LISTREVERSE

LISTSORTASC(str) : str

The list sorted alphabetically in a ascending order.

LISTSORTASC

LISTSORTDESC(str) : str

The list sorted alphabetically in a descending order.

listsortdesc

LISTNUMSORTASC(str) : str

The list sorted numerically in a ascending order.

listnumsortasc

LISTNUMSORTDESC(str): str

The list sorted numerically in a descending order

listnumsortdesc

LISTTRIM(str) : str

A comma separated list trimmed from white-spaces.

listtrim

LISTDIST(str) : str

The distinct elements in the list, i.e. corresponding set.

listdist

LISTMAX(str) : str

The maximum element (element as string).

listmax

LISTMIN(str): str

The minimum element (element as string).

listmin

LISTSIZE(str[,sep]) : int

The size of the list, with an optional separator argument.

listsize

LISTCOUNT(str[,sep]) : int

Count frequency of elements in a list, returning the pairs (e1;count1,..,en;countn), with an optional separator argument.

listcount

LISTNUMMAX(str) : float

The maximum element (element as number).

listnummax

LISTNUMMIN(str): float

The minimum element (element as number).

listnummin

LISTNUMSUM(str) : float

The sum of the elements (element as numbers).

listnumsum

LISTNUMAVG(str) : float

The average of the elements (element as numbers).

listnumavg

LISTNUMSTD(str) : float

The unbiased standard deviation of the elements (element as numbers).

listnumstd

LISTINDEX(str,str[,sep]) : int

The one based index to a list of elements where a target is found. The function will search the list defined in parameter 1 for the first instance of parameter 2. Optional separator argument.

listindex

LISTMAP(str,str-con[,sep]) : str

The list translated using expression provided in second argument. Element denoted with x. Optional separator argument.

LISTMAP

LISTFILTER(str,str-con[,sep]) : str

The list filtered using expression provided in second argument. Element as x, index as i. Optional separator argument. Example LISTFILTER(col,’x != 1’) or LISTFILTER(col,’i > 2’)

LISTFILTER

LISTZIP(str,str[,sep,delim]) : str

Two lists zipped together, each pair of elements separated with a semicolon. Optional separator and delimter arguments.

LISTZIP

LISTZIPFILTER(str,str,str-con[,sep,delim]) : str

Filter the first list by the content of the second list, with an optional separator argument. Optional separator and delimter arguments.

listzipfilter

LISTCOMB(str,int,int) : str

Returns a semi-comma-separated list of all combinations of elements in the input list of length within the interval specified by the input integers.

LISTCOMB

LISTADD(str,str[,sep]) : str

Returns a list with the given item added to the end, with an optional separator argument.

LISTADD

FSVMAP(str,int,str-con,str) : str

The list of equally separated values (second argument) translated using expression provided in third argument. Fourth argument is the result separator. Element denoted with x. Example FSVMAP(col,2,’x+1’,’,’). Also see LISTMAP.

fsvmap

COLS2LIST(str[,sep]) : str

Collapse values from multiple columns into a single list, with an optional separator argument.

COLS2LIST

COLS2LISTMAP(str,str[,sep]) : str

Collapse values from multiple columns into a single list with an expression applied, with optional separator argument

COLS2LISTMAP

Genomic-Specific Functions

Genomic-Specific functions in GOR query language

Function

Description

More info

HAPLDIFF(str,str) : int

The Hamming-like distance between two haplotype strings.

hapldiff

VARSIG(str,str) : str

Variant signature.

varsig

REVCOMPL(str) : str

Reverse complement of a DNA sequence string.

revcompl

RC(str) : str

Shorthand for REVCOMPL(str).

revcompl

REVCIGAR(str) : str

BAM cigar string for the corresponding reverse complement sequence.

revcigar

REFBASE(str,int) : str

The reference base at the given locus, based on the build specified in the gor_config.txt file.

refbase

REFBASES(str,int,int) : str

The reference bases, based on the build specified in the gor_config.txt file.

refbases

BAMTAG(col,str) : str

Extract a single substring from an attribute value TAG_VALUE-like field (as in BAM files).

bamtag

TAG(col,str,sep) : str

Extract a single substring from an attribute value field (as in GFF or VCF files, e.g. use semicolon ‘;’ as separator).

tag

VCFFORMATTAG(str,str,str) : str

Gets a value from a vcf file. Parameters are: <name of the format column that contains the format of the data column>, <the name of the data column that contains data for a given PN, the column name is the same as the pn>, <the tag to get value for>.

vcfformattag

IHA(str,str) : int

Returns 1 if IUPAC genotype string contains SNP allele str, zero otherwise.

iha

IUPAC2GT(str) : str

Converts IUPAC genotype to ‘A1/A2’ genotype.

iupac2gt

IUPACGTSTAT(str,str) : str

Input IUPAC genotypes for subject, father and mother. Returns ‘0’ if IHE, ‘1’ if OK, and ‘2’ if OK and phase-able.

iupacgtstat

IUPACFA(str,str,str) : str

Returns the SNP allele of the father. Only valid if IUPACGTSTAT returns 2.

iupacfa

IUPACMA(str,str,str) : str

Returns the SNP allele of the mother. Only valid if IUPACGTSTAT returns 2.

iupacma

GTSHARE(str,int,str,str,int,str,str) : int

Input two (pos,Ref,Alleles) genotypes where Alleles = ‘All1,All2,..’ or ‘All1/All2/..’ or ‘All1|All2’. Returns the number of identical allels based on all pairwise comparisons between Alleles1 and Alleles2. First parameter is chr.

gtshare

GTSTAT(int,str,str,int,str,str,int,str,str) : str

Input (pos,Ref,Alt) genotypes for subject, father and mother. Returns ‘0’ if IHE, ‘1’ if OK, and ‘2’ if OK and phase-able.

gtstat

GTFA(int,str,str,int,str,str,int,str,str) : str

Returns the Alt allele of the father. Only valid if GTSTAT returns ‘2’.

gtfa

GTMA(int,str,str,int,str,str,int,str,str) : str

Returns the Alt allele of the mother. Only valid if GTSTAT returns ‘2’.

gtma

INDAG(dag file,str-cont) : bool

Not a standard function. Should be used as | where go_id INDAG(‘go.txt’,’GO:111111’) or INDAG([#temp#],’GO:111111’)

indag

Genotype-quality Functions

Genotype-quality functions in GOR query language

Function

Description

More info

CHARS2PRHOM(str) : float

Turn 2 chars into homozygous genotype prob

chars2prom

CHARS2PRHET(str) : float

Turn 2 chars into heterozygous genotype prob

chars2prhet

CHARS2DOSE(str) : float

Turn 2 chars into genotype dosage

chars2dose

CHARS2PRPRPR(str) : str

Turn 2 chars, round((1.0-pr)*93.0)+33 , into genotype prob triplet (Pr(gt=0),Pr(gt=1),Pr(gt=2)). Two spaces (c32c32) map to ‘0;0;0’ (unknown). Assumes non-phased.

chars2prprpr

CHARS2PRPR(str) : str

Turn 2 chars, round((1.0-pr)*93.0)+33 , into genotype prob doublet (Pr(gt=1),Pr(gt=2)).

chars2prpr

CHAR2PR(str) : str

Turn on char into probability, e.g CHAR2PR(‘!’) = 1.0, CHAR2PR(‘~’)=0.

chars2pr

PR2CHAR(str) : str

Turn probability to characters, i.e. the semi-inverse of CHAR2PR.

pr2char

PRPR2CHARS(str) : str

Turn probability pair to chars, e.g. PRPR2CHAR(‘1.0;0.0’) = ‘!~’

prpr2chars

PRPRPR2CHARS(str) : str

Turn probability triplet to chars, e.g. PRPR2CHAR(‘1.0;0.0;0.0’) = ‘!~’

prprpr2chars

PRPR2CHARS(str,sep) : str

Turn probability pair to chars, e.g. PRPR2CHAR(‘1.0;0.0’) = ‘!~’ with custom separator

prpr2chars

PRPRPR2CHARS(str,sep) : str

Turn probability triplet to chars, e.g. PRPRPR2CHAR(‘1.0;0.0’) = ‘!~’ with custom separator

prprpr2chars

CHARS2GT(str,float) : str

Turn 2 chars, round((1.0-pr)*93.0)+33 , into genotype 0,1,2 if prob >= thresh else 3. Two spaces (c32c32) map to 3 (unknown). Assumes non-phased probabilities.

chars2gt

CHARSPHASED2GT(str,float) : str

Turn 2 chars, round((1.0-pr)*93.0)+33 , into phased genotype 0,1,2 if prob >= thresh else 3. Two spaces (c32c32) map to 3 (unknown). Assumes probabilities of phased haplotypes. Pr(gt=0) = (1.0 - pfalt) * (1.0 - pmalt), Pr(gt=1) = (1.0 - pfalt) * pmalt + pfalt * (1.0 - pmalt), Pr(gt=2) = pfalt * pmalt, where pfalt and pmalt are the chars probabilities values.

charsphased2gt

Date Functions

Date functions in GOR query language

Function

Description

More info

ADDDAYS(string, string, int) : string

Add some days to the given date.

ADDDAYS

ADDMONTHS(string, string, int) : string

Add some months to the given date.

ADDMONTHS

ADDYEARS(string, string, int) : string

Add some years to the given date.

ADDYEARS

CURRENTDATE() : string

The current date and time in the format ‘yyyy-MM-dd HH:mm:ss’ - same as DATE.

CURRENTDATE

CURRENTDATE(string) : string

The current date and time in a specific format - same as DATE.

DATE

DATE() : string

The current date and time in the format ‘yyyy-MM-dd HH:mm:ss’.

DATE

DATE(string) : string

The current date and time in a specific format, defined by a string of characters that represent date/time units.

DATE

DAYDIFF(string, string, string) : int

The difference, in days, between two dates.

DAYDIFF

DAYOFWEEK(string, string) : int

The day of week of the given date.

DAYOFWEEK

DAYOFMONTH(string, string) : int

The day of month of the given date.

DAYOFMONTH

DAYOFYEAR(string, string) : int

The day of year of the given date.

DAYOFYEAR

EDATE(long) : string

A specific time, indicated by a timestamp, in the format ‘yyyy-MM-dd HH:mm:ss’.

edate

EDATE(long,string) : string

A specific time, indicated by a timestamp, in a specific format. The format is defined in the same way as with date(string).

edate

EPOCH() : long

A timestamp of the current time.

epoch

EPOCH(string,string) : long

A timestamp of a specific time, indicated with a specified format. The format is defined in the same way as with date(string) and edate(long, string). Example: epoch(‘16/06/2017’,’dd/MM/yyyy’).

epoch

MONTH(string, string) : int

The month of the given date.

MONTH

MONTHDIFF(string, string, string) : int

The difference, in months, between two dates.

MONTHDIFF

YEAR(string, string) : int

The year of the given date.

YEAR

YEARDIFF(string, string, string) : int

The difference, in years, between two dates.

YEARDIFF

Administration Functions

Diagnostic Functions

Diagnostic functions in GOR query language

Function

Description

More info

TIME() : int

The time in milli seconds since the query started.

time

SLEEP(int) : string

Sleep for given milliseconds while processing each row

sleep

HOSTNAME() : string

Name of the host running the query

hostname

IP() : string

IP number of the host running the query

ip

ARCH() : string

CPU architecture of the host running the query

arch

THREADID() : int

Thread id of the thread running the query

threadid

CPULOAD() : float

The cpuload of the process running the query

cpuload

SYSCPULOAD() : float

The cpuload on the system running the query

syscpuload

FREE() : float

Free physical memory on the system running the query

free

FREEMEM() : float

Free memory on the system running the query

freemem

TOTALMEM() : float

Total memory on the system running the query

totalmem

MAXMEM() : float

Maximum memory of the process running the query

maxmem

AVAILCPU() : int

Number of available cpus on the system

availcpu

OPENFILES() : int

Number of open filedescriptors on the system

openfiles

MAXFILES() : int

Maximum number of file descriptors

maxfiles

SYSTEM(string) : string

Returns one line from the stdout of a whitelisted system command

system

AVGSEEKTIMEMILLIS() : float

Returns the average seektime for the current rowSource in milliseconds

avgseektimemillis

AVGROWSPERMILLIS() : float

Returns average rows per millisecond for the current rowSource

avgrowspermillis`

AVGBASESPERMILLIS() : float

Returns average bases per millisecond for the current rowSource

avgbasespermillis

Version Information

Version information in GOR query language

Function

Description

More info

GORVERSION() : str

Returns the GOR version

gorversion

MAJORVERSION() : int

Returns the major version of GOR

majorversion

MINORVERSION() : int

Returns the minor version of GOR

minorversion

JAVAVERSION() : str

Returns the JRE version

javaversion