Glossary

base-quality

The base quality or Phred quality score is a measure of the quality of the identification of the nucleobases generated by automated DNA sequencing. Essentially, this is an indication of the likelihood of this base call being correct.

bioinformatician

A person who applies information technology to biological, medical, and health research, using computational tools to gather and analyze data in fields such as genomics. In other words, you or someone sitting next to you.

Cartesian join

A join of every row of one table to every row of another table. For example, if table A has 100 rows and is joined with table B, which has 1,000 rows, a Cartesian join will result in 100,000 rows.
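The row count in the example above can be checked with a minimal Python sketch. The table contents and the column names `a_id` and `b_id` are made up for illustration; this is not GOR syntax.

```python
from itertools import product

# Two hypothetical tables: A with 100 rows, B with 1,000 rows.
table_a = [{"a_id": i} for i in range(100)]
table_b = [{"b_id": j} for j in range(1000)]

# A Cartesian join pairs every row of A with every row of B.
cartesian = [{**a, **b} for a, b in product(table_a, table_b)]

print(len(cartesian))  # 100 * 1,000 = 100000 rows
```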

CIGAR

A CIGAR is a string that describes how an individual read aligns with the larger reference sequence. A CIGAR may consist of one or many components, with each component having an operator and the number of bases that the operator applies to. The operators are D, H, I, M, N, P, S, X, and =.
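As a rough illustration of the component structure, the following Python sketch splits a CIGAR string into (count, operator) pairs. The function names are hypothetical, and the rule for which operators consume reference bases (M, D, N, = and X) is taken from the SAM format specification rather than from this manual.

```python
import re
from typing import List, Tuple

# One component: a base count followed by a single operator.
CIGAR_COMPONENT = re.compile(r"(\d+)([DHIMNPSX=])")

# Operators that advance along the reference (per the SAM specification).
REFERENCE_CONSUMING = set("MDN=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (count, operator) components."""
    components = CIGAR_COMPONENT.findall(cigar)
    # Reject strings that contain anything other than valid components.
    if "".join(count + op for count, op in components) != cigar:
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(count), op) for count, op in components]


def reference_length(cigar: str) -> int:
    """Number of reference bases spanned by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REFERENCE_CONSUMING)


print(parse_cigar("8M2I4M1D3M"))
print(reference_length("8M2I4M1D3M"))
```

In the sample string, the 2-base insertion does not advance along the reference, while the 1-base deletion does.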

Clinical Sequence Analyzer

The Clinical Sequence Analyzer (CSA) is a GUI tool used for data mining and report generation from raw genetic data.

equi-join

A join that contains a condition with an equality operator. An equi-join will return only those rows with equivalent values in the columns specified. These are denoted in GOR with the options -xl and -xr in a JOIN command.
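The idea of an equi-join can be sketched in Python as follows. This illustrates the general concept only, not GOR's JOIN command or its -xl and -xr options; the function name, tables, and column names are hypothetical.

```python
from collections import defaultdict


def equi_join(left, right, left_key, right_key):
    """Return merged rows where left[left_key] == right[right_key]."""
    # Index the right table by its key so each left row is matched quickly.
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[left_key], [])
    ]


# Hypothetical example tables.
variants = [
    {"variant": "v1", "gene": "BRCA1"},
    {"variant": "v2", "gene": "TP53"},
    {"variant": "v3", "gene": "UNKNOWN"},
]
genes = [
    {"gene_symbol": "BRCA1", "chrom": "chr17"},
    {"gene_symbol": "TP53", "chrom": "chr17"},
]

joined = equi_join(variants, genes, "gene", "gene_symbol")
for row in joined:
    print(row)
```

Only rows with equal values in the two key columns appear in the result, so the `v3` row, which has no matching gene, is dropped.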

genome

The complete set of genes or genetic material present in a cell or organism.

genomic ordered data

Genomic data ordered by the genomic position of the data.

GOR

Genomic Ordered Relation. The main component of GOR is the GOR data processing language, but other components are the GORServer, GORWorker, and AppServer.

The term “GOR” may be used in reference to:

The declarative query language used to structure commands to access information in the GOR database.
The GOR database itself.
The GOR architecture as a whole.
The act of merging two streams together, e.g. to “gor” together two files.
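The last sense, merging two genomic-ordered streams so that the result is still ordered, can be sketched in Python. This is an illustration only and not how GOR is implemented; the rows are invented, and ordering chromosomes by plain string comparison is an assumption made for the sketch.

```python
import heapq

# Two hypothetical streams, each already ordered by (chromosome, position).
stream_a = [("chr1", 100, "A"), ("chr1", 500, "B"), ("chr2", 50, "C")]
stream_b = [("chr1", 200, "X"), ("chr2", 10, "Y")]


def genomic_key(row):
    """Sort key: chromosome (assumed string order), then position."""
    return (row[0], row[1])


# heapq.merge lazily interleaves sorted inputs, reading each stream only once.
merged = list(heapq.merge(stream_a, stream_b, key=genomic_key))

for row in merged:
    print(row)
```

Because both inputs are already ordered, the merge never needs to sort the combined data; it only compares the current head of each stream.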

GORpipe

A command line interface for the GOR language.

GOR stream

A stream of data that is genomic-ordered. In other words, the output from a GOR query.

GOR Query Language

The subject of this manual. A query language for processing, filtering, and outputting genomic-ordered (and non-ordered) relational data.

literal list

A list of items, each enclosed in quotation marks and separated by commas. E.g., ‘apples’, ‘bananas’, ‘oranges’.

locus

Pl. loci. A fixed position on a chromosome, like the position of a gene or a marker (genetic marker).

mapfile

A tab-separated value file, preferably with a header, that is used with the MAP and MULTIMAP commands to annotate data in the GOR system.

NOR

Non-Ordered Relations. A subset of the commands in the GOR query language is also usable in NOR.

Phred scale

The Phred scale or “Phred Quality Score” is a unit of measurement for base quality. See also: base-quality.
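For reference, a Phred quality score Q is conventionally related to the probability P that a base call is wrong by Q = -10 × log10(P). The following Python sketch converts in both directions; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call is wrong, given its Phred score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for a given probability of an incorrect base call."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

On this scale, a score of 20 corresponds to a 1 in 100 chance that the call is wrong, and a score of 30 to a 1 in 1,000 chance.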

Pipe steps

Commands in a GOR query that manipulate the data that are returned by a source command.

proband

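The individual affected by a genetic condition who is the first in a family to come to medical attention, and who serves as the starting point for the genetic study of that family.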

Sequence Miner

A GUI tool that enables deep data mining and custom queries on top of raw genetic data as well as derived data.

Source commands

Commands in the GOR query language that start GOR queries and generate some data to work with.

stream

In the GOR query language, a stream is a set of data that is being output. The terminology comes from the GOR architecture's link to pipe syntax, where commands are visualised as sections of pipe (as in plumbing) and the data going through them are thought of as “streams” (as in water). See also: stream of segments, input stream, upstream/downstream.

variants

All the different ways that one person’s DNA sequence can differ from the reference DNA sequence (e.g. single nucleotide polymorphisms, insertions, deletions, substitutions, structural variants).

VEP

The Variant Effect Predictor, or VEP, determines the effect of your variants (see above) on genes, transcripts, and protein sequence, as well as regulatory regions.

zero-based position

A numbering format that starts from zero, where individual bases in the genomic sequence occupy the spaces between the numbers. UCSC is an example of a 0-based system, whereas other systems, such as Ensembl, are 1-based. GOR is 1-based.
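Converting between the two conventions is a common source of off-by-one errors. The Python sketch below assumes that the 0-based intervals are half-open (the start is included, the end is excluded, as in the BED format) and that the 1-based intervals are closed (both ends included). The function names are hypothetical.

```python
from typing import Tuple


def zero_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open interval to a 1-based closed interval."""
    return start + 1, end


def one_to_zero_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based closed interval to a 0-based half-open interval."""
    return start - 1, end


# The first ten bases of a chromosome in each convention.
print(zero_to_one_based(0, 10))
print(one_to_zero_based(1, 10))
```

Only the start coordinate changes. In both conventions, the interval length is the same: `end - start` for 0-based half-open and `end - start + 1` for 1-based closed.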