RELREMOVE
The RELREMOVE command is used to remove related samples/individuals (PNs) from a phenotype relation.
The leftmost column must be the sample identifier (e.g. PN) and the other columns should represent one or more phenotypes, each with either case-control-unknown statuses or a quantitative trait (QT). QTs use ‘NA’ to represent missing values, while case-control phenotypes can be in Plink-compatible format (case/ctrl/unknown) = (2/1/9), (2/1/NA), (2/1/0) or in a format like (CASE/CTRL/EXCL) or (CASE/CTRL/NA).
Individuals that are “eliminated” are by default set to the unknown value, as defined in the column at hand; however, this can be overridden with the -rsymb option. Setting a distinct symbol is also useful for inspecting which samples are eliminated.
Cases and controls are by default treated as one group; however, when there is a relationship between a case and a control, the control is eliminated first. The -sepcc option can be used to ignore relationships across the case and control groups.
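To make the effect of separating cases and controls concrete, here is a minimal Python sketch of which related pairs would count for a single case-control column. It is a model for illustration only, not GOR code or the actual implementation: the function name `pairs_to_consider`, the `status` dictionary and the status labels are hypothetical, and the choice to skip pairs involving a sample with unknown status is an assumption.

```python
def pairs_to_consider(pairs, status, sepcc=False):
    """Return the related pairs that count for one case-control column.

    pairs  -- iterable of (pn1, pn2) tuples
    status -- dict mapping pn to 'case' or 'ctrl'; samples with an
              unknown status are simply absent (assumption)
    sepcc  -- if True, ignore pairs that cross the case/control boundary
    """
    kept = []
    for a, b in pairs:
        sa, sb = status.get(a), status.get(b)
        if sa is None or sb is None:
            continue  # assumption: unknown samples do not take part
        if sepcc and sa != sb:
            continue  # case-control pair, ignored when groups are separate
        kept.append((a, b))
    return kept


# Samples 1 and 2 are cases, sample 3 is a control.
status = {"1": "case", "2": "case", "3": "ctrl"}
pairs = [("1", "2"), ("3", "2")]
within_groups = pairs_to_consider(pairs, status, sepcc=True)
one_group = pairs_to_consider(pairs, status, sepcc=False)
```

With the groups separated, only the pair between the two cases is left to resolve, so the control is not at risk of elimination.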
The algorithm that eliminates relatives is greedy: it eliminates the sample with the most relatives first, then updates the relative counts after each elimination, and continues until no related pairs remain within each phenotype.
The relatives are supplied as a binary relation, e.g. (pn1,pn2). Note that this relation does not have to be symmetric because transitivity in relationships is assumed.
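The greedy strategy described above can be sketched in Python for a single phenotype column. This is an illustrative model under stated assumptions, not the implementation used by the command: `greedy_remove` is a hypothetical name, each pair is treated as undirected (one reading of the remark that the relation need not be symmetric), ties are broken by the smallest identifier, and case-control priority and weights are left out.

```python
from collections import defaultdict


def greedy_remove(samples, pairs):
    """Return the samples eliminated so that no related pair remains.

    samples -- iterable of sample identifiers with a known phenotype value
    pairs   -- iterable of (pn1, pn2) tuples; pairs are treated as undirected
    """
    alive = set(samples)
    relatives = defaultdict(set)
    for a, b in pairs:
        if a != b and a in alive and b in alive:
            relatives[a].add(b)
            relatives[b].add(a)

    removed = []
    while True:
        candidates = [s for s in alive if relatives[s]]
        if not candidates:
            break  # no related pairs are left
        # Most relatives first; ties go to the smallest identifier (assumption).
        worst = min(candidates, key=lambda s: (-len(relatives[s]), s))
        alive.discard(worst)
        for r in relatives.pop(worst):
            relatives[r].discard(worst)  # update counts of the remaining samples
        removed.append(worst)
    return removed


# A is related to B, C and D; removing A alone resolves every pair.
star = greedy_remove(["A", "B", "C", "D", "E"],
                     [("A", "B"), ("A", "C"), ("A", "D")])
```

In the star-shaped example a single elimination is enough, whereas dropping every sample that has any relative would remove four samples.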
Usage
nor phenotypes.tsv | RELREMOVE relative_relation [-sepcc] [-rsymb value]
Options
-rsymb value       Symbol to overwrite the definition of unknown/exclusion.
-sepcc             Treat cases and controls as separate groups when analyzing relationships.
-weightcol column  An extra column with integer weights from 0 to 100. Samples with larger weights are less likely to be removed. Non-integer values are treated as zero and numbers larger than 100 as 100.
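The handling of weight values described above can be sketched as a small Python function. This is only a model of the stated rule, not GOR code: `normalize_weight` is a hypothetical name, and mapping negative integers to 0 is an assumption, since the text only specifies the range 0 to 100.

```python
def normalize_weight(raw: str) -> int:
    """Map a raw weight value to an integer in the range 0..100."""
    try:
        weight = int(raw.strip())
    except ValueError:
        return 0  # non-integer values are treated as zero
    # Values above 100 become 100; negative values become 0 (assumption).
    return max(0, min(100, weight))


examples = [normalize_weight(v) for v in ["20", "250", "1.5", "abc"]]
```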
Examples
Eliminating relatives for 100 phenotypes in two steps: first closely related individuals, then less closely related individuals.
nor pheno.tsv | select PN,pheno1-pheno100
 | relremove <(nor relatives.tsv | where kinship >= 0.2 | select pn1,pn2)
 | relremove <(nor relatives.tsv | where kinship >= 0.05 | select pn1,pn2)
A short example showing how samples 2 and 3 are kept: sample 2 because of its weight and sample 3 because of the -sepcc option.
norrows 1 | calc pn '1,2,3' | calc weight '10,20,10' | select pn,weight | calc pheno '2,2,1' | split pn,weight,pheno
 | relremove <(norrows 1 | calc pn1 '1,3' | calc pn2 '2' | select pn1,pn2 | split pn1) -rsymb elim -sepcc -weightcol weight
Below are examples of self-contained tests that may help to explain the command.
/* Generate 100k samples */
create #pns# = norrows 100000 | calc pn #1 | select pn;
/* Generate artificial relationships */
create #r# = nor [#pns#] | multimap -cartesian <(norrows 100 | group -lis -sc #1)
 | replace #2 listfilter(listmap(#2,'round(10000*random())'),'random()<0.05') | rename #1 pn1 | rename #2 pn2
 | split pn2 | where pn2 != '' and pn1 != pn2;
/* Create several phenotypes */
create #pheno# = nor [#pns#]
 | calc pheno1 if(random()<0.01,'NA',str(random()))
 | calc pheno2 mod(pn,3)
 | calc pheno3 pheno1
 | calc pheno4 decode(pheno2,'0,NA,1,0,2,1')
 | calc pheno5 decode(pheno2,'0,9,1,0,2,1');
/* Test if identical columns are treated in the same way */
create #t1# = nor [#pheno#] | relremove [#r#] -rsymb hakon | throwif pheno3 != pheno1 | top 1;
/* Test if identical columns are treated in the same way with the sepcc option */
create #t2# = nor [#pheno#] | relremove [#r#] -rsymb hakon -sepcc | throwif pheno3 != pheno1 | top 1;
/* Test if identical columns are treated in the same way with no options */
create #t3# = nor [#pheno#] | relremove [#r#] | throwif pheno3 != pheno1 | top 1;
/* Test if elim counts of identical columns are the same and that the sepcc option reduces the number of eliminated rows for case-control phenotypes */
create #t4# = nor [#pheno#] | relremove [#r#] -rsymb elim | unpivot 2- | where col_value = 'elim' | group -gc col_name -count
 | pivot col_name -v pheno1,pheno2,pheno3,pheno4,pheno5 | rename (.*)_allcount #{1}
 | multimap -cartesian <(nor [#pheno#] | relremove [#r#] -sepcc -rsymb elim | unpivot 2- | where col_value = 'elim' | group -gc col_name -count
 | pivot col_name -v pheno1,pheno2,pheno3,pheno4,pheno5 | rename (.*)_allcount #{1})
 | throwif pheno1 != pheno3 or pheno2 != pheno4 or pheno2 != pheno5 or pheno2 != pheno5 or pheno2 < pheno2x or pheno4 < pheno4x or pheno5 < pheno5x;
/* Test if there are relatives after elimination */
create #t5# = nor [#pheno#] | select pn,pheno1 | relremove [#r#] -rsymb elim | where pheno1 != 'elim' and pheno1 != 'NA' | multimap -c pn [#r#]
 | multimap -c pn2 <(nor [#pheno#] | select pn,pheno1 | relremove [#r#] -rsymb elim | where pheno1 != 'elim' and pheno1 != 'NA') | throwif 2=2;
/* Test if there are relatives after elimination within either case or ctrl groups */
create #t6# = nor [#pheno#] | select pn,pheno4 | relremove [#r#] -rsymb elim | where pheno4 != 'elim' and pheno4 != 'NA' | multimap -c pn [#r#]
 | multimap -c pn2 <(nor [#pheno#] | select pn,pheno4 | relremove [#r#] -rsymb elim | where pheno4 != 'elim' and pheno4 != 'NA') | throwif 2=2;
/* Test if there are relatives after elimination within the same case-ctrl group */
create #t7# = nor [#pheno#] | select pn,pheno4 | relremove [#r#] -rsymb elim -sepcc | where pheno4 != 'elim' and pheno4 != 'NA'
 | multimap -c pn [#r#] | multimap -c pn2 <(nor [#pheno#] | select pn,pheno4 | relremove [#r#] -rsymb elim -sepcc
 | where pheno4 != 'elim' and pheno4 != 'NA') | where pheno4 = pheno4x | throwif 2=2;
/* Test if there are fewer eliminations than with a simple method */
create #t8# = nor [#pheno#] | select pn,pheno4 | relremove [#r#] -rsymb elim | where pheno4 = 'elim' | group -count | calc method 'relremove'
 | merge <(nor [#pheno#] | select pn,pheno4
 | inset -c pn -b <(nor [#r#] | calc pn pn1+','+pn2 | select pn | split pn ) | where inset = 1 | group -count | calc method 'simple')
 | pivot method -v relremove,simple | throwif relremove_allcount > simple_allcount;
/* Check that samples with the max number of relatives are eliminated and that samples with no relatives are kept */
create #t9# = nor [#pheno#] | select pn | calc pheno random() | relremove [#r#] -rsymb elim
 | map -c pn -m 1000 <(nor [#r#] | calc pn pn1+','+pn2 | select pn | split pn | group -gc pn -count | rank allcount -o desc)
 | throwif rank_allcount > 0 and rank_allcount < 5 and pheno != 'elim' or allcount = 0 and pheno = 'elim' | top 10;
nor [#t9#] | top 1