Relationship analysis with King and Queen¶

Relcheck_KingQueen

In [1]:

%%capture
# load the magic extension and imports
%reload_ext nextcode
import pandas as pd
import GOR_query_helper as GQH
%env LOG_QUERY=1

project = "clin-hg19"
%env GOR_API_PROJECT={project}

Relationship analysis¶

An important part of case-study analysis is to perform QC on the samples and understand the relationship between the samples. In order to estimate the relationship, we use the KING algorithm (by Manichaikul A, etal. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010 Nov 15;26(22):2867-73, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3025716/). The KING algorithm uses common variants to provide a robust estimator for kinship as well as fraction of no genetic sharing. With high number of quality variants (e.g. 100k+) it makes it possible to separate relationsips in two-dimensional space between homozygous-twins, parent-child, siblings, second- and third-degree relationships.

However, with limited number of common variants (50k-ish), as is the case in targeted exon sequencing, these estimates are not as robust as one would like to think, e.g. due to imperfect AF estimation and ethnicity mixture. Therefore, in addition to common variant analysis, we also perform rare variant analysis, coined Queen. It looks for potential sharing of variants that could hardly be shared by chance and therefore imply IBD and not IBS as is the case for the sharing in KING.

By combining this approch with KING, we get additional robustness in the relationship estimate. Furthermore, even though GOR has a very efficient low-memory footprint implementation of the KING algorithm, that can be used to find relationship between hundreds of samples in few minutes, it is nevertheless an algorithm that scales as O(#samples^2) x #variants. While QUEEN has similar scaling properties, even though it requires more variants (because they are supposed to be rare) it requires much-much fewer sample comparisons per variants. Therefore, one can use QUEEN to get a rough relationship estimate very fast for all pairs and then only evaluate kinship for significant pairs by using KING2, that uses different memory and sample comparison logic than KING. With QUEEN and KING2, we have for instance evaluated the full relationship of the complete UKBB samples (600k+) in few hours on the GORdb cluster. Similarly, QUEEN, KING and KING2, can easily be setup for parallel processing and incremental processing, e.g. in order to track relationship of all GDX samples and not just within case-studies. For the discussion here, we will not implement the parallel processing, however, those who are interested can see the query template report_builders/card_view/queen.ftl.yml().

Here we will do the following:

run relationship analysis on all of the 575 samples in clin-hg19
pick out few unrelated samples and simulate few family relationships
see if we can identify the family relationship we simulated

Defining allele frequency information for common and rare variants¶

We simly use gnomAD, stored in ref/freq_max.gorz, as the source for our AF information. Then we setup few sources that define SNPs that we want to use for King and Queen. The columns tpq and kpq in ##kingaf## are used by the KING command while the relation ##queenaf## is just for filtering the variant into Queen.

In [2]:

qh = GQH.GOR_Query_Helper()
mydefs = ""

In [3]:

mydefs = qh.add_many("""
create #target_segments# = gor user_data/case_data/coverageRef/hg19_refGene_primaryNM_coding.gorz | segproj;

create #af# = pgor ref/freq_max.gorz | select 1-5 | rename ref_af af
| join -snpseg -i -maxseg 1000000 [#target_segments#];

create ##queenaf## = pgor [#af#]
| where len(#3)=len(#4) and af < 0.05 and af > 0.00001
| select 1-4,af;

create ##kingaf## = pgor [#af#]
| where len(#3)=len(#4) and abs(af-0.5)<0.45
| atmin 100 pos
| select 1-4,af
| calc tpq 2*af*af*(1.0-af)*(1.0-af)
| calc kpq 2.0*af*(1-af);
""")

Converting row-level variants into horizontal genotypes¶

For demonstration purposes, we simply filter the variants in source/var/wes_varcalls.gord by the GDX exon targest and keep them in #allvars#. For large scale analysis we would NOT clone the data like this, however, this makes it easier to change the code when we start simulating few samples later. Also notice that we are therefore NOT using good_cov.gord or segment_cov.gord for the coverage, and simply assume perfect coverage across the genome, as defined with #allcov# below:

In [4]:

mydefs = qh.add_many("""
create #allvars# = pgor source/var/wes_varcalls.gord -s PN | select 1-4,callcopies,pn
| join -snpseg -i -maxseg 1000000 [#target_segments#];

create ##buckets## = nor <(gor -p chr22 [#allvars#]) | group -gc pn | calc bucket 'b1';

create #allcov# = pgor [#allvars#] | top 1 | hide pn | multimap -cartesian [##buckets##] | group chrom -gc pn;

create ##hgt## = pgor [#allvars#]
| join -snpseg -i -maxseg 1000000 [#target_segments#]
| where len(#3)=len(#4)
| calc gt callcopies
| merge <(gor [##kingaf##] | rename #3 Reference | rename #4 Call)
| gtgen -gc #3,#4 [##buckets##] -maxseg 250000000 [#allcov#];
""")

Notice that in #hgt# we merge the ##kingaf## relation to the genotypes. This is to ensure that we store genotypes for all the samples we plan to use in the KING command, irrespectively of how many samples are put into the ##hgt## "mini-freeze", i.e. analogous to #allvars# in the notebook Incremental_Freeze.ipynb. The QUEEN algorithm is however not sensitive to this because it does not use homozygous-reference genotypes.

Calculating Queen and King¶

The statements below execute the rare variant sharing analysis and the kinship analysis based on the common variants.

In [5]:

mydefs = qh.add_many("""
create ##queen## = nor <(gor [##hgt##] 
| until chrom = 'chrX' | varjoin -i [##queenaf##]
| queen [##buckets##] <(nor [##buckets##] | select pn) <(nor [##buckets##] | select pn) -vs 1 /* -minSharing 0.01 */)
| select pn1-;

create ##king## = nor <(gor [##hgt##] | where len(#3)=len(#4)  
| until chrom = 'chrX'  | varjoin -r [##kingaf##] 
| king -gc af -maxvars 3000000 [##buckets##] <(nor [##buckets##] | select pn) <(nor [##buckets##] | select pn) -vs 1 )
| select pn1-;
""")

The following code does two types of classification:

first based on the ranges set forth in the KING paper mentioned above
second based on Gaussian-mixture modeling of data from the UKBB and established relationships there

Important to keep in mind that the Gaussian-mixture data "training" was based on whole genome array genotype data (120k variants in LD). Thus it may NOT be perfect for our exon targets. We nevertheless keep it as is. In the code below, we swap phi with theta (i.e. a robust estimator of phi), because the code below was originally written using phi. Further below we then swap it back. The kinship we report is then the average of phi and theta.

In [6]:

mydefs = qh.add_many("""
def expo($1,$2,$3) = exp(-sqr( $1 - $2 )/float(2* $3 * $3 )) / $3;

create ##king_final## = nor [##king##]
| calc temp_phi phi 
| replace phi theta /* (theta+phi)/2 */
| calc monozygotic if(phi > pow(2.0,-1.5) and pi0 < 0.1,1,0)
| calc parent_offspring if(phi > pow(2.0,-2.5) and phi < pow(2.0,-1.5) and pi0 < 0.1,1,0)
| calc full_sib if(phi > pow(2.0,-2.5) and phi < pow(2.0,-1.5) and pi0 > 0.1  and pi0 < 0.365,1,0)
| calc second_degree if(phi > pow(2.0,-3.5) and phi < pow(2.0,-2.5) and pi0 > 0.365 and pi0 < 1.0-pow(2,-1.5),1,0)
| calc third_degree if(phi > pow(2.0,-4.5) and phi < pow(2.0,-3.5) and pi0 > 1.0-pow(2,-1.5) and pi0 < 1.0-pow(2,-2.5),1,0)
| calc temp if(monozygotic=1,'Monozygotic twin,','')+if(parent_offspring=1,'Parent-offspring,','')+if(full_sib=1,'Full sib,','')+if(second_degree=1,'2nd Degree,','')+if(third_degree=1,'3rd Degree,','')
| calc relationship listfilter(temp,'len(x)>1')
| calc monozygotic_dist ln( expo(pi0,0.0,0.001) * expo(phi,0.498,0.0013) )
| calc parent_offspring_dist ln( expo(pi0,0.00297,0.025) * expo(phi,0.248,0.018) )
| calc full_sib_dist ln( expo(pi0,0.248,0.0406) * expo(phi,0.248,0.0187) )
| calc second_degree_dist ln( expo(pi0,0.4954,0.0608) * expo(phi,0.127,0.027) )
| calc third_degree_dist ln( expo(pi0,0.745,0.0428) * expo(phi,0.059,0.02) )
| calc rel 'Monozygotic twin,Parent-offspring,Full sib,2nd Degree,3rd Degree'
| calc relp str(monozygotic_dist)+','+str(parent_offspring_dist)+','+str(full_sib_dist)+','+str(second_degree_dist)+','+str(third_degree_dist)
| calc relmax listnummax(relp)
| calc GaussRel listzipfilter(rel,relp,'abs(float(x)-relmax)<0.001')
| replace phi temp_phi
| calc kinship (phi+theta)/2
| select pn1,pn2,relationship,GaussRel,pi0,phi,theta,count,kinship
| rename count kingcount;

create ##king_queen## = nor [##king_final##] | multimap -c pn1,pn2 -m 0 [##queen##]
| where pn1!=pn2
| rename pi0 IBD0
| rename avgSharing rareSharing
| rename Count RareCount
| select pn1,pn2,relationship,gaussrel,kinship,phi,theta,ibd0,kingcount,rareSharing,rareCount 
| replace kinship-ibd0,raresharing form(float(#rc),4,3);
""")

Finally, we add create a "consensus" relationship based on the standard King classification, the Gaussian mixture estimate, and the rare sharing from QUEEN:

In [7]:

mydefs = qh.add_many("""
create #final# = nor [##king_queen##]
| calc consensus if(relationship=gaussrel and raresharing > 0.05,relationship,
                 if((relationship='Monozygotic twin' or gaussrel='Monozygotic twin') and raresharing > 0.9,'Monozygotic twin',
                 if((relationship='Parent-offspring' or gaussrel='Parent-offspring') and IBD0 < 0.75,'Parent-offspring',
                 if((relationship='Full sib' or gaussrel='Full sib') and IBD0 <= 0.45 and raresharing > 0.2,'Full sib',
                 if((relationship in ('2nd Degree','Full sib') or gaussrel in ('2nd Degree','Full sib')) and IBD0 <= 0.6 and raresharing > 0.125,'2nd Degree',
                 if(relationship='' and gaussrel = '3rd Degree' and raresharing > 0.065 and IBD0 < 0.8,'3rd Degree',
                 if(raresharing <= 0.05,'','')))))));
""")

In [8]:

%%gor
$mydefs
nor [#final#] | group -gc consensus -count | sort -c allcount:n

Query ran in 2.95 sec
Query fetched 5 rows in 66.83 sec (total time 69.78 sec)

Out[8]:

	consensus	allCount
0	Full sib	4
1	Parent-offspring	582
2	2nd Degree	6473
3	3rd Degree	9397
4	NaN	325184

We notice that in the project we have mostly parent-offspring relationships and few siblings. The 2nd and 3rd degree relationships may or may not be "real".

Simulating relationships¶

We pick few of the samples that don't have any relationship in order to establish "founders". Notice that due to the nature of the Gaussrel classification, it always reports something as the closes relationship.

Removing relative¶

To find the least related individuals, we use the RELREMOVE command and provide it with the pairs that have significant relationship. We use the kinship coefficient as a measure to define that cutoff:

In [9]:

%%gor pns <<
$mydefs
nor [##buckets##] | select pn | calc pheno 'case'
| relremove <(nor [##king_queen##] | where kinship > 0.09 or gaussrel = '2nd Degree')
| where pheno != 'NA'
| group -lis -sc pn
| replace lis_pn listmap(#rc,'squote(x)')

Query ran in 0.33 sec
Query fetched 1 rows in 4.38 sec (total time 4.71 sec)

In [10]:

print(pns.at[0,'lis_PN'])

'GDX_2901707','GDX_2914007','GDX_2927939','GDX_2934165','GDX_2939330'

We can now inspect the relationships between these samples that we found after RELREMOVE:

In [11]:

%%gor
$mydefs
def #pns# = 'GDX_2901707','GDX_2914007','GDX_2927939','GDX_2935421','GDX_2943176';

nor [#final#]
| where PN1 in ( #pns# )
| where PN2 in ( #pns# ) 
| where pn1 != pn2

Query ran in 0.19 sec
Query fetched 20 rows in 1.24 sec (total time 1.43 sec)

Out[11]:

	PN1	PN2	relationship	GaussRel	kinship	phi	theta	IBD0	kingcount	rareSharing	RareCount	consensus
0	GDX_2943176	GDX_2901707	NaN	3rd Degree	0.013	0.189	-0.162	0.795	61856	0.003	2783	NaN
1	GDX_2935421	GDX_2901707	NaN	3rd Degree	0.066	0.253	-0.122	0.635	61856	0.020	1152	NaN
2	GDX_2943176	GDX_2914007	NaN	3rd Degree	0.028	0.196	-0.140	0.768	61856	0.036	2783	NaN
3	GDX_2935421	GDX_2914007	NaN	3rd Degree	0.058	0.246	-0.129	0.673	61856	0.086	1152	3rd Degree
4	GDX_2943176	GDX_2927939	NaN	3rd Degree	0.051	0.209	-0.107	0.698	61856	0.035	2783	NaN
5	GDX_2935421	GDX_2927939	NaN	3rd Degree	0.089	0.262	-0.084	0.589	61856	0.101	1152	3rd Degree
6	GDX_2943176	GDX_2935421	NaN	3rd Degree	0.059	0.216	-0.099	0.667	61856	0.041	2783	NaN
7	GDX_2935421	GDX_2943176	NaN	3rd Degree	0.059	0.216	-0.099	0.667	61856	0.099	1152	3rd Degree
8	GDX_2914007	GDX_2901707	NaN	3rd Degree	0.068	0.254	-0.118	0.630	61856	0.008	1223	NaN
9	GDX_2914007	GDX_2935421	NaN	3rd Degree	0.058	0.246	-0.129	0.673	61856	0.081	1223	3rd Degree
10	GDX_2914007	GDX_2927939	NaN	3rd Degree	0.081	0.257	-0.095	0.615	61856	0.092	1223	3rd Degree
11	GDX_2914007	GDX_2943176	NaN	3rd Degree	0.028	0.196	-0.140	0.768	61856	0.081	1223	3rd Degree
12	GDX_2927939	GDX_2901707	NaN	3rd Degree	0.059	0.247	-0.128	0.650	61856	0.006	1350	NaN
13	GDX_2927939	GDX_2914007	NaN	3rd Degree	0.081	0.257	-0.095	0.615	61856	0.084	1350	3rd Degree
14	GDX_2927939	GDX_2935421	NaN	3rd Degree	0.089	0.262	-0.084	0.589	61856	0.086	1350	3rd Degree
15	GDX_2927939	GDX_2943176	NaN	3rd Degree	0.051	0.209	-0.107	0.698	61856	0.073	1350	3rd Degree
16	GDX_2901707	GDX_2914007	NaN	3rd Degree	0.068	0.254	-0.118	0.630	61856	0.015	667	NaN
17	GDX_2901707	GDX_2943176	NaN	3rd Degree	0.013	0.189	-0.162	0.795	61856	0.012	667	NaN
18	GDX_2901707	GDX_2935421	NaN	3rd Degree	0.066	0.253	-0.122	0.635	61856	0.034	667	NaN
19	GDX_2901707	GDX_2927939	NaN	3rd Degree	0.059	0.247	-0.128	0.650	61856	0.012	667	NaN

Generating childrens!¶

We are now ready to create our founders and mate them:

In [12]:

mydefs = qh.add_many("""
def #gen($1) = gor source/var/wes_varcalls.gord -s PN -f $1
| select 1-4,callcopies | join -snpseg -i -maxseg 1000000 [#target_segments#];

create #a1# = #gen('GDX_2901707') | calc pn 'a1';
create #a2# = #gen('GDX_2914007') | calc pn 'a2';
create #a3# = #gen('GDX_2927939') | calc pn 'a3';
create #a4# = #gen('GDX_2935421') | calc pn 'a4';
create #a5# = #gen('GDX_2943176') | calc pn 'a5';
""")

Our mating proceedure simply follows the Mendelian rules per variant and ignores recombination, phase and certainly any romance!

In [13]:

mydefs = qh.add_many("""
def #mate# = rename callcopies c
| hide pn 
| pivot -gc 3,4 source -v L,R -e 0 
| calc f if(l_c=2,1,if(l_c=1,if(random()<0.5,1,0),0))
| calc m if(r_c=2,1,if(r_c=1,if(random()<0.5,1,0),0))
| calc callcopies f+m
| where callcopies > 0
| select 1-4,callcopies;

create #c1a1a2# = gor <(gor [#a1#] | merge [#a2#] -s) | #mate# | calc pn 'c1a1a2';
create #c2a1a2# = gor <(gor [#a1#] | merge [#a2#] -s) | #mate# | calc pn 'c2a1a2';
create #c1a1a2twin# = gor [#c1a1a2#] | replace pn 'c1a1a2twin';
create #c1a3a4# = gor <(gor [#a3#] | merge [#a4#] -s) | #mate# | calc pn 'c1a3a4';
create #c2a3a4# = gor <(gor [#a3#] | merge [#a4#] -s) | #mate# | calc pn 'c2a3a4';

create #c1c1a1a2c2a1a2# = gor <(gor [#c1a1a2#] | merge [#c2a1a2#] -s) | #mate# | calc pn 'c1c1a1a2c2a1a2';
create #c1a1a4# = gor <(gor [#a1#] | merge [#a4#] -s) | #mate# | calc pn 'c1a1a4';
create #c1a3a2# = gor <(gor [#a3#] | merge [#a2#] -s) | #mate# | calc pn 'c1a3a2';
create #c1c1a1a2c1a3a4# = gor <(gor [#c1a1a2#] | merge [#c1a3a4#] -s) | #mate# | calc pn 'c1c1a1a2c1a3a4';
create #c2c1a1a2c1a3a4# = gor <(gor [#c1a1a2#] | merge [#c1a3a4#] -s) | #mate# | calc pn 'c2c1a1a2c1a3a4';
create #c1c1c1a1a2c1a3a4a5# = gor <(gor [#c1c1a1a2c1a3a4#] | merge [#a5#] -s) | #mate# | calc pn 'c1c1c1a1a2c1a3a4a5';
""")

We define a pedigree based on these mating patterns and redefine #allvars# and coverage:

In [14]:

mydefs = qh.add("""
create #ped# = norrows 1 
| calc pfm 'c1a1a2,a1,a2;c2a1a2,a1,a2;c1a1a2twin,a1,a2;c1a3a4,a3,a4;c2a3a4,a3,a4;c1c1a1a2c2a1a2,c1a1a2,c2a1a2;c1a1a4,a1,a4;c1a3a2,a3,a2;c1c1a1a2c1a3a4,c1a1a2,c1a3a4;c2c1a1a2c1a3a4,c1a1a2,c1a3a4;c1c1c1a1a2c1a3a4a5,c1c1a1a2c1a3a4,a5' 
| split -s ';' pfm | colsplit pfm 3 x -s ',' | rename x_1 p | rename x_2 f | rename x_3 m | select p,f,m;
""")
mydefs = qh.add_many("""
create ##buckets## = nor [#ped#] | calc pn p+','+f+','+m | select PN  | split PN 
| distinct | calc bucket 'b1' | select pn,bucket;

create #allvars# = pgor [#a1#] [#a2#] [#a3#] [#a4#] [#a5#] [#c1a1a2#]
[#c2a1a2#] [#c1a1a2twin#] [#c1a3a4#] [#c2a3a4#] [#c1c1a1a2c2a1a2#] [#c1a1a4#]
[#c1a3a2#] [#c1c1a1a2c1a3a4#] [#c2c1a1a2c1a3a4#] [#c1c1c1a1a2c1a3a4a5#];

create #allcov# = pgor [#allvars#] | top 1 | hide pn | multimap -cartesian [##buckets##] | group chrom -gc pn;
""")

We can list the pedigree:

In [15]:

%%gor
$mydefs
nor [#ped#]

Query ran in 0.56 sec
Query fetched 11 rows in 0.08 sec (total time 0.64 sec)

Out[15]:

	p	f	m
0	c1a1a2	a1	a2
1	c2a1a2	a1	a2
2	c1a1a2twin	a1	a2
3	c1a3a4	a3	a4
4	c2a3a4	a3	a4
5	c1c1a1a2c2a1a2	c1a1a2	c2a1a2
6	c1a1a4	a1	a4
7	c1a3a2	a3	a2
8	c1c1a1a2c1a3a4	c1a1a2	c1a3a4
9	c2c1a1a2c1a3a4	c1a1a2	c1a3a4
10	c1c1c1a1a2c1a3a4a5	c1c1a1a2c1a3a4	a5

We can also calculate the het-hom ratio for the samples to see if they are realistic:

In [16]:

%%gor
$mydefs
create #temp# = gor [#allvars#] | calc het if(callcopies=1,1,0) | calc hom if(callcopies=2,1,0) | group genome -gc pn -sum -ic het-;
nor [#temp#] | select pn- | calc f sum_het/sum_hom

Query ran in 0.25 sec
Query fetched 16 rows in 0.00 sec (total time 0.25 sec)

Out[16]:

	pn	sum_het	sum_hom	f
0	a1	12976	9659	1.343410
1	a2	15065	9557	1.576331
2	a3	15848	9271	1.709416
3	a4	15286	8785	1.740011
4	a5	19988	8817	2.266984
5	c1a1a2	16064	8488	1.892554
6	c1a1a2twin	16064	8488	1.892554
7	c1a1a4	16228	8206	1.977577
8	c1a3a2	16371	9034	1.812154
9	c1a3a4	16156	8784	1.839253
10	c1c1a1a2c1a3a4	16425	8420	1.950713
11	c1c1a1a2c2a1a2	11409	10856	1.050940
12	c1c1c1a1a2c1a3a4a5	19172	8069	2.376007
13	c2a1a2	16083	8556	1.879734
14	c2a3a4	16093	8791	1.830622
15	c2c1a1a2c1a3a4	16307	8465	1.926403

We see that everything looks normal although there is some variability in the ratio - founders even being outliers.

Using the pedigree as a graph to derive extended relationships¶

Now we will add logic to calculate genetic gender and also graph-based analysis that uses the pedigree to get better feeling for the relationships between our simulated samples.

In [17]:

mydefs = qh.add("""
create #rel# = nor [#ped#] | select p,f | calc r 'f' | rename #1 PN | rename #2 RN
| merge <(nor [#ped#] | select p,m | calc r 'm' | rename #1 PN | rename #2 RN) 
| distinct;

create #pns# = nor [##buckets##] | select pn;

create #graph# = nor [#pns#] | dagmap -c pn [#rel#] -dp
| multimap -c dag_node <(nor [#pns#] | dagmap -c pn [#rel#] -dp | replace dag_path listtail(listreverse(dag_path,'->'),'->') | select dag_node,dag_path,pn)
| calc length listsize(dag_path,'->')+listsize(dag_pathx,'->')
| granno -gc pn,pnx -min -ic length
| where length = min_length
| calc rel dag_path+if(len(dag_pathx)>1,'<-','')+replace(dag_pathx,'->','<-')
| select pn,rel,pnx,length | rename pn PN1 | rename pnx PN2
| where pn1 != pn2
| group -gc pn1,pn2 -set -sc rel;

create #gen_gender# = nor <(gor -p chrX:2781480-155701382 [##hgt##]
| csvsel -vs 1 -gc #3,#4 -tag pn -hide 0,3 [##buckets##] [#pns#]
| group genome -gc pn,value -count)
| select pn,value,allcount | group -gc pn,value -sum -ic allcount
| pivot value -v 1,2 -gc pn -e 0  
| rename 1_sum_allcount hetCount  
| rename 2_sum_allcount homCount  
| calc gender if(hetcount/(hetcount+homcount) < 0.5,'male','female')
| select pn,gender;
""")

Finally, do a slight modification to our #final# statement and add extra classification as if we know beforehand the proband (index-case) and the parents. We also add the relationships derived in #graph# and run our full analysis again on our simulated samples:

In [18]:

mydefs = qh.add_many("""
def #index# = 'c1a1a2';
def #parents# = 'a1','a2';

create #final# = nor [##king_queen##]
| calc consensus if(relationship=gaussrel and raresharing > 0.05,relationship,
                 if((relationship='Monozygotic twin' or gaussrel='Monozygotic twin') and raresharing > 0.9,'Monozygotic twin',
                 if((relationship='Parent-offspring' or gaussrel='Parent-offspring') and IBD0 < 0.075,'Parent-offspring',
                 if((relationship='Full sib' or gaussrel='Full sib') and IBD0 <= 0.45 and raresharing > 0.2,'Full sib',
                 if((relationship in ('2nd Degree','Full sib') or gaussrel in ('2nd Degree','Full sib')) and IBD0 <= 0.6 and raresharing > 0.125,'2nd Degree',
                 if(relationship='' and gaussrel = '3rd Degree' and raresharing > 0.0625 and IBD0 < 0.8,'3rd Degree',
                 if(raresharing <= 0.05,'','')))))))
| map -c pn2 -m '' [#gen_gender#]
| calc index_rel if(pn1= #index# and pn2 in ( #parents#) and consensus='Parent-offspring',
                    if(gender='female','Mother',if(gender='male','Father',consensus)),
                    if(pn1= #index# and consensus='Parent-offspring','Child',consensus))
| map -c pn1,pn2 -m '' [#graph#]
""")

In [19]:

%%gor
$mydefs
nor [#final#] | select pn1,pn2,consensus,index_rel,set_rel

Query ran in 0.46 sec
Query fetched 240 rows in 0.01 sec (total time 0.46 sec)

Out[19]:

	PN1	PN2	consensus	index_rel	set_rel
0	c2c1a1a2c1a3a4	a5	NaN	NaN	NaN
1	c2c1a1a2c1a3a4	c1c1c1a1a2c1a3a4a5	2nd Degree	2nd Degree	c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1...
2	c2c1a1a2c1a3a4	c1c1a1a2c1a3a4	Full sib	Full sib	c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c1a3a4,c2c1a1a...
3	c2c1a1a2c1a3a4	c1a3a2	2nd Degree	2nd Degree	c2c1a1a2c1a3a4->c1a1a2->a2<-c1a3a2,c2c1a1a2c1a...
4	c2c1a1a2c1a3a4	c1a1a4	2nd Degree	2nd Degree	c2c1a1a2c1a3a4->c1a1a2->a1<-c1a1a4,c2c1a1a2c1a...
...	...	...	...	...	...
235	c1c1a1a2c2a1a2	c1a1a2twin	Parent-offspring	Parent-offspring	c1c1a1a2c2a1a2->c1a1a2->a1<-c1a1a2twin,c1c1a1a...
236	c1c1a1a2c2a1a2	c2a1a2	Parent-offspring	Parent-offspring	c1c1a1a2c2a1a2->c2a1a2
237	c1c1a1a2c2a1a2	a2	2nd Degree	2nd Degree	c1c1a1a2c2a1a2->c1a1a2->a2,c1c1a1a2c2a1a2->c2a...
238	c1c1a1a2c2a1a2	a1	2nd Degree	2nd Degree	c1c1a1a2c2a1a2->c1a1a2->a1,c1c1a1a2c2a1a2->c2a...
239	c1c1a1a2c2a1a2	c1a1a2	Parent-offspring	Parent-offspring	c1c1a1a2c2a1a2->c1a1a2

240 rows × 5 columns

In [20]:

%%gor
$mydefs
nor [#final#] | select pn1,pn2,consensus,index_rel,set_rel,kinship,theta,ibd0 | where index_rel in ('Father','Mother','Child')

Query ran in 0.23 sec
Query fetched 6 rows in 0.01 sec (total time 0.23 sec)

Out[20]:

	PN1	PN2	consensus	index_rel	set_rel	kinship	theta	IBD0
0	c1a1a2	c2c1a1a2c1a3a4	Parent-offspring	Child	c1a1a2<-c2c1a1a2c1a3a4	0.315	0.244	0.000
1	c1a1a2	c1c1a1a2c1a3a4	Parent-offspring	Child	c1a1a2<-c1c1a1a2c1a3a4	0.321	0.253	0.000
2	c1a1a2	c1c1a1a2c2a1a2	Parent-offspring	Child	c1a1a2<-c1c1a1a2c2a1a2	0.354	0.289	0.000
3	c1a1a2	c2a1a2	Parent-offspring	Child	c1a1a2->a1<-c2a1a2,c1a1a2->a2<-c2a1a2	0.338	0.276	0.085
4	c1a1a2	a2	Parent-offspring	Mother	c1a1a2->a2	0.314	0.239	0.000
5	c1a1a2	a1	Parent-offspring	Mother	c1a1a2->a1	0.310	0.233	0.000

We notice that one of our Sibling sample is incorrectly classified as parent in relation to the index case, hence labeled as Child. In other words, because of surprisingly low IBD0 (lack of sharing) it looks as if a parent and not a full sibling. Gaussrel and standard King relationship agree and hence misclassified by our consensus rule. We see few other examples of this type of misclassification, e.g. c1a1a2 vs c2a1a2 shown below. c1a3a4 vs c2a3a4 is however correctly classified as full-sib.

In [21]:

%%gor dfFinal <<
$mydefs
nor [#final#] | select pn1,pn2,consensus,index_rel,set_rel | where set_rel != '' | sort -c consensus:r,pn1

Query ran in 0.21 sec
Query fetched 150 rows in 0.01 sec (total time 0.22 sec)

In [22]:

for i in range(0,len(dfFinal)):
    print(f"{i+1}\t{dfFinal.at[i,'PN1']} vs {dfFinal.at[i,'PN2']}\t{dfFinal.at[i,'consensus']}\t{dfFinal.at[i,'set_rel']}")
    print("----------------")

1	a1 vs c1a1a4	Parent-offspring	a1<-c1a1a4
----------------
2	a1 vs c1a1a2twin	Parent-offspring	a1<-c1a1a2twin
----------------
3	a1 vs c2a1a2	Parent-offspring	a1<-c2a1a2
----------------
4	a1 vs c1a1a2	Parent-offspring	a1<-c1a1a2
----------------
5	a2 vs c1a3a2	Parent-offspring	a2<-c1a3a2
----------------
6	a2 vs c1a1a2twin	Parent-offspring	a2<-c1a1a2twin
----------------
7	a2 vs c2a1a2	Parent-offspring	a2<-c2a1a2
----------------
8	a2 vs c1a1a2	Parent-offspring	a2<-c1a1a2
----------------
9	a3 vs c1a3a2	Parent-offspring	a3<-c1a3a2
----------------
10	a3 vs c2a3a4	Parent-offspring	a3<-c2a3a4
----------------
11	a3 vs c1a3a4	Parent-offspring	a3<-c1a3a4
----------------
12	a4 vs c1a1a4	Parent-offspring	a4<-c1a1a4
----------------
13	a4 vs c2a3a4	Parent-offspring	a4<-c2a3a4
----------------
14	a4 vs c1a3a4	Parent-offspring	a4<-c1a3a4
----------------
15	a5 vs c1c1c1a1a2c1a3a4a5	Parent-offspring	a5<-c1c1c1a1a2c1a3a4a5
----------------
16	c1a1a2 vs c2c1a1a2c1a3a4	Parent-offspring	c1a1a2<-c2c1a1a2c1a3a4
----------------
17	c1a1a2 vs c1c1a1a2c1a3a4	Parent-offspring	c1a1a2<-c1c1a1a2c1a3a4
----------------
18	c1a1a2 vs c1c1a1a2c2a1a2	Parent-offspring	c1a1a2<-c1c1a1a2c2a1a2
----------------
19	c1a1a2 vs c2a1a2	Parent-offspring	c1a1a2->a1<-c2a1a2,c1a1a2->a2<-c2a1a2
----------------
20	c1a1a2 vs a2	Parent-offspring	c1a1a2->a2
----------------
21	c1a1a2 vs a1	Parent-offspring	c1a1a2->a1
----------------
22	c1a1a2twin vs c2c1a1a2c1a3a4	Parent-offspring	c1a1a2twin->a1<-c1a1a2<-c2c1a1a2c1a3a4,c1a1a2twin->a2<-c1a1a2<-c2c1a1a2c1a3a4
----------------
23	c1a1a2twin vs c1c1a1a2c1a3a4	Parent-offspring	c1a1a2twin->a1<-c1a1a2<-c1c1a1a2c1a3a4,c1a1a2twin->a2<-c1a1a2<-c1c1a1a2c1a3a4
----------------
24	c1a1a2twin vs c1c1a1a2c2a1a2	Parent-offspring	c1a1a2twin->a1<-c1a1a2<-c1c1a1a2c2a1a2,c1a1a2twin->a1<-c2a1a2<-c1c1a1a2c2a1a2,c1a1a2twin->a2<-c1a1a2<-c1c1a1a2c2a1a2,c1a1a2twin->a2<-c2a1a2<-c1c1a1a2c2a1a2
----------------
25	c1a1a2twin vs c2a1a2	Parent-offspring	c1a1a2twin->a1<-c2a1a2,c1a1a2twin->a2<-c2a1a2
----------------
26	c1a1a2twin vs a2	Parent-offspring	c1a1a2twin->a2
----------------
27	c1a1a2twin vs a1	Parent-offspring	c1a1a2twin->a1
----------------
28	c1a1a4 vs a4	Parent-offspring	c1a1a4->a4
----------------
29	c1a1a4 vs a1	Parent-offspring	c1a1a4->a1
----------------
30	c1a3a2 vs a3	Parent-offspring	c1a3a2->a3
----------------
31	c1a3a2 vs a2	Parent-offspring	c1a3a2->a2
----------------
32	c1a3a4 vs c2c1a1a2c1a3a4	Parent-offspring	c1a3a4<-c2c1a1a2c1a3a4
----------------
33	c1a3a4 vs c1c1a1a2c1a3a4	Parent-offspring	c1a3a4<-c1c1a1a2c1a3a4
----------------
34	c1a3a4 vs a4	Parent-offspring	c1a3a4->a4
----------------
35	c1a3a4 vs a3	Parent-offspring	c1a3a4->a3
----------------
36	c1c1a1a2c1a3a4 vs c1c1c1a1a2c1a3a4a5	Parent-offspring	c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
37	c1c1a1a2c1a3a4 vs c1a3a4	Parent-offspring	c1c1a1a2c1a3a4->c1a3a4
----------------
38	c1c1a1a2c1a3a4 vs c1a1a2twin	Parent-offspring	c1c1a1a2c1a3a4->c1a1a2->a1<-c1a1a2twin,c1c1a1a2c1a3a4->c1a1a2->a2<-c1a1a2twin
----------------
39	c1c1a1a2c1a3a4 vs c1a1a2	Parent-offspring	c1c1a1a2c1a3a4->c1a1a2
----------------
40	c1c1a1a2c2a1a2 vs c1a1a2twin	Parent-offspring	c1c1a1a2c2a1a2->c1a1a2->a1<-c1a1a2twin,c1c1a1a2c2a1a2->c1a1a2->a2<-c1a1a2twin,c1c1a1a2c2a1a2->c2a1a2->a1<-c1a1a2twin,c1c1a1a2c2a1a2->c2a1a2->a2<-c1a1a2twin
----------------
41	c1c1a1a2c2a1a2 vs c2a1a2	Parent-offspring	c1c1a1a2c2a1a2->c2a1a2
----------------
42	c1c1a1a2c2a1a2 vs c1a1a2	Parent-offspring	c1c1a1a2c2a1a2->c1a1a2
----------------
43	c1c1c1a1a2c1a3a4a5 vs a5	Parent-offspring	c1c1c1a1a2c1a3a4a5->a5
----------------
44	c1c1c1a1a2c1a3a4a5 vs c1c1a1a2c1a3a4	Parent-offspring	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4
----------------
45	c2a1a2 vs c1c1a1a2c2a1a2	Parent-offspring	c2a1a2<-c1c1a1a2c2a1a2
----------------
46	c2a1a2 vs c1a1a2twin	Parent-offspring	c2a1a2->a1<-c1a1a2twin,c2a1a2->a2<-c1a1a2twin
----------------
47	c2a1a2 vs a2	Parent-offspring	c2a1a2->a2
----------------
48	c2a1a2 vs a1	Parent-offspring	c2a1a2->a1
----------------
49	c2a1a2 vs c1a1a2	Parent-offspring	c2a1a2->a1<-c1a1a2,c2a1a2->a2<-c1a1a2
----------------
50	c2a3a4 vs a4	Parent-offspring	c2a3a4->a4
----------------
51	c2a3a4 vs a3	Parent-offspring	c2a3a4->a3
----------------
52	c2c1a1a2c1a3a4 vs c1a3a4	Parent-offspring	c2c1a1a2c1a3a4->c1a3a4
----------------
53	c2c1a1a2c1a3a4 vs c1a1a2twin	Parent-offspring	c2c1a1a2c1a3a4->c1a1a2->a1<-c1a1a2twin,c2c1a1a2c1a3a4->c1a1a2->a2<-c1a1a2twin
----------------
54	c2c1a1a2c1a3a4 vs c1a1a2	Parent-offspring	c2c1a1a2c1a3a4->c1a1a2
----------------
55	c1a1a2 vs c1a1a2twin	Monozygotic twin	c1a1a2->a1<-c1a1a2twin,c1a1a2->a2<-c1a1a2twin
----------------
56	c1a1a2twin vs c1a1a2	Monozygotic twin	c1a1a2twin->a1<-c1a1a2,c1a1a2twin->a2<-c1a1a2
----------------
57	c1a3a4 vs c2a3a4	Full sib	c1a3a4->a3<-c2a3a4,c1a3a4->a4<-c2a3a4
----------------
58	c1c1a1a2c1a3a4 vs c2c1a1a2c1a3a4	Full sib	c1c1a1a2c1a3a4->c1a1a2<-c2c1a1a2c1a3a4,c1c1a1a2c1a3a4->c1a3a4<-c2c1a1a2c1a3a4
----------------
59	c2a3a4 vs c1a3a4	Full sib	c2a3a4->a3<-c1a3a4,c2a3a4->a4<-c1a3a4
----------------
60	c2c1a1a2c1a3a4 vs c1c1a1a2c1a3a4	Full sib	c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c1a3a4,c2c1a1a2c1a3a4->c1a3a4<-c1c1a1a2c1a3a4
----------------
61	a1 vs c1c1c1a1a2c1a3a4a5	2nd Degree	a1<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
62	a1 vs c2c1a1a2c1a3a4	2nd Degree	a1<-c1a1a2<-c2c1a1a2c1a3a4
----------------
63	a1 vs c1c1a1a2c1a3a4	2nd Degree	a1<-c1a1a2<-c1c1a1a2c1a3a4
----------------
64	a1 vs c1c1a1a2c2a1a2	2nd Degree	a1<-c1a1a2<-c1c1a1a2c2a1a2,a1<-c2a1a2<-c1c1a1a2c2a1a2
----------------
65	a2 vs c1c1c1a1a2c1a3a4a5	2nd Degree	a2<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
66	a2 vs c2c1a1a2c1a3a4	2nd Degree	a2<-c1a1a2<-c2c1a1a2c1a3a4
----------------
67	a2 vs c1c1a1a2c1a3a4	2nd Degree	a2<-c1a1a2<-c1c1a1a2c1a3a4
----------------
68	a2 vs c1c1a1a2c2a1a2	2nd Degree	a2<-c1a1a2<-c1c1a1a2c2a1a2,a2<-c2a1a2<-c1c1a1a2c2a1a2
----------------
69	a3 vs c1c1c1a1a2c1a3a4a5	2nd Degree	a3<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
70	a3 vs c2c1a1a2c1a3a4	2nd Degree	a3<-c1a3a4<-c2c1a1a2c1a3a4
----------------
71	a3 vs c1c1a1a2c1a3a4	2nd Degree	a3<-c1a3a4<-c1c1a1a2c1a3a4
----------------
72	a4 vs c1c1c1a1a2c1a3a4a5	2nd Degree	a4<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
73	a4 vs c2c1a1a2c1a3a4	2nd Degree	a4<-c1a3a4<-c2c1a1a2c1a3a4
----------------
74	a4 vs c1c1a1a2c1a3a4	2nd Degree	a4<-c1a3a4<-c1c1a1a2c1a3a4
----------------
75	c1a1a2 vs c1c1c1a1a2c1a3a4a5	2nd Degree	c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
76	c1a1a2 vs c1a3a2	2nd Degree	c1a1a2->a2<-c1a3a2
----------------
77	c1a1a2 vs c1a1a4	2nd Degree	c1a1a2->a1<-c1a1a4
----------------
78	c1a1a2twin vs c1c1c1a1a2c1a3a4a5	2nd Degree	c1a1a2twin->a1<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c1a1a2twin->a2<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
79	c1a1a2twin vs c1a3a2	2nd Degree	c1a1a2twin->a2<-c1a3a2
----------------
80	c1a1a2twin vs c1a1a4	2nd Degree	c1a1a2twin->a1<-c1a1a4
----------------
81	c1a1a4 vs c1c1c1a1a2c1a3a4a5	2nd Degree	c1a1a4->a1<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c1a1a4->a4<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
82	c1a1a4 vs c2c1a1a2c1a3a4	2nd Degree	c1a1a4->a1<-c1a1a2<-c2c1a1a2c1a3a4,c1a1a4->a4<-c1a3a4<-c2c1a1a2c1a3a4
----------------
83	c1a1a4 vs c1c1a1a2c1a3a4	2nd Degree	c1a1a4->a1<-c1a1a2<-c1c1a1a2c1a3a4,c1a1a4->a4<-c1a3a4<-c1c1a1a2c1a3a4
----------------
84	c1a1a4 vs c1c1a1a2c2a1a2	2nd Degree	c1a1a4->a1<-c1a1a2<-c1c1a1a2c2a1a2,c1a1a4->a1<-c2a1a2<-c1c1a1a2c2a1a2
----------------
85	c1a1a4 vs c2a3a4	2nd Degree	c1a1a4->a4<-c2a3a4
----------------
86	c1a1a4 vs c1a3a4	2nd Degree	c1a1a4->a4<-c1a3a4
----------------
87	c1a1a4 vs c1a1a2twin	2nd Degree	c1a1a4->a1<-c1a1a2twin
----------------
88	c1a1a4 vs c2a1a2	2nd Degree	c1a1a4->a1<-c2a1a2
----------------
89	c1a1a4 vs c1a1a2	2nd Degree	c1a1a4->a1<-c1a1a2
----------------
90	c1a3a2 vs c1c1c1a1a2c1a3a4a5	2nd Degree	c1a3a2->a2<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c1a3a2->a3<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
91	c1a3a2 vs c2c1a1a2c1a3a4	2nd Degree	c1a3a2->a2<-c1a1a2<-c2c1a1a2c1a3a4,c1a3a2->a3<-c1a3a4<-c2c1a1a2c1a3a4
----------------
92	c1a3a2 vs c1c1a1a2c1a3a4	2nd Degree	c1a3a2->a2<-c1a1a2<-c1c1a1a2c1a3a4,c1a3a2->a3<-c1a3a4<-c1c1a1a2c1a3a4
----------------
93	c1a3a2 vs c1c1a1a2c2a1a2	2nd Degree	c1a3a2->a2<-c1a1a2<-c1c1a1a2c2a1a2,c1a3a2->a2<-c2a1a2<-c1c1a1a2c2a1a2
----------------
94	c1a3a2 vs c2a3a4	2nd Degree	c1a3a2->a3<-c2a3a4
----------------
95	c1a3a2 vs c1a3a4	2nd Degree	c1a3a2->a3<-c1a3a4
----------------
96	c1a3a2 vs c1a1a2twin	2nd Degree	c1a3a2->a2<-c1a1a2twin
----------------
97	c1a3a2 vs c2a1a2	2nd Degree	c1a3a2->a2<-c2a1a2
----------------
98	c1a3a2 vs c1a1a2	2nd Degree	c1a3a2->a2<-c1a1a2
----------------
99	c1a3a4 vs c1c1c1a1a2c1a3a4a5	2nd Degree	c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
100	c1a3a4 vs c1a3a2	2nd Degree	c1a3a4->a3<-c1a3a2
----------------
101	c1a3a4 vs c1a1a4	2nd Degree	c1a3a4->a4<-c1a1a4
----------------
102	c1c1a1a2c1a3a4 vs c1a3a2	2nd Degree	c1c1a1a2c1a3a4->c1a1a2->a2<-c1a3a2,c1c1a1a2c1a3a4->c1a3a4->a3<-c1a3a2
----------------
103	c1c1a1a2c1a3a4 vs c1a1a4	2nd Degree	c1c1a1a2c1a3a4->c1a1a2->a1<-c1a1a4,c1c1a1a2c1a3a4->c1a3a4->a4<-c1a1a4
----------------
104	c1c1a1a2c1a3a4 vs c1c1a1a2c2a1a2	2nd Degree	c1c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c2a1a2
----------------
105	c1c1a1a2c1a3a4 vs c2a3a4	2nd Degree	c1c1a1a2c1a3a4->c1a3a4->a3<-c2a3a4,c1c1a1a2c1a3a4->c1a3a4->a4<-c2a3a4
----------------
106	c1c1a1a2c1a3a4 vs a4	2nd Degree	c1c1a1a2c1a3a4->c1a3a4->a4
----------------
107	c1c1a1a2c1a3a4 vs a3	2nd Degree	c1c1a1a2c1a3a4->c1a3a4->a3
----------------
108	c1c1a1a2c1a3a4 vs c2a1a2	2nd Degree	c1c1a1a2c1a3a4->c1a1a2->a1<-c2a1a2,c1c1a1a2c1a3a4->c1a1a2->a2<-c2a1a2
----------------
109	c1c1a1a2c1a3a4 vs a2	2nd Degree	c1c1a1a2c1a3a4->c1a1a2->a2
----------------
110	c1c1a1a2c1a3a4 vs a1	2nd Degree	c1c1a1a2c1a3a4->c1a1a2->a1
----------------
111	c1c1a1a2c2a1a2 vs c1c1c1a1a2c1a3a4a5	2nd Degree	c1c1a1a2c2a1a2->c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
112	c1c1a1a2c2a1a2 vs c2c1a1a2c1a3a4	2nd Degree	c1c1a1a2c2a1a2->c1a1a2<-c2c1a1a2c1a3a4
----------------
113	c1c1a1a2c2a1a2 vs c1c1a1a2c1a3a4	2nd Degree	c1c1a1a2c2a1a2->c1a1a2<-c1c1a1a2c1a3a4
----------------
114	c1c1a1a2c2a1a2 vs c1a3a2	2nd Degree	c1c1a1a2c2a1a2->c1a1a2->a2<-c1a3a2,c1c1a1a2c2a1a2->c2a1a2->a2<-c1a3a2
----------------
115	c1c1a1a2c2a1a2 vs c1a1a4	2nd Degree	c1c1a1a2c2a1a2->c1a1a2->a1<-c1a1a4,c1c1a1a2c2a1a2->c2a1a2->a1<-c1a1a4
----------------
116	c1c1a1a2c2a1a2 vs a2	2nd Degree	c1c1a1a2c2a1a2->c1a1a2->a2,c1c1a1a2c2a1a2->c2a1a2->a2
----------------
117	c1c1a1a2c2a1a2 vs a1	2nd Degree	c1c1a1a2c2a1a2->c1a1a2->a1,c1c1a1a2c2a1a2->c2a1a2->a1
----------------
118	c1c1c1a1a2c1a3a4a5 vs c2c1a1a2c1a3a4	2nd Degree	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2<-c2c1a1a2c1a3a4,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4<-c2c1a1a2c1a3a4
----------------
119	c1c1c1a1a2c1a3a4a5 vs c1a3a4	2nd Degree	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4
----------------
120	c1c1c1a1a2c1a3a4a5 vs c1a1a2twin	2nd Degree	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a1<-c1a1a2twin,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a2<-c1a1a2twin
----------------
121	c1c1c1a1a2c1a3a4a5 vs c1a1a2	2nd Degree	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2
----------------
122	c2a1a2 vs c1c1c1a1a2c1a3a4a5	2nd Degree	c2a1a2->a1<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c2a1a2->a2<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
123	c2a1a2 vs c2c1a1a2c1a3a4	2nd Degree	c2a1a2->a1<-c1a1a2<-c2c1a1a2c1a3a4,c2a1a2->a2<-c1a1a2<-c2c1a1a2c1a3a4
----------------
124	c2a1a2 vs c1c1a1a2c1a3a4	2nd Degree	c2a1a2->a1<-c1a1a2<-c1c1a1a2c1a3a4,c2a1a2->a2<-c1a1a2<-c1c1a1a2c1a3a4
----------------
125	c2a1a2 vs c1a3a2	2nd Degree	c2a1a2->a2<-c1a3a2
----------------
126	c2a1a2 vs c1a1a4	2nd Degree	c2a1a2->a1<-c1a1a4
----------------
127	c2a3a4 vs c1c1c1a1a2c1a3a4a5	2nd Degree	c2a3a4->a3<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c2a3a4->a4<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
128	c2a3a4 vs c2c1a1a2c1a3a4	2nd Degree	c2a3a4->a3<-c1a3a4<-c2c1a1a2c1a3a4,c2a3a4->a4<-c1a3a4<-c2c1a1a2c1a3a4
----------------
129	c2a3a4 vs c1c1a1a2c1a3a4	2nd Degree	c2a3a4->a3<-c1a3a4<-c1c1a1a2c1a3a4,c2a3a4->a4<-c1a3a4<-c1c1a1a2c1a3a4
----------------
130	c2a3a4 vs c1a3a2	2nd Degree	c2a3a4->a3<-c1a3a2
----------------
131	c2a3a4 vs c1a1a4	2nd Degree	c2a3a4->a4<-c1a1a4
----------------
132	c2c1a1a2c1a3a4 vs c1c1c1a1a2c1a3a4a5	2nd Degree	c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c2c1a1a2c1a3a4->c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5
----------------
133	c2c1a1a2c1a3a4 vs c1a3a2	2nd Degree	c2c1a1a2c1a3a4->c1a1a2->a2<-c1a3a2,c2c1a1a2c1a3a4->c1a3a4->a3<-c1a3a2
----------------
134	c2c1a1a2c1a3a4 vs c1a1a4	2nd Degree	c2c1a1a2c1a3a4->c1a1a2->a1<-c1a1a4,c2c1a1a2c1a3a4->c1a3a4->a4<-c1a1a4
----------------
135	c2c1a1a2c1a3a4 vs c1c1a1a2c2a1a2	2nd Degree	c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c2a1a2
----------------
136	c2c1a1a2c1a3a4 vs c2a3a4	2nd Degree	c2c1a1a2c1a3a4->c1a3a4->a3<-c2a3a4,c2c1a1a2c1a3a4->c1a3a4->a4<-c2a3a4
----------------
137	c2c1a1a2c1a3a4 vs a4	2nd Degree	c2c1a1a2c1a3a4->c1a3a4->a4
----------------
138	c2c1a1a2c1a3a4 vs a3	2nd Degree	c2c1a1a2c1a3a4->c1a3a4->a3
----------------
139	c2c1a1a2c1a3a4 vs c2a1a2	2nd Degree	c2c1a1a2c1a3a4->c1a1a2->a1<-c2a1a2,c2c1a1a2c1a3a4->c1a1a2->a2<-c2a1a2
----------------
140	c2c1a1a2c1a3a4 vs a2	2nd Degree	c2c1a1a2c1a3a4->c1a1a2->a2
----------------
141	c2c1a1a2c1a3a4 vs a1	2nd Degree	c2c1a1a2c1a3a4->c1a1a2->a1
----------------
142	c1c1c1a1a2c1a3a4a5 vs c1a3a2	nan	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a2<-c1a3a2,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a3<-c1a3a2
----------------
143	c1c1c1a1a2c1a3a4a5 vs c1a1a4	nan	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a1<-c1a1a4,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a4<-c1a1a4
----------------
144	c1c1c1a1a2c1a3a4a5 vs c1c1a1a2c2a1a2	nan	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c2a1a2
----------------
145	c1c1c1a1a2c1a3a4a5 vs c2a3a4	nan	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a3<-c2a3a4,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a4<-c2a3a4
----------------
146	c1c1c1a1a2c1a3a4a5 vs a4	nan	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a4
----------------
147	c1c1c1a1a2c1a3a4a5 vs a3	nan	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a3
----------------
148	c1c1c1a1a2c1a3a4a5 vs c2a1a2	nan	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a1<-c2a1a2,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a2<-c2a1a2
----------------
149	c1c1c1a1a2c1a3a4a5 vs a2	nan	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a2
----------------
150	c1c1c1a1a2c1a3a4a5 vs a1	nan	c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a1
----------------

We see that there are few empty "nan" relationship, however, most of the non-complicated relationships look ok although we can argue that we have trouble separating 2nd-degree and 3rd-degree relationships. For the purpose of validating short relationships in a case-study this is probably not a big issue. Using more WGS based variants would definitely improve this analysis.

In [ ]: