Relationship analysis with King and Queen¶
%%capture
# load the magic extension and imports
%reload_ext nextcode
import pandas as pd
import GOR_query_helper as GQH
%env LOG_QUERY=1
project = "clin-hg19"
%env GOR_API_PROJECT={project}
Relationship analysis¶
An important part of case-study analysis is to perform QC on the samples and understand the relationship between the samples. In order to estimate the relationship, we use the KING algorithm (by Manichaikul A, etal. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010 Nov 15;26(22):2867-73, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3025716/). The KING algorithm uses common variants to provide a robust estimator for kinship as well as fraction of no genetic sharing. With high number of quality variants (e.g. 100k+) it makes it possible to separate relationsips in two-dimensional space between homozygous-twins, parent-child, siblings, second- and third-degree relationships.
However, with limited number of common variants (50k-ish), as is the case in targeted exon sequencing, these estimates are not as robust as one would like to think, e.g. due to imperfect AF estimation and ethnicity mixture. Therefore, in addition to common variant analysis, we also perform rare variant analysis, coined Queen. It looks for potential sharing of variants that could hardly be shared by chance and therefore imply IBD and not IBS as is the case for the sharing in KING.
By combining this approch with KING, we get additional robustness in the relationship estimate. Furthermore, even though GOR has a very efficient low-memory footprint implementation of the KING algorithm, that can be used to find relationship between hundreds of samples in few minutes, it is nevertheless an algorithm that scales as O(#samples^2) x #variants. While QUEEN has similar scaling properties, even though it requires more variants (because they are supposed to be rare) it requires much-much fewer sample comparisons per variants. Therefore, one can use QUEEN to get a rough relationship estimate very fast for all pairs and then only evaluate kinship for significant pairs by using KING2, that uses different memory and sample comparison logic than KING. With QUEEN and KING2, we have for instance evaluated the full relationship of the complete UKBB samples (600k+) in few hours on the GORdb cluster. Similarly, QUEEN, KING and KING2, can easily be setup for parallel processing and incremental processing, e.g. in order to track relationship of all GDX samples and not just within case-studies. For the discussion here, we will not implement the parallel processing, however, those who are interested can see the query template report_builders/card_view/queen.ftl.yml().
Here we will do the following:
- run relationship analysis on all of the 575 samples in clin-hg19
- pick out few unrelated samples and simulate few family relationships
- see if we can identify the family relationship we simulated
Defining allele frequency information for common and rare variants¶
We simly use gnomAD, stored in ref/freq_max.gorz, as the source for our AF information. Then we setup few sources that define SNPs that we want to use for King and Queen. The columns tpq and kpq in ##kingaf## are used by the KING command while the relation ##queenaf## is just for filtering the variant into Queen.
qh = GQH.GOR_Query_Helper()
mydefs = ""
mydefs = qh.add_many("""
create #target_segments# = gor user_data/case_data/coverageRef/hg19_refGene_primaryNM_coding.gorz | segproj;
create #af# = pgor ref/freq_max.gorz | select 1-5 | rename ref_af af
| join -snpseg -i -maxseg 1000000 [#target_segments#];
create ##queenaf## = pgor [#af#]
| where len(#3)=len(#4) and af < 0.05 and af > 0.00001
| select 1-4,af;
create ##kingaf## = pgor [#af#]
| where len(#3)=len(#4) and abs(af-0.5)<0.45
| atmin 100 pos
| select 1-4,af
| calc tpq 2*af*af*(1.0-af)*(1.0-af)
| calc kpq 2.0*af*(1-af);
""")
Converting row-level variants into horizontal genotypes¶
For demonstration purposes, we simply filter the variants in source/var/wes_varcalls.gord by the GDX exon targest and keep them in #allvars#. For large scale analysis we would NOT clone the data like this, however, this makes it easier to change the code when we start simulating few samples later. Also notice that we are therefore NOT using good_cov.gord or segment_cov.gord for the coverage, and simply assume perfect coverage across the genome, as defined with #allcov# below:
mydefs = qh.add_many("""
create #allvars# = pgor source/var/wes_varcalls.gord -s PN | select 1-4,callcopies,pn
| join -snpseg -i -maxseg 1000000 [#target_segments#];
create ##buckets## = nor <(gor -p chr22 [#allvars#]) | group -gc pn | calc bucket 'b1';
create #allcov# = pgor [#allvars#] | top 1 | hide pn | multimap -cartesian [##buckets##] | group chrom -gc pn;
create ##hgt## = pgor [#allvars#]
| join -snpseg -i -maxseg 1000000 [#target_segments#]
| where len(#3)=len(#4)
| calc gt callcopies
| merge <(gor [##kingaf##] | rename #3 Reference | rename #4 Call)
| gtgen -gc #3,#4 [##buckets##] -maxseg 250000000 [#allcov#];
""")
Notice that in #hgt# we merge the ##kingaf## relation to the genotypes. This is to ensure that we store genotypes for all the samples we plan to use in the KING command, irrespectively of how many samples are put into the ##hgt## "mini-freeze", i.e. analogous to #allvars# in the notebook Incremental_Freeze.ipynb. The QUEEN algorithm is however not sensitive to this because it does not use homozygous-reference genotypes.
Calculating Queen and King¶
The statements below execute the rare variant sharing analysis and the kinship analysis based on the common variants.
mydefs = qh.add_many("""
create ##queen## = nor <(gor [##hgt##]
| until chrom = 'chrX' | varjoin -i [##queenaf##]
| queen [##buckets##] <(nor [##buckets##] | select pn) <(nor [##buckets##] | select pn) -vs 1 /* -minSharing 0.01 */)
| select pn1-;
create ##king## = nor <(gor [##hgt##] | where len(#3)=len(#4)
| until chrom = 'chrX' | varjoin -r [##kingaf##]
| king -gc af -maxvars 3000000 [##buckets##] <(nor [##buckets##] | select pn) <(nor [##buckets##] | select pn) -vs 1 )
| select pn1-;
""")
The following code does two types of classification:
- first based on the ranges set forth in the KING paper mentioned above
- second based on Gaussian-mixture modeling of data from the UKBB and established relationships there
Important to keep in mind that the Gaussian-mixture data "training" was based on whole genome array genotype data (120k variants in LD). Thus it may NOT be perfect for our exon targets. We nevertheless keep it as is. In the code below, we swap phi with theta (i.e. a robust estimator of phi), because the code below was originally written using phi. Further below we then swap it back. The kinship we report is then the average of phi and theta.
mydefs = qh.add_many("""
def expo($1,$2,$3) = exp(-sqr( $1 - $2 )/float(2* $3 * $3 )) / $3;
create ##king_final## = nor [##king##]
| calc temp_phi phi
| replace phi theta /* (theta+phi)/2 */
| calc monozygotic if(phi > pow(2.0,-1.5) and pi0 < 0.1,1,0)
| calc parent_offspring if(phi > pow(2.0,-2.5) and phi < pow(2.0,-1.5) and pi0 < 0.1,1,0)
| calc full_sib if(phi > pow(2.0,-2.5) and phi < pow(2.0,-1.5) and pi0 > 0.1 and pi0 < 0.365,1,0)
| calc second_degree if(phi > pow(2.0,-3.5) and phi < pow(2.0,-2.5) and pi0 > 0.365 and pi0 < 1.0-pow(2,-1.5),1,0)
| calc third_degree if(phi > pow(2.0,-4.5) and phi < pow(2.0,-3.5) and pi0 > 1.0-pow(2,-1.5) and pi0 < 1.0-pow(2,-2.5),1,0)
| calc temp if(monozygotic=1,'Monozygotic twin,','')+if(parent_offspring=1,'Parent-offspring,','')+if(full_sib=1,'Full sib,','')+if(second_degree=1,'2nd Degree,','')+if(third_degree=1,'3rd Degree,','')
| calc relationship listfilter(temp,'len(x)>1')
| calc monozygotic_dist ln( expo(pi0,0.0,0.001) * expo(phi,0.498,0.0013) )
| calc parent_offspring_dist ln( expo(pi0,0.00297,0.025) * expo(phi,0.248,0.018) )
| calc full_sib_dist ln( expo(pi0,0.248,0.0406) * expo(phi,0.248,0.0187) )
| calc second_degree_dist ln( expo(pi0,0.4954,0.0608) * expo(phi,0.127,0.027) )
| calc third_degree_dist ln( expo(pi0,0.745,0.0428) * expo(phi,0.059,0.02) )
| calc rel 'Monozygotic twin,Parent-offspring,Full sib,2nd Degree,3rd Degree'
| calc relp str(monozygotic_dist)+','+str(parent_offspring_dist)+','+str(full_sib_dist)+','+str(second_degree_dist)+','+str(third_degree_dist)
| calc relmax listnummax(relp)
| calc GaussRel listzipfilter(rel,relp,'abs(float(x)-relmax)<0.001')
| replace phi temp_phi
| calc kinship (phi+theta)/2
| select pn1,pn2,relationship,GaussRel,pi0,phi,theta,count,kinship
| rename count kingcount;
create ##king_queen## = nor [##king_final##] | multimap -c pn1,pn2 -m 0 [##queen##]
| where pn1!=pn2
| rename pi0 IBD0
| rename avgSharing rareSharing
| rename Count RareCount
| select pn1,pn2,relationship,gaussrel,kinship,phi,theta,ibd0,kingcount,rareSharing,rareCount
| replace kinship-ibd0,raresharing form(float(#rc),4,3);
""")
Finally, we add create a "consensus" relationship based on the standard King classification, the Gaussian mixture estimate, and the rare sharing from QUEEN:
mydefs = qh.add_many("""
create #final# = nor [##king_queen##]
| calc consensus if(relationship=gaussrel and raresharing > 0.05,relationship,
if((relationship='Monozygotic twin' or gaussrel='Monozygotic twin') and raresharing > 0.9,'Monozygotic twin',
if((relationship='Parent-offspring' or gaussrel='Parent-offspring') and IBD0 < 0.75,'Parent-offspring',
if((relationship='Full sib' or gaussrel='Full sib') and IBD0 <= 0.45 and raresharing > 0.2,'Full sib',
if((relationship in ('2nd Degree','Full sib') or gaussrel in ('2nd Degree','Full sib')) and IBD0 <= 0.6 and raresharing > 0.125,'2nd Degree',
if(relationship='' and gaussrel = '3rd Degree' and raresharing > 0.065 and IBD0 < 0.8,'3rd Degree',
if(raresharing <= 0.05,'','')))))));
""")
%%gor
$mydefs
nor [#final#] | group -gc consensus -count | sort -c allcount:n
Query ran in 2.95 sec Query fetched 5 rows in 66.83 sec (total time 69.78 sec)
consensus | allCount | |
---|---|---|
0 | Full sib | 4 |
1 | Parent-offspring | 582 |
2 | 2nd Degree | 6473 |
3 | 3rd Degree | 9397 |
4 | NaN | 325184 |
We notice that in the project we have mostly parent-offspring relationships and few siblings. The 2nd and 3rd degree relationships may or may not be "real".
Simulating relationships¶
We pick few of the samples that don't have any relationship in order to establish "founders". Notice that due to the nature of the Gaussrel classification, it always reports something as the closes relationship.
Removing relative¶
To find the least related individuals, we use the RELREMOVE command and provide it with the pairs that have significant relationship. We use the kinship coefficient as a measure to define that cutoff:
%%gor pns <<
$mydefs
nor [##buckets##] | select pn | calc pheno 'case'
| relremove <(nor [##king_queen##] | where kinship > 0.09 or gaussrel = '2nd Degree')
| where pheno != 'NA'
| group -lis -sc pn
| replace lis_pn listmap(#rc,'squote(x)')
Query ran in 0.33 sec Query fetched 1 rows in 4.38 sec (total time 4.71 sec)
print(pns.at[0,'lis_PN'])
'GDX_2901707','GDX_2914007','GDX_2927939','GDX_2934165','GDX_2939330'
We can now inspect the relationships between these samples that we found after RELREMOVE:
%%gor
$mydefs
def #pns# = 'GDX_2901707','GDX_2914007','GDX_2927939','GDX_2935421','GDX_2943176';
nor [#final#]
| where PN1 in ( #pns# )
| where PN2 in ( #pns# )
| where pn1 != pn2
Query ran in 0.19 sec Query fetched 20 rows in 1.24 sec (total time 1.43 sec)
PN1 | PN2 | relationship | GaussRel | kinship | phi | theta | IBD0 | kingcount | rareSharing | RareCount | consensus | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GDX_2943176 | GDX_2901707 | NaN | 3rd Degree | 0.013 | 0.189 | -0.162 | 0.795 | 61856 | 0.003 | 2783 | NaN |
1 | GDX_2935421 | GDX_2901707 | NaN | 3rd Degree | 0.066 | 0.253 | -0.122 | 0.635 | 61856 | 0.020 | 1152 | NaN |
2 | GDX_2943176 | GDX_2914007 | NaN | 3rd Degree | 0.028 | 0.196 | -0.140 | 0.768 | 61856 | 0.036 | 2783 | NaN |
3 | GDX_2935421 | GDX_2914007 | NaN | 3rd Degree | 0.058 | 0.246 | -0.129 | 0.673 | 61856 | 0.086 | 1152 | 3rd Degree |
4 | GDX_2943176 | GDX_2927939 | NaN | 3rd Degree | 0.051 | 0.209 | -0.107 | 0.698 | 61856 | 0.035 | 2783 | NaN |
5 | GDX_2935421 | GDX_2927939 | NaN | 3rd Degree | 0.089 | 0.262 | -0.084 | 0.589 | 61856 | 0.101 | 1152 | 3rd Degree |
6 | GDX_2943176 | GDX_2935421 | NaN | 3rd Degree | 0.059 | 0.216 | -0.099 | 0.667 | 61856 | 0.041 | 2783 | NaN |
7 | GDX_2935421 | GDX_2943176 | NaN | 3rd Degree | 0.059 | 0.216 | -0.099 | 0.667 | 61856 | 0.099 | 1152 | 3rd Degree |
8 | GDX_2914007 | GDX_2901707 | NaN | 3rd Degree | 0.068 | 0.254 | -0.118 | 0.630 | 61856 | 0.008 | 1223 | NaN |
9 | GDX_2914007 | GDX_2935421 | NaN | 3rd Degree | 0.058 | 0.246 | -0.129 | 0.673 | 61856 | 0.081 | 1223 | 3rd Degree |
10 | GDX_2914007 | GDX_2927939 | NaN | 3rd Degree | 0.081 | 0.257 | -0.095 | 0.615 | 61856 | 0.092 | 1223 | 3rd Degree |
11 | GDX_2914007 | GDX_2943176 | NaN | 3rd Degree | 0.028 | 0.196 | -0.140 | 0.768 | 61856 | 0.081 | 1223 | 3rd Degree |
12 | GDX_2927939 | GDX_2901707 | NaN | 3rd Degree | 0.059 | 0.247 | -0.128 | 0.650 | 61856 | 0.006 | 1350 | NaN |
13 | GDX_2927939 | GDX_2914007 | NaN | 3rd Degree | 0.081 | 0.257 | -0.095 | 0.615 | 61856 | 0.084 | 1350 | 3rd Degree |
14 | GDX_2927939 | GDX_2935421 | NaN | 3rd Degree | 0.089 | 0.262 | -0.084 | 0.589 | 61856 | 0.086 | 1350 | 3rd Degree |
15 | GDX_2927939 | GDX_2943176 | NaN | 3rd Degree | 0.051 | 0.209 | -0.107 | 0.698 | 61856 | 0.073 | 1350 | 3rd Degree |
16 | GDX_2901707 | GDX_2914007 | NaN | 3rd Degree | 0.068 | 0.254 | -0.118 | 0.630 | 61856 | 0.015 | 667 | NaN |
17 | GDX_2901707 | GDX_2943176 | NaN | 3rd Degree | 0.013 | 0.189 | -0.162 | 0.795 | 61856 | 0.012 | 667 | NaN |
18 | GDX_2901707 | GDX_2935421 | NaN | 3rd Degree | 0.066 | 0.253 | -0.122 | 0.635 | 61856 | 0.034 | 667 | NaN |
19 | GDX_2901707 | GDX_2927939 | NaN | 3rd Degree | 0.059 | 0.247 | -0.128 | 0.650 | 61856 | 0.012 | 667 | NaN |
Generating childrens!¶
We are now ready to create our founders and mate them:
mydefs = qh.add_many("""
def #gen($1) = gor source/var/wes_varcalls.gord -s PN -f $1
| select 1-4,callcopies | join -snpseg -i -maxseg 1000000 [#target_segments#];
create #a1# = #gen('GDX_2901707') | calc pn 'a1';
create #a2# = #gen('GDX_2914007') | calc pn 'a2';
create #a3# = #gen('GDX_2927939') | calc pn 'a3';
create #a4# = #gen('GDX_2935421') | calc pn 'a4';
create #a5# = #gen('GDX_2943176') | calc pn 'a5';
""")
Our mating proceedure simply follows the Mendelian rules per variant and ignores recombination, phase and certainly any romance!
mydefs = qh.add_many("""
def #mate# = rename callcopies c
| hide pn
| pivot -gc 3,4 source -v L,R -e 0
| calc f if(l_c=2,1,if(l_c=1,if(random()<0.5,1,0),0))
| calc m if(r_c=2,1,if(r_c=1,if(random()<0.5,1,0),0))
| calc callcopies f+m
| where callcopies > 0
| select 1-4,callcopies;
create #c1a1a2# = gor <(gor [#a1#] | merge [#a2#] -s) | #mate# | calc pn 'c1a1a2';
create #c2a1a2# = gor <(gor [#a1#] | merge [#a2#] -s) | #mate# | calc pn 'c2a1a2';
create #c1a1a2twin# = gor [#c1a1a2#] | replace pn 'c1a1a2twin';
create #c1a3a4# = gor <(gor [#a3#] | merge [#a4#] -s) | #mate# | calc pn 'c1a3a4';
create #c2a3a4# = gor <(gor [#a3#] | merge [#a4#] -s) | #mate# | calc pn 'c2a3a4';
create #c1c1a1a2c2a1a2# = gor <(gor [#c1a1a2#] | merge [#c2a1a2#] -s) | #mate# | calc pn 'c1c1a1a2c2a1a2';
create #c1a1a4# = gor <(gor [#a1#] | merge [#a4#] -s) | #mate# | calc pn 'c1a1a4';
create #c1a3a2# = gor <(gor [#a3#] | merge [#a2#] -s) | #mate# | calc pn 'c1a3a2';
create #c1c1a1a2c1a3a4# = gor <(gor [#c1a1a2#] | merge [#c1a3a4#] -s) | #mate# | calc pn 'c1c1a1a2c1a3a4';
create #c2c1a1a2c1a3a4# = gor <(gor [#c1a1a2#] | merge [#c1a3a4#] -s) | #mate# | calc pn 'c2c1a1a2c1a3a4';
create #c1c1c1a1a2c1a3a4a5# = gor <(gor [#c1c1a1a2c1a3a4#] | merge [#a5#] -s) | #mate# | calc pn 'c1c1c1a1a2c1a3a4a5';
""")
We define a pedigree based on these mating patterns and redefine #allvars# and coverage:
mydefs = qh.add("""
create #ped# = norrows 1
| calc pfm 'c1a1a2,a1,a2;c2a1a2,a1,a2;c1a1a2twin,a1,a2;c1a3a4,a3,a4;c2a3a4,a3,a4;c1c1a1a2c2a1a2,c1a1a2,c2a1a2;c1a1a4,a1,a4;c1a3a2,a3,a2;c1c1a1a2c1a3a4,c1a1a2,c1a3a4;c2c1a1a2c1a3a4,c1a1a2,c1a3a4;c1c1c1a1a2c1a3a4a5,c1c1a1a2c1a3a4,a5'
| split -s ';' pfm | colsplit pfm 3 x -s ',' | rename x_1 p | rename x_2 f | rename x_3 m | select p,f,m;
""")
mydefs = qh.add_many("""
create ##buckets## = nor [#ped#] | calc pn p+','+f+','+m | select PN | split PN
| distinct | calc bucket 'b1' | select pn,bucket;
create #allvars# = pgor [#a1#] [#a2#] [#a3#] [#a4#] [#a5#] [#c1a1a2#]
[#c2a1a2#] [#c1a1a2twin#] [#c1a3a4#] [#c2a3a4#] [#c1c1a1a2c2a1a2#] [#c1a1a4#]
[#c1a3a2#] [#c1c1a1a2c1a3a4#] [#c2c1a1a2c1a3a4#] [#c1c1c1a1a2c1a3a4a5#];
create #allcov# = pgor [#allvars#] | top 1 | hide pn | multimap -cartesian [##buckets##] | group chrom -gc pn;
""")
We can list the pedigree:
%%gor
$mydefs
nor [#ped#]
Query ran in 0.56 sec Query fetched 11 rows in 0.08 sec (total time 0.64 sec)
p | f | m | |
---|---|---|---|
0 | c1a1a2 | a1 | a2 |
1 | c2a1a2 | a1 | a2 |
2 | c1a1a2twin | a1 | a2 |
3 | c1a3a4 | a3 | a4 |
4 | c2a3a4 | a3 | a4 |
5 | c1c1a1a2c2a1a2 | c1a1a2 | c2a1a2 |
6 | c1a1a4 | a1 | a4 |
7 | c1a3a2 | a3 | a2 |
8 | c1c1a1a2c1a3a4 | c1a1a2 | c1a3a4 |
9 | c2c1a1a2c1a3a4 | c1a1a2 | c1a3a4 |
10 | c1c1c1a1a2c1a3a4a5 | c1c1a1a2c1a3a4 | a5 |
We can also calculate the het-hom ratio for the samples to see if they are realistic:
%%gor
$mydefs
create #temp# = gor [#allvars#] | calc het if(callcopies=1,1,0) | calc hom if(callcopies=2,1,0) | group genome -gc pn -sum -ic het-;
nor [#temp#] | select pn- | calc f sum_het/sum_hom
Query ran in 0.25 sec Query fetched 16 rows in 0.00 sec (total time 0.25 sec)
pn | sum_het | sum_hom | f | |
---|---|---|---|---|
0 | a1 | 12976 | 9659 | 1.343410 |
1 | a2 | 15065 | 9557 | 1.576331 |
2 | a3 | 15848 | 9271 | 1.709416 |
3 | a4 | 15286 | 8785 | 1.740011 |
4 | a5 | 19988 | 8817 | 2.266984 |
5 | c1a1a2 | 16064 | 8488 | 1.892554 |
6 | c1a1a2twin | 16064 | 8488 | 1.892554 |
7 | c1a1a4 | 16228 | 8206 | 1.977577 |
8 | c1a3a2 | 16371 | 9034 | 1.812154 |
9 | c1a3a4 | 16156 | 8784 | 1.839253 |
10 | c1c1a1a2c1a3a4 | 16425 | 8420 | 1.950713 |
11 | c1c1a1a2c2a1a2 | 11409 | 10856 | 1.050940 |
12 | c1c1c1a1a2c1a3a4a5 | 19172 | 8069 | 2.376007 |
13 | c2a1a2 | 16083 | 8556 | 1.879734 |
14 | c2a3a4 | 16093 | 8791 | 1.830622 |
15 | c2c1a1a2c1a3a4 | 16307 | 8465 | 1.926403 |
We see that everything looks normal although there is some variability in the ratio - founders even being outliers.
Using the pedigree as a graph to derive extended relationships¶
Now we will add logic to calculate genetic gender and also graph-based analysis that uses the pedigree to get better feeling for the relationships between our simulated samples.
mydefs = qh.add("""
create #rel# = nor [#ped#] | select p,f | calc r 'f' | rename #1 PN | rename #2 RN
| merge <(nor [#ped#] | select p,m | calc r 'm' | rename #1 PN | rename #2 RN)
| distinct;
create #pns# = nor [##buckets##] | select pn;
create #graph# = nor [#pns#] | dagmap -c pn [#rel#] -dp
| multimap -c dag_node <(nor [#pns#] | dagmap -c pn [#rel#] -dp | replace dag_path listtail(listreverse(dag_path,'->'),'->') | select dag_node,dag_path,pn)
| calc length listsize(dag_path,'->')+listsize(dag_pathx,'->')
| granno -gc pn,pnx -min -ic length
| where length = min_length
| calc rel dag_path+if(len(dag_pathx)>1,'<-','')+replace(dag_pathx,'->','<-')
| select pn,rel,pnx,length | rename pn PN1 | rename pnx PN2
| where pn1 != pn2
| group -gc pn1,pn2 -set -sc rel;
create #gen_gender# = nor <(gor -p chrX:2781480-155701382 [##hgt##]
| csvsel -vs 1 -gc #3,#4 -tag pn -hide 0,3 [##buckets##] [#pns#]
| group genome -gc pn,value -count)
| select pn,value,allcount | group -gc pn,value -sum -ic allcount
| pivot value -v 1,2 -gc pn -e 0
| rename 1_sum_allcount hetCount
| rename 2_sum_allcount homCount
| calc gender if(hetcount/(hetcount+homcount) < 0.5,'male','female')
| select pn,gender;
""")
Finally, do a slight modification to our #final# statement and add extra classification as if we know beforehand the proband (index-case) and the parents. We also add the relationships derived in #graph# and run our full analysis again on our simulated samples:
mydefs = qh.add_many("""
def #index# = 'c1a1a2';
def #parents# = 'a1','a2';
create #final# = nor [##king_queen##]
| calc consensus if(relationship=gaussrel and raresharing > 0.05,relationship,
if((relationship='Monozygotic twin' or gaussrel='Monozygotic twin') and raresharing > 0.9,'Monozygotic twin',
if((relationship='Parent-offspring' or gaussrel='Parent-offspring') and IBD0 < 0.075,'Parent-offspring',
if((relationship='Full sib' or gaussrel='Full sib') and IBD0 <= 0.45 and raresharing > 0.2,'Full sib',
if((relationship in ('2nd Degree','Full sib') or gaussrel in ('2nd Degree','Full sib')) and IBD0 <= 0.6 and raresharing > 0.125,'2nd Degree',
if(relationship='' and gaussrel = '3rd Degree' and raresharing > 0.0625 and IBD0 < 0.8,'3rd Degree',
if(raresharing <= 0.05,'','')))))))
| map -c pn2 -m '' [#gen_gender#]
| calc index_rel if(pn1= #index# and pn2 in ( #parents#) and consensus='Parent-offspring',
if(gender='female','Mother',if(gender='male','Father',consensus)),
if(pn1= #index# and consensus='Parent-offspring','Child',consensus))
| map -c pn1,pn2 -m '' [#graph#]
""")
%%gor
$mydefs
nor [#final#] | select pn1,pn2,consensus,index_rel,set_rel
Query ran in 0.46 sec Query fetched 240 rows in 0.01 sec (total time 0.46 sec)
PN1 | PN2 | consensus | index_rel | set_rel | |
---|---|---|---|---|---|
0 | c2c1a1a2c1a3a4 | a5 | NaN | NaN | NaN |
1 | c2c1a1a2c1a3a4 | c1c1c1a1a2c1a3a4a5 | 2nd Degree | 2nd Degree | c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1... |
2 | c2c1a1a2c1a3a4 | c1c1a1a2c1a3a4 | Full sib | Full sib | c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c1a3a4,c2c1a1a... |
3 | c2c1a1a2c1a3a4 | c1a3a2 | 2nd Degree | 2nd Degree | c2c1a1a2c1a3a4->c1a1a2->a2<-c1a3a2,c2c1a1a2c1a... |
4 | c2c1a1a2c1a3a4 | c1a1a4 | 2nd Degree | 2nd Degree | c2c1a1a2c1a3a4->c1a1a2->a1<-c1a1a4,c2c1a1a2c1a... |
... | ... | ... | ... | ... | ... |
235 | c1c1a1a2c2a1a2 | c1a1a2twin | Parent-offspring | Parent-offspring | c1c1a1a2c2a1a2->c1a1a2->a1<-c1a1a2twin,c1c1a1a... |
236 | c1c1a1a2c2a1a2 | c2a1a2 | Parent-offspring | Parent-offspring | c1c1a1a2c2a1a2->c2a1a2 |
237 | c1c1a1a2c2a1a2 | a2 | 2nd Degree | 2nd Degree | c1c1a1a2c2a1a2->c1a1a2->a2,c1c1a1a2c2a1a2->c2a... |
238 | c1c1a1a2c2a1a2 | a1 | 2nd Degree | 2nd Degree | c1c1a1a2c2a1a2->c1a1a2->a1,c1c1a1a2c2a1a2->c2a... |
239 | c1c1a1a2c2a1a2 | c1a1a2 | Parent-offspring | Parent-offspring | c1c1a1a2c2a1a2->c1a1a2 |
240 rows × 5 columns
%%gor
$mydefs
nor [#final#] | select pn1,pn2,consensus,index_rel,set_rel,kinship,theta,ibd0 | where index_rel in ('Father','Mother','Child')
Query ran in 0.23 sec Query fetched 6 rows in 0.01 sec (total time 0.23 sec)
PN1 | PN2 | consensus | index_rel | set_rel | kinship | theta | IBD0 | |
---|---|---|---|---|---|---|---|---|
0 | c1a1a2 | c2c1a1a2c1a3a4 | Parent-offspring | Child | c1a1a2<-c2c1a1a2c1a3a4 | 0.315 | 0.244 | 0.000 |
1 | c1a1a2 | c1c1a1a2c1a3a4 | Parent-offspring | Child | c1a1a2<-c1c1a1a2c1a3a4 | 0.321 | 0.253 | 0.000 |
2 | c1a1a2 | c1c1a1a2c2a1a2 | Parent-offspring | Child | c1a1a2<-c1c1a1a2c2a1a2 | 0.354 | 0.289 | 0.000 |
3 | c1a1a2 | c2a1a2 | Parent-offspring | Child | c1a1a2->a1<-c2a1a2,c1a1a2->a2<-c2a1a2 | 0.338 | 0.276 | 0.085 |
4 | c1a1a2 | a2 | Parent-offspring | Mother | c1a1a2->a2 | 0.314 | 0.239 | 0.000 |
5 | c1a1a2 | a1 | Parent-offspring | Mother | c1a1a2->a1 | 0.310 | 0.233 | 0.000 |
We notice that one of our Sibling sample is incorrectly classified as parent in relation to the index case, hence labeled as Child. In other words, because of surprisingly low IBD0 (lack of sharing) it looks as if a parent and not a full sibling. Gaussrel and standard King relationship agree and hence misclassified by our consensus rule. We see few other examples of this type of misclassification, e.g. c1a1a2 vs c2a1a2 shown below. c1a3a4 vs c2a3a4 is however correctly classified as full-sib.
%%gor dfFinal <<
$mydefs
nor [#final#] | select pn1,pn2,consensus,index_rel,set_rel | where set_rel != '' | sort -c consensus:r,pn1
Query ran in 0.21 sec Query fetched 150 rows in 0.01 sec (total time 0.22 sec)
for i in range(0,len(dfFinal)):
print(f"{i+1}\t{dfFinal.at[i,'PN1']} vs {dfFinal.at[i,'PN2']}\t{dfFinal.at[i,'consensus']}\t{dfFinal.at[i,'set_rel']}")
print("----------------")
1 a1 vs c1a1a4 Parent-offspring a1<-c1a1a4 ---------------- 2 a1 vs c1a1a2twin Parent-offspring a1<-c1a1a2twin ---------------- 3 a1 vs c2a1a2 Parent-offspring a1<-c2a1a2 ---------------- 4 a1 vs c1a1a2 Parent-offspring a1<-c1a1a2 ---------------- 5 a2 vs c1a3a2 Parent-offspring a2<-c1a3a2 ---------------- 6 a2 vs c1a1a2twin Parent-offspring a2<-c1a1a2twin ---------------- 7 a2 vs c2a1a2 Parent-offspring a2<-c2a1a2 ---------------- 8 a2 vs c1a1a2 Parent-offspring a2<-c1a1a2 ---------------- 9 a3 vs c1a3a2 Parent-offspring a3<-c1a3a2 ---------------- 10 a3 vs c2a3a4 Parent-offspring a3<-c2a3a4 ---------------- 11 a3 vs c1a3a4 Parent-offspring a3<-c1a3a4 ---------------- 12 a4 vs c1a1a4 Parent-offspring a4<-c1a1a4 ---------------- 13 a4 vs c2a3a4 Parent-offspring a4<-c2a3a4 ---------------- 14 a4 vs c1a3a4 Parent-offspring a4<-c1a3a4 ---------------- 15 a5 vs c1c1c1a1a2c1a3a4a5 Parent-offspring a5<-c1c1c1a1a2c1a3a4a5 ---------------- 16 c1a1a2 vs c2c1a1a2c1a3a4 Parent-offspring c1a1a2<-c2c1a1a2c1a3a4 ---------------- 17 c1a1a2 vs c1c1a1a2c1a3a4 Parent-offspring c1a1a2<-c1c1a1a2c1a3a4 ---------------- 18 c1a1a2 vs c1c1a1a2c2a1a2 Parent-offspring c1a1a2<-c1c1a1a2c2a1a2 ---------------- 19 c1a1a2 vs c2a1a2 Parent-offspring c1a1a2->a1<-c2a1a2,c1a1a2->a2<-c2a1a2 ---------------- 20 c1a1a2 vs a2 Parent-offspring c1a1a2->a2 ---------------- 21 c1a1a2 vs a1 Parent-offspring c1a1a2->a1 ---------------- 22 c1a1a2twin vs c2c1a1a2c1a3a4 Parent-offspring c1a1a2twin->a1<-c1a1a2<-c2c1a1a2c1a3a4,c1a1a2twin->a2<-c1a1a2<-c2c1a1a2c1a3a4 ---------------- 23 c1a1a2twin vs c1c1a1a2c1a3a4 Parent-offspring c1a1a2twin->a1<-c1a1a2<-c1c1a1a2c1a3a4,c1a1a2twin->a2<-c1a1a2<-c1c1a1a2c1a3a4 ---------------- 24 c1a1a2twin vs c1c1a1a2c2a1a2 Parent-offspring c1a1a2twin->a1<-c1a1a2<-c1c1a1a2c2a1a2,c1a1a2twin->a1<-c2a1a2<-c1c1a1a2c2a1a2,c1a1a2twin->a2<-c1a1a2<-c1c1a1a2c2a1a2,c1a1a2twin->a2<-c2a1a2<-c1c1a1a2c2a1a2 ---------------- 25 c1a1a2twin vs c2a1a2 Parent-offspring c1a1a2twin->a1<-c2a1a2,c1a1a2twin->a2<-c2a1a2 ---------------- 26 c1a1a2twin vs a2 Parent-offspring c1a1a2twin->a2 ---------------- 27 c1a1a2twin vs a1 Parent-offspring c1a1a2twin->a1 ---------------- 28 c1a1a4 vs a4 Parent-offspring c1a1a4->a4 ---------------- 29 c1a1a4 vs a1 Parent-offspring c1a1a4->a1 ---------------- 30 c1a3a2 vs a3 Parent-offspring c1a3a2->a3 ---------------- 31 c1a3a2 vs a2 Parent-offspring c1a3a2->a2 ---------------- 32 c1a3a4 vs c2c1a1a2c1a3a4 Parent-offspring c1a3a4<-c2c1a1a2c1a3a4 ---------------- 33 c1a3a4 vs c1c1a1a2c1a3a4 Parent-offspring c1a3a4<-c1c1a1a2c1a3a4 ---------------- 34 c1a3a4 vs a4 Parent-offspring c1a3a4->a4 ---------------- 35 c1a3a4 vs a3 Parent-offspring c1a3a4->a3 ---------------- 36 c1c1a1a2c1a3a4 vs c1c1c1a1a2c1a3a4a5 Parent-offspring c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 37 c1c1a1a2c1a3a4 vs c1a3a4 Parent-offspring c1c1a1a2c1a3a4->c1a3a4 ---------------- 38 c1c1a1a2c1a3a4 vs c1a1a2twin Parent-offspring c1c1a1a2c1a3a4->c1a1a2->a1<-c1a1a2twin,c1c1a1a2c1a3a4->c1a1a2->a2<-c1a1a2twin ---------------- 39 c1c1a1a2c1a3a4 vs c1a1a2 Parent-offspring c1c1a1a2c1a3a4->c1a1a2 ---------------- 40 c1c1a1a2c2a1a2 vs c1a1a2twin Parent-offspring c1c1a1a2c2a1a2->c1a1a2->a1<-c1a1a2twin,c1c1a1a2c2a1a2->c1a1a2->a2<-c1a1a2twin,c1c1a1a2c2a1a2->c2a1a2->a1<-c1a1a2twin,c1c1a1a2c2a1a2->c2a1a2->a2<-c1a1a2twin ---------------- 41 c1c1a1a2c2a1a2 vs c2a1a2 Parent-offspring c1c1a1a2c2a1a2->c2a1a2 ---------------- 42 c1c1a1a2c2a1a2 vs c1a1a2 Parent-offspring c1c1a1a2c2a1a2->c1a1a2 ---------------- 43 c1c1c1a1a2c1a3a4a5 vs a5 Parent-offspring c1c1c1a1a2c1a3a4a5->a5 ---------------- 44 c1c1c1a1a2c1a3a4a5 vs c1c1a1a2c1a3a4 Parent-offspring c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4 ---------------- 45 c2a1a2 vs c1c1a1a2c2a1a2 Parent-offspring c2a1a2<-c1c1a1a2c2a1a2 ---------------- 46 c2a1a2 vs c1a1a2twin Parent-offspring c2a1a2->a1<-c1a1a2twin,c2a1a2->a2<-c1a1a2twin ---------------- 47 c2a1a2 vs a2 Parent-offspring c2a1a2->a2 ---------------- 48 c2a1a2 vs a1 Parent-offspring c2a1a2->a1 ---------------- 49 c2a1a2 vs c1a1a2 Parent-offspring c2a1a2->a1<-c1a1a2,c2a1a2->a2<-c1a1a2 ---------------- 50 c2a3a4 vs a4 Parent-offspring c2a3a4->a4 ---------------- 51 c2a3a4 vs a3 Parent-offspring c2a3a4->a3 ---------------- 52 c2c1a1a2c1a3a4 vs c1a3a4 Parent-offspring c2c1a1a2c1a3a4->c1a3a4 ---------------- 53 c2c1a1a2c1a3a4 vs c1a1a2twin Parent-offspring c2c1a1a2c1a3a4->c1a1a2->a1<-c1a1a2twin,c2c1a1a2c1a3a4->c1a1a2->a2<-c1a1a2twin ---------------- 54 c2c1a1a2c1a3a4 vs c1a1a2 Parent-offspring c2c1a1a2c1a3a4->c1a1a2 ---------------- 55 c1a1a2 vs c1a1a2twin Monozygotic twin c1a1a2->a1<-c1a1a2twin,c1a1a2->a2<-c1a1a2twin ---------------- 56 c1a1a2twin vs c1a1a2 Monozygotic twin c1a1a2twin->a1<-c1a1a2,c1a1a2twin->a2<-c1a1a2 ---------------- 57 c1a3a4 vs c2a3a4 Full sib c1a3a4->a3<-c2a3a4,c1a3a4->a4<-c2a3a4 ---------------- 58 c1c1a1a2c1a3a4 vs c2c1a1a2c1a3a4 Full sib c1c1a1a2c1a3a4->c1a1a2<-c2c1a1a2c1a3a4,c1c1a1a2c1a3a4->c1a3a4<-c2c1a1a2c1a3a4 ---------------- 59 c2a3a4 vs c1a3a4 Full sib c2a3a4->a3<-c1a3a4,c2a3a4->a4<-c1a3a4 ---------------- 60 c2c1a1a2c1a3a4 vs c1c1a1a2c1a3a4 Full sib c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c1a3a4,c2c1a1a2c1a3a4->c1a3a4<-c1c1a1a2c1a3a4 ---------------- 61 a1 vs c1c1c1a1a2c1a3a4a5 2nd Degree a1<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 62 a1 vs c2c1a1a2c1a3a4 2nd Degree a1<-c1a1a2<-c2c1a1a2c1a3a4 ---------------- 63 a1 vs c1c1a1a2c1a3a4 2nd Degree a1<-c1a1a2<-c1c1a1a2c1a3a4 ---------------- 64 a1 vs c1c1a1a2c2a1a2 2nd Degree a1<-c1a1a2<-c1c1a1a2c2a1a2,a1<-c2a1a2<-c1c1a1a2c2a1a2 ---------------- 65 a2 vs c1c1c1a1a2c1a3a4a5 2nd Degree a2<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 66 a2 vs c2c1a1a2c1a3a4 2nd Degree a2<-c1a1a2<-c2c1a1a2c1a3a4 ---------------- 67 a2 vs c1c1a1a2c1a3a4 2nd Degree a2<-c1a1a2<-c1c1a1a2c1a3a4 ---------------- 68 a2 vs c1c1a1a2c2a1a2 2nd Degree a2<-c1a1a2<-c1c1a1a2c2a1a2,a2<-c2a1a2<-c1c1a1a2c2a1a2 ---------------- 69 a3 vs c1c1c1a1a2c1a3a4a5 2nd Degree a3<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 70 a3 vs c2c1a1a2c1a3a4 2nd Degree a3<-c1a3a4<-c2c1a1a2c1a3a4 ---------------- 71 a3 vs c1c1a1a2c1a3a4 2nd Degree a3<-c1a3a4<-c1c1a1a2c1a3a4 ---------------- 72 a4 vs c1c1c1a1a2c1a3a4a5 2nd Degree a4<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 73 a4 vs c2c1a1a2c1a3a4 2nd Degree a4<-c1a3a4<-c2c1a1a2c1a3a4 ---------------- 74 a4 vs c1c1a1a2c1a3a4 2nd Degree a4<-c1a3a4<-c1c1a1a2c1a3a4 ---------------- 75 c1a1a2 vs c1c1c1a1a2c1a3a4a5 2nd Degree c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 76 c1a1a2 vs c1a3a2 2nd Degree c1a1a2->a2<-c1a3a2 ---------------- 77 c1a1a2 vs c1a1a4 2nd Degree c1a1a2->a1<-c1a1a4 ---------------- 78 c1a1a2twin vs c1c1c1a1a2c1a3a4a5 2nd Degree c1a1a2twin->a1<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c1a1a2twin->a2<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 79 c1a1a2twin vs c1a3a2 2nd Degree c1a1a2twin->a2<-c1a3a2 ---------------- 80 c1a1a2twin vs c1a1a4 2nd Degree c1a1a2twin->a1<-c1a1a4 ---------------- 81 c1a1a4 vs c1c1c1a1a2c1a3a4a5 2nd Degree c1a1a4->a1<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c1a1a4->a4<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 82 c1a1a4 vs c2c1a1a2c1a3a4 2nd Degree c1a1a4->a1<-c1a1a2<-c2c1a1a2c1a3a4,c1a1a4->a4<-c1a3a4<-c2c1a1a2c1a3a4 ---------------- 83 c1a1a4 vs c1c1a1a2c1a3a4 2nd Degree c1a1a4->a1<-c1a1a2<-c1c1a1a2c1a3a4,c1a1a4->a4<-c1a3a4<-c1c1a1a2c1a3a4 ---------------- 84 c1a1a4 vs c1c1a1a2c2a1a2 2nd Degree c1a1a4->a1<-c1a1a2<-c1c1a1a2c2a1a2,c1a1a4->a1<-c2a1a2<-c1c1a1a2c2a1a2 ---------------- 85 c1a1a4 vs c2a3a4 2nd Degree c1a1a4->a4<-c2a3a4 ---------------- 86 c1a1a4 vs c1a3a4 2nd Degree c1a1a4->a4<-c1a3a4 ---------------- 87 c1a1a4 vs c1a1a2twin 2nd Degree c1a1a4->a1<-c1a1a2twin ---------------- 88 c1a1a4 vs c2a1a2 2nd Degree c1a1a4->a1<-c2a1a2 ---------------- 89 c1a1a4 vs c1a1a2 2nd Degree c1a1a4->a1<-c1a1a2 ---------------- 90 c1a3a2 vs c1c1c1a1a2c1a3a4a5 2nd Degree c1a3a2->a2<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c1a3a2->a3<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 91 c1a3a2 vs c2c1a1a2c1a3a4 2nd Degree c1a3a2->a2<-c1a1a2<-c2c1a1a2c1a3a4,c1a3a2->a3<-c1a3a4<-c2c1a1a2c1a3a4 ---------------- 92 c1a3a2 vs c1c1a1a2c1a3a4 2nd Degree c1a3a2->a2<-c1a1a2<-c1c1a1a2c1a3a4,c1a3a2->a3<-c1a3a4<-c1c1a1a2c1a3a4 ---------------- 93 c1a3a2 vs c1c1a1a2c2a1a2 2nd Degree c1a3a2->a2<-c1a1a2<-c1c1a1a2c2a1a2,c1a3a2->a2<-c2a1a2<-c1c1a1a2c2a1a2 ---------------- 94 c1a3a2 vs c2a3a4 2nd Degree c1a3a2->a3<-c2a3a4 ---------------- 95 c1a3a2 vs c1a3a4 2nd Degree c1a3a2->a3<-c1a3a4 ---------------- 96 c1a3a2 vs c1a1a2twin 2nd Degree c1a3a2->a2<-c1a1a2twin ---------------- 97 c1a3a2 vs c2a1a2 2nd Degree c1a3a2->a2<-c2a1a2 ---------------- 98 c1a3a2 vs c1a1a2 2nd Degree c1a3a2->a2<-c1a1a2 ---------------- 99 c1a3a4 vs c1c1c1a1a2c1a3a4a5 2nd Degree c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 100 c1a3a4 vs c1a3a2 2nd Degree c1a3a4->a3<-c1a3a2 ---------------- 101 c1a3a4 vs c1a1a4 2nd Degree c1a3a4->a4<-c1a1a4 ---------------- 102 c1c1a1a2c1a3a4 vs c1a3a2 2nd Degree c1c1a1a2c1a3a4->c1a1a2->a2<-c1a3a2,c1c1a1a2c1a3a4->c1a3a4->a3<-c1a3a2 ---------------- 103 c1c1a1a2c1a3a4 vs c1a1a4 2nd Degree c1c1a1a2c1a3a4->c1a1a2->a1<-c1a1a4,c1c1a1a2c1a3a4->c1a3a4->a4<-c1a1a4 ---------------- 104 c1c1a1a2c1a3a4 vs c1c1a1a2c2a1a2 2nd Degree c1c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c2a1a2 ---------------- 105 c1c1a1a2c1a3a4 vs c2a3a4 2nd Degree c1c1a1a2c1a3a4->c1a3a4->a3<-c2a3a4,c1c1a1a2c1a3a4->c1a3a4->a4<-c2a3a4 ---------------- 106 c1c1a1a2c1a3a4 vs a4 2nd Degree c1c1a1a2c1a3a4->c1a3a4->a4 ---------------- 107 c1c1a1a2c1a3a4 vs a3 2nd Degree c1c1a1a2c1a3a4->c1a3a4->a3 ---------------- 108 c1c1a1a2c1a3a4 vs c2a1a2 2nd Degree c1c1a1a2c1a3a4->c1a1a2->a1<-c2a1a2,c1c1a1a2c1a3a4->c1a1a2->a2<-c2a1a2 ---------------- 109 c1c1a1a2c1a3a4 vs a2 2nd Degree c1c1a1a2c1a3a4->c1a1a2->a2 ---------------- 110 c1c1a1a2c1a3a4 vs a1 2nd Degree c1c1a1a2c1a3a4->c1a1a2->a1 ---------------- 111 c1c1a1a2c2a1a2 vs c1c1c1a1a2c1a3a4a5 2nd Degree c1c1a1a2c2a1a2->c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 112 c1c1a1a2c2a1a2 vs c2c1a1a2c1a3a4 2nd Degree c1c1a1a2c2a1a2->c1a1a2<-c2c1a1a2c1a3a4 ---------------- 113 c1c1a1a2c2a1a2 vs c1c1a1a2c1a3a4 2nd Degree c1c1a1a2c2a1a2->c1a1a2<-c1c1a1a2c1a3a4 ---------------- 114 c1c1a1a2c2a1a2 vs c1a3a2 2nd Degree c1c1a1a2c2a1a2->c1a1a2->a2<-c1a3a2,c1c1a1a2c2a1a2->c2a1a2->a2<-c1a3a2 ---------------- 115 c1c1a1a2c2a1a2 vs c1a1a4 2nd Degree c1c1a1a2c2a1a2->c1a1a2->a1<-c1a1a4,c1c1a1a2c2a1a2->c2a1a2->a1<-c1a1a4 ---------------- 116 c1c1a1a2c2a1a2 vs a2 2nd Degree c1c1a1a2c2a1a2->c1a1a2->a2,c1c1a1a2c2a1a2->c2a1a2->a2 ---------------- 117 c1c1a1a2c2a1a2 vs a1 2nd Degree c1c1a1a2c2a1a2->c1a1a2->a1,c1c1a1a2c2a1a2->c2a1a2->a1 ---------------- 118 c1c1c1a1a2c1a3a4a5 vs c2c1a1a2c1a3a4 2nd Degree c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2<-c2c1a1a2c1a3a4,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4<-c2c1a1a2c1a3a4 ---------------- 119 c1c1c1a1a2c1a3a4a5 vs c1a3a4 2nd Degree c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4 ---------------- 120 c1c1c1a1a2c1a3a4a5 vs c1a1a2twin 2nd Degree c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a1<-c1a1a2twin,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a2<-c1a1a2twin ---------------- 121 c1c1c1a1a2c1a3a4a5 vs c1a1a2 2nd Degree c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2 ---------------- 122 c2a1a2 vs c1c1c1a1a2c1a3a4a5 2nd Degree c2a1a2->a1<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c2a1a2->a2<-c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 123 c2a1a2 vs c2c1a1a2c1a3a4 2nd Degree c2a1a2->a1<-c1a1a2<-c2c1a1a2c1a3a4,c2a1a2->a2<-c1a1a2<-c2c1a1a2c1a3a4 ---------------- 124 c2a1a2 vs c1c1a1a2c1a3a4 2nd Degree c2a1a2->a1<-c1a1a2<-c1c1a1a2c1a3a4,c2a1a2->a2<-c1a1a2<-c1c1a1a2c1a3a4 ---------------- 125 c2a1a2 vs c1a3a2 2nd Degree c2a1a2->a2<-c1a3a2 ---------------- 126 c2a1a2 vs c1a1a4 2nd Degree c2a1a2->a1<-c1a1a4 ---------------- 127 c2a3a4 vs c1c1c1a1a2c1a3a4a5 2nd Degree c2a3a4->a3<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c2a3a4->a4<-c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 128 c2a3a4 vs c2c1a1a2c1a3a4 2nd Degree c2a3a4->a3<-c1a3a4<-c2c1a1a2c1a3a4,c2a3a4->a4<-c1a3a4<-c2c1a1a2c1a3a4 ---------------- 129 c2a3a4 vs c1c1a1a2c1a3a4 2nd Degree c2a3a4->a3<-c1a3a4<-c1c1a1a2c1a3a4,c2a3a4->a4<-c1a3a4<-c1c1a1a2c1a3a4 ---------------- 130 c2a3a4 vs c1a3a2 2nd Degree c2a3a4->a3<-c1a3a2 ---------------- 131 c2a3a4 vs c1a1a4 2nd Degree c2a3a4->a4<-c1a1a4 ---------------- 132 c2c1a1a2c1a3a4 vs c1c1c1a1a2c1a3a4a5 2nd Degree c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5,c2c1a1a2c1a3a4->c1a3a4<-c1c1a1a2c1a3a4<-c1c1c1a1a2c1a3a4a5 ---------------- 133 c2c1a1a2c1a3a4 vs c1a3a2 2nd Degree c2c1a1a2c1a3a4->c1a1a2->a2<-c1a3a2,c2c1a1a2c1a3a4->c1a3a4->a3<-c1a3a2 ---------------- 134 c2c1a1a2c1a3a4 vs c1a1a4 2nd Degree c2c1a1a2c1a3a4->c1a1a2->a1<-c1a1a4,c2c1a1a2c1a3a4->c1a3a4->a4<-c1a1a4 ---------------- 135 c2c1a1a2c1a3a4 vs c1c1a1a2c2a1a2 2nd Degree c2c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c2a1a2 ---------------- 136 c2c1a1a2c1a3a4 vs c2a3a4 2nd Degree c2c1a1a2c1a3a4->c1a3a4->a3<-c2a3a4,c2c1a1a2c1a3a4->c1a3a4->a4<-c2a3a4 ---------------- 137 c2c1a1a2c1a3a4 vs a4 2nd Degree c2c1a1a2c1a3a4->c1a3a4->a4 ---------------- 138 c2c1a1a2c1a3a4 vs a3 2nd Degree c2c1a1a2c1a3a4->c1a3a4->a3 ---------------- 139 c2c1a1a2c1a3a4 vs c2a1a2 2nd Degree c2c1a1a2c1a3a4->c1a1a2->a1<-c2a1a2,c2c1a1a2c1a3a4->c1a1a2->a2<-c2a1a2 ---------------- 140 c2c1a1a2c1a3a4 vs a2 2nd Degree c2c1a1a2c1a3a4->c1a1a2->a2 ---------------- 141 c2c1a1a2c1a3a4 vs a1 2nd Degree c2c1a1a2c1a3a4->c1a1a2->a1 ---------------- 142 c1c1c1a1a2c1a3a4a5 vs c1a3a2 nan c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a2<-c1a3a2,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a3<-c1a3a2 ---------------- 143 c1c1c1a1a2c1a3a4a5 vs c1a1a4 nan c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a1<-c1a1a4,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a4<-c1a1a4 ---------------- 144 c1c1c1a1a2c1a3a4a5 vs c1c1a1a2c2a1a2 nan c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2<-c1c1a1a2c2a1a2 ---------------- 145 c1c1c1a1a2c1a3a4a5 vs c2a3a4 nan c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a3<-c2a3a4,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a4<-c2a3a4 ---------------- 146 c1c1c1a1a2c1a3a4a5 vs a4 nan c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a4 ---------------- 147 c1c1c1a1a2c1a3a4a5 vs a3 nan c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a3a4->a3 ---------------- 148 c1c1c1a1a2c1a3a4a5 vs c2a1a2 nan c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a1<-c2a1a2,c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a2<-c2a1a2 ---------------- 149 c1c1c1a1a2c1a3a4a5 vs a2 nan c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a2 ---------------- 150 c1c1c1a1a2c1a3a4a5 vs a1 nan c1c1c1a1a2c1a3a4a5->c1c1a1a2c1a3a4->c1a1a2->a1 ----------------
We see that there are few empty "nan" relationship, however, most of the non-complicated relationships look ok although we can argue that we have trouble separating 2nd-degree and 3rd-degree relationships. For the purpose of validating short relationships in a case-study this is probably not a big issue. Using more WGS based variants would definitely improve this analysis.