KING¶
The KING command is used to calculate parameters that can be used in the KING algorithm (see Ani Manichaikul et.al. Bioinformatics, Vol. 26 no. 22 2010, pages 2867–2873).
The leftinput stream is of the same type as for CSVSEL, i.e. it must contain the columns values
and bucket
.
The tagbucket relation is used to define how the values for different tags are stored in the value column in each bucket.
The tag selections specify which tags are selected and the calculation is carried out for each tag (PN) in the first
tag list against each tag in the second tag list, for every variant in the leftinput stream. The leftinput stream
must include a column with the allele frequency, named AF. Such column is most easily added to the horizontal genotype
variant format with a VARJOIN.
The output is a row for each tagpair, using columns named PN1 and PN2. Since the output is an aggregate over the genome is also includes dummy columns for Chrom and Pos with the values “ChrA” and 0, respectively. The important columns are however PN1, PN2, IBS0, XX, tpq, kpq, Nhet, Nhom, NAai, NAaj, count, pi0, phi, and theta. See the code examples and the KING paper for further details related to the KING command.
Usage¶
gor ...  KING tagbucketrelation tagselection1 tagselection2 [gc cols] [(s sep  vs charsize)]
Options¶

Grouping columns other than (chrom,pos). NOTE, one of the must include AF 

Specify separator for the elements in values. Default to comma. 

Use a fixed character size for values, e.g. rather than variable length separated with comma. 

Threshold value to filter output on pi0, e.g. to pass we must have pi0 < value 

Threshold value to filter output on phi, e.g. to pass we must have phi > value 

Threshold value to filter output on theta, e.g. to pass we must have theta > value 
Examples¶
The following query shows an example where the the QUEEN command is used to estimate which sample pairs
to evaluate in a KING2 analysis. The QUEEN command uses two PN lists that are generated by grouping the
pnpairs according to the bucket groups of the genotype matrix into an uppertriagonal matrix.
Only the genotype buckets necessary for evaluating the pairs are read from disk in each job and following the analysis,
the outputpairs are filtered to avoid storing symmetric pairs.
def #buckets# = freezes/ukbb_500k/imputed_comm_vep95/buckets.tsv;
def #hgt# = freezes/ukbb_500k/imputed_comm_vep95/variants.gord;
def #af# = freezes/ukbb_500k/imputed_comm_vep95/metadata/AF.gorz;
create #useaf# = gor #af# /* Only use rare variants for Queen */
 where len(ref)=len(alt) and af < 0.0025
 select 14,af;
create #buckgroups# = nor #buckets#
 select bucket
 distinct
 sort c bucket:n
 rownum
 calc gr div(rownum,4); /* larger groups mean more memory usage but less repeated IO access of genotypes */
create #parts# = nor [#buckgroups#]
 select gr
 distinct
 multimap cartesian <(nor [#buckgroups#]  select gr  distinct)
 where gr <= grx
 rownum
 calc p 1+mod(rownum,2) /* Allow us to avid limit on the number of maximum job per query, e.g. 1000 */
 hide rownum;
create #sharepairs1# = parallel parts <(nor [#parts#]  where p = 1)
<(nor <(gor #hgt# nf ff <(nor #buckets#  inset c bucket <(nor [#buckgroups#]  where gr = '#{col:gr}' or gr = '#{col:grx}'))
 where chrom < 'chrX'  varjoin i <(gor [#useaf#])
 QUEEN #buckets# <(nor #buckets#  inset c bucket <(nor [#buckgroups#]  where gr = '#{col:gr}')  select pn)
<(nor #buckets#  inset c bucket <(nor [#buckgroups#]  where gr = '#{col:grx}')  select pn) vs 1 minSharing 0.02
 multimap c pn1 #buckets#  multimap c pn2 #buckets#  where bucket < bucketx or bucket = bucketx and pn1 < pn2  hide bucket,bucketx )
 select pn1 );
create #sharepairs2# = parallel parts <(nor [#parts#])
<(nor <(gor #hgt# nf ff <(nor #buckets#  inset c bucket <(nor [#buckgroups#]  where gr = '#{col:gr}' or gr = '#{col:grx}'))
 where chrom < 'chrX'  varjoin i <(gor [#useaf#])
 QUEEN #buckets# <(nor #buckets#  inset c bucket <(nor [#buckgroups#]  where gr = '#{col:gr}')  select pn)
<(nor #buckets#  inset c bucket <(nor [#buckgroups#]  where gr = '#{col:grx}')  select pn) vs 1 minSharing 0.02
 multimap c pn1 #buckets#  multimap c pn2 #buckets#  where bucket < bucketx or bucket = bucketx and pn1 < pn2  hide bucket,bucketx )
 select pn1 );
create #usedvars# = pgor #af#
 where len(ref)=len(alt)
 select 14,af
 where abs(af0.5)<0.4
 where random()<1e2
 granno 10000 min ic pos
 where min_pos = pos
 hide min_pos;
create #af# = pgor [#usedvars#]
 select 14,af
 where len(#3)=len(#4)
 select 14,af
 calc tpq 2*af*af*(1.0af)*(1.0af)
 calc kpq 2.0*af*(1af);
create #varsharing# = parallel parts <(nor [#parts#] )
<(nor <(gor #hgt# nf ff <(nor #buckets#  inset c bucket <(nor [#buckgroups#]  where gr = '#{col:gr}' or gr = '#{col:grx}'))
 where len(ref)=len(alt)  until chrom = 'chrX'  varjoin r [#af#]
 KING2 gc af #buckets# <(nor [#sharepairs1#]  merge [#sharepairs2#]
/* unfortunately we don't have partitioned access to this data, hence the filtering with INSET */
 inset c pn1 <(nor #buckets#  inset c bucket <(nor [#buckgroups#]  where gr = '#{col:gr}' ))
 inset c pn2 <(nor #buckets#  inset c bucket <(nor [#buckgroups#]  where gr = '#{col:grx}' ))  select pn1,pn2 )
vs 1 pi0thr 0.76 phithr 0.039 )
 group gc pn1,pn2 sum fc IBS0
 calc pi0 sum_IBS0/float(sum_tpq)
 calc phi 0.5float(sum_XX)/float(4.0*sum_kpq)
 calc theta (sum_Nhet2.0*sum_Nhom)/(sum_NAai+sum_NAaj)
);
def expo($1,$2,$3) = exp(sqr( $1  $2 )/float(2* $3 * $3 )) / $3;
nor [#varsharing#]
 calc monozygotic if(phi > pow(2.0,1.5) and phi < 0.1,1,0)
 calc parent_offspring if(phi > pow(2.0,2.5) and phi < pow(2.0,1.5) and pi0 < 0.1,1,0)
 calc full_sib if(phi > pow(2.0,2.5) and phi < pow(2.0,1.5) and pi0 > 0.1 and pi0 < 0.365,1,0)
 calc second_degree if(phi > pow(2.0,3.5) and phi < pow(2.0,2.5) and pi0 > 0.365 and pi0 < 1.0pow(2,1.5),1,0)
 calc third_degree if(phi > pow(2.0,4.5) and phi < pow(2.0,3.5) and pi0 > 1.0pow(2,1.5) and pi0 < 1.0pow(2,2.5),1,0)
 calc temp if(monozygotic=1,'Monozygotic twin,','')+if(parent_offspring=1,'Parentoffspring,','')+if(full_sib=1,'Full sib,','')+if(second_degree=1,'2nd Degree,','')+if(third_degree=1,'3rd Degree,','')
 calc relationship listfilter(temp,'len(x)>1')
 calc monozygotic_dist ln( expo(pi0,0.0,0.000001) * expo(phi,0.498,0.0013) )
 calc parent_offspring_dist ln( expo(pi0,0.00297,0.0025) * expo(phi,0.248,0.01) )
 calc full_sib_dist ln( expo(pi0,0.248,0.0406) * expo(phi,0.248,0.0187) )
 calc second_degree_dist ln( expo(pi0,0.4954,0.0608) * expo(phi,0.127,0.027) )
 calc third_degree_dist ln( expo(pi0,0.745,0.0428) * expo(phi,0.059,0.02) )
 calc rel 'Monozygotic twin,Parentoffspring,Full sib,2nd Degree,3rd Degree'
 calc relp str(monozygotic_dist)+','+str(parent_offspring_dist)+','+str(full_sib_dist)+','+str(second_degree_dist)+','+str(third_degree_dist)
 calc relmax listnummax(relp)
 calc GaussRel listzipfilter(rel,relp,'abs(float(x)relmax)<0.001')
 select pn1,pn2,relationship,GaussRel,pi0,phi,theta