HPO
Working with the HPO directed acyclic graph (DAG)
First, we include boilerplate code to define the project, set environment variables, and import libraries.
%%capture
# load the magic extension and imports
%reload_ext nextcode
import pandas as pd
%env LOG_QUERY=1
project = "janssen_sle"
%env GOR_API_PROJECT={project}
The HPO table is stored in the reference data. It includes the code, search terms, associated genes and one or more parent HPO ids (codes).
%%gor
nor ref/disgenes/hpo_ensgenes.tsv | top 5
Query ran in 0.06 sec Query fetched 5 rows in 0.01 sec (total time 0.07 sec)
| | hpo_code | search_terms | database | name | description | gene_symbols | parent_ids |
---|---|---|---|---|---|---|---|
0 | HP:0000001 | all | HPO | All | All | NaN | |
1 | HP:0000002 | abnormality of body height A2ML1 AAAS AAAS AA... | HPO | Abnormality of body height | Deviation from the norm of height with respect... | A2ML1;AAAS;AARS;ABAT;ABCA12;ABCB11;ABCB6;ABHD5... | HP:0001507 |
2 | HP:0000003 | multicystic kidney dysplasia HP:0004715 dyspla... | HPO | Multicystic kidney dysplasia | Multicystic dysplasia of the kidney is charact... | ACTG2;AMER1;ARL3;ARL6;ARL6IP6;B3GLCT;B9D1;B9D2... | HP:0000107 |
3 | HP:0000005 | mode of inheritance HP:0001453 HP:0001461 | HPO | Mode of inheritance | The pattern in which a particular genetic trai... | NaN | HP:0000001 |
4 | HP:0000006 | autosomal dominant inheritance HP:0001415 HP:0... | HPO | Autosomal dominant inheritance | A mode of inheritance that is observed for tra... | A2M;A2ML1;A4GALT;AAGAB;AARS;ABCA1;ABCA4;ABCA7;... | HP:0000005 |
We will now create three relations from the hpo_ensgenes.tsv source that we will use in our examples: a relation for the HPO descriptions, a (parent, child) relation representing the DAG, and a (hpo_code, gene_symbols) relation for mapping between HPO codes and gene symbols.
gordefs = """
create #hpodescr# = nor -h ref/disgenes/hpo_ensgenes.tsv
| select hpo_code,description;
create #parentchild# = nor -h ref/disgenes/hpo_ensgenes.tsv
| select parent_ids,hpo_code
| split parent_ids -s ';'
| rename parent_ids parent_id;
create #hpogenes# = nor -h ref/disgenes/hpo_ensgenes.tsv
| select hpo_code,gene_symbols;
"""
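To make the #parentchild# definition more concrete, here is a minimal Python sketch of what the SPLIT and RENAME steps do to a multi-valued parent_ids column. The rows are toy data; the two-parent string for HP:0000008 is an assumption inferred from the output shown in this notebook, and split_parents is a hypothetical helper, not part of GOR.

```python
# Toy (hpo_code, parent_ids) rows. The two-parent value for HP:0000008 is an
# assumption based on the two rows that term has in the parent-child output.
rows = [
    ("HP:0000005", "HP:0000001"),
    ("HP:0000006", "HP:0000005"),
    ("HP:0000008", "HP:0000812;HP:0010460"),
]


def split_parents(rows, sep=";"):
    """Yield one (parent_id, hpo_code) pair per parent, like SPLIT followed by RENAME."""
    for hpo_code, parent_ids in rows:
        for parent_id in parent_ids.split(sep):
            yield parent_id, hpo_code


parent_child = list(split_parents(rows))
for parent_id, hpo_code in parent_child:
    print(parent_id, hpo_code)
```

A term with several parents therefore contributes one edge per parent, which is what makes the relation a DAG rather than a tree.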
Let's start by viewing the structure of the DAG, annotated with descriptions:
%%gor
$gordefs
nor [#parentchild#] | map -c parent_id -m 'missing' [#hpodescr#] | rename description parent_description
| map -c hpo_code -m 'missing' [#hpodescr#] | rename description code_description
| top 10
Query ran in 0.08 sec Query fetched 10 rows in 0.02 sec (total time 0.10 sec)
| | parent_id | hpo_code | parent_description | code_description |
---|---|---|---|---|
0 | | HP:0000001 | missing | All |
1 | HP:0001507 | HP:0000002 | Growth abnormality | Deviation from the norm of height with respect... |
2 | HP:0000107 | HP:0000003 | A fluid filled sac in the kidney. Alt_id: HP:0... | Multicystic dysplasia of the kidney is charact... |
3 | HP:0000001 | HP:0000005 | All | The pattern in which a particular genetic trai... |
4 | HP:0000005 | HP:0000006 | The pattern in which a particular genetic trai... | A mode of inheritance that is observed for tra... |
5 | HP:0000005 | HP:0000007 | The pattern in which a particular genetic trai... | A mode of inheritance that is observed for tra... |
6 | HP:0000812 | HP:0000008 | An anomaly of the adnexa, uterus, and vagina (... | An abnormality of the female internal genitalia. |
7 | HP:0010460 | HP:0000008 | Abnormality of the female genital system. | An abnormality of the female internal genitalia. |
8 | HP:0000014 | HP:0000009 | An abnormality of the urinary bladder. | Dysfunction of the urinary bladder. Alt_id: HP... |
9 | HP:0002719 | HP:0000010 | Increased susceptibility to infections. Alt_id... | Repeated infections of the urinary tract. Alt_... |
We can now easily find all the descendants of HP:0030126 by using the special INDAG functional operator, which behaves similarly to column in ( .. ).
%%gor
$gordefs
nor [#parentchild#] | map -c parent_id -m 'missing' [#hpodescr#] | rename description parent_description
| map -c hpo_code -m 'missing' [#hpodescr#] | rename description code_description
| where hpo_code indag([#parentchild#],'HP:0030126')
Query ran in 0.07 sec Query fetched 9 rows in 0.11 sec (total time 0.18 sec)
| | parent_id | hpo_code | parent_description | code_description |
---|---|---|---|---|
0 | HP:0010784 | HP:0012114 | A tumor (abnormal growth of tissue) of the ute... | A carcinoma of the endometrium, the mucous lin... |
1 | HP:0030126 | HP:0012114 | An anomaly of the inner mucous membrane of the... | A carcinoma of the endometrium, the mucous lin... |
2 | HP:0012888 | HP:0012889 | An anomaly of the neck of the uterus (lower pa... | Abnormal growth of endometrial cells (which ar... |
3 | HP:0030127 | HP:0012889 | The growth of endometrial tissue outside the u... | Abnormal growth of endometrial cells (which ar... |
4 | HP:0030126 | HP:0025636 | An anomaly of the inner mucous membrane of the... | Inflammation of the inner lining of the uterus... |
5 | HP:0031105 | HP:0030126 | Any anomaly of the structure of the uterus | An anomaly of the inner mucous membrane of the... |
6 | HP:0030012 | HP:0030127 | Abnormal female reproductive system physiology | The growth of endometrial tissue outside the u... |
7 | HP:0030126 | HP:0030127 | An anomaly of the inner mucous membrane of the... | The growth of endometrial tissue outside the u... |
8 | HP:0030126 | HP:0040298 | An anomaly of the inner mucous membrane of the... | Hyperplasia of the endometrium |
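The effect of the filter can be reproduced outside GOR with a plain graph traversal. The following is a minimal sketch, not the INDAG implementation: it assumes that membership means "the root term or anything reachable from it through parent-to-child edges", and it uses only the nine edges visible in the output above. The descendants function is a hypothetical helper.

```python
from collections import defaultdict, deque

# (parent_id, hpo_code) edges copied from the filtered output above.
EDGES = [
    ("HP:0010784", "HP:0012114"),
    ("HP:0030126", "HP:0012114"),
    ("HP:0012888", "HP:0012889"),
    ("HP:0030127", "HP:0012889"),
    ("HP:0030126", "HP:0025636"),
    ("HP:0031105", "HP:0030126"),
    ("HP:0030012", "HP:0030127"),
    ("HP:0030126", "HP:0030127"),
    ("HP:0030126", "HP:0040298"),
]


def descendants(edges, root):
    """Return root plus every node reachable by following parent -> child edges."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


in_dag = descendants(EDGES, "HP:0030126")
print(sorted(in_dag))
```

Parents that are not themselves below HP:0030126 (for example HP:0010784) appear in the parent_id column of the output but are not part of the returned set, because the filter is applied to hpo_code only.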
Another very useful command is DAGMAP, which works with parent-child DAGs in much the same way that the MULTIMAP command works with regular relations.
%%gor
$gordefs
nor [#parentchild#] | hide parent_id
| map -c hpo_code -m 'missing' [#hpodescr#] | rename description code_description
| where hpo_code = 'HP:0030126'
| dagmap -c hpo_code [#parentchild#] -dp
Query ran in 0.08 sec Query fetched 6 rows in 0.11 sec (total time 0.19 sec)
| | hpo_code | code_description | DAG_node | DAG_dist | DAG_path |
---|---|---|---|---|---|
0 | HP:0030126 | An anomaly of the inner mucous membrane of the... | HP:0030126 | 0 | HP:0030126 |
1 | HP:0030126 | An anomaly of the inner mucous membrane of the... | HP:0012114 | 1 | HP:0030126->HP:0012114 |
2 | HP:0030126 | An anomaly of the inner mucous membrane of the... | HP:0030127 | 1 | HP:0030126->HP:0030127 |
3 | HP:0030126 | An anomaly of the inner mucous membrane of the... | HP:0025636 | 1 | HP:0030126->HP:0025636 |
4 | HP:0030126 | An anomaly of the inner mucous membrane of the... | HP:0040298 | 1 | HP:0030126->HP:0040298 |
5 | HP:0030126 | An anomaly of the inner mucous membrane of the... | HP:0012889 | 2 | HP:0030126->HP:0030127->HP:0012889 |
We can easily see that all the codes that passed the filter "where hpo_code indag([#parentchild#],'HP:0030126')" are descendants of HP:0030126 (the term itself is included, at distance 0).
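The DAG_dist and DAG_path columns can be understood as the result of a breadth-first walk from the starting term. The sketch below is an illustration under assumptions, not the DAGMAP algorithm: it keeps only the first shortest path found for each node, whereas the source does not say how DAGMAP treats nodes reachable by several paths. The edges are the five taken from the output above, and dag_paths is a hypothetical helper.

```python
from collections import deque

# Parent -> child edges below HP:0030126, taken from the DAGMAP output above.
EDGES = [
    ("HP:0030126", "HP:0012114"),
    ("HP:0030126", "HP:0030127"),
    ("HP:0030126", "HP:0025636"),
    ("HP:0030126", "HP:0040298"),
    ("HP:0030127", "HP:0012889"),
]


def dag_paths(edges, start):
    """Map each reachable node to the first shortest path from start (BFS order)."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in paths:
                paths[child] = paths[node] + [child]
                queue.append(child)
    return paths


paths = dag_paths(EDGES, "HP:0030126")
for node, path in paths.items():
    # Distance is the number of edges on the path.
    print(node, len(path) - 1, "->".join(path))
```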
Similarly, we can see which genes map directly or indirectly to the HPO term HP:0030126.
%%gor
$gordefs
nor [#parentchild#] | hide parent_id
| where hpo_code indag([#parentchild#],'HP:0030126')
| distinct
| map -c hpo_code [#hpodescr#] | rename description code_description
| map -c hpo_code [#hpogenes#]
Query ran in 0.06 sec Query fetched 6 rows in 0.08 sec (total time 0.15 sec)
| | hpo_code | code_description | gene_symbols |
---|---|---|---|
0 | HP:0012114 | A carcinoma of the endometrium, the mucous lin... | AKT1;BMPR1A;CDH1;DMPK;GREM1;KLLN;MLH1;MLH3;MSH... |
1 | HP:0012889 | Abnormal growth of endometrial cells (which ar... | NaN |
2 | HP:0025636 | Inflammation of the inner lining of the uterus... | NaN |
3 | HP:0030126 | An anomaly of the inner mucous membrane of the... | AKT1;BMPR1A;CDH1;DMPK;GREM1;KLLN;MLH1;MLH3;MSH... |
4 | HP:0030127 | The growth of endometrial tissue outside the u... | THOC6 |
5 | HP:0040298 | Hyperplasia of the endometrium | NaN |
Finally, we show how one can search for genes by filtering HPO terms. The filter is a generic search filter on the HPO description. HPO terms that pass the filter are then used to find all their descendant HPO terms and all the genes associated with them. Notice that we use the SPLIT command to put each gene in a separate row. Then we use GRANNO to count and annotate how many genes are associated with each HPO term, and order the rows so that the most specific terms show up first for each gene.
%%gor dfGene2HPOs <<
$gordefs
def #search_filter# = description ~ '*eye*' and description ~ '*disease*';
/* Search for HPO terms and their child nodes and find how genes associate with them */
nor [#parentchild#]
| select hpo_code | distinct
| map -c hpo_code [#hpodescr#]
| where #search_filter#
| hide description
| dagmap -c hpo_code [#parentchild#]
| select dag_node
| rename dag_node hpo_code
| distinct
| map -c hpo_code [#hpodescr#]
| map -c hpo_code [#hpogenes#]
| split gene_symbols -s ';'
| rename gene_symbols gene_symbol
| granno -gc hpo_code -count
| calc hpos '('+hpo_code+':'+str(allcount)+' '+if(len(description)>50,left(description,50)+'...',description)+')'
| sort -c gene_symbol,allcount:n
| select gene_symbol,hpos
| group -gc gene_symbol -lis -sc hpos -s ', '
| where gene_symbol != ''
| sort -c gene_symbol
Query ran in 0.09 sec Query fetched 61 rows in 0.25 sec (total time 0.34 sec)
To display the wide columns in full, we use a short Python loop to print the results.
for i in range(len(dfGene2HPOs)):
    print(f"{i+1}\t{dfGene2HPOs.at[i,'gene_symbol']}\t{dfGene2HPOs.at[i,'lis_hpos']}")
    print("----------------")
1 APOA2 (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
2 APOB (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
3 APOE (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
4 ARHGEF18 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
5 ARL3 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
6 ATOH7 (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
7 BACH2 (HP:0100280:10 A chronic granulomatous inflammatory disease of th...)
----------------
8 BCOR (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
9 CDH11 (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
10 COL18A1 (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
11 CTLA4 (HP:0100280:10 A chronic granulomatous inflammatory disease of th...)
----------------
12 CYBC1 (HP:0100280:10 A chronic granulomatous inflammatory disease of th...)
----------------
13 CYP27A1 (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
14 DHDDS (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
15 EPHX2 (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
16 G6PC (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
17 GHR (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
18 GPC1 (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
19 HDAC8 (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
20 HLA-A (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
21 IGHM (HP:0100280:10 A chronic granulomatous inflammatory disease of th...)
----------------
22 IL10RA (HP:0100280:10 A chronic granulomatous inflammatory disease of th...)
----------------
23 IL6 (HP:0100280:10 A chronic granulomatous inflammatory disease of th...)
----------------
24 INAVA (HP:0100280:10 A chronic granulomatous inflammatory disease of th...)
----------------
25 KIAA1549 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
26 KMT2A (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
27 LDLR (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
28 LPL (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
29 LRP5 (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
30 MARK3 (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
31 MEFV (HP:0100280:10 A chronic granulomatous inflammatory disease of th...)
----------------
32 MFRP (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
33 NIPBL (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
34 NOD2 (HP:0100280:10 A chronic granulomatous inflammatory disease of th...), (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
35 NR2E3 (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
36 NRL (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
37 PCSK9 (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
38 PDE6G (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
39 PIEZO1 (HP:0032106:3 Conjunctival icterus is a condition where there is...)
----------------
40 POMGNT1 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
41 PPP1R17 (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
42 PRPF31 (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
43 PRPF8 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
44 PRPH2 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
45 PSTPIP1 (HP:0100280:10 A chronic granulomatous inflammatory disease of th...)
----------------
46 RAD21 (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
47 RDH5 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
48 REEP6 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
49 RHO (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
50 RLBP1 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
51 RP1L1 (HP:0011505:16 Cystoid macular edema (CME) is any type of macular...), (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
52 RP9 (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
53 SETD5 (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
54 SLC37A4 (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
55 SLCO1B1 (HP:0032106:3 Conjunctival icterus is a condition where there is...)
----------------
56 SLCO1B3 (HP:0032106:3 Conjunctival icterus is a condition where there is...)
----------------
57 SMC1A (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
58 SMC3 (HP:0000667:13 Atrophy of the eyeball with blindness and decrease...)
----------------
59 TREX1 (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
60 TTPA (HP:0010732:14 Nodular changes affecting the eyelids may have man...), (HP:0001114:14 The presence of xanthomata in the skin of the eyel...)
----------------
61 VHL (HP:0040049:22 Thickening of the retina that takes place due to a...)
----------------
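The split, count, sort, and group steps of the gene search query can be mimicked in a few lines of Python, which may help when reading the output above. This is a rough sketch under assumptions: the input rows and the codes HP:A, HP:B, and HP:C are hypothetical toy data, genes_to_hpos is a hypothetical helper, and empty gene symbols are dropped before counting rather than after, which differs slightly from the order of steps in the query.

```python
from collections import Counter, defaultdict

# Hypothetical toy (hpo_code, gene_symbols) rows; not real HPO data.
rows = [
    ("HP:A", "G1;G2;G3"),
    ("HP:B", "G1;G2"),
    ("HP:C", ""),
]


def genes_to_hpos(rows, sep=";"):
    """Return {gene_symbol: 'HPO terms with gene counts, most specific first'}."""
    # SPLIT: one (hpo_code, gene_symbol) pair per gene, skipping empty symbols.
    pairs = [(hpo, gene) for hpo, genes in rows for gene in genes.split(sep) if gene]
    # GRANNO -gc hpo_code -count: number of gene rows per HPO term.
    allcount = Counter(hpo for hpo, _ in pairs)
    by_gene = defaultdict(list)
    for hpo, gene in pairs:
        by_gene[gene].append(hpo)
    # SORT + GROUP -lis: per gene, list terms with the fewest genes first.
    return {
        gene: ", ".join(
            f"({hpo}:{allcount[hpo]})"
            for hpo in sorted(hpos, key=lambda h: allcount[h])
        )
        for gene, hpos in sorted(by_gene.items())
    }


result = genes_to_hpos(rows)
for gene, hpos in result.items():
    print(gene, hpos)
```

A lower count means fewer genes are attached to the term, which is the sense in which the query treats a term as more specific.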