Pathogenic variants observed with an unexpectedly high number of homozygous individuals

Dear gnomAD support,

Thank you for all your work on the new v4.0 version!

As specialists in Bardet-Biedl syndrome (BBS), a paediatric recessive disorder, we checked a few points in gnomAD v4. This resource is very important to us, and we are glad you put so much effort into releasing continually updated versions.

Our first surprise came when we observed homozygous individuals (n=2) for the indisputably pathogenic and not-so-rare variant c.1169T>G, p.Met390Arg in the BBS1 gene. The phenotype associated with this genotype ranges from isolated retinitis pigmentosa (with blindness before 18 years of age) to syndromic forms (Bardet-Biedl Syndrome Overview - GeneReviews® - NCBI Bookshelf). We were also surprised that the “age distribution” was not available for the 2 homozygous individuals.

Variant:
11-66526181-T-G

The same occurs for the main pathogenic variant in CFTR with 58 homozygous individuals:
7-117559590-ATCT-A

Following the same line of inquiry (individuals homozygous for known pathogenic variants), we turned to structural variants and searched the gnomAD_SV v4.0 dataset for all frequent deletions (DELs) completely overlapping a BBS gene, using the following criteria:

  • controls_and_biobanks_AF >= 1%
  • controls_and_biobanks_AN >= 1000
  • controls_and_biobanks_N_HOMALT > 5
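
For readers who want to reproduce this kind of filter, here is a minimal sketch of the three criteria applied to annotation records held as Python dictionaries. The field names follow the listing further down in this post; the function name `is_frequent_del`, the record layout, and the second example record are assumptions made for illustration and are not part of any gnomAD tool.

```python
def is_frequent_del(record: dict,
                    min_af: float = 0.01,
                    min_an: int = 1000,
                    min_homalt: int = 5) -> bool:
    """Return True if a DEL record passes the three criteria listed above."""
    return (
        record["controls_and_biobanks_AF"] >= min_af          # AF >= 1%
        and record["controls_and_biobanks_AN"] >= min_an      # AN >= 1000
        and record["controls_and_biobanks_N_HOMALT"] > min_homalt  # N_HOMALT > 5
    )


records = [
    {   # values copied from the BBS9 entry in the listing
        "id": "gnomAD-SV_v3_DEL_chr7_f5bb684d",
        "controls_and_biobanks_AN": 24274,
        "controls_and_biobanks_AF": 0.053267,
        "controls_and_biobanks_N_HOMALT": 58,
    },
    {   # hypothetical rare DEL, invented only to show a record that fails
        "id": "hypothetical_rare_DEL",
        "controls_and_biobanks_AN": 20000,
        "controls_and_biobanks_AF": 0.0004,
        "controls_and_biobanks_N_HOMALT": 0,
    },
]

frequent = [r["id"] for r in records if is_frequent_del(r)]
print(frequent)
```

The check that the DEL covers 100% of a gene is not shown here; it would be a separate coordinate comparison between the gene interval and the SV interval.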

We are surprised to find 17 frequent DELs covering 100% of a BBS gene (for 11 of the 26 known BBS genes) with a large number of homozygotes (see below).

Considering the two datasets (SNV/indel and SV), is there something in particular (method, pipeline, genotyping…) that explains this? We haven’t found any disclaimers that could explain it.

Thank you for any explanation you can provide us,

Best,

Véronique Geoffroy and Jean Muller

Frequent DELs completely overlapping a BBS gene in gnomAD_SV v4.0

3 45823315 45916042 LZTFL1 ENSG00000163818 gnomAD-SV_v3_DEL_chr3_74d6df2b chr3:16955863-54461038
AN=97552
AF=0.181011
N_HOMALT=441
POPMAX_AF=0.40346
controls_and_biobanks_AN=19712
controls_and_biobanks_AF=0.188109
controls_and_biobanks_N_HOMALT=125
HWE p-value= 3.005755e-31 => WARNING
SVLEN=37505175

3 45823315 45916042 LZTFL1 ENSG00000163818 gnomAD-SV_v3_DEL_chr3_9b8d4286 chr3:11110258-145805573
AN=110334
AF=0.035184
N_HOMALT=80
POPMAX_AF=0.117278
controls_and_biobanks_AN=21372
controls_and_biobanks_AF=0.044638
controls_and_biobanks_N_HOMALT=24
HWE p-value= 0.9146657
SVLEN=134695315

3 97764520 97801242 ARL6 ENSG00000113966 gnomAD-SV_v3_DEL_chr3_9b8d4286 chr3:11110258-145805573
AN=110334
AF=0.035184
N_HOMALT=80
POPMAX_AF=0.117278
controls_and_biobanks_AN=21372
controls_and_biobanks_AF=0.044638
controls_and_biobanks_N_HOMALT=24
HWE p-value= 0.9146657
SVLEN=134695315

4 121824439 121870497 BBS7 ENSG00000138686 gnomAD-SV_v3_DEL_chr4_aa7116c9 chr4:73564710-189048534
AN=81956
AF=0.199156
N_HOMALT=732
POPMAX_AF=0.37896
controls_and_biobanks_AN=17152
controls_and_biobanks_AF=0.216243
controls_and_biobanks_N_HOMALT=196
HWE p-value= 9.956235e-23 => WARNING
SVLEN=115483824

4 121824439 121870497 BBS7 ENSG00000138686 gnomAD-SV_v3_DEL_chr4_e8bd9f62 chr4:90675743-168705093
AN=119054
AF=0.019915
N_HOMALT=27
POPMAX_AF=0.05
controls_and_biobanks_AN=21570
controls_and_biobanks_AF=0.03032
controls_and_biobanks_N_HOMALT=14
HWE p-value= 0.6866972
SVLEN=78029350

4 122732701 122744943 BBS12 ENSG00000181004 gnomAD-SV_v3_DEL_chr4_aa7116c9 chr4:73564710-189048534
AN=81956
AF=0.199156
N_HOMALT=732
POPMAX_AF=0.37896
controls_and_biobanks_AN=17152
controls_and_biobanks_AF=0.216243
controls_and_biobanks_N_HOMALT=196
HWE p-value= 9.956235e-23 => WARNING
SVLEN=115483824

4 122732701 122744943 BBS12 ENSG00000181004 gnomAD-SV_v3_DEL_chr4_e8bd9f62 chr4:90675743-168705093
AN=119054
AF=0.019915
N_HOMALT=27
POPMAX_AF=0.05
controls_and_biobanks_AN=21570
controls_and_biobanks_AF=0.03032
controls_and_biobanks_N_HOMALT=14
HWE p-value= 0.6866972
SVLEN=78029350

4 128864920 129093607 SCLT1 ENSG00000151466 gnomAD-SV_v3_DEL_chr4_aa7116c9 chr4:73564710-189048534
AN=81956
AF=0.199156
N_HOMALT=732
POPMAX_AF=0.37896
controls_and_biobanks_AN=17152
controls_and_biobanks_AF=0.216243
controls_and_biobanks_N_HOMALT=196
HWE p-value= 9.956235e-23 => WARNING
SVLEN=115483824

4 128864920 129093607 SCLT1 ENSG00000151466 gnomAD-SV_v3_DEL_chr4_e8bd9f62 chr4:90675743-168705093
AN=119054
AF=0.019915
N_HOMALT=27
POPMAX_AF=0.05
controls_and_biobanks_AN=21570
controls_and_biobanks_AF=0.03032
controls_and_biobanks_N_HOMALT=14
HWE p-value= 0.6866972
SVLEN=78029350

7 33129243 33606068 BBS9 ENSG00000122507 gnomAD-SV_v3_DEL_chr7_f5bb684d chr7:26936101-36716780
AN=124522
AF=0.036861
N_HOMALT=136
POPMAX_AF=0.082487
controls_and_biobanks_AN=24274
controls_and_biobanks_AF=0.053267
controls_and_biobanks_N_HOMALT=58
HWE p-value= 0.03085629
SVLEN=9780679

9 26947038 27062930 IFT74 ENSG00000096872 gnomAD-SV_v3_DEL_chr9_cfa52761 chr9:12951493-85461681
AN=75338
AF=0.23453
N_HOMALT=16
POPMAX_AF=0.386988
controls_and_biobanks_AN=15150
controls_and_biobanks_AF=0.22264
controls_and_biobanks_N_HOMALT=7
HWE p-value= 8.099657e-101 => WARNING
SVLEN=72510188

14 88824152 88881078 TTC8 ENSG00000165533 gnomAD-SV_v3_DEL_chr14_aa7bf92b chr14:56471273-99420950
AN=107748
AF=0.615195
N_HOMALT=13806
POPMAX_AF=0.753061
controls_and_biobanks_AN=20248
controls_and_biobanks_AF=0.632951
controls_and_biobanks_N_HOMALT=3073
HWE p-value= 8.532602e-224 => WARNING
SVLEN=42949677

15 72686178 72738476 BBS4 ENSG00000140463 gnomAD-SV_v3_DEL_chr15_6b875bca chr15:32070447-77618522
AN=123266
AF=0.706659
N_HOMALT=25480
POPMAX_AF=0.740749
controls_and_biobanks_AN=24044
controls_and_biobanks_AF=0.659167
controls_and_biobanks_N_HOMALT=3827
HWE p-value= 0 => WARNING
SVLEN=45548075

15 72686178 72738476 BBS4 ENSG00000140463 gnomAD-SV_v3_DEL_chr15_c4d4814a chr15:56865557-77618525
AN=114796
AF=0.997186
N_HOMALT=57082
POPMAX_AF=1
controls_and_biobanks_AN=20414
controls_and_biobanks_AF=0.995542
controls_and_biobanks_N_HOMALT=10116
HWE p-value= 0.7670995
SVLEN=20752968

15 76347903 76905444 SCAPER ENSG00000140386 gnomAD-SV_v3_DEL_chr15_6b875bca chr15:32070447-77618522
AN=123266
AF=0.706659
N_HOMALT=25480
POPMAX_AF=0.740749
controls_and_biobanks_AN=24044
controls_and_biobanks_AF=0.659167
controls_and_biobanks_N_HOMALT=3827
HWE p-value= 0 => WARNING
SVLEN=45548075

15 76347903 76905444 SCAPER ENSG00000140386 gnomAD-SV_v3_DEL_chr15_c4d4814a chr15:56865557-77618525
AN=114796
AF=0.997186
N_HOMALT=57082
POPMAX_AF=1
controls_and_biobanks_AN=20414
controls_and_biobanks_AF=0.995542
controls_and_biobanks_N_HOMALT=10116
HWE p-value= 0.7670995
SVLEN=20752968
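
The listing reports an HWE p-value per record but does not say how it was computed. As a rough cross-check, the sketch below derives genotype counts from AN, AF and N_HOMALT and applies a standard one-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium. This is an assumed method, it will not necessarily reproduce the p-values in the listing, and the function name `hwe_chi_square` is hypothetical.

```python
import math


def hwe_chi_square(an: int, af: float, n_homalt: int):
    """Return (expected_homalt, chi2, p_value) for a biallelic site.

    Assumes diploid genotypes and that AF * AN gives the alternate allele count.
    """
    n = an // 2                       # number of genotyped individuals
    ac = round(af * an)               # alternate allele count
    obs_het = ac - 2 * n_homalt       # heterozygotes implied by AC and N_HOMALT
    obs_homref = n - obs_het - n_homalt
    q = ac / an                       # alternate allele frequency
    p = 1.0 - q
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (obs_homref, obs_het, n_homalt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected[2], chi2, p_value


# Values copied from the IFT74 entry in the listing
exp_homalt, chi2, p_value = hwe_chi_square(an=75338, af=0.23453, n_homalt=16)
print(f"expected homozygotes: {exp_homalt:.0f}, chi2: {chi2:.1f}, p: {p_value:.3g}")
```

The expected homozygote count (individuals times the squared allele frequency) is often more informative than the p-value alone, because it shows whether a deviation is an excess or a deficit of homozygotes. Pooling several populations can also produce a deviation by itself, so a per-population test would be a more careful check.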

We expect to see pathogenic variants in gnomAD, including dominant variants and biallelic recessive genotypes, because gnomAD is a population database in which individuals with rare diseases can be found, particularly in biobanks, and not a database of controls for which all diseases are excluded (except early lethal diseases). So the first step is to gather the prevalence of the disorder you are studying from the literature and determine whether the allele frequency in gnomAD is above what you’d expect based on that prevalence. Only if the AF is above what you’d expect for the prevalence should you be concerned. That said, not all variants in gnomAD are real, and if the AF is above what you’d expect for the prevalence, you should look carefully at the quality metrics and the raw read data if available. If the variant appears high quality from that examination, you might consider genotyping a set of samples that you have access to and performing orthogonal confirmation to understand whether a given genotype is real. Unfortunately, we do not have access to the samples from which gnomAD data derive, so we can’t facilitate these studies ourselves.
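
The prevalence comparison described above can be sketched as follows. This is a deliberately simplified model and not a gnomAD method: it assumes a fully penetrant recessive disorder in Hardy-Weinberg equilibrium, so that prevalence equals the square of the combined pathogenic allele frequency. The function names, the `allelic_fraction` parameter (the share of pathogenic alleles attributed to one variant), and the prevalence value are placeholders for illustration; a real prevalence should come from the literature.

```python
import math


def max_expected_af_recessive(prevalence: float,
                              allelic_fraction: float = 1.0) -> float:
    """Upper bound on one variant's AF under the simplified recessive model.

    prevalence = Q**2, where Q is the combined pathogenic allele frequency;
    a single variant can account for at most allelic_fraction * Q.
    """
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    if not 0.0 < allelic_fraction <= 1.0:
        raise ValueError("allelic_fraction must be in (0, 1]")
    return math.sqrt(prevalence) * allelic_fraction


def exceeds_expectation(observed_af: float,
                        prevalence: float,
                        allelic_fraction: float = 1.0) -> bool:
    """True if the observed AF is above what the prevalence would allow."""
    return observed_af > max_expected_af_recessive(prevalence, allelic_fraction)


# Placeholder prevalence, NOT taken from this thread or from the literature
PLACEHOLDER_PREVALENCE = 1 / 100_000

threshold = max_expected_af_recessive(PLACEHOLDER_PREVALENCE)
print(f"maximum expected AF: {threshold:.5f}")
print(exceeds_expectation(0.2, PLACEHOLDER_PREVALENCE))    # AF far above the bound
print(exceeds_expectation(0.001, PLACEHOLDER_PREVALENCE))  # AF below the bound
```

A variant whose AF is below this bound is compatible with the stated prevalence; one far above it is a candidate for the quality-metric and read-level review described above.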