Frequent pathogenic variants for rare early onset diseases in v4

Hey, thanks for all your work on the new version and for this forum!

The best-known pathogenic CFTR variant, Phe508del (7-117559590-ATCT-A), is homozygous in 57 exomes in v4 (compared to one in v2.1.1).
Of these, 46 are from samples of non-Finnish European ancestry in the non_ukb subset (nhomalt_non_ukb_nfe). Did a single contributor provide all of these samples, and is this a cohort of affected individuals?
Your stats show that certain diseases were included in v4, but cystic fibrosis is not listed.

Thank you once again for your excellent work.

Hello – thank you for the kind words, and apologies for the delayed response.

We are actively investigating this question to better understand the data sources that are bringing in rare disease patients. If there are other variants or genes of interest aside from Phe508del, let us know. We are curious about the rates of rare disease patient inclusion in our input data and want to investigate across a broad range of early onset rare diseases.


Thanks for your reply.
There are other CFTR variants that we didn’t expect, such as the SNV 7-117530975-G-A (GRCh38).

We also came across this variant: 14-28767528-GC-G (GRCh38). In this case, I suspect the calls in gnomAD v4 are artefacts. In v2, the variant was marked with a warning tag.

If we find other variants, we’ll let you know.

Hi Katherine,
Here’s another example of a true pathogenic variant with falsely high population frequency due to an artifact (we exchanged emails regarding this issue in June 2021):
2-71511923-T-G (GRCh38)
The variant is currently flagged AS_VQSR in v4.