Discrepancy between gnomAD and UK Biobank

Hello, I am a researcher at Nationwide Children’s in Columbus, OH. My group works on genetic therapies to treat cystic fibrosis (CF). We have been using gnomAD to understand CF prevalence and CFTR variants in different ancestral groups. We compared our analysis against a recent publication describing CFTR variants in different ancestral groups from the UK Biobank ("Diversity of CFTR variants across ancestries characterized using 454,727 UK biobank whole exome sequences", Genome Medicine) and noticed two discrepancies. We are trying to find an explanation, since the UK Biobank is listed as one of the sources for the gnomAD v4 dataset.

Firstly, the published UK biobank paper described a variant A46D. They report 9 alleles in their African group with this variant. But this variant does not appear in gnomAD 4. I was under the impression that exomes from the UK Biobank were part of gnomAD 4. But is it only a partial set that has been included?

Secondly, they report a variant V520F as the most common one in East Asians (36 alleles). The variant is present in gnomAD (43 alleles), but 38/43 alleles are in the European group in gnomAD and the other 3 are in the Remaining category. The variant has not previously been associated with East Asians. Is it reasonable to assume that the methods used to assign ancestries to samples differ between gnomAD and the UK Biobank? Or are we missing something?

Any explanation you may be able to offer would greatly help our efforts. Thanks!

@svaidy Could you confirm that the GRCh38 coordinates for the two variants are as follows?
CFTR V520F, 7-117559629-G-T
CFTR A46D, 7-117668786-C-A

For the first one, we have:

Source           AC_nfe   AN_nfe
gnomAD           39       1,179,464
gnomAD_non_UKB   11       417,862
gnomAD_UKB       28       761,602

On the UKB browser, this variant has AC=36 and AN=917,554 in NFE. The second variant was not found.
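The counts above can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch using the numbers quoted in this post. The dictionary keys and the helper name `allele_frequency` are illustrative, not part of any gnomAD or UKB tooling. It checks that the UKB and non-UKB subsets add up to the full gnomAD NFE counts and computes the allele frequency (AC/AN) for each source.

```python
# AC and AN for CFTR V520F (7-117559629-G-T) in NFE, copied from this post.
# The key names are illustrative labels only.
counts = {
    "gnomAD": (39, 1_179_464),
    "gnomAD_non_UKB": (11, 417_862),
    "gnomAD_UKB": (28, 761_602),
    "UKB_browser": (36, 917_554),
}


def allele_frequency(ac: int, an: int) -> float:
    """Allele frequency is the allele count divided by the allele number."""
    return ac / an


# The two gnomAD subsets should sum to the full gnomAD NFE counts.
subset_ac = counts["gnomAD_non_UKB"][0] + counts["gnomAD_UKB"][0]
subset_an = counts["gnomAD_non_UKB"][1] + counts["gnomAD_UKB"][1]
subsets_consistent = (subset_ac, subset_an) == counts["gnomAD"]
print("subsets add up to gnomAD total:", subsets_consistent)

# Print AC, AN, and AF side by side for each source.
for source, (ac, an) in counts.items():
    af = allele_frequency(ac, an)
    print(f"{source:15s} AC={ac:3d} AN={an:>9,d} AF={af:.2e}")
```

Comparing allele frequencies rather than raw allele counts makes the sources easier to compare, since the AN values differ between them.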

Please note that gnomAD v4 included 416,555 UKB samples. That differs from both the 454,727 exomes in the paper you cited and the 490,640 WGS individuals in the UKB allele frequency browser (assuming that UKB WGS includes almost all of the UKB WES samples).

Hi Qin,

Thank you so much! The first one is correct. The UK Biobank browser link you sent lists the variant as present in Non-Finnish Europeans (and in East Asians as reported in the paper).

The position for the second variant should be close to 7-117504336, since 7-117504336-C-T (GRCh38) corresponds to A46V in gnomAD. The UKB allele browser shows variants at 7-117504330, 7-117504333, and 7-117504339. The UKB allele browser and gnomAD v4 are consistent, and we will report that when we write up this work. Thank you again for your help!
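The reasoning that A46D should sit near the A46V position can be illustrated with the standard genetic code. The following is only a sketch under stated assumptions: it considers single-nucleotide substitutions only, it does not look at the actual CFTR reference sequence, and the function name `single_base_changes` is hypothetical. It enumerates which single-base changes in an alanine codon yield valine or aspartate.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_changes(from_aa: str, to_aa: str):
    """List (codon, 0-based position in codon, ref base, alt base) for every
    single-nucleotide substitution that turns a from_aa codon into a to_aa codon."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for pos in range(3):
            for alt in BASES:
                if alt == codon[pos]:
                    continue
                mutated = codon[:pos] + alt + codon[pos + 1:]
                if CODON_TABLE[mutated] == to_aa:
                    hits.append((codon, pos, codon[pos], alt))
    return hits


ala_to_val = single_base_changes("A", "V")  # models A46V
ala_to_asp = single_base_changes("A", "D")  # models A46D

for label, hits in (("Ala->Val", ala_to_val), ("Ala->Asp", ala_to_asp)):
    for codon, pos, ref, alt in hits:
        print(f"{label}: codon {codon}, base {pos + 1} of 3, {ref}>{alt}")
```

Under these assumptions, both changes hit the middle base of the codon: Ala to Val is always C>T, and Ala to Asp is C>A (possible only when the third codon base is T or C). Because CFTR is transcribed from the forward strand, a single-nucleotide A46D would be expected at the same genomic position as A46V with a different alternate allele. That is worth confirming against the reference sequence rather than taking from this sketch.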