Hello, I am a researcher at Nationwide Children’s in Columbus, OH. My group works on genetic therapies to treat cystic fibrosis (CF). We have been trying to understand CF prevalence and CFTR variants in different ancestral groups using gnomAD. We compared our analysis against a recent publication describing CFTR variants across ancestral groups in the UK Biobank (Diversity of CFTR variants across ancestries characterized using 454,727 UK Biobank whole exome sequences, Genome Medicine). We noticed two discrepancies and are trying to find an explanation, since the UK Biobank is listed as one of the sources for the gnomAD v4 dataset.
Firstly, the UK Biobank paper describes a variant, A46D, and reports 9 alleles with this variant in its African group. However, this variant does not appear in gnomAD v4. I was under the impression that the UK Biobank exomes were part of gnomAD v4. Has only a subset of them been included?
Secondly, the paper reports V520F as the most common variant in East Asians (36 alleles). The variant is present in gnomAD (43 alleles), but 38 of the 43 alleles are in the European group and 3 are in the Remaining category. The variant has not previously been associated with East Asians. Is it reasonable to assume that gnomAD and the UK Biobank study use different methods to assign ancestry to samples? Or are we missing something?
Any explanation you may be able to offer would greatly help our efforts. Thanks!