Best Practice for Reporting Population Allele Frequencies When Both UK Biobank and Non-UK Biobank Data Are Available in gnomAD v4

Hello gnomAD team,

When investigating missense variants in a gene using gnomAD v4, both UK Biobank (UKB) and non-UK Biobank (non-UKB) allele frequency data are available for many populations. For certain filtering criteria (e.g., rare disease variant curation or ACMG guidelines), would it be appropriate to simply report the higher observed allele frequency (“GroupMax”) between the two sources for each ancestry? Is this approach recommended, or are there potential pitfalls or more statistically robust alternatives for allele frequency reporting, especially for populations where sample sizes or representativeness differ substantially between UKB and non-UKB?
Any clarification or references to current best practices would be greatly appreciated!

Thank you,
RN

Hi Rohan,

In general, the larger the dataset, the more accurate the allele frequency estimate, so we recommend using the GroupMax value calculated across the full gnomAD v4 dataset. Any differences in frequency between the GroupMax values calculated across the UKB and non-UKB subsets of v4 are likely small and due to random sampling.
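
To illustrate the sampling point, here is a minimal sketch of how the uncertainty around an allele frequency shrinks as the allele number (AN) grows. It uses a Wilson score interval, which is one common choice for a binomial proportion and is not necessarily how gnomAD computes any of its own statistics. The function name `wilson_interval` and the allele counts are hypothetical and chosen only for illustration; they are not taken from gnomAD.

```python
import math


def wilson_interval(ac, an, z=1.96):
    """Wilson score interval for an allele frequency AC/AN.

    z=1.96 corresponds to roughly 95% coverage. Illustrative helper,
    not part of any gnomAD tooling.
    """
    if an <= 0:
        raise ValueError("an must be positive")
    p = ac / an
    denom = 1 + z**2 / an
    centre = (p + z**2 / (2 * an)) / denom
    half = z * math.sqrt(p * (1 - p) / an + z**2 / (4 * an**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts with the same point estimate (AC/AN),
# once in a smaller subset and once in a dataset twice the size.
subset = wilson_interval(ac=2, an=60_000)
full = wilson_interval(ac=4, an=120_000)

print("subset interval:", subset)
print("full interval:  ", full)
print("subset width:", subset[1] - subset[0])
print("full width:  ", full[1] - full[0])
```

With the same point estimate, the interval from the larger AN is narrower, so two subsets of one dataset can show visibly different frequencies for a rare variant purely by chance.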

Hi Katherine,

Thank you for clarifying. I’ll proceed with the GroupMax value from the full gnomAD v4 dataset.

Hi Katherine,

Just to confirm:

I am preparing to report allele frequencies from gnomAD v4 and want to clarify the best practice. For example, for the CNR1 Ala120Ser variant, I see both sas_non_ukb (AF = 2.87e-05) and sas (AF = 2.32e-05). When reporting allele frequencies, should I always use the full population value (sas in this case, which includes both UKB and non-UKB)? Also, when calculating GroupMax, should I rely on the full population values rather than the non-UKB subsets?

Thanks in advance!