Clarification on gnomAD v4.1 Data in GCP

Dear gnomAD Team,

Could you please confirm whether the gnomAD v4.1 data available in GCP includes both UK Biobank and non-UK Biobank samples, or only one of them?

We’ve noticed some variants where VEP and our gnomAD plugin return no values, while the gnomAD website shows values when the non-UK dataset is selected. The documentation and blog post don’t specify which dataset the GCP data corresponds to.

Thank you for your help.

Best regards,
Fatima Farhan

The gnomAD v4.1 downloadable data contains allele frequencies from the full set of 730,947 exomes in v4.1, as well as allele frequencies calculated for various data strata, including the non-UKB subset.

In the VCF, frequency fields without suffixes reflect the values from the complete exome dataset, while fields with suffixes indicate the specific data stratification used. Here are some fields from the VCF as an example:

##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count">
##INFO=<ID=AC_XX,Number=A,Type=Integer,Description="Alternate allele count for XX samples">
##INFO=<ID=AC_afr,Number=A,Type=Integer,Description="Alternate allele count for samples in the African/African-American genetic ancestry group">
##INFO=<ID=AC_non_ukb,Number=A,Type=Integer,Description="Alternate allele count in non_ukb subset">
##INFO=<ID=AC_non_ukb_XX,Number=A,Type=Integer,Description="Alternate allele count for XX samples in non_ukb subset">
##INFO=<ID=AC_non_ukb_afr,Number=A,Type=Integer,Description="Alternate allele count for samples in the African/African-American genetic ancestry group in non_ukb subset">
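
To make the naming pattern in these header lines concrete, here is a minimal Python sketch that reads the full-dataset count and a stratified count from an INFO string. The helper names (`parse_info`, `field_name`) and the INFO values are made up for illustration and are not part of any gnomAD tooling. It also assumes one alternate allele per record, so each `Number=A` field holds a single value.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a {key: value} dict; flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def field_name(metric: str, subset: str = "", group: str = "") -> str:
    """Build a field name such as AC_non_ukb_afr from its parts.

    Follows the pattern seen in the header lines above:
    metric, then an optional subset, then an optional group.
    """
    return "_".join(part for part in (metric, subset, group) if part)


# Made-up INFO string for illustration only; not a real gnomAD record.
info = parse_info(
    "AC=12;AC_XX=7;AC_afr=3;AC_non_ukb=5;AC_non_ukb_XX=2;AC_non_ukb_afr=1"
)

# No suffix: count across the complete exome dataset.
full_ac = int(info[field_name("AC")])

# Suffixes: count for the African/African-American group in the non_ukb subset.
non_ukb_afr_ac = int(info[field_name("AC", "non_ukb", "afr")])
```

If a tool or plugin reports no value for a variant, it may be worth checking which of these fields it is configured to read, since the unsuffixed and `non_ukb` fields are separate entries in the same record.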

I recommend referring to this help page if you are using the Hail version of the downloads.