gnomAD 4.1 non-ukb download?

Dear team,

Firstly, thank you for your continued efforts and commitment to this initiative.

I would like to download the gnomAD 4.1 non-UK Biobank (non-UKB) data subset. From the screenshot attached, the gnomAD website appears to suggest that the non-UKB data should be available as a downloadable dataset.

However, I have been unable to locate a download link for the non-UKB subset, either on the website or in the public AWS S3 bucket (s3://gnomad-public-us-east-1/release/4.1/vcf/).

Could you please confirm whether the non-UKB data is available for download as a dataset? If so, would you be able to provide the appropriate download link (ideally an AWS S3 path)?

Thank you in advance for your help.

Hi @tg10,

Thank you for reaching out! We do not have a single non-UKB download, though I can see how the above phrasing could suggest that. Instead, you will need to filter the VCF to sites where the INFO field AC_non_ukb > 0. All non-UKB fields contain the string non_ukb, so you can also examine specific genetic ancestry groups, e.g., AC_non_ukb_amr.
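As a rough illustration of the VCF filter described above, here is a minimal sketch in plain Python. It assumes the standard tab-separated VCF layout with INFO in the eighth column; the function names are hypothetical and not part of any gnomAD tooling.

```python
def has_non_ukb_allele(vcf_line: str, key: str = "AC_non_ukb") -> bool:
    """Return True if a VCF data line has `key` > 0 in its INFO column."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        # Not a complete VCF data line; treat it as failing the filter.
        return False
    for entry in fields[7].split(";"):
        name, sep, value = entry.partition("=")
        if name == key and sep:
            # Allele-count fields may hold one value per ALT allele.
            return any(int(v) > 0 for v in value.split(",") if v.isdigit())
    return False


def filter_vcf_lines(lines, key="AC_non_ukb"):
    """Yield header lines unchanged plus data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or has_non_ukb_allele(line, key):
            yield line
```

To apply this to a downloaded file, the lines could be streamed with gzip.open(path, "rt") and passed to filter_vcf_lines. Passing key="AC_non_ukb_amr" would restrict the filter to a single genetic ancestry group.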

Alternatively, if you are familiar with Hail, you could filter the release HT. You could accomplish this by filtering the HT sites using the frequency array’s high-quality non-UKB entry, which is at index 167 (freq[167]).

import hail as hl
# Keep only sites with a non-zero high-quality non-UKB allele count
ht = hl.read_table("gs://gcp-public-data--gnomad/release/4.1/ht/exomes/gnomad.exomes.v4.1.sites.ht")
ht = ht.filter(ht.freq[167].AC > 0)

Please let me know if you have any follow up questions!

Best,

Mike

Hi Mike,

Thanks for clarifying that, and thank you very much for the workaround!