Firstly, thank you for your continued efforts and commitment to this initiative.
I would like to download the gnomAD 4.1 non-UK Biobank (non-UKB) data subset. From the screenshot attached, the gnomAD website appears to suggest that the non-UKB data should be available as a downloadable dataset.
However, I have been unable to locate a download link for the non-UKB subset, either on the website or in the public AWS S3 bucket (s3://gnomad-public-us-east-1/release/4.1/vcf/).
Could you please confirm whether the non-UKB data is available for download as a dataset? If so, would you be able to provide the appropriate download link (ideally an AWS S3 path)?
Thank you for reaching out! We do not provide a separate non-UKB download, though I can see how the phrasing on the website could suggest that. Instead, you will need to filter the VCF to sites where the INFO field AC_non_ukb > 0. All non-UKB fields contain the string non_ukb, including those for specific genetic ancestry groups, e.g., AC_non_ukb_amr.
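In case it helps, here is a minimal sketch of that VCF filter in plain Python. It assumes the standard tab-separated VCF layout with INFO in the 8th column; the function names and file paths are hypothetical, and it writes ordinary gzip rather than bgzip, so the output would need re-compressing with bgzip before indexing with tabix.

```python
import gzip
from typing import Iterable, Iterator


def non_ukb_ac(info: str, key: str = "AC_non_ukb") -> int:
    """Return the allele count stored under `key` in a VCF INFO string, or 0 if absent."""
    for entry in info.split(";"):
        name, sep, value = entry.partition("=")
        # Exact match on the key, so AC_non_ukb_amr is not mistaken for AC_non_ukb.
        if name == key and sep:
            # Sum comma-separated values in case a record lists several ALT alleles.
            return sum(int(v) for v in value.split(",") if v not in ("", "."))
    return 0


def filter_non_ukb(lines: Iterable[str], key: str = "AC_non_ukb") -> Iterator[str]:
    """Yield header lines unchanged, plus records whose count for `key` is positive."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # INFO is the 8th column of a VCF record.
        if len(fields) >= 8 and non_ukb_ac(fields[7], key) > 0:
            yield line


def filter_vcf_file(in_path: str, out_path: str, key: str = "AC_non_ukb") -> None:
    """Stream a gzip/bgzip-compressed VCF and write the filtered records (plain gzip)."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(filter_non_ukb(src, key))
```

Passing a different key, such as AC_non_ukb_amr, would restrict the output to sites observed in that genetic ancestry group within the non-UKB subset.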
Alternatively, if you are familiar with Hail, you could filter the release Hail Table (HT) instead. You could do this by filtering the HT sites on the frequency array's high-quality non-UKB entry, which is the 167th element (index 166 when counting from zero).