gnomAD v4.1.0 Joint sites Hail Table download results in 404

Dear gnomAD team,

None of the joint sites Hail Table download links work for me:

$ wget https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/ht/joint/gnomad.joint.v4.1.site.ht
--2024-04-24 05:16:05--  https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/ht/joint/gnomad.joint.v4.1.site.ht
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.36.59, 142.250.179.187, 142.250.179.219, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.36.59|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-04-24 05:16:05 ERROR 404: Not Found.

$ wget https://gnomad-public-us-east-1.s3.amazonaws.com/release/4.1/ht/joint/gnomad.joint.v4.1.site.ht
--2024-04-24 05:16:26--  https://gnomad-public-us-east-1.s3.amazonaws.com/release/4.1/ht/joint/gnomad.joint.v4.1.site.ht
Resolving gnomad-public-us-east-1.s3.amazonaws.com (gnomad-public-us-east-1.s3.amazonaws.com)... 52.216.27.156, 54.231.140.97, 52.217.132.233, ...
Connecting to gnomad-public-us-east-1.s3.amazonaws.com (gnomad-public-us-east-1.s3.amazonaws.com)|52.216.27.156|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-04-24 05:16:27 ERROR 404: Not Found.

$ wget https://datasetgnomad.blob.core.windows.net/dataset/release/4.1/ht/joint/gnomad.joint.v4.1.site.ht
--2024-04-24 05:16:36--  https://datasetgnomad.blob.core.windows.net/dataset/release/4.1/ht/joint/gnomad.joint.v4.1.site.ht
Resolving datasetgnomad.blob.core.windows.net (datasetgnomad.blob.core.windows.net)... 20.150.78.68
Connecting to datasetgnomad.blob.core.windows.net (datasetgnomad.blob.core.windows.net)|20.150.78.68|:443... connected.
HTTP request sent, awaiting response... 404 The specified blob does not exist.
2024-04-24 05:16:37 ERROR 404: The specified blob does not exist..

Could it be that the links on the page are incorrect?

Do these files contain the following information that was stored in the gnomAD v4.0 vcf files?

Warning: The tag "AF_joint" not defined in the header
Warning: The tag "AN_joint" not defined in the header
Warning: The tag "nhomalt_joint" not defined in the header
Warning: The tag "faf95_joint" not defined in the header
Warning: The tag "faf99_joint" not defined in the header

Best regards,
Dennis

Hi @dennishendriksen,

I believe you are missing the trailing ‘s’ in ‘sites’.

wget https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht

should work.

The joint HT does contain all of the fields you list, but they sit inside a joint struct and can be accessed as ht.joint.freq[0].AF, ht.joint.freq[0].AN, ht.joint.freq[0].homozygote_count, ht.joint.faf[0].faf95, and ht.joint.faf[0].faf99. The file also contains genomes and exomes structs holding the same information.
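As a rough illustration of that mapping, here is a minimal Python sketch that pulls the five fields out of the joint struct under the tag names from the question. The helper names (JOINT_FIELDS, select_joint_fields, load_joint_table) are made up for this example, and it assumes Hail is installed and the table has already been copied locally; the field paths and the index 0 are taken directly from the reply above.

```python
# Hypothetical helpers: map the VCF-style tag names from the question to the
# Hail expressions listed in the reply above.
JOINT_FIELDS = {
    "AF_joint": lambda ht: ht.joint.freq[0].AF,
    "AN_joint": lambda ht: ht.joint.freq[0].AN,
    "nhomalt_joint": lambda ht: ht.joint.freq[0].homozygote_count,
    "faf95_joint": lambda ht: ht.joint.faf[0].faf95,
    "faf99_joint": lambda ht: ht.joint.faf[0].faf99,
}


def select_joint_fields(ht):
    """Return a table keeping only the joint fields, renamed to the tag names."""
    return ht.select(**{name: get(ht) for name, get in JOINT_FIELDS.items()})


def load_joint_table(path):
    """Read a Hail Table from a local or cloud path (assumes Hail is installed)."""
    import hail as hl  # imported lazily so the mapping above works without Hail

    return hl.read_table(path)
```

With a local copy of the table, something like select_joint_fields(load_joint_table("gnomad.joint.v4.1.sites.ht")).show(5) should print the first few rows; the path here is only an assumption about where the table was saved.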

Just noting that the link was incorrect on our downloads page, and there is a fix incoming: "fixup(browser): typo in joint downloads link" by sjahl (Pull Request #1513, broadinstitute/gnomad-browser on GitHub).


Hi @mike and @steve,

Thank you for the quick fix. Unfortunately, I still get the same errors with the corrected download links:

$ wget https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht
--2024-04-25 05:56:07--  https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.39.123, 142.251.36.27, 142.250.179.219, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.39.123|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-04-25 05:56:08 ERROR 404: Not Found.

$ wget https://gnomad-public-us-east-1.s3.amazonaws.com/release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht
--2024-04-25 05:56:25--  https://gnomad-public-us-east-1.s3.amazonaws.com/release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht
Resolving gnomad-public-us-east-1.s3.amazonaws.com (gnomad-public-us-east-1.s3.amazonaws.com)... 16.182.36.241, 52.216.129.3, 16.182.107.161, ...
Connecting to gnomad-public-us-east-1.s3.amazonaws.com (gnomad-public-us-east-1.s3.amazonaws.com)|16.182.36.241|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-04-25 05:56:25 ERROR 404: Not Found.

$ wget https://datasetgnomad.blob.core.windows.net/dataset/release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht
--2024-04-25 05:56:40--  https://datasetgnomad.blob.core.windows.net/dataset/release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht
Resolving datasetgnomad.blob.core.windows.net (datasetgnomad.blob.core.windows.net)... 20.150.78.68
Connecting to datasetgnomad.blob.core.windows.net (datasetgnomad.blob.core.windows.net)|20.150.78.68|:443... connected.
HTTP request sent, awaiting response... 404 The specified blob does not exist.
2024-04-25 05:56:41 ERROR 404: The specified blob does not exist..

Note that other links on gnomAD are working without issues, for example:

$ wget https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/vcf/exomes/gnomad.exomes.v4.1.sites.chr1.vcf.bgz
--2024-04-25 05:59:49--  https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/vcf/exomes/gnomad.exomes.v4.1.sites.chr1.vcf.bgz
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.36.27, 142.251.39.123, 172.217.168.219, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.36.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18789775480 (17G) [application/octet-stream]
Saving to: ‘gnomad.exomes.v4.1.sites.chr1.vcf.bgz.1’

gnomad.exomes.v4.1.sites.chr1.vcf.bgz.1      0%[                                                                                        ]   1.99M  3.05MB/s

Greetings,
dennishendriksen

This is due to some technical details of how object storage buckets work: they behave somewhat differently from a typical web server. In short, a Hail Table is a directory containing many other files, and wget isn’t aware of all the requests it would need to make to locate and retrieve the table’s contents.

wget works fine for single files, since each can be retrieved with a single HTTP request. To pull down a Hail Table, however, you’ll need to use one of the tools recommended on our downloads page, which are specifically designed for interacting with object storage buckets (such as gsutil cp -r for Google Cloud Storage).
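To make that concrete, here is a small Python sketch that turns the Google Cloud Storage HTTPS link from this thread into the gs:// form that gsutil expects and builds the matching gsutil cp -r command. The function names are hypothetical, and the sketch assumes the path-style URL layout (storage.googleapis.com/BUCKET/OBJECT) seen in the links above; it only prints the command rather than running it.

```python
import shlex
from urllib.parse import urlparse


def https_to_gs_uri(url: str) -> str:
    """Convert a path-style storage.googleapis.com URL to a gs:// URI."""
    parsed = urlparse(url)
    if parsed.netloc != "storage.googleapis.com":
        raise ValueError(f"not a storage.googleapis.com URL: {url}")
    # The first path segment is the bucket; the rest is the object prefix.
    bucket, _, prefix = parsed.path.lstrip("/").partition("/")
    if not bucket or not prefix:
        raise ValueError(f"URL has no bucket or object path: {url}")
    return f"gs://{bucket}/{prefix}"


def gsutil_copy_command(url: str, dest: str = ".") -> list:
    """Build a recursive gsutil copy command for a Hail Table directory."""
    return ["gsutil", "cp", "-r", https_to_gs_uri(url), dest]


url = (
    "https://storage.googleapis.com/gcp-public-data--gnomad/"
    "release/4.1/ht/joint/gnomad.joint.v4.1.sites.ht"
)
# Print the command so it can be reviewed before being run in a shell.
print(shlex.join(gsutil_copy_command(url)))
```

The recursive flag is what matters here: it lets gsutil list and fetch every file under the table's directory, which is the step a single wget request cannot do.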