Access to preprocessing pipeline code

iggy_m · December 15, 2023, 12:49am

Hi gnomAD team,

The gnomAD browser is really excellent. We would like to use its front-end to help visualise our own dataset (which is not suitable for inclusion in the wider gnomAD dataset). Can you please help with a question related to this?

We have had some success with using the code in the Broad Institute gnomad-browser repo to set up the required infrastructure, run some data pipelines, load the resulting data into the backend APIs, and view this in the browser. Thanks for making this code available!

We would now like to try a different dataset. Some of the gnomAD data pipelines require preprocessed Hail tables (e.g. the coverage pipelines require the corresponding [...]coverage.ht table; the variant pipelines require the corresponding [...]sites.ht table). To use our own dataset, we will need to produce these preprocessed data artifacts too. Are you able to share the code for the gnomAD data preprocessing pipelines?

Many thanks!

steve · December 15, 2023, 3:45pm

Hi @iggy_m !

This may be a better question for our methods team, since they’re the ones generating the source tables that we import into the browser. However, I think the following should get you started, and I can escalate to the methods team if you have further questions.

The code for generating gnomAD’s hail tables that eventually appear in gs://gcp-public-data--gnomad can generally be found in:

In particular, I believe the scripts that we ran dataproc pipelines with to generate coverage.ht and sites.ht can be found at:

Hope this helps!

iggy_m · December 17, 2023, 9:00pm

That is awesome Steve. Thank you very much for your super helpful reply. I hadn’t had a thorough look through the QC repo. I’ll look at those scripts now.

Thanks again!

iggy_m · January 10, 2024, 2:28am

Happy New Year @steve. Just popping back to this thread to confirm that I was able to use the gnomad_qc and gnomad_methods code to create a coverage.ht table from our custom dataset that was sufficient to run the gnomAD coverage data pipeline. Thanks for your help!
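
For anyone following the same route, here is a minimal plain-Python sketch of the kind of per-locus summary a coverage table typically holds. It is not the gnomad_qc or gnomad_methods code and does not use Hail. The function name `summarize_locus`, the threshold list, and the `over_N` field names are assumptions for illustration, so check them against the schema your browser pipeline actually expects.

```python
from statistics import mean, median

# Assumed depth thresholds; the over_N naming is illustrative, not confirmed
# against the schema the browser's coverage pipeline reads.
THRESHOLDS = (1, 5, 10, 15, 20, 25, 30, 50, 100)


def summarize_locus(depths):
    """Summarise per-sample read depths observed at a single locus.

    Returns the mean depth, the median depth, and, for each threshold,
    the fraction of samples whose depth is at least that threshold.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("need at least one sample depth")
    row = {"mean": mean(depths), "median": median(depths)}
    for threshold in THRESHOLDS:
        row[f"over_{threshold}"] = sum(d >= threshold for d in depths) / n
    return row


# Hypothetical depths for four samples at one position.
row = summarize_locus([0, 12, 30, 45])
print(row)
```

On a real cohort this calculation would run per locus inside Hail over the whole dataset; the sketch only shows the shape of one output row.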
