Access to preprocessing pipeline code

Hi gnomAD team :wave:

The gnomAD browser is really excellent. We would like to use its front end to help visualise our own dataset, which is not suitable for inclusion in the wider gnomAD dataset. Could you please help with a question related to this?

We have had some success with using the code in the Broad Institute gnomad-browser repo to set up the required infrastructure, run some data pipelines, load the resulting data into the backend APIs, and view this in the browser. Thanks for making this code available! :tada:

We would now like to try this with a different dataset. Some of the gnomAD data pipelines require preprocessed Hail tables (e.g. the coverage pipelines require the corresponding [...] table; the variant pipelines require the corresponding [...] table). To use our own dataset, we will need to produce these preprocessed data artifacts too. Are you able to share the code for the gnomAD data preprocessing pipelines?

Many thanks!

Hi @iggy_m !

This may be a better question for our methods team, since they’re the ones generating the source tables that we import into the browser. However, I think the following should get you started, and I can escalate to the methods team if you have further questions.

The code for generating gnomAD’s Hail tables that eventually appear in gs://gcp-public-data--gnomad can generally be found in:

In particular, I believe the scripts we used to run the Dataproc pipelines that generate [...] and [...] can be found at:

Hope this helps!

That is awesome, Steve. Thank you very much for your super helpful reply. I hadn’t had a thorough look through the QC repo. I’ll look at those scripts now.

Thanks again!

Happy New Year, @steve. Just popping back to this thread to confirm that I was able to use the gnomad_qc and gnomad_methods code to create a file from our custom dataset that was sufficient to run the gnomAD coverage data pipeline. Thanks for your help!