The sample metadata tables provided under “Sample Metadata Hail Table” and “Sample Metadata TSV” at https://gnomad.broadinstitute.org/data#v3-hgdp-1kg appear to disagree. In particular, the “high_quality” column differs between the two sources. (There are other differences as well, but this is the one that I’ve noticed and been struggling with.) The Hail Table also has a column “gnomad_high_quality”, which appears to be equivalent to the “high_quality” column in the TSV. My guess is that the TSV was never updated with the changes in the 3.1.2 release, and that the Hail Table is more accurate and up to date, with “high_quality” in the 3.1.2 Hail Table being the recommended value to use for sample filtering. I just wanted to make sure. Thanks!
Thanks for reaching out and for flagging this. We recommend using the high_quality annotation in the Hail Table. This filter removes samples that fail gnomAD’s hard filters but keeps samples previously erroneously identified as sample QC metric outliers. For more information about this annotation and the gnomad_high_quality annotation, please see our previous blog post or our help page.
The TSV includes updated sample metadata as described in this publication, but it was not updated to reflect the v3.1.2 filtering logic. We will work on adding this point to our documentation.
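For anyone who wants to check the discrepancy on their own copy, here is a minimal sketch of comparing the flags between the two sources. It assumes the Hail Table has been exported to a tab-separated file (for example with Hail's `Table.export`), that booleans are written as `true`/`false`, and that the sample ID column is named `s` in both files; those column names are assumptions and may need adjusting. The helper names `load_flags` and `flag_differences` and the three sample rows are invented purely for illustration and are not real gnomAD data.

```python
import csv
import io


def load_flags(handle, id_col, flag_col):
    """Read a tab-separated file-like object into {sample_id: bool}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row[id_col]: row[flag_col].strip().lower() == "true"
        for row in reader
    }


def flag_differences(flags_a, flags_b):
    """Return sorted IDs of samples present in both maps whose flags differ."""
    shared = flags_a.keys() & flags_b.keys()
    return sorted(s for s in shared if flags_a[s] != flags_b[s])


# Toy stand-ins for the two downloads (made-up rows, NOT real metadata).
tsv_text = (
    "s\thigh_quality\n"
    "sampleA\ttrue\n"
    "sampleB\tfalse\n"
    "sampleC\tfalse\n"
)
ht_export_text = (
    "s\thigh_quality\tgnomad_high_quality\n"
    "sampleA\ttrue\ttrue\n"
    "sampleB\ttrue\tfalse\n"
    "sampleC\tfalse\tfalse\n"
)

tsv_hq = load_flags(io.StringIO(tsv_text), "s", "high_quality")
ht_hq = load_flags(io.StringIO(ht_export_text), "s", "high_quality")
ht_gnomad_hq = load_flags(io.StringIO(ht_export_text), "s", "gnomad_high_quality")

# Samples whose TSV flag disagrees with the Hail Table's high_quality.
differs_from_high_quality = flag_differences(tsv_hq, ht_hq)

# Samples whose TSV flag disagrees with the Hail Table's gnomad_high_quality.
differs_from_gnomad_high_quality = flag_differences(tsv_hq, ht_gnomad_hq)

print(differs_from_high_quality)
print(differs_from_gnomad_high_quality)
```

In this toy data, `sampleB` stands in for a sample that is not high quality in the TSV but is in the Hail Table's high_quality column. With real files, replace the `io.StringIO` objects with opened file handles.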
Got it, thanks @kchao !