Number of observed variants vs hail query (v4.1.0)

Thank you for pointing this out, and for taking the time to report it in detail.

Regarding the discrepancy in the number of variants you’re seeing for the gene “FGA”: we recently discovered a bug that affected the filtering of variants by LOFTEE flag. We intended to filter on these flags, but the gnomad_methods code run at the time of the v4.1 constraint release did not apply that filter correctly.

The bug was fixed in a recent update. You can check out the details in this pull request. The relevant fix is: “Allow lof_flags to be missing in addition to lof_flags == "" for lof == "HC" to have no flag penalty” (note that this is a change from previous LOFTEE annotations).
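To make the quoted condition concrete, here is a minimal plain-Python sketch of it. This is not the gnomad_methods code: the function name `has_no_flag_penalty` is hypothetical, and `None` stands in for a missing annotation.

```python
from typing import Optional


def has_no_flag_penalty(lof: Optional[str], lof_flags: Optional[str]) -> bool:
    """Hypothetical helper: True when a high-confidence (HC) LOFTEE call
    carries no flag, whether lof_flags is an empty string or missing."""
    return lof == "HC" and (lof_flags is None or lof_flags == "")


# A strict equality test alone does not match a missing value,
# because None == "" evaluates to False in Python.
strict_match_on_missing = (None == "")

hc_missing_flags = has_no_flag_penalty("HC", None)    # missing flags
hc_empty_flags = has_no_flag_penalty("HC", "")        # empty flags
hc_with_flag = has_no_flag_penalty("HC", "SOME_FLAG")  # placeholder flag value
lc_empty_flags = has_no_flag_penalty("LC", "")        # not high confidence
```

The point of the sketch is only that a missing `lof_flags` and an empty `lof_flags` are handled the same way for `lof == "HC"`, which is what the quoted fix describes.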

This bug likely explains why you’re seeing 13 variants instead of the 16 listed in the constraint table. The table may include variants that were not properly filtered due to the bug.

Additionally, I’d like to emphasize two points:

  1. The v4.0 constraint metrics are still experimental, as highlighted in our constraint blog post. If you’re looking for a more established version, we recommend continuing to use the gnomAD v2.1.1 constraint metrics.
  2. If you’re interested in retrieving all observed pLoF variants for a gene, you may want to adjust your filtering criteria. The constraint model applies a coverage threshold of ≥30, which can exclude variants that are relevant to your analysis but were not used in the constraint calculations.
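As a rough illustration of the second point, the sketch below shows how a coverage cutoff changes a variant count. The records are toy data invented for this example, and the `coverage` field name and the `select_variants` helper are assumptions, not gnomAD or Hail identifiers.

```python
from typing import Dict, List, Optional

# Toy records invented for illustration; these are not real gnomAD variants.
TOY_VARIANTS: List[Dict] = [
    {"variant": "toy-1", "coverage": 42.0},
    {"variant": "toy-2", "coverage": 31.0},
    {"variant": "toy-3", "coverage": 18.0},
]


def select_variants(variants: List[Dict], min_coverage: Optional[float] = None) -> List[Dict]:
    """Keep variants whose coverage is >= min_coverage; keep all if no cutoff is given."""
    if min_coverage is None:
        return list(variants)
    return [v for v in variants if v["coverage"] >= min_coverage]


# Mimics the constraint-style cutoff of >= 30 mentioned above.
constraint_like = select_variants(TOY_VARIANTS, min_coverage=30)
# No cutoff: every observed variant is retained.
all_observed = select_variants(TOY_VARIANTS)

print(len(constraint_like), len(all_observed))  # prints "2 3"
```

In a real query you would apply the equivalent condition to whatever coverage annotation your table carries, and drop or loosen it if you want every observed variant.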

Thanks again for the report, and please let me know if you have any further questions!
