The list of variants with incorrect allele frequencies in gnomAD-v4

Hello. I used gnomAD-v4 and gnomAD-v4.1 to filter coding variants at two allele frequency thresholds (AF <= 0.01 and AF <= 0.0005). I did this for a cohort of 100 cases and generated two sets of files (one set filtered with v4, one set filtered with v4.1). Unexpectedly, the variant counts were the same regardless of which version I filtered with. This raised concerns for me. Do you happen to have the list of variants with incorrect allele frequencies in gnomAD-v4? If so, could you please share it with me? Thank you.
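To make the comparison concrete, here is a minimal pure-Python sketch of per-case filtering at both thresholds with two AF annotations. The data layout (plain dicts mapping variant IDs to AFs), the variant IDs and AF values, and the choice to keep variants that have no AF are all hypothetical assumptions for illustration, not the poster's actual pipeline. It shows that an AF change only alters a case's count when it moves a variant across a threshold:

```python
"""Count per-case variants passing an AF threshold under two annotations.

Hypothetical data model: each case maps to a list of variant IDs, and each
gnomAD version is represented as a dict from variant ID to allele frequency.
"""
from typing import Dict, List, Optional


def count_passing(
    variants: List[str],
    af_lookup: Dict[str, Optional[float]],
    threshold: float,
    keep_missing: bool = True,
) -> int:
    """Count variants whose AF is <= threshold.

    Variants absent from the lookup (or with a missing AF) are kept when
    keep_missing is True; this handling is an assumption made here.
    """
    n = 0
    for v in variants:
        af = af_lookup.get(v)
        if af is None:
            if keep_missing:
                n += 1
        elif af <= threshold:
            n += 1
    return n


# Hypothetical cohort and AF values, for illustration only.
cohort = {"case1": ["1-100-A-G", "1-200-C-T", "2-300-G-A"]}
af_v4 = {"1-100-A-G": 0.002, "1-200-C-T": 0.03}
af_v41 = {"1-100-A-G": 0.0004, "1-200-C-T": 0.03}

for case, variants in cohort.items():
    for threshold in (0.01, 0.0005):
        print(
            case,
            threshold,
            count_passing(variants, af_v4, threshold),
            count_passing(variants, af_v41, threshold),
        )
```

With these made-up values, the AF change for 1-100-A-G leaves the AF <= 0.01 count at 2 in both versions but raises the AF <= 0.0005 count from 1 to 2.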

Hello,

I ran my code on the gnomAD v4.1 exomes to count coding variants at the two AF thresholds you used, and got a different number of variants at each threshold. You may want to double-check your code; I don't think these variants' frequencies are incorrect.

Here is a snippet that gets the counts using the global AF, keeping only variants whose most severe consequence is coding.

```python
"""Get the count of coding variants with MAF <= 0.01 or MAF <= 0.0005."""

import hail as hl
from gnomad.utils.vep import (
    CSQ_CODING,
    filter_vep_transcript_csqs,
    get_most_severe_consequence_for_summary,
)
from gnomad_qc.v4.resources.release import release_sites

# gnomAD v4.1 exomes release sites Table.
ht = release_sites().ht()

# Keep only canonical transcript consequences; synonymous=False disables
# the synonymous-only filter, so all consequence types are kept.
ht = filter_vep_transcript_csqs(
    ht,
    synonymous=False,
    canonical=True,
)

# Annotate each variant with its most severe consequence (most_severe_csq).
ht = get_most_severe_consequence_for_summary(ht)

filter_expr = {}

# A variant counts as coding if its most severe consequence is in CSQ_CODING.
filter_expr["coding"] = hl.any(lambda csq: ht.most_severe_csq == csq, CSQ_CODING)

# freq[0] is the global (adj) frequency entry; AF is the alternate allele frequency.
filter_expr["coding_0.01"] = filter_expr["coding"] & (ht.freq[0].AF <= 0.01)
filter_expr["coding_0.0005"] = filter_expr["coding"] & (ht.freq[0].AF <= 0.0005)

print(
    "Number of coding variants with an MAF<=0.01: ",
    ht.filter(filter_expr["coding_0.01"]).count(),
)
print(
    "Number of coding variants with an MAF<=0.0005: ",
    ht.filter(filter_expr["coding_0.0005"]).count(),
)
```

Results:
Number of coding variants with an MAF<=0.01: 28009104
Number of coding variants with an MAF<=0.0005: 27837973
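Because both filters share the same coding condition and every variant with AF <= 0.0005 also has AF <= 0.01, the second count is a subset of the first, so their difference gives the number of coding variants with 0.0005 < AF <= 0.01:

$$
N_{0.0005 < \mathrm{AF} \le 0.01} = N_{\mathrm{AF} \le 0.01} - N_{\mathrm{AF} \le 0.0005} = 28{,}009{,}104 - 27{,}837{,}973 = 171{,}131
$$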

An update: I just did a quick filtering on the gnomAD-v4 and v4.1 files, and the numbers of filtered variants were different (182467865 vs. 181594644), so there is no issue with the AF in the new gnomAD files. The filtered variants in my cohort are not affected by the AF change in v4.1, possibly because I only used the non-UKB AF to filter variants. My question is resolved now. Thank you!
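Assuming the same filter was applied to both files, the size of the difference in the update above works out to:

$$
N_{\text{v4}} - N_{\text{v4.1}} = 182{,}467{,}865 - 181{,}594{,}644 = 873{,}221
$$

That is, 873,221 fewer variants passed the filter with the v4.1 file.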

Hello,
Thank you for your response and the code. There might have been a misunderstanding. The variant counts I'm comparing are between the two versions of gnomAD. I expected the counts to change after switching to v4.1 because of the AF issue in v4; however, for every case in our cohort the counts stayed the same, which made me wonder whether the AF in the new release (v4.1) has really been corrected. See the following two lines as an example:
Case1 → Filter (gnomAD v4 AF_non-ukb <= 0.01) = 524
Case1 → Filter (gnomAD v4.1 AF_non-ukb <= 0.01) = 524
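Identical counts do not by themselves show that the AFs are identical, since a count only changes when a variant crosses the threshold. A minimal sketch, assuming each version's non-UKB AFs for one case have been exported to a dict keyed by variant ID (the IDs and values below are hypothetical), compares the values directly:

```python
"""Check whether AF values changed between two annotations, not just counts.

Hypothetical inputs: dicts mapping variant ID -> non-UKB AF for one case,
taken from the v4 and v4.1 annotations. IDs and values are illustrative.
"""
from typing import Dict, List, Optional, Tuple


def compare_af(
    af_old: Dict[str, Optional[float]],
    af_new: Dict[str, Optional[float]],
    threshold: float,
) -> Tuple[List[str], List[str]]:
    """Return (changed, crossed): variants whose AF differs between versions,
    and the subset whose pass/fail status at `threshold` flips."""
    changed, crossed = [], []
    for v in sorted(set(af_old) | set(af_new)):
        old, new = af_old.get(v), af_new.get(v)
        if old != new:
            changed.append(v)
            # A missing AF is treated as passing (an assumption made here).
            old_pass = old is None or old <= threshold
            new_pass = new is None or new <= threshold
            if old_pass != new_pass:
                crossed.append(v)
    return changed, crossed


af_v4 = {"1-100-A-G": 0.004, "1-200-C-T": 0.02, "3-500-T-C": 0.0001}
af_v41 = {"1-100-A-G": 0.003, "1-200-C-T": 0.02, "3-500-T-C": 0.0001}

changed, crossed = compare_af(af_v4, af_v41, threshold=0.01)
print("AF changed:", changed)    # AF changed: ['1-100-A-G']
print("Crossed 0.01:", crossed)  # Crossed 0.01: []
```

Here one AF moves but stays below 0.01, so the count at that threshold is unchanged. With real data, exact float comparison may flag negligible differences; comparing with `math.isclose` and a small tolerance would be one way around that.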