Discordant variant observations between v2.1.1 and v4.1

Variant Differences Between v2.1.1 and v4.1

Dear all,

I’ve been comparing the observed syn, mis, and lof variant counts between the v2.1.1 and v4.1 constraint metric files for MANE transcripts as defined in the v4.1 file. Surprisingly, among the 12,045 genes with values available in all three categories, I found that:

  • 6.2% of the genes have more LOF variants in v2.1.1 than in v4.1
  • 2.7% of the genes have more missense variants in v2.1.1 than in v4.1
  • 3% of the genes have more synonymous variants in v2.1.1 than in v4.1

I assume some of these differences may be due to minor variations in filtration steps between the two releases, but I’m surprised by the relatively large fraction of genes with more variants in v2.1.1 than in v4.1, despite the considerably smaller sample size of v2.1.1.
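
For anyone who wants to reproduce the comparison, here is a minimal sketch of the calculation in plain Python. It assumes the observed counts have already been read from the two constraint files into dictionaries keyed by gene; the gene names, the counts, and the function name `fraction_higher_in_old` are all made up for illustration and are not taken from the gnomAD files.

```python
from typing import Dict, Optional, Tuple

# Observed counts per gene as (syn, mis, lof); None marks a missing value.
Counts = Tuple[Optional[int], Optional[int], Optional[int]]
CATEGORIES = ("syn", "mis", "lof")


def fraction_higher_in_old(
    old: Dict[str, Counts], new: Dict[str, Counts]
) -> Dict[str, float]:
    """Fraction of shared genes whose observed count is higher in `old`."""
    # Keep only genes present in both releases with all three values available.
    shared = [
        gene
        for gene in old
        if gene in new and None not in old[gene] and None not in new[gene]
    ]
    if not shared:
        raise ValueError("no genes with complete counts in both releases")

    result = {}
    for i, category in enumerate(CATEGORIES):
        n_higher = sum(1 for gene in shared if old[gene][i] > new[gene][i])
        result[category] = n_higher / len(shared)
    return result


# Toy data (hypothetical genes and counts, for illustration only).
v2 = {"GENE_A": (10, 20, 3), "GENE_B": (5, 8, 1), "GENE_C": (7, 9, 2), "GENE_D": (4, None, 0)}
v4 = {"GENE_A": (30, 60, 2), "GENE_B": (15, 7, 4), "GENE_C": (21, 30, 6), "GENE_D": (9, 12, 1)}

fractions = fraction_higher_in_old(v2, v4)
for category in CATEGORIES:
    print(f"{category}: {fractions[category]:.1%} of genes have more variants in the older release")
```

GENE_D is dropped because one of its values is missing, which mirrors the restriction to genes with values in all three categories.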

Does anyone have an explanation for this observation?

Best,
Steffan

This is likely because the gnomAD v4.0/4.1 constraint metrics were calculated only across high-coverage bases (median exome depth >=30). For more information, please see our blog post.

Thanks a lot for pointing me in that direction. As you suggest, the “Fraction of individuals with coverage over 30” for several of the genes is in the range of 0.1-0.5 for v4.1.
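
In case it is useful to others, here is a rough sketch of how one might check this more systematically: take the genes with more variants in v2.1.1 and ask what share of them have a low v4.1 coverage fraction. The gene names and values are invented, the function name `low_coverage_share` is hypothetical, and the 0.5 cut-off is simply the upper end of the range I mentioned, not an official threshold.

```python
from typing import Dict, Set


def low_coverage_share(
    discordant: Set[str], cov_frac: Dict[str, float], threshold: float = 0.5
) -> float:
    """Share of discordant genes whose coverage fraction is <= threshold."""
    # Only genes that actually have a coverage value can be assessed.
    scored = [gene for gene in discordant if gene in cov_frac]
    if not scored:
        raise ValueError("no discordant genes with a coverage value")
    n_low = sum(1 for gene in scored if cov_frac[gene] <= threshold)
    return n_low / len(scored)


# Toy data (hypothetical): genes with more variants in v2.1.1 than in v4.1,
# and the v4.1 fraction of individuals with coverage over 30 per gene.
discordant_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
coverage_over_30 = {"GENE_A": 0.12, "GENE_B": 0.45, "GENE_C": 0.97, "GENE_E": 0.99}

share = low_coverage_share(discordant_genes, coverage_over_30)
print(f"{share:.1%} of discordant genes with a coverage value fall at or below the threshold")
```

A share well above the genome-wide proportion of low-coverage genes would be consistent with the high-coverage filter explaining the discrepancy.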