GroupMax values missing for a lot of variants

I’d like to understand why GroupMax values are missing for so many variants. For example, among the >5000 BRCA2 variants in gnomAD, nearly 3000 have GrpMax missing. Most variants are exceedingly rare, leading me to think missing GrpMax values were actually meant to be ‘zero’. For those variants with at least 4 observations, I found some cases where the variant was only present in populations not considered in GrpMax calculation or variants with warnings about coverage in <50% of exomes. This again made me think missing values were actually zero. But then I found a variant with 40 observations in the European subset and GrpMax was still missing. I am hesitant to move my analysts over to using GrpMax until I understand better why so many values are missing. Thanks!

The new gnomAD v4.1 release resolved some of these. Now it looks like the only remaining missing GrpMax values are meant to be zero. Thx

Thank you for updating here. This will help any other users who have the same question about v4.0!

Hi,
I’m currently incorporating gnomAD v4.1 into a genomics pipeline that my department maintains. I’m still seeing a lot of variants with a missing total grpmax filtering AF (I’ve included links to some of these variants at the bottom of this message). Are these missing values actually supposed to be 0, or could there be another reason they are missing?

Links:

Thank you very much for your help in advance!

@kchao Hi Katherine, I wanted to quickly follow up on my previous message. It would be really helpful if you could clarify the missing total grpmax filtering AF values. Many thanks for your help!

@anjalijain The grpmax FAF may be missing for two reasons:

  1. When AC (allele count) is small, the FAF (filtering allele frequency) calculation can yield 0. If FAFs across non-bottlenecked ancestry groups are not greater than 0, grpmax FAF (based on this function) will be missing. For exomes and genomes, the browser team applied historical transformation steps instead of taking the values directly from our release table. In the two examples, grpmax FAF should be missing for all three datasets: exomes, genomes, and joint. This will be corrected soon.

  2. Variants absent from non-bottlenecked groups, like this example, will not have a calculated FAF, leaving grpmax FAF missing.
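The grpmax rule in the two points above can be sketched in Python. This is a minimal illustration under stated assumptions, not gnomAD's or Hail's code: `grpmax_faf` is a hypothetical helper that takes already-computed per-group FAFs (`None` where no FAF was calculated), and the `BOTTLENECKED` set of group labels is an assumed, illustrative list; check the gnomAD documentation for the groups actually excluded from grpmax.

```python
from typing import AbstractSet, Mapping, Optional, Tuple


def grpmax_faf(
    faf_by_group: Mapping[str, Optional[float]],
    excluded_groups: AbstractSet[str],
) -> Optional[Tuple[str, float]]:
    """Return (group, FAF) for the highest-FAF eligible group, or None if missing."""
    # Keep only non-bottlenecked groups that actually have a calculated FAF.
    eligible = {
        group: faf
        for group, faf in faf_by_group.items()
        if group not in excluded_groups and faf is not None
    }
    if not eligible:
        # Reason 2: no FAF in any non-bottlenecked group -> missing.
        return None
    group, best = max(eligible.items(), key=lambda item: item[1])
    if best <= 0:
        # Reason 1: every eligible FAF is 0, so the maximum is 0 -> missing.
        return None
    return group, best


# ASSUMPTION: illustrative labels for bottlenecked groups, not an authoritative list.
BOTTLENECKED = {"ami", "asj", "fin", "mid", "remaining"}

print(grpmax_faf({"afr": 0.0, "nfe": 0.0, "fin": 0.002}, BOTTLENECKED))  # None
print(grpmax_faf({"asj": 0.001}, BOTTLENECKED))  # None
print(grpmax_faf({"afr": 1.2e-5, "nfe": 3.4e-5}, BOTTLENECKED))  # ('nfe', 3.4e-05)
```

The first call shows reason 1 (the only non-zero FAF is in an excluded group, and the eligible FAFs are all 0); the second shows reason 2 (the variant appears only in an excluded group).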

I hope this answers your question.

Hi Qin, thank you very much for your reply!
I think I understand reason #2 well. However, for reason #1, could you please clarify what is meant by “FAFs across non-bottlenecked ancestry groups are not greater than 0”? In the examples I linked, the AC is certainly low in the non-bottlenecked ancestry groups, but the FAF, although very small, is still greater than 0. So I’m not sure why it should be missing in this instance.

Many thanks in advance for your help!

@anjalijain We’re using the FAF calculation function from Hail, which is written in Scala; I wrote a Python version of it for my own understanding. Both returned 0 for the two examples. Here is a screenshot using Hail’s function:

The maximum of a list of zeros is 0, so grpmax FAF is missing by our definition.
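For readers who want to see what a per-group FAF represents, here is a minimal Python sketch of the definition described in Hail's documentation for `hl.experimental.filtering_allele_frequency`: the highest true allele frequency for which the upper bound of the one-sided `ci` confidence interval on allele count, under a Poisson distribution with mean `an * af`, is still below the observed AC. The names `poisson_cdf` and `filtering_af` are hypothetical. This is not Hail's Scala implementation, which the reply above reports returned 0 for both examples; Hail's version may apply its own root-finding tolerances, rounding, and special cases for very small counts that are not reproduced here, so the two can disagree at low AC.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if lam == 0:
        return 1.0
    log_lam = math.log(lam)
    total = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(total, 1.0)


def filtering_af(ac: int, an: int, ci: float = 0.95, tol: float = 1e-12) -> float:
    """Highest af such that the one-sided `ci` upper bound on AC under
    Poisson(an * af) is still below the observed `ac` (simplified sketch)."""
    if ac <= 0 or an <= 0:
        return 0.0
    # "Upper bound < ac" is the same as P(X <= ac - 1) >= ci. That CDF falls
    # as af rises, so bisect on af in [0, 1] for the boundary.
    lo, hi = 0.0, 1.0
    if poisson_cdf(ac - 1, an * hi) >= ci:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poisson_cdf(ac - 1, an * mid) >= ci:
            lo = mid
        else:
            hi = mid
    return lo


print(filtering_af(1, 1000))  # about 5.13e-05, i.e. -ln(0.95) / 1000
```

Under this sketch the FAF always sits below the observed frequency `ac / an`; the grpmax FAF is then the maximum of these per-group values over non-bottlenecked groups.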