Difference between popmax in the VCF and popmax/Grpmax FAF on the website

Hi, I’m trying to figure out why the popmax population can differ between the VCF (v2.1.1) and the gnomAD website for a few variants. More specifically, I’d like to know whether I can replicate the filtering steps to mirror the output on the website.

For variant 7-117171029-G-A (dataset gnomad_r2_1), the VCF INFO ‘popmax’ field is ‘amr’, and the website reflects this in the ordering of the frequency table. However, the Grpmax FAF uses the ‘nfe’ population for genomes. This is presumably due to the low allele count/allele number for AMR (2/846 in genomes) versus NFE (22/15422 in genomes).

Looking through the codebase, I can’t find any explicit filtering for popmax beyond taking the maximum frequency (https://github.com/broadinstitute/gnomad-browser/blob/0eb6920726c61756b7e934c3110f382683d91f87/data-pipeline/src/data_pipeline/datasets/gnomad_v2/gnomad_v2_variants.py#L181). Are there adjustments for low allele counts, or quality filters, that affect the popmax calculation?

Hi @dhtran,

Thanks for writing in! Hopefully I can clarify the differences between the site’s “Grpmax Filtering AF” and the VCF’s popmax AF.

As you pointed out, the popmax population in the v2 VCF (and the grpmax gen_anc in the v4 VCF) is the genetic ancestry group with the highest allele frequency at a site. The website instead displays a “Grpmax Filtering AF”, which is the maximum “filtering allele frequency” at a site together with its associated genetic ancestry group. This value can differ from the popmax/grpmax, and it can be found by taking the maximum faf_95_* at a site in the VCF, where * stands for each genetic ancestry group. The Filtering AF can be used to filter variants by allele frequency against a threshold set for each disease (e.g. BA1 in the 2015 ACMG/AMP guidelines). For more information on Filtering AF, please see Frequency Filter and Whiffin et al. 2017.
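A minimal sketch of “take the maximum faf_95_* at a site” for one VCF record is below. It assumes one ALT allele per record (split multiallelics), so each INFO value is a single number, and it takes the field prefix and the list of groups as parameters, because the exact INFO IDs (e.g. whether they are spelled `faf95_<group>` or `faf_95_<group>`) and which groups carry an FAF should be checked against the `##INFO` lines in your VCF header. `parse_info` and `grpmax_faf` are hypothetical helper names, not part of any gnomAD tool.

```python
from typing import Dict, Iterable, Optional, Tuple, Union

InfoValue = Union[str, bool]


def parse_info(info_field: str) -> Dict[str, InfoValue]:
    """Parse a VCF INFO column ("key=value;key=value;FLAG") into a dict.
    Flags without a value are stored as True."""
    info: Dict[str, InfoValue] = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def grpmax_faf(
    info: Dict[str, InfoValue], groups: Iterable[str], prefix: str
) -> Optional[Tuple[str, float]]:
    """Return (group, faf) for the largest per-group FAF in `info`, reading the
    field f"{prefix}{group}" for each group. Missing values, "." values and
    flags are skipped; returns None if no group has a value."""
    best: Optional[Tuple[str, float]] = None
    for group in groups:
        value = info.get(f"{prefix}{group}")
        if value is None or value == "." or isinstance(value, bool):
            continue
        faf = float(value)
        # Keep the first group seen on ties
        if best is None or faf > best[1]:
            best = (group, faf)
    return best
```

For a tab-split record `fields`, the call would look like `grpmax_faf(parse_info(fields[7]), groups=[...], prefix="faf95_")`, with the group list and prefix taken from your own VCF header.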

We do understand that this is not intuitive, and the similarity between the acronyms AF and FAF is not ideal. We are working on documentation and revisiting the site/annotation naming to address this.

Please let me know if you have any more questions!

Best,
Mike

I see: my misconception was that the FAF was calculated from the ‘popmax’. Instead, to rephrase just to make sure I understand: the Grpmax FAF is the maximum FAF (i.e. the highest lower bound of the 95% CI on the AF) across non-bottlenecked populations in a given dataset, so in some instances low allele counts can result in a different population being shown for the Grpmax FAF (i.e. Grpmax Filtering AF) than the ‘popmax’ population.
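The effect of low allele counts can be checked numerically with the genome counts quoted above. The sketch below is a standard-library re-implementation of the Poisson-based filtering AF described in Whiffin et al. 2017 (the highest AF at which observing at least AC alleles out of AN would still have probability below 5%), solved by bisection; it is not gnomAD’s production code, and it compares only the two groups from this thread without applying any group exclusions gnomAD may use, so its values should be close to, but not guaranteed identical with, the website.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed in log space to avoid underflow."""
    if lam == 0:
        return 1.0
    log_terms = [i * math.log(lam) - lam - math.lgamma(i + 1) for i in range(k + 1)]
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


def filtering_af(ac: int, an: int, ci: float = 0.95, tol: float = 1e-12) -> float:
    """Highest AF for which P(X >= ac | AF * an) stays below 1 - ci, i.e. the
    root of P(X <= ac - 1 | AF * an) = ci, found by bisection on AF in [0, 1]."""
    if ac == 0 or an == 0:
        return 0.0
    lo, hi = 0.0, 1.0  # P(X <= ac - 1) falls monotonically as AF grows
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poisson_cdf(ac - 1, mid * an) > ci:
            lo = mid  # seeing >= ac alleles is still < 5% likely: FAF is at least mid
        else:
            hi = mid
    return lo


# Genome AC/AN quoted in this thread for 7-117171029-G-A (gnomad_r2_1)
genomes = {"amr": (2, 846), "nfe": (22, 15422)}
af = {g: ac / an for g, (ac, an) in genomes.items()}
faf = {g: filtering_af(ac, an) for g, (ac, an) in genomes.items()}

popmax = max(af, key=af.get)        # group with the highest raw AF
grpmax_faf = max(faf, key=faf.get)  # group with the highest filtering AF
for g in genomes:
    print(f"{g}: AF={af[g]:.2e}  FAF95={faf[g]:.2e}")
print("popmax:", popmax, "| grpmax FAF:", grpmax_faf)
```

With these counts AMR has the higher raw AF (about 2.4e-3 versus 1.4e-3), but two alleles out of 846 give a very wide interval, so its FAF (roughly 4e-4) falls below NFE’s (roughly 1e-3), and the two maxima point at different groups.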

Thank you for clarifying!