Difference between popmax in the VCF and popmax/GrpmaxFAF in the website

dhtran · November 15, 2023, 12:49am

Hi, I’m trying to figure out why the popmax population may differ between the VCF (v2.1.1) and the website for a few variants (e.g. gnomAD). More specifically, I’d like to know whether I can replicate the filtering steps to mirror the output on the website.

For the 7-117171029-G-A?dataset=gnomad_r2_1 variant, the VCF.INFO ‘popmax’ field = ‘amr’ (and on the website this is reflected in the ordering of the frequency table). However the Grpmax FAF is using the ‘nfe’ population for genomes. This is presumably due to the low allele count/total for AMR (2/846 Genomes) versus NFE (22/15422 Genomes).

Looking through the code base, I can’t find any explicit filtering for popmax beyond max frequency (https://github.com/broadinstitute/gnomad-browser/blob/0eb6920726c61756b7e934c3110f382683d91f87/data-pipeline/src/data_pipeline/datasets/gnomad_v2/gnomad_v2_variants.py#L181). Are there adjustments for low allele counts, or quality filters, that impact the popmax calculation?

mike · November 20, 2023, 4:30pm

Hi @dhtran,

Thanks for writing in! Hopefully I can clarify the differences between the site’s “Grpmax Filtering AF” and the VCF’s popmax AF.

The population reported by popmax in the v2 VCF, and the gen_anc reported by grpmax in the v4 VCF, is, as you pointed out, the genetic ancestry group with the highest allele frequency at a site. The website displays a “Grpmax Filtering AF”, which corresponds to the maximum “filtering allele frequency” at a site and its associated genetic ancestry group. This value can differ from the popmax/grpmax and can be found by taking the maximum faf_95_* at a site in the VCF, where * represents all genetic ancestry groups. The Filtering AF can be used to filter variants by allele frequency against a threshold set for each disease (e.g. BA1 in the 2015 ACMG/AMP guidelines). For more information on Filtering AF, please see Frequency Filter and Whiffin et al. 2017.
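As a rough illustration of the distinction described above, here is a minimal Python sketch that picks the two groups separately: one by highest allele frequency (popmax/grpmax) and one by highest filtering allele frequency (Grpmax Filtering AF). The function name, the dictionary layout, and the numbers are assumptions made up for this example; they are not values read from gnomAD, and the exact INFO field names should be checked against the header of the VCF in use.

```python
def popmax_and_grpmax_faf(af_by_group, faf95_by_group):
    """Return (popmax group, its AF, grpmax-FAF group, its FAF).

    Both arguments are plain dicts mapping a genetic ancestry group label
    to a float. This is a hypothetical helper, not part of any gnomAD tool.
    """
    # popmax / grpmax: the group with the highest raw allele frequency.
    popmax_group = max(af_by_group, key=af_by_group.get)
    # Grpmax Filtering AF: the group with the highest filtering AF (95% CI).
    faf_group = max(faf95_by_group, key=faf95_by_group.get)
    return (
        popmax_group,
        af_by_group[popmax_group],
        faf_group,
        faf95_by_group[faf_group],
    )


# Illustrative numbers only (assumed, not taken from a gnomAD release).
example_af = {"amr": 0.0024, "nfe": 0.0014}
example_faf95 = {"amr": 0.0004, "nfe": 0.0010}

result = popmax_and_grpmax_faf(example_af, example_faf95)
print(result)
```

With these made-up inputs the two selections disagree: the highest raw AF belongs to one group while the highest filtering AF belongs to another, which is the situation described in this thread.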

We do understand that this is not intuitive and the acronym proximity, AF vs FAF, is not ideal. We are working on documentation and revisiting the site/annotation naming to address this.

Please let me know if you have any more questions!

Best,
Mike

dhtran · November 20, 2023, 9:45pm

I see, my misconception was that the FAF was calculated from the ‘popmax’ population. Instead (to rephrase, just to make sure I understand), the Grpmax FAF is chosen by the maximum FAF (i.e. the lower 95% CI threshold) for a given dataset and among non-bottlenecked populations. In some instances with low allele counts, this may result in a different population being presented as the Grpmax FAF (i.e. Grpmax Filtering AF) than the ‘popmax’ population.

Thank you for clarifying!
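To make the effect of low allele counts concrete, here is a minimal sketch of a Poisson-based filtering allele frequency, applied to the genome counts quoted earlier in the thread (AMR 2/846, NFE 22/15422). It assumes the definition in the spirit of Whiffin et al. 2017: the highest true allele frequency for which the observed allele count still lies above the 95% upper bound of a Poisson distribution. The function names are hypothetical, and gnomAD's own pipeline may differ in details (for example in how very low counts or particular groups are handled), so the numbers need not match the website exactly.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term.

    Fine for the small counts used here; very large lam would underflow.
    """
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def filtering_af(ac: int, an: int, ci: float = 0.95) -> float:
    """Sketch of a filtering AF: largest AF with P(X <= ac - 1) >= ci."""
    if ac <= 0 or an <= 0:
        return 0.0
    # Bisection on the expected count lam = an * af; the CDF falls as lam grows.
    lo, hi = 0.0, float(ac)
    for _ in range(100):
        mid = (lo + hi) / 2
        if poisson_cdf(ac - 1, mid) >= ci:
            lo = mid
        else:
            hi = mid
    return lo / an


# Genome counts quoted in the thread for 7-117171029-G-A (gnomAD v2.1.1).
counts = {"amr": (2, 846), "nfe": (22, 15422)}
af = {grp: ac / an for grp, (ac, an) in counts.items()}
faf = {grp: filtering_af(ac, an) for grp, (ac, an) in counts.items()}

print("highest AF: ", max(af, key=af.get))
print("highest FAF:", max(faf, key=faf.get))
```

Under these assumptions AMR has the higher raw AF, but its small sample size gives a wide interval and therefore a lower filtering AF than NFE, which is consistent with the website reporting NFE for the Grpmax Filtering AF.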
