Discrepancies in AN values for multiallelic sites

olga_v · February 6, 2026, 6:28pm

Dear gnomAD team,

I have a question about how the total allele number (AN) is computed for multiallelic sites in the gnomAD v4.1 dataset.

I have come across multiple instances where the genome AN values differ between alleles at the same site. For example, at position chr1:1228635 there are two alternative alleles, T and A, both present in the gnomAD v4.1 genome dataset (https://gnomad.broadinstitute.org/variant/rs61743559?dataset=gnomad_r4). However, the AN values reported for these two alleles are slightly different (152256 and 152374, respectively). What is the reason for this difference? Both SNPs are called from the same data, so I would expect the sample size to be the same for all alleles at a given locus. Could you please clarify this for me?

Kind regards,

Olga

kchao · February 20, 2026, 9:25pm

This is due to the downcoding of genotypes when splitting multiallelic variants. We use Hail to split multiallelic variants before calculating aggregate variant frequency statistics, and the downcoding process is nicely explained in Hail's documentation.
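A minimal Python sketch of one way downcoding can change which genotypes count toward each allele's AN. It downcodes a hypothetical triallelic PL vector to each biallelic split using the minimum-PL rule described in Hail's `split_multi_hts` documentation, then recomputes GQ as the gap between the two smallest PL values. The PL values, the helper names (`gt_index`, `downcode_pl`, `gq_from_split_pl`) and the GQ >= 20 cutoff are assumptions for illustration only; gnomAD's actual genotype QC involves more than a single GQ threshold and is not reproduced here.

```python
from __future__ import annotations

import math
from itertools import combinations_with_replacement


def gt_index(j: int, k: int) -> int:
    """VCF index of the unphased diploid genotype j/k (requires j <= k)."""
    return k * (k + 1) // 2 + j


def downcode_pl(pl: list[int], n_alleles: int, a_index: int) -> list[int]:
    """Downcode a multiallelic PL vector to the biallelic split for allele a_index.

    Every allele other than a_index is treated as reference, and each new
    genotype (0/0, 0/1, 1/1) keeps the minimum PL over the original genotypes
    that map onto it.
    """
    new_pl = [math.inf, math.inf, math.inf]
    for j, k in combinations_with_replacement(range(n_alleles), 2):
        # Copies of the target allele: 0 -> 0/0, 1 -> 0/1, 2 -> 1/1
        dosage = int(j == a_index) + int(k == a_index)
        new_pl[dosage] = min(new_pl[dosage], pl[gt_index(j, k)])
    return [int(x) for x in new_pl]


def gq_from_split_pl(pl: list[int]) -> int:
    """Simplified GQ (assumption): gap between the two smallest PL values."""
    lowest, second = sorted(pl)[:2]
    return second - lowest


if __name__ == "__main__":
    # Hypothetical PLs for one sample at a triallelic site,
    # in VCF order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2
    pl = [30, 0, 300, 50, 12, 400]
    min_gq = 20  # hypothetical genotype-QC cutoff
    print("unsplit GQ:", gq_from_split_pl(pl))
    for a_index in (1, 2):
        split_pl = downcode_pl(pl, n_alleles=3, a_index=a_index)
        gq = gq_from_split_pl(split_pl)
        print(f"allele {a_index}: PL={split_pl} GQ={gq} passes={gq >= min_gq}")
```

With these inputs, the same sample's genotype gets PL [30, 0, 300] and GQ 30 in the allele-1 split but PL [0, 12, 400] and GQ 12 in the allele-2 split, so under the assumed cutoff it would contribute to AN for one alternative allele and not the other. Hail also downcodes AD (reference depth becomes the sum over all non-target alleles), so allele-balance checks can shift between splits in a similar way.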

Note that we also released allele number results calculated across all possible sites as part of gnomAD v4.1. These data, which were calculated before splitting multiallelics, are available for download here. In the all-sites AN results, the total genome AN at chr1:1228635 is 152374.

olga_v · February 26, 2026, 5:58pm

Many thanks for your response and clarification!

I have two small follow-up questions:

1. Would it therefore be reasonable to use the allele numbers computed before splitting multiallelic variants and ignore the AN values provided for individual alleles?
2. I would like to estimate the reference allele number/frequency at each multiallelic site. Because the alternative allele counts at such sites are sometimes computed against different AN values, subtracting the AC values from a single AN value might give a slightly incorrect estimate of the reference allele number. Alternatively, I could estimate the reference allele frequency by simply subtracting the frequencies of all alternative alleles from 1. Would that be a reasonable approach?
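The two reference-frequency estimates described in the second follow-up question can be compared with a short Python sketch. The AN values are the ones quoted in this thread for chr1:1228635; the alternative allele counts are hypothetical placeholders, not gnomAD data, and the function names are illustrative.

```python
from __future__ import annotations


def ref_af_from_site_an(site_an: int, alt_acs: list[int]) -> float:
    """Reference AF using a single site-level AN for all alleles."""
    return (site_an - sum(alt_acs)) / site_an


def ref_af_from_alt_afs(alt_acs: list[int], alt_ans: list[int]) -> float:
    """Reference AF as 1 minus the sum of per-allele AFs (each AC over its own AN)."""
    if len(alt_acs) != len(alt_ans):
        raise ValueError("need one AN per alternative allele")
    return 1 - sum(ac / an for ac, an in zip(alt_acs, alt_ans))


if __name__ == "__main__":
    site_an = 152374            # all-sites (pre-split) genome AN quoted in the thread
    alt_ans = [152256, 152374]  # per-allele genome ANs for T and A quoted in the thread
    alt_acs = [1000, 50]        # HYPOTHETICAL alt allele counts, for illustration only

    est_site = ref_af_from_site_an(site_an, alt_acs)
    est_afs = ref_af_from_alt_afs(alt_acs, alt_ans)
    print(f"site-AN estimate: {est_site:.6f}")
    print(f"1 - sum(AF):      {est_afs:.6f}")
    print(f"difference:       {est_site - est_afs:.2e}")
```

When the per-allele ANs differ, the two estimates differ slightly; in this sketch the whole gap comes from the allele whose AN (152256) is smaller than the all-sites AN, since the other allele's AN matches it.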

I would greatly appreciate your comments and suggestions.

Thank you very much for your help,

Kind regards,

Olga
