@Christine_Preston Sorry for my late reply; I went back to read the paper and understand the code. In our combined_faf data, you will encounter 77,624 variants where the grpmax_faf genetic ancestry group is the same in exomes and genomes but different in the joint data.
I don’t have specific guidance, but this passage from the paper might explain why:
"Usually, this is from the population with the highest nominal allele frequency.
However, because the tightness of a 95% confidence interval in the Poisson distribution depends upon sample size, the stringency of the filter depends upon the allele number (AN). The stringency of the filter therefore varies appropriately according to the size of the sub-population in which the variant is observed, and sequencing coverage at that site, and af_filter is occasionally derived from a population other than the one with the highest nominal allele frequency."
For the example variant you found, the FAF calculated from the observed AC and AN in each ancestry group is as follows:
The FAF value itself is correct, but it depends on the size of the sub-population, that is, on both AC and AN. In the joint data we added relatively less AC than AN for NFE, so its FAF didn’t increase as much as it did for AMR (and the relationship is not linear). The opposite can also happen: if relatively more AC than AN is added for a group, the FAF grpmax ancestry can switch as well.
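To make the non-linearity concrete, here is a minimal sketch of a Poisson-based FAF and of how the grpmax group can differ between the separate datasets and the joint one. It is not the gnomAD implementation: the function names (`poisson_cdf`, `faf`, `grpmax`) are made up for this illustration, the FAF is approximated as the largest AF for which observing fewer than AC alleles still has 95% probability, returning 0 when AC is 0 or 1 is a simplifying assumption, and the counts for `group_a` and `group_b` are hypothetical, not the variant discussed here.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Poisson(lam), summing terms in log space."""
    if k < 0:
        return 0.0
    if lam <= 0.0:
        return 1.0
    log_lam = math.log(lam)
    terms = (math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, math.fsum(terms))


def faf(ac: int, an: int, ci: float = 0.95) -> float:
    """Approximate filtering AF: largest AF with P(X <= ac - 1 | an * AF) >= ci."""
    if ac <= 1 or an <= 0:
        return 0.0  # assumption: singletons and empty groups get FAF = 0
    # The CDF decreases as the expected count grows, so bisect on that count.
    lo, hi = 0.0, float(ac)
    for _ in range(100):
        mid = (lo + hi) / 2
        if poisson_cdf(ac - 1, mid) > ci:
            lo = mid
        else:
            hi = mid
    return lo / an


def grpmax(counts: dict) -> tuple:
    """Return (group, FAF) for the group with the highest FAF."""
    fafs = {group: faf(ac, an) for group, (ac, an) in counts.items()}
    best = max(fafs, key=fafs.get)
    return best, fafs[best]


# Hypothetical (AC, AN) pairs per ancestry group; not real gnomAD counts.
exomes = {"group_a": (30, 10_000), "group_b": (180, 100_000)}
genomes = {"group_a": (100, 100_000), "group_b": (6, 4_000)}
joint = {
    group: (exomes[group][0] + genomes[group][0], exomes[group][1] + genomes[group][1])
    for group in exomes
}

for label, counts in (("exomes", exomes), ("genomes", genomes), ("joint", joint)):
    group, value = grpmax(counts)
    print(f"{label}: grpmax={group}, FAF={value:.6f}")
```

With these made-up counts, group_a has the highest FAF in exomes and in genomes taken separately, but group_b has the highest FAF once AC and AN are pooled, because pooling changes each group's AC-to-AN balance by a different amount.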
It seems to me that, whether you use each dataset alone or the joint data, this variant can be ruled out as causing Mendelian disease for both AMR and NFE, because its AF is greater than the computed FAF.
Our team is currently short of bandwidth to dig further, but you are welcome to report more examples if this explanation doesn’t apply. If you want to look at how the FAF is calculated, the code is here.