Hello!
I’m using Hail to build a pipeline that calculates allele frequency statistics by sex and ancestry. I’m aiming to replicate the gnomAD allele frequency table, which includes:
- Allele Count (AC)
- Allele Number (AN)
- Number of Homozygotes (homozygote_count)
- Number of Hemizygotes (n_hemizygous, when applicable)
- Allele Frequency (AF)
Most of these statistics (AC, AN, AF and homozygote_count) are output directly by Hail’s hl.agg.call_stats function, but I want to make sure I calculate the hemizygous count correctly, as I don’t see any direct way to get it.
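To make the per-group counts concrete, here is a plain-Python model of what the statistics above mean at a single biallelic site, stratified by sex and ancestry. It is not Hail code: `model_call_stats` and `stats_by_group` are hypothetical helpers, genotypes are assumed to be tuples of allele indices, and the "nfe"/"afr" labels are only illustrative. Hail’s hl.agg.call_stats returns AC, AF and homozygote_count as arrays with one element per allele (reference included) and AN as a single number; this model keeps only the alternate-allele entry. In Hail, the stratification step would presumably be done by wrapping call_stats in hl.agg.group_by.

```python
from collections import defaultdict


def model_call_stats(genotypes):
    """Hypothetical plain-Python model of per-site counts for one biallelic variant.

    Each genotype is a tuple of allele indices (0 = ref, 1 = alt);
    None stands for a missing call.
    """
    ac = an = hom_alt = 0
    for gt in genotypes:
        if gt is None:
            continue  # missing calls add nothing to AC or AN
        an += len(gt)  # one allele per called chromosome copy
        ac += sum(1 for allele in gt if allele == 1)  # alt alleles carried
        if gt == (1, 1):
            hom_alt += 1  # diploid homozygous-alternate call
    af = ac / an if an else None
    return {"AC": ac, "AN": an, "AF": af, "homozygote_count": hom_alt}


def stats_by_group(samples):
    """Stratify by (sex, ancestry); samples is a list of (sex, ancestry, genotype)."""
    groups = defaultdict(list)
    for sex, ancestry, gt in samples:
        groups[(sex, ancestry)].append(gt)
    return {key: model_call_stats(gts) for key, gts in groups.items()}


# Illustrative toy data, not real samples
samples = [
    ("XX", "nfe", (0, 1)),
    ("XX", "nfe", (1, 1)),
    ("XY", "nfe", (0, 0)),
    ("XX", "afr", (0, 0)),
    ("XX", "afr", None),
]
by_group = stats_by_group(samples)
print(by_group[("XX", "nfe")])
# {'AC': 3, 'AN': 4, 'AF': 0.75, 'homozygote_count': 1}
```

Note that the missing call in the "afr" group lowers AN rather than counting as reference, which is why AF is computed from called alleles only.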
My logic would be:
- PAR regions: diploid → n_hom_var works as usual.
- Non-PAR regions: XY samples are haploid → we need to count any alternate allele in these samples as “hemizygous alternate.”
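The PAR/non-PAR logic above can be sketched for one biallelic chrX site as follows. This is a plain-Python model of the reasoning in the question, not Hail code and not a confirmed description of how gnomAD computes its numbers. It assumes XY calls outside the PAR are already haploid, e.g. (1,); if a callset stores them as diploid (e.g. 1/1), they would need converting first (in Hail, hl.call can build a haploid call). The `in_par` flag is a hypothetical input; in Hail it could presumably come from LocusExpression methods such as in_x_par() / in_x_nonpar().

```python
def x_site_stats(samples, in_par):
    """Hypothetical plain-Python model of counts at one biallelic chrX site.

    samples: list of (sex, genotype), where genotype is a tuple of allele
    indices (0 = ref, 1 = alt) or None for a missing call.
    Assumes XY calls outside the PAR are already haploid, e.g. (1,).
    """
    ac = an = hom = hemi = 0
    for sex, gt in samples:
        if gt is None:
            continue  # missing calls add nothing to AC or AN
        if not in_par and sex == "XY" and len(gt) != 1:
            raise ValueError("expected a haploid call for an XY sample outside the PAR")
        an += len(gt)  # 2 for diploid calls, 1 for haploid calls
        alt = sum(1 for allele in gt if allele == 1)
        ac += alt
        if gt == (1, 1):
            hom += 1  # diploid hom-alt: XX anywhere, XY only inside the PAR
        elif len(gt) == 1 and alt == 1:
            hemi += 1  # haploid alt call: one hemizygous-alternate sample
    return {
        "AC": ac,
        "AN": an,
        "AF": ac / an if an else None,
        "homozygote_count": hom,
        # hemizygosity only applies outside the PAR
        "hemizygote_count": None if in_par else hemi,
    }


# Illustrative toy data, not real samples
non_par_samples = [("XX", (0, 1)), ("XX", (1, 1)), ("XY", (1,)), ("XY", (0,)), ("XY", None)]
par_samples = [("XX", (0, 1)), ("XY", (1, 1))]

non_par = x_site_stats(non_par_samples, in_par=False)
par = x_site_stats(par_samples, in_par=True)
print(non_par["hemizygote_count"], par["hemizygote_count"])  # 1 None
```

With biallelic haploid calls, the hemizygote count in this model equals the AC restricted to XY samples outside the PAR, which may be a convenient cross-check against a sex-stratified call_stats result.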
Is this how the hemizygous count shown in the gnomAD browser is computed?
Thanks in advance!
Mireia