How to calculate hemizygous counts

Hello!

I’m using Hail to build a pipeline that calculates allele frequency statistics by sex and ancestry. I’m aiming to replicate the gnomAD allele frequency table, which includes:

  • Allele Count (AC)

  • Allele Number (AN)

  • Number of Homozygotes (homozygote_count)

  • Number of Hemizygotes (n_hemizygous, when applicable)

  • Allele Frequency (AF)

All the other statistics are output directly by Hail’s hl.agg.call_stats function, but I want to make sure I calculate the hemizygous count correctly, as I don’t see a direct way to get it.

My logic would be:

  • PAR regions: diploid → n_hom_var works as usual.

  • Non-PAR regions: haploid → we need to count any alternate allele in these samples as “hemizygous alternate.”

Is this how you get the hemizygous count shown in the gnomAD browser?

Thanks in advance!

Mireia

Hi Mireia,

That’s correct. In the browser pipeline, if a sample’s sex is XY and the variant is in a non-pseudoautosomal region of chromosome X or Y, then any alternate allele in that sample is counted as hemizygous alternate.
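
To make the rule concrete, here is a minimal plain-Python sketch of it, not the gnomAD or Hail implementation. The function names, the genotype encoding (a tuple of allele indices, or None for a missing call), the "XX"/"XY" sex labels, and the PAR intervals are all assumptions made for illustration. The intervals in the usage example are toy values, not real coordinates, so substitute the PAR boundaries for your reference build.

```python
from typing import List, Optional, Sequence, Tuple

# (contig, start, end), 1-based inclusive. Hypothetical representation.
Interval = Tuple[str, int, int]
# Allele indices for one sample, e.g. (0, 1) or (1,); None means missing.
Genotype = Optional[Tuple[int, ...]]


def in_nonpar(contig: str, pos: int, par_intervals: Sequence[Interval]) -> bool:
    """True if the locus is on chrX/chrY and outside every PAR interval."""
    if contig not in ("chrX", "chrY"):
        return False
    return not any(c == contig and s <= pos <= e for c, s, e in par_intervals)


def count_hemizygous(
    contig: str,
    pos: int,
    genotypes: Sequence[Genotype],
    sexes: Sequence[str],
    par_intervals: Sequence[Interval],
) -> Optional[int]:
    """Count XY samples carrying any alternate allele at a non-PAR locus.

    Returns None where a hemizygous count does not apply
    (autosomes and PAR regions).
    """
    if not in_nonpar(contig, pos, par_intervals):
        return None
    return sum(
        1
        for gt, sex in zip(genotypes, sexes)
        if sex == "XY" and gt is not None and any(a > 0 for a in gt)
    )


# Toy example: one made-up PAR interval and five samples.
toy_par: List[Interval] = [("chrX", 1, 100)]
toy_gts: List[Genotype] = [(0, 1), (1, 1), (0, 0), None, (1,)]
toy_sexes = ["XY", "XX", "XY", "XY", "XY"]
n_hemi = count_hemizygous("chrX", 500, toy_gts, toy_sexes, toy_par)
```

In the toy example, only the first and last samples are XY with an alternate allele, so they are the only ones counted. In Hail, the same condition would normally be written as an aggregation over entries rather than a Python loop.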

Also, note that in the data production pipeline upstream, we adjust each sample’s ploidy in non-pseudoautosomal regions using this function.
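
The linked function is not reproduced in this thread, so the following is only a sketch of what a ploidy adjustment of this kind might look like. It assumes one common convention: in non-PAR regions, homozygous diploid calls in XY samples become haploid calls, heterozygous diploid calls in XY samples are set to missing, and any call on chrY in XX samples is set to missing. The function name, arguments, and genotype encoding are hypothetical, so check the actual pipeline function before relying on these rules.

```python
from typing import Optional, Tuple

# Allele indices for one sample, e.g. (0, 1) or (1,); None means missing.
Genotype = Optional[Tuple[int, ...]]


def adjust_ploidy_sketch(
    gt: Genotype, sex: str, contig: str, is_nonpar: bool
) -> Genotype:
    """Adjust one call for sex-chromosome ploidy (assumed convention)."""
    if gt is None or not is_nonpar:
        # Missing calls and PAR/autosomal calls are left unchanged.
        return gt
    if sex == "XX":
        # XX samples are not expected to carry chrY: drop the call.
        return None if contig == "chrY" else gt
    if sex == "XY":
        if len(gt) == 1:
            return gt  # already haploid
        a, b = gt
        # Homozygous diploid -> haploid; heterozygous -> missing.
        return (a,) if a == b else None
    # Unknown or other sex labels: leave the call unchanged.
    return gt


adjusted = [
    adjust_ploidy_sketch((1, 1), "XY", "chrX", True),
    adjust_ploidy_sketch((0, 1), "XY", "chrX", True),
    adjust_ploidy_sketch((0, 1), "XX", "chrX", True),
]
```

After an adjustment like this, XY samples contribute one allele rather than two to AC and AN in non-PAR regions, which is why it is applied before the frequency statistics are computed.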

Hi Katherine,

Thank you very much for your reply! I’ll check the ploidy adjustment function and apply it to my code.