Hi all! I am new to Hail and would like to aggregate the heterozygote counts (AC - 2*homozygote_count) across all of the exomes (including UKBB). However, I am not sure which row field to use for this. Could I get some advice, please? Thank you very much!
Hi @Inquisitive,
Thanks for your question and your patience, and welcome to Hail! You'll need to use the freq array annotation in the Hail Table for your data type (exomes or genomes), together with a group's index, to access that group's call statistics, AC and homozygote_count. gnomAD as a whole is the first entry in the array, at index 0. With logic similar to the code below, where ht is the exome release Hail Table, you should be able to calculate the number of heterozygotes.
ht = ht.annotate(nhets=ht.freq[0].AC - 2 * ht.freq[0].homozygote_count)  # freq[0] = all of gnomAD
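To see what the annotation computes per variant and how the per-variant counts roll up into one total, here is a plain-Python sketch. It is not Hail or gnomAD code. CallStats is a hypothetical stand-in for one entry of the freq array, the numbers are made up, and the formula assumes diploid genotypes (each homozygote carries two alternate alleles, each heterozygote carries one).

```python
from dataclasses import dataclass


@dataclass
class CallStats:
    """Hypothetical stand-in for one entry of the gnomAD freq array."""
    AC: int
    AN: int
    homozygote_count: int


def n_hets(stats: CallStats) -> int:
    # Assuming diploid calls: remove the two alt alleles carried by each
    # homozygote; every remaining alt allele belongs to one heterozygote.
    return stats.AC - 2 * stats.homozygote_count


# Made-up freq arrays for three variants; index 0 is the whole-gnomAD group.
variants = [
    [CallStats(AC=10, AN=1000, homozygote_count=2)],
    [CallStats(AC=5, AN=980, homozygote_count=0)],
    [CallStats(AC=7, AN=1000, homozygote_count=3)],
]

per_variant = [n_hets(freq[0]) for freq in variants]  # hets per variant
total_hets = sum(per_variant)                         # summed over variants
print(per_variant, total_hets)  # [6, 5, 1] 12
```

In Hail itself, once nhets has been annotated as in the line above, the table-wide total could be obtained with ht.aggregate(hl.agg.sum(ht.nhets)), where hl is the imported hail module.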
If you are looking for het counts across variants present in both the exomes and genomes, you can use the joint_freq annotation in the same way.
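Because only the array changes, the lookup can be written once and applied to either freq or joint_freq. The plain-Python sketch below is illustrative only: group_index is a hypothetical label-to-position mapping (only index 0 for all of gnomAD comes from this thread), the dictionaries mimic one variant's entries with made-up numbers, and diploid calls are assumed.

```python
from typing import Dict, List

# Hypothetical label -> array position; only index 0 (all of gnomAD) is from the thread.
group_index: Dict[str, int] = {"all": 0}


def hets_for_group(freq_array: List[Dict[str, int]], group: str) -> int:
    """Het count for one group of a freq-like array (diploid calls assumed)."""
    stats = freq_array[group_index[group]]
    return stats["AC"] - 2 * stats["homozygote_count"]


# Made-up call statistics for a single variant.
exome_freq = [{"AC": 12, "AN": 1400, "homozygote_count": 1}]
joint_freq = [{"AC": 20, "AN": 2200, "homozygote_count": 3}]

print(hets_for_group(exome_freq, "all"))  # 10
print(hets_for_group(joint_freq, "all"))  # 14
```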
For more information on how to access the gnomAD frequency array, please visit this help page.
Best,
Mike
Thank you for your response. I noticed that AN differs between most variants. However, according to the website there are 730,947 exome samples. Could you explain why AN is not consistently 730,947*2? I appreciate your help on this.
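For reference, the ceiling the question refers to works out to 2 × 730,947 = 1,461,894 alleles, assuming every sample is diploid at the site and has a called genotype. The sketch below only quantifies how far a given AN falls below that ceiling; it does not explain the gap. The sample count is the one quoted above, and the function names are hypothetical.

```python
N_EXOME_SAMPLES = 730_947  # exome sample count quoted in the question


def max_an(n_samples: int) -> int:
    # Upper bound on AN if every sample is diploid and has a called genotype.
    return 2 * n_samples


def called_fraction(an: int, n_samples: int = N_EXOME_SAMPLES) -> float:
    # Share of the diploid ceiling that a variant's AN actually reaches.
    return an / max_an(n_samples)


print(max_an(N_EXOME_SAMPLES))  # 1461894
```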