Hail info fields for exomes

Inquisitive · December 15, 2023, 12:49am

Hi all! I am new to Hail, and I am interested in aggregating the heterozygote counts (AC - 2*homozygote_count) across all of the exomes (including UKBB). However, I am not sure which row field to use for this purpose. Could I get some advice, please? Thank you very much!

mike · January 9, 2024, 6:44pm

Hi @Inquisitive,

Thanks for your question and for your patience, and welcome to Hail! You'll need the freq array annotation in the Hail Table for your data type (exomes or genomes), together with a group's index, to access that group's call statistics, including AC and homozygote_count. The group for gnomAD as a whole is the first entry in the array, at index 0. With logic similar to the code below, where ht is the exome release Hail Table, you should be able to calculate the number of heterozygotes for each variant.

# ht: the gnomAD exome release Hail Table; freq[0] holds the call stats for all of gnomAD
ht = ht.annotate(nhets=ht.freq[0].AC - 2 * ht.freq[0].homozygote_count)
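
The per-variant arithmetic, and the sum across variants that the original question asks about, can be illustrated without Hail. The sketch below is a hypothetical pure-Python stand-in: each variant is a dict whose freq list mimics the Hail freq array, with index 0 playing the role of the whole-gnomAD group. The toy numbers are invented for illustration. In Hail itself, the total would typically be computed with ht.aggregate(hl.agg.sum(ht.nhets)) after the annotation above.

```python
from typing import Dict, List

# Hypothetical toy records mimicking rows of a gnomAD release table.
# Each "freq" list stands in for the Hail freq array; index 0 is the
# whole-gnomAD group. The numbers are invented for illustration only.
VARIANTS: List[Dict] = [
    {"id": "var1", "freq": [{"AC": 10, "AN": 200, "homozygote_count": 2}]},
    {"id": "var2", "freq": [{"AC": 5, "AN": 180, "homozygote_count": 0}]},
    {"id": "var3", "freq": [{"AC": 8, "AN": 190, "homozygote_count": 4}]},
]


def n_hets(call_stats: Dict) -> int:
    """Heterozygote count for one group, assuming diploid genotypes:
    each homozygote carries 2 alt alleles, each heterozygote carries 1."""
    return call_stats["AC"] - 2 * call_stats["homozygote_count"]


def total_hets(variants: List[Dict], group_index: int = 0, field: str = "freq") -> int:
    """Sum the per-variant heterozygote counts of one group across all variants."""
    return sum(n_hets(v[field][group_index]) for v in variants)


if __name__ == "__main__":
    for v in VARIANTS:
        print(v["id"], n_hets(v["freq"][0]))  # var1 6, var2 5, var3 0
    print("total", total_hets(VARIANTS))      # total 11
```

Note that this total counts heterozygous genotype calls summed over variants, not distinct heterozygous individuals. The field parameter lets the same function read a differently named array (for example a joint_freq-style array) with identical logic.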

If you are looking for het counts across variants present in both exomes and genomes, you can apply the same calculation to the joint_freq annotation instead.

For more information on how to access the gnomAD frequency array, please visit this help page.
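
Because the group index is what selects a subset, it can help to look the index up by name rather than hard-coding it. gnomAD release tables typically carry a metadata list parallel to freq (a freq_meta global), but the exact keys and group labels vary by release, so the labels below are illustrative assumptions rather than the real schema. This pure-Python sketch only shows the lookup pattern on a toy metadata list.

```python
from typing import Dict, List

# Hypothetical metadata list, parallel to the freq array: entry i describes freq[i].
# Labels are illustrative only; check the real table's globals for the actual keys.
FREQ_META: List[Dict[str, str]] = [
    {"group": "adj"},
    {"group": "raw"},
    {"group": "adj", "sex": "XX"},
]


def group_index(freq_meta: List[Dict[str, str]], wanted: Dict[str, str]) -> int:
    """Return the index of the freq entry whose metadata exactly equals `wanted`."""
    for i, meta in enumerate(freq_meta):
        if meta == wanted:  # dict equality ignores key order
            return i
    raise KeyError(f"no freq entry with metadata {wanted}")


if __name__ == "__main__":
    print(group_index(FREQ_META, {"group": "adj"}))              # 0
    print(group_index(FREQ_META, {"sex": "XX", "group": "adj"}))  # 2
```

Matching on the full metadata dict, rather than a single label, avoids accidentally picking a subgroup entry that shares the same group value.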

Best,
Mike

Inquisitive · January 21, 2024, 4:02am

Thank you for your response. I noticed that AN differs across most variants. However, according to the website there are 730,947 exome samples. Could you explain why AN is not consistently 2 × 730,947 = 1,461,894? I appreciate your help on this.