Thanks for continuing to maintain such an essential resource.
As part of an approach to combine in-house sequencing data with gnomAD summary data, we want to identify variant positions with significantly lower coverage in the gnomAD data. The existing v4.1 exome coverage file is useful for this; however, for some analyses we want to look only at a subset of individuals, e.g. only female individuals or only white individuals.
Does any version of the data presented in the existing v4.1 exome coverage file exist that is more stratified, e.g. into female vs. male, different ancestries, UKB vs. non-UKB etc.? It’s particularly pertinent given the likely differences in capture regions/methodology between UKB and the other gnomAD datasets.
If not, are there any plans to release one in the future?
Coverage for gnomAD v4 was calculated from sample genomic VCFs (gVCFs), which is less granular than coverage information from read data due to the reference block structure within gVCFs. As part of gnomAD v4.1, we released an all sites allele number (AN) resource as a higher resolution proxy for coverage. We do not have plans to release a stratified version of our exome coverage file; however, we have released a stratified version of our all sites AN resource, available here.
This is the schema of the exomes all sites AN Hail Table:
The global field strata_meta describes the various strata available for the all sites AN resource (e.g., {'gen_anc': 'afr', 'group': 'adj'}; the definition of adj is described in this help page). The index of a group in strata_meta corresponds to the position of that group’s AN values in the AN array.
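To make the index lookup concrete, here is a minimal sketch in plain Python. It uses toy stand-ins for strata_meta and for one row's AN array; the helper name stratum_index, the extra strata other than the 'afr'/'adj' example, and all of the numbers are made up for illustration and are not taken from the real resource. With the actual Hail Table you would first pull the global strata_meta into Python and then apply the same idea.

```python
from typing import Dict, List, Sequence


def stratum_index(strata_meta: Sequence[Dict[str, str]], query: Dict[str, str]) -> int:
    """Return the position of the stratum whose metadata dict equals `query`.

    Hypothetical helper, not part of gnomAD or Hail.
    """
    for i, meta in enumerate(strata_meta):
        # Dict equality ignores key order, so {'group': 'adj', 'gen_anc': 'afr'}
        # matches {'gen_anc': 'afr', 'group': 'adj'}.
        if meta == query:
            return i
    raise KeyError(f"stratum not found: {query}")


# Toy stand-in for the global `strata_meta` field. Only the 'afr'/'adj' entry
# mirrors the example given above; the other entries are assumptions.
toy_strata_meta: List[Dict[str, str]] = [
    {"group": "adj"},
    {"group": "raw"},
    {"gen_anc": "afr", "group": "adj"},
    {"gen_anc": "nfe", "group": "adj"},
]

# Toy stand-in for the `AN` array of a single site (invented values).
toy_an: List[int] = [1000, 1200, 150, 520]

# The index of the stratum in strata_meta is the index of its AN value.
idx = stratum_index(toy_strata_meta, {"gen_anc": "afr", "group": "adj"})
afr_adj_an = toy_an[idx]
print(idx, afr_adj_an)
```

Looking the index up by its metadata, rather than hard-coding a position, keeps the code correct even if the order of the strata differs between releases.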