The blog post "gnomAD v4.0 | gnomAD browser" states that “62,901,592 SNVs and 6,189,261 indels pass all filters in the v4 release.” When trying to reproduce this number, we arrive at 600,294,364 SNVs that pass all filters (out of a total of 786,500,648 SNVs listed in https://gnomad.broadinstitute.org/stats). Is there an error in my reasoning?
Hello, my name is Daniel Marten and I’m a member of the gnomAD Production Team:
1 - One thing to note is that parts of the blog post you linked, including the Variant QC number you’re citing, cover only the gnomAD v4 Exomes Release. This is discussed under the ‘Creating gnomAD v4’ section, linked here. The exome release contains only 167,897,387 SNVs in total (counting both those that pass all filters and those flagged by at least one filter), so I’m going to assume that your number - 600,294,364 SNVs - is from genomes.
2 - As for the number you did arrive at, what methods did you use to get it?
Ah hello - sorry for the delay, and thank you for sharing your methods. It’s helpful to know how you got this number and how people are using our resource.
For SNVs that pass all filters, we have 565,523,876 in genomes and 62,901,592 in exomes - of which 537,949,472 are genome-exclusive and 44,108,890 are exome-exclusive.
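As a quick sanity check on these figures, the arithmetic below subtracts each dataset's exclusive count from its passing total. It uses only the four numbers quoted above; the variable names are illustrative. Note that the two differences are not equal, so a single "shared" count cannot be read off directly. One possible reading (an assumption, not something confirmed in this thread) is that "exclusive" means the site is absent from the other dataset regardless of its filter status there.

```python
# Passing-SNV counts quoted in this thread (gnomAD v4).
genome_pass = 565_523_876   # SNVs passing all filters in genomes
exome_pass = 62_901_592     # SNVs passing all filters in exomes
genome_only = 537_949_472   # of genome_pass, genome-exclusive
exome_only = 44_108_890     # of exome_pass, exome-exclusive

# Passing SNVs in each dataset that are NOT exclusive to it.
genome_not_exclusive = genome_pass - genome_only
exome_not_exclusive = exome_pass - exome_only

print(f"genomes, not exclusive: {genome_not_exclusive:,}")  # 27,574,404
print(f"exomes, not exclusive:  {exome_not_exclusive:,}")   # 18,792,702
```

Because the two values differ, summing the genome and exome passing counts and subtracting one "overlap" will not give a well-defined union without knowing how the overlap was defined.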
Any further or more detailed bioinformatics support for outside scripts and their results is a bit beyond the scope of the production team in this forum, but it’s nice to see how you arrived at those numbers and we’re happy to clear up that high-level confusion from the start.
With your first question answered, I’m going to mark this ticket as resolved, but comment on this thread or DM me if you have any further concerns!
@dennishendriksen A joint (exomes + genomes) release is available as of gnomAD v4.1, so you may not need to merge them on your own. The VCF FILTER field is also combined, as explained here.
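For anyone reproducing counts like these from a VCF, here is a minimal sketch of counting SNVs that pass all filters. It is not the gnomAD team's method. It assumes plain-text, tab-separated VCF lines in the standard column order, treats a record as passing only when the FILTER column is exactly `PASS`, and counts an allele as an SNV when both REF and ALT are a single base. The function name `count_pass_snvs` and the example records are hypothetical.

```python
from typing import Iterable

BASES = {"A", "C", "G", "T"}


def count_pass_snvs(lines: Iterable[str]) -> int:
    """Count single-base ALT alleles on records whose FILTER is exactly PASS."""
    count = 0
    for line in lines:
        if line.startswith("#"):          # skip meta and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:               # not a complete VCF record
            continue
        ref, alts, filt = fields[3], fields[4], fields[6]
        if filt != "PASS" or ref.upper() not in BASES:
            continue
        # Count each single-base ALT separately in case a line is multi-allelic.
        count += sum(1 for alt in alts.split(",") if alt.upper() in BASES)
    return count


# Hypothetical records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",    # passing SNV
    "chr1\t200\t.\tA\tAT\t.\tPASS\t.",   # passing insertion, not an SNV
    "chr1\t300\t.\tC\tT\t.\tAC0\t.",     # SNV that fails a filter
    "chr1\t400\t.\tG\tA,C\t.\tPASS\t.",  # two passing SNV alleles
]
print(count_pass_snvs(example))  # 3
```

To run it on a compressed file, the same function can be fed lines from `gzip.open(path, "rt")`. Running it separately on exomes and genomes and adding the results would double-count sites present in both.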