Variant QC SNV pass count v4

dennishendriksen · January 8, 2024, 10:32am

Dear Katherine Chao / gnomAD Production Team,

In gnomAD v4.0 | gnomAD browser it is stated that “62,901,592 SNVs and 6,189,261 indels pass all filters in the v4 release.” On trying to reproduce this number, we arrive at 600,294,364 SNVs that pass all filters (out of a total of 786,500,648 SNVs listed in https://gnomad.broadinstitute.org/stats). Is there an error in my reasoning?

Best regards,
Dennis

Daniel_Marten · January 8, 2024, 2:54pm

Hello, my name is Daniel Marten and I’m a member of the gnomAD Production Team:

1 - One thing to note is that parts of the blog post you linked, including the Variant QC number you’re citing, cover only the gnomAD v4 Exomes Release. This is discussed under the ‘Creating gnomAD v4’ section, linked here. The exome release contains only 167,897,387 SNVs in total (either high quality or flagged with reasons to be filtered out), so I’m going to assume that your number - 600,294,364 SNVs - is from genomes.

2 - For the number you did arrive at though, what methods did you use to arrive at it?

dennishendriksen · January 8, 2024, 3:11pm

Hello Daniel Marten,

Thank you for your quick reply and thank you for clarifying that the numbers are based on gnomAD v4 exomes only!

600,294,364 is based on the combined exomes/genomes data and includes SNVs for which either the exomes/genomes QC is ‘no variant’ or ‘pass’.

Daniel_Marten · January 8, 2024, 3:44pm

I understand that, but could you be a bit more specific so I can try and replicate this?

Exactly which tables were you using? And, if applicable, what code did you use to get the numbers?

dennishendriksen · January 8, 2024, 4:03pm

we’ve created a combined resource using vip/utils/create_gnomad.sh at v7.2.1 · molgenis/vip · GitHub resulting in https://downloads.molgeniscloud.org/downloads/vip/resources/GRCh38/gnomad.total.v4.0.sites.stripped.tsv.gz

The SNV count results from running:
zcat gnomad.total.v4.0.sites.stripped.tsv.gz | awk 'BEGIN { FS=OFS="\t" } NR>1 { if(length($3)==1 && length($4)==1 && ($17 == "NO_VAR" || $17 == "PASS") && ($18 == "NO_VAR" || $18 == "PASS")) print }' | wc -l
600294364
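
For readers who would rather not use awk, the same filter can be sketched in Python. This is only an illustration of the logic in the command above, not code from the vip repository. The function names are hypothetical. The column positions (3, 4, 17, 18) come from the awk command, but reading them as REF, ALT and the two per-dataset filter columns is an assumption.

```python
import gzip
from typing import Iterable

# 0-based positions of awk's $3, $4, $17 and $18.
# ASSUMPTION: these hold REF, ALT and the two per-dataset QC filter values.
REF_COL, ALT_COL, FILTER_1_COL, FILTER_2_COL = 2, 3, 16, 17
ACCEPTED = {"NO_VAR", "PASS"}


def count_pass_snvs(lines: Iterable[str]) -> int:
    """Count single-base substitutions whose two filter columns are NO_VAR or PASS."""
    count = 0
    for line_number, line in enumerate(lines, start=1):
        if line_number == 1:
            continue  # skip the header row, like NR>1 in awk
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= FILTER_2_COL:
            continue  # too few columns: the awk condition would fail as well
        is_snv = len(fields[REF_COL]) == 1 and len(fields[ALT_COL]) == 1
        if (is_snv
                and fields[FILTER_1_COL] in ACCEPTED
                and fields[FILTER_2_COL] in ACCEPTED):
            count += 1
    return count


def count_pass_snvs_in_file(path: str) -> int:
    """Stream a gzip-compressed TSV from disk and apply the same filter."""
    with gzip.open(path, "rt") as handle:
        return count_pass_snvs(handle)
```

Calling count_pass_snvs_in_file("gnomad.total.v4.0.sites.stripped.tsv.gz") should apply the same criteria as the awk command, although it will likely be slower on a file of this size.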

Daniel_Marten · February 5, 2024, 3:32pm

Ah hello - sorry for the delay, and thank you for sharing your methods. It’s helpful to know how you got this number and how people are using our resource.

For SNVs that pass all filters, we have 565,523,876 in genomes and 62,901,592 in exomes - of which 537,949,472 are genome-exclusive and 44,108,890 are exome-exclusive.
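
The figures in this thread can be cross-checked with a little arithmetic. The sketch below is one possible reconciliation, not something confirmed by the gnomAD team. It assumes that “exclusive” means the SNV passes in one dataset and is absent (NO_VAR) from the other, and that the 600,294,364 combined count uses the same definition of an SNV. The variable names are illustrative.

```python
# Figures quoted in this thread.
combined_no_var_or_pass = 600_294_364  # awk count: NO_VAR or PASS in both columns
genomes_pass = 565_523_876
exomes_pass = 62_901_592
genome_exclusive_pass = 537_949_472
exome_exclusive_pass = 44_108_890

# ASSUMPTION: "exclusive" = PASS in one dataset and NO_VAR in the other.
# The combined count then splits into genome-exclusive, exome-exclusive
# and SNVs that pass in both datasets.
pass_in_both = (combined_no_var_or_pass
                - genome_exclusive_pass
                - exome_exclusive_pass)

# Whatever remains of each per-dataset total would be SNVs that pass in
# that dataset but are present and filtered in the other one.
genomes_pass_exomes_filtered = genomes_pass - genome_exclusive_pass - pass_in_both
exomes_pass_genomes_filtered = exomes_pass - exome_exclusive_pass - pass_in_both

print(f"pass in both:                   {pass_in_both:,}")
print(f"genomes pass, exomes filtered:  {genomes_pass_exomes_filtered:,}")
print(f"exomes pass, genomes filtered:  {exomes_pass_genomes_filtered:,}")
```

Under that assumption all three derived counts come out non-negative, so the combined figure of 600,294,364 would be consistent with the per-dataset totals above rather than in conflict with them.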

Any further or more detailed bioinformatics support for outside scripts and their results is a bit beyond the scope of the production team in this forum, but it’s nice to see how you arrived at those numbers and we’re happy to clear up that high-level confusion from the start.

With your first question answered, I’m going to mark this ticket as resolved, but comment on this thread or DM me if you have any further concerns!

dennishendriksen · February 5, 2024, 3:48pm

Feel free to close. I appreciate your time and effort, thanks!

Qin · October 28, 2024, 2:52pm

@dennishendriksen A joint (exomes + genomes) release is available from gnomAD v4.1, so you may not need to merge them on your own. The VCF FILTER is also combined, as explained here.
