Hi all,
I downloaded the exomes from gnomAD release v4.1.0 and filtered for GENE_PHENO=Ensembl and missense variants only. For some reason, I get 50,162,936 rows, while the statistics on the website report only 16,412,219. What is the reason for that?
Hi @Tehila_Leiman,
We report the number of missense variants found on canonical transcripts. However, even when I don't filter to canonical transcripts, I see only about 18M sites with "missense_variant" in the consequence_terms array of any entry in the VEP transcript_consequences array. Here is the code I used:
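```python
import hail as hl
from gnomad.resources.grch38.gnomad import public_release

# Load the gnomAD exomes public release as a Hail Table (one row per variant site)
ht = public_release("exomes").ht()

# Keep sites where at least one VEP transcript consequence lists "missense_variant"
ht = ht.filter(
    hl.any(
        hl.map(
            lambda x: x.consequence_terms.contains("missense_variant"),
            ht.vep.transcript_consequences,
        )
    )
)

# Count the remaining sites
ht.count()
```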
It returns 18,231,426. How exactly are you filtering the release to missense?
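One possible source of the gap, offered only as an assumption since the filtering method isn't stated: if the downloaded annotations were split to one row per transcript (for example, one row per VEP consequence entry), each missense site is counted once for every transcript it hits. The toy sketch below uses made-up sites and hypothetical transcript IDs (not real gnomAD data) to contrast three ways of counting: per-transcript rows, unique sites missense on any transcript (analogous to the Hail filter above), and unique sites missense on the canonical transcript.

```python
from collections import namedtuple

# Hypothetical per-transcript rows: one record per (site, transcript) pair.
# Sites, transcript IDs and consequences are invented for illustration.
Row = namedtuple("Row", ["site", "transcript", "consequences", "canonical"])

rows = [
    Row("chr1:100:A:G", "ENST_A1", {"missense_variant"}, True),
    Row("chr1:100:A:G", "ENST_A2", {"missense_variant"}, False),
    Row("chr1:100:A:G", "ENST_A3", {"intron_variant"}, False),
    Row("chr1:200:C:T", "ENST_B1", {"synonymous_variant"}, True),
    Row("chr1:200:C:T", "ENST_B2", {"missense_variant"}, False),
    Row("chr1:300:G:A", "ENST_C1", {"missense_variant", "splice_region_variant"}, True),
]


def is_missense(row):
    """True if this transcript-level annotation includes a missense consequence."""
    return "missense_variant" in row.consequences


# 1) Per-transcript rows: each missense transcript annotation counts separately.
missense_rows = sum(1 for r in rows if is_missense(r))

# 2) Unique sites that are missense on at least one transcript.
any_transcript_sites = len({r.site for r in rows if is_missense(r)})

# 3) Unique sites that are missense on their canonical transcript.
canonical_sites = len({r.site for r in rows if is_missense(r) and r.canonical})

print(missense_rows, any_transcript_sites, canonical_sites)
```

The ordering (per-transcript rows > any-transcript sites > canonical sites) matches the direction of the numbers discussed here, but whether that is actually what happened depends on how the rows were produced.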