Discordant variant counts between the constraint metrics file and the browser

Hi,

I have noticed that, in v4.0.0, constrained genes tend to have discordant counts of observed variants between the constraint metrics file and the browser. For instance, the constraint metrics file lists 19 observed missense variants for the MANE transcript of TUBA1A (ENST00000301071.12), while the browser, and the table exported from it, show more than 30 missense variants.

In comparison, for gnomAD v2.1.1, both the constraint metrics file and the browser report 7 observed missense variants for the corresponding transcript (ENST00000301071.7).

Is there a reasonable explanation for this discordance? Some of the variants carry the AS_VQSR filter (for instance, 12-49185035-C-G), but AS_VQSR status does not fully explain the differences.

Thanks for a very useful resource!

Best,

Steffan

Hi Steffan,

Thank you for posting about this issue. There appears to be a bug in the browser that causes some filtered variants to appear as PASS on the gene page and in the CSV export. For example, 12-49185641-A-C currently displays as a PASS variant on the gene page, but it actually has an AS_VQSR filter on the variant page and in the downloadable Tables and VCFs. We will be putting in a fix for this issue shortly. ENST00000301071.12 has only 19 PASS missense SNVs.
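For anyone checking PASS status directly in the downloaded VCFs, a minimal sketch of how the FILTER column can be read is below. It follows the general VCF specification rather than any gnomAD code, and the helper names (`filter_codes`, `is_pass_line`) are hypothetical. The example record reuses the variant mentioned above; only its position, alleles and filter come from this thread, and the other columns are placeholders.

```python
def filter_codes(filter_field: str) -> set:
    """Return the filter codes listed in a VCF FILTER value.

    Per the VCF specification, "PASS" means the site passed all filters and
    "." means no filters were applied; any other value is a semicolon-separated
    list of the filters the site failed (e.g. "AC0;AS_VQSR").
    """
    if filter_field in ("PASS", "."):
        return set()
    return set(filter_field.split(";"))


def is_pass_line(vcf_line: str) -> bool:
    """True only if the FILTER column (7th tab-separated field) is exactly PASS."""
    fields = vcf_line.rstrip("\n").split("\t")
    return fields[6] == "PASS"


# Illustrative record: CHROM/POS/REF/ALT/FILTER mirror the variant discussed
# above; ID, QUAL and INFO are placeholders.
record = "chr12\t49185641\t.\tA\tC\t.\tAS_VQSR\t."
print(is_pass_line(record))                   # False
print(sorted(filter_codes("AC0;AS_VQSR")))    # ['AC0', 'AS_VQSR']
```

Requiring the literal value PASS (rather than "no AS_VQSR") is the stricter choice, since it also excludes sites failing any other filter.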

Regards,
Kristen

Dear Kristen,

Thanks a lot for the quick response!

I have been looking into the different variables available in the VCF file. I find it difficult to identify a single AS_VQSR threshold that filters out spurious variants so that my variant counts match the constraint table. Is the AS_VQSR filter the only thing that needs to be taken into account, or do other measures, such as depth and genotype quality, also need to be incorporated?

Best,
Steffan

Hi Steffan,

Some variants are also filtered by the “AC0” filter. In addition, to include a variant when computing constraint metrics, we require ≥30X median coverage. However, the missense variants in ENST00000301071.12 look well covered, so I don’t think coverage is contributing to the differences you are seeing. Sharing the variants you are including when counting observed missense variants would help with troubleshooting.
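The inclusion criteria mentioned in this thread (a PASS filter status, which excludes AS_VQSR and AC0 sites, ≥30X median coverage, and restricting to missense SNVs) can be sketched as a simple count. This is an illustration under stated assumptions, not the gnomAD constraint pipeline: the `Variant` record and its field names are hypothetical, median coverage is assumed to have been joined in from the coverage summary, and each variant is assumed to carry a single consequence for the transcript of interest (`missense_variant` is the Sequence Ontology term VEP uses). The real pipeline may apply further criteria not discussed here.

```python
from dataclasses import dataclass

# Threshold taken from the reply above: >= 30X median coverage.
MIN_MEDIAN_COVERAGE = 30


@dataclass
class Variant:
    # Hypothetical, pre-parsed representation of one site.
    variant_id: str
    ref: str
    alt: str
    filters: str            # raw VCF FILTER value, e.g. "PASS" or "AC0;AS_VQSR"
    median_coverage: float  # assumed joined from the coverage summary
    consequence: str        # consequence in the transcript of interest


def counts_toward_observed_missense(v: Variant) -> bool:
    is_snv = len(v.ref) == 1 and len(v.alt) == 1
    return (
        v.filters == "PASS"  # drops AS_VQSR, AC0 and any other filtered site
        and v.median_coverage >= MIN_MEDIAN_COVERAGE
        and is_snv
        and v.consequence == "missense_variant"
    )


def count_observed_missense(variants) -> int:
    # True counts as 1 when summed.
    return sum(counts_toward_observed_missense(v) for v in variants)


# Toy records (not real gnomAD data) exercising each criterion.
toy = [
    Variant("a", "A", "C", "PASS", 35.0, "missense_variant"),     # counted
    Variant("b", "A", "C", "AS_VQSR", 35.0, "missense_variant"),  # filtered
    Variant("c", "A", "C", "PASS", 20.0, "missense_variant"),     # low coverage
    Variant("d", "AT", "A", "PASS", 40.0, "missense_variant"),    # not an SNV
    Variant("e", "C", "T", "PASS", 40.0, "synonymous_variant"),   # not missense
    Variant("f", "G", "A", "AC0", 40.0, "missense_variant"),      # filtered
]
print(count_observed_missense(toy))  # 1
```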

Regards,
Kristen

Thanks again for the suggestions. Applying both filters, AC0 and AS_VQSR, together with a 30X coverage cutoff made a huge difference: the counts are now much closer to those in the constraint table.

Best,
Steffan