Hi folks,
In previous versions of gnomAD, the downloadable data contained explicit flags for QC-related issues, as well as annotation issues such as LoF flags on variants. In gnomAD v4.1, at least as I’m seeing in the VCF files, the only flags (which are in the VCF FILTER field) indicate whether the variant is flagged in the genome data, the exome data, or both. In the current gnomAD, why might variants be flagged in the genome and/or exome data, and is there any way to get specific flagging information?
Thanks!
Melissa
Hi Melissa,
The specific exome and genome filters can be found in the joint VCF within the “exomes_filters” and “genomes_filters” INFO fields, respectively. The relevant headers in the VCF provide more information.
##FILTER=<ID=BOTH_FILTERED,Description="Failed variant filters in both exomes and genomes datasets. Refer to 'exomes_filters' and 'genomes_filters' within INFO for more information">
##FILTER=<ID=EXOMES_FILTERED,Description="Failed variant filters in the exomes dataset and either passed all variant filters in the genomes dataset or the variant was not present in the genomes dataset. Refer to 'exomes_filters' within INFO for more information">
##FILTER=<ID=GENOMES_FILTERED,Description="Failed variant filters in the genomes dataset and either passed all variant filters in the exomes dataset or the variant was not present in the exomes dataset. Refer to 'genomes_filters' within INFO for more information">
##INFO=<ID=exomes_filters,Number=.,Type=String,Description="Filters' values from the exomes dataset.">
##INFO=<ID=genomes_filters,Number=.,Type=String,Description="Filters' values from the genomes dataset.">
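As a rough illustration of reading those two INFO fields, here is a minimal Python sketch that uses only the standard library. The helper names (`parse_info`, `dataset_filters`) and the example record are made up for illustration, and the filter values shown are placeholders rather than output from a real gnomAD file. It assumes multiple filter values are comma-separated, as is usual for `Number=.` INFO fields.

```python
def parse_info(info_field):
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def dataset_filters(vcf_line):
    """Return the FILTER column plus the per-dataset filter lists for one record."""
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])  # INFO is the 8th tab-separated column

    def values(key):
        raw = info.get(key)
        # A missing key or "." means no per-dataset filter values are recorded.
        if not isinstance(raw, str) or raw == ".":
            return []
        return raw.split(",")

    return {
        "FILTER": cols[6].split(";"),
        "exomes_filters": values("exomes_filters"),
        "genomes_filters": values("genomes_filters"),
    }


# Made-up record for illustration only; not taken from a gnomAD file.
example = "\t".join([
    "chr1", "12345", ".", "A", "G", ".", "BOTH_FILTERED",
    "exomes_filters=AC0,AS_VQSR;genomes_filters=AS_VQSR",
])
result = dataset_filters(example)
print(result)
```

For large files a dedicated VCF library or bcftools is probably the better choice; the sketch is only meant to show where the information sits in each record.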
In the separate exomes and genomes VCF files, you can find the annotations, including the LoF fields (LoF, LoF_filter, LoF_flags, LoF_info), in the vep INFO field:
##INFO=<ID=vep,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|ALLELE_NUM|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|MANE_SELECT|MANE_PLUS_CLINICAL|TSL|APPRIS|CCDS|ENSP|UNIPROT_ISOFORM|SOURCE|DOMAINS|miRNA|HGVS_OFFSET|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS|LoF|LoF_filter|LoF_flags|LoF_info">
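To pull the LoF fields out of a vep value, one option is to read the field order from the header's "Format:" list and pair it with each pipe-delimited annotation. The sketch below is a minimal Python example under a few assumptions: the function names are hypothetical, the header is shortened to seven fields to keep the demo readable (in practice, pass the full header line from your file), the annotation string is invented, and each per-transcript entry is assumed to be comma-separated with no commas inside individual fields.

```python
import re


def vep_field_names(header_line):
    """Extract the ordered field names from the vep header's 'Format:' list."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format: ' list found in header line")
    return match.group(1).strip().split("|")


def parse_vep(vep_value, field_names):
    """Turn one vep INFO value into a list of dicts, one per annotation entry."""
    records = []
    for entry in vep_value.split(","):  # one entry per transcript/feature
        values = entry.split("|")
        records.append(dict(zip(field_names, values)))
    return records


# Shortened header for the demo; the real header lists many more fields.
header = (
    '##INFO=<ID=vep,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|LoF|LoF_filter|'
    'LoF_flags|LoF_info">'
)
fields = vep_field_names(header)

# Invented annotation string for illustration only.
vep_value = "G|stop_gained|GENE1|HC|||,G|stop_gained|GENE1|LC|END_TRUNC||"
records = parse_vep(vep_value, fields)
lof_summary = [(r["LoF"], r["LoF_filter"]) for r in records]
print(lof_summary)
```

Reading the field order from the header rather than hard-coding positions means the same code should keep working if the field list differs between releases.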
Thanks - this is super-helpful!