Hi,
There seems to be an error in the VCF files (at least in the genome files; I did not look at the exome ones) in the INFO column, in the vep field.
Files investigated:
64.64 GiB 2023-11-01T00:03:45Z gs://gcp-public-data--gnomad/release/4.0/vcf/genomes/gnomad.genomes.v4.0.sites.chr1.vcf.bgz
890.03 MiB 2023-11-01T00:03:47Z gs://gcp-public-data--gnomad/release/4.0/vcf/genomes/gnomad.genomes.v4.0.sites.chrY.vcf.bgz
For “synonymous” variants, the value of the vep field is typically:
G|synonymous_variant|LOW|XKR3|ENSG00000172967|Transcript|ENST00000331428|protein_coding|4/4||ENST00000331428.5:c.985T>C|ENSP00000331704.5:p.Leu329=|1088|985|329|L|Ttg/Ctg|1||-1||SNV|HGNC|HGNC:28778||||1|P1|CCDS42975.1|ENSP00000331704||Ensembl|||PANTHER:PTHR14297&PANTHER:PTHR14297&Pfam:PF09815||||||||||||,G|synonymous_variant|LOW|XKR3|ENSG00000172967|Transcript|ENST00000684488|protein_coding|4/4||ENST00000684488.1:c.985T>C|ENSP00000507478.1:p.Leu329=|1116|985|329|L|Ttg/Ctg|1||-1||SNV|HGNC|HGNC:28778|YES|NM_001386955.1|||P1|CCDS42975.1|ENSP00000507478||Ensembl|||Pfam:PF09815&PANTHER:PTHR14297&PANTHER:PTHR14297||||||||||||,G|synonymous_variant|LOW|XKR3|150165|Transcript|NM_001318251.3|protein_coding|4/4||NM_001318251.3:c.985T>C|NP_001305180.1:p.Leu329=|1091|985|329|L|Ttg/Ctg|1||-1||SNV|EntrezGene|HGNC:28778|||||||NP_001305180.1||RefSeq|||||||||||||||,G|synonymous_variant|LOW|XKR3|150165|Transcript|NM_001386955.1|protein_coding|4/4||NM_001386955.1:c.985T>C|NP_001373884.1:p.Leu329=|1116|985|329|L|Ttg/Ctg|1||-1||SNV|EntrezGene|HGNC:28778|YES|ENST00000684488.1|||||NP_001373884.1||RefSeq|||||||||||||||,G|synonymous_variant|LOW|XKR3|150165|Transcript|NM_001386956.1|protein_coding|4/4||NM_001386956.1:c.985T>C|NP_001373885.1:p.Leu329=|1057|985|329|L|Ttg/Ctg|1||-1||SNV|EntrezGene|HGNC:28778|||||||NP_001373885.1||RefSeq|||||||||||||||,G|synonymous_variant|LOW|XKR3|150165|Transcript|NM_001386957.1|protein_coding|6/6||NM_001386957.1:c.985T>C|NP_001373886.1:p.Leu329=|1511|985|329|L|Ttg/Ctg|1||-1||SNV|EntrezGene|HGNC:28778|||||||NP_001373886.1||RefSeq|||||||||||||||,G|synonymous_variant|LOW|XKR3|150165|Transcript|NM_175878.5|protein_coding|4/4||NM_175878.5:c.985T>C|NP_787074.2:p.Leu329=|1088|985|329|L|Ttg/Ctg|1||-1||SNV|EntrezGene|HGNC:28778|||||||NP_787074.2||RefSeq|||||||||||||||,G|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000301023|CTCF_binding_site||||||||||1||||SNV||||||||||||||||||||||||||,G|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00001057386|TF_binding_site||||||||||1||||SNV||||||||||||||||||||||||||,G|TF_binding_site_variant|MODIFIER|||MotifFeature|ENSM00525037649|||||||||||1||1||SNV||||||||||||||||||ENSPFM0378|7|N|0.059|MEIS2&MEIS3||||,G|TF_binding_site_variant|MODIFIER|||MotifFeature|ENSM00145341180|||||||||||1||-1||SNV||||||||||||||||||ENSPFM0379|1|N|-0.070|MEIS2&MEIS3&TGIF2&TGIF2LX&PKNOX1&PKNOX2&TGIF1||||
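To locate the offending characters, the vep value can be split the way VEP-style annotations are usually laid out: consequences separated by commas, subfields separated by pipes. The sketch below assumes that layout; the function name is hypothetical and the input is a shortened excerpt of the value quoted above, not the full string.

```python
def fields_with_equals(vep_value: str) -> list[tuple[int, int, str]]:
    """Return (consequence index, subfield index, subfield) for every
    pipe-delimited subfield that contains an '=' character.

    Assumes the VEP-style layout: comma-separated consequences,
    each made of pipe-separated subfields.
    """
    hits = []
    for i, consequence in enumerate(vep_value.split(",")):
        for j, subfield in enumerate(consequence.split("|")):
            if "=" in subfield:
                hits.append((i, j, subfield))
    return hits


# Shortened excerpt of the value quoted above (two consequences, truncated).
example = (
    "G|synonymous_variant|LOW|XKR3|ENSG00000172967|Transcript|ENST00000331428"
    "|protein_coding|4/4||ENST00000331428.5:c.985T>C|ENSP00000331704.5:p.Leu329=|1088,"
    "G|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000301023"
)

for hit in fields_with_equals(example):
    print(hit)  # (0, 11, 'ENSP00000331704.5:p.Leu329=')
```

On this excerpt, only the HGVSp subfield of the first consequence is flagged.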
You can see that this string contains a number of = signs, for instance in ENSP00000331704.5:p.Leu329= (in HGVS protein notation, a trailing = means the amino acid is unchanged).
However, a value in the INFO column of a VCF file cannot contain an = sign: that sign is reserved for separating the key (here vep) from its value, and INFO values are not enclosed in quotes.
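The VCF specification (v4.2 and v4.3) states that INFO values may not contain semicolons or equals signs, with commas allowed only as list delimiters. A minimal checker, assuming a raw `key=value;flag;...` INFO string, splits each entry at its first = and flags any value that still contains one. The function name is hypothetical and not part of any existing tool.

```python
def invalid_info_entries(info: str) -> list[str]:
    """Return the keys of INFO entries whose value contains '='.

    Assumes `info` is the raw 8th column of a VCF data line, e.g.
    "AC=3;AN=10;DB;vep=..." (flags such as DB have no '=').
    """
    bad = []
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        # A flag (no '=') is fine; otherwise the value itself must not contain '='.
        if sep and "=" in value:
            bad.append(key)
    return bad


info = "AC=3;AN=10;DB;vep=G|synonymous_variant|...|ENSP00000331704.5:p.Leu329=|1088"
print(invalid_info_entries(info))  # ['vep']

# A parser that splits on every '=' sees three pieces for the vep entry instead of two:
print("vep=G|x|p.Leu329=|1088".split("="))  # ['vep', 'G|x|p.Leu329', '|1088']
```

The second print illustrates why a strict key=value parser chokes on these records.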
This formatting error crashes parsing programs such as SnpSift, bcftools, etc.
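One possible local workaround, until the files are corrected, is to percent-encode the offending characters, which VCF v4.3 allows for characters with special meaning (%3D for =, %25 for %). The sketch below does this on a single INFO string; it is an illustration under that assumption, not an official gnomAD or bcftools fix, and downstream tools would need to decode %3D back to = to recover the HGVS notation.

```python
def percent_encode_info(info: str) -> str:
    """Percent-encode '%' and '=' inside INFO values, leaving keys and flags alone.

    Uses the capitalized percent-encoding described in VCF v4.3
    (%25 for '%', %3D for '='). Commas are NOT encoded because they are
    legitimate list delimiters (the vep field uses them between consequences).
    """
    out = []
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if sep:
            # Encode '%' first so the '%' introduced by '%3D' is not re-encoded.
            value = value.replace("%", "%25").replace("=", "%3D")
            out.append(f"{key}={value}")
        else:
            out.append(key)  # flag entry, no value
    return ";".join(out)


print(percent_encode_info("AC=3;DB;vep=G|p.Leu329=|1088"))
# AC=3;DB;vep=G|p.Leu329%3D|1088
```

To apply it to a whole file, each data line would be split on tabs and only the 8th column (index 7) passed through this function.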
Thank you for your attention; I hope this helps.
Best,
Christophe