Hi, thanks for gnomAD, everyone in clinical genetics needs you thanks so much etc, I hope I don’t sound too grumpy…
The VCF INFO fields seem to change almost every release, eg “Non Finnish European” history:
v2 AF_nfe
v3 AF-nfe
v3.1 AF_nfe
v4.0 nfe_AF
This causes scripts to break and extra effort to maintain conversion code etc. Ideally, you’d just keep them the same and not break backwards compatibility, but if you feel you need to do this, it would be useful to document it so people can see how the fields have changed over time.
For instance, a table with rows being a label like “Non-Finnish European Allele Frequency”, columns being gnomAD releases, and cells being the field names, as in my example at the start.
This would help find out how eg some INFO fields disappeared between versions, some were renamed etc
Thank you for the feedback, and we understand how frustrating these breaking changes are. For your example, each version of gnomAD should contain <metric>_<sampling_grouping> for call statistics. In the v4 example you provide, the sampling grouping, nfe, comes first. Could you confirm this within the file you're accessing? I am seeing the metric first within the v4 exome chrY VCF.
#CHROM POS ID REF ALT QUAL FILTER INFO
chrY 2784606 . C T . AC0 AC=0;AN=0;nhomalt_XX=2147483647;AC_XY=0;AN_XY=0;nhomalt_XY=0;nhomalt=0;nhomalt_afr_XX=2147483647;AC_afr_XY=0;AN_afr_XY=0;nhomalt_afr_XY=0;AC_afr=0;AN_afr=0;...
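One way to check which ordering a given file uses is to scan the INFO IDs declared in its header. The following is only a rough sketch, not gnomAD tooling: the helper names `info_ids` and `metric_position` are made up for illustration, and it assumes that only AC, AN and AF count as metrics.

```python
import re

# Matches the ID of a VCF header line such as: ##INFO=<ID=AF_nfe,Number=A,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")

# Assumption for this sketch: only these three call statistics are checked.
METRICS = ("AC", "AN", "AF")


def info_ids(header_lines):
    """Return the INFO IDs declared in the meta lines of a VCF header."""
    ids = []
    for line in header_lines:
        if not line.startswith("##"):
            break  # reached the #CHROM line or the first record
        match = INFO_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def metric_position(info_id):
    """Classify an ID as 'metric_first' (AF_nfe), 'group_first' (nfe_AF) or None."""
    parts = info_id.split("_")
    if len(parts) < 2:
        return None  # bare metric such as AC, no grouping to compare against
    if parts[0] in METRICS:
        return "metric_first"
    if parts[-1] in METRICS:
        return "group_first"
    return None
```

For a bgzipped file, the header lines could be supplied with `gzip.open(path, "rt")`, where `path` is whatever VCF you downloaded.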
Each version of gnomAD, and even the v4 exomes and genomes, contains somewhat different sample groupings, e.g. subsets were dropped in v4 exomes, but v4 genomes contain subsets like HGDP and TGP. We only compute statistics for the sample groupings within each dataset, which should explain why a large number of fields disappeared between v3 and v4.
Again, thank you for the feedback. We will be more cognizant of any format changes that will cause headaches for users and will look to update documentation for these types of changes.
Hi, I was looking in the Structural Variants VCF, and thought you had changed them everywhere.
So, for my non-Finnish European example, it seems the issue isn’t that gnomAD changed the field between releases; it’s that the SV file is inconsistent with the genome/exome INFO fields.
Eg:
SV has “nfe_AF” while genomes/exomes is “AF_nfe”
Another one:
SV has “POPMAX_AF” while genomes/exomes is “AF_grpmax”
But some fields did change from v3 to v4, so in general what I’d like is eg documenting:
“nonpar” changed to “non_par”
“X_popmax” changed to “X_grpmax”
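In the meantime, the renames mentioned in this thread could be handled with a small normalisation step like the one below. It is a minimal sketch that covers only the cases listed above (it is not a complete or official mapping), the function names `normalize_key` and `normalize_info` are hypothetical, and it assumes the v4 genome/exome style (`AF_nfe`, `AF_grpmax`, `non_par`) is the target.

```python
import re

# Whole-key renames reported in this thread; not an official or complete list.
EXACT_RENAMES = {
    "nonpar": "non_par",       # v3 -> v4
    "POPMAX_AF": "AF_grpmax",  # SV file -> genome/exome style
}

# SV-style <group>_<metric>, e.g. nfe_AF. Assumes a single-token group name.
GROUP_FIRST = re.compile(r"([A-Za-z0-9]+)_(AC|AN|AF)")


def normalize_key(key):
    """Map an INFO key to the assumed v4 genome/exome naming."""
    if key in EXACT_RENAMES:
        return EXACT_RENAMES[key]
    key = key.replace("-", "_")  # AF-nfe -> AF_nfe
    if key.endswith("_popmax"):  # X_popmax -> X_grpmax
        key = key[: -len("_popmax")] + "_grpmax"
    match = GROUP_FIRST.fullmatch(key)
    if match:  # nfe_AF -> AF_nfe
        key = f"{match.group(2)}_{match.group(1)}"
    return key


def normalize_info(info):
    """Parse a VCF INFO string into a dict with normalised keys.

    Flags (entries without '=') are stored with the value True.
    """
    fields = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        fields[normalize_key(key)] = value if sep else True
    return fields
```

For example, `normalize_info("nfe_AF=0.01;nonpar")` gives `{"AF_nfe": "0.01", "non_par": True}`. Keys that already follow the target style, such as `AF_nfe` or `nhomalt_afr_XX`, pass through unchanged.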