I noticed that the LC_pLoF flag is applied to different variants in v4 than in v2. For some genes the differences are huge, for example RP1, EN1 or AP5B1. This is particularly true for genes with a long last exon.
When using GeniE, the estimated prevalence differs hugely depending on whether we use gnomAD v2 or v4.
It seems to me that the v2 flags are OK while there is a bug with v4.
Have you already investigated this?
Thank you for your help and for providing amazing resources to the community.
Mathieu
@Mathieu_Quinodoz We have received emails before about “HC” and “LC” being annotated differently in GRCh37 for v2 and GRCh38 for v3. Since your concern also involves the last exon, this may explain the difference you are seeing.
In LOFTEE, a LoF variant falls into ‘LC’ with an ‘END_TRUNC’ filter if:
it fails the 50bp rule (FYI, for both GRCh37 and GRCh38, we no longer use the 50bp rule on stop_gained and frameshift variants);
its gerp_dist is less than a cutoff (for GRCh37 the cutoff was 180, but for GRCh38 the cutoff is -58, which lowers the bar for HC classification and can move some previously LC variants to HC status, even near the protein’s end);
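To make the effect of the cutoff change concrete, here is a rough Python sketch of the two conditions above. It is not LOFTEE's actual code: the function and variable names are made up for illustration, and it assumes both conditions must hold together for the END_TRUNC filter to apply. Only the two cutoff values (180 and -58) come from this thread.

```python
# Illustrative sketch only, NOT the LOFTEE implementation.
# Assumption: END_TRUNC applies when BOTH conditions above hold.
# All names here are hypothetical; only the cutoffs come from the thread.
GERP_DIST_CUTOFF = {"GRCh37": 180.0, "GRCh38": -58.0}


def end_trunc_confidence(fails_50bp_rule: bool, gerp_dist: float, build: str) -> str:
    """Return 'LC' if the END_TRUNC conditions are met for this build, else 'HC'.

    'HC' here only means "not flagged by END_TRUNC"; other LOFTEE filters
    are ignored in this sketch.
    """
    cutoff = GERP_DIST_CUTOFF[build]
    if fails_50bp_rule and gerp_dist < cutoff:
        return "LC"
    return "HC"


# A made-up variant near the end of a transcript, with an invented gerp_dist.
example = {"fails_50bp_rule": True, "gerp_dist": 50.0}
for build in GERP_DIST_CUTOFF:
    print(build, end_trunc_confidence(build=build, **example))
```

With these invented inputs, the same variant is below the GRCh37 cutoff (50 < 180) but not below the GRCh38 cutoff (50 is not < -58), so under this sketch it would be LC in v2 and HC in later releases.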