Spliceai_ds_max scores don't match spliceailookup

Hi, I’ve noticed that for most variants the spliceai_ds_max score listed in the v4 VCF files doesn’t match the maximum delta score shown in the spliceailookup tool. For example, the variant chr21-10649707-C-A has spliceai_ds_max=0.240000 in the VCF, but on spliceailookup (SpliceAI Lookup) the maximum delta score shown is 0.86. Has anyone else noticed this or found an explanation? I’m not sure whether this is just due to a difference in how the spliceai_ds_max scores were generated or whether there was an error when they were generated. Pangolin scores do seem to match.
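
For anyone who wants to check the same thing, here is a minimal sketch of how the VCF value can be pulled out for comparison. It uses only the Python standard library. The `info_value` helper and the truncated INFO string are hypothetical and for illustration only; just the `spliceai_ds_max=0.240000` entry and the 0.86 figure come from the example above.

```python
def info_value(info, key):
    """Return the raw string value of `key` in a VCF INFO column, or None if absent."""
    for field in info.split(";"):
        name, sep, value = field.partition("=")
        if sep and name == key:
            return value
    return None


# Hypothetical, truncated INFO column; only spliceai_ds_max is taken from the post.
info = "AC=1;AN=2;spliceai_ds_max=0.240000"

vcf_score = float(info_value(info, "spliceai_ds_max"))
lookup_score = 0.86  # maximum delta score read off SpliceAI Lookup by hand

# Flag the variant if the two sources disagree by more than a small tolerance.
mismatch = abs(vcf_score - lookup_score) > 0.01
print(vcf_score, lookup_score, mismatch)
```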

@Ruby_Dawes Sorry for the belated reply.
For the gnomAD v4 release, we used Illumina’s precomputed SpliceAI scores, which inevitably leads to some discrepancies with the SpliceAI Lookup Browser. After consulting with Kishore Jaganathan from Illumina, the reason for the discrepancy in your specific example became clear: the precomputed scores were originally generated using the hg19 genome build (Gencode v24) and later lifted over to hg38. However, only 19,306 out of 20,275 genes passed the liftover sanity checks, and IGHV1OR21-1 was not one of them.

Additionally, Illumina’s score computation involved selecting a single transcript per gene. Kishore confirmed that these scores, based on hg19 and Gencode v24, are outdated. For a more reliable prediction, we recommend using the SpliceAI Lookup Browser with the latest Gencode version, the default distance of 500, and masked scores.

If you’d like to explore the reasons behind discrepancies in other genes that did lift over successfully to hg38, or to reproduce the SpliceAI scores as they appear in the precomputed table, you can refer to this guide: discrepancy between pre-computed scores and spliceAI scores · Issue #26 · Illumina/SpliceAI · GitHub.
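
As a rough illustration of what a "maximum delta score" is when you run SpliceAI yourself: the tool annotates each variant with a string of the form `ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL`, and the maximum delta score is the largest of the four DS values. The sketch below assumes that format; the `max_delta_score` helper and the example annotation string are made up for illustration and are not real output for any variant.

```python
def max_delta_score(annotation):
    """Return the largest of DS_AG, DS_AL, DS_DG, DS_DL, or None if all are missing."""
    fields = annotation.split("|")
    # fields[0] = ALLELE, fields[1] = SYMBOL, fields[2:6] = the four delta scores
    scores = [float(x) for x in fields[2:6] if x not in ("", ".")]
    return max(scores) if scores else None


# Hypothetical annotation string with invented values.
example = "A|GENE|0.01|0.86|0.00|0.02|-12|3|40|-7"
print(max_delta_score(example))
```

Comparing this value across transcripts and annotation versions is one way to see how much the single-transcript choice mentioned above can matter.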

Hi @Qin , thanks so much for your response!
Ah, I see now. For some reason, when I read the gnomAD v4 announcement, I thought new SpliceAI scores had been generated with an updated genome build and annotations to supersede the original precomputed scores, which is what led to my confusion! Looking back at the announcement now, I’m not sure why I drew that conclusion. Probably wishful thinking :laughing:
Thanks for your help!
Cheers,
ruby