Spliceai_ds_max scores don't match spliceailookup

Ruby_Dawes · January 28, 2024, 11:55am

Hi, I’ve noticed that for most variants the spliceai_ds_max score listed in the v4 VCF files doesn’t seem to match the maximum delta score shown in the spliceailookup tool. For example, the variant chr21-10649707-C-A has spliceai_ds_max=0.240000 listed in the VCF, but when you look at the scores on spliceailookup (SpliceAI Lookup), the maximum delta shown is 0.86. Has anyone else noticed this or found an explanation? I’m not sure if it’s just due to some difference in how the spliceai_ds_max scores were generated or if there was an error when they were generated. Pangolin scores do seem to match.

Qin · September 10, 2024, 9:54pm

@Ruby_Dawes Sorry for the belated reply.
For the gnomAD v4 release, we used Illumina’s precomputed SpliceAI scores, which inevitably leads to some discrepancies with the SpliceAI Lookup Browser. After consulting with Kishore Jaganathan from Illumina, the reason for the discrepancy in your specific example became clear: the precomputed scores were originally generated using the hg19 genome build (Gencode v24) and later lifted over to hg38. However, only 19,306 out of 20,275 genes successfully passed the liftover sanity checks, and IGHV1OR21-1, the gene in your example, was not one of them.

Additionally, Illumina’s score computation involved selecting a single transcript per gene. Kishore confirmed that these scores, based on hg19 and Gencode v24, are outdated. For a more reliable prediction, we recommend using the SpliceAI Lookup Browser with the latest Gencode version, the default distance of 500, and masked scores.

If you’d like to explore reasons behind discrepancies in other genes that successfully lifted over to hg38, or reproduce the SpliceAI scores as seen in the precomputed table, you can refer to this guide: "discrepancy between pre-computed scores and spliceAI scores" (Issue #26, Illumina/SpliceAI on GitHub).
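
For anyone who wants to check their own variants, here is a minimal Python sketch of the comparison discussed in this thread. It assumes you have the INFO column of a gnomAD v4 VCF record (containing spliceai_ds_max) and a SpliceAI annotation string in the standard `ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL` layout, for example from your own SpliceAI run. The function names, the tolerance, and the example annotation are illustrative assumptions: only the values 0.240000 and 0.86 come from this thread, and which of the four delta scores is 0.86 is not stated here.

```python
def info_to_dict(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dict (flags map to True)."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def spliceai_max_delta(annotation: str) -> float:
    """Return max(DS_AG, DS_AL, DS_DG, DS_DL) for one SpliceAI annotation.

    Expected layout: ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL
    Missing scores written as "." are skipped.
    """
    fields = annotation.split("|")
    scores = [float(x) for x in fields[2:6] if x not in (".", "")]
    if not scores:
        raise ValueError("no delta scores in annotation: " + annotation)
    return max(scores)


def compare_scores(gnomad_info: str, spliceai_annotations: str, tolerance: float = 0.01):
    """Compare gnomAD's spliceai_ds_max with a recomputed maximum delta score.

    spliceai_annotations may hold several comma-separated annotations
    (one per gene); the maximum across all of them is used.
    The tolerance of 0.01 is an arbitrary choice for this sketch.
    """
    gnomad_max = float(info_to_dict(gnomad_info)["spliceai_ds_max"])
    recomputed_max = max(
        spliceai_max_delta(a) for a in spliceai_annotations.split(",")
    )
    differs = abs(gnomad_max - recomputed_max) > tolerance
    return gnomad_max, recomputed_max, differs


# Hypothetical inputs: AC=1 and the placement of 0.86 in DS_DG are made up
# for illustration; only 0.240000 and 0.86 are taken from this thread.
example_info = "AC=1;spliceai_ds_max=0.240000"
example_annotation = "A|IGHV1OR21-1|0.00|0.00|0.86|0.00|0|0|0|0"

print(compare_scores(example_info, example_annotation))
```

A mismatch flagged this way does not say which score is right. As explained above, it may simply reflect the hg19/Gencode v24 origin of the precomputed scores or the single transcript chosen per gene.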

Ruby_Dawes · September 16, 2024, 3:53pm

Hi @Qin , thanks so much for your response!
Ah, I see now. For some reason, when I read the gnomAD v4 announcement, I thought new SpliceAI scores had been generated to supersede the original precomputed scores with an updated genome build and annotations, which is what led to my confusion! Looking back at the announcement now, I’m not sure why I drew that conclusion. Probably wishful thinking.
Thanks for your help!
Cheers,
ruby
