RUNX2 coordinates in STR template catalog

Hello,

I noticed that, for example, in variant_catalog_without_offtargets.GRCh37.json, your reference coordinates for the RUNX2 locus comprise only 14 repeats, whereas the literature and other references agree on a wild-type repeat size of 17.

Is there a reason for using a different repeat size in the template that I am missing? I would greatly appreciate it if you could elaborate.

Thank you very much in advance and kind regards!

Thanks for the question. Originally, I had hoped that limiting the ExpansionHunter locus definition to the relatively pure GCG repeat interval (and excluding the GCAGCTGCA suffix) would improve ExpansionHunter’s average genotype quality at this locus, as it does for some other loci. However, manual review of the read visualizations suggests that the genotype quality remains low for RUNX2.
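To illustrate why the two definitions differ by exactly 3 repeats: the GCAGCTGCA suffix splits into the codons GCA, GCT and GCA, which are all GCN (alanine) codons, so counting only pure GCG units stops before the suffix while counting any GCN unit includes it. The sketch below uses a toy sequence (14 GCG codons plus the suffix) that is an assumption for illustration, not the actual RUNX2 reference sequence, and the helper name `count_leading_codons` is hypothetical.

```python
# Toy sequence, NOT the real RUNX2 reference: 14 GCG codons followed by
# the GCAGCTGCA suffix (codons GCA, GCT, GCA).
TOY_SEQUENCE = "GCG" * 14 + "GCAGCTGCA"

# All four GCN codons encode alanine.
ALANINE_CODONS = {"GCA", "GCC", "GCG", "GCT"}


def count_leading_codons(seq, allowed):
    """Count consecutive in-frame codons from the start of seq that are in `allowed`."""
    count = 0
    # Step through complete codons only (stop before a trailing partial codon).
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] not in allowed:
            break
        count += 1
    return count


pure_gcg = count_leading_codons(TOY_SEQUENCE, {"GCG"})  # stops at the suffix
any_gcn = count_leading_codons(TOY_SEQUENCE, ALANINE_CODONS)  # includes the suffix
print(pure_gcg, any_gcn)  # 14 17
```

On this toy input the pure-GCG count is 14 and the GCN count is 17, mirroring the 14- versus 17-repeat definitions discussed in this thread.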

As you point out, other resources include the 3 additional repeats (also seen in this comparison to STRchive and other catalogs: str-analysis/str_analysis/variant_catalogs/scripts/gnomAD_STRchive_comparison.txt at commit 9984a588dffc6bbbeb2e8c8251bba1ce660b3f4b in broadinstitute/str-analysis), so I’ve now updated the catalog JSON files to use the 17-repeat coordinates.
The gnomAD calls will remain based on the 14-repeat definition.
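As a rough check of what the coordinate change means in a catalog entry, the repeat count of a locus can be read off its reference region as the interval length divided by the motif length, so widening the region by the 9 bp suffix adds 3 trinucleotide repeats. This is a minimal sketch that assumes `ReferenceRegion`-style strings of the form `chrom:start-end` with a 0-based start and exclusive end (with 1-based inclusive coordinates the length would be `end - start + 1`); the coordinates and the helpers `parse_region`, `repeat_count` and `extend_region` are hypothetical and are not the real RUNX2 values.

```python
from __future__ import annotations

MOTIF_LENGTH = 3       # trinucleotide repeat unit
SUFFIX = "GCAGCTGCA"   # 9 bp suffix excluded from the 14-repeat definition


def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'chrom:start-end' (assumed 0-based start, exclusive end) into parts."""
    chrom, span = region.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end


def repeat_count(region: str, motif_length: int = MOTIF_LENGTH) -> int:
    """Number of whole repeat units spanned by the region."""
    _, start, end = parse_region(region)
    length = end - start
    if length % motif_length:
        raise ValueError(f"{region} is not a whole number of {motif_length} bp units")
    return length // motif_length


def extend_region(region: str, extra_bp: int) -> str:
    """Return the region with its end coordinate moved right by extra_bp."""
    chrom, start, end = parse_region(region)
    return f"{chrom}:{start}-{end + extra_bp}"


# Hypothetical coordinates for illustration only, NOT the real RUNX2 locus.
short_region = "6:1000-1042"
long_region = extend_region(short_region, len(SUFFIX))
print(repeat_count(short_region), long_region, repeat_count(long_region))
# 14 6:1000-1051 17
```

The divisibility check is there so that a region which does not span a whole number of repeat units is flagged rather than silently rounded.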

Thank you very much for your explanation!
This is very helpful!