Phased info for STRs?

Hi, thanks for everyone’s hard work on gnomAD. It truly is a valuable resource that will continue to revolutionize genomic medicine.

I am a physician-scientist at a US university. I would like to characterize variants near (<1 kb from) a disease-associated tandem repeat region and compare the genotypes of these variants between individuals predicted to be unaffected and those predicted to be affected. For instance, I might find that individuals with an expanded STR in gene X tend to carry the alternative allele of rsXXXX. I am grateful for the phased HGDP+1KG callset that the committee has released, and I am able to find variant information in that region for each individual. However, when I tried a raw sum of all the reference/alternative allele repeats in the region of interest from this callset, the resulting repeat counts were too large (likely because they need to be adjusted with ExpansionHunter, as was done for the STR files), so I cannot tell who has normal repeats and who has expanded repeats. I then downloaded the STR files. While the STR files do have the repeat sizes, they are not phased and do not contain individual-level information, so I cannot correlate them back to the phased callset.

I would like to ask whether you have any recommendations for how to approach this question, in case I am missing something, and whether there are any datasets or tools I should consider.

Thank you!