I’ve run across a multi-allelic variant, and I’m trying to decide whether it’s real or a sequencing / alignment artifact.
On chr1 at 1355427 these 2 variants are reported:
a deletion: gnomAD
an insertion: gnomAD
The first variant reads as if the reference sequence is GCCCCGGT and the ALT allele has 7 bp deleted, leaving just a G.
The second variant reads as if the reference sequence is GCCCCGGT (I added some flanking bases to harmonize the REF with the first variant) and the ALT allele has 7 bp added, giving GCCCCGGTCCCCGGT.
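To make the two allele descriptions concrete, here is a minimal Python sketch that compares each ALT against the harmonized REF and reports which bases are removed or added. The function name `describe_indel` is hypothetical, and the allele strings are the ones quoted above, not pulled from gnomAD programmatically.

```python
def describe_indel(ref, alt):
    """Return (kind, size, sequence) for a simple left-anchored indel.

    Assumes one allele is a prefix of the other, as in the two
    alleles described above; anything else is reported as 'complex'.
    """
    if len(alt) < len(ref) and ref.startswith(alt):
        deleted = ref[len(alt):]
        return ("deletion", len(deleted), deleted)
    if len(alt) > len(ref) and alt.startswith(ref):
        inserted = alt[len(ref):]
        return ("insertion", len(inserted), inserted)
    return ("complex", abs(len(alt) - len(ref)), None)


REF = "GCCCCGGT"  # harmonized reference context from the post

print(describe_indel(REF, "G"))                # ('deletion', 7, 'CCCCGGT')
print(describe_indel(REF, "GCCCCGGTCCCCGGT"))  # ('insertion', 7, 'CCCCGGT')
```

Under these assumptions both alleles involve the same 7 bp unit, CCCCGGT: one copy is lost in the first variant and one extra copy is gained in the second.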
This region has a lot of repeated CCCCNN sequences, visible in the Read Data. It’s also flagged as low-complexity. I can see this contributing to either real STR-like expansion or sequencing errors. Both of these variants are very rare, as expected.
This issue was discovered during a plink merge between two datasets, where these were flagged as same-position errors. Looking only at the plink .bim files, they appeared to be flipped alleles, but based on the gnomAD data, are both variants real?
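For anyone wanting to reproduce the check outside of plink, here is a rough Python sketch that groups .bim records by chromosome and position and flags pairs whose allele sets are identical, which is what makes them look like flipped alleles. The function name `same_position_variants`, the variant IDs and the example records are all hypothetical; only the six-column .bim layout (chromosome, ID, cM, bp, allele 1, allele 2) is taken from the plink format.

```python
from collections import defaultdict


def same_position_variants(bim_lines):
    """Group .bim records by (chromosome, bp) and keep shared positions."""
    by_pos = defaultdict(list)
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        chrom, var_id, _cm, bp, a1, a2 = fields
        by_pos[(chrom, int(bp))].append((var_id, a1, a2))
    return {pos: recs for pos, recs in by_pos.items() if len(recs) > 1}


# Hypothetical records: an insertion and a deletion written in minimal
# form at the same position carry the same two allele strings, swapped.
example = [
    "1\tvarA\t0\t1355427\tGCCCCGGT\tG",
    "1\tvarB\t0\t1355427\tG\tGCCCCGGT",
    "1\tvarC\t0\t1400000\tA\tC",
]

for pos, recs in same_position_variants(example).items():
    allele_sets = {frozenset((a1, a2)) for _id, a1, a2 in recs}
    looks_flipped = len(allele_sets) == 1
    print(pos, recs, "same allele pair:", looks_flipped)
```

A pair flagged this way cannot be told apart from the .bim alone, since the file does not record which allele is the reference; that needs the original VCF or the reference sequence.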