The positions of two deletions are identical. Why weren’t they merged into a single entry?

Hello. The positions of DEL_CHR11_3DCD2FD1 and DEL_CHR11_F213CF06 are identical. Why weren’t they merged into a single entry? Can I merge them?

Position: 11:134738783-134739009

I found similar cases as follows:

 bcftools query -f '%CHROM-%POS-%ALT-%INFO/END-%CHR2-%INFO/POS2-%INFO/END2-%INFO/SVLEN\n' -i "SVTYPE = \"DEL\"" ../../original/gnomad.v4.1.sv.sites.vcf.gz | sort | uniq -d
chr11-134738783-<DEL>-134739009-chr11-.-.-226
chr11-68233234-<DEL>-68233798-chr11-.-.-564
chr1-201209715-<DEL>-201209798-chr1-.-.-83
chr20-50022383-<DEL>-50022483-chr20-.-.-100
chr21-27038559-<DEL>-27038636-chr21-.-.-77
chr21-45484269-<DEL>-45484471-chr21-.-.-202
chr3-177242504-<DEL>-177242908-chr3-.-.-404
chr3-49259693-<DEL>-49259766-chr3-.-.-73
chr5-110233-<DEL>-110378-chr5-.-.-145
chr5-124769362-<DEL>-124769439-chr5-.-.-77
chr6-140634236-<DEL>-140634421-chr6-.-.-185
chr6-87009825-<DEL>-87009902-chr6-.-.-77
chr7-75767243-<DEL>-75767347-chr7-.-.-104
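
To see which record IDs sit behind each duplicated position, the same idea can be extended to carry the ID along. This is only a rough sketch: `group_duplicate_ids` is a helper name made up here, and it assumes the input is tab-separated `ID, CHROM, POS, END, SVLEN` as produced by the `bcftools query` call shown in the trailing comment.

```shell
# Group record IDs that share the same CHROM/POS/END/SVLEN.
# Input (stdin): tab-separated lines "ID<TAB>CHROM<TAB>POS<TAB>END<TAB>SVLEN".
# Output: one line per duplicated key, followed by the comma-separated IDs.
group_duplicate_ids() {
  awk -F'\t' '
    {
      key = $2 ":" $3 "-" $4 " (SVLEN=" $5 ")"
      if (key in ids) {
        ids[key] = ids[key] "," $1   # append further IDs in input order
      } else {
        ids[key] = $1
      }
      count[key]++
    }
    END {
      # Report only keys that were seen more than once
      for (key in count) {
        if (count[key] > 1) print key "\t" ids[key]
      }
    }
  ' | sort
}

# Hypothetical usage (assumes bcftools is installed; adjust the path):
# bcftools query -i 'SVTYPE="DEL"' \
#   -f '%ID\t%CHROM\t%POS\t%INFO/END\t%INFO/SVLEN\n' \
#   ../../original/gnomad.v4.1.sv.sites.vcf.gz | group_duplicate_ids
```
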

Thank you for your help.

Nobutaka

Hi Nobutaka,

Thank you for pointing this out. We noticed this issue after the data release—the duplicated variants remain in the dataset because short-read sequencing often struggles to accurately detect structural variants in repetitive regions. For interpretation, I recommend focusing on variants with higher quality scores (the “QUAL” column in the VCF, or “Quality score” in the browser).
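
A minimal sketch of that suggestion, not an official gnomAD procedure: for each set of records sharing CHROM, POS, and END, keep the one with the highest QUAL. The helper name `keep_highest_qual` is hypothetical, the input layout is an assumption (tab-separated `ID, CHROM, POS, END, QUAL`), a missing QUAL (`.`) is treated as 0, and ties keep the first record seen.

```shell
# Keep, for each CHROM/POS/END key, the record with the highest QUAL.
# Input (stdin): tab-separated lines "ID<TAB>CHROM<TAB>POS<TAB>END<TAB>QUAL".
# Output: "CHROM<TAB>POS<TAB>END<TAB>ID<TAB>QUAL", one line per key.
keep_highest_qual() {
  awk -F'\t' '
    {
      key = $2 "\t" $3 "\t" $4
      # Assumption: a missing QUAL (".") ranks lowest
      if ($5 == ".") { qual = 0 } else { qual = $5 + 0 }
      if (!(key in best) || qual > best[key]) {
        best[key] = qual   # numeric value used for comparison
        raw[key] = $5      # original QUAL text, printed unchanged
        id[key] = $1
      }
    }
    END {
      for (key in best) print key "\t" id[key] "\t" raw[key]
    }
  ' | sort
}

# Hypothetical usage (assumes bcftools is installed; adjust the path):
# bcftools query -i 'SVTYPE="DEL"' \
#   -f '%ID\t%CHROM\t%POS\t%INFO/END\t%QUAL\n' \
#   gnomad.v4.1.sv.sites.vcf.gz | keep_highest_qual
```
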

I hope this helps!
