There are 1,314 SVs in gnomAD v4 SV with an SVLEN over 80% of their chromosome's length, and 4,728 over 50% (the second count includes the first).
Because these are so large, they overlap almost everything, so together they cause a huge number of false-positive overlaps.
Yes, almost all of them are flagged with a FILTER, but anyone who forgets to apply it will get many false-positive overlaps (which could be bad if they are using this to discard variants that overlap a gnomAD SV above an AF threshold).
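To illustrate the false-positive problem, here is a minimal sketch using hypothetical helpers. `overlaps` and `usable_for_overlap` are not part of gnomAD, VEP or cyvcf2. The sketch assumes half-open intervals and treats a FILTER value of `None` as PASS, which is the cyvcf2 convention. The 50% cut-off is arbitrary, and the deletion's start position is made up because only its length is used here.

```python
# Hypothetical helpers (not part of gnomAD, VEP or cyvcf2) showing why a
# chromosome-scale SV "overlaps" almost any query, and one way to guard against it.


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def usable_for_overlap(filter_value, svlen: int, contig_size: int,
                       max_fraction: float = 0.5) -> bool:
    """Keep an SV only if it PASSes and spans at most max_fraction of its contig.

    filter_value follows the cyvcf2 convention: None means PASS.
    max_fraction=0.5 is an arbitrary, illustrative cut-off.
    """
    if filter_value is not None:
        return False
    return abs(svlen) <= contig_size * max_fraction


CHR1_LENGTH = 248_956_422
# A 203,277,062 bp deletion on chr1; the start position is an assumption.
big_del = (10_000, 10_000 + 203_277_062)

# Any query interval inside the deleted span counts as an overlap.
query = (100_000_000, 100_001_000)
print(overlaps(*big_del, *query))                         # True
print(usable_for_overlap(None, 203_277_062, CHR1_LENGTH)) # False: too large even if PASS
print(usable_for_overlap(None, 5_000, CHR1_LENGTH))       # True
```

Checking both FILTER and size means a user who forgets one safeguard still has the other.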
Many of these are not biologically plausible. For example, gnomAD-SV_v3_DEL_chr1_2a75678c has SVLEN=203,277,062.
chr1 in GRCh38 (NC_000001.11) is 248,956,422 bp, so this represents a deletion of 82% of chr1.
There are 2 reported homozygotes, but it is not plausible that people are walking around with only 18% of chromosome 1 left and no ill effect.
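The percentages above come from the following arithmetic. It uses only the two numbers quoted in the text, the example SVLEN and the GRCh38 chr1 length:

```python
# Fraction of GRCh38 chr1 (NC_000001.11) covered by the example deletion.
svlen = 203_277_062
chr1_length = 248_956_422

deleted = svlen / chr1_length
remaining = 1 - deleted
print(f"deleted: {deleted:.1%}, remaining: {remaining:.1%}")
# deleted: 81.7%, remaining: 18.3%
```

These round to the 82% and 18% quoted above.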
Example test program:
```python
import cyvcf2

# GRCh38 primary assembly chromosome lengths (bp)
CONTIG_SIZES_GRCH38 = {
    "chr1": 248956422,
    "chr2": 242193529,
    "chr3": 198295559,
    "chr4": 190214555,
    "chr5": 181538259,
    "chr6": 170805979,
    "chr7": 159345973,
    "chr8": 145138636,
    "chr9": 138394717,
    "chr10": 133797422,
    "chr11": 135086622,
    "chr12": 133275309,
    "chr13": 114364328,
    "chr14": 107043718,
    "chr15": 101991189,
    "chr16": 90338345,
    "chr17": 83257441,
    "chr18": 80373285,
    "chr19": 58617616,
    "chr20": 64444167,
    "chr21": 46709983,
    "chr22": 50818468,
    "chrX": 156040895,
    "chrY": 57227415,
}

over_50_percent = 0
over_80_percent = 0
for v in cyvcf2.Reader("/data/annotation/VEP/annotation_data/GRCh38/gnomad.v4.0.sv.merged.vcf.gz"):
    contig_size = CONTIG_SIZES_GRCH38.get(v.CHROM)
    if contig_size is None:
        continue  # skip contigs not listed above instead of raising KeyError
    if svlen := v.INFO.get("SVLEN"):
        svlen = abs(int(svlen))  # some VCFs use negative SVLEN for deletions
        if svlen > contig_size * 0.8:
            over_80_percent += 1
        if svlen > contig_size * 0.5:
            over_50_percent += 1  # also includes the SVs counted as over 80%

print(f"{over_80_percent=}, {over_50_percent=}")
```
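To check the claim that almost all of these SVs are flagged with a FILTER, the counting can be split by PASS vs filtered. Below is a minimal sketch. `count_large_svs` is a hypothetical helper, and the records are plain `(chrom, svlen, filter_value)` tuples so that it runs without the VCF. The toy records, including the `"SOME_FILTER"` placeholder, are invented for illustration and are not gnomAD data.

```python
from collections import Counter


def count_large_svs(records, contig_sizes, fractions=(0.5, 0.8)):
    """Count SVs longer than each fraction of their contig, split by PASS vs filtered.

    records: iterable of (chrom, svlen, filter_value) tuples, where filter_value
    is None for PASS (the cyvcf2 convention for Variant.FILTER).
    Records on contigs missing from contig_sizes, or without SVLEN, are skipped.
    """
    counts = Counter()
    for chrom, svlen, filter_value in records:
        contig_size = contig_sizes.get(chrom)
        if contig_size is None or not svlen:
            continue
        length = abs(int(svlen))
        status = "PASS" if filter_value is None else "filtered"
        for fraction in fractions:
            if length > contig_size * fraction:
                counts[(fraction, status)] += 1
    return counts


# Toy records; "SOME_FILTER" is a placeholder, not a real gnomAD FILTER value.
toy = [
    ("chr1", 203_277_062, "SOME_FILTER"),  # over 80% of chr1, filtered
    ("chr1", 150_000_000, None),           # over 50% only, PASS
    ("chr1", 5_000, None),                 # small, not counted
    ("chrM", 100, None),                   # contig not in the table, skipped
]
print(count_large_svs(toy, {"chr1": 248_956_422}))
```

With cyvcf2, the same function could be fed `((v.CHROM, v.INFO.get("SVLEN"), v.FILTER) for v in cyvcf2.Reader(path))` together with the `CONTIG_SIZES_GRCH38` table from the program above.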