Implausibly large structural variants causing false positive overlaps

There are 1314 SVs in gnomAD v4 SV whose SVLEN is over 80% of the length of their chromosome, and 4728 whose SVLEN is over 50%.

Because these are so large, they overlap almost everything on their chromosome, so together they cause a huge number of false positive overlaps.

Yes, almost all of them are flagged in FILTER, but anyone who forgets to exclude filtered records will get a lot of false positive overlaps (which could be bad if they are using this to discard variants that overlap an SV above an AF threshold).
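A minimal sketch of a guard against this, assuming the caller has already read FILTER and SVLEN from each record. The function name `keep_for_overlap`, the `max_fraction` cut-off of 0.5, and the `"SOME_FILTER"` value are hypothetical and chosen only for illustration; they are not gnomAD or cyvcf2 names. Treating `None`, `"PASS"` and `"."` as "not filtered" is also an assumption about how the reader reports FILTER.

```python
from typing import Optional


def keep_for_overlap(
    filter_value: Optional[str],
    svlen: int,
    contig_size: int,
    max_fraction: float = 0.5,
) -> bool:
    """Return True if an SV record looks safe to use for overlap annotation."""
    # Assumption: None, "PASS" and "." all mean that no filter was flagged.
    if filter_value not in (None, "PASS", "."):
        return False
    # Second check, in case FILTER is missing or ignored upstream:
    # reject SVs that span an implausible share of their contig.
    return abs(svlen) <= contig_size * max_fraction


CHR1_LEN = 248_956_422  # chr1 length in GRCh38

print(keep_for_overlap(None, 5_000, CHR1_LEN))           # small, unfiltered SV
print(keep_for_overlap(None, 203_277_062, CHR1_LEN))     # huge SV, FILTER ignored
print(keep_for_overlap("SOME_FILTER", 5_000, CHR1_LEN))  # placeholder filter value
```

The size check is deliberately independent of FILTER, so a pipeline that drops the FILTER column still avoids the chromosome-scale records.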

A lot of these are not biologically plausible. For example, gnomAD-SV_v3_DEL_chr1_2a75678c has SVLEN=203,277,062.

chr1 in GRCh38 (NC_000001.11) is 248,956,422 bp long, so this represents a deletion of 82% of chr1.
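The percentage can be checked with a couple of lines of arithmetic. This is only a worked calculation using the two numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the text above
svlen = 203_277_062     # SVLEN of gnomAD-SV_v3_DEL_chr1_2a75678c
chr1_len = 248_956_422  # chr1 length in GRCh38 (NC_000001.11)

fraction_deleted = svlen / chr1_len
fraction_remaining = 1 - fraction_deleted

print(f"deleted: {fraction_deleted:.1%}, remaining: {fraction_remaining:.1%}")
```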

There are 2 reported homozygotes, but it is not plausible that there are people walking around with only 18% of chromosome 1 left and no ill effect.

Example test program:

import cyvcf2

CONTIG_SIZES_GRCH38 = {
    "chr1": 248956422,
    "chr2": 242193529,
    "chr3": 198295559,
    "chr4": 190214555,
    "chr5": 181538259,
    "chr6": 170805979,
    "chr7": 159345973,
    "chr8": 145138636,
    "chr9": 138394717,
    "chr10": 133797422,
    "chr11": 135086622,
    "chr12": 133275309,
    "chr13": 114364328,
    "chr14": 107043718,
    "chr15": 101991189,
    "chr16": 90338345,
    "chr17": 83257441,
    "chr18": 80373285,
    "chr19": 58617616,
    "chr20": 64444167,
    "chr21": 46709983,
    "chr22": 50818468,
    "chrX": 156040895,
    "chrY": 57227415
}

over_50_percent = 0
over_80_percent = 0

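# Path to a local copy of the gnomAD v4.0 SV VCF
for v in cyvcf2.Reader("/data/annotation/VEP/annotation_data/GRCh38/gnomad.v4.0.sv.merged.vcf.gz"):
    # Skip contigs without a known size instead of raising KeyError
    contig_size = CONTIG_SIZES_GRCH38.get(v.CHROM)
    if contig_size is None:
        continue
    if svlen := v.INFO.get("SVLEN"):
        # abs() in case a length is encoded as a negative number
        svlen = abs(int(svlen))
        if svlen > (contig_size * .8):
            over_80_percent += 1
        # SVs over 80% are also counted in the over-50% total
        if svlen > (contig_size * .5):
            over_50_percent += 1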

print(f"{over_80_percent=}, {over_50_percent=}")