Hi,
I have a question regarding multiallelic CNVs. In my work, I’m aiming to represent each MCNV as a DEL and a DUP, summing the allele frequencies of the individual copy states into two new totals. For autosomal chromosomes this is quite straightforward: CN = 0 and CN = 1 are deletion states, and every state with CN > 2 is a duplication state. For chr X and chr Y, however, it is important to know which copy state is to be considered the “reference”.
The strategy that I wanted to implement is the following:
- (1) For XX counts on X and XY counts on X inside PAR, CN = 2 is the reference as for autosomal chromosomes.
- (2) For XY counts on X outside PAR, CN = 1 is the reference (so CN = 0 is a deletion and CN > 1 is a duplication).
- (3) For SVs on chr Y, CN = 1 is the reference as well.
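To make the strategy concrete, here is a minimal Python sketch of what I have in mind. The function names (`reference_cn`, `collapse_mcnv`) and the input format (a plain dict mapping copy number to frequency) are my own placeholders and not anything defined by gnomAD; the reference states are exactly the assumptions (1) to (3) above, which is the part I am unsure about.

```python
def reference_cn(chrom, sex, in_par=False):
    """Return the assumed reference copy number (assumptions (1)-(3) above).

    chrom  : chromosome name, with or without a "chr" prefix
    sex    : "XX" or "XY" (the sample group the counts come from)
    in_par : True if the site lies inside a pseudoautosomal region
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name == "X":
        # (1) XX samples, or XY samples inside PAR: behaves like an autosome
        if sex == "XX" or in_par:
            return 2
        # (2) XY samples on X outside PAR: a single copy is expected
        return 1
    if name == "Y":
        # (3) chr Y: a single copy is assumed to be the reference
        return 1
    # Autosomes: two copies
    return 2


def collapse_mcnv(cn_freqs, ref_cn):
    """Sum copy-state frequencies into (DEL, DUP) totals around ref_cn.

    cn_freqs : dict mapping copy number (int) to frequency (float)
    """
    del_freq = sum(f for cn, f in cn_freqs.items() if cn < ref_cn)
    dup_freq = sum(f for cn, f in cn_freqs.items() if cn > ref_cn)
    return del_freq, dup_freq


# Made-up frequencies, purely for illustration
example = {0: 0.01, 1: 0.04, 2: 0.90, 3: 0.05}
del_auto, dup_auto = collapse_mcnv(example, reference_cn("chr1", "XX"))
del_y, dup_y = collapse_mcnv(example, reference_cn("chrY", "XY"))
```

With the same made-up frequencies, the result depends heavily on the assumed reference: on an autosome the CN = 2 state is ignored, whereas with a reference of CN = 1 it is counted entirely as a duplication. This is why the choice of reference state matters so much for my totals.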
However, when inspecting the data (using gnomAD VCFs), the most frequent copy state for (3) seems to be CN = 2, suggesting that the counts were normalized to have the same reference state as the other chromosomes. The copy number distribution visualization in the gnomAD browser also shows CN = 2 in a different color than the other copy states (e.g. this example). For (2), it seems that this normalization did not happen in v4 and CN = 1 is indeed the reference, although the v2 SV data appears to show the normalization to CN = 2 for these cases as well.
These are just observations from inspecting the data, so it would be great to know how I can correctly interpret these X and Y MCNVs.
Thanks in advance for any help!