Irregularities in the CNV v4.0 control dataset

Dear gnomAD CNV team,

Thank you for publicly releasing the gnomAD v4.0 data. It’s such a valuable resource. I contacted your team by email, and the reply suggested I post my query on the forum instead. My query concerns some irregularities I noticed. Some of these don’t affect my research, but I thought I’d list them so they can be corrected in the next release. Others are less clear-cut, and I’m hoping to get some clarification before I submit my work for publication.

The dataset in question is the “CNV v4.0 non-neuro control dataset” Excel file.

  1. The Excel file has a column for gene content. For many CNVs, this column omits the genes in their region, or lists only some of them. For example, variant 432421_Dup should also include the NSD1 gene (as shown on the gnomAD website). As a result, filtering the Excel file by NSD1 returns no CNVs involving this gene, which is incorrect. Other examples include variants 224065__DUP and 167825__DUP. I suspect there may actually be thousands of variants with this issue.

  2. 193076__DUP is mislabelled. I think the label should read 16p12.2_DUP.

  3. The variant labelled 17q11.2 appears twice in the Excel sheet. One entry should be 17q11.2-NF1__DEL and the other 17q11.2-NF1__DUP.

  4. The 22q11.2 distal type 1 and type 2 CNVs (deletions and duplications) on the gnomAD website may be listed with incorrect coordinates, as the intervals appear too short. They do not seem to include BCR, which is one of the critical genes. DECIPHER data suggest different coordinates for both type 1 and type 2. Clarifying this is important for my upcoming publication (a systematic review of penetrance estimates for 83 CNVs). Could you please provide, for each CNV (deletions and duplications) that intersects MAPK1 and BCR, its coordinates and ‘site count’? I’d be really grateful. Thank you.

  5. The Excel file contains a variant labelled 15q11-q13-BP1-BP3__DEL. I think this is mislabelled and should be BP2-BP3. The reciprocal duplication with the same coordinates is labelled correctly. Because this is relevant to my work, could you confirm whether the 4 individuals listed with this deletion actually carry a BP2-BP3 or a BP1-BP3 deletion?

I love the dataset. Thank you for the work you’ve put into releasing it.