You list many contributing resources for gnomAD v4, including ESP, 1000 Genomes, and PAGE. Is it fair to assume that the data in those resources is largely or entirely subsumed by gnomAD v4? Are there specific resources that are listed but are not largely or entirely subsumed by the gnomAD v4 dataset?
It varies, but for these three datasets I think we mostly included all samples that passed quality metrics and were not related. The exception is that I think a subset of ESP (perhaps a quarter? I’d need to investigate further) did not have sufficient consent, so we were not allowed to include it. Also, for TOPMed, we only have the data from samples sequenced at the Broad, which is about 30,000 samples and so a much smaller fraction of the total TOPMed dataset. If you have specific questions, I’d ask Sam Baxter at samantha@broadinstitute.org