Rare and common variant numbers

How many variants have frequency <1% or <0.1%, and how many are seen in only one individual? How did these numbers change in v4 versus v2?

Thank you for your question!

We don’t currently report exact counts of variants with frequency <1% or <0.1%, or of singletons, but you can compute them yourself from the gnomAD VCF files or Hail Tables available for download.

Here’s an example of how you can use Hail to query the gnomAD v2 and v4 exome datasets for variants with allele frequency <1%, <0.1%, and those seen in only one individual (singletons).

import hail as hl
from gnomad.resources.grch37.gnomad import public_release as v2_public_release
from gnomad.resources.grch38.gnomad import public_release as v4_public_release

# Load the gnomAD exome datasets.
v2_ht = v2_public_release("exomes").ht()
v4_ht = v4_public_release("exomes").ht()

def get_variant_count(ht):
    # Filter to PASS variants.
    ht = ht.filter(hl.len(ht.filters) == 0)

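    # Count variants with frequency <1%, <0.1%, and singletons (AC == 1).
    # freq[0] is the overall frequency entry for the dataset. The two
    # frequency bins are cumulative: "<1%" includes everything in "<0.1%".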
    return ht.aggregate(
        {
            "Variants <1%": hl.agg.filter(ht.freq[0].AF < 0.01, hl.agg.count()),
            "Variants <0.1%": hl.agg.filter(ht.freq[0].AF < 0.001, hl.agg.count()),
            "Singletons": hl.agg.filter(ht.freq[0].AC == 1, hl.agg.count()),
        }
    )

print(f"v2 counts: {get_variant_count(v2_ht)}")
print(f"v4 counts: {get_variant_count(v4_ht)}")

This script uses Hail to:

  • Load the gnomAD v2 and v4 exome datasets.
  • Filter to only PASS variants.
  • Filter and count variants based on the specified frequency thresholds and singletons.

I ran the script and got the following counts for the v2.1.1 and v4.1 exomes:

  • v2 counts: {'Singletons': 7763393, 'Variants <0.1%': 14551940, 'Variants <1%': 14795986}
  • v4 counts: {'Singletons': 34047562, 'Variants <0.1%': 67709028, 'Variants <1%': 68398090}
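
To address the v2-versus-v4 part of the question, here is a minimal sketch in plain Python (no Hail needed) that turns the counts above into fold changes. The numbers are copied from the output above; the helper names fold_change and between_thresholds are only illustrative and are not part of gnomAD or Hail.

```python
# Counts copied from the script output above (PASS variants, exomes).
v2_counts = {"Singletons": 7763393, "Variants <0.1%": 14551940, "Variants <1%": 14795986}
v4_counts = {"Singletons": 34047562, "Variants <0.1%": 67709028, "Variants <1%": 68398090}


def fold_change(old, new):
    """Return new/old for every category present in both dicts."""
    return {key: new[key] / old[key] for key in old if key in new}


def between_thresholds(counts):
    """Variants with 0.1% <= AF < 1%: the <1% count minus the <0.1% count."""
    return counts["Variants <1%"] - counts["Variants <0.1%"]


for category, ratio in fold_change(v2_counts, v4_counts).items():
    print(f"{category}: v4 has about {ratio:.1f}x as many as v2")

print(f"v2 variants with 0.1% <= AF < 1%: {between_thresholds(v2_counts)}")
print(f"v4 variants with 0.1% <= AF < 1%: {between_thresholds(v4_counts)}")
```

Each category comes out between four and five times larger in v4 than in v2. Because the two frequency bins are cumulative, subtracting them gives the number of variants that fall between the two thresholds.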

Additionally, you can explore the gnomAD stats page, where we provide high-level summary statistics about the dataset. We don’t currently break down variant counts by frequency there, but it’s a good suggestion, and we can consider adding this type of information in future updates.

You can find more information about downloading the gnomAD VCF files and Hail Tables here.

I hope this helps!