Feedback from using the GraphQL API for ClinPGx

Hi, I’ve got some feedback and questions about using the gnomAD API to pull basic frequency data.

For background, I’m a developer at ClinPGx. We’ve used gnomAD v2 and v3 data in our system for a while and we want to pull and use v4 data for our annotated variants (a few thousand).

Previously, we used the Ensembl API to get gnomAD data, but for this pull I wanted to try using the “new” (to us, at least) GraphQL API to get the data directly from you. I’ve heard of GraphQL but never used it in production before attempting this data load. I’m not using a dedicated GraphQL library, just constructing the queries through templates and using standard JSON parsing libraries.

Here are three places where I’m running into problems adopting the gnomAD API.

Exploring the API

First, figuring out the available queries as a GraphQL beginner was a challenge. The query builder at GraphiQL is nice but it was hard to explore what data is available and how to filter it. Eventually, I ended up using the Bruno API client which has a nice UI for GraphQL queries.

This is mostly a limitation stemming from my inexperience with GraphQL but I think there may be more you could do to document common use cases, specific example queries, and best practices.

API Limits

I needed to pull frequency data for the alleles of a few thousand location records (equivalent to a dbSNP RSID).

At first, I didn’t understand GraphQL very well so I sent a request for each individual variant record with a query like so:

query Variant($variantId: String, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    ref
    alt
    caid
    joint {
      ac
      an
      populations {
        id
        ac
        an
      }
    }
    rsid
    variantId
  }
}

Quickly, I ran into the API request limit of 10 requests per minute. This is pretty slow compared to other types of APIs but I configured my script to work within the limit. This took a long time to complete and still ran into random response errors every few hundred records.
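A minimal Python sketch of a throttled one-variant-per-request loop like the one described above. The endpoint URL, the 6-second spacing (derived from 10 requests per minute), the retry count, and the helper names `build_payload`, `post`, and `fetch_all` are all assumptions for illustration, and the field list is trimmed to keep it short.

```python
import json
import time
import urllib.request

API_URL = "https://gnomad.broadinstitute.org/api"  # assumed endpoint
MIN_INTERVAL = 6.0  # seconds between requests, i.e. 10 requests per minute

QUERY = """
query Variant($variantId: String, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    variantId
    joint { ac an }
  }
}
"""


def build_payload(variant_id, dataset="gnomad_r4"):
    """Build the JSON body for a single-variant request."""
    return {"query": QUERY,
            "variables": {"variantId": variant_id, "dataset": dataset}}


def post(payload, url=API_URL, timeout=30):
    """POST a GraphQL payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def fetch_all(variant_ids, send=post, sleep=time.sleep, retries=3):
    """Fetch variants one at a time, spacing requests and retrying failures."""
    results = {}
    for i, vid in enumerate(variant_ids):
        if i:
            sleep(MIN_INTERVAL)  # stay under the per-minute request limit
        for attempt in range(retries):
            try:
                body = send(build_payload(vid))
            except OSError:  # covers URLError, HTTPError, and timeouts
                body = None
            if body and not body.get("errors"):
                results[vid] = body["data"]["variant"]
                break
            sleep(MIN_INTERVAL * (attempt + 1))  # back off before retrying
    return results
```

Variants that still fail after the retries are simply absent from the returned dict, so they can be collected and re-run later.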

I read the GraphQL docs and found out you can batch requests. So I rewrote my requests to batch like so:

{
  A22_42126619_G_A: variant(variantId: "22-42126619-G-A", dataset: gnomad_r4) {
    ...variantFields
  }

  A1_97593250_CT_C: variant(variantId: "1-97593250-CT-C", dataset: gnomad_r4) {
    ...variantFields
  }

  AX_101400692_G_T: variant(variantId: "X-101400692-G-T", dataset: gnomad_r4) {
    ...variantFields
  }
}

fragment variantFields on VariantDetails {
  ref
  alt
  caid
  joint {
    ac
    an
    populations {
      id
      ac
      an
    }
  }
  rsid
  variantId
}

It worked for small batches, which was promising. I didn’t see a documented batch limit, but when I requested 100 variants in one batch the error response told me the limit was 25. I adjusted to 25 variants per request, but then the responses either timed out or returned other errors. Through trial and error I settled on a batch size of 5, which seemed to give consistent results. At 10 requests per minute, that works out to 5 variants every 6 seconds (about 50 per minute): better than 1 per request, but not by a lot.
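A minimal Python sketch of how aliased batch queries like the one above can be generated from a list of variant IDs in chunks of 5. The helper names (`alias_for`, `chunks`, `build_batch_query`, `split_response`) are hypothetical, and the batch size of 5 is only the value that happened to work in my trial and error, not a documented limit.

```python
import json
import re

FRAGMENT = """fragment variantFields on VariantDetails {
  ref
  alt
  caid
  joint {
    ac
    an
    populations {
      id
      ac
      an
    }
  }
  rsid
  variantId
}"""


def alias_for(variant_id):
    """Turn a variant ID into a valid GraphQL alias.

    Aliases cannot start with a digit or contain '-', so prefix a
    letter and replace every other disallowed character with '_'.
    """
    return "A" + re.sub(r"[^0-9A-Za-z_]", "_", variant_id)


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_query(variant_ids, dataset="gnomad_r4"):
    """Build one query document with an aliased `variant` field per ID."""
    fields = [
        f"  {alias_for(vid)}: variant(variantId: {json.dumps(vid)}, "
        f"dataset: {dataset}) {{\n    ...variantFields\n  }}"
        for vid in variant_ids
    ]
    return "{\n" + "\n".join(fields) + "\n}\n" + FRAGMENT


def split_response(variant_ids, data):
    """Map the alias-keyed `data` object of a response back to variant IDs."""
    return {vid: data.get(alias_for(vid)) for vid in variant_ids}


if __name__ == "__main__":
    ids = ["22-42126619-G-A", "1-97593250-CT-C", "X-101400692-G-T"]
    for batch in chunks(ids, 5):
        print(build_batch_query(batch))
```

Each generated document would be sent as the `query` of a single request, with the same spacing between requests as before.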

Reference Alleles

When loading the alleles for a location, I would like to get a frequency for every allele, including the reference. For example, according to dbSNP, rs1060463 has C, G, A, and T alleles. If I issue a request to the “variant_search” query in GraphQL, I only get back the variant ID 19-15914366-C-T, which tells me gnomAD has counts for the T allele. However, there is no mention of the C, G, or A alleles that dbSNP lists. I assume G and A are not seen in the gnomAD dataset, which is fine, but since the C allele is not mentioned either, it looks like there is no information about it at all.

In the variant record for 19-15914366-C-T it says the allele count is 952785 and the allele number is 1613380. What are the other 660595 alleles at that position? Are they all reference? Are some of them other alt alleles? There is no place that explicitly says one way or another. I can make assumptions but for accuracy’s sake I prefer not to do so.
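To make the arithmetic concrete, here is a small Python sketch using only the two numbers from that record. The variable names are mine, and the subtraction only shows how many counted alleles are not T; it does not say whether they are reference or some other alt allele, which is exactly the open question.

```python
ac_t = 952785   # allele count reported for 19-15914366-C-T
an = 1613380    # allele number reported for the same record

not_t = an - ac_t   # alleles at this position that are not T
af_t = ac_t / an    # frequency of the T allele

print(not_t)  # 660595
print(af_t)
```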