I’m trying to annotate a small dataset of variants (several hundred). I’m using the API, but after about 10 queries I seem to get blocked (I’m using gnomadR, so I don’t see the error message). Is there a way to submit a set of variant_ids and get an AF back? Currently I’m iterating one by one, and that is surely causing needless network traffic and compute. What’s the “right” way to do this (other than spinning up my own cluster)?
I’m not very familiar with gnomadR, so I can’t speak to that library specifically. I’ve created a Python script below that fetches data for a list of variants without exceeding the rate limit. Note that our API exists to serve the browser’s front end, where the typical use case is examining variants one at a time or within a specific gene or region. Consequently, we do not currently support batch queries for an arbitrary list of variants. If you are interested in accessing variants by gene or region, I can provide examples tailored to those scenarios. We recognize the constraints of the current system and hope to expand the API’s functionality in the future.
# pip install requests
import requests
import time
# GraphQL endpoint
url = "https://gnomad.broadinstitute.org/api"
# The GraphQL query template
query = """
query GnomadVariant($variantId: String!, $datasetId: DatasetId!) {
variant(variantId: $variantId, dataset: $datasetId) {
variant_id
reference_genome
chrom
pos
ref
alt
colocated_variants
faf95_joint {
popmax
popmax_population
}
coverage {
exome {
mean
over_20
}
genome {
mean
over_20
}
}
exome {
ac
an
ac_hemi
ac_hom
faf95 {
popmax
popmax_population
}
filters
populations {
id
ac
an
ac_hemi
ac_hom
}
}
genome {
ac
an
ac_hemi
ac_hom
faf95 {
popmax
popmax_population
}
filters
populations {
id
ac
an
ac_hemi
ac_hom
}
}
flags
lof_curations {
gene_id
gene_symbol
verdict
flags
project
}
rsids
transcript_consequences {
domains
gene_id
gene_version
gene_symbol
hgvs
hgvsc
hgvsp
is_canonical
is_mane_select
is_mane_select_version
lof
lof_flags
lof_filter
major_consequence
polyphen_prediction
sift_prediction
transcript_id
transcript_version
}
in_silico_predictors {
id
value
flags
}
}
}
"""
# Function to query for a single variant
def query_variant(variant_id, dataset_id="gnomad_r4"):
    # Query variables
    variables = {"variantId": variant_id, "datasetId": dataset_id}
    # HTTP POST request
    response = requests.post(url, json={"query": query, "variables": variables})
    if response.status_code == 200:
        return response.json()  # Returns the JSON response
    else:
        raise Exception(
            f"Query failed to run by returning code of {response.status_code}. {response.text}"
        )

# List of variants to query
variants = ["1-55052746-GT-G", "1-55058620-TG-T"]  # Add your variants here
# Results list
results = []
# Loop through each variant and query
for variant in variants:
    result = query_variant(variant)
    results.append(result)
    time.sleep(6)  # Sleep to respect the 10 queries per minute limit
# results now contains the response for each variant
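
Since the original question asks for an AF, here is a minimal sketch of turning one response from the script above into an allele frequency. The query returns allele counts (ac) and allele numbers (an), not a frequency, so the frequency is computed as ac / an. The helper name allele_frequency and the sample numbers are hypothetical. The sketch assumes the standard GraphQL response shape, {"data": {"variant": {...}}}, and assumes that exome or genome may be null when a variant is absent from that data type. Pooling exome and genome counts is a simple illustration and is not necessarily how the browser computes its joint frequency.

```python
from typing import Optional, Sequence


def allele_frequency(
    response: dict, sources: Sequence[str] = ("exome", "genome")
) -> Optional[float]:
    """Pooled AC / AN over the requested sources, or None if AN is zero or missing.

    Hypothetical helper: assumes the response looks like
    {"data": {"variant": {"exome": {"ac": ..., "an": ...}, "genome": {...}}}}.
    """
    variant = (response.get("data") or {}).get("variant") or {}
    ac_total = 0
    an_total = 0
    for source in sources:
        counts = variant.get(source) or {}  # a missing or null source contributes nothing
        ac_total += counts.get("ac") or 0
        an_total += counts.get("an") or 0
    return ac_total / an_total if an_total else None


# Made-up response for illustration only; these are not real gnomAD counts.
example = {
    "data": {
        "variant": {
            "variant_id": "1-55052746-GT-G",
            "exome": {"ac": 2, "an": 1000},
            "genome": {"ac": 3, "an": 1000},
        }
    }
}

pooled_af = allele_frequency(example)             # (2 + 3) / (1000 + 1000)
exome_af = allele_frequency(example, ["exome"])   # 2 / 1000
```

With the script above, something like `[allele_frequency(r) for r in results]` would give one value per variant, with None for any variant that came back without counts.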