Hello,
I’m interested in the number of pLoF SNVs observed for a particular set of genes, one being “FGA”. I would like to make a table of those specific variants and have tried to follow the steps previously highlighted to do this on the really helpful forum post. ([Constraint metric - #4 by jkgoodrich)]
For the example gene: “FGA” when I run the following script on hail, I only return 13 variants when the observed should be 16 ( After filtering for: * High-confidence LOFTEE variants (without any flags), * Only variants in the MANE Select transcript, * PASS variants that are SNVs with MAF ≤ 0.1%, and * Exome median depth ≥ 30,).
The question is should I see the exact same number of variants (i.e. 16) that is posted on the table for the gene (screenshot below) or is 13 now the correct number?
The script for hail I am using is below (adapted from previous post, thank you! )
**import hail as hl**
**from gnomad.resources.grch38.gnomad import public_release, coverage**
**from gnomad.utils.vep import filter_vep_transcript_csqs**
# Define the interval for FGA gene
FGA_interval = "chr4:154583128-154590745"
# Get v4.1 exome sites Hail Table and exome coverage.
ht = public_release("exomes").ht()
coverage_ht = coverage("exomes").ht()
# Filter the exome release Hail Table to the FGA gene interval.
ht = hl.filter_intervals(ht, [hl.parse_locus_interval(FGA_interval, reference_genome="GRCh38")])
# Filter to variants in FGA that are LOFTEE high-confidence (with no flags) in the MANE select transcript.
ht = filter_vep_transcript_csqs(
ht,
synonymous=False,
mane_select=True,
genes=["FGA"],
match_by_gene_symbol=True,
additional_filtering_criteria=[lambda x: (x.lof == "HC") & hl.is_missing(x.lof_flags)],
)
# Filter to PASS SNVs with AF <= 0.1% and median exome depth ≥ 30.
ht = ht.filter(
(hl.len(ht.filters) == 0)
& (ht.allele_info.allele_type == "snv")
& (ht.freq[0].AF <= 0.001)
& (coverage_ht[ht.locus].median_approx >= 30)
)
# Output the number of variants
print(f"Number of variants: {ht.count()}")
# Select and show the filtered variants
ht.select(
freq=ht.freq[0],
csq=ht.vep.transcript_consequences[0].consequence_terms,
coverage=coverage_ht[ht.locus],
).show(-1)
Results
Best wishes,
Alex