Say I have two WGS population datasets: one Indian (10,000 samples) and one European (50,000 samples). The total number of ultra-rare mutations (AF < 0.1%) will of course be higher in the European dataset (for example, 60M vs 30M in the Indian dataset). Now, if I want to create a population-specific constraint metric, how do I normalise the mutation rates between the two populations? The dataset with more observed mutations will obviously have a higher rate for each mutational context, simply because more samples were sequenced. Is there a way to normalise for the different sample sizes, so that I can directly compare the two populations?
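To make the problem concrete, here is a minimal Python sketch with made-up numbers. The context labels, the counts, and the function names are all hypothetical, not taken from my data. It computes the raw per-context rate (observed variants divided by possible sites) and then one naive normalisation I have considered: dividing each context's rate by the dataset-wide rate. I am not sure this is valid, because I would expect highly mutable contexts such as CpG transitions to approach saturation in the larger dataset, so the scaling may not be uniform across contexts.

```python
from typing import Dict

Counts = Dict[str, int]


def proportion_observed(observed: Counts, possible: Counts) -> Dict[str, float]:
    """Raw per-context rate: observed variants / possible sites."""
    return {ctx: observed[ctx] / possible[ctx] for ctx in observed}


def relative_rates(observed: Counts, possible: Counts) -> Dict[str, float]:
    """Per-context rate divided by the dataset-wide rate.

    Assumption: sample size scales every context by the same factor,
    which is unlikely to hold for contexts near saturation.
    """
    overall = sum(observed.values()) / sum(possible[ctx] for ctx in observed)
    raw = proportion_observed(observed, possible)
    return {ctx: raw[ctx] / overall for ctx in raw}


# Hypothetical toy counts: the same genome (same possible sites), with the
# larger dataset observing exactly twice as many variants in every context.
possible_sites = {"ACG>ATG": 1000, "ATA>ACA": 4000}
indian_observed = {"ACG>ATG": 100, "ATA>ACA": 40}
european_observed = {"ACG>ATG": 200, "ATA>ACA": 80}

indian_raw = proportion_observed(indian_observed, possible_sites)
european_raw = proportion_observed(european_observed, possible_sites)
indian_rel = relative_rates(indian_observed, possible_sites)
european_rel = relative_rates(european_observed, possible_sites)

for ctx in possible_sites:
    print(ctx, indian_raw[ctx], european_raw[ctx], indian_rel[ctx], european_rel[ctx])
```

In this toy case the raw rates differ by a factor of two while the relative rates agree, but only because I forced every context to scale identically. With real data I would not expect that to hold, which is why I am asking whether something like downsampling the larger dataset to the smaller sample size is the more appropriate approach.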
I can provide more information if needed.
Thanks