For each crypt, an artefact filter based on the beta-binomial distribution was applied, which exploits read count information from other crypts of the same individual. More specifically, we fitted a beta-binomial distribution to the variant allele counts and sequencing depths of each somatic variant across samples from the same individual. The beta-binomial distribution was used to determine whether read support for a mutation varies across samples from an individual, as expected for genuine somatic mutations but not for artefacts. Artefacts tend to be randomly distributed across samples and can be modelled as drawn from a binomial or a weakly overdispersed beta-binomial distribution. True somatic variants will be present at a high VAF in some samples but absent in others, and are hence best captured by a highly overdispersed beta-binomial distribution. For each variant site, the maximum likelihood estimate of the overdispersion parameter (ρ) was calculated using a grid-based method, with values ranging between 10^−6 and 10^−0.05. Variants with ρ < 0.3, that is, with little overdispersion, were considered to be artefactual and discarded. The code for this filter is based on the Shearwater variant caller (ref. 62). We found this to be one of the most effective filters against spurious calls (Supplementary Fig. 1b).
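The grid-based estimate of ρ for a single variant site can be illustrated with a minimal sketch in Python. This is not the code used in the study. It assumes that the mean of the beta-binomial is fixed at the pooled VAF across samples, that the grid is spaced in steps of 0.05 on the log10 scale, and that the distribution is parameterised as α = μ(1 − ρ)/ρ and β = (1 − μ)(1 − ρ)/ρ. The function names and the example read counts are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, mu, rho):
    """Log-probability of k variant reads out of n under a beta-binomial
    with mean mu and overdispersion rho (both strictly between 0 and 1)."""
    alpha = mu * (1.0 - rho) / rho
    beta = (1.0 - mu) * (1.0 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def estimate_rho(alt_counts, depths, step=0.05):
    """Grid-based maximum likelihood estimate of rho for one variant site.

    alt_counts and depths hold one entry per sample from the same individual.
    """
    total_alt, total_depth = sum(alt_counts), sum(depths)
    if not 0 < total_alt < total_depth:
        raise ValueError("site needs both variant and non-variant reads")

    # Assumption: the mean is fixed at the pooled VAF, not fitted jointly.
    mu = total_alt / total_depth

    # Grid from 10^-6 to 10^-0.05; the step size is an assumption.
    n_points = round((6.0 - 0.05) / step) + 1
    grid = [10.0 ** (-6.0 + i * step) for i in range(n_points)]

    def log_likelihood(rho):
        return sum(
            betabinom_logpmf(k, n, mu, rho) for k, n in zip(alt_counts, depths)
        )

    # Return the grid value with the highest log-likelihood.
    return max(grid, key=log_likelihood)


# Hypothetical read counts for six samples, each sequenced to depth 30.
depths = [30] * 6
genuine_like = [15, 0, 0, 14, 0, 0]   # high VAF in two samples, absent elsewhere
artefact_like = [2, 3, 1, 2, 3, 2]    # a few reads scattered across all samples

rho_genuine = estimate_rho(genuine_like, depths)
rho_artefact = estimate_rho(artefact_like, depths)
print(f"genuine-like site:  rho = {rho_genuine:.3g}")
print(f"artefact-like site: rho = {rho_artefact:.3g}")
```

In this sketch, the site with read support concentrated in two samples yields a much larger estimate of ρ than the site with a few reads in every sample. The estimate for each site would then be compared against the chosen cutoff.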