Kinship coefficient calculation

Ardalan Naseri, Junjie Shi, Xihong Lin, Shaojie Zhang, Degui Zhi

Kinship coefficients are calculated only for pairs sharing at least one IBD segment. Such pairs are typically sparse in samples from outbred populations. First, the IBD segments are separated into IBD1 and IBD2 segments. IBD1 segments are haploid matches between a pair of individuals that involve only one pair of haplotypes. IBD2 segments are diploid matches in which both haplotypes of one individual match both haplotypes of the other; more specifically, both haplotype matches were inherited from common ancestor(s). Following a decision-making process similar to that of KING, which uses the kinship coefficient (ϕ) and the fraction of IBD0 segments (π0), we also calculate these quantities, but using a data-driven approach.
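The IBD1/IBD2 split can be sketched as interval arithmetic on haplotype-level matches. This is a minimal illustration, not RAFFI's implementation: it assumes each segment for a pair is given as a hypothetical tuple `(hap_a, hap_b, start, end)` with haplotype indices 0/1 and half-open coordinates in a shared unit (cM or bp). A region is counted as IBD2 where both haplotypes of each individual are matched at once, through either pairing (0-0 with 1-1, or 0-1 with 1-0), and as IBD1 where it is covered by at least one match but is not IBD2. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
# Hypothetical segment format: (hap_a, hap_b, start, end), haplotypes 0 or 1,
# half-open interval [start, end) in a shared unit (cM or bp).
Segment = Tuple[int, int, float, float]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union of intervals, returned sorted and non-overlapping."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two sorted, non-overlapping interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def total(intervals: List[Interval]) -> float:
    """Summed length of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def ibd1_ibd2_lengths(segments: List[Segment]) -> Tuple[float, float]:
    """Return (IBD1 length, IBD2 length) for one pair of individuals."""
    by_pair: Dict[Tuple[int, int], List[Interval]] = {}
    for a in (0, 1):
        for b in (0, 1):
            by_pair[(a, b)] = merge(
                [(s, e) for ha, hb, s, e in segments if ha == a and hb == b]
            )
    # IBD2: both haplotypes of each individual matched at the same position,
    # via the 0-0 & 1-1 pairing or the 0-1 & 1-0 pairing.
    cis = intersect(by_pair[(0, 0)], by_pair[(1, 1)])
    trans = intersect(by_pair[(0, 1)], by_pair[(1, 0)])
    ibd2_len = total(merge(cis + trans))
    # IBD1: covered by at least one haplotype match but not by IBD2.
    any_len = total(merge([(s, e) for _, _, s, e in segments]))
    return any_len - ibd2_len, ibd2_len
```

For example, segments `(0, 0, 0, 100)`, `(1, 1, 50, 150)` and `(0, 1, 200, 250)` give 150 units of IBD1 and 50 units of IBD2: the overlap from 50 to 100 is IBD2, and the remaining covered stretches are IBD1.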

Table 1 lists the expected kinship coefficients and the threshold cutoffs for inferring different degrees of relatedness, following KING's decision boundaries. Table 2 shows the threshold cutoff for separating parent-offspring and full-sibling pairs using IBD2 segments. Using simulated data (see the Simulated datasets subsection), we verified that the expected decision boundaries for degrees of relatedness up to 4th degree are consistent with the kinship coefficients computed by RAFFI (Fig 1).
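Assuming Table 1 follows KING's convention, in which a degree-k relative has expected kinship 2^-(k+1) and the cutoff between degrees k and k+1 is the geometric midpoint 2^-(k+1.5) (about 0.354, 0.177, 0.0884, 0.0442 and 0.0221), the decision rule can be sketched as below. Extending the rule to 4th degree in this way is our assumption. The parent-offspring versus full-sibling split uses a placeholder `pi2_cutoff`, because the actual Table 2 value is not given here; it relies only on the expected IBD2 fractions (about 0 for parent-offspring pairs, 0.25 for full siblings). Function names are hypothetical.

```python
from typing import Optional


def kinship_degree(phi: float, max_degree: int = 4) -> Optional[int]:
    """Degree of relatedness from a kinship coefficient, KING-style cutoffs.

    Degree k has expected kinship 2**-(k + 1); the cutoff between degree k
    and k + 1 is 2**-(k + 1.5). Returns 0 for duplicates/MZ twins and None
    for pairs less related than max_degree.
    """
    if phi > 2 ** -1.5:  # above ~0.354: duplicate or monozygotic twin
        return 0
    for k in range(1, max_degree + 1):
        if phi > 2 ** -(k + 1.5):
            return k
    return None


def first_degree_type(pi2: float, pi2_cutoff: float) -> str:
    """Split a 1st-degree pair by its IBD2 fraction.

    pi2_cutoff is a hypothetical placeholder for the Table 2 threshold.
    Parent-offspring pairs share almost no IBD2; full siblings ~0.25.
    """
    return "parent-offspring" if pi2 < pi2_cutoff else "full-sibling"
```

With these cutoffs, the expected values 0.25, 0.125, 0.0625 and 0.03125 fall into degrees 1 to 4, respectively.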

Fig 1. Kinship coefficients computed from the total length of IBD segments reported by RaPID for pairs with different degrees of relatedness in the simulated data. Different degrees of relatedness (up to 4th degree) can be easily distinguished using the kinship coefficients.

The main reason we adopt a data-driven approach is that, owing to imperfect haplotype phasing, the detected IBD1 and IBD2 segments may be shorter than their true lengths. As a result, the IBD segments between even very close relatives, such as parent-offspring pairs or full siblings, may not extend to their expected lengths. We observe that phasing errors shorten IBD segments approximately proportionally (Fig 2A, detailed in the next sections). Based on this observation, we introduce an adjustment factor α, defined as the fraction of the full IBD segment length that is detectable (see the next section for how α is estimated). For relatedness estimates, we first calculate the raw values of the kinship coefficient (ϕ) and the fraction of IBD2 segments (π2):

where IBD1 denotes the length of the genome covered by IBD1 segments and IBD2 denotes the length of the genome covered by IBD2 segments, and L denotes the total length of the genome.
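As a minimal sketch, assuming the textbook relation between kinship and IBD sharing, ϕ = π1/4 + π2/2 with π1 = IBD1/L and π2 = IBD2/L (an assumption, not necessarily the exact form used by RAFFI), the raw values can be computed from the three lengths defined above. The function name `raw_relatedness` is hypothetical.

```python
from typing import Tuple


def raw_relatedness(ibd1: float, ibd2: float, genome_length: float) -> Tuple[float, float]:
    """Raw kinship coefficient (phi) and IBD2 fraction (pi2) for one pair.

    Assumes phi = pi1/4 + pi2/2, with pi1 = IBD1/L and pi2 = IBD2/L.
    All lengths must be in the same unit (e.g. cM).
    """
    pi1 = ibd1 / genome_length
    pi2 = ibd2 / genome_length
    phi = pi1 / 4 + pi2 / 2
    return phi, pi2
```

For a full-sibling pair with the expected π1 = 0.5 and π2 = 0.25, `raw_relatedness(50, 25, 100)` returns `(0.25, 0.25)`; for a parent-offspring pair, `raw_relatedness(100, 0, 100)` returns `(0.25, 0.0)`. Both have ϕ = 0.25, so only π2 separates them.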

Fig 2. Kinship coefficients computed from the total length of IBD segments reported by RaPID in a dataset with phasing and genotyping errors, shown with (a) the expected kinship coefficient thresholds and (b) the adjusted kinship coefficient thresholds for different degrees of relatedness, accounting for phasing and genotyping errors.

We then calculate the adjusted kinship coefficient (ϕα) and the adjusted fraction of IBD2 segments (π2α) as estimates of the true ϕ and π2 values in the presence of phasing errors:
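A minimal sketch of the adjustment, assuming the simplest reading of the proportional model above: if the detected IBD1 and IBD2 lengths are both α times their true lengths, then the raw ϕ and π2 are both scaled by α, and dividing by α recovers them. The exact correction RAFFI applies (in particular to π2) may differ, and `adjust_for_phasing` is a hypothetical name.

```python
from typing import Tuple


def adjust_for_phasing(phi: float, pi2: float, alpha: float) -> Tuple[float, float]:
    """Hypothetical proportional correction: divide raw values by alpha.

    alpha in (0, 1] is the fraction of true IBD length that is detected.
    Assumes IBD1 and IBD2 lengths are both shortened by the same factor.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    return phi / alpha, pi2 / alpha
```

For example, with α = 0.8, a full-sibling pair observed with raw ϕ = 0.2 and π2 = 0.2 is adjusted to approximately 0.25 for both. Dividing ϕ by α and comparing it with the original cutoffs gives the same decisions as keeping the raw ϕ and multiplying each cutoff by α, which is one way to read the adjusted thresholds in Fig 2b.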
