Kinship coefficient calculation

Ardalan Naseri; Junjie Shi; Xihong Lin; Shaojie Zhang; Degui Zhi

This method is extracted from research article: PLoS Genet, Jan 2021

RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID

DOI: 10.1371/journal.pgen.1009315

Kinship coefficients are calculated only for pairs that share at least one IBD segment; in samples from outbred populations, such pairs are typically sparse. First, the IBD segments are separated into IBD1 and IBD2 segments. IBD1 segments are haploid matches between a pair of individuals, in which only one haplotype from each individual is involved. IBD2 segments are diploid matches in which both haplotypes of the two individuals match; more specifically, both haplotype matches were inherited from common ancestor(s). Following a decision-making process similar to that of KING, which uses the kinship coefficient (ϕ) and the fraction of IBD0 segments (π₀), we also calculate these quantities, but using a data-driven approach.
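The separation into IBD1 and IBD2 can be illustrated with a small sketch. It is not RAFFI's implementation: the segment representation (a haplotype index 0 or 1 for each individual plus an interval) and the function name `split_ibd1_ibd2` are assumptions made for illustration. A region is counted as IBD2 when two matches using different haplotypes in both individuals overlap, and as IBD1 when it is covered by a match but is not IBD2.

```python
from typing import List, Tuple

# Hypothetical representation of one haplotype match between two individuals:
# (haplotype index in A, haplotype index in B, start, end), indices are 0 or 1.
Segment = Tuple[int, int, float, float]


def split_ibd1_ibd2(segments: List[Segment]) -> Tuple[float, float]:
    """Return (IBD1 length, IBD2 length) for a single pair of individuals."""
    # Cut the genome at every segment boundary so that the set of active
    # haplotype matches is constant inside each elementary interval.
    points = sorted({p for _, _, start, end in segments for p in (start, end)})
    ibd1 = 0.0
    ibd2 = 0.0
    for left, right in zip(points, points[1:]):
        active = {(a, b) for a, b, start, end in segments
                  if start <= left and right <= end}
        if not active:
            continue  # no match covers this interval (IBD0)
        # IBD2: two matches that use different haplotypes in both individuals.
        diploid = any(a1 != a2 and b1 != b2
                      for a1, b1 in active for a2, b2 in active)
        if diploid:
            ibd2 += right - left
        else:
            ibd1 += right - left
    return ibd1, ibd2


# One long haploid match with a shorter match on the other haplotypes inside it.
example = [(0, 0, 0.0, 100.0), (1, 1, 40.0, 70.0)]
ibd1_len, ibd2_len = split_ibd1_ibd2(example)
```

In this example the interval 40 to 70 is covered by both matches and counts as IBD2, while the remaining 70 units count as IBD1.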

Table 1 lists the expected kinship coefficients and the threshold cutoffs for inferring different degrees of relatedness, following KING’s decision boundaries. Table 2 shows the threshold cutoff for separating parent/offspring and full-sibling pairs using IBD2 segments. Using simulated data (see Simulated datasets subsection), we verified that the expected decision boundaries for different degrees of relatedness up to 4th degree are consistent with the kinship coefficients computed by RAFFI (Fig 1).
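As a rough illustration of threshold-based inference, the sketch below uses KING's published power-of-two boundaries on the kinship coefficient (2^-1.5, 2^-2.5, 2^-3.5, 2^-4.5). The 4th-degree lower bound of 2^-5.5 is an assumption that extends the same pattern, and the function name `infer_degree` is hypothetical; neither should be read as the exact values or code used by RAFFI.

```python
def infer_degree(phi: float) -> str:
    """Map a kinship coefficient to a relatedness label using KING-style cutoffs."""
    # Above 2^-1.5 (about 0.354): duplicate samples or monozygotic twins.
    if phi > 2 ** -1.5:
        return "duplicate/MZ twin"
    # Degree d is assigned when phi exceeds 2^-(d + 1.5); the 4th-degree
    # bound (2^-5.5) is an assumed extension of KING's pattern.
    for degree in (1, 2, 3, 4):
        if phi > 2 ** -(degree + 1.5):
            return f"degree {degree}"
    return "unrelated or more distant"


# Expected kinship values: 0.25 (1st degree), 0.125 (2nd), 0.0625 (3rd), 0.03125 (4th).
labels = [infer_degree(phi) for phi in (0.25, 0.125, 0.0625, 0.03125)]
```

Separating parent/offspring from full-sibling pairs within the first degree would additionally need the IBD2-based cutoff from Table 2, which is not shown here.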

Fig 1. Kinship coefficients computed from the total sum of IBD segments reported by RaPID for pairs with different degrees of relatedness in simulated data. Different degrees of relatedness (up to 4th degree) can be easily distinguished using the kinship coefficients.

The main reason we adopt the data-driven approach is that, because haplotype phasing is imperfect, the detected IBD1 and IBD2 segments may be shorter than their true lengths. As a result, the IBD segments between even very close relatives, such as parent-offspring or full-sibling pairs, may not extend to their expected lengths. We observe that phasing errors shorten IBD segments approximately proportionally (Fig 2A, detailed in the next sections). Based on this observation, we introduce an adjustment factor α, defined as the fraction of the full IBD segments that is detectable (see the next section on how to estimate α). For relatedness estimates, we first calculate the raw values of the kinship coefficient (ϕ) and the fraction of IBD2 segments (π₂):
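Written out from the definitions that follow, and assuming the standard identity ϕ = k₁/4 + k₂/2 (with IBD1 excluding regions counted as IBD2), these raw values take the form below; this is a reconstruction, not a verbatim copy of the article's equations.

```latex
\phi = \frac{\mathrm{IBD1}}{4L} + \frac{\mathrm{IBD2}}{2L},
\qquad
\pi_2 = \frac{\mathrm{IBD2}}{L}
```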

where IBD1 denotes the length of the genome covered by IBD1 segments and IBD2 denotes the length of the genome covered by IBD2 segments, and L denotes the total length of the genome.
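A minimal numeric sketch of these raw quantities follows. It assumes the standard relation ϕ = IBD1/(4L) + IBD2/(2L) and π₂ = IBD2/L, with IBD1 excluding regions counted as IBD2; the function name `raw_kinship` and the genome length of 3400 cM are illustrative assumptions, not values taken from the article.

```python
from typing import Tuple


def raw_kinship(ibd1: float, ibd2: float, genome_length: float) -> Tuple[float, float]:
    """Return (phi, pi2) from IBD1/IBD2 coverage, all in the same length unit."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Assumed standard relation: phi = k1/4 + k2/2 with k1 = IBD1/L, k2 = IBD2/L.
    phi = ibd1 / (4 * genome_length) + ibd2 / (2 * genome_length)
    pi2 = ibd2 / genome_length
    return phi, pi2


L_CM = 3400.0  # hypothetical total genome length in cM

# Idealized full siblings: half of the genome IBD1, a quarter IBD2.
sib_phi, sib_pi2 = raw_kinship(0.5 * L_CM, 0.25 * L_CM, L_CM)

# Idealized parent-offspring pair: whole genome IBD1, no IBD2.
po_phi, po_pi2 = raw_kinship(L_CM, 0.0, L_CM)
```

Both idealized pairs have ϕ = 0.25, and only π₂ tells them apart (0.25 versus 0), which is why an IBD2-based cutoff is needed to separate parent/offspring from full-sibling pairs.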

Fig 2. Kinship coefficients computed from the total sum of IBD segments reported by RaPID in a dataset with phasing and genotyping errors, shown with (a) the expected kinship coefficient thresholds and (b) the adjusted kinship coefficient thresholds for different degrees of relatedness, accounting for phasing/genotyping errors.

We then calculate the adjusted kinship coefficient (ϕ^α) and the fraction of IBD2 segments (π₂^α) as the estimates of the true ϕ and π₂ values in the presence of phasing errors:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
