After SNP and genetic variant calling (27,28), the genotype at a given locus is represented by AA, Aa and aa for one homozygote, the heterozygote and the other homozygote, respectively. The three genotypes are then numerically coded as −1, 0, 1 (or 0, 1, 2) for the additive effect and 0, 1, 0 for the dominance effect. From Equations (5) and (6), we can then calculate the two main-effect kinship matrices, one for the additive effect and one for the dominance effect. High-throughput sequencing can readily generate millions of SNPs, and data of this volume are hard to handle with a single conventional matrix multiplication. Fortunately, matrix multiplication is additive over blocks, so we can partition the whole genotype data set into many blocks. A kinship matrix is calculated for each block, and the block-specific kinship matrices are then summed to give the final kinship matrix.
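The coding step above can be sketched in a few lines of Python. This is only an illustration: the lookup tables and the function name `code_genotypes` are hypothetical, and the choice of which homozygote receives −1 follows the order in which the genotypes are listed in the text, which is an assumption.

```python
# Hypothetical lookup tables (not from the article's software).
# Assumption: AA -> -1, Aa -> 0, aa -> 1 for the additive effect,
# following the order in which the genotypes are listed in the text.
ADDITIVE_CODE = {"AA": -1, "Aa": 0, "aa": 1}
# Dominance effect: heterozygote coded 1, both homozygotes coded 0.
DOMINANCE_CODE = {"AA": 0, "Aa": 1, "aa": 0}


def code_genotypes(genotypes):
    """Return (additive, dominance) code lists for the calls at one locus."""
    additive = [ADDITIVE_CODE[g] for g in genotypes]
    dominance = [DOMINANCE_CODE[g] for g in genotypes]
    return additive, dominance


# Toy example: four individuals genotyped at one locus.
calls = ["AA", "Aa", "aa", "Aa"]
additive, dominance = code_genotypes(calls)
```

Switching to the 0, 1, 2 additive coding would only require adding 1 to each additive code.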
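Let $Z$ be an $m \times n$ matrix of coded additive or dominance genotype effects, where $m$ is the number of markers and $n$ is the sample size. Suppose we partition $Z$ by markers into $b$ blocks, where block $Z_k$ contains all $n$ individuals and $m_k$ markers, so that $\sum_{k=1}^{b} m_k = m$. The partitioned matrix can be written as $Z = [Z_1^T, Z_2^T, \ldots, Z_b^T]^T$, and from block matrix multiplication it follows that

$$Z^T Z = \sum_{k=1}^{b} Z_k^T Z_k.$$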
Figure 1A illustrates how the coded additive and dominance genotypic effects are generated and how the main-effect kinship matrix is calculated by partitioning the additive/dominance markers into blocks.
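The block-wise accumulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the genotype matrix is stored with markers in rows and individuals in columns, the kinship matrix is taken to be the unnormalized cross-product of that matrix (any scaling from Equations (5) and (6) is omitted), and the function names `cross_product` and `blockwise_kinship` are hypothetical. Plain Python lists are used to keep the sketch self-contained; a numerical library would be the practical choice for real data.

```python
def cross_product(block):
    """Return the n x n cross-product (block transposed times block).

    `block` is a list of marker rows; each row holds the coded genotypes
    of one marker across the n individuals.
    """
    n = len(block[0])
    result = [[0] * n for _ in range(n)]
    for row in block:
        for i in range(n):
            for j in range(n):
                result[i][j] += row[i] * row[j]
    return result


def blockwise_kinship(genotypes, block_size):
    """Sum the per-block cross-products over consecutive blocks of markers."""
    n = len(genotypes[0])
    kinship = [[0] * n for _ in range(n)]
    for start in range(0, len(genotypes), block_size):
        # Only one block of markers needs to be held at a time.
        block = genotypes[start:start + block_size]
        partial = cross_product(block)
        for i in range(n):
            for j in range(n):
                kinship[i][j] += partial[i][j]
    return kinship


# Toy data (assumed): 5 markers x 3 individuals, additive coding -1/0/1.
genotypes = [
    [1, 0, -1],
    [0, 1, 1],
    [-1, -1, 0],
    [1, 1, 0],
    [0, -1, 1],
]

full = cross_product(genotypes)                       # all markers at once
blocked = blockwise_kinship(genotypes, block_size=2)  # blocks of 2, 2 and 1 markers
assert blocked == full
```

Because each block contributes an independent partial sum, the blocks can have unequal sizes and can be processed in any order.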
Figure 1. Mathematical principle to calculate the high-dimensional kinship matrices. (A) Principle to generate the coded additive and dominance genotypic effects and calculate the main-effect kinship matrix through partitioning the additive/dominance markers into blocks. (B) Principle to generate all of the epistatic genotypic marker pairs and calculate the epistatic kinship matrix through partitioning the marker pairs into blocks.