This score is evaluated from the per-nucleotide read coverage data. To estimate the probability that the contig originates from a single transcript (i.e., that it is not chimeric), a Bayesian segmentation analysis of the per-nucleotide coverage depth is performed. For a correctly assembled contig, the distribution of per-nucleotide coverage values is assumed to be best described by a single Dirichlet distribution: all nucleotides in the same transcript should have the same expression level, so their coverage values should be well modeled as a stochastic sample from a single distribution. In contrast, a chimeric contig derived from the concatenation of two or more transcripts will have per-nucleotide coverage values that are best described by two or more different Dirichlet distributions. The probability that the per-nucleotide read coverage values come from a single Dirichlet distribution is evaluated using a Bayesian segmentation algorithm previously developed for the analysis of changes in nucleotide composition (Liu and Lawrence 1999). To make this method applicable, the per-nucleotide coverage along the contig is encoded as a sequence of symbols in an unordered alphabet by taking log2 of the read depth and rounding to the nearest integer. Because this probability is a value between 0 and 1, it is used directly as s(Cseg).
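The encoding step and the idea behind the segmentation test can be illustrated with a minimal sketch. This is not the Liu and Lawrence (1999) algorithm and not the implementation used here. It is a simplified stand-in that compares a one-segment model against a model with exactly one change point, using a Dirichlet-multinomial marginal likelihood. The function names, the alphabet size of 16, the symmetric prior `alpha = 1.0`, the equal prior weight on the two models, and the clamping of depths below 1 are all assumptions made for illustration.

```python
import math
from collections import Counter


def encode_coverage(depths):
    """Encode read depths as symbols: round(log2(depth)).

    Assumption: depths below 1 are clamped to 1 (symbol 0), because
    log2(0) is undefined and the text does not say how zeros are handled.
    """
    return [round(math.log2(max(d, 1))) for d in depths]


def log_marginal(symbols, alphabet_size=16, alpha=1.0):
    """Log marginal likelihood of a segment under a symmetric Dirichlet prior.

    Assumes alphabet_size is at least the number of distinct symbols.
    """
    total_alpha = alphabet_size * alpha
    result = math.lgamma(total_alpha) - math.lgamma(len(symbols) + total_alpha)
    # Symbols with a zero count contribute nothing, so only observed ones are summed.
    for count in Counter(symbols).values():
        result += math.lgamma(count + alpha) - math.lgamma(alpha)
    return result


def prob_single_segment(symbols, alphabet_size=16, alpha=1.0):
    """Posterior probability of one segment versus exactly one change point.

    Assumptions: equal prior weight on both models, and a uniform prior
    over the position of the change point.
    """
    splits = [
        log_marginal(symbols[:t], alphabet_size, alpha)
        + log_marginal(symbols[t:], alphabet_size, alpha)
        for t in range(1, len(symbols))
    ]
    if not splits:
        return 1.0  # a contig of length 0 or 1 cannot be split
    log_m0 = log_marginal(symbols, alphabet_size, alpha)
    # Average the two-segment likelihood over change points (log-sum-exp).
    peak = max(splits)
    log_m1 = peak + math.log(sum(math.exp(s - peak) for s in splits) / len(splits))
    top = max(log_m0, log_m1)
    w0, w1 = math.exp(log_m0 - top), math.exp(log_m1 - top)
    return w0 / (w0 + w1)


if __name__ == "__main__":
    uniform = encode_coverage([10] * 20)               # one expression level
    chimera = encode_coverage([10] * 20 + [200] * 20)  # two expression levels
    print(prob_single_segment(uniform), prob_single_segment(chimera))
```

In this sketch, a contig with uniform coverage yields a probability close to 1, while a contig whose coverage jumps from about 10 to about 200 reads yields a probability close to 0. The full method allows any number of segments, so its values will differ from those of this two-model comparison.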