The purpose of indexing a population graph is to allow for efficient substring queries on the paths that span across nodes and edges of the graph (Fig. 6). Given any non-trivial sized graph, enumerating all possible paths is often unfeasible, given the exponential nature of traversing all combinations of nodes and edges.
Reporting the haplotyped k-paths in the population graph G transforms it into the null graph GE, here k=4. a A population graph with sequence encodings on the nodes. b Indexing of k-paths based on three operations; Collapsing, merging adjacent nodes. Extension, assigning k-length substrings as prefixes or suffixes between adjacent nodes. Duplication, copying of nodes and redistribution of edges among copies. c The null graph encodes all 4-length paths in the original graph, coloring of lines and text denote the origin of assigned prefixes (green) and suffixes (red) (note that colored lines are not edges in the graph)
CHOP constrains queries through a graph to be part of a haplotype with which the population graph was built. Hereto, CHOP transforms graph G into a null graph GE such that every node in GE represents a sequence of length k or longer and that every substring of length k originating from the encoded haplotypes in G is also a substring in a node of GE. Meaning that if sequencing reads are true error-free samplings of an underlying haplotype and are of the same length (or shorter) than the chosen value of k, they should correspond to a substring of a node in GE. This, in turn, enables the application of any existing read aligner to place reads onto GE. Through this transformation of G to GE, all haplotyped paths of at least length k in the graph are accounted for. The transformation is driven by three operators: collapse, extend, and duplicate (pseudocode is given in Additional file 1: Listing S1), explained throughout the rest of this section. While the output of CHOP can depend on the order of these three operations, we observed no significant difference in runtime or indexing outcome for different orderings.
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.