Construction of the nuCpos package

The code used to construct the parameters of the nuCpos package (ver. 1.2.0) is available online. Genomic regions covered by the 147-bp non-redundant (unique) chemically mapped nucleosomes were defined as nucleosome regions, and the regions left uncovered were defined as linker regions. The DNA sequences of these regions were used to construct parameters that were passed to the internal Fortran programs for prediction based on a duration hidden Markov model (dHMM) or for calculation of affinity scores. Nucleosomes whose dyads were located within 73 bp of the chromosomal ends were omitted. For construction of the mouse model, hard-masked genomic sequences were used, and nucleosome and linker regions containing N were omitted before parameter construction to avoid potential prediction bias caused by repeat elements. In total, 67,538 nucleosome regions and 50,622 linker regions were obtained for the budding yeast genome (sacCer3); 75,826 nucleosome and 46,557 linker regions for fission yeast (ASM294v2); and 4,147,972 nucleosome and 2,484,347 linker regions for mouse (mm9).
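The region definition above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the package's R code. The function name, the 1-based inclusive coordinates, the reading of "within 73 bp of the chromosomal ends" as "the 147-bp footprint would run off the chromosome", and the treatment of uncovered chromosome ends as linkers are all assumptions made here.

```python
def nucleosome_and_linker_regions(dyads, chrom_length, half_width=73):
    """Return (nucleosomes, linkers) as lists of 1-based inclusive (start, end).

    Hypothetical helper: `dyads` are dyad positions on one chromosome.
    """
    # Keep only dyads whose full 147-bp footprint fits on the chromosome
    # (assumed interpretation of the end-of-chromosome filter).
    kept = sorted({d for d in dyads
                   if d - half_width >= 1 and d + half_width <= chrom_length})
    nucleosomes = [(d - half_width, d + half_width) for d in kept]

    # Merge footprints that overlap or touch to obtain the covered intervals.
    covered = []
    for start, end in nucleosomes:
        if covered and start <= covered[-1][1] + 1:
            covered[-1][1] = max(covered[-1][1], end)
        else:
            covered.append([start, end])

    # Linkers are the gaps between covered intervals; the uncovered
    # chromosome ends are also reported as linkers (an assumption).
    linkers = []
    prev_end = 0
    for start, end in covered:
        if start > prev_end + 1:
            linkers.append((prev_end + 1, start - 1))
        prev_end = end
    if prev_end < chrom_length:
        linkers.append((prev_end + 1, chrom_length))
    return nucleosomes, linkers
```

For example, with dyads at 50, 100, and 300 on a 500-bp toy chromosome, the dyad at 50 is dropped because its footprint would start before position 1, and the remaining two nucleosomes each span exactly 147 bp.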

We developed an R function designated predNuCpos, which predicts nucleosome positioning based on a dHMM, as previously proposed by Xi et al. [16]. Like its ancestral function predNuPoP in the NuPoP package (ver. 1.34.0), predNuCpos receives a DNA sequence of any length, invokes an internal Fortran program, and outputs the prediction result either to the working directory or to the working environment of R. In predNuCpos, construction of the dHMM is based on chemical maps, as described below.

Parameters used in the predNuCpos function were constructed according to the NuPoP paper [16] using the functionalities of the Biostrings package (ver. 2.52.0). The parameters were as follows: freqL, one-base frequencies for linker regions; tranL, tranL2, tranL3, and tranL4, first- to fourth-order transition probabilities for linker regions, respectively; freqN4, four-base frequencies at the first four nucleotide positions of nucleosome regions; tranN4, time-dependent fourth-order transition probabilities for nucleosome regions; and Pd, the linker length distribution, which ranges from 1 to 500 bp. Linker sequences of 7–500 bp in length were used for linker model construction, as described elsewhere [16]. freqL and freqN4 were obtained using the oligonucleotideFrequency function of Biostrings; tranL, tranL2, tranL3, tranL4, and tranN4 were obtained using the oligonucleotideTransitions function of Biostrings. Moving average smoothing with a 3-bp window, using the SMA function of the TTR package (ver. 0.23-4), was applied to the fourth-order transition probability parameter tranN4 and to the linker length distribution parameter Pd. The parameters used in predNuCpos were also used in another function, mutNuCpos, which predicts the effect of genetic alterations on nucleosome positioning.
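To make the two operations in this paragraph concrete, the following sketch estimates k-th order transition probabilities from a set of sequences and applies a 3-point moving average. It is a plain-Python illustration, not the Biostrings or TTR code used for the package; the function names are hypothetical. It covers only the position-independent (linker-type) case, whereas tranN4 is position dependent. The moving average is taken as a trailing window with the first two entries undefined, which is how SMA in TTR behaves; how the package fills those leading entries is not stated in the text and is left open here.

```python
from collections import Counter, defaultdict

BASES = "ACGT"

def transition_probs(seqs, order):
    """Estimate P(next base | preceding `order` bases) from `seqs`.

    Returns {context: {base: probability}} for every observed context.
    """
    counts = defaultdict(Counter)
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, nxt = seq[i - order:i], seq[i]
            # Skip windows containing anything other than A, C, G, T (e.g. N).
            if set(context + nxt) <= set(BASES):
                counts[context][nxt] += 1
    probs = {}
    for context, c in counts.items():
        total = sum(c.values())
        probs[context] = {b: c[b] / total for b in BASES}
    return probs

def sma(values, n=3):
    """Trailing simple moving average; the first n-1 entries are None."""
    out = [None] * len(values)
    for i in range(n - 1, len(values)):
        out[i] = sum(values[i - n + 1:i + 1]) / n
    return out
```

Calling `transition_probs(linker_seqs, 4)` on a list of linker sequences would give a table analogous in spirit to tranL4, and `sma(pd_counts, 3)` shows the kind of smoothing applied to Pd; `linker_seqs` and `pd_counts` are placeholder names.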

Xi et al. proposed the HBA score [16], also referred to as the ‘nucleosome affinity score’, as the log likelihood ratio of the probability that a given 147-bp sequence is a nucleosome versus a linker. According to their definition, the HBA score a_i for the 147-bp region x centered at position i of a given genomic sequence is

$$a_i = \log \frac{P_N(x_{i-73}, \ldots, x_{i+73})}{G_L(x_{i-73}, \ldots, x_{i+73})}$$

where P_N and G_L represent the probabilities of observing the 147-bp sequence as a nucleosome and as a linker, respectively [16]. The probability of being a nucleosome is calculated from the parameters freqN4 and tranN4, which are derived from nucleosomal DNA sequences. Similarly, the probability of being a linker is calculated from parameters derived from linker DNA sequences. Because the nucleosomal and linker sequences used for training do not overlap in their genomic coordinates, a negative HBA score does not by itself mean that the tested sequence is unsuitable for nucleosome formation. By default, the predNuCpos function calculates chemical map-based HBA scores along the input sequence and outputs them as raw values. We also developed an independent function designated HBA, which calculates only the HBA score for a given 147-bp sequence. The HBA function uses the chemical map-based parameters described above for predNuCpos: freqL, tranL, tranL2, tranL3, tranL4, freqN4, and tranN4.
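A minimal Python sketch of this log likelihood ratio is given below. It is not the Fortran implementation called by the HBA function. The data layout (nested dictionaries), the function names, the use of the natural logarithm, and the way the linker model steps from freqL through first- to fourth-order transitions for the first bases are assumptions made for illustration; only the roles of freqN4, tranN4, freqL, and tranL to tranL4 are taken from the text.

```python
import math
from itertools import product

BASES = "ACGT"

def log_prob_nucleosome(seq, freq_n4, tran_n4):
    """log P_N: 4-mer frequency at positions 1-4, then position-dependent
    4th-order transitions for the remaining positions (0-based `pos`)."""
    lp = math.log(freq_n4[seq[:4]])
    for pos in range(4, len(seq)):
        lp += math.log(tran_n4[pos][seq[pos - 4:pos]][seq[pos]])
    return lp

def log_prob_linker(seq, freq_l, tran_l):
    """log G_L: one-base frequency, then transitions whose order grows from
    1 to 4 (tran_l[k] holds the k-th order table) and then stays at 4."""
    lp = math.log(freq_l[seq[0]])
    for pos in range(1, len(seq)):
        k = min(pos, 4)
        lp += math.log(tran_l[k][seq[pos - k:pos]][seq[pos]])
    return lp

def hba(seq, freq_n4, tran_n4, freq_l, tran_l):
    """Log likelihood ratio of nucleosome versus linker for a 147-bp sequence."""
    if len(seq) != 147:
        raise ValueError("HBA is defined for a 147-bp sequence")
    return (log_prob_nucleosome(seq, freq_n4, tran_n4)
            - log_prob_linker(seq, freq_l, tran_l))

def uniform_models():
    """Toy parameters in which every base is equally likely everywhere."""
    def kmers(k):
        return ["".join(p) for p in product(BASES, repeat=k)]
    row = {b: 0.25 for b in BASES}
    freq_n4 = {m: 1 / 256 for m in kmers(4)}
    tran_n4 = {pos: {m: dict(row) for m in kmers(4)} for pos in range(4, 147)}
    freq_l = dict(row)
    tran_l = {k: {m: dict(row) for m in kmers(k)} for k in range(1, 5)}
    return freq_n4, tran_n4, freq_l, tran_l
```

With the uniform toy models both likelihoods are identical, so `hba("ACGT" * 36 + "ACG", *uniform_models())` evaluates to zero up to floating-point rounding; a real score depends entirely on the trained parameter tables.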

We defined 13 overlapping nucleosomal subsegments, A through M, and developed a function designated localHBA that calculates “local” HBA scores for each segment. Segment A corresponds to nucleosomal nucleotide positions 1–21; B, 12–31; C, 22–42; D, 33–52; E, 43–63; F, 54–73; G, 64–84; H, 75–94; I, 85–105; J, 96–115; K, 106–126; L, 117–136; and M, 127–147. Similar to the calculation of HBA [16], the local HBA score l_i for segment A of the 147-bp potential nucleosomal region x centered at position i of a given genomic sequence is calculated as

$$l_i = \log \frac{P_N(x_{i-73}, \ldots, x_{i-53})}{G_L(x_{i-73}, \ldots, x_{i-53})}$$

where the numerator and denominator are the probabilities of observing the 21-bp sequence as segment A of a nucleosome and as a linker, respectively. Local HBA scores for the other segments are calculated in the same way, except that the nucleotide positions considered are shifted to those of the segment. At the implementation level, four-base frequency values for the first four nucleotide positions of each segment were prepared: freqN4SA corresponds to nucleosome positions 1–4; freqN4SB, 12–15; freqN4SC, 22–25; freqN4SD, 33–36; freqN4SE, 43–46; freqN4SF, 54–57; freqN4SG, 64–67; freqN4SH, 75–78; freqN4SI, 85–88; freqN4SJ, 96–99; freqN4SK, 106–109; freqN4SL, 117–120; and freqN4SM, 127–130. These four-base frequency values were used to calculate the probability of each segment as the corresponding part of nucleosomal DNA, in the same way as freqN4 is used in HBA calculations [16].
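The segment bookkeeping can be sketched as follows. The segment boundaries are those listed above; the helper names and the convention that the dyad sits at nucleosomal position 74 (so that position 1 maps to i-73) are assumptions made for this Python illustration, which is not the localHBA implementation.

```python
# Nucleosomal positions (1-based, inclusive) of segments A through M.
SEGMENTS = {
    "A": (1, 21),    "B": (12, 31),   "C": (22, 42),   "D": (33, 52),
    "E": (43, 63),   "F": (54, 73),   "G": (64, 84),   "H": (75, 94),
    "I": (85, 105),  "J": (96, 115),  "K": (106, 126), "L": (117, 136),
    "M": (127, 147),
}

def segment_sequence(seq147, name):
    """Return the part of a 147-bp sequence that belongs to segment `name`."""
    if len(seq147) != 147:
        raise ValueError("expected a 147-bp sequence")
    start, end = SEGMENTS[name]
    return seq147[start - 1:end]

def genomic_range(dyad, name):
    """Genomic coordinates of a segment when the dyad (position 74) is at `dyad`."""
    start, end = SEGMENTS[name]
    return dyad - 74 + start, dyad - 74 + end

def freq_n4_window(name):
    """Nucleosomal positions covered by the segment's four-base frequency table."""
    start, _ = SEGMENTS[name]
    return start, start + 3
```

For a dyad at genomic position 1000, `genomic_range(1000, "A")` gives positions 927 to 947, that is, i-73 to i-53; segment lengths alternate between 21 bp and 20 bp.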
