Coupling Exonuclease Digestion with Selective Chemical Labeling for Base-resolution Mapping of 5-Hydroxymethylcytosine in Genomic DNA




Genome Biology
Mar 2016


Abstract

This protocol is designed to obtain base-resolution information on the level of 5-hydroxymethylcytosine (5hmC) in CpGs without the need for bisulfite modification. It relies on (i) the capture of hydroxymethylated sequences by a procedure known as ‘selective chemical labeling’ (see Szulwach et al., 2012) and (ii) the digestion of the captured DNA by exonucleases. After Illumina sequencing of the digested DNA fragments, an ad hoc bioinformatic pipeline extracts the information for further downstream analysis.

Keywords: 5-Hydroxymethylcytosine, Selective chemical labeling, Exonuclease digestion, CpG

Background

The methylation of cytosine in genomic DNA can be read by proteins and mainly results in gene silencing. Most CpG dinucleotides in the genome are methylated, including those located in gene regulatory regions such as enhancers. However, when required, these CpGs can be demethylated through oxidation of the methyl group by Ten Eleven Translocation (TET) enzymes, followed by replacement of the modified base with an unmethylated cytosine by the base excision repair system. 5-Hydroxymethylcytosine (5hmC) is the first oxidative derivative of 5-methylcytosine, and mapping this modified base in the genome provides information on the regions undergoing active demethylation. Although selective chemical labeling (SCL) allows very specific detection of 5hmC, the resolution of this technique is limited by the size of the DNA fragments, especially when several CpGs are present in the captured DNA. To improve resolution, we introduced a digestion step using exonucleases, which trim each DNA molecule up to the immediate vicinity of the hydroxymethylated cytosines (Sérandour et al., 2016). Appropriate bioinformatic processing of the sequencing reads then assigns a hydroxymethylation score to the captured CpGs.

Materials and Reagents

  1. Pipette tips (TipOne, STARLAB, catalog numbers: S1161-1800 , S1182-1830 , and S1181-3810 )
  2. 0.65 ml Bioruptor microtubes (Diagenode, catalog number: C30010011 )
  3. 0.5 ml and 2 ml DNA LoBind tubes (Eppendorf, catalog numbers: 0030108035 and 0030108078 )
  4. Micro Bio-Spin 6 column (Bio-Rad Laboratories, catalog number: 7326221 )
  5. 1.5 ml LoBind tubes (Eppendorf, catalog number: 0030108051 )
  6. 2 ml LoBind tubes (Eppendorf, catalog number: 0030108078 )
  7. DNeasy Blood & Tissue Kit (QIAGEN, catalog number: 69504 )
  8. 100-bp DNA marker (Thermo Fisher Scientific, Invitrogen™, catalog number: 15628019 )
  9. E-gel EX agarose gel 2% (Thermo Fisher Scientific, Invitrogen™, catalog number: G401002 )
  10. β-Glucosyltransferase (β-GT) and associated reaction buffer (New England Biolabs, catalog number: M0357S )
  11. DBCO-PEG4-Biotin (Sigma-Aldrich, catalog number: 760749 )
  12. UDP-6-N3-Glc (Active Motif, catalog number: 55020 )
  13. DMSO (Sigma-Aldrich, catalog number: D8418 )
  14. QIAquick Nucleotide Removal Kit (QIAGEN, catalog number: 28304 )
  15. Dynabeads M-280 streptavidin (Thermo Fisher Scientific, Invitrogen™, catalog number: 11205D )
  16. NEBuffer 2 (New England Biolabs, catalog number: B7002S )
  17. 10x NEBuffer 4 (New England Biolabs, catalog number: M0357S )
  18. ATP (10 mM) (New England Biolabs, catalog number: P0756S )
  19. dNTP solution mix (New England Biolabs, catalog number: N0447S )
  20. T4 DNA polymerase (New England Biolabs, catalog number: M0203S )
  21. DNA Polymerase I, Large (Klenow) Fragment (New England Biolabs, catalog number: M0210S )
  22. T4 Polynucleotide Kinase (New England Biolabs, catalog number: M0201S )
  23. T4 DNA ligase high concentration (New England Biolabs, catalog number: M0202T )
  24. Nuclease-free water (Thermo Fisher Scientific, Invitrogen™, catalog number: AM9937 )
  25. SCL-exo P7 adapter: annealing of 2 oligonucleotides (5’ Phos = phosphorylated 5’ end):
    P7 exo-adapter reverse: 5’ Phos-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC-OH 3’
    P7 exo-adapter forward: 5’ OH-GATCGGAAGAGCACACGTCT-OH 3’
  26. Phi29 polymerase (New England Biolabs, catalog number: M0269S )
  27. Lambda exonuclease (New England Biolabs, catalog number: M0262S )
  28. RecJf exonuclease (New England Biolabs, catalog number: M0264S )
  29. Glycogen (5 mg/ml) (Thermo Fisher Scientific, Invitrogen™, catalog number: AM9510 )
  30. Sodium chloride (NaCl) (Acros Organics, catalog number: AC207790050 )
  31. EtOH (100%) (VWR, catalog number: 20821.310 )
  32. SCL-exo P7 primer:
  33. Agencourt AMPure XP (Beckman Coulter, catalog number: A63880 )
  34. Qiagen MinElute PCR Purification Kit (QIAGEN, catalog number: 28004 )
  35. SCL-exo P5 adapter: annealing of 2 oligonucleotides:
    P5 exo-adapter reverse: 5’ OH-AGATCGGAAGAGCG-OH 3’
  36. NEBNext High-Fidelity 2x PCR Master Mix (New England Biolabs, catalog number: M0541S )
  37. SCL-exo universal P5 PCR primer (* = Phosphorothioates S-linkage):
  38. SCL-exo index P7 PCR primer (* = Phosphorothioates S-linkage) (index sequences come from TruSeq LT):
    Index 2:
    Index 4:
    Index 5:
    Index 6:
    Index 7:
    Index 12:
    Index 13:
    Index 14:
    Index 15:
    Index 16:
    Index 18:
    Index 19:
    Notes (concerning the oligonucleotides):
    1. All oligonucleotides were produced by Sigma-Aldrich, purified by HPLC and resuspended in water at a final concentration of 100 μM.
    2. The SCL-exo P7 and SCL-exo P5 adapters were obtained by mixing pairs of complementary oligonucleotides in 4 volumes of Annealing buffer (see Recipes), heating for 5 min at 95 °C and then letting the mixture cool slowly to room temperature.
    3. The oligonucleotides designed for SCL-exo were adapted from the P5 and P7 oligonucleotide sequences from Illumina ©2007-2012 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorised for use with Illumina instruments and products only. All other uses are strictly prohibited.
  39. Agilent High Sensitivity DNA Kit (Agilent Technologies, catalog number: 5067-4626 )
  40. Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Invitrogen™, catalog number: Q32854 )
  41. EDTA (500 mM, pH 8.0) (AppliChem, catalog number: A4892,0500 )
  42. HEPES (1 M) (Gibco™, catalog number: 15630056 )
  43. Na deoxycholate (Sigma-Aldrich, catalog number: D6750 )
  44. NP-40, IGEPAL® CA-630 (Sigma-Aldrich, catalog number: I8896 )
  45. Lithium chloride (LiCl) (Sigma-Aldrich, catalog number: 62476 )
  46. Magnesium chloride hexahydrate (MgCl2·6H2O) (Merck, catalog number: 442611 )
  47. Ammonium sulfate ((NH4)2SO4) (Merck, catalog number: 101217 )
  48. DTT (Sigma-Aldrich, catalog number: D9779 )
  49. Tris (MP Biomedicals, catalog number: 04819638 )
  50. Hydrochloric acid (HCl) (Sigma-Aldrich, catalog number: H9892 )
  51. Formamide for molecular biology (Sigma-Aldrich, catalog number: F9037 )
  52. 1x PBS (Fisher Scientific, catalog number: BP399 )
  53. Annealing buffer (see Recipes)
  54. RIPA buffer (see Recipes)
  55. Nick Repair buffer low DTT 10x (see Recipes)
  56. TE buffer (see Recipes)
  57. Elution buffer (see Recipes)
  58. Binding & Washing (B&W) buffer (see Recipes)

Equipment

  1. PIPETMAN Classic™ Pipets (Gilson, catalog numbers: F123600 , F144801 , F123602 and F123615 )
  2. Bioruptor Pico with water cooler (Diagenode, catalog numbers: B01060001 and B02010003 )
  3. E-gel Power Snap Electrophoresis Device (Thermo Fisher Scientific, Invitrogen™, catalog number: G8100 )
  4. Qubit 3 Fluorometer (Thermo Fisher Scientific, Invitrogen™, catalog number: Q33216 )
  5. Refrigerated centrifuge (Eppendorf, model: 5424 R )
  6. Thermocycler ProFlex PCR system (Thermo Fisher Scientific, Applied Biosystems™, catalog number: 4484073 )
  7. ThermoMixer C and Eppendorf ThermoTop (Eppendorf, catalog numbers: 5382000015 and 5308000003 )
  8. DynaMag-2 Magnet (Thermo Fisher Scientific, catalog number: 12321D )
  9. Speed-Vac Savant (Thermo Fisher Scientific, catalog number: DNA120-115 )
  10. 2100 Bioanalyzer Instrument (Agilent Technologies, model: 2100, catalog number: G2939BA )
  11. Mini centrifuge (Bio-Rad Laboratories, catalog number: 1660603 )

Procedure

Genomic DNA is extracted using the QIAGEN DNeasy kit and sonicated into fragments of approximately 300 bp. The enzyme β-glucosyltransferase catalyzes the addition of azide-glucose to the 5hmCs present in the gDNA fragments. The azide then reacts with a biotin conjugate, allowing immobilization of the modified DNA on streptavidin-coated magnetic beads (Figure 1A). After end repair, Illumina P7 adapter ligation and nick repair, the captured DNA is incubated with the 5’ → 3’ exonucleases lambda and RecJf. Lambda exonuclease digests one strand of the double-stranded DNA and stops when it encounters a bead-bound biotinylated 5hmC, whereas RecJf digests any single-stranded DNA that may result from lambda exonuclease digestion of unmodified contaminant DNA. After elution from the beads, the DNA is denatured into single-stranded molecules. This is followed by second-strand synthesis, ligation of the Illumina P5 adapter, PCR amplification and Illumina sequencing. Single-end sequencing starts from the P5 adapter and identifies the position where lambda exonuclease stopped digesting, and hence the nearest hydroxymethylated CpG (Figure 1B).

Figure 1. Overview of the SCL-exo procedure. A. As a first step of gDNA chemical modification, β-glucosyltransferase catalyzes the transfer of azide-glucose from UDP-6-N3-Glc to 5hmCs. Click chemistry is then used to add a biotin conjugate (DBCO-PEG4-Biotin) to the N3-Glc-modified 5hmCs. B. Flow chart of the SCL-exo protocol.

  A. Preparation of samples for Illumina sequencing
    We highly recommend using RNA-free genomic DNA (gDNA) for the SCL-exo protocol. We purified RNA-free gDNA with the QIAGEN DNeasy kit, adding an RNase A digestion step as described in the manufacturer’s protocol. RNA-free gDNA from any type of tissue or cultured cells can be used. However, keep in mind that global 5hmC levels differ greatly between tissues; the starting amount of gDNA required for SCL-exo may therefore vary with sample origin. When processing samples from different test conditions, we strongly recommend adding an identical amount of a hydroxymethylated DNA standard to each sample after sonication. The number of reads covering this standard can then be used to normalize SCL-exo signals between samples.
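The normalization mentioned above can be sketched as follows, assuming each sample's signal is scaled so that its read count on the spike-in standard matches that of the sample with the fewest standard reads. The function names, the choice of the minimum as reference and the example counts are hypothetical illustrations, not part of the published pipeline.

```python
"""Spike-in normalization of SCL-exo signals (hypothetical sketch)."""


def spike_in_scale_factors(standard_reads: dict[str, int]) -> dict[str, float]:
    """One multiplicative factor per sample.

    Every sample is scaled so that its read count on the hydroxymethylated
    standard equals that of the sample with the fewest standard reads.
    """
    if any(n <= 0 for n in standard_reads.values()):
        raise ValueError("every sample needs reads on the standard")
    reference = min(standard_reads.values())
    return {name: reference / n for name, n in standard_reads.items()}


def normalize(signal: dict[int, float], factor: float) -> dict[int, float]:
    """Apply a scale factor to a {CpG position: read count} signal."""
    return {pos: value * factor for pos, value in signal.items()}


if __name__ == "__main__":
    # Hypothetical read counts on the standard for two conditions.
    factors = spike_in_scale_factors({"control": 2000, "treated": 4000})
    print(factors)
    print(normalize({1000: 10, 1050: 4}, factors["treated"]))
```

Scaling down to the smallest standard count avoids inflating the noise of shallower libraries; scaling to a fixed reference count would work equally well.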
    1. Sonicate 1 μg of gDNA of interest in 10 μl of 10 mM Tris, pH 8, in a 0.65 ml sonication tube using the Bioruptor Pico, to obtain DNA fragments of around 300 bp. Set the sonication cycles to 30 sec off/30 sec on. To ensure efficient and reproducible sonication, we recommend 3 cycles of sonication, a brief centrifugation, another 3 cycles, a brief centrifugation and finally 4 cycles (10 cycles in total).
    2. Sonication efficiency can be checked quickly by running a 100-bp DNA marker and 0.5 μl of sonicated gDNA (diluted in 19.5 μl of water) on an E-gel EX agarose gel (2%) for 10 min. DNA fragments should be around 250-300 bp.
    Note: Steps A3 to A12 are adapted, with minor modifications, from the protocol of our colleagues (Szulwach et al., 2012, Bio-protocol).
    3. Mix the remaining 9.5 μl of sonicated DNA with 2 μl of 10x NEB β-GT reaction buffer (supplied with the β-GT enzyme), 0.68 μl of UDP-6-N3-Glc (3 mM), 1 μl of NEB β-GT enzyme and 6.8 μl of water.
    4. Mix by pipetting and incubate in a thermocycler at 37 °C for 1 h (heated lid off).
    5. Centrifuge briefly in the mini centrifuge (5 sec at 2,000 x g).
    6. Prepare a 3 mM working solution of DBCO-PEG4-Biotin conjugate in DMSO by ten-fold dilution of a 30 mM stock solution in DMSO. Store at -20 °C.
    7. Add 1 μl of the DBCO-PEG4-Biotin working solution to the DNA sample from Step A5 to reach a final concentration of approximately 150 μM.
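The final concentrations in Steps A3 to A7 follow from simple dilution arithmetic (C1 × V1 = C2 × V2). The short sketch below uses only the volumes listed above: the labeling reaction totals about 20 μl with roughly 100 μM UDP-6-N3-Glc, and adding 1 μl of 3 mM DBCO-PEG4-Biotin gives about 143 μM, which the protocol rounds to 150 μM. The helper name `final_conc` is illustrative.

```python
"""Final concentrations in the beta-GT labeling and click reactions."""

# Step A3: component -> volume in microliters (values from the protocol)
BETA_GT_REACTION_UL = {
    "sonicated DNA": 9.5,
    "10x beta-GT buffer": 2.0,
    "UDP-6-N3-Glc (3 mM)": 0.68,
    "beta-GT enzyme": 1.0,
    "water": 6.8,
}


def final_conc(stock: float, added_ul: float, total_ul: float) -> float:
    """Concentration after diluting added_ul of stock into total_ul."""
    return stock * added_ul / total_ul


reaction_ul = sum(BETA_GT_REACTION_UL.values())
# 3 mM stocks expressed in micromolar
udp_uM = final_conc(3000, BETA_GT_REACTION_UL["UDP-6-N3-Glc (3 mM)"], reaction_ul)

# Step A7: 1 ul of 3 mM DBCO-PEG4-Biotin is added to the labeled DNA
dbco_uM = final_conc(3000, 1.0, reaction_ul + 1.0)

print(f"reaction volume: {reaction_ul:.2f} ul")  # 19.98 ul
print(f"UDP-6-N3-Glc: {udp_uM:.0f} uM")          # 102 uM
print(f"DBCO-PEG4-Biotin: {dbco_uM:.0f} uM")     # 143 uM
```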
    8. Mix by pipetting and incubate in a thermocycler at 37 °C for 1 h (heated lid off).
    9. Centrifuge briefly in the mini centrifuge (5 sec at 2,000 x g) and clean up the reaction with the QIAquick Nucleotide Removal Kit. Elute with at least 30 μl of water per column.
      Note: The biotinylated DNA samples can be stored at -20 °C for a few days.
    10. Wash 25 μl of Dynabeads M-280 Streptavidin three times, each time with 100 μl of 1x Binding & Washing (B&W) buffer (see Recipes), in a 0.5 ml LoBind tube. Separate the beads from the buffer on a magnetic stand and resuspend them in 30 μl of 2x B&W buffer plus 140 μl of 1x B&W buffer.
    11. Add the 30 μl DNA eluate (from Step A9) to the resuspended beads from the previous step. The final concentration of B&W buffer should be 1x.
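Whether the binding reaction ends up at 1x B&W can be checked by volume-weighting the buffer strengths of its components. In this minimal sketch (the helper `mixed_strength` is hypothetical), the DNA eluate in water counts as 0x: 30 μl of 2x buffer plus 140 μl of 1x buffer plus 30 μl of eluate give 200 μl at 1x, i.e. 1 M NaCl.

```python
"""Buffer strength of a mixture, as a volume-weighted average."""


def mixed_strength(parts: list[tuple[float, float]]) -> float:
    """parts: (volume in ul, strength in 'x'); water counts as 0x."""
    total_ul = sum(vol for vol, _ in parts)
    return sum(vol * strength for vol, strength in parts) / total_ul


binding_mix = [
    (30.0, 2.0),   # 2x B&W buffer
    (140.0, 1.0),  # 1x B&W buffer
    (30.0, 0.0),   # DNA eluate in water (Step A9)
]
print(mixed_strength(binding_mix))  # 1.0
```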
    12. Incubate for 30 min at room temperature with rotation.
      Note: Prepare the mix for Step A15 during this step.
    13. Transfer to a 2 ml LoBind tube and wash the beads five times with 1 ml of 1x B&W buffer using the magnetic stand.
    14. Wash twice with 1 ml of 10 mM Tris-HCl, pH 8. Do not let the beads dry.
    15. The beads then undergo 5 successive reactions (in a 2 ml LoBind tube agitated at 900 rpm in a thermomixer), as follows:
      End repair: Prepare a mix containing 10 μl of NEB2 buffer (10x), 10 μl of ATP (10 mM), 1 μl of dNTP (10 mM), 5 μl of T4 DNA polymerase (3 U/μl), 1 μl of DNA Polymerase I Large (Klenow) Fragment (5 U/μl), 5 μl of T4 Polynucleotide Kinase (T4 PNK) (10 U/μl) and 68 μl of nuclease-free water.
      Add the mix to the beads in the 2 ml LoBind tube. Incubate at 30 °C for 30 min with agitation at 900 rpm in a thermomixer.
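When several samples are processed in parallel, the per-sample mixes of Steps A15 to A23 are conveniently prepared as master mixes. The sketch below scales the end-repair mix; the 10% overage and the helper `scale_mix` are assumptions for illustration, and the same function applies to the other mixes.

```python
"""Scale a per-sample enzyme mix for n samples (hypothetical helper)."""

# Per-sample end-repair mix from Step A15, in microliters
END_REPAIR_UL = {
    "NEB2 buffer (10x)": 10,
    "ATP (10 mM)": 10,
    "dNTP (10 mM)": 1,
    "T4 DNA polymerase (3 U/ul)": 5,
    "Klenow fragment (5 U/ul)": 1,
    "T4 PNK (10 U/ul)": 5,
    "nuclease-free water": 68,
}


def scale_mix(per_sample: dict[str, float], n_samples: int,
              overage: float = 0.10) -> dict[str, float]:
    """Component volumes for n_samples plus a fractional pipetting overage."""
    factor = n_samples * (1 + overage)
    return {name: round(vol * factor, 1) for name, vol in per_sample.items()}


assert sum(END_REPAIR_UL.values()) == 100  # 100 ul per sample
for name, vol in scale_mix(END_REPAIR_UL, n_samples=4).items():
    print(f"{name:28s} {vol:6.1f} ul")
```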
    16. Wash twice with 1 ml of RIPA buffer (see Recipes) and twice with 1 ml of 10 mM Tris-HCl, pH 8. After removing the last Tris wash, centrifuge briefly in the mini centrifuge (5 sec at 2,000 x g), put the tube back on the magnetic stand and remove the remaining traces of Tris. Do the same in Steps A18, A20, A22 and A24. Do not let the beads dry.
    17. Ligation of P7 adapter:
      Prepare a mix containing 10 μl of NEB2 Buffer (10x), 10 μl of ATP (10 mM), 15 μl of SCL-exo P7 adapter (10 μM), 1 μl of T4 DNA ligase (2,000 U/μl) and 65 μl of nuclease-free water. Add the mix to the beads in the 2 ml LoBind tube. Incubate at 25 °C for 1 h with agitation at 900 rpm in a thermomixer.
    18. Wash twice with 1 ml of RIPA buffer and twice with 1 ml of 10 mM Tris-HCl, pH 8.
    19. Nick repair:
      Prepare a mix containing 1.5 μl of Phi29 polymerase (10 U/μl), 10 μl of homemade Nick Repair buffer low DTT (10x) (see Recipes), 1.5 μl of dNTP (10 mM) and 87 μl of nuclease-free water. Add the mix to the beads in the 2 ml LoBind tube. Incubate at 30 °C for 20 min with agitation at 900 rpm in a thermomixer.
    20. Wash twice with 1 ml of RIPA buffer and twice with 1 ml of 10 mM Tris-HCl, pH 8.
    21. Lambda exonuclease digestion:
      Prepare a mix containing 2 μl of Lambda exonuclease (5 U/μl), 10 μl of NEB Lambda exonuclease buffer (10x) and 88 μl of nuclease-free water. Add the mix to the beads in the 2 ml LoBind tube. Incubate at 37 °C for 30 min with agitation at 900 rpm in a thermomixer.
    22. Wash twice with 1 ml of RIPA buffer and twice with 1 ml of 10 mM Tris-HCl, pH 8.
    23. RecJf exonuclease digestion:
      Prepare a mix containing 1 μl of RecJf exonuclease (30 U/μl), 10 μl of NEB2 buffer (10x) and 89 μl of nuclease-free water. Add the mix to the beads in the 2 ml LoBind tube. Incubate at 37 °C for 30 min with agitation at 900 rpm in a thermomixer.
    24. Wash twice with 1 ml of RIPA buffer and twice with 1 ml of 10 mM Tris-HCl, pH 8.
    25. Elution:
      Incubate the beads in 100 μl of elution buffer (see Recipes) at 90 °C for 5 min, then place directly on ice to cool the sample.
    26. Transfer the 100 μl eluate to a new 1.5 ml LoBind tube and add 300 μl of 10 mM Tris-HCl, pH 8.
    27. DNA precipitation:
      1. Add 2 μl of glycogen, 16 μl of NaCl (5 M) and mix well. Add 800 μl of 100% EtOH and mix well.
      2. Incubate the tube at -80 °C for at least 30 min (overnight if possible).
      3. Centrifuge at 20,000 x g at 4 °C for 30 min.
      4. Carefully remove the supernatant without disturbing the pellet.
      5. Add 500 μl of 70% EtOH.
      6. Centrifuge at 20,000 x g at 4 °C for 5 min.
      7. Remove the supernatant carefully.
      8. Add 500 μl of 100% EtOH.
      9. Centrifuge at 20,000 x g at 4 °C for 5 min.
      10. Remove the supernatant carefully.
      11. Dry the pellet for 10-20 min in a Speed-Vac at 45 °C and resuspend it in 20 μl of 10 mM Tris-HCl, pH 8.
      12. The purified DNA sample can be stored overnight at -20 °C. Proceed to Step A28.
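The salt and ethanol conditions of this precipitation can be verified from the volumes in Steps A26 and A27. This sketch is plain arithmetic and ignores the volume contraction that occurs when ethanol and water mix; it gives about 190 mM NaCl in 418 μl before ethanol, 10 μg of glycogen carrier, and roughly 1.9 volumes of ethanol (about 66% v/v final).

```python
"""Salt and ethanol concentrations during DNA precipitation (Step A27)."""

aqueous_ul = {
    "eluate (formamide/EDTA)": 100,
    "10 mM Tris-HCl": 300,
    "glycogen (5 mg/ml)": 2,
    "NaCl (5 M)": 16,
}
ethanol_ul = 800

aqueous_total = sum(aqueous_ul.values())                # 418 ul
nacl_mM = 5000 * aqueous_ul["NaCl (5 M)"] / aqueous_total
glycogen_ug = 5 * aqueous_ul["glycogen (5 mg/ml)"]      # 5 mg/ml = 5 ug/ul
etoh_volumes = ethanol_ul / aqueous_total
etoh_percent = 100 * ethanol_ul / (aqueous_total + ethanol_ul)

print(f"NaCl before ethanol: {nacl_mM:.0f} mM")          # 191 mM
print(f"glycogen carrier: {glycogen_ug} ug")             # 10 ug
print(f"ethanol: {etoh_volumes:.1f} volumes, {etoh_percent:.0f}% v/v")
```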
    28. DNA denaturation:
      1. Transfer the 20 μl of DNA solution to a PCR tube and incubate the DNA sample at 95 °C for 5 min in a thermocycler.
      2. Then put the tube directly on ice to cool the sample.
    29. Second strand synthesis:
      1. Add the following reagents to the tube containing the 20 μl of DNA solution: 20 μl of nuclease-free water, 5 μl of the SCL-exo P7 primer (1 μM) and 5 μl of NEB Phi29 Reaction Buffer (10x). Mix gently.
      2. In a thermocycler, incubate the sample at 65 °C for 5 min and then at 30 °C for 2 min. Pause the PCR program.
      3. Immediately add 1 μl of Phi29 polymerase (10 U/μl) and 1 μl of dNTP (10 mM), mix gently.
      4. Restart the PCR program and incubate the sample in a thermocycler at 30 °C for 20 min and then 65 °C for 10 min.
    30. DNA purification:
      1. Add 52 μl of room temperature Ampure beads (1 volume) to the 52 μl sample.
      2. Incubate at room temperature for 15 min.
      3. Place the tube on the magnetic stand and carefully remove the supernatant. With the tube still on the magnetic stand, wash the beads twice with freshly prepared 80% EtOH (wait at least 30 sec after adding the first ethanol wash).
      4. Centrifuge briefly in the mini centrifuge (5 sec at 2,000 x g), put the tube back on the magnetic stand and remove the remaining ethanol.
      5. Leave the tube open on the magnetic stand and let the beads dry for 10-15 min.
      6. Remove the tube from the magnetic stand, add 22 μl of room temperature 10 mM Tris-HCl, pH 8, and mix well. Make sure that all the beads are resuspended and wet.
      7. Incubate for 3 min at room temperature off the magnetic stand.
      8. Put the tube back on the magnetic stand and, once the beads are well packed, carefully pipette 20 μl of the DNA eluate into a new PCR tube.
    31. Ligation of SCL-exo P5 adapter:
      1. In a PCR tube, add the following reagents to the 20 μl of DNA sample: 22.5 μl nuclease-free water, 1.5 μl SCL-exo P5 adapter (10 μM), 5 μl of NEB T4 DNA ligase Buffer (10x) and 1 μl of T4 DNA ligase (2,000 U/μl). Mix gently.
      2. In a thermocycler, incubate at 25 °C for 60 min and then 65 °C for 10 min.
    32. DNA purification:
      Add 50 μl of room temperature Ampure beads (1 volume) to the 50 μl sample and proceed as in Step A30. The resulting 20 μl of eluted DNA is used for the final PCR.
    33. PCR amplification:
      1. In a PCR tube, prepare a mix containing 4 μl of nuclease-free water, 25 μl of NEBNext High-Fidelity PCR Master Mix (2x), 0.5 μl of SCL-exo universal P5 PCR primer (25 μM) and 0.5 μl of SCL-exo index P7 PCR primer (25 μM) (choose your index of interest). Add the 20 μl DNA sample and mix gently.
      2. Put the tube in a thermocycler and run the following program:
        98 °C for 30 sec
        Then 18 cycles of: 98 °C for 10 sec, 65 °C for 30 sec, 72 °C for 30 sec
        72 °C for 5 min
        4 °C hold
    34. DNA purification:
      Add 50 μl of room temperature Ampure beads (1 volume) to the 50 μl PCR sample and proceed as in Step A30. You should obtain 20 μl of SCL-exo library.
    35. Measure the DNA concentration with the Qubit fluorometer and the dsDNA HS Assay Kit. Check the library quality on the Agilent Bioanalyzer (see Figure 2). If adapters or primers are still present, perform an additional Ampure purification (1 volume of beads for 1 volume of DNA). Pool the libraries for multiplexing, choosing index combinations that are diverse enough for the index read to succeed. Contact your sequencing facility if in doubt.
    36. Submit the pool to a sequencing facility for Illumina single-end sequencing (MiSeq, GAII or HiSeq).

      Figure 2. Quality control of SCL-exo libraries. A. Agarose gel electrophoresis of SCL and SCL-exo libraries. SCL libraries were obtained by omitting the exonuclease digestion steps. Note that DNA fragments from the SCL libraries are on average 100 bp longer than those from the SCL-exo libraries. B. Bioanalyzer electropherogram of a pool of SCL-exo libraries. 1 μl of SCL-exo library was run on an Agilent High Sensitivity DNA chip following the manufacturer’s protocol. Library fragments should range between 200 and 400 bp. Note the absence of an adapter-dimer peak (around 120 bp) and of PCR primers (around 50 bp). If these contaminants are present, we recommend an additional Ampure purification (1 volume of Ampure beads for 1 volume of DNA library) as in Step A30.

  B. Bioinformatic identification of hydroxymethylated CpGs from SCL-exo fastq files
    We designed and implemented a bioinformatic protocol to identify hydroxymethylated CpGs from SCL-exo fastq files generated in triplicate by a sequencing platform. The protocol involves the following steps:
    1) Trimming and filtering the sequence reads according to their quality using the program SolexaQA (Cox et al., 2010).
    2) Mapping high-quality reads onto each strand of the genome separately with the program Bowtie (Langmead et al., 2009), to generate SAM files for both the forward and reverse strands. SAM files are text files that contain the sequence reads together with their genomic location, if any, and can be parsed to identify reads mapping to a unique location in the genome.
    3) Creating a hydroxymethylated CpG signal (wig) file for each replicate by directly reading the sequences in the SAM files, using our python program generate-SCL-exo signal-from-sams. The program counts the number of reads that unambiguously overlap each CpG and stores the values in a signal (wig) file at CpG or base-pair resolution. The wig file can be visualized in a genome browser such as IGB (Nicol et al., 2009).
    Note: All our python programs are available at:
    4) Identifying putative hydroxymethylated cytosines by retrieving the consensus CpG dinucleotides present in at least two of the three replicates, using the python program generate-SCL-exo consensus-signal.
    5) Determining the set of CpG dinucleotides significantly enriched in 5hmC using a peak-calling algorithm (generate-SCL-exo peaks) with a user-defined threshold.
    Details for each of these steps are given below:
    1. Trimming and filtering the sequenced reads
      Only high-quality reads should be retained for reliable identification of hydroxymethylated CpGs. We therefore used the program SolexaQA (Cox et al., 2010) to trim and filter the reads in the SCL-exo fastq files. The program takes two parameters: a quality threshold and a minimum length. First, sequenced nucleotides whose quality is below the quality threshold are trimmed from the reads. Second, reads shorter than the minimum length are discarded. We used a Phred quality score of 20 as the minimum nucleotide sequencing quality, corresponding to an error probability of 10⁻² (a 1% chance of a sequencing error at any given nucleotide), and 17 nt as the minimum read length. Trimming is performed from within the SolexaQA directory with the Linux command:

      perl fastq -h quality -d.

      where fastq is the path and filename of the fastq file and quality is the quality value (e.g., 20). This will generate a trimmed fastq file fastq.trimmed. The filtering is then achieved by typing:

      perl fastq.trimmed -l minlength

      where minlength is the minimum length (e.g., 17) of retained reads.
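To show what these two steps do to a read, here is an illustrative Python re-implementation. It is not SolexaQA: it assumes Phred+33 quality encoding, keeps for each read the longest run of bases with quality ≥ 20, then discards reads shorter than 17 nt. A Phred score Q corresponds to an error probability of 10^(−Q/10), hence 1% for Q = 20. All function names are hypothetical.

```python
"""Quality trimming and length filtering of FASTQ reads (sketch, not SolexaQA)."""
import io
from typing import Iterator, TextIO


def phred_error_prob(q: float) -> float:
    """Base-calling error probability for a Phred score q: 10^(-q/10)."""
    return 10 ** (-q / 10)


def longest_good_run(qual: str, min_q: int, offset: int = 33) -> tuple[int, int]:
    """Return (start, end) of the longest run of bases with quality >= min_q."""
    best, start = (0, 0), None
    # A trailing sentinel with quality -1 closes the last run.
    for i, ch in enumerate(qual + chr(offset - 1)):
        if ord(ch) - offset >= min_q:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def trim_and_filter(handle: TextIO, min_q: int = 20,
                    min_len: int = 17) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality) for reads passing both steps."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        start, end = longest_good_run(qual, min_q)
        if end - start >= min_len:
            yield header, seq[start:end], qual[start:end]


if __name__ == "__main__":
    # Hypothetical 20-nt read whose last two bases have quality 2 ('#').
    fastq = io.StringIO("@r1\n" + "ACGT" * 5 + "\n+\n" + "I" * 18 + "##\n")
    for record in trim_and_filter(fastq):
        print(record)
```

Because the CpG window used later is anchored at the first bases of each read, a variant that trims only low-quality 3’ ends may be preferable; that choice is not specified by the original pipeline.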
    2. Mapping filtered reads onto both strands of the genome
      Bowtie (Langmead et al., 2009) can be used to map the retained high-quality reads onto the forward and reverse strands of the genome separately, with the following parameters:

      -p processors -l length -n nb_mismatches -m 1 --sam --strata --best --norc [or --nofw]

      processors: the number of processors (threads) available for the mapping process;
      length: the seed length, i.e. the number of bases at the high-quality 5’ end of each read to which the mismatch limit applies;
      nb_mismatches: the maximum number of mismatches allowed in the seed;
      -m 1: only reads mapping to a unique location in the genome are retained;
      --norc (respectively --nofw): the reverse strand (respectively forward strand) of the genome is not used for mapping.

      Note that Bowtie first requires the genome fasta files to be indexed (with bowtie-build; see the Bowtie manual).
      The reads must be mapped onto the forward and reverse strands separately, producing one sam file for each strand. Mapping was launched with Bowtie using the Linux command:

      ./bowtie -p processors --best -l 28 -n 2 -m 1 --sam --strata --norc genome fastq > fw-sam

      to map reads from file fastq onto the forward strand of the indexed genome (genome designates the basename of the Bowtie .ebwt index files), so as to generate a forward strand fw-sam file, and:

      ./bowtie -p processors --best -l 28 -n 2 -m 1 --sam --strata --nofw genome fastq > rv-sam

      to map reads from file fastq onto the reverse strand of the indexed genome file, so as to generate a reverse strand rv-sam file.
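The two mapping runs can be scripted, for example from Python. The sketch below only assembles the argument lists from the flags given above; the index basename, FASTQ file and output names are placeholders, and the output file is passed as Bowtie 1's optional last positional argument instead of using shell redirection.

```python
"""Assemble the two strand-specific Bowtie 1 commands (sketch)."""
import shlex
import subprocess


def bowtie_commands(index: str, fastq: str, threads: int = 4,
                    seed_len: int = 28, mismatches: int = 2) -> list[list[str]]:
    """Return [forward-strand command, reverse-strand command]."""
    common = ["bowtie", "-p", str(threads), "--best", "-l", str(seed_len),
              "-n", str(mismatches), "-m", "1", "--sam", "--strata"]
    return [
        common + ["--norc", index, fastq, "fw.sam"],  # forward strand only
        common + ["--nofw", index, fastq, "rv.sam"],  # reverse strand only
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholders: replace with your index basename and trimmed FASTQ.
    for cmd in bowtie_commands("genome_index", "reads.trimmed.fastq"):
        print(shlex.join(cmd))
```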
    3. Parsing single stranded sam files to generate a wig file at CpG or base-pair resolution
      1. DNA fragments were initially captured based on the presence of 5-hydroxymethylcytosine and then trimmed by exonucleases. As cytosines are mostly hydroxymethylated in a CpG context, 5hmC-positive reads should be enriched in CpGs within a few nucleotides of the start of the sequence (Sérandour et al., 2016). The SAM files contain the sequence reads together with their genomic location, if any. Our python program (generate-SCL-exo signal-from-sams) parses both the forward and reverse stranded SAM files and considers in turn all reads uniquely mapped to the genome. It checks whether each mapped read contains a CpG within the first few nucleotides of its sequence. Typically, a 10 bp window at the beginning of the read is used to detect the presence of a CpG. Reads without any CpG inside the window are discarded. Reads with two or more CpGs inside the window are set aside (their CpGs are stored in a separate file), as it is then impossible to determine with certainty which CpG was hydroxymethylated.
      2. When a single CpG is found within the window, its precise genomic coordinate is determined (from the read location provided by the SAM file and the CpG position within the read) and stored in memory in a hash table-type structure. The first time a CpG position is encountered, it is assigned a value of 1; each subsequent read at that position increments the value by one, so that the table stores the number of reads covering each position. Note that the program accounts for the strand of each SAM file: in the reverse stranded SAM file, the read sequences must be read from right to left and the CpG coordinate adjusted accordingly (this is handled automatically by the program). Once all reads have been parsed, the hash table is written to a wig file, at CpG resolution by default: the genomic position of every CpG is associated with its number of overlapping reads. By default, the program adds up the reads overlapping a given CpG on both strands. Alternatively, the program can produce a separate signal value for each cytosine of the CpG, associating each cytosine position with the number of reads overlapping the corresponding strand, thus producing a wig file at base-pair resolution.
      3. Our program is launched using the following Linux command:

        python signal-from-sams fw-sam rv-sam window SCL-exo-wig [resolution]

        fw-sam and rv-sam respectively designate the filenames of the forward and reverse stranded sam files;
        window stands for the length (typically 10), expressed in base-pairs, of the window used to identify hydroxymethylated CpG dinucleotides;
        SCL-exo-wig is the name of the output wig file;
        resolution is an optional parameter that takes the value 2 or 1, depending on whether the SCL-exo-wig file is generated at CpG or base-pair resolution, respectively (default: 2).
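The parsing logic described above can be sketched in Python as follows. This is not the authors' generate-SCL-exo program: it assumes standard SAM records (FLAG 16 marks reverse-strand alignments whose SEQ is stored in reference orientation, POS is 1-based, FLAG 4 marks unmapped or suppressed reads), discards rather than sets aside reads with several CpGs in the window, and writes a variableStep wig whose span equals the resolution. All function names are hypothetical.

```python
"""Count reads per CpG from Bowtie SAM output and write a wig (sketch)."""
from collections import Counter
from typing import Iterable, Optional


def cpg_in_window(seq: str, window: int, reverse: bool) -> Optional[int]:
    """SEQ offset of the only CpG within the first `window` sequenced
    bases, or None if there is no CpG or more than one."""
    if reverse:  # a reverse-strand read begins at the right end of SEQ
        lo, hi = max(len(seq) - window, 0), len(seq)
    else:
        lo, hi = 0, min(window, len(seq))
    hits = [i for i in range(lo, hi - 1) if seq[i:i + 2] == "CG"]
    return hits[0] if len(hits) == 1 else None


def count_cpgs(sam_lines: Iterable[str], window: int,
               resolution: int = 2) -> Counter:
    """Map (chrom, 1-based position) -> number of supporting reads.

    resolution=2: both strands are summed at the forward-strand C.
    resolution=1: reverse-strand reads are counted at the G position,
    which is the cytosine of the reverse strand.
    """
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # SAM header
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, seq = int(fields[1]), fields[2], int(fields[3]), fields[9]
        if flag & 4:
            continue
        reverse = bool(flag & 16)
        offset = cpg_in_window(seq, window, reverse)
        if offset is None:
            continue
        c_pos = pos + offset
        if resolution == 1 and reverse:
            c_pos += 1
        counts[(chrom, c_pos)] += 1
    return counts


def write_wig(counts: Counter, resolution: int = 2) -> str:
    """Format counts as variableStep wig text (span = resolution)."""
    out, current = [], None
    for (chrom, pos), n in sorted(counts.items()):
        if chrom != current:
            out.append(f"variableStep chrom={chrom} span={resolution}")
            current = chrom
        out.append(f"{pos} {n}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Two hypothetical 12-nt reads supporting the same CpG (C at chr1:102).
    fw = "r1\t0\tchr1\t100\t255\t12M\t*\t0\t0\tTTCGAAAAAAAA\t" + "I" * 12
    rv = "r2\t16\tchr1\t93\t255\t12M\t*\t0\t0\tAAAAAAATTCGA\t" + "I" * 12
    print(write_wig(count_cpgs([fw, rv], window=10), 2), end="")
```

Since the strand is read from the FLAG field, the forward and reverse SAM files can simply be concatenated (for instance with `itertools.chain` over the two open files). With `resolution=1`, the two example reads are counted separately at positions 102 and 103.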
    4. Identifying the consensus CpGs found in at least two out of the three replicates
      We provide a python program, generate-SCL-exo consensus-signal, that compares three SCL-exo signal files and returns a wig file containing the hydroxymethylated CpGs identified in at least two of the three replicates, together with their mean signal. Retained CpG positions must exhibit values greater than a minimum threshold, min-threshold, in at least two of the three files. An example of such a consensus signal is shown in Figure 3.
      The program is launched using the following Linux command:

      python consensus-signal SCL-exo-wig1 SCL-exo-wig2 SCL-exo-wig3 min-threshold

      Figure 3. Integrated Genome Browser view of SCL-exo signal in a region of mm8 chr11 from P19 embryonal carcinoma cells. For comparison, the Input-seq (genomic DNA of P19 cells) and SCL-seq (no exonuclease step) are shown.
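A minimal sketch of the consensus rule, assuming signals are held as Python dictionaries keyed by (chromosome, position). Two details are assumptions rather than documented behavior of generate-SCL-exo consensus-signal: a position counts as supported when its value is strictly greater than min-threshold, and the reported mean is taken over all three replicates, with absent positions counted as 0.

```python
"""Consensus signal from three replicates (sketch with stated assumptions)."""


def consensus(reps: list[dict], min_threshold: float,
              min_support: int = 2) -> dict:
    """Keep positions above min_threshold in >= min_support replicates."""
    if len(reps) != 3:
        raise ValueError("expected exactly three replicate signals")
    result = {}
    for key in sorted(set().union(*reps)):
        values = [rep.get(key, 0) for rep in reps]  # missing -> 0
        if sum(v > min_threshold for v in values) >= min_support:
            result[key] = sum(values) / len(values)
    return result


if __name__ == "__main__":
    # Hypothetical counts: chr11:102 passes in 3 replicates, chr11:250 in 1.
    rep1 = {("chr11", 102): 6, ("chr11", 250): 5}
    rep2 = {("chr11", 102): 3, ("chr11", 250): 1}
    rep3 = {("chr11", 102): 4}
    print(consensus([rep1, rep2, rep3], min_threshold=2))
```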

    5. Determining significantly enriched hydroxymethylated CpGs within SCL-exo wig files
      A peak-calling algorithm is used to determine the CpGs that are significantly enriched in 5hmC within an SCL-exo signal file. The algorithm looks for adjacent genomic positions, within the SCL-exo wig file, that exhibit signal values for both CpG coordinates above a predefined threshold (Sérandour et al., 2016).
      The program can be used either on the consensus SCL-exo signal file or on the SCL-exo wig replicates separately.
      The python program generate-SCL-exo peaks takes an SCL-exo-wig file together with the predefined threshold and generates a bed file listing all CpG positions that satisfy the above constraints.
      Peak-calling on an SCL-exo-wig file at CpG or base-pair resolution is launched using the Linux command:

      python peaks SCL-exo-wig threshold
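One reading of the peak-calling rule for a base-pair-resolution signal is that a CpG is reported when the forward-strand cytosine (position p) and the reverse-strand cytosine (position p + 1) both exceed the threshold. The sketch below implements that reading; it checks the genome sequence to make sure p is the C of a CpG (otherwise two adjacent CpGs, as in CGCG, would yield a spurious call on the central GC) and writes 0-based BED intervals. It is an illustration, not the authors' generate-SCL-exo peaks program, and it does not cover the CpG-resolution case.

```python
"""Call CpGs whose two cytosines both exceed a threshold (sketch)."""


def call_cpgs(signal: dict, genome: dict, threshold: float) -> list[str]:
    """signal: {(chrom, 1-based pos): value}; genome: {chrom: sequence}.

    Returns BED lines: chrom, 0-based start, exclusive end, summed signal.
    """
    bed = []
    for chrom, pos in sorted(signal):
        # pos must be the C of a forward-strand CpG (0-based index pos - 1)
        if genome[chrom][pos - 1:pos + 1].upper() != "CG":
            continue
        plus = signal[(chrom, pos)]              # forward-strand C
        minus = signal.get((chrom, pos + 1), 0)  # reverse-strand C, opposite the G
        if plus > threshold and minus > threshold:
            bed.append(f"{chrom}\t{pos - 1}\t{pos + 1}\t{plus + minus}")
    return bed


if __name__ == "__main__":
    genome = {"chr1": "TTCGAACGTT"}  # hypothetical sequence; CpGs at 3 and 7
    signal = {("chr1", 3): 4, ("chr1", 4): 5, ("chr1", 7): 6, ("chr1", 8): 1}
    for line in call_cpgs(signal, genome, threshold=2):
        print(line)
```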

Data analysis

Information about data processing and analysis can be found in the original research article at:

Notes

  1. Notes concerning Steps A15 to A24:
    1. Beads may pellet less efficiently in 10 mM Tris-HCl, pH 8. Keep the tube on the magnetic stand while removing the second Tris wash to avoid losing beads. Then spin the tube briefly, put it back on the magnetic stand and remove the residual Tris buffer. Perform the Tris washes carefully to eliminate any trace of detergent, which can inhibit the subsequent enzymatic reaction.
    2. Do not let the streptavidin beads dry out. Prepare the enzymatic mixes a few minutes before the washes.
  2. Concerning the Ampure bead purifications in Steps A30, A32 and A34:
    Any remaining trace of EtOH may inhibit the next enzymatic reaction.

Recipes

  1. Annealing buffer
    10 mM Tris, pH 8
    50 mM NaCl
    1 mM EDTA
  2. RIPA buffer
    50 mM HEPES, pH 7.6
    1 mM EDTA
    0.7% Na deoxycholate
    1% NP-40
    0.5 M LiCl
  3. Nick Repair buffer low DTT (10x)
    100 mM MgCl2
    500 mM Tris-HCl, pH 7.5
    100 mM (NH4)2SO4
    10 mM DTT
  4. TE buffer (pH 7.4)
    10 mM Tris
    1 mM EDTA
  5. Elution buffer
    95% formamide
    10 mM EDTA, pH 8
  6. Binding & Washing (B&W) buffer (2x)
    10 mM Tris-HCl (pH 7.5)
    1 mM EDTA
    2 M NaCl
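Stock dilutions for these recipes follow C1 × V1 = C2 × V2. As an example, the sketch below computes volumes for 50 ml of 2x B&W buffer; the 1 M Tris-HCl (pH 7.5) stock is an assumption, while 0.5 M EDTA and 5 M NaCl correspond to stocks used elsewhere in this protocol. The helper `recipe` is hypothetical; adjust the stock concentrations to your own solutions.

```python
"""Stock volumes for a buffer recipe via C1 * V1 = C2 * V2 (sketch)."""


def recipe(final_ml: float, targets_mM: dict, stocks_mM: dict) -> dict:
    """Return ml of each stock plus the water needed to reach final_ml."""
    vols = {name: final_ml * conc / stocks_mM[name]
            for name, conc in targets_mM.items()}
    vols["water"] = final_ml - sum(vols.values())
    if vols["water"] < 0:
        raise ValueError("stocks are too dilute for this recipe")
    return vols


# 50 ml of 2x B&W buffer; the 1 M Tris-HCl pH 7.5 stock is an assumption.
bw_2x = recipe(
    50,
    targets_mM={"Tris-HCl": 10, "EDTA": 1, "NaCl": 2000},
    stocks_mM={"Tris-HCl": 1000, "EDTA": 500, "NaCl": 5000},
)
for name, ml in bw_2x.items():
    print(f"{name:9s} {ml:6.2f} ml")
```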

Acknowledgments

We thank M. Bizot and G. Palierne for technical assistance. This work was funded by La Ligue Contre le Cancer, Cancéropole Grand Ouest, The CNRS and the University of Rennes 1. The authors declare no competing interests.


References

  1. Cox, M. P., Peterson, D. A. and Biggs, P. J. (2010). SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics 11: 485.
  2. Langmead, B., Trapnell, C., Pop, M. and Salzberg, S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3): R25.
  3. Nicol, J. W., Helt, G. A., Blanchard, S. G., Jr., Raja, A. and Loraine, A. E. (2009). The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics 25(20): 2730-2731.
  4. Sérandour, A. A., Avner, S., Mahe, E. A., Madigou, T., Guibert, S., Weber, M. and Salbert, G. (2016). Single-CpG resolution mapping of 5-hydroxymethylcytosine by chemical labeling and exonuclease digestion identifies evolutionarily unconserved CpGs as TET targets. Genome Biol 17: 56.
  5. Szulwach, K. E., Song, C. X., He, C. and Jin, P. (2012). 5-hydroxymethylcytosine (5-hmC) specific enrichment. Bio-protocol 2(15).


Copyright: © 2018 The Authors; exclusive licensee Bio-protocol LLC.
Citation: Sérandour, A. A., Avner, S. and Salbert, G. (2018). Coupling Exonuclease Digestion with Selective Chemical Labeling for Base-resolution Mapping of 5-Hydroxymethylcytosine in Genomic DNA. Bio-protocol 8(5): e2747. DOI: 10.21769/BioProtoc.2747.