To train ChromBPNet models, we used a pre-release version (v.1.3-pre-release; RRID: SCR_024806) from the ChromBPNet GitHub repository (https://github.com/kundajelab/chrombpnet/tree/v1.3-pre-release). We followed all preprocessing and training steps as described in the tutorial. From the aligned ATAC reads in the MM001 BAM file, we generated a BigWig of Tn5 insertion sites and trained a bias model that predicts Tn5 binding sites in non-peak regions; this bias model is then used within the ChromBPNet model to filter out Tn5 bias. ChromBPNet takes a 2,114-bp DNA sequence as input and predicts, for the central 1,000 bp, both the ATAC track and the natural log of the aligned read count. To score 500-bp DNA sequences (the IRF4 enhancer and the synthetic enhancers), we extended each sequence to the required input length using the sequences that flank the cloned/integrated enhancer within the integrated cassette. Both the scalar (log count) and track predictions were plotted. Flanking sequences are provided in the Supplementary Code.
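As an illustration of the padding step, the following minimal sketch shows one way a 500-bp enhancer could be placed in cassette flanks to obtain a 2,114-bp input. It is not the code used in this study: the function names, the centering of the enhancer, the A/C/G/T one-hot channel order and the placeholder sequences are all assumptions made for the example, and the actual flanking sequences are those given in the Supplementary Code.

```python
INPUT_LEN = 2114   # ChromBPNet input width stated in the text
OUTPUT_LEN = 1000  # width of the central window that is predicted

# Assumed channel order A, C, G, T; any other base (e.g. N) maps to all zeros.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def embed_in_flanks(insert, left_flank, right_flank, input_len=INPUT_LEN):
    """Center `insert` in an `input_len`-bp sequence using the given flanks.

    Hypothetical helper: takes the bases of `left_flank` nearest the insert
    and the bases of `right_flank` nearest the insert.
    """
    pad = input_len - len(insert)
    if pad < 0:
        raise ValueError("insert is longer than the model input")
    left_n = pad // 2
    right_n = pad - left_n
    if len(left_flank) < left_n or len(right_flank) < right_n:
        raise ValueError("flanking sequences are too short")
    left = left_flank[len(left_flank) - left_n:]
    right = right_flank[:right_n]
    return (left + insert + right).upper()


def one_hot(seq):
    """Return a list of 4-element rows, one per base."""
    return [ONE_HOT.get(base, [0, 0, 0, 0]) for base in seq.upper()]


# Placeholder sequences for illustration only (not the real cassette).
enhancer = "ACGT" * 125      # 500 bp
left_flank = "A" * 1000
right_flank = "C" * 1000

model_input = embed_in_flanks(enhancer, left_flank, right_flank)
encoded = one_hot(model_input)

# With centering, the enhancer spans positions 807-1306 (0-based) of the
# input and therefore lies inside the central 1,000-bp output window.
start = (INPUT_LEN - len(enhancer)) // 2
window_start = (INPUT_LEN - OUTPUT_LEN) // 2
```

The resulting one-hot array would then be passed to the trained model. Because the model reports the natural log of the read count, a predicted count on the original scale would be obtained by exponentiating the scalar output.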