Repeat pseudo-genome preparation

Mohamed Nadhir Djekidel

Last updated: Jan 23, 2021

An abbreviated version of this protocol was published in Science Advances in May 2020:

A transcriptional roadmap for 2C-like–to–pluripotent state transition

To map reads to repetitive elements, we created a pseudo-genome that contains only the repeat sequences. The in-house scripts we used are available in our GitHub repository (2CLike_analysis). In summary, the pseudo-genome can be created as follows (on a Linux operating system):

Required software:

Make sure the following software is installed and available on your PATH:

Python 2.7.3
Bowtie 1 (version 0.12.9)
BEDTools 2.20.1
SAMtools >= 1.6
Picard tools >= 2.6.0
STAR >= 2.5.2b
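Before building anything, it can help to confirm that each tool is on PATH and to compare its reported version with the list above. The helper below is a minimal sketch and is not part of the authors' repository: check_tools is a hypothetical name, and it assumes the executables are called python2.7, bowtie, bedtools, samtools and STAR and that each prints a version line when given --version. Picard is usually distributed as a jar file whose path is set later in run_buildPseudogenome.sh, so it is not checked here.

```shell
# check_tools NAME...: for each executable, print where it was found and the first
# line of its --version output; the number of missing tools is left in $missing_count.
# Hypothetical helper (not part of the 2CLike_analysis repository).
check_tools() {
  missing_count=0
  for tool in "$@"; do
    if loc=$(command -v "$tool"); then
      # Some tools print their version to stderr, so merge it into stdout,
      # then keep only the first line using the shell builtin "read".
      version=$("$tool" --version 2>&1 | { IFS= read -r line; printf '%s' "$line"; })
      echo "found   $tool ($loc): $version"
    else
      echo "MISSING $tool"
      missing_count=$((missing_count + 1))
    fi
  done
}

# Assumed executable names; adjust them to match your installation.
check_tools python2.7 bowtie bedtools samtools STAR
```

Compare each printed version line with the required versions; any line starting with MISSING points to a tool that still has to be installed or added to PATH.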

Make sure that the Biopython library is installed:

pip install BioPython
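Because the in-house scripts target Python 2.7, it is worth checking that Biopython was installed for that same interpreter and not for another Python on the system. This is a sketch under stated assumptions: check_python_env is a hypothetical helper, and it assumes the scripts are run with the interpreter named "python" unless you pass another name (for example python2.7).

```shell
# check_python_env [INTERPRETER]: report whether INTERPRETER (default: python) is
# Python 2.7 and whether it can import Biopython (imported as the "Bio" package).
# Hypothetical helper; results are also left in $py_status and $bio_status.
check_python_env() {
  py=${1:-python}
  # Exit status 0 only when the interpreter reports major.minor == 2.7
  if "$py" -c 'import sys; sys.exit(0 if sys.version_info[:2] == (2, 7) else 1)' >/dev/null 2>&1; then
    py_status="python 2.7"
  else
    py_status="not python 2.7 (or not found)"
  fi
  if "$py" -c 'import Bio' >/dev/null 2>&1; then
    bio_status="installed"
  else
    bio_status="missing"
  fi
  echo "$py: $py_status; Biopython: $bio_status"
}

check_python_env python
```

If Biopython shows as missing for the Python 2.7 interpreter, installing it with that interpreter's own pip (for example "python2.7 -m pip install biopython") avoids installing it into a different Python.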

Download required files:

Clone our code repository:
git clone https://github.com/sirusb/2CLike_analysis.git
Go to the Pseudogenome folder inside the cloned repository:
cd 2CLike_analysis/Pseudogenome
Download the mm9 repeat annotation ('mm9_repeatmasker_clean.txt.gz') from the RepEnrich Google Drive and put it in the Pseudogenome folder.
Decompress the downloaded 'mm9_repeatmasker_clean.txt.gz' file as follows:
gunzip mm9_repeatmasker_clean.txt.gz
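After decompression, counting the annotated repeats on each chromosome is a quick way to confirm that the file was read correctly and to preview which chromosomes the next step will download. A minimal sketch: count_repeats_per_chrom is a hypothetical helper, and it relies on column 5 holding the chromosome name, which is the same column the download script reads.

```shell
# count_repeats_per_chrom FILE: number of annotated repeats per chromosome,
# taking column 5 as the chromosome name and ignoring lines where it does not
# start with "chr" (e.g. header lines). Output: "chrom<TAB>count", sorted by name.
# Hypothetical helper (not part of the 2CLike_analysis repository).
count_repeats_per_chrom() {
  awk '$5 ~ /^chr/ { n[$5]++ } END { for (c in n) print c "\t" n[c] }' "$1" |
    LC_ALL=C sort -k1,1
}

# Example: count_repeats_per_chrom mm9_repeatmasker_clean.txt
```

The chromosome names in the first column are exactly the files the next step requests from UCSC, so an unexpected name here will also show up as a failed download there.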
Create an mm9 FASTA file that contains all the chromosomes listed in 'mm9_repeatmasker_clean.txt' using the following bash script:
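genome_version='mm9'
## List each chromosome once, in file order (column 5 holds the chromosome name;
## unlike uniq, this also removes duplicates that are not on adjacent lines)
chroms=$(awk '$5 ~ /^chr/ && !seen[$5]++ {print $5}' mm9_repeatmasker_clean.txt)
## Start from an empty FASTA so that re-running the script does not append chromosomes twice
rm -f ${genome_version}.fa
## Download the different chromosome .fa files and append them to a single FASTA
for f in $chroms
do
    echo "Downloading ${f}.fa.gz"
    wget http://hgdownload.cse.ucsc.edu/goldenPath/${genome_version}/chromosomes/${f}.fa.gz -O ${f}.fa.gz
    zcat ${f}.fa.gz >> ${genome_version}.fa
done
## Remove intermediate files
echo "removing intermediate files"
rm chr*.fa.gz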
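A failed download leaves a chromosome out of mm9.fa without stopping the loop, so a quick cross-check against the annotation can catch gaps before the pseudo-genome is built. The function below is a hypothetical helper (not in the repository), sketched under the same assumption as the script above: column 5 of mm9_repeatmasker_clean.txt holds the chromosome name, and FASTA headers start with ">name".

```shell
# missing_chroms ANNOTATION FASTA: print each chromosome named in column 5 of the
# annotation that has no ">name" header in the FASTA (no output = nothing missing).
# Hypothetical helper (not part of the 2CLike_analysis repository).
missing_chroms() {
  # The FASTA is read first (ARGV[1]); FILENAME is compared instead of NR == FNR
  # so that an empty FASTA still reports every chromosome as missing.
  awk 'FILENAME == ARGV[1] { if (/^>/) seen[substr($1, 2)] = 1; next }
       $5 ~ /^chr/ && !($5 in seen) && !($5 in reported) { print $5; reported[$5] = 1 }' "$2" "$1"
}

# Example: missing_chroms mm9_repeatmasker_clean.txt mm9.fa
```

Any chromosome it prints can be downloaded again on its own before moving on to run_buildPseudogenome.sh.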
Open run_buildPseudogenome.sh and set the path to your Picard tools installation.
Run the run_buildPseudogenome.sh script:
sh run_buildPseudogenome.sh
Once the script finishes, it creates the rms_Pseudo_out folder, which contains the newly created pseudo-genome and its STAR index.
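To confirm that the build completed, one can check that rms_Pseudo_out holds a complete STAR index. This is only a sketch: check_star_index is a hypothetical helper, it does not assume the name of the index subfolder inside rms_Pseudo_out (it locates index folders by their SAindex file), and the list of required files reflects the standard output of "STAR --runMode genomeGenerate" rather than anything specific to run_buildPseudogenome.sh.

```shell
# check_star_index DIR: find STAR index folders under DIR (identified by an SAindex
# file) and check that the core genomeGenerate output files exist and are non-empty.
# Returns success if at least one complete index is found.
# Hypothetical helper; paths containing spaces are not handled by the for loop.
check_star_index() {
  complete=0
  for sa in $(find "$1" -type f -name SAindex); do
    dir=$(dirname "$sa")
    ok=1
    for f in Genome SA SAindex chrName.txt chrNameLength.txt genomeParameters.txt; do
      if [ ! -s "$dir/$f" ]; then
        echo "missing or empty: $dir/$f"
        ok=0
      fi
    done
    if [ "$ok" -eq 1 ]; then
      echo "complete STAR index: $dir"
      complete=$((complete + 1))
    fi
  done
  [ "$complete" -gt 0 ]
}

# Example: check_star_index rms_Pseudo_out
```

The folder it reports as a complete index is the one to pass to STAR's --genomeDir option when mapping reads to the repeat pseudo-genome.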

How to cite:

Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:

Djekidel, M. N. (2021). Repeat pseudo-genome preparation. Bio-protocol Preprint. bio-protocol.org/prep770.
Fu, X., Djekidel, M. N. and Zhang, Y. (2020). A transcriptional roadmap for 2C-like–to–pluripotent state transition. Science Advances 6(22). DOI: 10.1126/sciadv.aay5181