Download the mm10 (GRCm38) genome FASTA file from Ensembl:
wget ftp://ftp.ensembl.org/pub/release-/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
Add the synthetic Dux and tdTomato sequences to the downloaded mm10 FASTA file.
wget https://raw.githubusercontent.com/sirusb/2CLike_analysis/master/Dux_tdTomato.fa
cat Dux_tdTomato.fa >> Mus_musculus.GRCm38.dna.primary_assembly.fa
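As a quick sanity check after appending, one could list the last few sequence headers of the combined FASTA; the entries from Dux_tdTomato.fa should appear at the end. This is only a sketch, and the helper name last_fasta_headers is our own and not part of the original protocol.

```shell
# Hypothetical helper: print the last N sequence headers of a FASTA file
# (N defaults to 5). Header lines are the ones starting with ">".
# Usage: last_fasta_headers FILE [N]
last_fasta_headers() {
  grep '^>' "$1" | tail -n "${2:-5}"
}
```

For example: last_fasta_headers Mus_musculus.GRCm38.dna.primary_assembly.fa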
Next, download the standard mm10 gene annotation. We used version GRCm38.85, but any newer version is also fine.
wget ftp://ftp.ensembl.org/pub/release-85/gtf/mus_musculus/Mus_musculus.GRCm38.85.gtf.gz
gunzip Mus_musculus.GRCm38.85.gtf.gz
Add the synthetic Dux and tdTomato GTF annotations (see attachments) to the GTF file.
wget https://raw.githubusercontent.com/sirusb/2CLike_analysis/master/SynDux_TdTomato.gtf
cat Mus_musculus.GRCm38.85.gtf SynDux_TdTomato.gtf > GRCm38.85_SynDux_TdTomato.gtf
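Before moving on, it may be worth confirming that the combined GTF contains exactly the lines of both inputs, for instance to catch an accidental double append after rerunning the command. A minimal sketch, assuming all three files end with a newline; the helper name check_concat is hypothetical.

```shell
# Hypothetical helper: succeed only if COMBINED has exactly as many lines
# as FIRST and SECOND together.
# Usage: check_concat FIRST SECOND COMBINED
check_concat() {
  # The arithmetic expansion also strips any padding that wc may print.
  expected=$(( $(wc -l < "$1") + $(wc -l < "$2") ))
  actual=$(( $(wc -l < "$3") ))
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual lines"
  else
    echo "MISMATCH: expected $expected lines, found $actual" >&2
    return 1
  fi
}
```

For example: check_concat Mus_musculus.GRCm38.85.gtf SynDux_TdTomato.gtf GRCm38.85_SynDux_TdTomato.gtf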
Convert the GTF to refFlat format using the UCSC gtfToGenePred tool.
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/gtfToGenePred
chmod +x gtfToGenePred
./gtfToGenePred GRCm38.85_SynDux_TdTomato.gtf GRCm38.85_SynDux_TdTomato.refFlat
The output of gtfToGenePred lacks the leading gene-name column that refFlat requires, so the file needs to be fixed in R as follows to work with Drop-seq tools.
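# Read the 10-column gtfToGenePred output (columns V1 to V10).
refFlat <- read.table("GRCm38.85_SynDux_TdTomato.refFlat")
# Avoid scientific notation when the coordinates are written back out.
options(scipen=999)
# Duplicate the first column (the transcript name) to get the 11 columns of refFlat.
refFlat$V11 <- refFlat$V1
refFlat <- refFlat[,c(1,11,2:10)]
# Drop entries whose chromosome name (V2) contains "chrNT".
toUse <- grep("chrNT", refFlat$V2, invert=TRUE)
refFlat <- refFlat[toUse,]
write.table(refFlat, file="GRCm38.85_SynDux_TdTomato.refFlat", sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)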
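Since refFlat has 11 tab-separated columns, one way to check the rewritten file is to count the fields on every line. The sketch below is illustrative only; the helper name field_counts is hypothetical.

```shell
# Hypothetical helper: print each distinct tab-separated field count in FILE,
# one per line, in ascending order.
# Usage: field_counts FILE
field_counts() {
  awk -F '\t' '{ print NF }' "$1" | sort -n | uniq
}
```

For example: field_counts GRCm38.85_SynDux_TdTomato.refFlat
A well-formed file should yield a single line reading 11; any other value points to malformed rows.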
The pseudo-genome containing the repeat sequences was created manually, following the steps described in detail on this bio-protocol page.