Advanced Search
Last updated date: Nov 10, 2020 Views: 1019 Forks: 0
1. Quality control and adaptor trimming by Trim Galore:
mkdir 1.adaptor_trimmed
for sample in `find rawdata/ -name "*_R1.fastq.gz"`
do
sample2=${sample/R1/R2}
trim_galore -q 20 --phred33 --stringency 3 --length 20 -o 1.adaptor_trimmed/ --paired
$sample $sample2
done
2. Down-sampling by seqtk:
mkdir 2.DownSampling
for sample in `find 1.adaptor_trimmed/ -name "*fq.gz" |sort`
do
file=$(basename $sample)
name=${file%%_val*}
seqtk sample -s 100 $sample 30000000 >2.DownSampling/${name}_30M.fq
#seqtk sample -s 100 $sample 60000000 >2.DownSampling/${name}_60M.fq
done
3. Read mapping by STAR:
(1)
REFDATH=Reference/human/hg19
STAR --runThreadN 20 --runMode genomeGenerate --genomeDir ${REFDATH}/genome/
--genomeFastaFiles ${REFDATH}/genome/hg19_genome.fa --sjdbGTFfile${REFDATH}/
transcriptome/hg19_refGene.gtf --sjdbOverhang 149
#REFDATH=Reference/mouse/mm10
#STAR --runThreadN 20 --runMode genomeGenerate --genomeDir ${REFDATH}/genome/
--genomeFastaFiles ${REFDATH}/genome/mm10_genome.fa --sjdbGTFfile ${REFDATH}/
transcriptome/mm10_refGene.gtf --sjdbOverhang 149 40
(2)
REFDATH=Reference/human/hg19
#REFDATH=Reference/mouse/mm10
mkdir 3.STAR_out
for sample in `find 2.DownSampling -name "*R1_30M.fq" |sort`
do
sample2=${sample/R1/R2}
file=$(basename $sample)
name=${file%%_R1*}
STAR --runThreadN 20 --outSAMattributes NH HI AS nM NM MD jM jI XS MC ch --
genomeDir ${REFDATH}/genome/ --readFilesIn $sample $sample2--outSAMtype BAM
SortedByCoordinate --outMultimapperOrder Random --outFileNamePrefix
3.STAR_out/${name}_ --outFilterIntronMotifs RemoveNoncanonicalUnannotated
done
4. Filtering mapped Read by samtools:
for sample in `find 3.STAR_out -name "*sortedByCoord.out.bam"`
do
samtools view -f 3 -F 256 -b $sample -o ${sample/Aligned*out/primary_alignment
_only}
samtools index ${sample/Aligned*out/primary_alignment_only}
done
5. counting Reads per gene with HTSeq:
(1)
for sample in `find 3.STAR_out -name "*_primary_alignment_only.bam"`
do
samtools sort -n -@ 10 -o ${sample/primary_alignment_only/name_sorted}$sample
done
(2)
gtf=Reference/mouse/mm10/transcriptome/mm10_refGene.gtf
mkdir 4.HTseq_count
for sample in `find 3.STAR_out -name "*name_sorted.bam"|sort`
do
file=`basename $sample`
name=${file%%_name*}
htseq-count -m intersection-strict --secondary-alignments=ignore --supplementary-
alignments=ignore -f bam -s no $sample $gtf >../4.HTseq_count/${name}_strict.count
done
6. Differential gene expression analysis with DESeq2:
library(DESeq2)
htseq_dir<-"~/Documents/TRACE-seq/htseq_count_strict/"
sample_files<-sort(grep(".count",list.files(htseq_dir),value = T))
sample_names<-sub("(.*)_strict.count","\\1",sample_files)
sample_condition<-c("treated","treated","treated","untreated","untreated","untreated")
sampleTable<-data.frame(sampleName=sample_names, fileName=sample_files,
condition=sample_condition)
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory =
htseq_dir, design = ~ condition)
ddsHTSeq<-ddsHTSeq[rowSums(counts(ddsHTSeq))>=10,]
ddsHTSeq$condition<-relevel(ddsHTSeq$condition, ref = "untreated")
ddsHTSeq<-DESeq(ddsHTSeq)
resLFC <- lfcShrink(ddsHTSeq, coef="condition_treated_vs_untreated", type="normal",
lfcThreshold = 1)
res_total<-data.frame(resLFC)
for (i in 1:nrow(res_total)) {
if (res_total[i,'padj'] < 0.05 & res_total[i,'log2FoldChange'] > 1) {
res_total[i,'type'] <- 'AA'
} else if (res_total[i,'padj'] < 0.05 & res_total[i,'log2FoldChange'] < -1) {
res_total[i,'type'] <- 'BB'
} else {
res_total[i,'type'] <- 'Non_sig'
105 }
106 }
sum(res_total$type == 'AA')
sum(res_total$type == 'BB')
pdf("volcano.pdf", height = 5, width = 7)
library('ggplot2')
as.factor(res_total$type)
ggplot(res_total,aes(log2FoldChange,-log10(padj),colour=type))+
geom_point()+
xlim(-10,10)+
ggtitle("NEBNext Ultra II RNA")+
theme(plot.title = element_text(size = 20,hjust = 0.5),
axis.title = element_text(size = 20),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"))+
labs(colour="Differential Expressed Genes(4452)")+
xlab('log2FoldChange')+
ylab('-log10(padj)')+
scale_color_manual(labels=c("Up regulated: 3157", "Down regulated: 1295", "Non_
significant"), values = c('#F8766D','#00BFC4','black'))
dev.off()
7. Gene expression measurement by cuffnorm:
gtf=Reference/human/hg19/transcriptome/hg19_refGene.gtf
mkdir cuffnorm
cuffnorm -p 28 $gtf 3.STAR_out/trace_200ng_1_primary_alignment_only.bam,3.STAR_o
ut/trace_200ng_2_primary_alignment_only.bam,3.STAR_out/trace_200ng_3_primary_ali
gnment_only.bam -o cuffnorm
8. Gene body coverage analysis by QoRTs:
gtf=Reference/human/hg19/transcriptome/hg19_refGene.gtf
for sample in `find 3.STAR_out/ -name "*primary_alignment_only.bam"`
do
file=`basename $sample`
name=${file%%_primary*}
mkdir -p QoRTs/whole_genebody/genebody_${name}
java -Xmx4G -jar QoRTs.jar QC --maxReadLength 150 --prefilterImproperPairs --
runFunctions writeGeneBody --minMAPQ 50 --generatePlots $sample $gtf
QoRTs/whole_genebody/genebody_${name}/
done
9. Nucleotide-vs-Cycle counts by QoRTs:
gtf=Reference/human/hg19/transcriptome/hg19_refGene.gtf
for sample in `find 3.STAR_out/ -name "*primary_alignment_only.bam"`
do
file=`basename $sample`
name=${file%%_primary*}
mkdir NVC_${name}
java -Xmx4G -jar QoRTs.jar QC --maxReadLength 150 --prefilterImproperPairs --
runFunctions NVC --minMAPQ 50 --generatePlots $sample $gtf NVC_${name}
done
10. GC content analysis by RSeQC:
for sample in `find 3.STAR_out/ -name "*primary_alignment_only.bam"`
do
file=`basename $sample`
name=${file%%_primary*}
mkdir -p RSeQC/GC_${name}
read_GC.py -i $sample -o RSeQC/ GC_${name}
done
11. Reads distribution analysis by RSeQC:
bed=Reference/human/hg19/transcriptome/hg19_refseq.bed
for sample in `find 3.STAR_out/ -name "*primary_alignment_only.bam"`
do
read_distribution.py -i $sample -r $bed
done
12. Calculating median coefficient of variation of gene coverage by Picard Tools:
mkdir CV
REFDATH=Reference/human/hg19
for sample in `find 3.STAR_out/ -name "*primary_alignment_only.bam"`
do
file=`basename $sample`
name=${file%%_primary*}
java -jar picard.jar CollectRnaSeqMetrics I=$sample O=CV/${name}.RNA_Metrics
REF_FLAT=${REFDATH}/hg19_refFlat.txt STRAND=NONE RIBOSOMAL_INTERVALS=
${REFDATH}/hg19.rRNA.interval_list
done
13. Calculating insert size of library by Picard Tools:
mkdir insert_size
for sample in `find 3.STAR_out -name "*primary_alignment_only.bam"`
do
file=`basename $sample`
name=${file%%_primary*}
java -jar picard.jar CollectInsertSizeMetrics M=0.01 I=$sample O=insert_size/insert_
size_${name}_metrics.txt H=insert_size/insert_size_${name}_histogram.pdf
done
14. Library complexity analysis by Preseq:
mkdir preseq_out
for sample in `find 3.STAR_out -name "*primary_alignment_only.bam"`
do
file=`basename $sample`
name=${file%%_primary*}
preseq c_curve seg_len 10000000000000000 -o preseq_out/${name}_c_curve.txt -P -B
$sample
done
Category
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.
Tips for asking effective questions
+ Description
Write a detailed description. Include all information that will help others answer your question including experimental processes, conditions, and relevant images.
Share
Bluesky
X
Copy link