Last updated date: Dec 30, 2020
1. Harvest 10 ml of an overnight S. cerevisiae culture in mid-log phase (OD600 = 1.0) and wash the cells with DEPC-H2O.
2. Extract RNA with hot acid phenol as described (Ref. 1).
3. Generate RNA-seq libraries. For the following workflow, the QuantSeq 3' mRNA-Seq Library Prep Kit FWD for Illumina (Lexogen) was used.
4. Sequence the libraries as single-end, 50 base pair reads. Here, an Illumina HiSeq 4000 sequencer was used.
5. Download the S. cerevisiae genome from UCSC (http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/). Here, version sacCer3 was used.
6. Convert sacCer3.2bit to fasta format with "twoBitToFa" (http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/)
./twoBitToFa sacCer3.2bit saccer3.fa
7. Create bowtie2 index files for the yeast genome sacCer3. tophat uses this index in Steps 9 and 13.
bowtie2-build saccer3.fa index/saccer3
8. Generate a "genes.gtf" file containing annotations for the genome used (sacCer3) with the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). Here, the following options were chosen: clade: other, genome: S. cerevisiae, assembly: Apr. 2011 (sacCer3), group: genes and predictions, track: SGD, table: sgdGene, region: genome, output: gtf.
9. Build the transcriptome index with tophat (https://ccb.jhu.edu/software/tophat/index.shtml), using "genes.gtf" from Step 8 and the genome index from Step 7.
tophat -p 4 -G annotation/genes.gtf --transcriptome-index=index/saccer3 -o out/out1 --no-novel-juncs index/saccer3 data/wt_0_S46_L004_R1_001.fastq
10. Create bowtie2 index files (*.bt2) for the yeast rRNA sequences.
bowtie2-build index/sc-rrna.fa index/saccerrrna
11. Remove the random primer sequence, adapter contamination, and low-quality tails (example taken from https://www.lexogen.com/quantseq-data-analysis/ for the library kit used in Step 3).
polyA = 14;
for sample in *.fastq; do cat ${sample} | bbmap/bbduk.sh in=stdin.fq out=${sample}_trimmed_clean ref=bbmap/resources/polyA.fa.gz,bbmap/resources/truseq.fa.gz k=13 ktrim=r forcetrimleft=11 useshortkmers=t mink=5 qtrim=t trimq=10 minlength=20; done
12. Align each trimmed fastq file to the rRNA index (index/saccerrrna from Step 10) using bowtie (https://sourceforge.net/projects/bowtie-bio/) and save the unaligned reads to a new fastq file. This filters out ribosomal RNA reads. Replace {sample} with the name of each sample.
bowtie -v2 -p4 index/saccerrrna data/{sample}.fastq_trimmed_clean --un data/{sample}_trim_saccerrrna-unalign.fastq >/dev/null
13. Align the filtered reads to the genome with tophat.
tophat -p 4 --transcriptome-index=index/saccer3 --no-novel-juncs -o out/trim-rrna index/saccer3 data/{sample}_trim_saccerrrna-unalign.fastq
14. Filter the alignments by mapping quality with samtools, keeping only reads with MAPQ of at least 50.
for fff in out/trim-rrna/*.bam; do echo "Running on this file: $fff"; samtools view -bq 50 "$fff" > "$fff.mapq50.bam"; done
15. Create index files for each .bam file from Step 14 using samtools (http://www.htslib.org/).
samtools index out/trim-rrna/{sample}_accepted_hits.bam
16. Extract counts for each sample using htseq-count (https://htseq.readthedocs.io/en/release_0.11.1/install.html).
htseq-count -f bam out/trim-rrna/{sample}_accepted_hits.bam genes.gtf > out/trim-rrna/{sample}_accepted_hits_count.txt
17. Calculate count-based expression values using R (https://cran.r-project.org/bin/windows/base/), RStudio (https://rstudio.com/products/rstudio/download/#download), RTools (https://cran.r-project.org/bin/windows/Rtools/), BiocManager (https://bioconductor.org/install/) and DESeq2 (http://bioconductor.org/packages/release/bioc/html/DESeq2.html).
18. Generate combined "count" files that contain the counts of each replicate of a given sample, as well as of the reference sample (here "wildtype"), as columns in a tab-delimited txt document. The first row contains the sample names and the first column designates the target gene region. The data matrix therefore has one row per gene region and one column per sample, i.e. a size of (number of gene regions) x (number of samples).
region_name wt-rep#1 wt-rep#2 wt-rep#3 sample1-rep1 sample1-rep2 sample1-rep3
YAL069W 1 0 2 0 0 0
[…] […] […] […] […] […] […]
19. Generate a "table" file for each sample that describes each column of data. The column sample_name should match the sample names in the first row of the "count" file from Step 18. The column condition allows DESeq2 to correctly identify replicates (Step 21).
sample_name condition
wt-rep#1 WT
wt-rep#2 WT
wt-rep#3 WT
sample1-rep1 sample1
sample1-rep2 sample1
sample1-rep3 sample1
20. In R, load the DESeq2 library, the combined count file from Step 18 and the table file from Step 19.
library(DESeq2)
count_table <- read.delim("combined_counts_wt-sample1.txt", sep="\t", header=TRUE, row.names="region_name")
sample_table <- read.delim("table_wt-sample1.txt", sep="\t", header=TRUE, row.names="sample_name")
21. Plot the data and write the RNA-seq expression values to file.
dds <- DESeqDataSetFromMatrix(countData = count_table, colData = sample_table, design = ~ condition)
dds <- DESeq(dds)
res <- results(dds)
resOrdered <- res[order(res$padj),]
plot <- plotMA(res, main = "mutant", ylim = c(-2,2), xlab = "mean count")
write.table(as.data.frame(resOrdered), sep="\t", quote=FALSE, file="out/wt_sample1_p-values.txt")
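The protocol does not say how the combined "count" file (Step 18) and the "table" file (Step 19) are assembled. The following is a minimal shell sketch of one way to do it from the per-sample htseq-count outputs of Step 16. The function names combine_counts and write_sample_table are hypothetical. Deriving column names by stripping the suffix "_accepted_hits_count.txt" and filling genes missing from a file with 0 are assumptions, not part of the original workflow. Rows starting with "__" are the summary lines that htseq-count appends (for example __no_feature) and are skipped.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and file-name conventions are assumptions.
set -eu

# combine_counts OUT FILE...
# Each FILE is a two-column htseq-count output (gene <TAB> count).
# Writes one tab-delimited table: first column region_name, then one column per FILE.
combine_counts() {
  local out=$1; shift
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {
      # New input file: derive its column name from the file name (assumed suffix)
      nf++
      name = FILENAME
      sub(/.*\//, "", name)
      sub(/_accepted_hits_count\.txt$/, "", name)
      header = header OFS name
    }
    /^__/ { next }   # skip htseq-count summary rows such as __no_feature
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
      count[$1, nf] = $2
    }
    END {
      print "region_name" header
      for (i = 1; i <= n; i++) {
        line = order[i]
        for (j = 1; j <= nf; j++)
          line = line OFS (((order[i], j) in count) ? count[order[i], j] : 0)
        print line
      }
    }' "$@" > "$out"
}

# write_sample_table OUT NAME=CONDITION...
# Writes the two-column table (sample_name, condition) described in Step 19.
write_sample_table() {
  local out=$1; shift
  local pair
  printf 'sample_name\tcondition\n' > "$out"
  for pair in "$@"; do
    printf '%s\t%s\n' "${pair%%=*}" "${pair#*=}" >> "$out"
  done
}
```

A call such as combine_counts combined_counts_wt-sample1.txt out/trim-rrna/*_accepted_hits_count.txt, followed by write_sample_table table_wt-sample1.txt with one NAME=CONDITION argument per column, would produce the two input files read in Step 20. The sample names passed to write_sample_table need to be listed in the same order as the columns of the count file.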
References
1. M. A. Collart, S. Oliviero, Preparation of yeast RNA. Curr Protoc Mol Biol Chapter 13, Unit 13.12 (2001).