Bulk RNA Barcoding and sequencing (BRB-seq) and data analysis

PV Philip V’kovski
MG Mitra Gultom
JK Jenna N. Kelly
SS Silvio Steiner
JR Julie Russeil
BM Bastien Mangeat
EC Elisa Cora
JP Joern Pezoldt
MH Melle Holwerda
AK Annika Kratzel
LL Laura Laloli
MW Manon Wider
JP Jasmine Portmann
TT Thao Tran
NE Nadine Ebert
HS Hanspeter Stalder
RH Rune Hartmann
VG Vincent Gardeux
DA Daniel Alpern
BD Bart Deplancke
VT Volker Thiel
RD Ronald Dijkman
request Request a Protocol
ask Ask a question
Favorite

Total cellular RNA from mock and virus-infected hAEC cultures was extracted using the NucleoMag RNA kit (Macherey-Nagel, Oensingen, Switzerland) according to the manufacturer’s guidelines on a Kingfisher Flex Purification system (Thermofisher). Total RNA concentration was quantified with the QuantiFluor RNA System (Promega, Madison, WI, USA) according to the manufacturer’s guidelines on a Cytation 5 multimode reader (Biotek, Sursee, Switzerland). A total of 100 ng of total cellular RNA was used for the generation of BRB-seq libraries, and subsequent sequencing on an Illumina HiSeq 4000 platform was performed as described previously to a depth of approximately 12 million raw reads per sample [30]. The sequencing reads were demultiplexed using the BRB-seqTools suite and were aligned against a concatenation of the human genome (hg38), the SARS coronavirus Frankfurt-1 (AY291315) genome, and the SARS-CoV-2/Wuhan-Hu1/2020 (NC_045512) genome using the STAR aligner and HTSeq for producing the count matrices [30,63,64]. Following the alignment, the raw count matrices were randomly downsampled across all samples to compute the sequence saturation using the average number of reads per sample and median number of detected genes (>1 read) across samples. All downstream analyses were performed using R (version 3.6.1). ComBat-seq was used with default settings to adjust for batch effects in the raw data and generate an adjusted count matrix used for downstream analyses [65]. Library normalization and expression differences between uninfected and virus-infected samples were quantified using the DESeq2 package (version 1.28) with a fold change (FC) cut-off of ≥ 1.5 and a false discovery rate (FDR) of ≤0.1 [66]. Due to the multifactor design of these experiments, DE analysis was performed using several approaches: (1) Samples were subset by temperature prior to DE analysis (e.g., subset of samples for all time points at 33°C), and infected samples were compared to uninfected samples using the design ~ Batch + Condition; (2) Samples were subset by temperature and time prior to DE analysis (e.g., subset of samples at 33°C and 24 hpi), and infected samples were compared to uninfected samples using the design ~ Batch + Condition; (3) For the temporal analysis, all samples were kept together, and the identification of significant DE genes over time was performed using the likelihood ratio test (LRT) with both the complete design ~ Condition + TH + Condition:TH (TH = conjugation of the Temperature and Time variables) and the reduced design ~ Condition + TH. Hierarchical gene clustering was subsequently performed on a variance-stabilizing transformation (VST) processed count matrix of identified DE genes using the degPatterns function from the DEGreports package [67]. Venn diagrams of overlapping DE genes were generated using the VennDiagram package [68]. Pathway enrichment analysis was performed using the clusterProfiler and ReactomePA packages in R [69,70]. Significantly enriched pathways with a gene count >1 and p-value of ≤0.05 were visualized using the enrichplot package. Further data analysis and visualization was performed using a variety of additional packages in R, including ComplexHeatmap and ggplot2 [71].

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A