After quality control was finished, the general RNA-seq analyses were carried out. There are four main steps: aligning the reads to the reference genome; assembling the alignments into full-length transcripts; quantifying the expression of genes and transcripts; and calculating the expression differences of all genes between experimental conditions. The "new Tuxedo" package, which includes HISAT, StringTie, and Ballgown, was used to perform this process. HISAT [40] aligns the RNA-seq reads to the genome, and StringTie [41] assembles the transcripts and constructs isoforms to estimate gene expression. Ballgown [42] uses the transcripts assembled by StringTie to calculate gene expression and reports the results as FPKM (fragments per kilobase of transcript per million mapped fragments). The input data were generated by the BGISEQ-500 instrument; running our pipeline produced the transcripts, gene expression values (FPKM), a list of differentially expressed genes (DEGs), and the merged statistical results. The detailed steps are shown in Table 2.
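To make the FPKM unit concrete, the following is a minimal sketch of the standard FPKM definition: the fragment count is normalized by transcript length in kilobases and by sequencing depth in millions of mapped fragments. The function name, argument names, and the example numbers are hypothetical and chosen only for illustration; this shows the textbook formula, not necessarily the exact internal computation performed by StringTie or Ballgown.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Return FPKM using the standard definition (illustrative helper, not a tool API).

    FPKM = fragments / (length in kb) / (mapped fragments in millions)
         = fragments * 1e9 / (length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical values: 500 fragments on a 2,000 bp transcript,
# in a library with 20 million mapped fragments.
example_value = fpkm(500, 2000, 20_000_000)
print(example_value)  # 12.5
```

Because the value is scaled by both length and depth, a longer transcript or a deeper library with the same fragment count yields a lower FPKM.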
Detailed information on the RNA-seq analysis pipeline.
The table lists the analysis steps, software, and main scripts in our pipeline, starting from the input FASTQ files produced by sequencing and ending with the candidate medicines and genes for NSCLC research.