Illumina sequencing generated 117,478,076 paired-end reads from the pooled cDNA libraries (Table 1). Reads were filtered in CLC Genomics Workbench 12 (Qiagen, Denmark): adapter sequences, low-quality reads (Phred score ≤ 30) and reads shorter than 50 bp were removed. The filtered reads were assembled with the de Bruijn graph-based de novo assembler of CLC Genomics Workbench [2]. The k-mer and bubble sizes were varied to optimize the assembled contigs, and candidate assemblies were compared on four output metrics: high N50, low total number of contigs, high average contig length and high percentage of reads mapping back to the transcripts. On this basis, the final assembly (minimum contig length = 200 bp) was built with k-mer = 35 and bubble size = 300. Redundant contigs were then removed by clustering with cd-hit-est at a sequence identity threshold of 0.95 [3]. To estimate transcript abundance, the numbers of reads mapping back to the contigs were converted to transcripts per million (TPM) expression values [4]. In the initial data exploration, we also performed a principal component analysis and a global Pearson correlation analysis to assess sample clustering and the correlation between samples.
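For readers who want to reproduce the summary statistics outside CLC Genomics Workbench, the following is a minimal Python sketch of three quantities named above: the N50 of an assembly, the conversion of mapped-read counts to TPM, and the Pearson correlation between two samples. It uses the standard textbook definitions and is not the implementation used by CLC; in particular, it assumes that TPM is computed from the full contig length rather than an effective length. The function names and the toy values are hypothetical and are not taken from the study.

```python
from math import sqrt
from typing import Dict, List, Sequence


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def tpm(read_counts: Dict[str, float], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Convert mapped-read counts to transcripts per million (TPM).

    Assumption: the full contig length is used for normalisation.
    """
    # Reads per base for each contig (length normalisation).
    rates = {name: read_counts[name] / lengths_bp[name] for name in read_counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("no mapped reads")
    # Rescale so that the values of one sample sum to one million.
    return {name: rate / total_rate * 1e6 for name, rate in rates.items()}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values for illustration only (hypothetical, not from the study).
    lengths = {"contig_a": 1000, "contig_b": 1500}
    counts = {"contig_a": 100, "contig_b": 300}
    print("N50:", n50(list(lengths.values())))
    print("TPM:", tpm(counts, lengths))
    print("r:", pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because TPM values sum to one million within each sample, they are comparable across samples as relative abundances, which is why the between-sample correlation is computed on them.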