We constructed an isoform-level coexpression network using the procedure described in our previous publications [24,36]. First, isoform expression data for overlapping cell lines of the same cancer type in the CCLE and gCSI datasets were downloaded from the PharmacoGx platform (version 1.12.0) [37], which comprises pharmacological profiles of several hundred cell lines. The updated CCLE and gCSI PharmacoSets contain isoform-level expression data processed from raw RNA-seq profiles obtained from CGHub [38] and NCBI GEO [39]. Zhaleh et al. [40] aligned the RNA-seq reads to the Ensembl GRCh38 human reference genome (Genome Reference Consortium build 38) [41] using HISAT2 [42], and annotated the isoforms and quantified their expression with StringTie [43]. In total, 58,037 genes, including 19,950 protein-coding genes, 15,767 long noncoding RNAs (lncRNAs) and 14,650 pseudogenes, were annotated by GENCODE (version 25) [44]. The FPKM values (fragments per kilobase of transcript per million mapped reads) were then converted to log2(FPKM + 1) to obtain isoform expression values. Noncoding isoforms were removed on the basis of their Ensembl identifiers using the R package biomaRt (version 2.34.3) [45]. For each dataset, we calculated the Pearson correlation coefficient between the expression profiles of every pair of protein isoforms i and j across cell lines as follows:

$$r_{ij} = \frac{\sum_{k=1}^{n}\left(E_{ik}-\bar{E}_{i}\right)\left(E_{jk}-\bar{E}_{j}\right)}{\sqrt{\sum_{k=1}^{n}\left(E_{ik}-\bar{E}_{i}\right)^{2}}\,\sqrt{\sum_{k=1}^{n}\left(E_{jk}-\bar{E}_{j}\right)^{2}}}$$

where $E_{ik}$ and $E_{jk}$ are the expression values, log2(FPKM + 1), of protein isoforms i and j in cell line k, and $\bar{E}_{i}$ and $\bar{E}_{j}$ are their mean expression values across the n cell lines of the dataset. Only isoforms with log2(FPKM + 1) ≥ 1 in at least 30 cancer cell lines in each dataset were included, and isoforms i and j had to be present in both the gCSI and CCLE datasets.
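The preprocessing and correlation steps can be illustrated with a short Python sketch. It is not the authors' pipeline (which relied on R packages such as PharmacoGx and biomaRt); it assumes each dataset is supplied as an isoforms × cell lines matrix of FPKM values with a matching list of isoform IDs, and the function names are hypothetical. The thresholds (log2(FPKM + 1) ≥ 1 in at least 30 cell lines) come from the text, and `numpy.corrcoef` computes the Pearson coefficient between rows.

```python
import numpy as np


def log2_fpkm(fpkm):
    """Convert FPKM values to log2(FPKM + 1)."""
    return np.log2(np.asarray(fpkm, dtype=float) + 1.0)


def expressed_isoforms(expr, isoform_ids, min_value=1.0, min_lines=30):
    """Return IDs of isoforms with expression >= min_value in >= min_lines cell lines.

    expr: 2-D array, rows = isoforms, columns = cell lines (log2(FPKM + 1) scale).
    """
    counts = (expr >= min_value).sum(axis=1)
    return {iso for iso, c in zip(isoform_ids, counts) if c >= min_lines}


def correlate_common_isoforms(fpkm_a, ids_a, fpkm_b, ids_b,
                              min_value=1.0, min_lines=30):
    """Filter isoforms in two datasets and compute per-dataset Pearson matrices.

    Keeps only isoforms passing the expression filter in BOTH datasets, orders
    them identically, and returns (isoform_ids, r_matrix_a, r_matrix_b).
    """
    expr_a, expr_b = log2_fpkm(fpkm_a), log2_fpkm(fpkm_b)
    keep = sorted(expressed_isoforms(expr_a, ids_a, min_value, min_lines)
                  & expressed_isoforms(expr_b, ids_b, min_value, min_lines))
    # Map isoform IDs to row positions so both matrices share the same row order.
    row_a = {iso: r for r, iso in enumerate(ids_a)}
    row_b = {iso: r for r, iso in enumerate(ids_b)}
    sub_a = expr_a[[row_a[iso] for iso in keep]]
    sub_b = expr_b[[row_b[iso] for iso in keep]]
    # np.corrcoef treats each row as a variable: Pearson r between isoforms.
    return keep, np.corrcoef(sub_a), np.corrcoef(sub_b)
```

In practice `min_lines` stays at 30; a smaller value is only convenient for toy inputs with few cell lines.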
Interactions between isoforms of the same gene were removed. To balance removing weak interactions against retaining as many isoforms as possible in the network, the isoform network was filtered with the threshold s = 0.5, which was calculated as follows:
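The two edge-filtering steps can be sketched in Python as follows. The sketch does not compute s; it assumes each candidate isoform pair already carries its score s and that pairs with s ≥ 0.5 are retained (whether the cut-off is inclusive is an assumption). The function name `filter_edges`, the `gene_of` mapping and the (isoform_i, isoform_j, s) tuple layout are hypothetical.

```python
def filter_edges(edges, gene_of, threshold=0.5):
    """Keep isoform pairs from different genes whose score s reaches the threshold.

    edges: iterable of (isoform_i, isoform_j, s) tuples, with s precomputed.
    gene_of: mapping from isoform ID to its parent gene ID.
    """
    kept = []
    for iso_i, iso_j, s in edges:
        if gene_of[iso_i] == gene_of[iso_j]:
            continue  # drop interactions between isoforms of the same gene
        if s >= threshold:  # assumed inclusive cut-off
            kept.append((iso_i, iso_j, s))
    return kept
```

Removing same-gene pairs before thresholding keeps the network focused on coexpression between different genes' isoforms, which would otherwise be dominated by shared transcriptional regulation.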