For each PDX model, an intraclass correlation coefficient, denoted by ICCg, was computed to examine the impact of each quantification measure on the variability between genes relative to the total variation (across genes and replicate samples) [24–26].
This analysis was based on a components of variance model:
$$Y_{ij} = \mu + g_i + e_{ij},$$
where $Y_{ij}$ denotes the log-transformed unit of gene $i$ in replicate $j$ for a particular model, and $\mu$ is the overall mean. The error variance component $\sigma_e^2$ associated with $e_{ij}$ (technical error) reflects the reproducibility of the measure. The variance component $\sigma_g^2$ associated with $g_i$ (true gene expression) represents the true gene-to-gene variability.
The intraclass correlation (ICCg) for each PDX model is defined as
$$\mathrm{ICC}_g = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}$$
and estimated by the following equation defined by Shrout et al. [25]:
$$\widehat{\mathrm{ICC}}_g = \frac{MS_g - MS_e}{MS_g + (k - 1)\,MS_e},$$
where $MS_g$ is the between-genes mean squares, $MS_e$ is the between-samples mean squares, and $k$ is the number of samples. The ICCg, which ranges between 0 and 1, estimates the proportion of the total variance due to the between-gene variance. Larger ICCg values indicate higher similarity (i.e., agreement) between replicate samples while preserving biological differences among genes within a PDX model. Computing an ICCg for each PDX model, as described above, resulted in a set of 20 ICCg values for each quantification method.
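The estimator described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the irr R package used in the study. It assumes a balanced genes × replicates matrix for one PDX model and takes the error mean square to be the residual (within-gene) mean square of a one-way layout, which is an assumption about how the mean squares were obtained. The function name `icc_oneway` and the toy values are hypothetical.

```python
def icc_oneway(matrix):
    """ICC estimate for a balanced n x k matrix (rows = genes, columns = replicates).

    Assumption: one-way random-effects layout, so the error mean square
    is the within-row residual mean square.
    """
    n = len(matrix)
    k = len(matrix[0])
    if n < 2 or k < 2 or any(len(row) != k for row in matrix):
        raise ValueError("need a balanced matrix with at least 2 rows and 2 columns")

    grand_mean = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]

    # Between-row (between-gene) mean square, n - 1 degrees of freedom.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)

    # Residual mean square, n * (k - 1) degrees of freedom.
    ss_error = sum((x - m) ** 2 for row, m in zip(matrix, row_means) for x in row)
    ms_error = ss_error / (n * (k - 1))

    return (ms_between - ms_error) / (ms_between + (k - 1) * ms_error)


# Toy log-transformed values for 4 genes x 3 replicates (invented numbers).
toy_model = [
    [1.0, 1.1, 0.9],
    [5.0, 5.2, 4.8],
    [3.0, 3.1, 2.9],
    [8.0, 7.9, 8.1],
]
icc_g = icc_oneway(toy_model)
print(round(icc_g, 3))
```

In the toy matrix the replicates of each gene agree closely while the genes differ strongly, so the estimate lands near 1.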
Next, in order to evaluate which measure can better preserve true biological differences within the same gene across different PDX models, another version of intraclass correlation, denoted by ICCm, was computed for each gene. This metric allowed for examination of the impact of each quantification measure on the variability between PDX models relative to the total variation (across models and replicate samples). This analysis was based on a components of variance model:
$$Y_{ij} = \mu + m_i + e_{ij},$$
where $Y_{ij}$ denotes the log-transformed unit of PDX model $i$ in replicate $j$ for a particular gene, and $\mu$ is the overall mean. For simplicity of notation, the gene index is not included in the formula. The error variance component $\sigma_e^2$ associated with $e_{ij}$ (technical error) reflects the reproducibility of the measure. The variance component $\sigma_m^2$ associated with $m_i$ (true gene expression) represents the true model-to-model variability.
The intraclass correlation (ICCm) for each gene is defined as
$$\mathrm{ICC}_m = \frac{\sigma_m^2}{\sigma_m^2 + \sigma_e^2}$$
and estimated by the following equation defined by Shrout et al. [25]:
$$\widehat{\mathrm{ICC}}_m = \frac{MS_m - MS_e}{MS_m + (k - 1)\,MS_e},$$
where $MS_m$ is the between-models mean squares, $MS_e$ is the between-samples mean squares, and $k$ is the number of samples. The ICCm, which ranges between 0 and 1, estimates the proportion of the total variance due to the between-model variance. Larger ICCm values indicate higher similarity (i.e., agreement) between replicate samples. Computing an ICCm for each gene, as described above, resulted in a set of 28,109 ICCm values for each quantification method. A known feature of the ICC estimator used here is that it can produce negative values when the true ICC is close to zero and the sample size is small. For practical purposes, these negative estimates are treated as equivalent to ICC ≈ 0.
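The per-gene calculation and the handling of negative estimates can be illustrated with a small plain-Python sketch. It assumes one balanced models × replicates matrix per gene and a one-way layout for the mean squares. The gene labels, values, and function names are hypothetical, and setting negative estimates to exactly 0 is one simple way to apply the convention described above.

```python
def icc_oneway(matrix):
    """ICC estimate for a balanced n x k matrix (rows = PDX models, columns = replicates)."""
    n, k = len(matrix), len(matrix[0])
    grand_mean = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ss_error = sum((x - m) ** 2 for row, m in zip(matrix, row_means) for x in row)
    ms_error = ss_error / (n * (k - 1))
    return (ms_between - ms_error) / (ms_between + (k - 1) * ms_error)


def icc_m_per_gene(expr_by_gene):
    """Return {gene: ICCm}, with negative estimates set to 0."""
    return {gene: max(0.0, icc_oneway(m)) for gene, m in expr_by_gene.items()}


# Invented values: 3 models x 3 replicates per gene.
toy_genes = {
    # Model means are identical, so all variation is replicate noise.
    "GENE_A": [[1, 2, 3], [3, 1, 2], [2, 3, 1]],
    # Models differ strongly and replicates agree.
    "GENE_B": [[2.0, 2.1, 1.9], [6.0, 6.2, 5.8], [4.0, 4.1, 3.9]],
}

raw_a = icc_oneway(toy_genes["GENE_A"])  # negative raw estimate
icc_m = icc_m_per_gene(toy_genes)
print(raw_a, icc_m)
```

For GENE_A the between-model mean square is 0 while the residual mean square is 1, so the raw estimate is negative and is reported as 0.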
Model 947758-054-R is the only model that has four replicates, while the other 19 models all have three replicates. For simplicity, the first three replicates of model 947758-054-R were selected to form a uniform data matrix (20 × 3 for each gene) for the calculation of ICC for each gene. The resulting balance in number of replicates allowed for easier calculation of the ICCg and ICCm estimates using the irr R package (version 0.84.1) [25, 26].
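The replicate-balancing step could look like the following minimal sketch, which keeps the first three replicates of every model for a single gene. Only the model ID 947758-054-R comes from the text. The other IDs, all expression values, and the function name `balance_replicates` are hypothetical.

```python
def balance_replicates(values_by_model, k=3):
    """Keep the first k replicates of each model and return a balanced matrix.

    Rows follow the sorted model IDs; a model with fewer than k replicates
    raises an error rather than being padded.
    """
    matrix = []
    for model_id in sorted(values_by_model):
        values = values_by_model[model_id]
        if len(values) < k:
            raise ValueError(f"model {model_id} has fewer than {k} replicates")
        matrix.append(list(values[:k]))
    return matrix


# Log-transformed values of one gene (all numbers invented for illustration).
one_gene = {
    "947758-054-R": [5.1, 5.0, 5.2, 4.9],  # the model with four replicates
    "MODEL-B": [2.0, 2.1, 1.9],            # hypothetical model ID
    "MODEL-C": [7.4, 7.5, 7.3],            # hypothetical model ID
}

balanced = balance_replicates(one_gene)
print(balanced)
```

With all 20 models included, the same step would produce the 20 × 3 matrix per gene described above.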