Intraclass correlation coefficient (ICC)

Yingdong Zhao; Ming-Chung Li; Mariam M. Konaté; Li Chen; Biswajit Das; Chris Karlovich; P. Mickey Williams; Yvonne A. Evrard; James H. Doroshow; Lisa M. McShane

Concise Method

This method is extracted from research article: J Transl Med, Jun 2021

TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository

DOI: 10.1186/s12967-021-02936-w

For each PDX model, an intraclass correlation coefficient, denoted by ICC_g, was computed to examine the impact of each quantification measure on the variability between genes relative to the total variation (across genes and replicate samples) [24–26].

This analysis was based on a components-of-variance model:

$$Y_{ij} = \mu + g_{i} + e_{ij}$$

where $Y_{ij}$ denotes the log-transformed value, in the given quantification unit, of gene i in replicate j for a particular PDX model, and $\mu$ is the overall mean. The error variance component $σ_{e}^{2}$ associated with $e_{ij}$ (technical error) reflects the reproducibility of the measure. The variance component $σ_{g}^{2}$ associated with $g_{i}$ (true gene expression) represents the true gene-to-gene variability.

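The intraclass correlation (ICC_g) for each PDX model is defined as

$$\mathrm{ICC}_{g} = \frac{σ_{g}^{2}}{σ_{g}^{2} + σ_{e}^{2}}$$

and estimated by the following equation defined by Shrout et al. [25]:

$$\widehat{\mathrm{ICC}}_{g} = \frac{M S_{g} - M S_{e}}{M S_{g} + (k - 1) M S_{e}}$$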

where $M S_{g}$ is the between-genes mean square, $M S_{e}$ is the between-samples mean square (the residual variation among replicate samples), and k is the number of replicate samples. The ICC_g, which ranges between 0 and 1, estimates the proportion of the total variance due to the between-gene variance. Larger ICC_g values indicate higher similarity (i.e., agreement) between replicate samples while preserving biological differences among genes within a PDX model. Computing an ICC_g for each PDX model, as described above, resulted in a set of 20 ICC_g values for each quantification method.
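The estimator described above can be sketched in a few lines of Python. This is only an illustrative one-way random-effects calculation, not the irr implementation used in the study; the function name `icc_oneway` and the toy values are assumptions made for the example.

```python
from statistics import mean


def icc_oneway(rows):
    """One-way random-effects ICC estimate for a balanced table.

    rows: list of equal-length lists; each inner list holds the k replicate
    measurements for one subject (here, one gene within a PDX model).
    """
    n = len(rows)        # number of subjects (genes)
    k = len(rows[0])     # number of replicates per subject
    if n < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need a balanced table with n >= 2 and k >= 2")

    grand = mean(v for r in rows for v in r)
    row_means = [mean(r) for r in rows]

    # Between-subject mean square, with n - 1 degrees of freedom.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Error (within-subject) mean square, with n * (k - 1) degrees of freedom.
    ss_error = sum((v - m) ** 2 for r, m in zip(rows, row_means) for v in r)
    ms_error = ss_error / (n * (k - 1))

    return (ms_between - ms_error) / (ms_between + (k - 1) * ms_error)


# Toy log-scale values (made up): 4 genes (rows) x 3 replicates (columns).
toy = [
    [1.0, 1.1, 0.9],
    [5.0, 5.2, 4.9],
    [3.0, 2.9, 3.1],
    [8.0, 8.1, 7.9],
]
icc_g = icc_oneway(toy)
```

In this toy table the genes differ far more than the replicates do, so the estimate is close to 1. Applying the same function to each PDX model's gene-by-replicate table would give one ICC_g per model.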

Next, to evaluate which measure better preserves true biological differences within the same gene across different PDX models, another version of intraclass correlation, denoted by ICC_m, was computed for each gene. This metric allowed for examination of the impact of each quantification measure on the variability between PDX models relative to the total variation (across models and replicate samples). This analysis was based on a components-of-variance model:

$$Y_{ij} = \mu + m_{i} + e_{ij}$$

where $Y_{ij}$ denotes the log-transformed value, in the given quantification unit, of PDX model i in replicate j for a particular gene, and $\mu$ is the overall mean. For simplicity of notation, the gene index was not included in the formula. The error variance component $σ_{e}^{2}$ associated with $e_{ij}$ (technical error) reflects the reproducibility of the measure. The variance component $σ_{m}^{2}$ associated with $m_{i}$ (true expression of the gene in model i) represents the true model-to-model variability.

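The intraclass correlation (ICC_m) for each gene is defined as

$$\mathrm{ICC}_{m} = \frac{σ_{m}^{2}}{σ_{m}^{2} + σ_{e}^{2}}$$

and estimated by the following equation defined by Shrout et al. [25]:

$$\widehat{\mathrm{ICC}}_{m} = \frac{M S_{m} - M S_{e}}{M S_{m} + (k - 1) M S_{e}}$$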

where $M S_{m}$ is the between-models mean square, $M S_{e}$ is the between-samples mean square (the residual variation among replicate samples), and k is the number of replicate samples. The ICC_m, which ranges between 0 and 1, estimates the proportion of the total variance due to the between-model variance. Larger ICC_m values indicate higher similarity (i.e., agreement) between replicate samples. Computing an ICC_m for each gene, as described above, resulted in a set of 28,109 ICC_m values for each quantification method. A known feature of the ICC estimator used here is that it can sometimes produce negative values when the true ICC is close to zero and the sample size is small. For practical purposes, these negative estimates of ICC are treated as equivalent to ICC ≈ 0.
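The following sketch shows how a negative estimate can arise and one way to treat it as zero. It is a hedged illustration only: the gene names, the values, the helper `icc_from_table`, and the choice to clamp with `max(0.0, ...)` are assumptions for the example, not the authors' code.

```python
from statistics import mean


def icc_from_table(rows):
    """ICC estimate (MS_m - MS_e) / (MS_m + (k - 1) * MS_e) for a balanced table.

    rows: one list of k replicate values per PDX model, for a single gene.
    """
    n, k = len(rows), len(rows[0])
    grand = mean(v for r in rows for v in r)
    means = [mean(r) for r in rows]
    ms_m = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_e = sum((v - m) ** 2 for r, m in zip(rows, means) for v in r) / (n * (k - 1))
    return (ms_m - ms_e) / (ms_m + (k - 1) * ms_e)


# Hypothetical per-gene tables: rows are PDX models, columns are replicates.
tables = {
    # Models differ clearly, replicates agree.
    "GENE_A": [[2.0, 2.1, 1.9], [6.0, 6.1, 5.9], [9.0, 9.2, 8.8]],
    # All model means are equal, so only replicate noise remains.
    "GENE_B": [[1.0, 3.0, 2.0], [2.0, 1.0, 3.0], [3.0, 2.0, 1.0]],
}

raw = {gene: icc_from_table(rows) for gene, rows in tables.items()}

# Treat negative estimates as 0, following the convention stated in the text.
icc_m = {gene: max(0.0, est) for gene, est in raw.items()}
```

For GENE_B the between-models mean square is smaller than the error mean square, so the numerator of the estimator is negative. Clamping leaves the estimate for GENE_A unchanged.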

Model 947758-054-R is the only model that has four replicates, while the other 19 models all have three replicates. For simplicity, the first three replicates of model 947758-054-R were selected to form a uniform data matrix (20 × 3 for each gene) for the calculation of ICC for each gene. The resulting balance in number of replicates allowed for easier calculation of the ICC_g and ICC_m estimates using the irr R package (version 0.84.1) [25, 26].
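The balancing step can be sketched as follows, assuming the replicate values for one gene are held in a dictionary keyed by model ID. Only the ID 947758-054-R comes from the text; the other IDs, the values, and the variable names are placeholders.

```python
# Hypothetical input for one gene: model ID -> replicate values in their original order.
replicates_by_model = {
    "947758-054-R": [4.1, 4.0, 4.2, 3.9],  # the one model with four replicates
    "MODEL-B": [2.0, 2.1, 1.9],            # placeholder ID
    "MODEL-C": [7.3, 7.2, 7.4],            # placeholder ID
}

# Use the smallest replicate count as the common width (3 in the study).
k = min(len(values) for values in replicates_by_model.values())

# Keep the first k replicates of every model; three-replicate models are unchanged.
balanced = {model: values[:k] for model, values in replicates_by_model.items()}

# Rows are models and columns are replicates (20 x 3 per gene in the study).
matrix = list(balanced.values())
```

With every row the same length, the simple balanced form of the mean-square formulas applies.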

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
