Statistical Analysis

Gunasekaran Suwarnalata, Ai Huey Tan, Hidayah Isa, Ranganath Gudimella, Arif Anwar, Mun Fai Loke, Sanjiv Mahadeva, Shen-Yang Lim, Jamuna Vadivelu

The output from the microarray scanner is a raw .tiff image file. To identify and detect the spots automatically and accurately, spot segmentation was performed with the GenePix Pro 7 software. Spot segmentation serves as a semi-automatic QC process that ensures viable results.

The main objectives of the statistical analysis are to determine the quality of the data, to assess the success rate of the experiment based on the positive controls, and to identify putative biomarkers statistically. Data mining and analysis for both quality control and biomarker identification were performed using custom scripts written in R and Perl.

Several quality control methods, applied to both raw and normalized data, were used to verify the quality of the protein array data before proceeding with the data analysis:

The median raw signal intensity was calculated from the quadruplicate spots of each protein on each slide (i.e., each sample):

$X = \operatorname{median}(m_1, m_2, m_3, m_4)$

$m_1, \dots, m_4$ = signal intensities of the replicate spots for each protein

X = raw median for each protein in each sample

Median background signals were subtracted from the raw median signal intensities.
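As a minimal sketch, assuming each protein on a slide is represented by its four replicate foreground intensities and the four matching local background intensities exported from GenePix, the per-sample summary described above could be computed as follows. The function name is hypothetical, and aggregating the background by its median before subtraction (rather than subtracting spot by spot) is an assumption.

```python
from statistics import median


def background_corrected_median(foreground, background):
    """Summarise one protein on one slide (one sample).

    foreground: replicate spot intensities m1..m4 (assumed input layout)
    background: matching local background intensities (assumed input layout)
    """
    x = median(foreground)           # raw median X of the replicate spots
    return x - median(background)    # subtract the median background signal


# Example: median(1000, 1100, 1200, 1300) = 1150; median background = 105
print(background_corrected_median([1200, 1000, 1100, 1300], [100, 120, 80, 110]))
```

With an even number of replicates, `statistics.median` averages the two middle values, so four spots give the mean of the second and third ranked intensities.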

Signal intensities of two positive controls (IgG and Cy3BSA) were examined.

Quantile normalization (Eq 2) was performed excluding the control proteins, i.e., only the 1631 protein spots were normalized across all samples:

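Set $d = \left(\frac{1}{\sqrt{N}}, \dots, \frac{1}{\sqrt{N}}\right)$

Sort each column of $X$ to give $X_{\text{sort}}$

Project each row of $X_{\text{sort}}$ onto $d$ to get $X'_{\text{sort}}$

Get $X_{\text{norm}}$ by rearranging each column of $X'_{\text{sort}}$ to have the same ordering as the original $X$

p = number of proteins

N = number of samples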
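The normalization steps above can be sketched in plain Python as follows, with X stored as a list of p rows (proteins) by N columns (samples). Because d is a unit vector whose entries all equal 1/√N, projecting a row x onto d gives (x·d)d, and every entry of that vector equals the mean of the row; each rank is therefore assigned the mean value across samples. Tie handling (tied values keep their input order) and the function name are assumptions of this sketch, not details from the protocol.

```python
import math


def quantile_normalize(X):
    """Quantile-normalise a p x N matrix X (rows = proteins, columns = samples)."""
    p, N = len(X), len(X[0])
    d = [1.0 / math.sqrt(N)] * N  # unit vector (1/sqrt(N), ..., 1/sqrt(N))

    # Row indices of each column in ascending order (stable: ties keep input order).
    orders = [sorted(range(p), key=lambda i: X[i][j]) for j in range(N)]

    # X_sort: every column sorted in ascending order.
    X_sort = [[X[orders[j][k]][j] for j in range(N)] for k in range(p)]

    # X'_sort: project each row of X_sort onto d, i.e. (row . d) * d,
    # which replaces every entry of the row with the row mean.
    X_sort_proj = []
    for row in X_sort:
        dot = sum(r * dj for r, dj in zip(row, d))
        X_sort_proj.append([dot * dj for dj in d])

    # X_norm: put each column of X'_sort back into the original ordering of X.
    X_norm = [[0.0] * N for _ in range(p)]
    for j in range(N):
        for k, i in enumerate(orders[j]):
            X_norm[i][j] = X_sort_proj[k][j]
    return X_norm


print(quantile_normalize([[2, 4], [1, 3], [3, 5]]))
```

After normalization every column contains the same set of values, so the samples share an identical intensity distribution while each protein keeps its within-sample rank.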

The percentage coefficient of variation (CV%) was calculated at the intra-protein, intra-slide and inter-array levels to determine the variation among the quadruplicate signal intensities of each protein spot on the slide:

$\mathrm{CV\%} = \dfrac{\mathrm{M.A.D.}}{\mathrm{Median}} \times 100\%$

M.A.D. = median absolute deviation of each sample

Median = median of the quadruplicate signal intensities of each protein
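A minimal sketch of the CV% calculation, assuming CV% is computed as M.A.D. divided by the median and multiplied by 100, that both statistics are taken over the same set of intensities (for example, the four replicate spots of one protein), and that the M.A.D. is unscaled (no 1.4826 consistency factor). The function name is hypothetical.

```python
from statistics import median


def robust_cv_percent(values):
    """Robust CV% = M.A.D. / Median x 100 for one set of signal intensities."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # unscaled median absolute deviation
    return mad / med * 100.0


# Example: median = 102.5, M.A.D. = 5.0, so CV% is about 4.88
print(robust_cv_percent([100, 110, 90, 105]))
```

Using the median and M.A.D. instead of the mean and standard deviation keeps a single aberrant replicate spot from dominating the variability estimate.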

Protein biomarkers were identified and ranked using a penetrance-based fold change, which measures the likelihood that a given raw fold change (FC) is true and thereby increases the significance and reliability of the results. The method proceeds step by step as follows:

Quantile normalization was performed excluding the control proteins, i.e., only the 1631 protein spots were normalized across all samples, as described in Eq 2.

Individual fold changes for both case and control samples were calculated by dividing each normalized value, H from Eq 2, by the mean of that protein across all samples, $\langle P \rangle$.
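The individual fold-change step can be sketched as below for a single protein, assuming its normalized values H are already split into case and control lists; ⟨P⟩ is taken as the mean over all samples of both groups, as the step describes. The function name is hypothetical.

```python
from statistics import mean


def individual_fold_changes(h_case, h_control):
    """Divide each normalised value H of one protein by <P>, its mean over all samples."""
    p_mean = mean(list(h_case) + list(h_control))  # <P>: mean over case AND control
    case_fc = [h / p_mean for h in h_case]
    control_fc = [h / p_mean for h in h_control]
    return case_fc, control_fc


# Example: <P> = 4.0, so case values 6.0 give 1.5 and control values 2.0 give 0.5
print(individual_fold_changes([6.0, 6.0], [2.0, 2.0]))
```

Because the denominator pools both groups, a fold change above 1 means the sample lies above that protein's overall average rather than above the control mean.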

The penetrance frequency for case ($\text{Frequency}_{\text{Case}}$) and control ($\text{Frequency}_{\text{Control}}$) was calculated for each protein.

The penetrance fold change for case and control was calculated for each protein.

A volcano plot was generated by calculating a p-value for each protein with Student's t-test between the two groups and plotting it against the log2-transformed overall fold change (ratio). The overall fold change was calculated by dividing the mean of each protein across all case samples, $\mu(H_{\text{Case}})$, by its mean across all control samples, $\mu(H_{\text{Control}})$.
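To illustrate one point of the volcano plot, the sketch below computes Student's (pooled-variance) two-sample t-test and the log2 overall fold change for one protein using only the Python standard library. The two-sided p-value comes from the regularized incomplete beta function I_x(df/2, 1/2) with x = df/(df + t²), evaluated with the standard continued-fraction method. Function names are hypothetical and the group means are assumed positive; in practice a statistics library such as SciPy (`scipy.stats.ttest_ind`, whose default `equal_var=True` gives Student's test) would normally be used instead.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def two_sided_t_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def volcano_point(case, control):
    """Return (log2 overall fold change, Student's t-test p-value) for one protein."""
    n1, n2 = len(case), len(control)
    m1, m2 = mean(case), mean(control)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    log2_fc = math.log2(m1 / m2)  # overall FC = mu(H_Case) / mu(H_Control)
    return log2_fc, two_sided_t_p(t, df)


print(volcano_point([4, 5, 6], [1, 2, 3]))
```

Each protein contributes one (log2 FC, p) pair; volcano plots conventionally put −log10(p) on the vertical axis, although the protocol does not state which scale was used.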

Biomarkers were identified and ranked according to the following criteria:

(1) p-value < 0.05;

(2) for up-regulated biomarkers, the penetrance fold change difference (i.e., $\text{Penetrance fold change}_{\text{Case}} - \text{Penetrance fold change}_{\text{Control}}$) must be $\ge 2$ and the frequency differential $\ge 1$;

(3) for down-regulated biomarkers, the penetrance fold change difference (i.e., $\text{Penetrance fold change}_{\text{Case}} - \text{Penetrance fold change}_{\text{Control}}$) must be $\le -2$ and the frequency differential $\le -1$;

(4) the frequency percentage in case (i.e., $\text{Frequency}_{\text{Case}}$ / number of cases × 100%) must be ≥ 10%; and

(5) the frequency percentage in control (i.e., $\text{Frequency}_{\text{Control}}$ / number of controls × 100%) must be ≥ 10%.
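The selection criteria (1) to (5) can be expressed as a filter, sketched below. It assumes the penetrance fold changes, penetrance frequencies and frequency differential have already been computed upstream (their exact definitions are not reproduced here), and it reads criteria (1), (4) and (5) as applying to every protein, with (2) marking up-regulated and (3) down-regulated candidates. The class and function names, and the ranking key (absolute penetrance fold change difference), are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProteinStats:
    # Per-protein summary values, assumed to be computed upstream.
    name: str
    p_value: float
    pfc_case: float           # penetrance fold change, case group
    pfc_control: float        # penetrance fold change, control group
    freq_differential: float  # frequency differential as reported upstream
    freq_case: int            # penetrance frequency in case
    freq_control: int         # penetrance frequency in control
    n_case: int               # number of case samples
    n_control: int            # number of control samples


def classify(s):
    """Return "up", "down" or None according to criteria (1)-(5)."""
    pct_case = s.freq_case * 100.0 / s.n_case
    pct_control = s.freq_control * 100.0 / s.n_control
    # Criteria (1), (4) and (5), read here as applying to every candidate.
    if not (s.p_value < 0.05 and pct_case >= 10.0 and pct_control >= 10.0):
        return None
    pfc_diff = s.pfc_case - s.pfc_control
    if pfc_diff >= 2 and s.freq_differential >= 1:    # criterion (2)
        return "up"
    if pfc_diff <= -2 and s.freq_differential <= -1:  # criterion (3)
        return "down"
    return None


def select_and_rank(stats):
    """Keep proteins passing the criteria; HYPOTHETICAL ranking by |PFC difference|."""
    hits = [(s, classify(s)) for s in stats]
    hits = [(s, direction) for s, direction in hits if direction is not None]
    hits.sort(key=lambda h: abs(h[0].pfc_case - h[0].pfc_control), reverse=True)
    return [(s.name, direction) for s, direction in hits]
```

Usage: build one `ProteinStats` per protein and pass the list to `select_and_rank`; proteins failing the p-value or either 10% frequency threshold are dropped before the direction is assigned.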
