Proteomics data normalization and differential expression analysis

Jessica A. Hess; Mark L. Eberhard; Marcelo Segura-Lepe; Kathrin Grundner-Culemann; Barbara Kracher; Jeffrey Shryock; John Harrington; David Abraham

This method is extracted from research article: Sci Rep, Jan 2023

A rodent model for Dirofilaria immitis, canine heartworm: parasite growth, development, and drug sensitivity in NSG mice

DOI: 10.1038/s41598-023-27537-z

All proteomics analyses, including data pre-processing and differential expression analysis, were conducted in R. For all downstream analyses, MaxQuant label-free quantitation (LFQ) values⁴¹ were log10-transformed, and protein groups flagged as reverse or contaminant hits were excluded. To normalize for general loading effects between samples, a sample-specific scaling factor was subtracted from the log10-transformed LFQ intensity values of each sample in all data sets. This scaling factor was calculated for each sample as the difference between that sample's median log10 LFQ value and the overall median of log10 LFQ values.
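A minimal Python sketch of this normalization step, assuming a numeric matrix of raw LFQ intensities (rows = protein groups, columns = samples) and one boolean reverse flag and one contaminant flag per protein group. The function name normalize_lfq is hypothetical. Treating zero intensities as missing, and taking the "overall median" as the median of all non-missing log10 values pooled across samples, are assumptions rather than details stated in the protocol.

```python
import numpy as np


def normalize_lfq(lfq, reverse, contaminant):
    """Log10-transform LFQ intensities and median-centre each sample.

    lfq: 2-D array, rows = protein groups, columns = samples (raw LFQ values).
    reverse, contaminant: one boolean flag per protein group.
    Returns the normalized log10 matrix and the boolean mask of kept rows.
    """
    # Exclude protein groups flagged as reverse (decoy) or contaminant hits.
    keep = ~(np.asarray(reverse, dtype=bool) | np.asarray(contaminant, dtype=bool))
    data = np.asarray(lfq, dtype=float)[keep]

    # Assumption: a zero LFQ intensity means "not quantified", so mark it missing.
    data = np.where(data > 0, data, np.nan)
    log_lfq = np.log10(data)

    # Scaling factor per sample = sample median - overall median (NaNs ignored).
    # Assumption: the overall median pools all non-missing values of all samples.
    sample_medians = np.nanmedian(log_lfq, axis=0)
    overall_median = np.nanmedian(log_lfq)
    scaling = sample_medians - overall_median

    # Subtract each sample's factor from every value in that column.
    return log_lfq - scaling, keep
```

For example, a sample whose median log10 LFQ value lies 0.5 above the overall median has 0.5 subtracted from all of its values, so that after normalization all sample medians coincide.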

To account for potential technical variability between the two sample processing batches (B1, B2), the log10-transformed LFQ intensities were adjusted for batch effects using the ‘ComBat’ function in the R package sva, which performed a parametric batch adjustment for the known processing-batch covariate separately for each tissue⁴²,⁴³. Principal component analysis (PCA) was performed on the normalized and batch-corrected log10 LFQ intensities, considering only protein groups with no missing values. To test for significant differences in protein expression between the ‘early’ (1 week and 3 weeks) time points and the ‘transition’ (6 weeks) or ‘late’ (10 weeks and 15 weeks) time points, the R package limma⁴⁴ was used to fit the following linear model to the normalized and batch-adjusted log10 LFQ intensity values separately for each tissue:
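The batch adjustment can be illustrated with a simplified Python re-implementation of the parametric empirical Bayes location/scale model on which ComBat is based, applied to one tissue. This is a sketch, not the sva code: the function name combat_parametric is hypothetical, it assumes a complete matrix (no missing values), includes no biological covariates in the model, and checks convergence with an absolute rather than a relative tolerance. It covers the batch adjustment only, not the PCA.

```python
import numpy as np


def combat_parametric(y, batch, tol=1e-4, max_iter=1000):
    """Sketch of parametric empirical Bayes batch adjustment (ComBat-style).

    y: proteins x samples, complete matrix; batch: one label per sample.
    Assumptions: only batch in the model (no covariates), no missing values.
    """
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    n = y.shape[1]

    # With only batch in the design, fitted values are per-protein batch means
    # and the grand mean is the ordinary row mean.
    batch_means = np.stack([y[:, batch == b].mean(axis=1) for b in levels], axis=1)
    grand_mean = y.mean(axis=1)
    fitted = batch_means[:, np.searchsorted(levels, batch)]
    var_pooled = ((y - fitted) ** 2).sum(axis=1) / n

    # Standardize each protein to zero grand mean and unit pooled variance.
    s = (y - grand_mean[:, None]) / np.sqrt(var_pooled)[:, None]

    adjusted = np.empty_like(s)
    for b in levels:
        cols = batch == b
        sb = s[:, cols]
        nb = cols.sum()
        gamma_hat = sb.mean(axis=1)           # per-protein additive batch effect
        delta_hat = sb.var(axis=1, ddof=1)    # per-protein multiplicative effect

        # Prior hyperparameters estimated across proteins (method of moments).
        gamma_bar = gamma_hat.mean()
        tau2 = gamma_hat.var(ddof=1)
        m, v = delta_hat.mean(), delta_hat.var(ddof=1)
        a_prior = (2 * v + m ** 2) / v
        b_prior = (m * v + m ** 3) / v

        # Alternate the two posterior updates until they stop changing.
        g_old, d_old = gamma_hat, delta_hat
        for _ in range(max_iter):
            g_new = (tau2 * nb * gamma_hat + d_old * gamma_bar) / (tau2 * nb + d_old)
            ss = ((sb - g_new[:, None]) ** 2).sum(axis=1)
            d_new = (0.5 * ss + b_prior) / (nb / 2 + a_prior - 1)
            change = max(np.max(np.abs(g_new - g_old)), np.max(np.abs(d_new - d_old)))
            g_old, d_old = g_new, d_new
            if change < tol:
                break

        # Remove the shrunken batch effects from the standardized data.
        adjusted[:, cols] = (sb - g_old[:, None]) / np.sqrt(d_old)[:, None]

    # Back-transform to the original log10 scale.
    return adjusted * np.sqrt(var_pooled)[:, None] + grand_mean[:, None]
```

The shrinkage is the design point: per-protein batch effects estimated from only a few samples per batch are pulled toward the average batch effect across all proteins, which stabilizes the adjustment for noisy protein groups.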

Normalized intensities ~ 0 + Time point
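The model above has no intercept ("0 + Time point"), so each coefficient is the mean log10 intensity of one time-point group, and a comparison between groups is a contrast of those coefficients. The Python sketch below fits this cell-means model to one tissue and returns, for one contrast, the estimate, the ordinary t-statistic and a limma-style moderated t-statistic. The function name cell_means_contrast, the group labels ('early', 'transition', 'late') and the prior values d0 and s0_sq are hypothetical: limma estimates these hyperparameters from all proteins by empirical Bayes, whereas here they are passed in as fixed numbers, and p values are not computed.

```python
import numpy as np


def cell_means_contrast(y, groups, contrast, d0, s0_sq):
    """Fit 'intensity ~ 0 + group' per protein and test one contrast.

    y: proteins x samples; groups: label per sample;
    contrast: dict mapping group label -> weight (e.g. {"late": 1, "early": -1}).
    d0, s0_sq: hypothetical fixed prior df and prior variance.
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    levels = [str(level) for level in np.unique(groups)]

    # Design matrix without intercept: one indicator column per group level.
    X = (groups[:, None] == np.array(levels)[None, :]).astype(float)
    n_samples, n_levels = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = y @ X @ XtX_inv                 # proteins x levels: the group means
    resid = y - beta @ X.T
    d = n_samples - n_levels               # residual degrees of freedom
    s_sq = (resid ** 2).sum(axis=1) / d    # per-protein residual variance

    c = np.array([contrast.get(level, 0.0) for level in levels])
    estimate = beta @ c
    v = c @ XtX_inv @ c                    # unscaled variance of the contrast
    t_ordinary = estimate / np.sqrt(s_sq * v)

    # Moderation: shrink each protein's variance toward the prior s0_sq.
    s_sq_post = (d0 * s0_sq + d * s_sq) / (d0 + d)
    t_moderated = estimate / np.sqrt(s_sq_post * v)
    return estimate, t_ordinary, t_moderated, d + d0
```

With three samples per group, for instance, a protein with group means 2 ('early') and 6 ('late') gives a 'late' minus 'early' estimate of 4; the moderated statistic then has d + d0 degrees of freedom instead of d.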

Based on this linear regression model, moderated t-tests comparing the ‘transition’ or ‘late’ time points with the ‘early’ time points were performed for each protein group and tissue. The resulting p values were corrected for multiple hypothesis testing over all protein groups using the Benjamini-Hochberg procedure, with a false discovery rate (FDR) threshold of 5%.
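The Benjamini-Hochberg correction can be sketched in a few lines of Python, assuming a vector of raw p values from the moderated t-tests for one contrast and one tissue. The function names are hypothetical; the adjusted values should agree with R's p.adjust(p, method = "BH"), and a protein group is called significant here when its adjusted p value is at most 0.05.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p value by m / k.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: running minimum from the largest p value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def significant_at_fdr(pvalues, fdr=0.05):
    """Boolean mask of protein groups passing the FDR threshold."""
    return benjamini_hochberg(pvalues) <= fdr
```

For the raw p values 0.5, 0.01 and 0.9, the adjusted values are 0.75, 0.03 and 0.9, so only the second protein group passes a 5% FDR threshold.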

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.