To remove experimental variation from the measurements, normalisation and batch correction were performed on the UPLC glycan data. To make measurements across samples comparable, normalisation by total area was performed. Prior to batch correction, normalised glycan measurements were log-transformed because their distributions were right-skewed and batch effects are multiplicative. Batch correction was performed on the log-transformed measurements using the ComBat method (R package sva) [31], with the technical source of variation (the plate on which each sample was analysed) modelled as the batch covariate. To correct the measurements for experimental noise, the estimated batch effects were subtracted from the log-transformed measurements.
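
The total-area normalisation and log-transformation steps described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the peak areas are hypothetical, the natural logarithm is an assumption (the base is not stated in the text), and the ComBat step itself is not reproduced here.

```python
import math


def normalise_by_total_area(peak_areas):
    """Express each peak as a percentage of the sample's total chromatogram area."""
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("total chromatogram area must be positive")
    return [100.0 * area / total for area in peak_areas]


def log_transform(values):
    """Natural-log transform (assumed base); all values must be positive."""
    return [math.log(v) for v in values]


# Hypothetical peak areas for one sample (arbitrary units).
sample = [120.0, 300.0, 580.0]
normalised = normalise_by_total_area(sample)
logged = log_transform(normalised)

# On the log scale a multiplicative effect becomes an additive offset,
# which is why an estimated batch effect can simply be subtracted.
scale = 1.5  # hypothetical multiplicative batch effect
offset = math.log(scale * normalised[0]) - logged[0]
```

Here `offset` equals `math.log(scale)` regardless of the peak chosen, which illustrates why subtraction on the log scale removes a multiplicative batch effect.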

Longitudinal analysis of patient samples over their observation period was performed using a linear mixed effects model, in which time was modelled as a fixed effect and individual ID as a random effect. Age was not included in the model because the follow-up period for the Bariatric cohort was measured in months, so the change in patients’ age over that period is not relevant for glycosylation. Prior to the analyses, all glycan variables were transformed to a standard normal distribution by inverse transformation of ranks to normality (R package “GenABEL”, function rntransform). Using rank-transformed variables makes the estimated effects of different glycans comparable, as the variables have the same standardised variance. False discovery rate (FDR) was controlled by the Benjamini–Hochberg procedure at a level of 0.05. Data were analysed and visualised using the R programming language (version 3.5.2) [32].
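
The inverse transformation of ranks to normality can be illustrated with a short sketch. This is a Python approximation under stated assumptions, not the GenABEL implementation: the quantile offset `(rank - 0.5) / n` and the use of average ranks for ties are choices made here for illustration, and the input values are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_inverse_normal(values):
    """Map ranks to standard normal quantiles (assumed offset of 0.5)."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Hypothetical right-skewed glycan measurements.
skewed = [0.2, 5.1, 0.9, 40.0, 1.3]
transformed = rank_inverse_normal(skewed)
```

The transformed values keep the original ordering but are symmetric around zero, so effect sizes estimated for different glycans are on a common scale.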

Normalisation of peak intensities to the total chromatogram area was performed for each measured sample separately. The calculated proportions were then batch corrected using the ComBat method (R package sva) [31]. Since only plasma N-glycoprofile data were available for the TwinsUK cohort, the IgG N-glycoprofile had to be extrapolated from the plasma N-glycoprofile, as this was the only way to deduce IgG N-glycosylation information from the available data. Previous studies demonstrated that neutral glycans in the total plasma protein N-glycoprofile originate nearly exclusively from immunoglobulins, mostly IgG [33], which allowed us to use the total plasma N-glycome data as a source of IgG N-glycosylation information. These neutral glycans, which originate primarily from IgG, are mostly located in the first 11 peaks of the total plasma N-glycome, which were used to calculate six IgG-derived glycan traits: agalactosylation (G0), monogalactosylation (G1), digalactosylation (G2), bisecting GlcNAc (B), core fucosylation (CF) and high mannose structures (HM). Prior to calculation of these derived traits, the first 11 plasma glycan peaks were normalised to their total chromatogram area (calculated by adding up the areas under GP1, GP2, … GP11). For example, the relative abundance of GP1 was recalculated by dividing its area by the total IgG chromatogram area and multiplying by 100: $\mathrm{GP1} / (\mathrm{GP1} + \mathrm{GP2} + \cdots + \mathrm{GP11}) \times 100$. The formulas used to normalise the first 11 plasma glycan peaks and obtain the IgG N-glycosylation data are presented in Supplementary Table 4. Mixed models were fitted to estimate the effect of BMI change on the IgG N-glycome (R package lme4) [34]. A directly measured or derived glycan trait was used as the dependent variable in each mixed model.
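
The renormalisation of the first 11 plasma peaks described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name and the peak values are hypothetical, and the derived-trait formulas (given in Supplementary Table 4) are not reproduced.

```python
def renormalise_igg_peaks(plasma_peaks, n_igg_peaks=11):
    """Rescale the first n_igg_peaks plasma peaks so that they sum to 100.

    plasma_peaks is a list of peak areas ordered GP1, GP2, ...
    """
    igg_peaks = plasma_peaks[:n_igg_peaks]
    if len(igg_peaks) < n_igg_peaks:
        raise ValueError("fewer peaks supplied than requested")
    total = sum(igg_peaks)
    if total <= 0:
        raise ValueError("total area of the selected peaks must be positive")
    return [100.0 * area / total for area in igg_peaks]


# Hypothetical plasma profile: 11 IgG-dominated peaks followed by two others.
plasma = [5.0, 2.5, 2.5, 10.0, 5.0, 5.0, 2.5, 7.5, 5.0, 2.5, 2.5, 30.0, 20.0]
igg_profile = renormalise_igg_peaks(plasma)
gp1_relative = igg_profile[0]  # GP1 / (GP1 + ... + GP11) * 100
```

Peaks beyond GP11 are ignored, so the resulting profile depends only on the relative sizes of the first 11 peaks.
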
To differentiate between BMI change and the absolute BMI value, the variable was separated into $\mathrm{BMI}_{\text{baseline}}$ and $\mathrm{BMI}_{\text{difference}}$ (calculated as $\mathrm{BMI}_{\text{difference}} = \mathrm{BMI}_{\text{follow-up age}} - \mathrm{BMI}_{\text{baseline age}}$), and both were used in the model as fixed effects. Since the IgG N-glycome is affected by aging, and the follow-up period for the TwinsUK cohort was measured in years (average follow-up period ≈ 8 years), which resulted in a significant change in participants’ age during follow-up, age was included both as a fixed effect and as a random slope. Finally, to meet the independence criterion, family ID and individual ID (nested within family) were included in the model as random intercepts. Because multiple models were fitted (for 11 directly measured and six derived glycan traits), the false discovery rate was controlled using the Benjamini–Hochberg method. All statistical analyses were performed using the R programming language (version 3.6.3) [32].
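
The Benjamini–Hochberg adjustment applied across the fitted models can be sketched as follows. This is a minimal Python illustration of the standard step-up procedure, not the authors' R code; the p-values are hypothetical, and the 0.05 threshold is the level stated for the Bariatric cohort analysis, assumed here for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per glycan trait model.
p_values = [0.001, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in adjusted]  # assumed FDR level of 0.05
```

In this example only the first trait remains significant after adjustment, even though three of the raw p-values fall below 0.05.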
