4.6. Statistical Analysis

Benjamin Buchard; Camille Teilhet; Natali Abeywickrama Samarakoon; Sylvie Massoulier; Juliette Joubert-Zakeyh; Corinne Blouin; Christelle Reynes; Robert Sabatier; Anne-Sophie Biesse-Martin; Marie-Paule Vasson; Armando Abergel; Aicha Demidem


This method is extracted from a research article published in Metabolites, January 2021:

Two Metabolomics Phenotypes of Human Hepatocellular Carcinoma in Non-Alcoholic Fatty Liver Disease According to Fibrosis Severity

DOI: 10.3390/metabo11010054


A pre-screening step was applied to remove features (ppm locations) that were uninformative for discrimination: technical artefacts, constant features, and redundant features were removed. The last two steps (constant and redundant features) were applied independently for each comparison (Figure 5).
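The constant and redundant filters can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. Technical-artefact removal is instrument-specific and is not shown. The section does not say how redundancy was defined, so the sketch assumes it: a feature counts as redundant when its absolute Pearson correlation with an already-kept feature reaches a hypothetical threshold `max_abs_r = 0.95`. The names `prescreen`, `var_eps`, and `max_abs_r` are illustrative. Only the samples of the two groups being compared are passed in, which matches the per-comparison application described above.

```python
from math import sqrt


def variance(values):
    """Population variance of one feature column."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(a, b):
    """Pearson correlation between two non-constant feature columns."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def prescreen(spectra, ppm, var_eps=1e-12, max_abs_r=0.95):
    """Return the ppm locations kept after removing constant and redundant features.

    spectra: one list of intensities per sample (only the samples of the two
    groups being compared), aligned on the `ppm` axis.
    """
    columns = [[spectrum[j] for spectrum in spectra] for j in range(len(ppm))]
    # Constant features carry no discriminant information.
    candidates = [j for j, col in enumerate(columns) if variance(col) > var_eps]
    # Greedy redundancy filter (assumed criterion): keep a feature only if its
    # |r| with every feature kept so far is below max_abs_r.
    kept = []
    for j in candidates:
        if all(abs(pearson(columns[j], columns[k])) < max_abs_r for k in kept):
            kept.append(j)
    return [ppm[j] for j in kept]


# Toy example: 3 samples x 4 ppm locations (hypothetical values).
spectra = [[1.0, 5.0, 2.0, 4.0],
           [2.0, 5.0, 4.0, 1.0],
           [3.0, 5.0, 6.0, 3.0]]
ppm = [0.90, 1.30, 1.33, 3.20]
print(prescreen(spectra, ppm))  # → [0.9, 3.2]
```

In the toy data, the column at 1.30 ppm is constant and is dropped. The column at 1.33 ppm is perfectly correlated with the one at 0.90 ppm, so it is dropped as redundant. The greedy filter depends on feature order. Ordering candidates by variance, for example, would be an equally plausible variant.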

Figure 5. Complete workflow of the discrimination process for HCC-F0F1 compared with HCC-F3F4. (A) Raw Nuclear Magnetic Resonance (NMR) aqueous spectra of HCC-F0F1, ≈ 4500 ion peaks. (B) After removal of technical artefacts, constant features, and redundant features, 1275 ion peaks. (C) Selection of the most discriminant metabolites in the aqueous phase using a Genetic Algorithm with Linear Discriminant Analysis, 5 ion peaks (minimum of 45 selections); final solution: 3 identified discriminant metabolites.

A univariate analysis is unlikely to identify the best synergistic subset of features, so a multivariable analysis combining several metabolites is more informative. However, even after pre-screening, there were too many candidate feature subsets to test them all within a reasonable time. We therefore used genetic algorithms (GAs) to select subsets. GAs are optimization algorithms inspired by the process of natural selection [50,51], and they provide approximate solutions to complex optimization problems. First, a population of candidate solutions is generated at random. This population then evolves through the iterative application of mutation, cross-over, and selection.
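A short calculation shows why exhaustive search was out of reach. It takes the 1275 features that remain after pre-screening in the HCC-F0F1 versus HCC-F3F4 comparison (Figure 5) and the maximal subset size of 10 stated in the next paragraph, then counts the non-empty subsets of at most 10 features. The evaluation rate of one million LDA fits per second is a hypothetical, deliberately generous figure.

```python
from math import comb


def n_subsets(n_features, max_size):
    """Number of non-empty feature subsets containing at most `max_size` features."""
    return sum(comb(n_features, k) for k in range(1, max_size + 1))


total = n_subsets(1275, 10)
print(f"{total:.3e} candidate subsets")

# Hypothetical throughput: one million cross-validated LDA fits per second.
seconds = total / 1e6
print(f"about {seconds / (3600 * 24 * 365):.1e} years of computation")
```

The count is dominated by the size-10 term and exceeds 10^24 subsets. Even at the assumed rate, an exhaustive search would take far longer than any practical budget. This is the gap a heuristic search such as a GA is meant to fill.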

In our model, each solution was a subset of features. Mutation randomly altered a solution by adding, removing, or substituting a feature, and cross-over randomly combined the features of two solutions. Selection is the only operator that increases solution quality across generations. It relies on a fitness function that quantifies the quality of each solution. A Linear Discriminant Analysis (LDA) was applied to each solution [52], and, to avoid over-fitting, its accuracy was estimated by two-fold cross-validation. The fitness function penalizes this accuracy by the subset size to favor parsimonious solutions; for the same purpose, the maximal subset size was set to 10. The GA was run 10 times, and the solutions in the last generations were evaluated by their average cross-validated LDA accuracy. To identify the most informative features, we used the frequency with which each feature was selected in the final generations (Figure 5). The more frequently a feature is selected to survive across generations, the more likely it is to contribute to discrimination. The frequency threshold was set using “random” GAs without any learning step.
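The GA described above can be sketched in stdlib-only Python. This is a minimal sketch under several assumptions and not the authors' implementation.

- **Fitness:** `accuracy_fn` is a pluggable, hypothetical callable. In the study, it would return the two-fold cross-validated LDA accuracy of a subset, for instance via scikit-learn's `LinearDiscriminantAnalysis` with `cross_val_score(..., cv=2)`. The sketch leaves that out so that it has no dependencies.
- **Penalty:** the size penalty is assumed to be linear, `penalty * len(subset)`.
- **Selection scheme:** the section does not name one, so truncation selection (the fitter half survives) is assumed.
- **Parameters:** `pop_size`, `n_generations`, and `penalty` are illustrative values.

Mutation (add, remove, or substitute one feature), cross-over (a random combination of two parents' features), the maximal size of 10, the 10 runs, and the selection-frequency summary follow the text.

```python
import random
from collections import Counter

MAX_SIZE = 10  # maximal subset size stated in the text


def mutate(subset, n_features, rng):
    """Alter a solution by adding, removing, or substituting one feature (n_features >= 2)."""
    s = set(subset)
    outside = [f for f in range(n_features) if f not in s]
    ops = []
    if outside and len(s) < MAX_SIZE:
        ops.append("add")
    if len(s) > 1:
        ops.append("remove")
    if outside:
        ops.append("substitute")
    op = rng.choice(ops)
    if op in ("remove", "substitute"):
        s.remove(rng.choice(sorted(s)))
    if op in ("add", "substitute"):
        s.add(rng.choice(outside))
    return frozenset(s)


def crossover(a, b, rng):
    """Combine two parents by drawing a random subset of the union of their features."""
    pool = sorted(a | b)
    k = rng.randint(1, min(MAX_SIZE, len(pool)))
    return frozenset(rng.sample(pool, k))


def run_ga(n_features, accuracy_fn, penalty=0.01, pop_size=30,
           n_generations=50, seed=0):
    """One GA run; returns the final population of feature subsets."""
    rng = random.Random(seed)
    cache = {}

    def fitness(subset):
        # Accuracy penalized by subset size to favor parsimonious solutions.
        if subset not in cache:
            cache[subset] = accuracy_fn(subset) - penalty * len(subset)
        return cache[subset]

    max_init = min(MAX_SIZE, n_features)
    population = [frozenset(rng.sample(range(n_features), rng.randint(1, max_init)))
                  for _ in range(pop_size)]
    for _ in range(n_generations):
        # Truncation selection (assumed): the fitter half survives unchanged.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), n_features, rng))
        population = parents + children
    return population


def selection_frequency(final_populations):
    """Fraction of final-generation solutions, pooled over runs, that contain each feature."""
    counts = Counter()
    n_solutions = 0
    for population in final_populations:
        for subset in population:
            counts.update(subset)
            n_solutions += 1
    return {feature: c / n_solutions for feature, c in counts.items()}


# Toy stand-in for the cross-validated LDA accuracy: features 3 and 7 are informative.
def toy_accuracy(subset):
    return 0.5 + 0.2 * (3 in subset) + 0.2 * (7 in subset)


runs = [run_ga(20, toy_accuracy, seed=s) for s in range(10)]  # 10 runs, as in the text
freq = selection_frequency(runs)
# Hypothetical threshold; the section calibrates it with "random" GAs instead.
threshold = 0.5
print(sorted(f for f, v in freq.items() if v >= threshold))
```

Because the surviving half is carried over unchanged, the best fitness never decreases between generations. Pooling the final populations of the 10 runs and counting how often each feature appears gives the selection frequency that the section thresholds. In the toy run, features 3 and 7 end up with the highest frequencies.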

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
