Tissue Specific Gene Expression Imputation

Carlo Maj; Tiago Azevedo; Valentina Giansanti; Oleg Borisov; Giovanna Maria Dimitri; Simeon Spasov; Pietro Lió; Ivan Merelli

This method is extracted from research article: Front Genet, Sep 2019

Integration of Machine Learning Methods to Dissect Genetically Imputed Transcriptomic Profiles in Alzheimer’s Disease

DOI: 10.3389/fgene.2019.00726

Data used for the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI was launched in 2003 as a public-private partnership led by Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). In the present work, we analyzed the ADNI1-GWAS dataset, which includes gene array genotyping data for 808 samples available on the ADNI portal.

Rigorous quality control was performed. Samples were checked for sex, missing genotype rate (lower than 0.05) and heterozygosity level (F < 0.2), while variants with a Hardy–Weinberg p-value < 1e-10 were removed. Then, using the tool by W. Rayner⁵, we checked SNPs for strand consistency, allele names, position, Ref/Alt assignments and minor allele frequency (MAF) in comparison to the reference panel. To increase the available genetic information, we imputed our data using the Sanger Imputation Server⁶, exploiting Eagle2 for phasing (Loh et al., 2016) and the Positional Burrows–Wheeler Transform (Durbin, 2014) for imputation, with the Haplotype Reference Consortium version 1.1 (McCarthy et al., 2016) as the reference panel. As a post-imputation quality control, we removed variants with an info quality level < 0.6, and genotype calls with posterior probability < 0.9 were set to missing. The post-QC imputed data were used to estimate gene expression regulation across the different samples.
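The post-imputation filtering described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input layout (one tuple of three genotype posteriors per sample) and the example variant identifiers are all hypothetical, and only the two thresholds (0.6 and 0.9) come from the text.

```python
from typing import Dict, List, Optional, Sequence, Tuple

INFO_MIN = 0.6       # from the text: variants with info < 0.6 are removed
POSTERIOR_MIN = 0.9  # from the text: calls with posterior < 0.9 are set to missing

# Hypothetical layout: (variant id, info score, per-sample posteriors for 0/1/2 alt alleles)
Variant = Tuple[str, float, List[Sequence[float]]]


def call_genotype(probs: Sequence[float],
                  threshold: float = POSTERIOR_MIN) -> Optional[int]:
    """Return the most likely alt-allele count, or None if its posterior is too low."""
    best = max(range(len(probs)), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def post_imputation_qc(variants: List[Variant],
                       info_min: float = INFO_MIN,
                       threshold: float = POSTERIOR_MIN) -> Dict[str, List[Optional[int]]]:
    """Drop low-info variants and mask low-confidence genotype calls."""
    kept: Dict[str, List[Optional[int]]] = {}
    for variant_id, info, sample_probs in variants:
        if info < info_min:
            continue  # whole variant removed
        kept[variant_id] = [call_genotype(p, threshold) for p in sample_probs]
    return kept


# Toy data with made-up identifiers and probabilities.
example = [
    ("rs_a", 0.95, [(0.98, 0.02, 0.00), (0.05, 0.92, 0.03), (0.50, 0.45, 0.05)]),
    ("rs_b", 0.40, [(1.00, 0.00, 0.00), (0.00, 1.00, 0.00), (0.00, 0.00, 1.00)]),
]
qc_result = post_imputation_qc(example)
```

In this toy input, `rs_b` is dropped because its info score is below 0.6, and the third sample of `rs_a` becomes missing (`None`) because no genotype reaches a posterior of 0.9.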

To predict the genetic component of gene expression, we used PrediXcan, which evaluates the aggregate effects of cis-regulatory variants (within 1 Mb upstream or downstream of the genes of interest) on gene expression via an elastic net regression method (Gamazon et al., 2015). PrediXcan needs a reference dataset in which both genome variation and gene expression levels have been measured in order to build prediction models for gene expression. We exploited already available models trained on GTEx data⁷ to impute tissue-specific transcriptomic profiles in a total of 42 tissues (we excluded sex-specific tissues, e.g., prostate, ovary, etc.). The imputed transcriptomic profiles were subsequently analyzed using different machine learning approaches (Figure 1). Unsupervised machine learning methods were used to analyze the data structure, while supervised methods were used to test for the presence of “signal” with respect to AD-related phenotypes.
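Once a model has been trained, the prediction step amounts to a weighted sum of allele dosages over the cis variants retained by the elastic net. The sketch below illustrates that idea in Python under stated assumptions: the function names, the SNP identifiers, the weights and the gene coordinates are hypothetical, the window is measured from the gene start and end, and skipping missing genotypes is a simplification of this sketch rather than a description of how PrediXcan handles them.

```python
from typing import Dict, Optional

CIS_WINDOW_BP = 1_000_000  # 1 Mb upstream or downstream, as stated in the text


def in_cis_window(pos: int, gene_start: int, gene_end: int,
                  window: int = CIS_WINDOW_BP) -> bool:
    """True if a variant position lies within the cis window around a gene."""
    return gene_start - window <= pos <= gene_end + window


def predict_expression(weights: Dict[str, float],
                       dosages: Dict[str, Optional[float]]) -> float:
    """Genetically predicted expression of one gene for one individual.

    weights: per-SNP effect sizes from a pre-trained model (hypothetical here).
    dosages: alt-allele dosages (0 to 2) for the individual; None means missing.
    """
    total = 0.0
    for snp, weight in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            continue  # assumption of this sketch: missing calls contribute nothing
        total += weight * dosage
    return total


# Toy model for one gene in one tissue (made-up numbers).
toy_weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 0.1}
toy_dosages = {"rs1": 2.0, "rs2": 1.0, "rs3": None}
predicted = predict_expression(toy_weights, toy_dosages)  # 0.5*2 - 0.25*1 = 0.75
```

Repeating this for every gene and every tissue model yields one imputed expression matrix (individuals by genes) per tissue, which is the input to the downstream machine learning analyses.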

Figure 1. Framework of the integrative analysis of multi-tissue expression profiles. Starting from genotyping data (m individuals by n variants), we imputed tissue-specific transcriptomic profiles (for each tissue T_i, where i = 1, ..., k) by means of cis-eQTL PrediXcan models trained on GTEx data. A variational autoencoder followed by a support vector machine (SVM) latent dimension-tissue match on the imputed gene expression matrices (m individuals by z genes) is used as a feature selection step to identify the most relevant genes per tissue (T_i = gene_1, ..., gene_s, where i indexes the i-th tissue and s is the number of prioritized genes), which are provided as input to the recurrent neural network classifier.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
