tensorQTL

Amaro Taylor-Weiner; François Aguet; Nicholas J. Haradhvala; Sager Gosai; Shankara Anand; Jaegil Kim; Kristin Ardlie; Eliezer M. Van Allen; Gad Getz

This method is extracted from research article: Genome Biol, Nov 2019

Scaling computational genomics to millions of individuals with GPUs

DOI: 10.1186/s13059-019-1836-7

The core of tensorQTL is a reimplementation of FastQTL [10] in TensorFlow [7]. It relies on pandas-plink (https://github.com/limix/pandas-plink) to efficiently read genotypes stored in PLINK [19] format into dask arrays [20].

The following QTL mapping modalities are implemented:

Cis-QTL: nominal associations between all variant–phenotype pairs within a specified window (default ±1 Mb) around the phenotype (the transcription start site for genes), as implemented in FastQTL.

Cis-QTL: beta-approximated empirical p values, based on permutations of each phenotype, as implemented in FastQTL.

Cis-QTL: beta-approximated empirical p values for grouped phenotypes; for example, multiple splicing phenotypes for each gene, as implemented in FastQTL.

Conditionally independent cis-QTL, following the stepwise regression approach described in [16].

Interaction QTLs: nominal associations for a linear model that includes a genotype × interaction-variable term.

Trans-QTL: nominal associations between all variant–phenotype pairs. To reduce output size, only associations below a given p value threshold (default 1e−5) are stored.

Trans-QTL: beta-approximated empirical p values for inverse-normal-transformed phenotypes. Because the transform maps every phenotype to the same set of values, the genome-wide associations with permutations are identical across phenotypes. To avoid potentially confounding cis effects, the computation is performed for each chromosome, using variants on all other chromosomes.
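The nominal association step that underlies the modalities above can be illustrated with a small NumPy sketch. It is not tensorQTL's code: the function names, the toy data, and the |t| cutoff are hypothetical, and NumPy stands in for TensorFlow. The sketch assumes the standard approach of regressing covariates (plus an intercept) out of genotypes and phenotypes, after which a single matrix product yields the partial correlation for every variant–phenotype pair.

```python
import numpy as np

def residualize(M, C):
    """Regress an intercept and covariates C (samples x covariates)
    out of every row of M (features x samples)."""
    C1 = np.column_stack([np.ones(C.shape[0]), C])
    Q, _ = np.linalg.qr(C1)          # orthonormal basis of the covariate space
    return M - (M @ Q) @ Q.T

def nominal_associations(G, P, C):
    """Hypothetical helper: partial correlations and t-statistics for all pairs.

    G: variants x samples genotype dosages
    P: phenotypes x samples
    C: samples x covariates
    Returns (r, t, dof), with r and t of shape variants x phenotypes.
    """
    n_samples = G.shape[1]
    dof = n_samples - 2 - C.shape[1]  # intercept + genotype + covariates
    Gr = residualize(G, C)
    Pr = residualize(P, C)
    # Scale rows to unit norm so the dot product is a correlation.
    # Monomorphic variants (zero norm) would need filtering beforehand.
    Gr /= np.linalg.norm(Gr, axis=1, keepdims=True)
    Pr /= np.linalg.norm(Pr, axis=1, keepdims=True)
    r = Gr @ Pr.T
    t = r * np.sqrt(dof / (1.0 - r**2))
    return r, t, dof

# Toy data (assumption: random values, one planted association)
rng = np.random.default_rng(0)
n_samples, n_variants, n_phenotypes, n_cov = 200, 50, 4, 3
G = rng.binomial(2, 0.3, size=(n_variants, n_samples)).astype(float)
C = rng.normal(size=(n_samples, n_cov))
P = rng.normal(size=(n_phenotypes, n_samples))
P[0] += 1.5 * G[10]

r, t, dof = nominal_associations(G, P, C)

# Stand-in for the p value threshold used to limit trans-QTL output:
# keep only pairs with a large |t| (the cutoff here is illustrative).
hits = np.argwhere(np.abs(t) > 5.0)
```

The t-statistics agree with those from fitting each linear model separately; p values would follow from a t distribution with `dof` degrees of freedom, which this sketch leaves out. Expressing the computation as dense matrix products is what makes it a natural fit for a GPU.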

To benchmark tensorQTL, we compared its trans-QTL mapping performance on a machine with and without an attached GPU, and cis-QTL mapping relative to the CPU-based FastQTL [10] (an optimized QTL mapper written in C++). For FastQTL, we computed the runtime per gene by specifying the gene and cis-window using the --include-phenotypes and --region options, respectively. The cis-mapping comparisons were performed using skeletal muscle data from the V6p release of GTEx [16]. To facilitate the comparison of GPU vs. CPU performance when mapping trans-QTLs across a wide range of sample sizes, we used randomly generated genotype, phenotype, and covariate matrices. All tensorQTL benchmarks were conducted on a virtual machine on Google Cloud Platform with 8 Intel Xeon CPU cores (2.30 GHz), 52 GB of memory, and an Nvidia Tesla P100 GPU. For CPU-based comparisons, computations were limited to a single core.
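A benchmark on randomly generated matrices, as described above, might be set up along the following lines. This is only a sketch under stated assumptions: the matrix dimensions, the sample sizes, the timed kernel, and the function names are all hypothetical, it runs on the CPU with NumPy, and it does not enforce the single-core restriction used in the published comparison.

```python
import time
import numpy as np

def random_inputs(n_samples, n_variants, n_phenotypes, n_covariates, seed=0):
    """Generate random genotype, phenotype, and covariate matrices."""
    rng = np.random.default_rng(seed)
    G = rng.integers(0, 3, size=(n_variants, n_samples)).astype(np.float32)
    P = rng.standard_normal((n_phenotypes, n_samples)).astype(np.float32)
    C = rng.standard_normal((n_samples, n_covariates)).astype(np.float32)
    return G, P, C

def association_kernel(G, P, C):
    """Residualize on covariates, then correlate all variant-phenotype pairs."""
    C1 = np.column_stack([np.ones(C.shape[0], dtype=C.dtype), C])
    Q, _ = np.linalg.qr(C1)
    Gr = G - (G @ Q) @ Q.T
    Pr = P - (P @ Q) @ Q.T
    Gr /= np.linalg.norm(Gr, axis=1, keepdims=True)
    Pr /= np.linalg.norm(Pr, axis=1, keepdims=True)
    return Gr @ Pr.T

def time_kernel(n_samples, n_variants=500, n_phenotypes=50,
                n_covariates=5, repeats=3):
    """Return the best wall-clock time in seconds over several repeats."""
    G, P, C = random_inputs(n_samples, n_variants, n_phenotypes, n_covariates)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        association_kernel(G, P, C)
        best = min(best, time.perf_counter() - start)
    return best

# Time the same kernel across a range of (illustrative) sample sizes.
timings = {n: time_kernel(n) for n in (100, 1000, 5000)}
```

Taking the best of several repeats reduces the influence of one-off effects such as memory allocation; comparing devices would mean running the same kernel on each and tabulating the results per sample size.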

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
