tensorQTL

AT Amaro Taylor-Weiner
FA François Aguet
NH Nicholas J. Haradhvala
SG Sager Gosai
SA Shankara Anand
JK Jaegil Kim
KA Kristin Ardlie
EA Eliezer M. Van Allen
GG Gad Getz
request Request a Protocol
ask Ask a question
Favorite

The core of tensorQTL is a reimplementation of FastQTL [10] in TensorFlow [7] and relies on pandas-plink (https://github.com/limix/pandas-plink) to efficiently read genotypes stored in PLINK [19] format into dask arrays [20].

The following QTL mapping modalities are implemented:

Cis-QTL: nominal associations between all variant–phenotype pairs within a specified window (default ± 1 Mb) around the phenotype (transcription start site for genes), as implemented in FastQTL.

Cis-QTL: beta-approximated empirical p values, based on permutations of each phenotype, as implemented in FastQTL.

Cis-QTL: beta-approximated empirical p values for grouped phenotypes; for example, multiple splicing phenotypes for each gene, as implemented in FastQTL.

Conditionally independent cis-QTL, following the stepwise regression approach described in [16].

Interaction QTLs: nominal associations for a linear model that includes a genotype × interaction term.

Trans-QTL: nominal associations between all variant–phenotype pairs. To reduce output size, only associations below a given p value threshold (default 1e−5) are stored.

Trans-QTL: beta-approximated empirical p values for inverse-normal-transformed phenotypes, in which case the genome-wide associations with permutations of each phenotype are identical. To avoid potentially confounding cis effects, the computation is performed for each chromosome, using variants on all other chromosomes.

To benchmark tensorQTL, we compared its trans-QTL mapping performance on a machine with and without an attached GPU, and cis-QTL mapping relative to the CPU-based FastQTL [10] (an optimized QTL mapper written in C++). For FastQTL, we computed the runtime per gene by specifying the gene and cis-window using the --include-phenotypes and --region options, respectively. The cis-mapping comparisons were performed using skeletal muscle data from the V6p release of GTEx [16]. To facilitate the comparison of GPU vs. CPU performance when mapping trans-QTLs across a wide range of sample sizes, we used randomly generated genotype, phenotype, and covariate matrices. All tensorQTL benchmarks were conducted on a virtual machine on Google Cloud Platform with 8 Intel Xeon CPU cores (2.30 GHz), 52 GB of memory, and an Nvidia Tesla P100 GPU. For CPU-based comparisons, computations were limited to a single core.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

post Post a Question
0 Q&A