Association analysis of the Mtb genome with MDR-TB phenotypes
We used a mixed linear model (MLM), as implemented in GAPIT, to associate SNPs with the multi-drug-resistant (MDR) phenotype. GAPIT is a package that runs in the R software environment; R can be downloaded freely from http://www.r-project.org and GAPIT from https://zzlab.net/GAPIT/. We preferred the MLM over a general linear model (GLM) for association mapping because the genotypes segregated into multiple lineages, and population structure of this kind can produce spurious associations if it is not modelled.
For the covariate analysis, the simplest model (a t test) detects the association between a phenotype (y) and each marker (Si) directly, one marker at a time, where i = 1 to m and m is the number of markers.
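The one-marker-at-a-time test can be sketched in base R as follows. This is only an illustration on simulated data, not the analysis run in the protocol: the marker names, the 0/1 allele coding, and the quantitative phenotype are all assumptions made for the example.

```r
# Toy genotypes for 40 hypothetical isolates and 3 biallelic SNPs,
# coded 0 (major allele) / 1 (minor allele). All values are made up.
n <- 40
geno <- cbind(snp1 = rep(c(0, 1), each = 20),
              snp2 = rep(c(0, 1), times = 20),
              snp3 = rep(c(0, 0, 1, 1), times = 10))

# Simulated phenotype in which only snp1 has a real effect.
set.seed(1)
y <- 2 * geno[, "snp1"] + rnorm(n)

# Test each marker S_i on its own: compare y between the two allele groups.
p_values <- apply(geno, 2, function(s) t.test(y[s == 1], y[s == 0])$p.value)
print(p_values)
```

Because this test ignores relatedness among isolates, it is the baseline that the Q and Q+K models improve on.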
The mixed linear model adds the genetic effects as random cofactor effects, with a variance structure defined by the kinship (K) among individuals. In both the Q and the Q+K models, the population structure (Q) and kinship (K) stay the same across marker tests; none of the cofactors is adjusted by the marker tests.
An MLM includes both fixed and random effects. Including individuals as random effects gives an MLM the ability to incorporate information about relationships among individuals. This information is conveyed through the kinship (K) matrix, which is used in an MLM as the variance-covariance matrix between the individuals. When a genetic marker-based kinship matrix (K) is used jointly with population structure (commonly called the “Q” matrix, which can be obtained with STRUCTURE or by principal component analysis), the “Q+K” approach improves statistical power compared to “Q” only.
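To make Q and K concrete, the sketch below derives both from a toy genotype matrix in base R. It is a hedged illustration, not GAPIT's internal code: the 0/2 coding, the isolate and SNP names, the choice of two principal components, and the VanRaden-style kinship formula are assumptions made for the example.

```r
# Toy genotype matrix: 6 hypothetical isolates x 5 SNPs, coded 0/2
# (copies of the minor allele). All values are made up.
M <- matrix(c(0, 2, 0, 2, 0,
              0, 2, 0, 0, 2,
              2, 0, 2, 0, 0,
              2, 0, 2, 2, 0,
              0, 0, 2, 0, 2,
              2, 2, 0, 0, 2),
            nrow = 6, byrow = TRUE,
            dimnames = list(paste0("isolate", 1:6), paste0("snp", 1:5)))

# Q: leading principal components of the centred genotypes.
pca <- prcomp(M, center = TRUE, scale. = FALSE)
Q <- pca$x[, 1:2]

# K: VanRaden-style kinship, ZZ' / (2 * sum(p * (1 - p))),
# where Z is the genotype matrix centred by twice the allele frequency.
p <- colMeans(M) / 2
Z <- sweep(M, 2, 2 * p)
K <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

print(round(K, 2))
```

Q enters the model as fixed-effect covariates, whereas K defines the covariance of the random effects.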
An MLM can be described using Henderson’s matrix notation as follows: Y = Xβ + Zu + e, (1)
where Y is the vector of observed phenotypes; β is an unknown vector containing fixed effects, including the genetic marker, population structure (Q), and the intercept; u is an unknown vector of random additive genetic effects from multiple background QTL for individuals/lines; X and Z are the known design matrices; and e is the unobserved vector of residuals.
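Equation (1) is usually completed by distributional assumptions on the random terms. The protocol does not state them, so the block below is a sketch of the standard formulation, with \sigma_a^2 (additive genetic variance) and \sigma_e^2 (residual variance) introduced here as assumed symbols:

```latex
\begin{aligned}
\mathbf{u} &\sim N\!\left(\mathbf{0},\ \sigma_a^2 \mathbf{K}\right), \qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\ \sigma_e^2 \mathbf{I}\right), \\
\operatorname{Var}(\mathbf{Y}) &= \sigma_a^2\, \mathbf{Z}\mathbf{K}\mathbf{Z}^{\top} + \sigma_e^2\, \mathbf{I}.
\end{aligned}
```

Under these assumptions, the kinship matrix K is what lets related isolates share covariance in the phenotype.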
To run GAPIT, we need at minimum:
Genotype file: we used the HapMap file format.
Phenotype file: as given in Supplementary files 1 and 2.
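The two inputs can be sketched in base R as follows. Everything here is hypothetical (isolate names, SNP names, positions, alleles, and the temporary files); the layout assumed is a HapMap-style table with 11 annotation columns followed by one column per isolate, and a phenotype table with the isolate ID in the first column. The genotype table is read without a header so that the sample names stay in the first row, which is how GAPIT is generally documented to expect HapMap input.

```r
# Hypothetical phenotype table: isolate ID first, then the binary MDR trait.
pheno <- data.frame(Taxa = c("isolate1", "isolate2", "isolate3"),
                    mdr  = c(1, 0, 1))

# Hypothetical HapMap-style genotype table: 11 annotation columns,
# then one genotype column per isolate.
geno <- data.frame(
  rs = c("snp1", "snp2"), alleles = c("A/G", "C/T"),
  chrom = c(1, 1), pos = c(100, 200), strand = "+",
  assembly = NA, center = NA, protLSID = NA, assayLSID = NA,
  panelLSID = NA, QCcode = NA,
  isolate1 = c("A", "C"), isolate2 = c("G", "C"), isolate3 = c("A", "T"))

# Write both tables to temporary tab-delimited files and read them back.
pheno_path <- tempfile(fileext = ".txt")
geno_path  <- tempfile(fileext = ".hmp.txt")
write.table(pheno, pheno_path, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(geno,  geno_path,  sep = "\t", quote = FALSE, row.names = FALSE)

phenotype_file <- read.table(pheno_path, header = TRUE, sep = "\t")
genotype_file  <- read.table(geno_path, header = FALSE, sep = "\t",
                             comment.char = "")

# Sanity check: every phenotyped isolate has a genotype column.
sample_ids <- as.character(unlist(genotype_file[1, -(1:11)]))
print(all(phenotype_file$Taxa %in% sample_ids))
```

A mismatch between the isolate IDs in the two files is a common reason for isolates being silently dropped from an analysis, so the final check is worth keeping.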
Change to the folder where the analysis is to be run:
setwd("path_to_the_folder_where_analysis_has_to_be_done")
Load the following packages in the R environment:
library("multtest")
library("gplots")
library("LDheatmap")
library("genetics")
library("compiler")
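Check the phenotype data as follows (the trait column is named mdr):
str(phenotype_file)                # column names and types
mean(phenotype_file$mdr)           # average of the trait values
range(phenotype_file$mdr)          # minimum and maximum trait values
which(is.na(phenotype_file$mdr))   # rows with a missing phenotype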
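Run the association analysis with the following command (GAPIT must already be loaded; see https://zzlab.net/GAPIT/ for installation instructions). PCA.total = 4 includes the first four principal components as population structure covariates:
analysis <- GAPIT(Y = phenotype_file, G = genotype_file, PCA.total = 4, Major.allele.zero = TRUE)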
The analysis generates a large amount of output, and parameters should be adjusted based on the results. In the manuscript, we selected a corrected p-value of 10^-5 as the threshold for selecting associated genes and used the Bonferroni method for multiple-testing correction.
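The thresholding step can be sketched in base R with p.adjust(). The raw p-values and marker names below are hypothetical and serve only to show how the Bonferroni correction and the 10^-5 cutoff interact.

```r
# Hypothetical raw p-values for five markers.
p_raw <- c(snp1 = 2e-9, snp2 = 4e-7, snp3 = 3e-4, snp4 = 0.02, snp5 = 0.6)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Markers whose corrected p-value falls below the 1e-5 threshold.
significant <- names(p_raw)[p_bonf < 1e-5]
print(significant)  # "snp1" "snp2"
```

With a genome-wide SNP set the number of tests is far larger than five, so the same cutoff is correspondingly stricter on the raw p-values.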
Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:
Naz, S. and Nandicoori, V. (2023). Association Analysis. Bio-protocol Preprint. bio-protocol.org/prep2509.
Naz, S., Paritosh, K., Sanyal, P., Khan, S., Singh, Y., Varshney, U. and Nandicoori, V. K. (2023). GWAS and functional studies suggest a role for altered DNA repair in the evolution of drug resistance in Mycobacterium tuberculosis. eLife. DOI: 10.7554/eLife.75860
This protocol preprint was submitted via the "Request
a Protocol" track.