Statistical analysis
This protocol is extracted from research article:
Genetic loci associated with skin pigmentation in African Americans and their effects on vitamin D deficiency
PLoS Genet, Feb 18, 2021;

Procedure

M-Index was log-transformed to normalize its distribution in the population. For the analysis of M-Index in the GWAS discovery dataset, a linear model was used, adjusting for age, sex, and the first 3 principal components (PCs). During model building, up to 10 PCs were included in the regression model; the final model included the minimum number of PCs necessary to correct for population structure and reduce genomic inflation. In the replication and pooled analyses, West African Ancestry (WAA) was used instead of principal components. Individual admixture proportions were estimated using STRUCTURE v2.3, a model-based clustering method [47,48]. STRUCTURE was run under the admixture model with K = 3 ancestral populations (West African, European, and Native American), using a burn-in length of 100,000 and 100,000 repetitions. For the GWAS analyses, we used $P < 5.0 \times 10^{-8}$ as the genome-wide significance threshold and $P < 0.05$ as the statistical significance cutoff for replication.
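The genomic inflation check mentioned above is commonly summarized by the genomic inflation factor, the median observed 1-df chi-square statistic divided by its expected median under the null. The sketch below is a minimal illustration of that standard definition, not the authors' pipeline: the function names and the `max_lambda=1.05` cutoff are assumptions made for this example, since the source does not state how inflation was quantified or what threshold was applied.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square: the square of the 75th percentile of N(0, 1), about 0.455.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return the genomic inflation factor for two-sided p-values, each in (0, 1]."""
    # Convert each p-value back to its 1-df chi-square statistic.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


def smallest_k_within(lambda_by_k, max_lambda=1.05):
    """Smallest number of PCs whose inflation factor is at most max_lambda.

    max_lambda is a hypothetical cutoff chosen for illustration only.
    """
    for k in sorted(lambda_by_k):
        if lambda_by_k[k] <= max_lambda:
            return k
    return None
```

Here `lambda_by_k` would map each candidate number of PCs to the inflation factor of the GWAS run with that many PCs as covariates; a value near 1 suggests little residual inflation from population structure.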

A weighted Genetic Score was calculated using the top 3 and the top 10 SNPs associated with skin pigmentation. The weighted Genetic Score is the sum of the effects of each SNP weighted by its estimated effect size (β) from the regression model, $\mathrm{GeneticScore}_i = \sum_{j=1}^{m} \chi_{ij}\beta_j$, where $m$ is the number of SNPs included and $\chi_{ij}$ is the genotype of the $i$th individual at the $j$th SNP (coded as 0, 1, or 2 according to the number of alleles associated with darker skin pigmentation) [49]. Only one SNP per genomic region was included in the calculation. When more than 2 SNPs with $P < 0.05$ fell in the same genomic region, the SNP with the lowest P-value was used, after a conditional analysis that included the lead SNP of the region in the regression model to test whether the SNP with the second-lowest P-value was independently associated with M-Index. Genetic Scores calculated from the top SNPs at 3 and at 10 loci were initially assessed to estimate the variance in skin pigmentation they explained. Because the top 3 and top 10 SNPs accounted for a similar amount of skin pigmentation variation, we subsequently focused our analysis on the Genetic Score estimated from the top 3 SNPs. Sex-specific Genetic Scores were also calculated. First, linear regression analysis was performed separately in males and females for the top 3 SNPs associated with M-Index in the pooled dataset. Then, the β coefficients obtained for each SNP in males and in females were used to calculate the sex-specific Genetic Scores.
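The weighted score described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the `"F"`/`"M"` keys, and any effect sizes passed in are hypothetical and are not values from the study.

```python
def weighted_genetic_score(genotypes, betas):
    """Sum over SNPs of allele count (0, 1, or 2) times the SNP's effect size."""
    if len(genotypes) != len(betas):
        raise ValueError("genotypes and betas must have the same length")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded as 0, 1, or 2")
    return sum(g * b for g, b in zip(genotypes, betas))


def sex_specific_score(genotypes, sex, betas_by_sex):
    """Same score, using effect sizes estimated separately within each sex."""
    return weighted_genetic_score(genotypes, betas_by_sex[sex])
```

For one individual, `genotypes` holds the darker-pigmentation allele counts at the selected SNPs (one SNP per region), and `betas` holds the corresponding regression coefficients in the same order.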

Associations with serum 25(OH)D levels were tested using linear regression models adjusting for age, UV season (season of blood draw), total vitamin D intake, recruitment site, and WAA, as described previously [31]. Log-transformed serum 25(OH)D levels were used in the linear regression analysis. A Genetic Score for vitamin D metabolic and signaling pathway gene variants was calculated with the same formula described above, using the SNPs associated with serum 25(OH)D levels in our dataset. Binary logistic regression was used to examine the association between skin pigmentation gene variants and vitamin D deficiency, adjusting for age, UV season, total vitamin D intake, recruitment site, WAA, and the Genetic Score calculated from vitamin D metabolic pathway gene variants associated with serum 25(OH)D levels. Statistical analyses were performed using PLINK 1.07 [50], SPSS (IBM Corp., Armonk, NY), and R.
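One way to write down the two models described in this paragraph is given below. The notation is an assumption introduced for illustration: $g_i$ stands for the skin pigmentation predictor of individual $i$ (a SNP genotype or the Genetic Score), $\mathbf{c}_i$ collects the covariates age, UV season, total vitamin D intake, recruitment site, and WAA, and $\mathrm{GS}^{\mathrm{vitD}}_i$ is the Genetic Score from the vitamin D pathway variants. The coefficient symbols are placeholders, not estimates from the article.

$$
\log\bigl(25(\mathrm{OH})\mathrm{D}_i\bigr) = \alpha_0 + \alpha_1 g_i + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + \varepsilon_i
$$

$$
\operatorname{logit}\,\Pr(\mathrm{deficient}_i = 1) = \delta_0 + \delta_1 g_i + \boldsymbol{\theta}^{\top}\mathbf{c}_i + \delta_2\,\mathrm{GS}^{\mathrm{vitD}}_i
$$

In this form, $\alpha_1$ and $\delta_1$ carry the association of interest, and $\delta_2$ absorbs the contribution of the vitamin D pathway variants.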
