All the study samples were genotyped using the OncoArray Illumina beadchip65. The array includes a backbone of ~260,000 SNPs that provide genome-wide coverage of most common variants, together with markers of interest for breast and other cancers identified through GWAS, fine-mapping of known susceptibility regions, and other approaches65.

A standard genotype quality control process, described in detail elsewhere35,48, was followed for both the BCAC and CIMBA samples. Briefly, this involved excluding SNPs located on chromosome Y; SNPs with call rates <95%; SNPs with MAF < 0.05 and call rate <98%; monomorphic SNPs; and SNPs showing evidence of departure from Hardy-Weinberg equilibrium (P < 10⁻⁷ based on a country-stratified test).
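
As an illustration only, the exclusion criteria listed above can be written as a per-SNP predicate. The following Python sketch is not the pipeline used in the study; the record type `SnpQc`, its field names, and the function `passes_qc` are hypothetical, and the country-stratified Hardy-Weinberg P value is assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP summary statistics (hypothetical field names)."""
    chrom: str        # chromosome label, e.g. "1", "X", "Y"
    call_rate: float  # fraction of samples with a called genotype (0 to 1)
    maf: float        # minor allele frequency (0 to 0.5)
    hwe_p: float      # precomputed country-stratified HWE P value


def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP survives all five exclusion criteria."""
    if snp.chrom.upper() in {"Y", "CHRY"}:
        return False  # located on chromosome Y
    if snp.call_rate < 0.95:
        return False  # call rate below 95%
    if snp.maf < 0.05 and snp.call_rate < 0.98:
        return False  # lower-frequency SNP with call rate below 98%
    if snp.maf == 0.0:
        return False  # monomorphic
    if snp.hwe_p < 1e-7:
        return False  # departure from Hardy-Weinberg equilibrium
    return True
```

For example, under these assumptions a SNP with MAF 0.02 and call rate 97% is excluded, whereas a SNP with MAF 0.20 and the same call rate is retained.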

Genotypes for ~21 million SNPs were imputed for all subjects using the 1000 Genomes Phase III data (released October 2014) as the reference panel, as described previously66. Briefly, the number of reference haplotypes used as templates when imputing missing genotypes was fixed at 800 (-k_hap = 800). A two-stage imputation approach was used: phasing with SHAPEIT67,68, followed by imputation with IMPUTE269 in 5 Mb non-overlapping intervals. Genotypes were imputed for all SNPs that were found to be polymorphic (MAF > 0.1%) in either European or Asian populations.
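
To make the chunking concrete, the division of a chromosome into 5 Mb non-overlapping intervals can be sketched as below. This is a minimal illustration, not the study's code: the function name `imputation_intervals` is hypothetical, and the use of 1-based inclusive coordinates with a shorter final interval is an assumption, since the text does not state how interval boundaries were defined.

```python
def imputation_intervals(chrom_length: int, width: int = 5_000_000):
    """Split positions 1..chrom_length into non-overlapping intervals.

    Returns a list of (start, end) pairs. Coordinates are assumed to be
    1-based and inclusive; the last interval may be shorter than `width`.
    """
    if chrom_length < 1 or width < 1:
        raise ValueError("chrom_length and width must be positive")
    intervals = []
    start = 1
    while start <= chrom_length:
        end = min(start + width - 1, chrom_length)
        intervals.append((start, end))
        start = end + 1
    return intervals
```

Under these assumptions, a hypothetical 12 Mb chromosome would be imputed in three intervals, the last covering only the remaining 2 Mb.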

The genome-wide imputation process described above was carried out separately for the BCAC and CIMBA samples. However, this could lead to spurious associations if the quality of the imputation (measured using the imputation accuracy r² metric70) for a given SNP differs between the two datasets. To address this, a stringent approach was employed: only SNPs for which the difference in r² between the BCAC and CIMBA imputations (Δr²) was small relative to their r² values were included. SNPs with r² > 0.9 in both BCAC and CIMBA were kept in the analyses only if Δr² < 0.05; SNPs with 0.8 < r² ≤ 0.9 in both BCAC and CIMBA were kept if Δr² < 0.02; and SNPs with 0.5 < r² ≤ 0.8 in both BCAC and CIMBA were kept if Δr² < 0.01. All SNPs with r² < 0.5 in either CIMBA or BCAC were excluded. In addition, only SNPs with a MAF > 0.01 in BCAC cases were included.
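
The tiered Δr² rule above can be summarised in a short function. The Python sketch below is illustrative only and the name `keep_snp` is hypothetical. Two points are assumptions rather than statements from the text: when the two r² values fall in different tiers, the tier of the lower r² is applied, and a SNP whose lower r² is exactly 0.5 is excluded. Δr² is taken as the absolute difference between the two r² values.

```python
def keep_snp(r2_bcac: float, r2_cimba: float, maf_bcac_cases: float) -> bool:
    """Apply the tiered imputation-quality filter to one imputed SNP."""
    if maf_bcac_cases <= 0.01:
        return False  # requires MAF > 0.01 in BCAC cases

    if r2_bcac < 0.5 or r2_cimba < 0.5:
        return False  # r² < 0.5 in either dataset

    delta = abs(r2_bcac - r2_cimba)

    # ASSUMPTION: when the two r² values fall in different tiers, use the
    # tier of the lower value (the text only covers the same-tier case).
    r2_low = min(r2_bcac, r2_cimba)

    if r2_low > 0.9:
        return delta < 0.05
    if r2_low > 0.8:
        return delta < 0.02
    if r2_low > 0.5:
        return delta < 0.01
    # ASSUMPTION: r² exactly 0.5 is not covered by any tier, so exclude.
    return False
```

For instance, a SNP with r² of 0.95 in BCAC and 0.92 in CIMBA is retained (Δr² = 0.03 < 0.05), whereas one with r² of 0.70 and 0.68 is dropped because Δr² = 0.02 exceeds the 0.01 limit for that tier.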

Consequently, 9,072,535 SNPs were included in the BRCA1 analyses (402,336 genotyped and 8,670,199 imputed SNPs) and 9,047,403 SNPs in the BRCA2 analyses (402,397 genotyped and 8,645,006 imputed SNPs).

