Sequencing and read mapping

Thomas F. Cooke
Muh-Ching Yee
Marina Muzzio
Alexandra Sockell
Ryan Bell
Omar E. Cornejo
Joanna L. Kelley
Graciela Bailliet
Claudio M. Bravi
Carlos D. Bustamante
Eimear E. Kenny

Libraries were sequenced on the Illumina HiSeq 2000 in 2 × 101 bp mode following the standard TruSeq SBS protocol. The eight HapMap samples were sequenced on a single lane, yielding a mean of 18.3 M paired-end reads per sample. In the Argentine study, the two pooled libraries were sequenced on four and five separate lanes, respectively, yielding a mean of 17.5 M reads per sample. Reads were mapped to the human reference genome (build 37) with BWA[23], using the -q 20 parameter to soft clip low-quality bases. Local realignment of reads around known indels and base quality recalibration were performed with GATK[18]. We defined the target region for the HapMap samples as the union of predicted restriction site fragments of 400–700 bp that had ≥ 3X mean coverage and in which ≥ 10% of reads had a mate pair mapped to a restriction site (S2A Fig). The target region for the Argentine samples was defined in the same way, but using predicted fragments of 200–600 bp. Argentine samples with < 30% of reads mapped to restriction sites (26/89 samples) were excluded from further analyses.
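
The target-region definition and the sample exclusion rule described above can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline: the Fragment record, the function names, and the choice of inclusive length bounds and half-open coordinates are all assumptions, and the per-fragment coverage and restriction-site fractions are assumed to have been computed beforehand from the aligned reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record for one predicted restriction-site fragment."""
    chrom: str
    start: int                 # 0-based, inclusive (assumed convention)
    end: int                   # 0-based, exclusive (assumed convention)
    mean_coverage: float       # mean depth over the fragment
    frac_reads_at_site: float  # fraction of reads with a mate mapped to a restriction site


def select_fragments(fragments, min_len, max_len, min_cov=3.0, min_site_frac=0.10):
    """Keep fragments in the size window that pass the coverage and site filters."""
    kept = []
    for f in fragments:
        length = f.end - f.start
        if (min_len <= length <= max_len          # inclusive bounds are an assumption
                and f.mean_coverage >= min_cov
                and f.frac_reads_at_site >= min_site_frac):
            kept.append(f)
    return kept


def union_intervals(fragments):
    """Merge overlapping or touching fragments into a per-chromosome union."""
    merged = {}
    for f in sorted(fragments, key=lambda f: (f.chrom, f.start, f.end)):
        intervals = merged.setdefault(f.chrom, [])
        if intervals and f.start <= intervals[-1][1]:
            # Extend the previous interval instead of starting a new one.
            intervals[-1][1] = max(intervals[-1][1], f.end)
        else:
            intervals.append([f.start, f.end])
    return {chrom: [tuple(iv) for iv in ivs] for chrom, ivs in merged.items()}


def passing_samples(site_fraction_by_sample, min_frac=0.30):
    """Return sample IDs whose fraction of reads at restriction sites is >= min_frac."""
    return sorted(s for s, frac in site_fraction_by_sample.items() if frac >= min_frac)


# Toy data, invented purely for illustration.
example_fragments = [
    Fragment("1", 100, 600, 5.0, 0.50),    # passes all filters
    Fragment("1", 500, 1000, 3.0, 0.10),   # passes at the thresholds
    Fragment("1", 2000, 2100, 10.0, 0.90), # too short
    Fragment("2", 0, 450, 2.9, 0.50),      # coverage too low
]
hapmap_target = union_intervals(select_fragments(example_fragments, 400, 700))
kept_samples = passing_samples({"sample_a": 0.29, "sample_b": 0.30, "sample_c": 0.80})
```

With these assumed helpers, the HapMap window corresponds to select_fragments(fragments, 400, 700) and the Argentine window to select_fragments(fragments, 200, 600). In practice the interval union would more likely be computed with a dedicated interval tool on BED files; the pure-Python merge is shown only to make the logic explicit.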
