Base calling (deducing the nucleotide sequence from the raw data in Fast5 files) was performed with the Nanopore Guppy application in high-accuracy or super-accuracy mode. The resulting sequences (FastQ files) were filtered and analyzed by a dedicated application, Telomere Analyzer [60]. Because we noticed that the base caller had difficulty distinguishing between the purines (A and G) within telomeric repeats, repeats on the C-strand were identified by searching for the sequence CCCTRR (where R denotes a purine) in all six of its circular permutations. Telomere length and read length were extracted, and each read was presented as a plot of telomere density along its entire length.

Telomere density was calculated in a moving window of 100 nt and describes the fraction of each 100-nt window that consists of telomeric repeats (from 0, no telomeric sequence, to 1, fully telomeric). A telomere is identified when at least three consecutive windows each have a telomere density of at least 0.3 and their summed telomere density is at least 2. The telomere beginning and end are then defined by the first and last telomeric repeat within the identified windows or their flanking windows.