First, we retrieved all RI belonging to element families known to have been recently active (all AluJ, AluS and AluY subfamilies, all LINE-1HS and LINE-1PA subfamilies, and all SVAs) from the reference sequences of the two species (GRCh37-hg19 and panTro5), as annotated in RepBase [36]. The 5′ and 3′ flanking regions (100 bp each) of every retrieved insertion were aligned with blastn (95% identity) to the genome of the other species in order to find the corresponding putative empty (pre-insertional) site. Two matching sequences (at least 85 bp each) lying in close proximity to each other (less than 50 bp apart) were selected as the putative “empty” site for each “filled” site. These putative empty sites were then aligned back to the genome of the first species with blastn (95% identity) in order to confirm them as pre-insertional loci. This procedure yielded the insertions specific to the first species (i.e. absent in the second) and vice versa.
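The flank-pairing step can be illustrated with a minimal Python sketch. It is not the authors' code: the `Hit` record, the function names, the 0-based half-open coordinates, and the choice to treat 95% as a minimum identity are assumptions made for illustration. The sketch also assumes that the two flank hits must lie on the same chromosome and strand in the expected order, and that overlapping hits (as might arise from a target site duplication) count as being in close proximity; the text states none of these conditions explicitly.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One blastn hit of a 100 bp flank in the other species' genome (hypothetical record)."""
    chrom: str
    start: int       # 0-based, inclusive (assumed convention)
    end: int         # 0-based, exclusive (assumed convention)
    strand: str      # "+" or "-"
    identity: float  # percent identity reported by blastn


MIN_HIT_LEN = 85     # "at least 85 bp"
MAX_DISTANCE = 50    # "less than 50 bp"
MIN_IDENTITY = 95.0  # "identity 95%", read here as a minimum (assumption)


def passes_filters(hit: Hit) -> bool:
    """Keep hits that are long enough and similar enough."""
    return (hit.end - hit.start) >= MIN_HIT_LEN and hit.identity >= MIN_IDENTITY


def putative_empty_sites(
    five_prime_hits: List[Hit], three_prime_hits: List[Hit]
) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) spans covered by a compatible pair of flank hits."""
    sites = []
    for h5 in filter(passes_filters, five_prime_hits):
        for h3 in filter(passes_filters, three_prime_hits):
            if h5.chrom != h3.chrom or h5.strand != h3.strand:
                continue
            # On the minus strand the 3' flank lies to the left of the 5' flank.
            left, right = (h5, h3) if h5.strand == "+" else (h3, h5)
            if left.start >= right.start:
                continue  # flanks are in the wrong order
            distance = right.start - left.end  # negative when the hits overlap
            if distance < MAX_DISTANCE:
                sites.append((left.chrom, left.start, right.end))
    return sites
```

Each returned span would then be aligned back to the first species' genome to confirm it as a pre-insertional locus, as described above.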
RT-DB insertions were retrieved from the human reference sequence GRCh37-hg19 and represent all reference insertions of AluS and AluY subfamilies, LINE-1HS, LINE-1PA2, LINE-1PA3, LINE-1PA4 and all SVAs annotated in RepBase [36].
RT-DB Chimp insertions were retrieved from the chimpanzee reference sequence panTro5 and represent all reference insertions of AluS and AluY subfamilies, LINE-1Pt, LINE-1PA2, LINE-1PA3, LINE-1PA4 and all SVAs annotated in RepBase [36].