We found that for many genes, the 3' UTRs annotated in the Ensembl 93 zebrafish reference transcriptome were shorter than the UTR lengths observed empirically in pileups of reads mapped to the genome. As a result, genic reads were counted as intergenic. To correct for this bias when aligning reads to the transcriptome, we extended all 3' UTR annotations by 500 bp. In rare cases, the extension overlapped a neighboring gene; in these instances we manually truncated the extension to avoid the overlap. We built a custom zebrafish STAR genome index using Ensembl 93 gene annotations for GRCz11, filtered for protein-coding genes, with extended 3' UTRs plus manually annotated entries for the mCherry transcript.
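The extension rule described above can be illustrated with a minimal Python sketch. It is not the code used to produce the published annotation. The `Gene` record and the `extend_three_prime` function are hypothetical names introduced here, and the automatic truncation is an assumption that stands in for the manual truncation described in the text. The sketch treats a gene on either strand as a neighbor and does not cap extensions at chromosome ends.

```python
from dataclasses import dataclass, replace
from typing import List

EXTENSION_BP = 500  # extension length stated in the text


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal record; coordinates are 1-based and inclusive, as in GTF.
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def extend_three_prime(genes: List[Gene], extension: int = EXTENSION_BP) -> List[Gene]:
    """Extend each gene's 3' end by `extension` bp, stopping short of any neighbor.

    Assumption: truncation is automatic here, whereas the text describes
    manual truncation of the rare overlapping cases.
    """
    extended = []
    for gene in genes:
        others = [g for g in genes
                  if g.chrom == gene.chrom and g.gene_id != gene.gene_id]
        if gene.strand == "+":
            # On the plus strand the 3' end is the larger coordinate.
            new_end = gene.end + extension
            downstream_starts = [g.start for g in others if g.start > gene.end]
            if downstream_starts:
                new_end = min(new_end, min(downstream_starts) - 1)
            extended.append(replace(gene, end=new_end))
        else:
            # On the minus strand the 3' end is the smaller coordinate.
            new_start = max(1, gene.start - extension)
            upstream_ends = [g.end for g in others if g.end < gene.start]
            if upstream_ends:
                new_start = max(new_start, max(upstream_ends) + 1)
            extended.append(replace(gene, start=new_start))
    return extended
```

In this sketch, a plus-strand gene ending at position 2000 with a neighbor starting at 2300 would be extended only to 2299, while a gene with no neighbor within 500 bp receives the full extension.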
The custom annotation (GTF) file used for this index can be accessed here:
http://os.bio-protocol.org/attached/file/20210405/GRCz11.Ensembl.93.protein.coding.transcripts.3p.UTR.extended.500.bp.plus.mCherry.gtf.gz