In Press, Published: Nov 06, 2025 DOI: 10.21769/BioProtoc.5523
Reviewed by: Madhumala K. Sadanandappa, Anonymous reviewer(s)
Abstract
Insects rely on chemosensory proteins, including gustatory receptors, to detect chemical cues that regulate feeding, mating, and oviposition behaviours. Conventional approaches for studying these proteins are limited by the scarcity of experimentally resolved structures, especially in non-model pest species. Here, we present a reproducible computational protocol for the identification, functional annotation, and structural modelling of insect chemosensory proteins, demonstrated using gustatory receptors from the red palm weevil (Rhynchophorus ferrugineus) as an example. The protocol integrates publicly available sequence data with OmicsBox for functional annotation and ColabFold for high-confidence structure prediction, providing a step-by-step framework that can be applied to genome-derived or transcriptomic datasets. The workflow is designed for broad applicability across insect species and generates structurally reliable protein models suitable for downstream applications such as ligand docking or molecular dynamics simulations. By bridging functional annotation with structural characterisation, this protocol enables reproducible studies of chemosensory proteins in agricultural and ecological contexts and supports the development of novel pest management strategies.
Key features
• Designed for insect chemosensory research, demonstrated using gustatory receptors from the red palm weevil (Rhynchophorus ferrugineus) as a representative pest species.
• Combines OmicsBox for functional annotation with ColabFold for reproducible, high-confidence protein structure prediction in a streamlined workflow.
• Accepts input from genome assemblies, transcriptomic datasets, or curated sequence databases, enabling broad application across model and non-model insects.
• Produces reliable structural models suitable for downstream studies, including ligand screening, molecular dynamics simulations, and comparative evolutionary analyses.
Keywords: Gustatory receptors
Graphical overview
Computational workflow for identification, annotation, and structural characterisation of insect gustatory proteins. Databases and software are shown in blue.
Background
Rhynchophorus ferrugineus, commonly known as the red palm weevil (RPW), is a highly invasive pest that poses a serious threat to palm species globally, particularly date, coconut, and oil palms. Native to South and Southeast Asia, RPW has rapidly expanded across tropical and subtropical regions, including the Middle East, causing substantial agricultural and economic damage [1,2]. Its concealed infestation behaviour and the difficulty of early detection contribute to its ecological impact [3]. Despite extensive management efforts, control remains challenging due to its complex life cycle, adaptability, and increasing resistance to conventional methods. Current strategies, including chemical insecticides and pheromone traps [4], have shown limited success, hindered by the development of resistance and inefficient monitoring systems [5]. These limitations underscore the need for molecular-level approaches to disrupt pest behaviour. Targeting insect chemosensory systems offers a promising alternative, as these systems are essential for survival and reproduction. Among chemosensory proteins, gustatory receptors (GRs), which mediate responses to non-volatile compounds involved in feeding and mating, are especially relevant. However, GRs in RPW remain poorly characterised, and no experimentally resolved structures are currently available [6,7], which complicates molecular intervention efforts.
To address this gap, we present an integrated computational pipeline that combines OmicsBox-based sequence annotation with high-accuracy structural prediction. Functional annotation performed with OmicsBox [8] enables the reliable identification of candidate GRs, while structure prediction with LocalColabFold [9] generates three-dimensional models enriched with confidence metrics such as pLDDT (predicted local distance difference test) and pTM (predicted template modelling) scores and PAE (predicted alignment error). This dual-layered approach provides not only accurate identification of GRs but also structural insights, including the prediction of potential ligand-binding sites. Such integrative information is critical for advancing molecular studies of insect chemosensation and supports downstream applications such as protein-ligand docking and virtual screening. Previous bioinformatics efforts have generally applied these tools independently, depending on the scope of analysis. OmicsBox has been extensively used for transcriptome and genome annotation in insects, such as the cabbage webworm (Hellula undalis) [10] and the boll weevil (Anthonomus grandis grandis) [11], but these studies were limited to sequence-level analyses. Conversely, ColabFold has gained attention for protein modelling [12], though its applications often depend on pre-annotated datasets without integrated annotation workflows. By combining both strategies into a single reproducible pipeline, our protocol overcomes these limitations and enables a more comprehensive characterisation of GR proteins, particularly in non-model pest species.
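As an illustration of how the confidence metrics mentioned above can be inspected, the following minimal Python sketch condenses a ColabFold-style scores dictionary into a few summary numbers. It assumes the per-model scores JSON exposes the keys plddt (one value per residue), ptm, and pae (an N x N matrix), as ColabFold 1.5.x output typically does. The function names and the pLDDT cutoff of 70 are illustrative assumptions (70 is a widely used convention for "confident" residues), not values prescribed by this protocol.

```python
import json
from pathlib import Path
from statistics import mean


def summarise_scores(scores: dict, plddt_cutoff: float = 70.0) -> dict:
    """Condense per-residue confidence values into a few summary numbers.

    `scores` is assumed to follow the ColabFold scores JSON layout:
    "plddt" (list of floats, 0-100), optional "ptm" (float), and
    optional "pae" (N x N list of lists, in angstroms).
    """
    plddt = scores["plddt"]
    pae = scores.get("pae")
    summary = {
        "n_residues": len(plddt),
        "mean_plddt": round(mean(plddt), 2),
        # Fraction of residues at or above the (assumed) confidence cutoff.
        "frac_confident": round(sum(v >= plddt_cutoff for v in plddt) / len(plddt), 2),
        "ptm": scores.get("ptm"),
    }
    if pae:
        # Average over every residue pair in the PAE matrix.
        summary["mean_pae"] = round(mean(v for row in pae for v in row), 2)
    return summary


def summarise_file(path: str) -> dict:
    """Load one scores JSON file from disk and summarise it."""
    return summarise_scores(json.loads(Path(path).read_text()))


if __name__ == "__main__":
    # Toy three-residue example; real files hold one pLDDT value per residue.
    toy = {
        "plddt": [90.0, 80.0, 60.0],
        "ptm": 0.81,
        "pae": [[0.0, 2.0, 4.0], [2.0, 0.0, 2.0], [4.0, 2.0, 0.0]],
    }
    print(summarise_scores(toy))
```

A summary of this kind can help rank candidate GR models before docking, although per-residue pLDDT and the full PAE matrix should still be examined for the regions of interest.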
This workflow was validated through comparison of RPW GRs with proteins from the well-characterised fruit fly (Drosophila melanogaster), confirming its accuracy and reproducibility. This benchmarking demonstrates the pipeline’s ability to generate consistent, biologically meaningful results and highlights its applicability across insect species. Unlike previous studies that addressed annotation or structural modelling separately, our protocol integrates both, providing a holistic characterisation of GR proteins. Its reproducibility, adaptability, and biological relevance make it a reusable framework for advancing insect chemosensory research and supporting innovative pest management strategies.
Software and datasets
1. Data (NCBI, May 2025, public domain, https://www.ncbi.nlm.nih.gov/protein/, free)
2. Data (UniProt, May 2025, CC BY 4.0, https://www.uniprot.org/, free)
3. Script/Code (Custom Python Scripts, May 2025, user-defined, https://github.com/norazlannm/Protein-structure-annotation-and-3D-model-generation/, free)
4. Module [Biopython (Python module), V1.85 (via pip), May 2025, Biopython License, https://biopython.org/wiki/Download, free]
5. Software 1 (OmicsBox, V3.4, June 2025, commercial, https://www.biobam.com/download-omicsbox/, 7-day free trial; paid subscription required thereafter)
6. Software 2 (LocalColabFold, V1.5.5, June 2025, MIT License, https://github.com/YoshitakaMo/localcolabfold, free)
7. Workflow manager
a. Desktop/laptop system
b. AMD Ryzen 7 7700X, 8-core CPU
c. 64 GB DDR5 RAM, 2 TB SSD
d. NVIDIA RTX 4070 Ti GPU
Software configuration
LocalColabFold was installed using the standard installation instructions from the official GitHub page (https://github.com/YoshitakaMo/localcolabfold), without any special configuration files.
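Once installed, LocalColabFold exposes the colabfold_batch command, which takes an input FASTA file and an output directory. The sketch below shows one way to call it from Python. The file names, the wrapper function names, and the chosen option values are hypothetical; the options actually used in this protocol are not stated in this section, so the defaults here should be treated as placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def build_colabfold_command(fasta: str, out_dir: str,
                            num_recycle: int = 3,
                            use_amber: bool = False) -> list:
    """Assemble a colabfold_batch command line as a list of arguments.

    Only the --num-recycle and --amber options are used; the values are
    illustrative assumptions, not settings taken from the protocol.
    """
    cmd = ["colabfold_batch", "--num-recycle", str(num_recycle)]
    if use_amber:
        cmd.append("--amber")  # optional relaxation of the predicted models
    cmd += [fasta, out_dir]
    return cmd


def run_colabfold(fasta: str, out_dir: str, **kwargs) -> subprocess.CompletedProcess:
    """Run colabfold_batch if it is available on PATH."""
    if shutil.which("colabfold_batch") is None:
        raise RuntimeError(
            "colabfold_batch not found on PATH; check the LocalColabFold installation"
        )
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_colabfold_command(fasta, out_dir, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical input and output names; printing avoids launching a long job.
    print(" ".join(build_colabfold_command("rpw_grs.fasta", "rpw_gr_models")))
```

Building the command separately from running it makes the exact invocation easy to log alongside the results, which supports reproducibility.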
Software dependencies
• LocalColabFold: Requires Python version 3.8 or higher, CUDA version 11.1 or higher, and GCC version 9.0 or higher.
• Custom Python scripts: Depend on the Biopython library.
• Biopython: Requires Python 3.7 or higher.
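The dependencies listed above can be checked before starting an analysis. The following is a minimal sketch using only the Python standard library; it verifies the interpreter version and whether Biopython is importable, and it only tests whether the nvcc and gcc executables are present on PATH (it does not parse their version numbers). The function names are hypothetical.

```python
import importlib.util
import shutil
import sys


def version_at_least(found: tuple, required: tuple) -> bool:
    """Compare two version tuples such as (3, 10) and (3, 8)."""
    return tuple(found) >= tuple(required)


def check_dependencies() -> dict:
    """Return a mapping of dependency description -> True/False."""
    return {
        # LocalColabFold requires Python 3.8 or higher.
        "python>=3.8": version_at_least(sys.version_info[:2], (3, 8)),
        # The custom scripts depend on Biopython (imported as "Bio").
        "biopython importable": importlib.util.find_spec("Bio") is not None,
        # Presence checks only; CUDA >= 11.1 and GCC >= 9.0 must be
        # confirmed manually, e.g. with `nvcc --version` and `gcc --version`.
        "nvcc on PATH": shutil.which("nvcc") is not None,
        "gcc on PATH": shutil.which("gcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```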
System requirements
The following hardware requirements are suggested for computational implementation to improve reproducibility:
1. Processor: Multi-core CPU; minimum: ≥4 cores, base clock speed ≥2.5 GHz; recommended: ≥8 cores, base clock speed ≥3.0 GHz.
2. Memory: 8 GB RAM (minimum), 16 GB RAM (recommended).
Note: 16 GB RAM is sufficient for OmicsBox analysis and for modelling small proteins; large-scale protein modelling with LocalColabFold requires more than 16 GB RAM to avoid memory limitations.
3. Storage: ≥200 GB available disk space for databases (e.g., AlphaFold database ~100 GB), intermediate files (~50 GB, may vary depending on project size), and output files (~50 GB, depends on number and size of the models).
4. Graphics: NVIDIA GPU with CUDA support (≥8 GB VRAM) is recommended for efficient LocalColabFold execution (e.g., RTX 3060 or higher), but LocalColabFold can also run without a GPU, albeit more slowly. For single-protein modelling, ColabFold v1.5.5 notebook mode can be used without a dedicated GPU (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb).
Note: While these specifications are sufficient for most analyses, a higher-performance workstation (≥8 cores, ≥64 GB RAM, ≥2 TB SSD storage, and GPU) is strongly recommended for large datasets and multiple protein modelling tasks to reduce runtime and improve computational efficiency significantly.
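The thresholds above can be compared against the current machine with a short script. The sketch below is an assumption-laden illustration: it uses only the standard library, reports logical rather than physical cores, reads installed RAM through os.sysconf (available on Linux and macOS but not on native Windows, where the value is reported as unknown), and does not check clock speed or GPU memory. All names are hypothetical.

```python
import os
import shutil

# Thresholds taken from the system requirements listed above. For storage
# the text gives a single figure (>= 200 GB), so both levels are identical.
MINIMUM = {"cpu_cores": 4, "ram_gb": 8, "free_disk_gb": 200}
RECOMMENDED = {"cpu_cores": 8, "ram_gb": 16, "free_disk_gb": 200}


def classify(measured: dict) -> dict:
    """Label each measured value as recommended, minimum, below minimum, or unknown."""
    report = {}
    for key, value in measured.items():
        if value is None:
            report[key] = "unknown"
        elif value >= RECOMMENDED[key]:
            report[key] = "recommended"
        elif value >= MINIMUM[key]:
            report[key] = "minimum"
        else:
            report[key] = "below minimum"
    return report


def measure(path: str = ".") -> dict:
    """Collect core count, installed RAM, and free disk space for `path`."""
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        ram_gb = None  # e.g. native Windows, where os.sysconf is unavailable
    return {
        "cpu_cores": os.cpu_count(),  # logical cores; may be None
        "ram_gb": ram_gb,
        "free_disk_gb": shutil.disk_usage(path).free / 1024**3,
    }


if __name__ == "__main__":
    for item, status in classify(measure()).items():
        print(f"{item}: {status}")
```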
Software/OS requirements
• Operating system: Linux (Ubuntu 20.04+ recommended), Windows 10/11 (for LocalColabFold on WSL2), or macOS version 10.15 or later.
• Python: Version 3.8 or later.
• CUDA Toolkit: (for GPU users) Version 11.1 or later.
• Additional libraries: Biopython version 1.85.
Procedure
Article information
Manuscript history
Submitted: Aug 25, 2025
Accepted: Oct 14, 2025
Published online: Nov 6, 2025
Copyright
© 2025 The Author(s); This is an open access article under the CC BY-NC license (https://creativecommons.org/licenses/by-nc/4.0/).
How to cite
Kalepu, R., Hamid, A. A. A., Hassan, M., Mohd-Assaad, N. and Muhammad, N. A. N. (2025). A Step-by-Step Computational Protocol for Functional Annotation and Structural Modelling of Insect Chemosensory Proteins. Bio-protocol 15(22): e5523. DOI: 10.21769/BioProtoc.5523.
Categories
Bioinformatics and Computational Biology
Biochemistry > Protein > Structure
Environmental Biology > Plant > Plant-insect interaction