Annotate constituent domains
This protocol is extracted from the research article:
Building blocks and blueprints for bacterial autolysins
PLoS Comput Biol, Apr 1, 2021; DOI: 10.1371/journal.pcbi.1008889

Possible domains within each sequence are annotated with command-line RPS-BLAST (E-value cutoff 0.01) [79] against the Pfam profile database [47], as supplied with NCBI’s Conserved Domain Database (CDD) [48]. The most significant non-overlapping domains within each sequence are then identified with the CD-Search post-processing utility rpsbproc (E-value cutoff 0.01).

An annotation need not identify a portion of a sequence as an instance of a specific type of domain; it may indicate only that the portion belongs to a domain superfamily, which is generally defined as a set of sequence-similar domains that are assumed or known to be functionally related. rpsbproc distinguishes these two levels of annotation by how well a sequence matches a domain profile: a specific domain type is assigned when the match meets a profile-specific threshold, whereas the more general superfamily type is assigned when the sequence matches the profile but does not meet that threshold. For simplicity, we call either form of annotation a “domain type” and append an apostrophe to the name of a superfamily annotation. For example, NLPC_P60 and NLPC_P60’ both indicate that the sequence matches the NLPC_P60 domain profile: the former meets the threshold that more definitively identifies it as that type of domain, while the latter suggests it could be an instance of that domain or of something closely related. Domain types are stored as individual entries, and each protein is linked to its constituent domain types by associations that record the start and stop residues of each domain type within the protein.
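The annotation and labeling steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors’ code: it assumes RPS-BLAST/rpsbproc hits have already been parsed into `Hit` records (a hypothetical structure; the real rpsbproc output format is not reproduced here), and `select_non_overlapping` is a simple greedy rule (lowest E-value first) that stands in for rpsbproc’s own selection, which may differ. An ASCII apostrophe marks superfamily-level domain types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One profile match within a protein (hypothetical, illustrative fields)."""
    name: str        # domain profile name, e.g. "NLPC_P60"
    start: int       # first residue of the match (1-based, inclusive)
    stop: int        # last residue of the match (inclusive)
    evalue: float
    specific: bool   # True if the profile-specific threshold was met


def domain_type(hit):
    """Name a hit; superfamily-level hits get a trailing apostrophe."""
    return hit.name if hit.specific else hit.name + "'"


def overlaps(a, b):
    """True if two inclusive residue ranges share at least one residue."""
    return a.start <= b.stop and b.start <= a.stop


def select_non_overlapping(hits, cutoff=0.01):
    """Greedy stand-in (assumption) for rpsbproc's hit selection:
    keep hits passing the E-value cutoff, most significant first,
    skipping any hit that overlaps one already kept."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue <= cutoff and not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


def annotate(protein_id, hits, cutoff=0.01):
    """Return (protein, domain type, start, stop) association records,
    ordered by position along the protein."""
    kept = select_non_overlapping(hits, cutoff)
    return [(protein_id, domain_type(h), h.start, h.stop)
            for h in sorted(kept, key=lambda h: h.start)]
```

For instance, given hits NLPC_P60 (residues 200 to 310, E = 1e-30, specific), a hypothetical DomX (250 to 330, E = 1e-5, superfamily only), DomY (20 to 90, E = 1e-4, superfamily only) and DomZ (100 to 150, E = 0.5), `annotate("prot1", hits)` keeps NLPC_P60, drops DomX for overlapping it and DomZ for failing the cutoff, and returns `[("prot1", "DomY'", 20, 90), ("prot1", "NLPC_P60", 200, 310)]`.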

Domain types themselves are classified as CAT (catalytic), CWB (cell wall binding), or Other using both InterPro’s GO terms [80] for the domain types and manual curation based on CDD’s functional descriptions. In addition, a domain type is inferred to be CAT or CWB if another member of the same superfamily is annotated as such; likewise, a superfamily is inferred to be CAT or CWB if one of its domains is known to be. CAT domain types are further classified by catalytic target (MurNAc-LAA, N-acetylglucosaminidase, N-acetylmuramidase, Peptidase, or Unknown catalytic) according to [21]. S1 Table summarizes the resulting annotations used in the presented results.
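The superfamily-based inference can be sketched as a small propagation over a map from domain types to superfamilies. The function and argument names below are hypothetical, and two details are assumptions not stated in the text: inherited labels only fill in domain types that lack a direct CAT/CWB label, and a superfamily whose members carry both CAT and CWB labels passes both on rather than resolving the conflict.

```python
from collections import defaultdict

# Categories propagated through superfamily membership; "Other" is the fallback.
PROPAGATED = {"CAT", "CWB"}


def propagate_categories(direct, superfamily_of):
    """Infer CAT/CWB categories through shared superfamily membership.

    direct: domain type -> set of categories assigned directly
            (e.g., from GO terms or manual curation).
    superfamily_of: domain type -> superfamily identifier.
    Returns (domain type -> categories, superfamily -> categories).
    """
    # A superfamily is CAT/CWB if any of its member domain types is.
    family_labels = defaultdict(set)
    for dtype, sf in superfamily_of.items():
        family_labels[sf] |= direct.get(dtype, set()) & PROPAGATED

    by_type = {}
    for dtype in set(direct) | set(superfamily_of):
        labels = direct.get(dtype, set()) & PROPAGATED
        sf = superfamily_of.get(dtype)
        # Assumption: inherit only when no direct CAT/CWB label exists.
        if not labels and sf is not None:
            labels = set(family_labels[sf])
        by_type[dtype] = labels or {"Other"}

    by_family = {sf: (labels or {"Other"}) for sf, labels in family_labels.items()}
    return by_type, by_family
```

Keeping categories as sets leaves any CAT/CWB conflict within a superfamily visible for manual curation instead of silently picking one label.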
