S, is found in only one of the D. capensis proteases, DCAP_3370 in the DCAP cluster. For all previously uncharacterized sequences, SignalP 4.1 [25] was used to predict the location of the signal sequences, if any, while the pro-sequences were predicted by sequence similarity and structural homology to papain. These sequence annotations were then used as the basis for further structure prediction and functional analysis. In addition to the common sequence features in the N-terminal pro-region, other variations are observed, such as the presence of C-terminal granulin domains in some sequences and extra insertions that may be responsible for specific activities in others. Examples of organelle-specific targeting sequences are also observed: several sequences have a C-terminal KDEL sequence targeting them for retention in the endoplasmic reticulum, while others carry signals indicating transport to the vacuole (NPIR, but not FAEAI or LVAE) or the peroxisome (SSM at the C-terminus).

The level of sequence conservation among the members of each cluster varies dramatically, as can be seen in Fig. S8, where sequence conservation is mapped onto the structure of a representative member of each cluster. The sequences in the DCAP cluster are less closely related to each other than the members of any of the other clusters, and some are homologous to reference sequences used by Richau et al. [44]. DCAP_2263 and DCAP_7862 belong to the Richau aleurain (cathepsin H) cluster. In humans, cathepsin H is an aminopeptidase that processes neuropeptides in the brain [55], as well as acting as a lysosomal protein in other tissues. Its barley (Hordeum vulgare) homolog, aleurain, has both aminopeptidase and endopeptidase activity [56], suggesting that DCAP_2263 and DCAP_7862 may have both types of activity as well.
This hypothesis is supported by the presence of the cathepsin H minichain sequence in its plant orthologs, as discussed in the section devoted to these proteins. DCAP_3370 is related to the Richau RD19 (cathepsin F) cluster, and is the only protease in this set that contains the characteristic pro-sequence motif (EX3RX3FX2NX3AX3Q) of the RD19 (cathepsin F) family. Human cathepsin F is distinguished by its unusually long pro-domain, which is approximately 100 residues longer than that of other cysteine proteases and adopts a cystatin fold [57]. In contrast, the pro-sequence of DCAP_3370 is about 140 residues, typical for a plant cysteine protease. The last enzyme in the DCAP cluster, DCAP_5561, is not closely related to anything in either reference set. A BLAST search yields numerous matches to uncharacterized predicted cysteine proteases from a variety of plant genomes; however, the specific function of this enzyme remains enigmatic.

3.3. Molecular Modeling Predicts Many Variations on the Papain Structural Theme

Carnivorous plants require a variety of proteases with different substrate affinities and cleavage sites to effectively digest the proteins from their prey, in addition to the standard spectrum of protease activities required by all plants. Cysteine protease activity has previously been inferred from biochemical activity assays of the digestive fluids of D. indica [58], and dionain 1 from D. muscipula has been structurally and biochemically characterized [21]. However, with the exception of the nepenthesins and dionain 1, these enzymes have yet to be extensively investigated.
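The sequence features used for annotation earlier in this section (the C-terminal KDEL and SSM signals, the vacuolar NPIR signal, and the RD19 pro-sequence motif EX3RX3FX2NX3AX3Q) can be illustrated with a simple pattern scan. The following is a minimal sketch and not the procedure used in this work: the function name `annotate`, the dictionary keys, and the toy sequence are hypothetical, and the NPIR check is simplified to a plain substring search with no positional constraint.

```python
import re

# Pro-sequence motif E-X3-R-X3-F-X2-N-X3-A-X3-Q, where Xn stands for
# n residues of any type. Written here as a regular expression.
RD19_PRO_MOTIF = re.compile(r"E.{3}R.{3}F.{2}N.{3}A.{3}Q")


def annotate(seq: str) -> dict:
    """Flag simple targeting signals and the RD19 motif in a protein sequence.

    Illustrative only: real annotation would also consider where in the
    sequence each signal occurs and would be checked against homologs.
    """
    seq = seq.upper()
    match = RD19_PRO_MOTIF.search(seq)
    return {
        # C-terminal KDEL: retention in the endoplasmic reticulum
        "er_retention_kdel": seq.endswith("KDEL"),
        # C-terminal SSM: peroxisomal targeting
        "peroxisome_ssm": seq.endswith("SSM"),
        # NPIR anywhere in the sequence (simplifying assumption)
        "vacuolar_npir": "NPIR" in seq,
        # 0-based start of the first motif match, or None if absent
        "rd19_pro_motif_start": match.start() if match else None,
    }


if __name__ == "__main__":
    # Synthetic toy sequence, not a D. capensis protein.
    toy = "MS" + "EGGGRGGGFGGNGGGAGGGQ" + "NPIR" + "GG" + "KDEL"
    print(annotate(toy))
```

A scan of this kind only proposes candidate features; assignments such as those described above still rely on signal peptide prediction and on comparison with characterized homologs.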