Predicting therapeutic proteins is crucial to accelerating the development of better medications, so it is important to assess the therapeutic potential of isolated proteins as early as possible. ThpPred is a tool that screens proteins and identifies whether a protein is therapeutic by combining motif search with a random forest (RF) machine learning model. Machine learning alone cannot explain why a protein has a therapeutic effect; therefore, we combined motif-based analysis with the ML approach. In the current study, we have developed a comprehensive platform that allows users to classify proteins as therapeutic or non-therapeutic.
The prediction server for therapeutic proteins has been designed to be user-friendly. This page describes the algorithms and procedures used in the different modules.
Main Dataset: It comprises 356 FDA-approved therapeutic proteins and 356 random non-therapeutic proteins.
Alternate Dataset: It comprises 356 FDA-approved therapeutic proteins and 3560 random non-therapeutic proteins.
The three major modules incorporated in this web server are:
The "Predict" module allows users to classify proteins as therapeutic or non-therapeutic. Users can submit multiple sequences, and the server predicts whether each sequence is therapeutic or non-therapeutic. In this study, we used various machine learning techniques to develop the prediction models.
The "Design" module generates all possible analogs/mutants of the input sequence and identifies the best analog, that is, the one predicted to act as a therapeutic protein. Using various machine learning techniques, it predicts whether each mutant is therapeutic or non-therapeutic.
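As an illustration of how mutant generation might work, the sketch below enumerates every single-residue substitution of an input sequence. It assumes that "all possible analogs/mutants" means single-point substitutions over the 20 standard amino acids; the function name `single_point_mutants` and the mutation labels (such as `A1C`) are hypothetical and are not taken from the server.

```python
# Assumption: mutants are single-point substitutions over the
# 20 standard amino acids; the server may define analogs differently.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_mutants(sequence):
    """Yield (label, mutant) for every single-residue substitution.

    The label follows the common 'original, 1-based position, new residue'
    convention, e.g. 'A1C' replaces A at position 1 with C.
    """
    sequence = sequence.upper()
    for i, original in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue == original:
                continue  # skip the unchanged sequence
            mutant = sequence[:i] + residue + sequence[i + 1:]
            yield f"{original}{i + 1}{residue}", mutant


if __name__ == "__main__":
    mutants = dict(single_point_mutants("ACD"))
    # 3 positions x 19 alternative residues
    print(len(mutants))
```

Each generated mutant could then be passed to a classifier, and the mutants ranked by their predicted score to pick the best analog. A sequence of length n yields 19n mutants under this assumption.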
The "Motif Scan" module searches for motifs in the query sequence and predicts therapeutic proteins based on motifs alone; no machine learning model is applied. The query sequence is searched against the list of positive motifs. It is predicted as therapeutic if it matches at least one motif; otherwise, it is predicted as non-therapeutic.
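The decision rule of the motif scan can be sketched as follows. This is a minimal sketch that assumes motifs are plain contiguous substrings; the server's actual motif format and motif list are not described here, so the function name `motif_scan` and the example motifs are placeholders.

```python
def motif_scan(sequence, positive_motifs):
    """Classify a sequence by motif hits alone (no ML model).

    Assumption: each motif is a plain contiguous substring.
    Returns the predicted label and the list of matched motifs.
    """
    sequence = sequence.upper()
    hits = [motif for motif in positive_motifs if motif.upper() in sequence]
    label = "therapeutic" if hits else "non-therapeutic"
    return label, hits


if __name__ == "__main__":
    # Placeholder motifs for illustration only; not from the real motif list.
    example_motifs = ["KRK", "WWW"]
    print(motif_scan("MAKRKL", example_motifs))
    print(motif_scan("MAAAL", example_motifs))
```

Returning the matched motifs along with the label keeps the prediction explainable, since the user can see which motif triggered the call.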