PVP-SVM was developed in four steps: (i) construction of the training and independent datasets; (ii) extraction of features from the primary sequences, including amino acid composition, atomic composition, chain-transition-composition, dipeptide composition, and physicochemical properties; (iii) generation of 25 feature sets based on feature importance scores (FIS) computed with the random forest (RF) algorithm, each of which was used as input to a support vector machine (SVM) to train a separate prediction model; and (iv) selection of the model with the highest Matthews correlation coefficient (MCC) as the final model, with its feature set taken as the optimal feature set.
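As a rough illustration of two pieces of this workflow, the sketch below computes amino acid composition and dipeptide composition from a primary sequence (step ii) and evaluates the MCC from confusion-matrix counts (the selection criterion in step iv). It is a minimal sketch, not the PVP-SVM implementation: the function names, the handling of non-standard residues, and the convention of returning 0 for an undefined MCC are all assumptions made for this example, and the RF-based feature ranking and SVM training are not shown.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 values: the fraction of each standard residue in seq.

    Assumption: non-standard letters count toward the length but
    contribute to no feature, so the values may sum to less than 1.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 values: the fraction of each adjacent residue pair."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    # Keys are created in product order: AA, AC, AD, ..., YY.
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = len(seq) - 1  # number of overlapping pairs
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
    return [count / total for count in counts.values()]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Assumption: returns 0.0 when the denominator is zero, a common
    convention for the undefined case.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    example = "ACAC"  # toy sequence, not a real protein
    print(amino_acid_composition(example)[:2])
    print(len(dipeptide_composition(example)))
    print(mcc(tp=50, tn=50, fp=0, fn=0))
```

In a full pipeline, vectors like these would be concatenated with the other feature types, ranked by importance, and the candidate models compared by their MCC.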
Reference
PVP-SVM: sequence-based prediction of bacteriophage virion proteins using a support vector machine (Frontiers in Microbiology). [Please cite this paper if you find PVP-SVM useful in your research]