We proposed STALLION (STacking-based Predictor for ProkAryotic Lysine AcetyLatION), a novel predictor comprising six species-specific models for accurately identifying lysine acetylation (Kace) sites in prokaryotes. To capture informative patterns around Kace sites, we employed eleven different encodings representing three different characteristics. A systematic and rigorous feature selection approach was then used to identify the optimal feature set independently for each of five tree-based ensemble algorithms, and a baseline model was built with each algorithm for every species. Finally, the predicted values of the baseline models were used as input features to train an appropriate classifier under a stacking strategy, yielding STALLION.
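The stacking step described above can be illustrated with a minimal sketch. It is not the STALLION implementation: the five scikit-learn estimators, the logistic-regression meta-classifier, the randomly chosen `feature_sets`, and the synthetic data are all assumptions made for illustration, since the text does not name the algorithms or the final classifier. Out-of-fold probabilities are used here as the baseline models' predicted values, which is one common way to avoid leaking training labels into the second stage.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for one species' encoded Kace / non-Kace windows.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hypothetical choice of five tree-based ensembles (not named in the text).
models = {
    "rf": RandomForestClassifier(n_estimators=25, random_state=0),
    "et": ExtraTreesClassifier(n_estimators=25, random_state=0),
    "gb": GradientBoostingClassifier(n_estimators=25, random_state=0),
    "ada": AdaBoostClassifier(n_estimators=25, random_state=0),
    "bag": BaggingClassifier(n_estimators=25, random_state=0),
}

# Each algorithm gets its own feature subset; here the subsets are random
# placeholders standing in for the result of a feature selection step.
rng = np.random.default_rng(0)
feature_sets = {
    name: np.sort(rng.choice(X.shape[1], size=12, replace=False))
    for name in models
}

# Stage 1: out-of-fold predicted probabilities from every baseline model.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
columns = []
for name, model in models.items():
    proba = cross_val_predict(
        model, X[:, feature_sets[name]], y, cv=cv, method="predict_proba"
    )
    columns.append(proba[:, 1])  # probability of the positive class
meta_features = np.column_stack(columns)

# Stage 2: train the meta-classifier on the baseline predictions.
stacked = LogisticRegression().fit(meta_features, y)
```

In this sketch one such two-stage model would be trained per species, giving six stacked models in total.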
Example protein sequence fragments
>Positive|Q8NTW9|K58
GKSLDIIIPEKHRKAHWDGWD
>Positive|Q8NNA0|K105
FITADNKAIVKYFRKLESGQN
>Positive|Q8NMV1|K114
IAGKSQDEINKRVDEAAATLG
>Negative|P66326|K5
OOOOOOMAGQKIRIRLKAYDH
>Negative|P66326|K24
DHEAIDASARKIVETVTRTGA
>Negative|P66326|K46
VVGPVPLPTEKNVYAVIRSPH
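A short sketch of how fragments in this layout could be read and turned into numeric features. The header fields are taken to be label, accession, and lysine position, and `O` is assumed to be a padding symbol for windows that run past the protein terminus (consistent with the K5 fragment above, but not stated explicitly). The one-hot encoding shown is only an illustrative example of a sequence encoding; the function names `parse_fragments` and `one_hot` are hypothetical, and the eleven encodings used by STALLION are not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def parse_fragments(text):
    """Parse '>Label|Accession|Kpos' headers followed by a window sequence."""
    records = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split("|")
        elif header is not None:
            label, accession, site = header
            records.append({
                "label": 1 if label == "Positive" else 0,
                "accession": accession,
                "position": int(site[1:]),  # strip the leading 'K'
                "window": line,
            })
            header = None
    return records

def one_hot(window):
    """Binary encoding: 20 bits per residue; padding 'O' maps to all zeros."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector

example = """
>Positive|Q8NTW9|K58
GKSLDIIIPEKHRKAHWDGWD
>Negative|P66326|K5
OOOOOOMAGQKIRIRLKAYDH
"""

records = parse_fragments(example)
features = [one_hot(r["window"]) for r in records]
```

Each 21-residue window yields a 420-dimensional vector, with the central lysine at index 10 of the window.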