Deciphering apicoplast targeting signals--feature extraction from nuclear-encoded precursors of Plasmodium falciparum apicoplast proteins
journal contribution
posted on 2023-06-08, 18:23, authored by Jochen Zuegge, Stuart Ralph, Michael Schmuker, Geoffrey I McFadden, Gisbert Schneider
The malaria-causing protozoan Plasmodium falciparum contains a vestigial, non-photosynthetic plastid, the apicoplast. Numerous proteins encoded by nuclear genes are targeted to the apicoplast courtesy of N-terminal extensions. With the impending sequence completion of an entire genome of the malaria parasite, it is important to have software tools in place for prediction of subcellular locations for all proteins. Apicoplast targeting signals are bipartite, containing a signal peptide and a transit peptide. Nuclear-encoded apicoplast protein precursors were analyzed for characteristic features by statistical methods, principal component analysis, self-organizing maps, and supervised neural networks. The transit peptide carries a net positive charge and is rich in asparagine, lysine, and isoleucine residues. A novel prediction system (PATS, predict apicoplast-targeted sequences) was developed based on various sequence features, yielding a Matthews correlation coefficient of 0.91 (97% correct predictions) in a 40-fold cross-validation study. This system predicted 22% of the 205 potential proteins on P. falciparum chromosome 2, and 21% of the 243 chromosome 3 proteins, to be apicoplast proteins. A combination of the PATS results with a signal peptide prediction yields 15% potentially nuclear-encoded apicoplast proteins on chromosomes 2 and 3. The prediction tool will advance P. falciparum genome analysis, and it might help to identify apicoplast proteins as drug targets for the development of novel anti-malaria agents.
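As a rough illustration of the kind of sequence features and evaluation metric mentioned in the abstract, the following Python sketch computes a simple net charge and the asparagine, lysine, and isoleucine fractions of a peptide, together with the Matthews correlation coefficient from a confusion matrix. It is not the PATS implementation: the function names, the charge rule (K and R counted as +1, D and E as -1, histidine ignored), and the example peptide are all assumptions made for illustration.

```python
import math
from collections import Counter


def transit_peptide_features(seq: str) -> dict:
    """Return simple composition features for a peptide sequence.

    Assumption: net charge is approximated as (K + R) - (D + E);
    histidine and terminal groups are ignored.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    return {
        "net_charge": (counts["K"] + counts["R"]) - (counts["D"] + counts["E"]),
        "frac_N": counts["N"] / n,  # asparagine fraction
        "frac_K": counts["K"] / n,  # lysine fraction
        "frac_I": counts["I"] / n,  # isoleucine fraction
    }


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical peptide, not taken from any real P. falciparum protein.
    print(transit_peptide_features("NKNKIIKD"))
    # Hypothetical confusion-matrix counts, for demonstration only.
    print(matthews_cc(tp=5, tn=5, fp=0, fn=0))
```

In practice such features would be computed over a window following the predicted signal peptide cleavage site and fed to a classifier; the window choice and the classifier are left out of this sketch.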