How good are protein disorder prediction programmes actually?
Until now, this question was difficult to answer because there was no good benchmark for testing these bioinformatics programmes. AU scientists Dr. Jakob T. Nielsen and Dr. Frans A.A. Mulder now present an analysis in Scientific Reports that uses a comprehensive compilation of experimental data from NMR spectroscopy.
Disorder in proteins is vital for biological function, and structural disorder in proteins is more pervasive than you might think. Proteins with disordered regions may also be sticky and clump together inside and between cells, and they are directly implicated in a number of neurodegenerative diseases. Being able to identify disordered regions in proteins is therefore highly important.
Unfortunately, it is challenging and time-consuming to characterise the structural propensities of polypeptides experimentally, and therefore bioinformatics methods for predicting protein disorder from sequence are indispensable.
Over recent years, many bioinformaticians have therefore constructed algorithms to distinguish peptide sequences that will fold from those that will not. These algorithms can be based on various 'features', derived from physicochemical parameters (such as the charge or hydrophobicity of an amino acid) as well as from evolutionary relatedness.
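To make the idea of sequence-derived 'features' concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from any of the prediction programmes discussed in the study. It assumes the widely used Kyte-Doolittle hydropathy scale, a simplified charge model (Lys and Arg as +1, Asp and Glu as -1, histidine ignored), and a hypothetical helper named window_features that averages over a sliding window.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges at neutral pH (assumption: His is neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def window_features(seq, window=5):
    """Return (mean hydropathy, net charge) for a window centred on each residue.

    Windows are truncated at the sequence ends rather than padded.
    """
    half = window // 2
    features = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)
        charge = sum(CHARGE.get(aa, 0) for aa in segment)
        features.append((hydropathy, charge))
    return features


if __name__ == "__main__":
    # Hypothetical example sequence, chosen only for illustration.
    for residue, (h, q) in zip("MKKEEDSLLVI", window_features("MKKEEDSLLVI")):
        print(residue, round(h, 2), q)
```

Stretches with low mean hydropathy and high net charge are the kind of signal such features are meant to expose. Actual predictors combine many more features and are trained on data.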
Now that many such prediction programmes have become available, it is of obvious value to have a benchmark against which their predictions can be validated and tested. To fill this gap, Nielsen and Mulder generated and validated a representative experimental benchmarking set of site-specific and continuous disorder, using deposited NMR chemical shift data for more than a hundred selected proteins. They then analysed the performance of 26 widely used disorder prediction methods and found that their performance varies noticeably.
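How a comparison against a continuous, per-residue benchmark might look can be sketched as follows. This is not the evaluation procedure used by Nielsen and Mulder. It assumes, purely for illustration, that both the benchmark and each predictor give one score per residue and that agreement is measured with the Pearson correlation. The function names and all numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def rank_predictors(benchmark, predictions):
    """Sort predictors by their correlation with the benchmark, best first."""
    scored = [(name, pearson(benchmark, scores))
              for name, scores in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up per-residue disorder scores, for illustration only.
    benchmark = [0.1, 0.4, 0.8, 0.9]
    predictions = {
        "predictor_a": [0.2, 0.5, 0.7, 0.95],
        "predictor_b": [0.9, 0.1, 0.5, 0.3],
    }
    for name, r in rank_predictors(benchmark, predictions):
        print(name, round(r, 3))
```

Other agreement measures, such as rank correlation or a thresholded classification score, would fit the same structure. Which one is most appropriate depends on how the benchmark scores are defined.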
The thorough comparison presented in their research will help protein scientists around the globe to make better informed choices about which programmes are best to use.