Abstract
Accurate identification of protein-ligand binding sites is central to rational drug design and to understanding molecular mechanisms. Traditional experimental methods, while definitive, are often resource-intensive and time-consuming. Computational approaches have emerged as powerful complements, and AlphaFold has transformed protein structure prediction by providing highly accurate models for a vast array of proteins. However, AlphaFold predicts essentially static structures, so directly inferring ligand binding sites from its models, particularly cryptic or allosteric sites, remains a challenge. This study presents an integrated methodology that combines AlphaFold-predicted structures with complementary data sources to improve binding site identification. The workflow refines initial *in silico* predictions using co-crystallization data and NMR chemical shift perturbations, together with maps from Site Identification by Ligand Competitive Saturation (SILCS), which is itself a simulation-based technique. We systematically evaluate the integrated approach against computation-only predictions and experiment-only annotations across a diverse set of target proteins, and it shows significant improvements in precision, recall, and overall F1 score. Our findings point to a synergistic effect in which experimental evidence guides and validates computational predictions, yielding more robust and biologically relevant binding site annotations. This approach offers a practical framework for accelerating drug discovery and functional annotation in structural biology.
Keywords
AlphaFold, Protein-ligand binding, Binding site identification, Experimental data integration, Drug discovery, Computational biology, Structural biology, Protein structure prediction