Abstract
Halide perovskites have emerged as promising photovoltaic materials owing to their exceptional optoelectronic properties. However, their vast compositional space, together with the need for stable, non-toxic, high-efficiency materials, poses significant challenges for traditional trial-and-error approaches. Machine learning (ML) offers a powerful alternative for accelerating the discovery and optimization of novel halide perovskites. In this study, we develop an ML framework that integrates high-throughput density functional theory (DFT) calculations, experimental data, and regression models to predict key properties such as band gap, formation energy, and stability. We curate a dataset of over 10,000 halide perovskite compositions from literature and computational sources, including lead-free double perovskites and hybrid organic-inorganic perovskites. Using gradient boosting, random forest, and neural network models, we achieve high predictive accuracy (R² > 0.85) for band gap and formation energy. Feature importance analysis reveals that the tolerance factor, the octahedral factor, and elemental electronegativity are critical descriptors. We then use the trained models to screen over 50,000 hypothetical compositions, identifying 120 promising candidates with optimal band gaps (1.1–1.6 eV) and high thermodynamic stability. Experimental validation of the top candidates confirms the reliability of the predictions. This work demonstrates the efficacy of ML in navigating the complex perovskite compositional landscape and in reducing the time and cost of materials discovery. Our approach provides a blueprint for accelerating the development of next-generation photovoltaic materials.
Keywords
machine learning, halide perovskites, photovoltaics, materials discovery, band gap prediction, high-throughput screening, lead-free perovskites, density functional theory