Abstract
This paper presents a comprehensive review of vision-based techniques for detecting human-object interactions (HOI) in still images. We organize existing approaches into four main categories: traditional computer vision methods, deep learning-based methods, hybrid methods, and transformer-based methods. For each category, we discuss key algorithms together with their strengths and limitations. We also provide an overview of publicly available datasets commonly used for HOI detection and compare the performance of different techniques on these benchmarks. Finally, the review highlights current challenges in the field and outlines promising directions for future research, including the integration of common-sense reasoning and the development of more robust and interpretable models. This survey aims to serve as a resource for researchers and practitioners who want to understand the state of the art and to advance the field of human-object interaction detection.
Keywords
Human-object interaction, HOI detection, Computer vision, Deep learning, Still images, Object recognition, Action recognition, Survey