Estimating the Position of a Moving Body from Monocular Video

Cameras traditionally capture a two-dimensional projection of a scene, yet many applications, such as robotics, autonomous navigation, and augmented reality, require a three-dimensional understanding of the environment. Several strategies exist to recover this information, including specialized hardware such as LiDAR, laser rangefinders, and radar, as well as stereo setups that triangulate depth from a pair of cameras with a known separation. However, these solutions can be impractical, for example when the budget is limited or when only single-camera footage is available. To address these challenges, our project investigated several methods for this problem and then used one of them as the foundation for estimating the three-dimensional position of objects from single-camera video.
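To make the contrast with the single-camera case concrete, the sketch below illustrates the standard triangulation relation Z = f·B/d for a rectified stereo pair, which is the depth cue a single camera lacks. The focal length, baseline, and disparity values are purely illustrative assumptions, not taken from our setup.

```python
# Illustrative sketch only: depth from a rectified stereo pair with known
# baseline. All parameter values below are hypothetical examples.

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a point seen in both rectified views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return focal_px * baseline_m / disparity_px

if __name__ == "__main__":
    # Example: 700 px focal length, 12 cm baseline, 35 px disparity -> 2.40 m.
    print(f"{stereo_depth(700.0, 0.12, 35.0):.2f} m")
```

With a single camera, the baseline B is unavailable, so depth must be inferred from other cues; the remainder of this report describes the approach we adopted for that monocular setting.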