In this project, a real-time 3D hoop detection and localization system was developed using a ZED M stereo camera and a YOLOv8 model.
Initially, a baseline YOLOv8 network was trained on approximately 1,000 images annotated in Roboflow, covering varied viewing angles and distances to improve robustness across imaging conditions. A dedicated ROS 2 node was also implemented to stream camera frames and run real-time hoop detection.
After the model weights were converted from PyTorch (`.pt`) to ONNX format for integration with the `zed_wrapper`, a 3D bounding cube was generated around the hoop.
However, extracting the hoop’s plane normal through the ZED API proved unreliable, so a keypoint-based strategy was adopted instead.
The dataset was subsequently expanded to roughly 3,000 images through data augmentation, and the four corner keypoints of the hoop (LB, LT, RB, RT: left-bottom, left-top, right-bottom, right-top) were manually labelled.
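As a rough illustration of what such labels look like, the sketch below writes and horizontally flips one annotation in the Ultralytics YOLO pose text format (class index, normalized box center and size, then x, y and a visibility flag per keypoint). It assumes the keypoint order LB, LT, RB, RT, a visibility flag of 2 for every corner, and a horizontal flip as the example augmentation; the text does not say which augmentations were actually used, and the function names are hypothetical. The flip shows why left and right keypoints must be swapped, which is what the `flip_idx` entry of an Ultralytics pose dataset configuration expresses.

```python
# Sketch: one YOLO-pose label line for the four hoop corners, plus a
# horizontal-flip augmentation that keeps the corner names consistent.
# Assumed (not from the project): keypoint order LB, LT, RB, RT, coordinates
# normalized to [0, 1], visibility flag 2 ("visible") for every corner.

KEYPOINT_NAMES = ["LB", "LT", "RB", "RT"]
# After a horizontal flip, left corners become right corners and vice versa.
FLIP_IDX = [2, 3, 0, 1]


def make_label(cls, box, keypoints):
    """Build 'cls cx cy w h x1 y1 v1 ... x4 y4 v4' from normalized values."""
    values = [cls, *box]
    for x, y, v in keypoints:
        values.extend([x, y, v])
    # Floats get fixed precision; the class index and visibility flags stay ints.
    return " ".join(f"{val:.6f}" if isinstance(val, float) else str(val) for val in values)


def parse_label(line):
    """Inverse of make_label: returns (cls, (cx, cy, w, h), [(x, y, v), ...])."""
    parts = line.split()
    cls = int(parts[0])
    box = tuple(float(p) for p in parts[1:5])
    keypoints = [
        (float(parts[i]), float(parts[i + 1]), int(float(parts[i + 2])))
        for i in range(5, len(parts), 3)
    ]
    return cls, box, keypoints


def hflip_label(line):
    """Mirror a label horizontally and swap the left/right keypoint slots."""
    cls, (cx, cy, w, h), keypoints = parse_label(line)
    mirrored = [(1.0 - x, y, v) for x, y, v in keypoints]
    # Slot j must again hold the corner named KEYPOINT_NAMES[j].
    reordered = [mirrored[FLIP_IDX[j]] for j in range(len(mirrored))]
    return make_label(cls, (1.0 - cx, cy, w, h), reordered)


if __name__ == "__main__":
    label = make_label(0, (0.4, 0.5, 0.2, 0.4),
                       [(0.32, 0.7, 2), (0.30, 0.3, 2), (0.48, 0.7, 2), (0.50, 0.3, 2)])
    print(label)
    print(hflip_label(label))
```

With this ordering the swap is `[2, 3, 0, 1]`; without it, a flipped image would teach the model to call the right-hand corners LB and LT.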
A YOLOv8-Pose model was then trained on this dataset to detect the four keypoints.
Finally, the classical Perspective-n-Point (PnP) algorithm was applied to the detected keypoints, using the hoop’s known diameter (0.72 m) as the metric scale reference, to recover the rotation and translation vectors relating the hoop to the camera. The rotation vector was converted into yaw, pitch, and roll angles, and the translation vector was used to compute the distance to the hoop’s center.
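The post-PnP step can be sketched in plain Python. A PnP solver such as OpenCV's `cv2.solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs)` returns a rotation vector `rvec` and a translation vector `tvec` that map hoop-frame points into the camera frame. The sketch assumes those are already available and shows how the 0.72 m diameter fixes the metric object points, the Rodrigues conversion from `rvec` to a rotation matrix (what `cv2.Rodrigues` computes), a Z-Y-X yaw/pitch/roll decomposition, and the distance to the hoop center as the norm of `tvec`. It assumes the four keypoints are the corners of a 0.72 m square centered on the hoop, with the hoop frame's x axis to the right, y down and z pointing away from the camera, so a head-on view gives zero angles. The text does not state the object-point layout or the Euler convention, and which physical motion each angle corresponds to depends on that convention; all names here are hypothetical.

```python
import math

HOOP_DIAMETER_M = 0.72  # known hoop diameter, the metric scale reference


def hoop_object_points(size=HOOP_DIAMETER_M):
    """Corner points in meters, hoop frame: x right, y down, z into the hoop.

    Assumption: the LB, LT, RB, RT keypoints are the corners of a size x size
    square centered on the hoop, lying in the z = 0 plane.
    """
    h = size / 2.0
    return [(-h, h, 0.0), (-h, -h, 0.0), (h, h, 0.0), (h, -h, 0.0)]  # LB, LT, RB, RT


def rodrigues(rvec):
    """Rotation vector (unit axis * angle in radians) -> 3x3 rotation matrix."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    # R = cos(theta) I + (1 - cos(theta)) k k^T + sin(theta) [k]_x
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def rotation_to_ypr(R):
    """Decompose R = Rz(yaw) @ Ry(pitch) @ Rx(roll); returns degrees."""
    cos_pitch = math.hypot(R[0][0], R[1][0])
    pitch = math.atan2(-R[2][0], cos_pitch)
    if cos_pitch > 1e-9:
        yaw = math.atan2(R[1][0], R[0][0])
        roll = math.atan2(R[2][1], R[2][2])
    else:
        # Gimbal lock (pitch = +/-90 deg): yaw and roll are coupled, so fix yaw = 0.
        yaw = 0.0
        roll = math.atan2(-R[1][2], R[1][1])
    return math.degrees(yaw), math.degrees(pitch), math.degrees(roll)


def pose_summary(rvec, tvec):
    """(yaw, pitch, roll) in degrees and the hoop-center distance in meters.

    tvec is the hoop-frame origin (the hoop center) in camera coordinates,
    so its Euclidean norm is the camera-to-center distance.
    """
    angles = rotation_to_ypr(rodrigues(rvec))
    distance = math.sqrt(sum(t * t for t in tvec))
    return angles, distance


if __name__ == "__main__":
    # Hypothetical PnP output: hoop rotated 30 degrees about the camera z axis.
    print(pose_summary((0.0, 0.0, math.radians(30)), (0.3, -0.4, 1.2)))
```

OpenCV returns `rvec` and `tvec` as (3, 1) arrays, so they would need flattening first, e.g. `pose_summary(rvec.ravel(), tvec.ravel())`. OpenCV's `SOLVEPNP_IPPE_SQUARE` flag, designed for square planar targets, expects its own fixed corner order, so the points would need reordering if that flag were used.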
The system overlays a color-coded XYZ axis triad on the live video stream and displays the orientation and distance metrics in real time.
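The overlay amounts to projecting the hoop-frame origin and three axis tips into the image with the pinhole model and drawing a line to each. OpenCV's `cv2.drawFrameAxes(image, cameraMatrix, distCoeffs, rvec, tvec, length)` does this directly; the stdlib-only sketch below shows the underlying arithmetic. It assumes rectified images (no lens distortion), zero skew in the camera matrix, a rotation matrix already obtained from `rvec`, the usual X red, Y green, Z blue colors in OpenCV's BGR order, and an axis length of half the hoop diameter, which is an arbitrary choice. The function names and intrinsics are hypothetical.

```python
# Sketch: pixel end points of a color-coded axis triad for the hoop pose.
# Assumptions: rectified images (no distortion), zero skew, R is the 3x3 rotation
# (e.g. from rvec), t is the hoop center in camera coordinates (meters).

AXIS_COLORS_BGR = {"X": (0, 0, 255), "Y": (0, 255, 0), "Z": (255, 0, 0)}  # red, green, blue


def project(K, R, t, point):
    """Pinhole projection of a hoop-frame point to integer pixel coordinates."""
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return round(fx * cam[0] / cam[2] + cx), round(fy * cam[1] / cam[2] + cy)


def axis_segments(K, R, t, length=0.36):
    """[(axis, start_px, end_px, color_bgr)] for the X, Y and Z axes.

    length defaults to half the 0.72 m hoop diameter (an arbitrary choice).
    """
    origin = project(K, R, t, (0.0, 0.0, 0.0))
    tips = {"X": (length, 0.0, 0.0), "Y": (0.0, length, 0.0), "Z": (0.0, 0.0, length)}
    return [(axis, origin, project(K, R, t, tip), AXIS_COLORS_BGR[axis])
            for axis, tip in tips.items()]


if __name__ == "__main__":
    K = [[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]]  # hypothetical intrinsics
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for segment in axis_segments(K, identity, (0.0, 0.0, 2.0)):
        print(segment)
```

Each returned segment could then be drawn with `cv2.line(frame, start, end, color, 2)`. In a perfectly head-on view the Z tip projects onto the origin, so the Z axis only becomes visible once the hoop is seen at an angle.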
Performance evaluations showed accurate hoop detections and orientation estimates, providing a solid foundation for further integration with the autonomous drone.