The goal of this project was to develop an automated system for scanning and digitizing documents from raw smartphone images. The system aims to bridge the gap between a casual photograph and a high-quality digital scan by addressing common real-world challenges:
- Perspective Distortions: Correcting the angled view of a document.
- Uneven Lighting: Managing shadows and varying brightness.
- Background Noise: Distinguishing the document from its surroundings
Methodology: Two Core Approaches
- The Naive Approach: Relies on edge detection and linear geometry. It involves pre-processing images in various colour spaces (RGB, HSV, Grayscale), applying Canny Edge Detection, and using the Hough Transform to find lines and intersections.
- The Segmentation Approach: A topological method that treats the image as a 3D surface where pixel intensity represents elevation. By “flooding” the image from background seeds at the edges, the algorithm creates “dams” that form the document’s boundaries.
The Processing Pipeline
- Pre-processing & Filtering: Utilizing the HSV colour space (specifically the Saturation channel) and Bilateral Filtering to preserve edges while reducing noise.
- Edge & Corner Detection: Identifying the four most likely corners of the document.
- Perspective Transformation: Using a transformation matrix to rectify the image into a flat, top-down “bird’s-eye” view.
- Enhancement: Applying the Otsu Thresholding algorithm to create a high-contrast binary image for better readability.
- OCR: Using the EasyOCR engine to extract and digitize the textual content.
Key Findings & Challenges
- The Segmentation Bottleneck: Robustly isolating the page from the background remains the most difficult phase. No single algorithm worked perfectly for every image in the dataset.
- Computational Complexity: The geometric approach often faces high time complexity when dealing with many line intersections.
- The Importance of Pre-processing: The success of the final OCR depends heavily on the quality of the initial noise reduction and colour space selection.